Re: [Linux-nvdimm] another pmem variant

2015-04-13 Thread Dan Williams
On Mon, Apr 13, 2015 at 2:01 AM, Greg KH  wrote:
> On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
>> On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig  wrote:
>> > On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
>> >> This is mostly ok and does not collide too much with the upcoming ACPI
>> >> mechanism for this stuff.  I do worry that the new
>> >> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
>> >> relevant for at most one kernel cycle given the imminent publication
>> >> of the spec that unblocks our release.
>> >
>> > I don't think we can just get rid of it as legacy systems won't be
>> > upgraded to the new discovery mechanism.  Or do you mean you plan to
>> > introduce a better override on the command line?  In that case speak
>> > up now!
>>
>> The kernel command line would simply be the standard/existing memmap=
>> to reserve a memory range.  Then, when the platform device loads, it
>> does a request_firmware() to inject a binary table that further carves
>> memory into ranges to which the pmem driver attaches.  No need for the
>> legacy system BIOS to be upgraded to the "new way".
>
> Um, what parses that "binary table"?  The kernel better not be doing
> that, as that's not what the firmware interface is for.  The firmware
> interface is for "pass through" only directly to hardware.

I had been using it as a generic/device-model-integrated way to do
what amounts to ACPI table injection [1].  But, now that the new
memmap= command line is upstream, most of the benefits of this
approach are moot and no longer outweigh the downsides [2].  Consider
it tabled.


[1]: https://01.org/linux-acpi/documentation/overriding-dsdt
[2]: http://marc.info/?l=linux-netdev&m=135793331325647&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux-nvdimm] another pmem variant

2015-04-13 Thread Greg KH
On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
> On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig  wrote:
> > On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
> >> This is mostly ok and does not collide too much with the upcoming ACPI
> >> mechanism for this stuff.  I do worry that the new
> >> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
> >> relevant for at most one kernel cycle given the imminent publication
> >> of the spec that unblocks our release.
> >
> > I don't think we can just get rid of it as legacy systems won't be
> > upgraded to the new discovery mechanism.  Or do you mean you plan to
> > introduce a better override on the command line?  In that case speak
> > up now!
> 
> The kernel command line would simply be the standard/existing memmap=
> to reserve a memory range.  Then, when the platform device loads, it
> does a request_firmware() to inject a binary table that further carves
> memory into ranges to which the pmem driver attaches.  No need for the
> legacy system BIOS to be upgraded to the "new way".

Um, what parses that "binary table"?  The kernel better not be doing
that, as that's not what the firmware interface is for.  The firmware
interface is for "pass through" only directly to hardware.

thanks,

greg k-h


Re: [Linux-nvdimm] another pmem variant V2

2015-04-01 Thread Boaz Harrosh
On 04/01/2015 10:50 AM, Ingo Molnar wrote:
> 
> * Dan Williams  wrote:
> 
>> On Tue, Mar 31, 2015 at 10:24 AM, Christoph Hellwig  wrote:
>>> On Tue, Mar 31, 2015 at 06:44:56PM +0200, Ingo Molnar wrote:
>>>> I'd be fine with that too - mind sending an updated series?
>>>
>>> I will send an updated one tonight or early tomorrow.
>>>
>>> Btw, do you want to keep the E820_PRAM name instead of E820_PMEM?
>>> Seems like most people either don't care or prefer E820_PMEM. I'm
>>> fine either way.
>>
>> FWIW, I like the idea of having a separate E820_PRAM name for 
>> type-12 memory vs future "can't yet disclose" UEFI memory type.  The 
>> E820_PRAM type potentially has the property of being relegated to 
>> "legacy" NVDIMMs.  We can later add E820_PMEM as a memory type that, 
>> for example, is not automatically backed by struct page.  That said, 
>> I'm fine either way.
> 
> I agree that it's a minor detail, but I think the separation is 
> useful in two ways:
> 
>  - We have a generic 'pmem' driver, but the low level, platform 
>    specific RAM enumeration name does not use that name.
> 
>  - 'E820_PRAM' is a more natural extension of 'E820_RAM'.
> 
> Later on we can then do a:
> 
> s/E820_PRAM/E820_LEGACY_PRAM
> 
> rename or so.

If Dan does not like E820_PMEM, then please let us just call it
E820_PMEM_LEGACY right from the get-go. PRAM is not a good name
precisely because it is so similar to RAM.

Thanks
Boaz



Re: [Linux-nvdimm] another pmem variant V2

2015-04-01 Thread Ingo Molnar

* Dan Williams  wrote:

> On Tue, Mar 31, 2015 at 10:24 AM, Christoph Hellwig  wrote:
> > On Tue, Mar 31, 2015 at 06:44:56PM +0200, Ingo Molnar wrote:
> >> I'd be fine with that too - mind sending an updated series?
> >
> > I will send an updated one tonight or early tomorrow.
> >
> > Btw, do you want to keep the E820_PRAM name instead of E820_PMEM?
> > Seems like most people either don't care or prefer E820_PMEM. I'm
> > fine either way.
> 
> FWIW, I like the idea of having a separate E820_PRAM name for 
> type-12 memory vs future "can't yet disclose" UEFI memory type.  The 
> E820_PRAM type potentially has the property of being relegated to 
> "legacy" NVDIMMs.  We can later add E820_PMEM as a memory type that, 
> for example, is not automatically backed by struct page.  That said, 
> I'm fine either way.

I agree that it's a minor detail, but I think the separation is 
useful in two ways:

 - We have a generic 'pmem' driver, but the low level, platform 
   specific RAM enumeration name does not use that name.

 - 'E820_PRAM' is a more natural extension of 'E820_RAM'.

Later on we can then do a:

s/E820_PRAM/E820_LEGACY_PRAM

rename or so.

Thanks,

Ingo


Re: [Linux-nvdimm] another pmem variant V2

2015-03-31 Thread Dan Williams
On Tue, Mar 31, 2015 at 10:24 AM, Christoph Hellwig  wrote:
> On Tue, Mar 31, 2015 at 06:44:56PM +0200, Ingo Molnar wrote:
>> I'd be fine with that too - mind sending an updated series?
>
> I will send an updated one tonight or early tomorrow.
>
> Btw, do you want to keep the E820_PRAM name instead of E820_PMEM?
> Seems like most people either don't care or prefer E820_PMEM. I'm
> fine either way.

FWIW, I like the idea of having a separate E820_PRAM name for type-12
memory vs future "can't yet disclose" UEFI memory type.  The E820_PRAM
type potentially has the property of being relegated to "legacy"
NVDIMMs.  We can later add E820_PMEM as a memory type that, for
example, is not automatically backed by struct page.  That said, I'm
fine either way.
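
For readers following the naming discussion, here is a minimal C sketch of what "discovering type-12 memory" amounts to: walking a firmware memory map and picking out the entries whose type is 12. The entry struct, the function name and the sample layout are hypothetical and chosen only for illustration; the only values taken from the thread are the type number 12 and the proposed E820_PRAM name, and 1 and 2 are the usual E820 codes for RAM and reserved memory.

```c
#include <stddef.h>
#include <stdint.h>

/* E820 type codes: 1 and 2 are the usual RAM/reserved values,
 * 12 is the type this thread proposes to call E820_PRAM. */
enum e820_type {
    E820_RAM      = 1,
    E820_RESERVED = 2,
    E820_PRAM     = 12,
};

/* Hypothetical, simplified memory map entry (not the kernel's struct). */
struct e820_entry {
    uint64_t addr;  /* physical start address */
    uint64_t size;  /* length in bytes */
    uint32_t type;  /* one of enum e820_type */
};

/*
 * Copy up to `max` type-12 ranges from map[] into out[].
 * Returns the total number of type-12 ranges seen, which may exceed `max`.
 */
static size_t find_pram(const struct e820_entry *map, size_t n,
                        struct e820_entry *out, size_t max)
{
    size_t i, found = 0;

    for (i = 0; i < n; i++) {
        if (map[i].type != E820_PRAM)
            continue;
        if (found < max)
            out[found] = map[i];
        found++;
    }
    return found;
}
```

Each range returned this way would correspond to one region handed to the pmem driver; a later E820_PMEM type, as suggested above, would simply be another value to match on.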


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Christoph Hellwig
Hi Adam,

we're all well aware of the deficits of the current interface, but for now
that's all we have, and due to lead times in implementing BIOSes it will
be all we have for quite a while.

We're all eagerly looking forward to better interfaces and BIOSes that will
support them.

But for now everyone would love to just be able to use existing systems with
an out of the box Linux kernel.


RE: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Brooks, Adam J
>The other two patches are a heavily rewritten version of the code that
>Intel gave to various storage vendors to discover the type 12 (and earlier
>type 6) nvdimms, which I massaged into a form that is hopefully suitable
>for mainline.

The problem is that the e820 or the UEFI Memory Map Table on their own are 
really bad ways to represent NVDIMMs.  The memory table idea was originally 
developed 6 years ago prior to NVDIMMs existing.  It was used to define 
traditional battery backed memory.  With traditional battery backed memory 
either the whole region was going to be valid or the whole region was going to 
be gone.  There was also no concept of arming.  You simply had x hours of data
retention based on your battery being y% charged.  Fast forward a couple years,
and we continued using the memory table method for something called Copy To 
Flash where the CPU would copy memory from the DIMMs to a SSD of some sort.  
Again this was a whole region or none of the region solution and because we 
were typically using SATA SSD there was no need to "arm" anything.  
Additionally the restore operation (and even the save operation if you were 
brave enough) could be done from the OS.  Therefore there was no need for the 
BIOS to pass up any status regarding if the recovery was successful or not.

Fast forward again to the present day and NVDIMMs.  We used the memory table
model initially for NVDIMMs because 1) the BIOS code was already in place and
2) we had a non-upstreamed driver (something that predated pmem by several
years, called ADRBD).  In a perfect world where there are no hardware failures,
e820+ADRBD works great for NVDIMMs.  However, in the real world where there are
failures, it has a number of shortcomings.  The main issues are:
1) The region may now be composed of 2+ different NVDIMMs that have different
statuses.  A subset of the NVDIMMs may have failed the restore.  An NVDIMM may
have been added after the last save/restore of the existing NVDIMMs.
2) Based on the e820 table alone, the OS has no way of knowing where the
boundaries of the NVDIMMs are.  It has no way of knowing whether they are all
interleaved together, where a failure of a single NVDIMM means the loss of the
whole region, or whether the NVDIMMs are non-interleaved and can be treated as
separate memory regions to prevent the failure of one NVDIMM from causing data
to be lost from all NVDIMMs.
3) Due to the requirement to restore the MRS/RC registers, the NVDIMM restore
must be done from the BIOS.  Depending on the security settings of the platform,
the OS may not be able to directly interrogate the individual NVDIMMs to find
their status.  Even if the OS can get to the NVDIMM over SMBUS, all information
about the status of the last restore attempt may have been wiped if the BIOS
was also configured to do the erase/arm operation.

For those reasons (and more) simply using the current memory tables is not a 
good solution. A more detailed NVDIMM specific table is required to surface the 
status and configuration of the NVDIMMs.  Unfortunately that table has been
perpetually delayed, and as a result people are trying to move forward with Type
12.  I understand why this has been done, and for highly embedded storage
appliances it is fine, because those users probably inherently know the 
configuration of the NVDIMMs.  However for general purpose systems where the 
user has no way of knowing the exact configuration of the DIMMS, just using the 
e820 or UEFI Memory Map table is not sufficient.
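
To make the interleaving point concrete, the following C sketch models one persistent region built from equal-size NVDIMMs and answers which NVDIMM backs a given offset, and whether a window of the region survives the failure of one NVDIMM. Everything here (the struct, the field names, the 4 KiB granularity in the usage note) is a hypothetical model for illustration; real platforms may interleave differently, and none of this layout information is available from the e820 table alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one region built from equal-size NVDIMMs. */
struct nvdimm_region {
    uint64_t dimm_size;    /* bytes contributed by each NVDIMM */
    unsigned int ndimms;   /* number of NVDIMMs backing the region */
    uint64_t interleave;   /* interleave granularity in bytes, 0 = none */
};

/* Which NVDIMM backs byte `off` of the region? */
static unsigned int dimm_for_offset(const struct nvdimm_region *r, uint64_t off)
{
    if (r->interleave == 0)                          /* concatenated */
        return (unsigned int)(off / r->dimm_size);
    return (unsigned int)((off / r->interleave) % r->ndimms); /* striped */
}

/* Does the window [off, off + len) avoid NVDIMM `failed` entirely? */
static bool window_survives(const struct nvdimm_region *r, uint64_t off,
                            uint64_t len, unsigned int failed)
{
    /* Each aligned chunk of `step` bytes lives on exactly one NVDIMM. */
    uint64_t step = r->interleave ? r->interleave : r->dimm_size;
    uint64_t end = off + len;
    uint64_t pos;

    for (pos = off; pos < end; pos = (pos / step + 1) * step)
        if (dimm_for_offset(r, pos) == failed)
            return false;
    return true;
}
```

With two NVDIMMs interleaved at 4 KiB, any window larger than 4 KiB touches both devices, so a single failed restore invalidates essentially the whole region. With the same two NVDIMMs concatenated, the half backed by the healthy device remains usable, which is the distinction the OS cannot make without a more detailed table.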


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Dan Williams
On Wed, Mar 25, 2015 at 10:04 AM, Christoph Hellwig  wrote:
> On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
>> The kernel command line would simply be the standard/existing memmap=
>> to reserve a memory range.  Then, when the platform device loads, it
>> does a request_firmware() to inject a binary table that further carves
>> memory into ranges to which the pmem driver attaches.  No need for the
>> legacy system BIOS to be upgraded to the "new way".
>
> Ewww...
>
>> It does do the right thing in kernel space.  The userspace utility
>> creates the binary table (once) that can be compiled into the platform
>> device driver or auto-loaded by an initrd.  The problem with a new
>> memmap= is that it is too coarse.  For example you can't do things
>> like specify a pmem range per-NUMA node.
>
> Sure you can as long as you know the layout.  memmap= can be specified
> multiple times.   Again, I see absolutely zero benefit of doing crap
> like request_firmware() to convert interface, and I'm also tired of
> having this talk about code that will eventually be released and should
> be superior (and from all that I can guess so far will actually be far
> worse).

You and me both...


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Christoph Hellwig
On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
> The kernel command line would simply be the standard/existing memmap=
> to reserve a memory range.  Then, when the platform device loads, it
> does a request_firmware() to inject a binary table that further carves
> memory into ranges to which the pmem driver attaches.  No need for the
> legacy system BIOS to be upgraded to the "new way".

Ewww...

> It does do the right thing in kernel space.  The userspace utility
> creates the binary table (once) that can be compiled into the platform
> device driver or auto-loaded by an initrd.  The problem with a new
> memmap= is that it is too coarse.  For example you can't do things
> like specify a pmem range per-NUMA node.

Sure you can as long as you know the layout.  memmap= can be specified
multiple times.   Again, I see absolutely zero benefit of doing crap
like request_firmware() to convert interface, and I'm also tired of
having this talk about code that will eventually be released and should
be superior (and from all that I can guess so far will actually be far
worse).
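
As an illustration of the "memmap= can be specified multiple times" point, here is a small user-space C sketch that parses one "memmap=nn[KMG]!ss[KMG]" token into a size and a start address. It is not the kernel's parser; the struct and function names are hypothetical, and the addresses in the usage note are made up. Repeating the option once per range (for example once per NUMA node) just means calling the parser once per token.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One range described by a "memmap=nn[KMG]!ss[KMG]" token (hypothetical type). */
struct pmem_range {
    uint64_t start;  /* ss: physical start address */
    uint64_t size;   /* nn: length in bytes */
};

/* Parse a number with an optional K/M/G suffix and advance *s past it. */
static int parse_size(const char **s, uint64_t *out)
{
    char *end;
    uint64_t v;

    if (!isdigit((unsigned char)**s))
        return -1;
    v = strtoull(*s, &end, 0);  /* base 0 also accepts 0x... */
    switch (*end) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; end++; break;
    default: break;
    }
    *s = end;
    *out = v;
    return 0;
}

/* Parse one "memmap=nn[KMG]!ss[KMG]" token. Returns 0 on success, -1 on error. */
static int parse_memmap_pmem(const char *arg, struct pmem_range *r)
{
    static const char prefix[] = "memmap=";
    const char *p = arg;

    if (strncmp(p, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    p += sizeof(prefix) - 1;
    if (parse_size(&p, &r->size) || *p++ != '!')
        return -1;
    if (parse_size(&p, &r->start) || *p != '\0')
        return -1;
    return 0;
}
```

A command line such as "memmap=4G!12G memmap=4G!28G" (addresses invented for the example) would yield two ranges, each of which could back its own pmem device, provided the administrator knows which physical ranges belong to which node.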


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Dan Williams
On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig  wrote:
> On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
>> This is mostly ok and does not collide too much with the upcoming ACPI
>> mechanism for this stuff.  I do worry that the new
>> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
>> relevant for at most one kernel cycle given the imminent publication
>> of the spec that unblocks our release.
>
> I don't think we can just get rid of it as legacy systems won't be
> upgraded to the new discovery mechanism.  Or do you mean you plan to
> introduce a better override on the command line?  In that case speak
> up now!

The kernel command line would simply be the standard/existing memmap=
to reserve a memory range.  Then, when the platform device loads, it
does a request_firmware() to inject a binary table that further carves
memory into ranges to which the pmem driver attaches.  No need for the
legacy system BIOS to be upgraded to the "new way".

>> Our planned solution to the "legacy pmem" problem is to have a
>> userspace utility craft a list of address ranges in the form that ACPI
>> expects and attach that to a platform device (one time setup).  It
>> only requires that the memory be marked reserved, not necessarily
>> marked type-12.
>
> I can't see any benefit of that over just doing the right thing in
> kernel space.

It does do the right thing in kernel space.  The userspace utility
creates the binary table (once) that can be compiled into the platform
device driver or auto-loaded by an initrd.  The problem with a new
memmap= is that it is too coarse.  For example you can't do things
like specify a pmem range per-NUMA node.


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Christoph Hellwig
On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
> This is mostly ok and does not collide too much with the upcoming ACPI
> mechanism for this stuff.  I do worry that the new
> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
> relevant for at most one kernel cycle given the imminent publication
> of the spec that unblocks our release.

I don't think we can just get rid of it as legacy systems won't be
upgraded to the new discovery mechanism.  Or do you mean you plan to
introduce a better override on the command line?  In that case speak
up now!

> Our planned solution to the "legacy pmem" problem is to have a
> userspace utility craft a list of address ranges in the form that ACPI
> expects and attach that to a platform device (one time setup).  It
> only requires that the memory be marked reserved, not necessarily
> marked type-12.

I can't see any benefit of that over just doing the right thing in
kernel space.

> > The other two patches are a heavily rewritten version of the code that
> > Intel gave to various storage vendors to discover the type 12 (and earlier
> > type 6) nvdimms, which I massaged into a form that is hopefully suitable
> > for mainline.
> 
> I'd prefer E820_PMEM over E820_PROTECTED_KERN, I don't know why I
> chose that name initially, but to each his own bike shed.

Sounds fine to me.


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Dan Williams
On Wed, Mar 25, 2015 at 9:04 AM, Christoph Hellwig  wrote:
> Here is another version of the same trivial pmem driver, because two
> obviously aren't enough.

Welcome to the party! :-)

> The first patch is the same pmem driver
> that Ross posted a short time ago, just modified to use platform_devices
> to find the persistent memory region instead of hardcoding it in the
> Kconfig.  This allows us to keep pmem.c separate from any discovery mechanism,
> but still allow auto-discovery.

This is mostly ok and does not collide too much with the upcoming ACPI
mechanism for this stuff.  I do worry that the new
"memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
relevant for at most one kernel cycle given the imminent publication
of the spec that unblocks our release.

Our planned solution to the "legacy pmem" problem is to have a
userspace utility craft a list of address ranges in the form that ACPI
expects and attach that to a platform device (one time setup).  It
only requires that the memory be marked reserved, not necessarily
marked type-12.

> The other two patches are a heavily rewritten version of the code that
> Intel gave to various storage vendors to discover the type 12 (and earlier
> type 6) nvdimms, which I massaged into a form that is hopefully suitable
> for mainline.

I'd prefer E820_PMEM over E820_PROTECTED_KERN, I don't know why I
chose that name initially, but to each his own bike shed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Dan Williams
On Wed, Mar 25, 2015 at 9:04 AM, Christoph Hellwig h...@lst.de wrote:
 Here is another version of the same trivial pmem driver, because two
 obviously aren't enough.

Welcome to the party! :-)

 The first patch is the same pmem driver
 that Ross posted a short time ago, just modified to use platform_devices
 to find the persistant memory region instead of hardconding it in the
 Kconfig.  This allows to keep pmem.c separate from any discovery mechanism,
 but still allow auto-discovery.

This is mostly ok and does not collide too much with the upcoming ACPI
mechanism for this stuff.  I do worry that the new
memmap=nn[KMG]!ss[KMG] kernel command line option will only be
relevant for at most one kernel cycle given the imminent publication
of the spec that unblocks our release.

Our planned solution to the legacy pmem problem is to have a
userspace utility craft a list of address ranges in the form that ACPI
expects and attach that to a platform device (one time setup).  It
only requires that the memory be marked reserved, not necessarily
marked type-12.

 The other two patches are a heavily rewritten version of the code that
 Intel gave to various storage vendors to discover the type 12 (and earlier
 type 6) nvdimms, which I massaged into a form that is hopefully suitable
 for mainline.

I'd prefer E820_PMEM over E820_PROTECTED_KERN, I don't know why I
chose that name initially, but to each his own bike shed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Dan Williams
On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig h...@lst.de wrote:
 On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
 This is mostly ok and does not collide too much with the upcoming ACPI
 mechanism for this stuff.  I do worry that the new
 memmap=nn[KMG]!ss[KMG] kernel command line option will only be
 relevant for at most one kernel cycle given the imminent publication
 of the spec that unblocks our release.

 I don't think we can just get rid of it as legacy systems won't be
 upgraded to the new discovery mechanism.  Or do you mean you plan to
 introduce a better override on the command line?  In that case speak
 up now!

The kernel command line would simply be the standard/existing memmap=
to reserve a memory range.  Then, when the platform device loads, it
does a request_firmware() to inject a binary table that further carves
memory into ranges to which the pmem driver attaches.  No need for the
legacy system BIOS to be upgraded to the new way.

>> Our planned solution to the legacy pmem problem is to have a
>> userspace utility craft a list of address ranges in the form that ACPI
>> expects and attach that to a platform device (one time setup).  It
>> only requires that the memory be marked reserved, not necessarily
>> marked type-12.
>
> I can't see any benefit of that over just doing the right thing in
> kernel space.

It does do the right thing in kernel space.  The userspace utility
creates the binary table (once) that can be compiled into the platform
device driver or auto-loaded by an initrd.  The problem with a new
memmap= is that it is too coarse.  For example you can't do things
like specify a pmem range per-NUMA node.


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Christoph Hellwig
On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
> The kernel command line would simply be the standard/existing memmap=
> to reserve a memory range.  Then, when the platform device loads, it
> does a request_firmware() to inject a binary table that further carves
> memory into ranges to which the pmem driver attaches.  No need for the
> legacy system BIOS to be upgraded to the "new way".

Ewww...

> It does do the right thing in kernel space.  The userspace utility
> creates the binary table (once) that can be compiled into the platform
> device driver or auto-loaded by an initrd.  The problem with a new
> memmap= is that it is too coarse.  For example you can't do things
> like specify a pmem range per-NUMA node.

Sure you can, as long as you know the layout.  memmap= can be specified
multiple times.  Again, I see absolutely zero benefit of doing crap
like request_firmware() to convert interfaces, and I'm also tired of
having this talk about code that will eventually be released and should
be superior (and from all that I can guess so far will actually be far
worse).
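
For readers following along, a minimal sketch of how several
memmap=nn[KMG]!ss[KMG] options can each describe one range, for example
one per NUMA node.  It assumes nn is the size and ss the start address,
so each option covers ss to ss+nn.  The helper names parse_size and
pmem_ranges are made up for this illustration and are not the kernel's
parser.

```python
import re

# Multipliers for the K/M/G suffixes in the nn[KMG]!ss[KMG] syntax.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
_NUMBER = re.compile(r"(0[xX][0-9a-fA-F]+|[0-9]+)([KMGkmg]?)")


def parse_size(text):
    """Turn '4G', '512M' or '0x400000000' into a byte count."""
    match = _NUMBER.fullmatch(text)
    if match is None:
        raise ValueError("bad size: %r" % text)
    digits, suffix = match.groups()
    base = 16 if digits[:2].lower() == "0x" else 10
    return int(digits, base) * _UNITS[suffix.upper()]


def pmem_ranges(cmdline):
    """Return one (start, end) byte range per memmap=nn!ss option."""
    ranges = []
    for word in cmdline.split():
        # Only the '!' form is of interest here; other memmap= forms
        # and unrelated options are skipped.
        if not word.startswith("memmap=") or "!" not in word:
            continue
        size, start = word[len("memmap="):].split("!", 1)
        base = parse_size(start)
        ranges.append((base, base + parse_size(size)))
    return ranges


# Two options, two ranges (hypothetical addresses).
for start, end in pmem_ranges("ro quiet memmap=4G!8G memmap=2G!0x400000000"):
    print(hex(start), hex(end))
```

Whether the ranges line up with NUMA node boundaries still depends on
the person writing the command line knowing the layout.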


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Christoph Hellwig
On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
> This is mostly ok and does not collide too much with the upcoming ACPI
> mechanism for this stuff.  I do worry that the new
> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
> relevant for at most one kernel cycle given the imminent publication
> of the spec that unblocks our release.

I don't think we can just get rid of it as legacy systems won't be
upgraded to the new discovery mechanism.  Or do you mean you plan to
introduce a better override on the command line?  In that case speak
up now!

> Our planned solution to the legacy pmem problem is to have a
> userspace utility craft a list of address ranges in the form that ACPI
> expects and attach that to a platform device (one time setup).  It
> only requires that the memory be marked reserved, not necessarily
> marked type-12.

I can't see any benefit of that over just doing the right thing in
kernel space.

>> The other two patches are a heavily rewritten version of the code that
>> Intel gave to various storage vendors to discover the type 12 (and earlier
>> type 6) nvdimms, which I massaged into a form that is hopefully suitable
>> for mainline.
>
> I'd prefer E820_PMEM over E820_PROTECTED_KERN, I don't know why I
> chose that name initially, but to each his own bike shed.

Sounds fine to me.
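
As a rough illustration of what discovering "type 12 (and earlier
type 6)" ranges amounts to, here is a sketch that filters a memory map
by type.  The (start, size, type) tuple layout and all names are
assumptions for this example; it is not the code from the patches.

```python
# Type numbers as mentioned in this thread; the names are illustrative.
E820_RAM = 1
E820_RESERVED = 2
NVDIMM_TYPES = {12, 6}  # "type 12 (and earlier type 6)"


def find_nvdimm_ranges(e820_map):
    """Return (start, end) for every entry whose type marks an NVDIMM.

    e820_map is a list of (start, size, type) tuples, a simplified
    stand-in for the firmware-provided table.
    """
    return [
        (start, start + size)
        for start, size, kind in e820_map
        if kind in NVDIMM_TYPES
    ]


# Hypothetical map: low RAM, one type-12 range, one reserved range.
example_map = [
    (0x0, 0x9FC00, E820_RAM),
    (0x100000000, 0x40000000, 12),
    (0x200000000, 0x40000000, E820_RESERVED),
]
for start, end in find_nvdimm_ranges(example_map):
    print(hex(start), hex(end))
```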


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Dan Williams
On Wed, Mar 25, 2015 at 10:04 AM, Christoph Hellwig h...@lst.de wrote:
> On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
>> The kernel command line would simply be the standard/existing memmap=
>> to reserve a memory range.  Then, when the platform device loads, it
>> does a request_firmware() to inject a binary table that further carves
>> memory into ranges to which the pmem driver attaches.  No need for the
>> legacy system BIOS to be upgraded to the "new way".
>
> Ewww...
>
>> It does do the right thing in kernel space.  The userspace utility
>> creates the binary table (once) that can be compiled into the platform
>> device driver or auto-loaded by an initrd.  The problem with a new
>> memmap= is that it is too coarse.  For example you can't do things
>> like specify a pmem range per-NUMA node.
>
> Sure you can, as long as you know the layout.  memmap= can be specified
> multiple times.  Again, I see absolutely zero benefit of doing crap
> like request_firmware() to convert interfaces, and I'm also tired of
> having this talk about code that will eventually be released and should
> be superior (and from all that I can guess so far will actually be far
> worse).

You and me both...


RE: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Brooks, Adam J
> The other two patches are a heavily rewritten version of the code that
> Intel gave to various storage vendors to discover the type 12 (and earlier
> type 6) nvdimms, which I massaged into a form that is hopefully suitable
> for mainline.

The problem is that the e820 or the UEFI Memory Map Table on their own are
really bad ways to represent NVDIMMs.  The memory table idea was originally
developed 6 years ago, prior to NVDIMMs existing.  It was used to define
traditional battery backed memory.  With traditional battery backed memory
either the whole region was going to be valid or the whole region was going
to be gone.  There was also no concept of arming.  You simply have x hours
of data retention based on your battery being y% charged.  Fast forward a
couple of years, and we continued using the memory table method for
something called Copy To Flash, where the CPU would copy memory from the
DIMMs to an SSD of some sort.  Again this was a whole region or none of the
region solution, and because we were typically using SATA SSDs there was no
need to arm anything.  Additionally the restore operation (and even the
save operation if you were brave enough) could be done from the OS.
Therefore there was no need for the BIOS to pass up any status regarding
whether the recovery was successful or not.

Fast forward again to the present day and NVDIMMs.  We used the memory table
model initially for NVDIMMs because 1) the BIOS code was already in place
and 2) we had a non-upstreamed driver (something that predated pmem by
several years called ADRBD).  In a perfect world where there are no hardware
failures e820+ADRBD work great for NVDIMMs.  However, in the real world
where there are failures it has a number of shortcomings.  Mainly there are
the following issues with it:
1) The region may now be composed of 2+ different NVDIMMs that have
different statuses.  A subset of NVDIMMs may have failed the restore.  An
NVDIMM may have been added after the last save/restore of the existing
NVDIMMs.
2) Just based on the e820 table, the OS has no way of knowing where the
boundaries of the NVDIMMs are.  It has no way of knowing if they are all
interleaved together, where a failure of a single NVDIMM means the loss of
the whole region, or if the NVDIMMs are non-interleaved and can be treated
as separate memory regions to prevent the failure of one NVDIMM from causing
data to be lost from all NVDIMMs.
3) Due to the requirement to restore the MRS/RC registers the NVDIMM restore
must be done from the BIOS.  Depending on the security settings of the
platform the OS may not be able to directly interrogate the individual
NVDIMMs to find their status.  Even if the OS can get to the NVDIMM over
SMBus, all information about the status of the last restore attempt may have
been wiped if the BIOS was also configured to do the erase/arm operation.
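
To make points 1) and 2) concrete, here is a small sketch of how the
set of lost addresses depends on details that a single e820 range does
not carry.  The function and its parameters (dimm_sizes, failed,
interleaved) are hypothetical and stand in for exactly the per-NVDIMM
information the memory map lacks.

```python
def lost_ranges(region_start, dimm_sizes, failed, interleaved):
    """Return the (start, end) byte ranges whose contents cannot be trusted.

    region_start -- start of the single range the memory map reports
    dimm_sizes   -- capacity of each NVDIMM backing that range, in order
    failed       -- set of NVDIMM indexes whose restore failed
    interleaved  -- True if addresses are striped across all NVDIMMs
    """
    if not failed:
        return []
    if interleaved:
        # Every stripe touches every NVDIMM, so one failure taints
        # the whole region.
        return [(region_start, region_start + sum(dimm_sizes))]
    lost = []
    offset = region_start
    for index, size in enumerate(dimm_sizes):
        # Non-interleaved: each NVDIMM owns one contiguous slice.
        if index in failed:
            lost.append((offset, offset + size))
        offset += size
    return lost


GiB = 1 << 30
# Same 16 GiB range, same failed NVDIMM, different outcome.
print(lost_ranges(0x100000000, [8 * GiB, 8 * GiB], {1}, interleaved=False))
print(lost_ranges(0x100000000, [8 * GiB, 8 * GiB], {1}, interleaved=True))
```

With only the start and size of the range, the OS cannot tell these
two cases apart.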

For those reasons (and more) simply using the current memory tables is not a
good solution.  A more detailed NVDIMM specific table is required to surface
the status and configuration of the NVDIMMs.  Unfortunately that table has
been perpetually delayed, and as a result people are trying to move forward
with Type 12.  I understand why this has been done, and for highly embedded
storage appliances it is fine, because those users probably inherently know
the configuration of the NVDIMMs.  However, for general purpose systems
where the user has no way of knowing the exact configuration of the DIMMs,
just using the e820 or UEFI Memory Map table is not sufficient.


Re: [Linux-nvdimm] another pmem variant

2015-03-25 Thread Christoph Hellwig
Hi Adam,

We're all well aware of the deficits of the current interface, but for now
that's all we have, and due to lead times in implementing BIOSes it will
be all we have for quite a while.

We're all eagerly looking forward to better interfaces and BIOSes that will
support them.

But for now everyone would love to just be able to use existing systems with
an out of the box Linux kernel.