Re: [Linux-nvdimm] another pmem variant
On Mon, Apr 13, 2015 at 2:01 AM, Greg KH wrote:
> On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
>> On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig wrote:
>> > On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
>> >> This is mostly ok and does not collide too much with the upcoming ACPI
>> >> mechanism for this stuff.  I do worry that the new
>> >> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
>> >> relevant for at most one kernel cycle given the imminent publication
>> >> of the spec that unblocks our release.
>> >
>> > I don't think we can just get rid of it as legacy systems won't be
>> > upgraded to the new discovery mechanism.  Or do you mean you plan to
>> > introduce a better override on the command line?  In that case speak
>> > up now!
>>
>> The kernel command line would simply be the standard/existing memmap=
>> to reserve a memory range.  Then, when the platform device loads, it
>> does a request_firmware() to inject a binary table that further carves
>> memory into ranges to which the pmem driver attaches.  No need for the
>> legacy system BIOS to be upgraded to the "new way".
>
> Um, what parses that "binary table"?  The kernel better not be doing
> that, as that's not what the firmware interface is for.  The firmware
> interface is for "pass through" only directly to hardware.

I had been using it as a generic/device-model-integrated way to do what
amounts to ACPI table injection [1].  But, now that the new memmap=
command line is upstream, most of the benefits of this approach are moot
and no longer outweigh the downsides [2].  Consider it tabled.

[1]: https://01.org/linux-acpi/documentation/overriding-dsdt
[2]: http://marc.info/?l=linux-netdev&m=135793331325647&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
> On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig wrote:
> > On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
> >> This is mostly ok and does not collide too much with the upcoming ACPI
> >> mechanism for this stuff.  I do worry that the new
> >> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
> >> relevant for at most one kernel cycle given the imminent publication
> >> of the spec that unblocks our release.
> >
> > I don't think we can just get rid of it as legacy systems won't be
> > upgraded to the new discovery mechanism.  Or do you mean you plan to
> > introduce a better override on the command line?  In that case speak
> > up now!
>
> The kernel command line would simply be the standard/existing memmap=
> to reserve a memory range.  Then, when the platform device loads, it
> does a request_firmware() to inject a binary table that further carves
> memory into ranges to which the pmem driver attaches.  No need for the
> legacy system BIOS to be upgraded to the "new way".

Um, what parses that "binary table"?  The kernel better not be doing
that, as that's not what the firmware interface is for.  The firmware
interface is for "pass through" only directly to hardware.

thanks,

greg k-h
Re: [Linux-nvdimm] another pmem variant V2
On 04/01/2015 10:50 AM, Ingo Molnar wrote:
>
> * Dan Williams wrote:
>
>> On Tue, Mar 31, 2015 at 10:24 AM, Christoph Hellwig wrote:
>>> On Tue, Mar 31, 2015 at 06:44:56PM +0200, Ingo Molnar wrote:
>>>> I'd be fine with that too - mind sending an updated series?
>>>
>>> I will send an updated one tonight or early tomorrow.
>>>
>>> Btw, do you want to keep the E820_PRAM name instead of E820_PMEM?
>>> Seems like most people either don't care or prefer E820_PMEM.  I'm
>>> fine either way.
>>
>> FWIW, I like the idea of having a separate E820_PRAM name for
>> type-12 memory vs future "can't yet disclose" UEFI memory type.  The
>> E820_PRAM type potentially has the property of being relegated to
>> "legacy" NVDIMMs.  We can later add E820_PMEM as a memory type that,
>> for example, is not automatically backed by struct page.  That said,
>> I'm fine either way.
>
> I agree that it's a minor detail, but I think the separation is
> useful in two ways:
>
>  - We have a generic 'pmem' driver, but the low level, platform
>    specific RAM enumeration name does not use that name.
>
>  - 'E820_PRAM' is a more natural extension of 'E820_RAM'.
>
> Later on we can then do a:
>
>   s/E820_PRAM/E820_LEGACY_PRAM
>
> rename or so.

If Dan does not like E820_PMEM, then please let us just call it
E820_PMEM_LEGACY right from the get-go.  PRAM is not a very good name
precisely because it looks so similar to RAM.

Thanks
Boaz
Re: [Linux-nvdimm] another pmem variant V2
* Dan Williams wrote:

> On Tue, Mar 31, 2015 at 10:24 AM, Christoph Hellwig wrote:
> > On Tue, Mar 31, 2015 at 06:44:56PM +0200, Ingo Molnar wrote:
> >> I'd be fine with that too - mind sending an updated series?
> >
> > I will send an updated one tonight or early tomorrow.
> >
> > Btw, do you want to keep the E820_PRAM name instead of E820_PMEM?
> > Seems like most people either don't care or prefer E820_PMEM.  I'm
> > fine either way.
>
> FWIW, I like the idea of having a separate E820_PRAM name for
> type-12 memory vs future "can't yet disclose" UEFI memory type.  The
> E820_PRAM type potentially has the property of being relegated to
> "legacy" NVDIMMs.  We can later add E820_PMEM as a memory type that,
> for example, is not automatically backed by struct page.  That said,
> I'm fine either way.

I agree that it's a minor detail, but I think the separation is
useful in two ways:

 - We have a generic 'pmem' driver, but the low level, platform
   specific RAM enumeration name does not use that name.

 - 'E820_PRAM' is a more natural extension of 'E820_RAM'.

Later on we can then do a:

  s/E820_PRAM/E820_LEGACY_PRAM

rename or so.

Thanks,

	Ingo
Re: [Linux-nvdimm] another pmem variant V2
On Tue, Mar 31, 2015 at 10:24 AM, Christoph Hellwig wrote:
> On Tue, Mar 31, 2015 at 06:44:56PM +0200, Ingo Molnar wrote:
>> I'd be fine with that too - mind sending an updated series?
>
> I will send an updated one tonight or early tomorrow.
>
> Btw, do you want to keep the E820_PRAM name instead of E820_PMEM?
> Seems like most people either don't care or prefer E820_PMEM.  I'm
> fine either way.

FWIW, I like the idea of having a separate E820_PRAM name for
type-12 memory vs future "can't yet disclose" UEFI memory type.  The
E820_PRAM type potentially has the property of being relegated to
"legacy" NVDIMMs.  We can later add E820_PMEM as a memory type that,
for example, is not automatically backed by struct page.  That said,
I'm fine either way.
Re: [Linux-nvdimm] another pmem variant
Hi Adam,

we're all well aware of the deficits of the current interface, but for
now that's all we have, and due to lead times in implementing BIOSes it
will be all we have for quite a while.

We're all eagerly looking forward to better interfaces and BIOSes that
will support them.  But for now everyone would love to just be able to
use existing systems with an out of the box Linux kernel.
RE: [Linux-nvdimm] another pmem variant
> The other two patches are a heavily rewritten version of the code that
> Intel gave to various storage vendors to discover the type 12 (and earlier
> type 6) nvdimms, which I massaged into a form that is hopefully suitable
> for mainline.

The problem is that the e820 or the UEFI Memory Map Table on their own
are really bad ways to represent NVDIMMs.  The memory table idea was
originally developed 6 years ago, prior to NVDIMMs existing.  It was
used to define traditional battery-backed memory.  With traditional
battery-backed memory, either the whole region was going to be valid or
the whole region was going to be gone.  There was also no concept of
arming.  You simply had x hours of data retention based on your battery
being y% charged.

Fast forward a couple of years, and we continued using the memory table
method for something called Copy To Flash, where the CPU would copy
memory from the DIMMs to an SSD of some sort.  Again this was a
whole-region-or-none solution, and because we were typically using SATA
SSDs there was no need to "arm" anything.  Additionally, the restore
operation (and even the save operation, if you were brave enough) could
be done from the OS, so there was no need for the BIOS to pass up any
status regarding whether the recovery was successful or not.

Fast forward again to the present day and NVDIMMs.  We used the memory
table model initially for NVDIMMs because 1) the BIOS code was already
in place, and 2) we had a non-upstreamed driver (something that predated
pmem by several years called ADRBD).  In a perfect world where there are
no hardware failures, e820+ADRBD works great for NVDIMMs.  However, in
the real world where there are failures, it has a number of
shortcomings.  Mainly, there are the following issues with it:

1) The region may now be comprised of 2+ different NVDIMMs that have
different statuses.  A subset of NVDIMMs may have failed the restore,
or an NVDIMM may have been added since the last save/restore of the
existing NVDIMMs.

2) Just based on the e820 table, the OS has no way of knowing where the
boundaries of the NVDIMMs are.  It has no way of knowing if they are
all interleaved together, where a failure of a single NVDIMM means the
loss of the whole region, or if the NVDIMMs are non-interleaved and can
be treated as separate memory regions to prevent the failure of one
NVDIMM from causing data to be lost from all NVDIMMs.

3) Due to the requirement to restore the MRS/RC registers, the NVDIMM
restore must be done from the BIOS.  Depending on the security settings
of the platform, the OS may not be able to directly interrogate the
individual NVDIMMs to find their status.  Even if the OS can get to the
NVDIMMs over SMBus, all information about the status of the last
restore attempt may have been wiped if the BIOS was also configured to
do the erase/arm operation.

For those reasons (and more), simply using the current memory tables is
not a good solution.  A more detailed NVDIMM-specific table is required
to surface the status and configuration of the NVDIMMs.  Unfortunately
that table has been perpetually delayed, and as a result people are
trying to move forward with Type 12.  I understand why this has been
done, and for highly embedded storage appliances it is fine, because
those users probably inherently know the configuration of the NVDIMMs.
However, for general purpose systems where the user has no way of
knowing the exact configuration of the DIMMs, just using the e820 or
UEFI Memory Map table is not sufficient.
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 10:04 AM, Christoph Hellwig wrote:
> On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
>> The kernel command line would simply be the standard/existing memmap=
>> to reserve a memory range.  Then, when the platform device loads, it
>> does a request_firmware() to inject a binary table that further carves
>> memory into ranges to which the pmem driver attaches.  No need for the
>> legacy system BIOS to be upgraded to the "new way".
>
> Ewww...
>
>> It does do the right thing in kernel space.  The userspace utility
>> creates the binary table (once) that can be compiled into the platform
>> device driver or auto-loaded by an initrd.  The problem with a new
>> memmap= is that it is too coarse.  For example you can't do things
>> like specify a pmem range per-NUMA node.
>
> Sure you can as long as you know the layout.  memmap= can be specified
> multiple times.  Again, I see absolutely zero benefit of doing crap
> like request_firmware() to convert interface, and I'm also tired of
> having this talk about code that will eventually be released and should
> be superior (and from all that I can guess so far will actually be far
> worse).

You and me both...
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
> The kernel command line would simply be the standard/existing memmap=
> to reserve a memory range.  Then, when the platform device loads, it
> does a request_firmware() to inject a binary table that further carves
> memory into ranges to which the pmem driver attaches.  No need for the
> legacy system BIOS to be upgraded to the "new way".

Ewww...

> It does do the right thing in kernel space.  The userspace utility
> creates the binary table (once) that can be compiled into the platform
> device driver or auto-loaded by an initrd.  The problem with a new
> memmap= is that it is too coarse.  For example you can't do things
> like specify a pmem range per-NUMA node.

Sure you can as long as you know the layout.  memmap= can be specified
multiple times.  Again, I see absolutely zero benefit of doing crap
like request_firmware() to convert interface, and I'm also tired of
having this talk about code that will eventually be released and should
be superior (and from all that I can guess so far will actually be far
worse).
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig wrote:
> On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
>> This is mostly ok and does not collide too much with the upcoming ACPI
>> mechanism for this stuff.  I do worry that the new
>> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
>> relevant for at most one kernel cycle given the imminent publication
>> of the spec that unblocks our release.
>
> I don't think we can just get rid of it as legacy systems won't be
> upgraded to the new discovery mechanism.  Or do you mean you plan to
> introduce a better override on the command line?  In that case speak
> up now!

The kernel command line would simply be the standard/existing memmap=
to reserve a memory range.  Then, when the platform device loads, it
does a request_firmware() to inject a binary table that further carves
memory into ranges to which the pmem driver attaches.  No need for the
legacy system BIOS to be upgraded to the "new way".

>> Our planned solution to the "legacy pmem" problem is to have a
>> userspace utility craft a list of address ranges in the form that ACPI
>> expects and attach that to a platform device (one time setup).  It
>> only requires that the memory be marked reserved, not necessarily
>> marked type-12.
>
> I can't see any benefit of that over just doing the right thing in
> kernel space.

It does do the right thing in kernel space.  The userspace utility
creates the binary table (once) that can be compiled into the platform
device driver or auto-loaded by an initrd.  The problem with a new
memmap= is that it is too coarse.  For example you can't do things
like specify a pmem range per-NUMA node.
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
> This is mostly ok and does not collide too much with the upcoming ACPI
> mechanism for this stuff.  I do worry that the new
> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
> relevant for at most one kernel cycle given the imminent publication
> of the spec that unblocks our release.

I don't think we can just get rid of it as legacy systems won't be
upgraded to the new discovery mechanism.  Or do you mean you plan to
introduce a better override on the command line?  In that case speak
up now!

> Our planned solution to the "legacy pmem" problem is to have a
> userspace utility craft a list of address ranges in the form that ACPI
> expects and attach that to a platform device (one time setup).  It
> only requires that the memory be marked reserved, not necessarily
> marked type-12.

I can't see any benefit of that over just doing the right thing in
kernel space.

> > The other two patches are a heavily rewritten version of the code that
> > Intel gave to various storage vendors to discover the type 12 (and earlier
> > type 6) nvdimms, which I massaged into a form that is hopefully suitable
> > for mainline.
>
> I'd prefer E820_PMEM over E820_PROTECTED_KERN, I don't know why I
> chose that name initially, but to each his own bike shed.

Sounds fine to me.
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 9:04 AM, Christoph Hellwig wrote:
> Here is another version of the same trivial pmem driver, because two
> obviously aren't enough.

Welcome to the party! :-)

> The first patch is the same pmem driver that Ross posted a short time
> ago, just modified to use platform_devices to find the persistent
> memory region instead of hardcoding it in the Kconfig.  This allows to
> keep pmem.c separate from any discovery mechanism, but still allow
> auto-discovery.

This is mostly ok and does not collide too much with the upcoming ACPI
mechanism for this stuff.  I do worry that the new
"memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
relevant for at most one kernel cycle given the imminent publication
of the spec that unblocks our release.

Our planned solution to the "legacy pmem" problem is to have a
userspace utility craft a list of address ranges in the form that ACPI
expects and attach that to a platform device (one time setup).  It
only requires that the memory be marked reserved, not necessarily
marked type-12.

> The other two patches are a heavily rewritten version of the code that
> Intel gave to various storage vendors to discover the type 12 (and
> earlier type 6) nvdimms, which I massaged into a form that is
> hopefully suitable for mainline.

I'd prefer E820_PMEM over E820_PROTECTED_KERN, I don't know why I
chose that name initially, but to each his own bike shed.
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 9:04 AM, Christoph Hellwig h...@lst.de wrote: Here is another version of the same trivial pmem driver, because two obviously aren't enough. Welcome to the party! :-) The first patch is the same pmem driver that Ross posted a short time ago, just modified to use platform_devices to find the persistant memory region instead of hardconding it in the Kconfig. This allows to keep pmem.c separate from any discovery mechanism, but still allow auto-discovery. This is mostly ok and does not collide too much with the upcoming ACPI mechanism for this stuff. I do worry that the new memmap=nn[KMG]!ss[KMG] kernel command line option will only be relevant for at most one kernel cycle given the imminent publication of the spec that unblocks our release. Our planned solution to the legacy pmem problem is to have a userspace utility craft a list of address ranges in the form that ACPI expects and attach that to a platform device (one time setup). It only requires that the memory be marked reserved, not necessarily marked type-12. The other two patches are a heavily rewritten version of the code that Intel gave to various storage vendors to discover the type 12 (and earlier type 6) nvdimms, which I massaged into a form that is hopefully suitable for mainline. I'd prefer E820_PMEM over E820_PROTECTED_KERN, I don't know why I chose that name initially, but to each his own bike shed. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 9:44 AM, Christoph Hellwig h...@lst.de wrote: On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote: This is mostly ok and does not collide too much with the upcoming ACPI mechanism for this stuff. I do worry that the new memmap=nn[KMG]!ss[KMG] kernel command line option will only be relevant for at most one kernel cycle given the imminent publication of the spec that unblocks our release. I don't think we can just get rid of it as legacy systems won't be upgraded to the new discovery mechanism. Or do you mean you plan to introduce a better override on the command line? In that case speak up now! The kernel command line would simply be the standard/existing memmap= to reserve a memory range. Then, when the platform device loads, it does a request_firmware() to inject a binary table that further carves memory into ranges to which the pmem driver attaches. No need for the legacy system BIOS to be upgraded to the new way. Our planned solution to the legacy pmem problem is to have a userspace utility craft a list of address ranges in the form that ACPI expects and attach that to a platform device (one time setup). It only requires that the memory be marked reserved, not necessarily marked type-12. I can't see any benefit of that over just doign the right thing in kernel space. It does do the right thing in kernel space. The userspace utility creates the binary table (once) that can be compiled into the platform device driver or auto-loaded by an initrd. The problem with a new memmap= is that it is too coarse. For example you can't do things like specify a pmem range per-NUMA node. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote: The kernel command line would simply be the standard/existing memmap= to reserve a memory range. Then, when the platform device loads, it does a request_firmware() to inject a binary table that further carves memory into ranges to which the pmem driver attaches. No need for the legacy system BIOS to be upgraded to the new way. Ewww... It does do the right thing in kernel space. The userspace utility creates the binary table (once) that can be compiled into the platform device driver or auto-loaded by an initrd. The problem with a new memmap= is that it is too coarse. For example you can't do things like specify a pmem range per-NUMA node. Sure you can as long as you know the layout. memmap= can be specified multiple times. Again, I see absolutely zero benefit of doing crap like request_firmware() to convert interface, and I'm also tired of having this talk about code that will eventually be released and should be superior (and from all that I can guess so far will actually be far worse). -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 09:33:52AM -0700, Dan Williams wrote:
> This is mostly ok and does not collide too much with the upcoming ACPI
> mechanism for this stuff.  I do worry that the new
> "memmap=nn[KMG]!ss[KMG]" kernel command line option will only be
> relevant for at most one kernel cycle given the imminent publication
> of the spec that unblocks our release.

I don't think we can just get rid of it as legacy systems won't be
upgraded to the new discovery mechanism.  Or do you mean you plan to
introduce a better override on the command line?  In that case speak
up now!

> Our planned solution to the legacy pmem problem is to have a userspace
> utility craft a list of address ranges in the form that ACPI expects
> and attach that to a platform device (one time setup).  It only
> requires that the memory be marked reserved, not necessarily marked
> type-12.

I can't see any benefit of that over just doing the right thing in
kernel space.

>> The other two patches are a heavily rewritten version of the code that
>> Intel gave to various storage vendors to discover the type 12 (and
>> earlier type 6) nvdimms, which I massaged into a form that is
>> hopefully suitable for mainline.
>
> I'd prefer E820_PMEM over E820_PROTECTED_KERN,

I don't know why I chose that name initially, but to each his own bike
shed.  Sounds fine to me.
Re: [Linux-nvdimm] another pmem variant
On Wed, Mar 25, 2015 at 10:04 AM, Christoph Hellwig <h...@lst.de> wrote:
> On Wed, Mar 25, 2015 at 10:00:26AM -0700, Dan Williams wrote:
>> The kernel command line would simply be the standard/existing memmap=
>> to reserve a memory range.  Then, when the platform device loads, it
>> does a request_firmware() to inject a binary table that further carves
>> memory into ranges to which the pmem driver attaches.  No need for the
>> legacy system BIOS to be upgraded to the "new way".
>
> Ewww...
>
>> It does do the right thing in kernel space.  The userspace utility
>> creates the binary table (once) that can be compiled into the platform
>> device driver or auto-loaded by an initrd.  The problem with a new
>> memmap= is that it is too coarse.  For example you can't do things
>> like specify a pmem range per-NUMA node.
>
> Sure you can as long as you know the layout.  memmap= can be specified
> multiple times.
>
> Again, I see absolutely zero benefit of doing crap like
> request_firmware() to convert interfaces, and I'm also tired of having
> this talk about code that will eventually be released and should be
> superior (and from all that I can guess so far will actually be far
> worse).

You and me both...
RE: [Linux-nvdimm] another pmem variant
> The other two patches are a heavily rewritten version of the code that
> Intel gave to various storage vendors to discover the type 12 (and
> earlier type 6) nvdimms, which I massaged into a form that is
> hopefully suitable for mainline.

The problem is that the e820 or the UEFI Memory Map Table on their own
are really bad ways to represent NVDIMMs.  The memory table idea was
originally developed 6 years ago, prior to NVDIMMs existing.  It was
used to define traditional battery-backed memory.  With traditional
battery-backed memory, either the whole region was going to be valid or
the whole region was going to be gone.  There was also no concept of
arming: you simply had x hours of data retention based on your battery
being y% charged.

Fast forward a couple of years, and we continued using the memory table
method for something called Copy To Flash, where the CPU would copy
memory from the DIMMs to an SSD of some sort.  Again this was a
whole-region-or-nothing solution, and because we were typically using
SATA SSDs there was no need to arm anything.  Additionally, the restore
operation (and even the save operation, if you were brave enough) could
be done from the OS, so there was no need for the BIOS to pass up any
status regarding whether the recovery was successful or not.

Fast forward again to the present day and NVDIMMs.  We used the memory
table model initially for NVDIMMs because 1) the BIOS code was already
in place and 2) we had a non-upstreamed driver (something that predated
pmem by several years, called ADRBD).  In a perfect world where there
are no hardware failures, e820+ADRBD works great for NVDIMMs.  However,
in the real world where there are failures it has a number of
shortcomings.  Mainly there are the following issues with it:

1) The region may now be comprised of 2+ different NVDIMMs that have
different statuses.  A subset of the NVDIMMs may have failed the
restore, or an NVDIMM may have been added since the last save/restore
of the existing NVDIMMs.

2) Just based on the e820 table, the OS has no way of knowing where the
boundaries of the NVDIMMs are.  It has no way of knowing if they are
all interleaved together, where a failure of a single NVDIMM means the
loss of the whole region, or if the NVDIMMs are non-interleaved and can
be treated as separate memory regions to prevent the failure of one
NVDIMM from causing data to be lost from all NVDIMMs.

3) Due to the requirement to restore the MRS/RC registers, the NVDIMM
restore must be done from the BIOS.  Depending on the security settings
of the platform, the OS may not be able to directly interrogate the
individual NVDIMMs to find their status.  Even if the OS can get to the
NVDIMMs over SMBus, all information about the status of the last
restore attempt may have been wiped if the BIOS was also configured to
do the erase/arm operation.

For those reasons (and more), simply using the current memory tables is
not a good solution.  A more detailed NVDIMM-specific table is required
to surface the status and configuration of the NVDIMMs.  Unfortunately
that table has been perpetually delayed, and as a result people are
trying to move forward with Type 12.  I understand why this has been
done, and for highly embedded storage appliances it is fine, because
those users probably inherently know the configuration of the NVDIMMs.
However, for general purpose systems where the user has no way of
knowing the exact configuration of the DIMMs, just using the e820 or
UEFI Memory Map table is not sufficient.
Re: [Linux-nvdimm] another pmem variant
Hi Adam,

we're all well aware of the deficits of the current interface, but for
now that's all we have, and due to lead times in implementing BIOSes it
will be all we have for quite a while.  We're all eagerly looking
forward to better interfaces and BIOSes that will support them.  But
for now everyone would love to just be able to use existing systems
with an out-of-the-box Linux kernel.