Re: [PATCH 0/5] [v4] Allow persistent memory to be used like normal RAM
On Thu, Jan 24, 2019 at 03:14:41PM -0800, Dave Hansen wrote:
> v3 spurred a bunch of really good discussion. Thanks to everybody that made comments and suggestions!
>
> I would still love some Acks on this from the folks on cc, even if it is on just the patch touching your area.
>
> Note: these are based on commit d2f33c19644 in:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git libnvdimm-pending
>
> Changes since v3:
> * Move HMM-related resource warning instead of removing it
> * Use __request_resource() directly instead of devm.
> * Create a separate DAX_PMEM Kconfig option, complete with help text
> * Update patch descriptions and cover letter to give a better overview of use-cases and hardware where this might be useful.
>
> Changes since v2:
> * Updates to dev_dax_kmem_probe() in patch 5:
>   * Reject probes for devices with bad NUMA nodes. Keeps slow memory from being added to node 0.
>   * Use raw request_mem_region()
>   * Add comments about permanent reservation
>   * Use dev_*() instead of printk's
> * Add references to nvdimm documentation in descriptions
> * Remove unneeded GPL export
> * Add Kconfig prompt and help text
>
> Changes since v1:
> * Now based on git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git
> * Use binding/unbinding from "dax bus" code
> * Move over to a "dax bus" driver from being an nvdimm driver
>
> --
>
> Persistent memory is cool. But, currently, you have to rewrite your applications to use it. Wouldn't it be cool if you could just have it show up in your system like normal RAM and get to it like a slow blob of memory? Well... have I got the patch series for you!
>
> == Background / Use Cases ==
>
> Persistent Memory (aka Non-Volatile DIMMs / NVDIMMs) themselves are described in detail in Documentation/nvdimm/nvdimm.txt. However, this documentation focuses on actually using them as storage. This set is focused on using NVDIMMs as DRAM replacement.
>
> This is intended for Intel-style NVDIMMs (aka. Intel Optane DC persistent memory). These DIMMs are physically persistent, more akin to flash than traditional RAM. They are also expected to be more cost-effective than using RAM, which is why folks want this set in the first place.

What variant of NVDIMM, F/P or both?

> This set is not intended for RAM-based NVDIMMs. Those are not cost-effective vs. plain RAM, and using them here would simply be a waste.

Sounds like NVDIMM (P).

> But, why would you bother with this approach? Intel itself [1] has announced a hardware feature that does something very similar: "Memory Mode", which turns DRAM into a cache in front of persistent memory, which is then as a whole used as normal "RAM".
>
> Here are a few reasons:
> 1. The capacity of memory mode is the size of the persistent memory that you dedicate. DRAM capacity is "lost" because it is used for cache. With this, you get PMEM+DRAM capacity for memory.
> 2. DRAM acts as a cache with memory mode, and caches can lead to unpredictable latencies. Since memory mode is all-or-nothing (either all your DRAM is used as cache or none is), your entire memory space is exposed to these unpredictable latencies. This solution lets you guarantee DRAM latencies if you need them.
> 3. The new "tier" of memory is exposed to software. That means that you can build tiered applications or infrastructure. A cloud provider could sell cheaper VMs that use more PMEM and more expensive ones that use DRAM. That's impossible with memory mode.
>
> Don't take this as criticism of memory mode. Memory mode is awesome, and doesn't strictly require *any* software changes (we have software changes proposed for optimizing it though). It has tons of other advantages over *this* approach. Basically, we believe that the approach in these patches is complementary to memory mode and that both can live side-by-side in harmony.
>
> == Patch Set Overview ==
>
> This series adds a new "driver" to which pmem devices can be attached. Once attached, the memory "owned" by the device is hot-added to the kernel and managed like any other memory. On systems with an HMAT (a new ACPI table), each socket (roughly) will have a separate NUMA node for its persistent memory, so this newly-added memory can be selected by its unique NUMA node.

NUMA is a distance-based topology; does the HMAT solve these problems? How do we prevent pmem nodes from becoming fallback nodes for normal nodes? On an unexpected crash/failure, is there a scrubbing mechanism, or do we rely on the allocator to do the right thing prior to reallocating any memory? Will frequent zeroing hurt NVDIMM/pmem lifetimes?

Balbir Singh.

___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
On Wed, May 2, 2018 at 6:57 AM, Dan Williams <dan.j.willi...@intel.com> wrote:
> On Thu, Apr 5, 2018 at 8:00 AM, Dan Williams <dan.j.willi...@intel.com> wrote:
>> On Wed, Apr 4, 2018 at 11:45 PM, Nicholas Piggin <npig...@gmail.com> wrote:
> [..]
>>> What's the problem with just counting bytes copied like usercopy -- why is that harder than cacheline accuracy?
>>>
>>>> I'd rather implement the existing interface and port/support the new interface as it becomes available
>>>
>>> Fair enough.
>>
>> I have patches already in progress to change the interface. My preference is to hold off on adding a new implementation that will need to be immediately reworked. When I say "immediate" I mean that I should be able to post what I have for review within the next few days.
>>
>> Whether this is all too late for 4.17 is another question...
>
> Here is the x86 version of a 'bytes remaining' memcpy_mcsafe() implementation:
>
> https://lists.01.org/pipermail/linux-nvdimm/2018-May/015548.html

Thanks for the heads up! I'll work on the implementation for powerpc.

Balbir Singh.
Re: [PATCH v3 1/4] libnvdimm: Add of_node to region and bus descriptors
On Sat, Apr 7, 2018 at 4:28 AM, Dan Williams <dan.j.willi...@intel.com> wrote:
> On Thu, Apr 5, 2018 at 10:21 PM, Oliver O'Halloran <ooh...@gmail.com> wrote:
>> We want to be able to cross reference the region and bus devices with the device tree node that they were spawned from. libNVDIMM handles creating the actual devices for these internally, so we need to pass in a pointer to the relevant node in the descriptor.
>>
>> Signed-off-by: Oliver O'Halloran <ooh...@gmail.com>
>> Acked-by: Dan Williams <dan.j.willi...@intel.com>
>
> These look good to me. I'll get them applied today and let them soak over the weekend for a pull request on Monday.
>
> If anyone wants to add Acks or Reviews I can append them to the merge tag. If there are any NAKs please speak up now, but as far as I know there are no pending device-tree design concerns.

Hi Dan,

I can ack Oliver's work and will do so in each patch.

Overall:
Acked-by: Balbir Singh <bsinghar...@gmail.com>

Balbir Singh
Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
On Fri, Apr 6, 2018 at 11:26 AM, Nicholas Piggin <npig...@gmail.com> wrote:
> On Thu, 05 Apr 2018 16:40:26 -0400
> Jeff Moyer <jmo...@redhat.com> wrote:
>
>> Nicholas Piggin <npig...@gmail.com> writes:
>>
>>> On Thu, 5 Apr 2018 15:53:07 +1000
>>> Balbir Singh <bsinghar...@gmail.com> wrote:
>>>> I'm thinking about it. I wonder what "bytes remaining" means for pmem in the context of a machine check exception. Also, do we want to be byte accurate or cache-line accurate for the bytes remaining? The former is much easier than the latter :)
>>>
>>> The ideal would be a linear measure of how much of your copy reached (or can reach) non-volatile storage with nothing further copied. You may have to allow for some relaxing of the semantics depending on what the architecture can support.
>>
>> I think you've got that backwards. memcpy_mcsafe is used to copy *from* persistent memory. The idea is to catch errors when reading pmem, not writing to it.

I know the comment in x86 says posted writes and cares only for loads, but I don't see why both sides should not be handled.

>>> What's the problem with just counting bytes copied like usercopy -- why is that harder than cacheline accuracy?
>>
>> He said the former (i.e. bytes) is easier. So, I think you're on the same page. :)
>
> Oh well, that makes a lot more sense in my mind now, thanks :)

I thought cache-line accounting might make sense, since usually we'd expect the failure to be at a cache-line level, but our copy_tofrom_user does accurate accounting.

Balbir Singh.
Re: [PATCH v2 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
On Thu, Apr 5, 2018 at 9:26 PM, Oliver <ooh...@gmail.com> wrote:
> On Thu, Apr 5, 2018 at 5:14 PM, Balbir Singh <bsinghar...@gmail.com> wrote:
>> The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to convert machine check exceptions into a return value on failure in case a machine check exception is encountered during the memcpy.
>
> Would it be possible to move the bulk of the copyuser code into a separate file which can be #included once these err macros are defined? Anton's memcpy is pretty hairy and I don't think anyone wants to have multiple copies of it in the tree, even in a cut down form.

I've split it out for now; in the future that might be a good thing to do. The copy_tofrom_user_power7 falls back on __copy_tofrom_user_base to track exactly how much is left over. Adding these changes there would create a larger churn and need way more testing. I've taken this short-cut for now with a promise to fix that as the semantics of memcpy_mcsafe() change to do more accurate tracking of how much was copied over.

Balbir Singh.
[PATCH v2 3/3] powerpc/mce: Handle memcpy_mcsafe
Add a blocking notifier callback to be called in real-mode on machine check exceptions for UE (ld/st) errors only. The patch registers a callback on boot to be notified of machine check exceptions and returns NOTIFY_STOP when a page of interest is seen as the source of the machine check exception. This page of interest is a ZONE_DEVICE page, and hence for now, for memcpy_mcsafe to work, the page needs to belong to ZONE_DEVICE and memcpy_mcsafe should be used to access the memory.

The patch also modifies the NIP of the exception context to go back to the fixup handler (in memcpy_mcsafe) and does not print any error message, as the error is returned via a return value and handled.

Signed-off-by: Balbir Singh <bsinghar...@gmail.com>
---
 arch/powerpc/include/asm/mce.h |  3 +-
 arch/powerpc/kernel/mce.c      | 77 --
 2 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 3a1226e9b465..a76638e3e47e 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -125,7 +125,8 @@ struct machine_check_event {
 			enum MCE_UeErrorType ue_error_type:8;
 			uint8_t		effective_address_provided;
 			uint8_t		physical_address_provided;
-			uint8_t		reserved_1[5];
+			uint8_t		error_return;
+			uint8_t		reserved_1[4];
 			uint64_t	effective_address;
 			uint64_t	physical_address;
 			uint8_t		reserved_2[8];
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index efdd16a79075..b9e4881fa8c5 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -28,7 +28,9 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -54,6 +56,52 @@ static struct irq_work mce_event_process_work = {
 DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);
 
+static BLOCKING_NOTIFIER_HEAD(mce_notifier_list);
+
+int register_mce_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_mce_notifier);
+
+int unregister_mce_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_mce_notifier);
+
+
+static int check_memcpy_mcsafe(struct notifier_block *nb,
+			       unsigned long val, void *data)
+{
+	/*
+	 * val contains the physical_address of the bad address
+	 */
+	unsigned long pfn = val >> PAGE_SHIFT;
+	struct page *page = realmode_pfn_to_page(pfn);
+	int rc = NOTIFY_DONE;
+
+	if (!page)
+		goto out;
+
+	if (is_zone_device_page(page))	/* for HMM and PMEM */
+		rc = NOTIFY_STOP;
+out:
+	return rc;
+}
+
+struct notifier_block memcpy_mcsafe_nb = {
+	.priority = 0,
+	.notifier_call = check_memcpy_mcsafe,
+};
+
+int mce_mcsafe_register(void)
+{
+	register_mce_notifier(&memcpy_mcsafe_nb);
+	return 0;
+}
+arch_initcall(mce_mcsafe_register);
+
 static void mce_set_error_info(struct machine_check_event *mce,
 			       struct mce_error_info *mce_err)
 {
@@ -151,9 +199,31 @@ void save_mce_event(struct pt_regs *regs, long handled,
 		mce->u.ue_error.effective_address_provided = true;
 		mce->u.ue_error.effective_address = addr;
 		if (phys_addr != ULONG_MAX) {
+			int rc;
+			const struct exception_table_entry *entry;
+
+			/*
+			 * Once we have the physical address, we check to
+			 * see if the current nip has a fixup entry.
+			 * Having a fixup entry plus the notifier stating
+			 * that it can handle the exception is an indication
+			 * that we should return to the fixup entry and
+			 * return an error from there
+			 */
 			mce->u.ue_error.physical_address_provided = true;
 			mce->u.ue_error.physical_address = phys_addr;
-			machine_check_ue_event(mce);
+
+			rc = blocking_notifier_call_chain(&mce_notifier_list,
+							  phys_addr, NULL);
+			if (rc & NOTIFY_STOP_MASK) {
+				entry = search_exception_tables(regs->nip);
+				if (entry != NULL) {
+					mce->u.ue_error.error_return = 1;
+					regs->nip = extable_fixup(entry);
+
[PATCH v2 1/3] powerpc/mce: Bug fixes for MCE handling in kernel space
The code currently assumes PAGE_SHIFT as the shift value of the pfn. This works correctly (mostly) for user space pages, but the correct thing to do is:

1. Extract the shift value returned via the pte-walk APIs.
2. Use the shift value to access the instruction address.

Note: the final physical address still uses PAGE_SHIFT for computation. handle_ierror() is not modified, and handle_derror() is modified just for extracting the correct instruction address. This is largely due to __find_linux_pte() returning pfns shifted by pdshift. The code is now more generic and can handle the shift values returned.

Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
Signed-off-by: Balbir Singh <bsinghar...@gmail.com>
---
 arch/powerpc/kernel/mce_power.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index fe6fc63251fe..bd9754def479 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -36,7 +36,8 @@
  * Convert an address related to an mm to a PFN. NOTE: we are in real
  * mode, we could potentially race with page table updates.
  */
-static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
+static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr,
+				 unsigned int *shift)
 {
 	pte_t *ptep;
 	unsigned long flags;
@@ -49,13 +50,15 @@ static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
 	local_irq_save(flags);
 	if (mm == current->mm)
-		ptep = find_current_mm_pte(mm->pgd, addr, NULL, shift);
+		ptep = find_current_mm_pte(mm->pgd, addr, NULL, shift);
 	else
-		ptep = find_init_mm_pte(addr, NULL);
+		ptep = find_init_mm_pte(addr, shift);
 	local_irq_restore(flags);
 	if (!ptep || pte_special(*ptep))
 		return ULONG_MAX;
-	return pte_pfn(*ptep);
+	if (!*shift)
+		*shift = PAGE_SHIFT;
+	return (pte_val(*ptep) & PTE_RPN_MASK) >> *shift;
 }
 
 /* flush SLBs and reload */
@@ -353,15 +356,16 @@ static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr,
 	unsigned long pfn, instr_addr;
 	struct instruction_op op;
 	struct pt_regs tmp = *regs;
+	unsigned int shift;
 
-	pfn = addr_to_pfn(regs, regs->nip);
+	pfn = addr_to_pfn(regs, regs->nip, &shift);
 	if (pfn != ULONG_MAX) {
-		instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK);
+		instr_addr = (pfn << shift) + (regs->nip & ((1 << shift) - 1));
 		instr = *(unsigned int *)(instr_addr);
 		if (!analyse_instr(&op, &tmp, instr)) {
-			pfn = addr_to_pfn(regs, op.ea);
+			pfn = addr_to_pfn(regs, op.ea, &shift);
 			*addr = op.ea;
-			*phys_addr = (pfn << PAGE_SHIFT);
+			*phys_addr = (pfn << shift);
 			return 0;
 		}
 		/*
@@ -435,12 +439,14 @@ static int mce_handle_ierror(struct pt_regs *regs,
 		if (mce_err->severity == MCE_SEV_ERROR_SYNC &&
 		    table[i].error_type == MCE_ERROR_TYPE_UE) {
 			unsigned long pfn;
+			unsigned int shift;
 
 			if (get_paca()->in_mce < MAX_MCE_DEPTH) {
-				pfn = addr_to_pfn(regs, regs->nip);
+				pfn = addr_to_pfn(regs, regs->nip,
+						  &shift);
 				if (pfn != ULONG_MAX) {
 					*phys_addr =
-						(pfn << PAGE_SHIFT);
+						(pfn << shift);
 					handled = 1;
 				}
 			}
-- 
2.13.6
[PATCH v2 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to convert machine check exceptions into a return value on failure in case a machine check exception is encountered during the memcpy.

This patch largely borrows from the copyuser_power7 logic and does not add the VMX optimizations, largely to keep the patch simple. If needed, those optimizations can be folded in.

Signed-off-by: Balbir Singh <bsinghar...@gmail.com>
Acked-by: Nicholas Piggin <npig...@gmail.com>
---
 arch/powerpc/include/asm/string.h   |   2 +
 arch/powerpc/lib/Makefile           |   2 +-
 arch/powerpc/lib/memcpy_mcsafe_64.S | 212 
 3 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S

diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..b7e872a64726 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -30,7 +30,9 @@ extern void * memcpy_flushcache(void *,const void *,__kernel_size_t);
 #ifdef CONFIG_PPC64
 #define __HAVE_ARCH_MEMSET32
 #define __HAVE_ARCH_MEMSET64
+#define __HAVE_ARCH_MEMCPY_MCSAFE
 
+extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
 extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
 extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
 extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3c29c9009bbf..048afee9f518 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -24,7 +24,7 @@ endif
 obj64-y	+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
 	   copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \
-	   memcpy_64.o memcmp_64.o pmem.o
+	   memcpy_64.o memcmp_64.o pmem.o memcpy_mcsafe_64.o
 
 obj64-$(CONFIG_SMP)	+= locks.o
 obj64-$(CONFIG_ALTIVEC)	+= vmx-helper.o
diff --git a/arch/powerpc/lib/memcpy_mcsafe_64.S b/arch/powerpc/lib/memcpy_mcsafe_64.S
new file mode 100644
index ..e7eaa9b6cded
--- /dev/null
+++ b/arch/powerpc/lib/memcpy_mcsafe_64.S
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) IBM Corporation, 2011
+ * Derived from copyuser_power7.s by Anton Blanchard <an...@au.ibm.com>
+ * Author - Balbir Singh <bsinghar...@gmail.com>
+ */
+#include
+#include
+
+	.macro err1
+100:
+	EX_TABLE(100b,.Ldo_err1)
+	.endm
+
+	.macro err2
+200:
+	EX_TABLE(200b,.Ldo_err2)
+	.endm
+
+.Ldo_err2:
+	ld	r22,STK_REG(R22)(r1)
+	ld	r21,STK_REG(R21)(r1)
+	ld	r20,STK_REG(R20)(r1)
+	ld	r19,STK_REG(R19)(r1)
+	ld	r18,STK_REG(R18)(r1)
+	ld	r17,STK_REG(R17)(r1)
+	ld	r16,STK_REG(R16)(r1)
+	ld	r15,STK_REG(R15)(r1)
+	ld	r14,STK_REG(R14)(r1)
+	addi	r1,r1,STACKFRAMESIZE
+.Ldo_err1:
+	li	r3,-EFAULT
+	blr
+
+
+_GLOBAL(memcpy_mcsafe)
+	cmpldi	r5,16
+	blt	.Lshort_copy
+
+.Lcopy:
+	/* Get the source 8B aligned */
+	neg	r6,r4
+	mtocrf	0x01,r6
+	clrldi	r6,r6,(64-3)
+
+	bf	cr7*4+3,1f
+err1;	lbz	r0,0(r4)
+	addi	r4,r4,1
+err1;	stb	r0,0(r3)
+	addi	r3,r3,1
+
+1:	bf	cr7*4+2,2f
+err1;	lhz	r0,0(r4)
+	addi	r4,r4,2
+err1;	sth	r0,0(r3)
+	addi	r3,r3,2
+
+2:	bf	cr7*4+1,3f
+err1;	lwz	r0,0(r4)
+	addi	r4,r4,4
+err1;	stw	r0,0(r3)
+	addi	r3,r3,4
+
+3:	sub	r5,r5,r6
+	cmpldi	r5,128
+	blt	5f
+
+	mflr	r0
+	stdu	r1,-STACKFRAMESIZE(r1)
+	std	r14,STK_REG(R14)(r1)
+	std	r15,STK_REG(R15)(r1)
+	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
+	std	r18,STK_REG(R18)(r1)
+	std	r19,STK_REG(R19)(r1)
+	std	r20,STK_REG(R20)(r1)
+	std	r21,STK_REG(R21)(r1)
+	std	r22,STK_REG(R22)(r1)
+	std	r0,STACKFRAMESIZE+16(r1)
+
+	srdi	r6,r5,7
+	mtctr	r6
+
+	/* Now do cacheline (128B) sized loads and stores. */
+	.align	5
+4:
+err2;	ld	r0,0(r4)
+err2;	ld	r6,8(r4)
+err2;	ld	r7,16(r4)
+err2;	ld	r8,24(r4)
+err2;	ld	r9,32(r4)
+err2;	ld	r10,40(r4)
+err2;	ld	r11,48(r4)
+err2;	ld	r12,56(r4)
+err2;	ld	r14,64(r4)
+err2;	ld	r15,72(r4)
+err2;	ld	r16,80(r4)
+err2;	ld	r17,88(r4)
+err2;	ld	r18,96(r4)
+err2;	ld	r19,104(r4)
+err2;	ld	r20,112(r4)
+err2;	ld	r21,120(r4)
+	addi	r4,r4,128
+err2;	std	r0,0(r3)
+err2;	std	r6,8(r3)
+err2;	std	r7,16(r3)
+err2;	std	r8,24(r3)
+err2;	std	r9,32(r3)
+err2;	std	r10,40(r3)
+err2;	std	r11,48(r3)
+err2;	std	r12,56(r3)
+err2;	std	r14,64(r3)
+err2;	std	r15,72(r3)
+err2;	std
Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
On Thu, 5 Apr 2018 15:04:05 +1000
Nicholas Piggin <npig...@gmail.com> wrote:
> On Wed, 4 Apr 2018 20:00:52 -0700
> Dan Williams <dan.j.willi...@intel.com> wrote:
>
>> [ adding Matthew, Christoph, and Tony ]
>>
>> On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin <npig...@gmail.com> wrote:
>>> On Thu, 5 Apr 2018 09:19:42 +1000
>>> Balbir Singh <bsinghar...@gmail.com> wrote:
>>>
>>>> The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to convert machine check exceptions into a return value on failure in case a machine check exception is encountered during the memcpy.
>>>>
>>>> This patch largely borrows from the copyuser_power7 logic and does not add the VMX optimizations, largely to keep the patch simple. If needed, those optimizations can be folded in.
>>>
>>> So memcpy_mcsafe doesn't return the number of bytes copied? Huh, well that makes it simple.
>>
>> Well, not in current kernels, but we need to add that support or remove the direct call to copy_to_iter() in fs/dax.c. I'm looking right now to add "bytes remaining" support to the x86 memcpy_mcsafe(), but for copy_to_user we also need to handle bytes remaining for write faults. That fix is hopefully something that can land in an early 4.17-rc, but it won't be ready for -rc1.
>
> I wonder if the powerpc implementation should just go straight to counting bytes. Backporting to this interface would be trivial, and it would mean there's only one variant of the code to support. That's up to Balbir though.

I'm thinking about it. I wonder what "bytes remaining" means for pmem in the context of a machine check exception. Also, do we want to be byte accurate or cache-line accurate for the bytes remaining? The former is much easier than the latter :)

I'd rather implement the existing interface and port/support the new interface as it becomes available.

Balbir Singh.
Re: [RESEND 1/3] powerpc/mce: Bug fixes for MCE handling in kernel space
On Thu, 5 Apr 2018 09:49:00 +1000
Nicholas Piggin <npig...@gmail.com> wrote:
> On Thu, 5 Apr 2018 09:19:41 +1000
> Balbir Singh <bsinghar...@gmail.com> wrote:
>
>> The code currently assumes PAGE_SHIFT as the shift value of the pfn, this works correctly (mostly) for user space pages, but the correct thing to do is
>
> It would be good to actually explain the problem in the changelog. I would have thought pte_pfn returns a PAGE_SIZE based pfn value?

The issue is hidden inside of hugepte_offset() as invoked by __find_linux_pte(). I will send a new version because the code needs to do << (shift - PAGE_SHIFT) for the instruction address.

>> 1. Extrace the shift value returned via the pte-walk API's
>       ^^^ extract?

Thanks, yes, a typo!

Balbir Singh.
[RESEND 3/3] powerpc/mce: Handle memcpy_mcsafe
Add a blocking notifier callback to be called in real-mode on machine check exceptions for UE (ld/st) errors only. The patch registers a callback on boot to be notified of machine check exceptions and returns NOTIFY_STOP when a page of interest is seen as the source of the machine check exception. This page of interest is a ZONE_DEVICE page, and hence for now, for memcpy_mcsafe to work, the page needs to belong to ZONE_DEVICE and memcpy_mcsafe should be used to access the memory.

The patch also modifies the NIP of the exception context to go back to the fixup handler (in memcpy_mcsafe) and does not print any error message, as the error is returned via a return value and handled.

Signed-off-by: Balbir Singh <bsinghar...@gmail.com>
---
 arch/powerpc/include/asm/mce.h |  3 +-
 arch/powerpc/kernel/mce.c      | 77 --
 2 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 3a1226e9b465..a76638e3e47e 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -125,7 +125,8 @@ struct machine_check_event {
 			enum MCE_UeErrorType ue_error_type:8;
 			uint8_t		effective_address_provided;
 			uint8_t		physical_address_provided;
-			uint8_t		reserved_1[5];
+			uint8_t		error_return;
+			uint8_t		reserved_1[4];
 			uint64_t	effective_address;
 			uint64_t	physical_address;
 			uint8_t		reserved_2[8];
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index efdd16a79075..b9e4881fa8c5 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -28,7 +28,9 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -54,6 +56,52 @@ static struct irq_work mce_event_process_work = {
 DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);
 
+static BLOCKING_NOTIFIER_HEAD(mce_notifier_list);
+
+int register_mce_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_mce_notifier);
+
+int unregister_mce_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_mce_notifier);
+
+
+static int check_memcpy_mcsafe(struct notifier_block *nb,
+			       unsigned long val, void *data)
+{
+	/*
+	 * val contains the physical_address of the bad address
+	 */
+	unsigned long pfn = val >> PAGE_SHIFT;
+	struct page *page = realmode_pfn_to_page(pfn);
+	int rc = NOTIFY_DONE;
+
+	if (!page)
+		goto out;
+
+	if (is_zone_device_page(page))	/* for HMM and PMEM */
+		rc = NOTIFY_STOP;
+out:
+	return rc;
+}
+
+struct notifier_block memcpy_mcsafe_nb = {
+	.priority = 0,
+	.notifier_call = check_memcpy_mcsafe,
+};
+
+int mce_mcsafe_register(void)
+{
+	register_mce_notifier(&memcpy_mcsafe_nb);
+	return 0;
+}
+arch_initcall(mce_mcsafe_register);
+
 static void mce_set_error_info(struct machine_check_event *mce,
 			       struct mce_error_info *mce_err)
 {
@@ -151,9 +199,31 @@ void save_mce_event(struct pt_regs *regs, long handled,
 		mce->u.ue_error.effective_address_provided = true;
 		mce->u.ue_error.effective_address = addr;
 		if (phys_addr != ULONG_MAX) {
+			int rc;
+			const struct exception_table_entry *entry;
+
+			/*
+			 * Once we have the physical address, we check to
+			 * see if the current nip has a fixup entry.
+			 * Having a fixup entry plus the notifier stating
+			 * that it can handle the exception is an indication
+			 * that we should return to the fixup entry and
+			 * return an error from there
+			 */
 			mce->u.ue_error.physical_address_provided = true;
 			mce->u.ue_error.physical_address = phys_addr;
-			machine_check_ue_event(mce);
+
+			rc = blocking_notifier_call_chain(&mce_notifier_list,
+							  phys_addr, NULL);
+			if (rc & NOTIFY_STOP_MASK) {
+				entry = search_exception_tables(regs->nip);
+				if (entry != NULL) {
+					mce->u.ue_error.error_return = 1;
+					regs->nip = extable_fixup(entry);
+
[RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to convert machine check excpetions into a return value on failure in case a machine check exception is encoutered during the memcpy. This patch largely borrows from the copyuser_power7 logic and does not add the VMX optimizations, largely to keep the patch simple. If needed those optimizations can be folded in. Signed-off-by: Balbir Singh <bsinghar...@gmail.com> --- arch/powerpc/include/asm/string.h | 2 + arch/powerpc/lib/Makefile | 2 +- arch/powerpc/lib/memcpy_mcsafe_64.S | 212 3 files changed, 215 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h index 9b8cedf618f4..b7e872a64726 100644 --- a/arch/powerpc/include/asm/string.h +++ b/arch/powerpc/include/asm/string.h @@ -30,7 +30,9 @@ extern void * memcpy_flushcache(void *,const void *,__kernel_size_t); #ifdef CONFIG_PPC64 #define __HAVE_ARCH_MEMSET32 #define __HAVE_ARCH_MEMSET64 +#define __HAVE_ARCH_MEMCPY_MCSAFE +extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz); extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t); extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t); extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t); diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index 3c29c9009bbf..048afee9f518 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -24,7 +24,7 @@ endif obj64-y+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \ copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \ - memcpy_64.o memcmp_64.o pmem.o + memcpy_64.o memcmp_64.o pmem.o memcpy_mcsafe_64.o obj64-$(CONFIG_SMP)+= locks.o obj64-$(CONFIG_ALTIVEC)+= vmx-helper.o diff --git a/arch/powerpc/lib/memcpy_mcsafe_64.S b/arch/powerpc/lib/memcpy_mcsafe_64.S new file mode 100644 index ..e7eaa9b6cded --- /dev/null +++ b/arch/powerpc/lib/memcpy_mcsafe_64.S @@ 
-0,0 +1,212 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) IBM Corporation, 2011 + * Derived from copyuser_power7.s by Anton Blanchard <an...@au.ibm.com> + * Author - Balbir Singh <bsinghar...@gmail.com> + */ +#include +#include + + .macro err1 +100: + EX_TABLE(100b,.Ldo_err1) + .endm + + .macro err2 +200: + EX_TABLE(200b,.Ldo_err2) + .endm + +.Ldo_err2: + ld r22,STK_REG(R22)(r1) + ld r21,STK_REG(R21)(r1) + ld r20,STK_REG(R20)(r1) + ld r19,STK_REG(R19)(r1) + ld r18,STK_REG(R18)(r1) + ld r17,STK_REG(R17)(r1) + ld r16,STK_REG(R16)(r1) + ld r15,STK_REG(R15)(r1) + ld r14,STK_REG(R14)(r1) + addir1,r1,STACKFRAMESIZE +.Ldo_err1: + li r3,-EFAULT + blr + + +_GLOBAL(memcpy_mcsafe) + cmpldi r5,16 + blt .Lshort_copy + +.Lcopy: + /* Get the source 8B aligned */ + neg r6,r4 + mtocrf 0x01,r6 + clrldi r6,r6,(64-3) + + bf cr7*4+3,1f +err1; lbz r0,0(r4) + addir4,r4,1 +err1; stb r0,0(r3) + addir3,r3,1 + +1: bf cr7*4+2,2f +err1; lhz r0,0(r4) + addir4,r4,2 +err1; sth r0,0(r3) + addir3,r3,2 + +2: bf cr7*4+1,3f +err1; lwz r0,0(r4) + addir4,r4,4 +err1; stw r0,0(r3) + addir3,r3,4 + +3: sub r5,r5,r6 + cmpldi r5,128 + blt 5f + + mflrr0 + stdur1,-STACKFRAMESIZE(r1) + std r14,STK_REG(R14)(r1) + std r15,STK_REG(R15)(r1) + std r16,STK_REG(R16)(r1) + std r17,STK_REG(R17)(r1) + std r18,STK_REG(R18)(r1) + std r19,STK_REG(R19)(r1) + std r20,STK_REG(R20)(r1) + std r21,STK_REG(R21)(r1) + std r22,STK_REG(R22)(r1) + std r0,STACKFRAMESIZE+16(r1) + + srdir6,r5,7 + mtctr r6 + + /* Now do cacheline (128B) sized loads and stores. 
 */
+	.align	5
+4:
+err2;	ld	r0,0(r4)
+err2;	ld	r6,8(r4)
+err2;	ld	r7,16(r4)
+err2;	ld	r8,24(r4)
+err2;	ld	r9,32(r4)
+err2;	ld	r10,40(r4)
+err2;	ld	r11,48(r4)
+err2;	ld	r12,56(r4)
+err2;	ld	r14,64(r4)
+err2;	ld	r15,72(r4)
+err2;	ld	r16,80(r4)
+err2;	ld	r17,88(r4)
+err2;	ld	r18,96(r4)
+err2;	ld	r19,104(r4)
+err2;	ld	r20,112(r4)
+err2;	ld	r21,120(r4)
+	addi	r4,r4,128
+err2;	std	r0,0(r3)
+err2;	std	r6,8(r3)
+err2;	std	r7,16(r3)
+err2;	std	r8,24(r3)
+err2;	std	r9,32(r3)
+err2;	std	r10,40(r3)
+err2;	std	r11,48(r3)
+err2;	std	r12,56(r3)
+err2;	std	r14,64(r3)
+err2;	std	r15,72(r3)
+err2;	std	r16,80(r3)
+err2;	std	r17,88(r3)
+err2;	std
[RESEND 0/3] Add support for memcpy_mcsafe
memcpy_mcsafe() is an API currently used by the pmem subsystem to convert errors encountered while doing a memcpy (machine check exception errors) into a return value. This patchset consists of three patches:

1. The first patch is a bug fix to handle machine check errors correctly while walking the page tables in kernel mode, due to huge pmd/pud sizes.
2. The second patch adds memcpy_mcsafe() support; this is largely derived from existing code.
3. The third patch registers for callbacks on machine check exceptions and, in them, uses specialized knowledge of the type of page to decide whether to handle the MCE as is or to return to a fixup address present in memcpy_mcsafe(). If a fixup address is used, then we return an error value of -EFAULT to the caller.

Testing

A large part of the testing was done under a simulator by selectively inserting machine check exceptions in a test driver doing memcpy_mcsafe via ioctls.

Balbir Singh (3):
  powerpc/mce: Bug fixes for MCE handling in kernel space
  powerpc/memcpy: Add memcpy_mcsafe for pmem
  powerpc/mce: Handle memcpy_mcsafe

 arch/powerpc/include/asm/mce.h      |   3 +-
 arch/powerpc/include/asm/string.h   |   2 +
 arch/powerpc/kernel/mce.c           |  76 +++-
 arch/powerpc/kernel/mce_power.c     |  17 +--
 arch/powerpc/lib/Makefile           |   2 +-
 arch/powerpc/lib/memcpy_mcsafe_64.S | 225
 6 files changed, 314 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S

-- 
2.13.6
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [RESEND v2 4/4] powerpc/powernv: Create platform devs for nvdimm buses
On Wed, 4 Apr 2018 00:24:15 +1000 Oliver O'Halloran <ooh...@gmail.com> wrote: > Scan the devicetree for an nvdimm-bus compatible and create > a platform device for them. > > Signed-off-by: Oliver O'Halloran <ooh...@gmail.com> > --- Acked-by: Balbir Singh <bsinghar...@gmail.com>
Re: [RESEND v2 1/4] libnvdimm: Add of_node to region and bus descriptors
On Wed, 4 Apr 2018 00:24:12 +1000 Oliver O'Halloran <ooh...@gmail.com> wrote: > We want to be able to cross reference the region and bus devices > with the device tree node that they were spawned from. libNVDIMM > handles creating the actual devices for these internally, so we > need to pass in a pointer to the relevant node in the descriptor. > > Signed-off-by: Oliver O'Halloran <ooh...@gmail.com> > Acked-by: Dan Williams <dan.j.willi...@intel.com> > --- Acked-by: Balbir Singh <bsinghar...@gmail.com>
Re: [PATCH 3/6] libnvdimm: Add device-tree based driver
NXIO;
> +
> +	pr_err("registering region for %pOF\n", np);
> +
> +	if (of_address_to_resource(np, 0, &res)) {
> +		pr_warn("Unable to parse reg[0] for %pOF\n", np);
> +		return -ENXIO;
> +	}
> +
> +	memset(&ndr_desc, 0, sizeof(ndr_desc));
> +	ndr_desc.res = &res;
> +	ndr_desc.of_node = np;
> +	ndr_desc.attr_groups = region_attr_groups;
> +	ndr_desc.numa_node = of_node_to_nid(np);

We probably need an ndr_desc.provider

> +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +
> +	/*
> +	 * NB: libnvdimm copies the data from ndr_desc into it's own structures
> +	 * so passing stack pointers is fine.
> +	 */
> +	if (of_get_property(np, "volatile", NULL))
> +		region = nvdimm_volatile_region_create(bus, &ndr_desc);
> +	else
> +		region = nvdimm_pmem_region_create(bus, &ndr_desc);
> +
> +	pr_warn("registered pmem region %px\n", region);
> +	if (!region)
> +		return -ENXIO;
> +
> +	platform_set_drvdata(pdev, region);
> +
> +	return 0;
> +}
> +
> +static int of_nd_region_remove(struct platform_device *pdev)
> +{
> +	struct nd_region *r = platform_get_drvdata(pdev);
> +
> +	nd_region_destroy(r);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id of_nd_region_match[] = {
> +	{ .compatible = "nvdimm-region" },
> +	{ },
> +};
> +
> +static struct platform_driver of_nd_region_driver = {
> +	.probe = of_nd_region_probe,
> +	.remove = of_nd_region_remove,
> +	.driver = {
> +		.name = "of_nd_region",
> +		.owner = THIS_MODULE,
> +		.of_match_table = of_nd_region_match,
> +	},
> +};
> +
> +/* bus wrangling */
> +
> +static int __init of_nvdimm_init(void)
> +{
> +	/* register */
> +	bus_desc.attr_groups = bus_attr_groups;
> +	bus_desc.provider_name = "of_nvdimm";
> +	bus_desc.module = THIS_MODULE;
> +
> +	/* does parent == NULL work? */
> +	bus = nvdimm_bus_register(NULL, &bus_desc);
> +	if (!bus)
> +		return -ENODEV;
> +
> +	platform_driver_register(&of_nd_region_driver);
> +
> +	return 0;
> +}
> +module_init(of_nvdimm_init);
> +
> +static void __init of_nvdimm_exit(void)
> +{
> +	nvdimm_bus_unregister(bus);
> +	platform_driver_unregister(&of_nd_region_driver);
> +}
> +module_exit(of_nvdimm_exit);
> +
> +MODULE_DEVICE_TABLE(of, of_nd_region_match);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("IBM Corporation");

Balbir Singh.
Re: [PATCH 2/6] libnvdimm: Add nd_region_destroy()
On Fri, 23 Mar 2018 19:12:05 +1100 Oliver O'Halloran <ooh...@gmail.com> wrote:
> Currently there's no way to remove a region from an nvdimm_bus without
> tearing down the whole bus. This patch adds an API for removing a single
> region from the bus so that we can implement a sensible unbind operation
> for the of_nd_region platform driver.
>
> Signed-off-by: Oliver O'Halloran <ooh...@gmail.com>
> ---
>  drivers/nvdimm/region_devs.c | 6 ++
>  include/linux/libnvdimm.h    | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index 2f1d5771100e..76f46fd1fae0 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -1039,6 +1039,12 @@ struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
> }
> EXPORT_SYMBOL_GPL(nvdimm_blk_region_create);
>
> +void nd_region_destroy(struct nd_region *region)
> +{
> +	nd_device_unregister(&region->dev, ND_SYNC);

child_unregister seems to do the same thing, but is expected to be used as a callback from device_for_each_child(). I'd suggest we merge the two, renaming child_unregister to nd_region_unregister.

Balbir Singh.
Re: [PATCH 1/6] libnvdimm: Add of_node to region and bus descriptors
On Fri, 23 Mar 2018 19:12:04 +1100 Oliver O'Halloran <ooh...@gmail.com> wrote: > We want to be able to cross reference the region and bus devices > with the device tree node that they were spawned from. libNVDIMM > handles creating the actual devices for these internally, so we > need to pass in a pointer to the relevant node in the descriptor. > > Signed-off-by: Oliver O'Halloran <ooh...@gmail.com> > Acked-by: Dan Williams <dan.j.willi...@intel.com> > --- Acked-by: Balbir Singh <bsinghar...@gmail.com>
Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
On Wed, 2017-04-12 at 03:42 +1000, Oliver O'Halloran wrote:
> From: Rashmica Gupta <rashmic...@gmail.com>
>
> Adds support for removing bolted (i.e. kernel linear mapping) mappings on
> powernv. This is needed to support memory hot unplug operations which
> are required for the teardown of DAX/PMEM devices.
>
> Cc: Rashmica Gupta <rashmic...@gmail.com>
> Cc: Anton Blanchard <an...@samba.org>
> Signed-off-by: Oliver O'Halloran <ooh...@gmail.com>
> ---
> Could the original author of this add their S-o-b? I pulled it out of
> Rashmica's memtrace patch, but I remember someone saying Anton wrote
> it originally.
> ---
>  arch/powerpc/mm/hash_native_64.c | 31 +++
>  1 file changed, 31 insertions(+)
>
> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> index 65bb8f33b399..9ba91d4905a4 100644
> --- a/arch/powerpc/mm/hash_native_64.c
> +++ b/arch/powerpc/mm/hash_native_64.c
> @@ -407,6 +407,36 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
> 	tlbie(vpn, psize, psize, ssize, 0);
> }
>
> +/*
> + * Remove a bolted kernel entry. Memory hotplug uses this.
> + *
> + * No need to lock here because we should be the only user.

As long as this is after the necessary isolation and is called from arch_remove_memory(), I think we should be fine.

> + */
> +static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
> +{
> +	unsigned long vpn;
> +	unsigned long vsid;
> +	long slot;
> +	struct hash_pte *hptep;
> +
> +	vsid = get_kernel_vsid(ea, ssize);
> +	vpn = hpt_vpn(ea, vsid, ssize);
> +
> +	slot = native_hpte_find(vpn, psize, ssize);
> +	if (slot == -1)
> +		return -ENOENT;

If slot == -1, it means someone else removed the HPTE entry? Are we racing? I suspect we should never hit this situation during hot unplug, specifically since this is bolted.

> +
> +	hptep = htab_address + slot;
> +
> +	/* Invalidate the hpte */
> +	hptep->v = 0;

Under DEBUG or otherwise, I would add more checks, e.g.:

1. Was hpte_v & HPTE_V_VALID and BOLTED set? If not, we've already invalidated that hpte and we can skip the tlbie. Since this was bolted, you might be right that it is always valid and bolted.

> +
> +	/* Invalidate the TLB */
> +	tlbie(vpn, psize, psize, ssize, 0);

The API also does not clear linear_map_hash_slots[] under DEBUG_PAGEALLOC.

> +	return 0;
> +}
> +
> +

Balbir Singh.
Re: [PATCH 6/9] powerpc, mm: Enable ZONE_DEVICE on powerpc
On Wed, 2017-04-12 at 03:42 +1000, Oliver O'Halloran wrote:
> Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
> but recommended.
>
> Signed-off-by: Oliver O'Halloran <ooh...@gmail.com>
> ---
>  mm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 43d000e44424..d696af58f97f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -724,7 +724,7 @@ config ZONE_DEVICE
> 	depends on MEMORY_HOTPLUG
> 	depends on MEMORY_HOTREMOVE
> 	depends on SPARSEMEM_VMEMMAP
> -	depends on X86_64 #arch_add_memory() comprehends device memory
> +	depends on (X86_64 || PPC_BOOK3S_64) #arch_add_memory() comprehends device memory

Reviewed-by: Balbir Singh <bsinghar...@gmail.com>
Re: [PATCH 5/9] powerpc/vmemmap: Add altmap support
On Wed, 2017-04-12 at 03:42 +1000, Oliver O'Halloran wrote:
> Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
> altmap is a driver-provided region that is used to provide the backing
> storage for the struct pages of ZONE_DEVICE memory. In situations where
> a large amount of ZONE_DEVICE memory is being added to the system, the
> altmap reduces pressure on main system memory by allowing the mm
> metadata to be stored on the device itself rather than in main memory.
>
> Signed-off-by: Oliver O'Halloran <ooh...@gmail.com>
> ---

Reviewed-by: Balbir Singh <bsinghar...@gmail.com>