Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
On Wed, 2023-08-02 at 21:27 +0530, Aneesh Kumar K V wrote:
> On 8/2/23 9:24 PM, David Hildenbrand wrote:
> > On 02.08.23 17:50, Michal Hocko wrote:
> > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > > hotplug done after the mode update will use the new memmap_on_memory
> > > > > > > > value.
> > > > > > >
> > > > > > > Well, this is a user space kABI extension and as such you should spend
> > > > > > > more words about the usecase. Why we could live with this static and
> > > > > > > now need dynamic?
> > > > > >
> > > > > > This enables easy testing of the memmap_on_memory feature without a
> > > > > > kernel reboot.
> > > > >
> > > > > Testing alone is a rather weak argument to be honest.
> > > > >
> > > > > > I also expect people wanting to use that when they find dax kmem memory
> > > > > > online failing because of struct page allocation failures[1]. User could
> > > > > > reboot back with the memmap_on_memory=y kernel parameter. But being able
> > > > > > to enable it via sysfs makes the feature much more useful.
> > > > >
> > > > > Sure it can be useful but that holds for any feature, right. The main
> > > > > question is whether this is worth maintaining. The current implementation
> > > > > seems rather trivial, which is an argument to have it, but are there any
> > > > > risks long term? Have you evaluated a potential long term maintenance
> > > > > cost? There is no easy way to go back and disable it later on without
> > > > > breaking some userspace.
> > > > >
> > > > > All that should be in the changelog!
> > > >
> > > > I updated it as below.
> > > >
> > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > > >
> > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > hotplug done after the mode update will use the new memmap_on_memory
> > > > value.
> > > >
> > > > It is now possible to test the memmap_on_memory feature easily without
> > > > the need for a kernel reboot. Additionally, for those encountering
> > > > struct page allocation failures while using dax kmem memory online, this
> > > > feature may prove useful. Instead of rebooting with the
> > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > > which greatly enhances its usefulness.
> > >
> > > I do not really see a solid argument why rebooting is really a problem
> > > TBH. Also, is the global policy knob really the right fit for existing
> > > hotplug usecases? In other words, if we really want to make
> > > memmap_on_memory more flexible, would it make more sense to have it as a
> > > per memory block property instead (the global knob being the default if
> > > the admin doesn't specify it differently)?
> >
> > Per memory block isn't possible, due to the creation order. Also, I think
> > it's not the right approach.
> >
> > I thought about driver toggles. At least for dax/kmem, people are looking
> > into that:
> >
> > https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5...@intel.com
> >
> > Where that can also be toggled per device.
>
> That still is dependent on the global memmap_on_memory being enabled, right?
> We could make the dax facility independent of the global feature toggle?

Correct - the latest version of the patches David linked does depend on the
global memmap_on_memory param. Since kmem's behavior for dax devices is set up
to be turned on / off dynamically via sysfs, it would be nice to have a
similar facility for this flag.
I did try the approach of making dax independent of memmap_on_memory in my
first posting:
https://lore.kernel.org/nvdimm/20230613-vv-kmem_memmap-v1-1-f6de9c6af...@intel.com/

We can certainly revisit that if it looks more suitable.
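For context, what the patch under discussion enables is flipping the mode at
runtime through the module-parameters sysfs interface instead of a boot-time
option. A minimal sketch of the intended workflow, assuming the parameter is
exported writable under the memory_hotplug module as in the posted series
(exact path is an assumption, not confirmed in this thread):

```shell
# Enable memmap_on_memory for subsequent hotplug operations
# (path assumes the patch exports the parameter writable at runtime)
echo Y > /sys/module/memory_hotplug/parameters/memmap_on_memory

# Verify the setting before onlining a dax kmem device
cat /sys/module/memory_hotplug/parameters/memmap_on_memory
```

Already-plugged memory blocks are unaffected; only hotplug operations after
the write pick up the new value.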
Re: [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block
On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote:
> With memmap on memory, some architectures need more details w.r.t altmap
> such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of
> computing them again when we remove a memory block, embed vmem_altmap
> details in struct memory_block if we are using the memmap on memory block
> feature.
>
> Acked-by: David Hildenbrand
> Signed-off-by: Aneesh Kumar K.V
> ---
> drivers/base/memory.c | 27 +
> include/linux/memory.h | 8 ++
> mm/memory_hotplug.c | 55 ++
> 3 files changed, 53 insertions(+), 37 deletions(-)
>
> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
>
> static int __ref try_remove_memory(u64 start, u64 size)
> {
> - struct vmem_altmap mhp_altmap = {};
> - struct vmem_altmap *altmap = NULL;
> - unsigned long nr_vmemmap_pages;
> + int ret;

Minor nit - there is already an 'int rc' below - just use that, or rename it
to 'ret' if that's better for consistency.

> + struct memory_block *mem;
> int rc = 0, nid = NUMA_NO_NODE;
> + struct vmem_altmap *altmap = NULL;
>
> BUG_ON(check_hotplug_memory_range(start, size));
>
> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
> * the same granularity it was added - a single memory block.
> */
> if (mhp_memmap_on_memory()) {
> - nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> - get_nr_vmemmap_pages_cb);
> - if (nr_vmemmap_pages) {
> + ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
> + if (ret) {
> if (size != memory_block_size_bytes()) {
> pr_warn("Refuse to remove %#llx - %#llx,"
> "wrong granularity\n",
> start, start + size);
> return -EINVAL;
> }
> -
> + altmap = mem->altmap;
> /*
> - * Let remove_pmd_table->free_hugepage_table do the
> - * right thing if we used vmem_altmap when hot-adding
> - * the range.
> + * Mark altmap NULL so that we can add a debug
> + * check on memblock free.
> */
> - mhp_altmap.base_pfn = PHYS_PFN(start);
> - mhp_altmap.free = nr_vmemmap_pages;
> - mhp_altmap.alloc = nr_vmemmap_pages;
> - altmap = &mhp_altmap;
> + mem->altmap = NULL;
> }
> }
>
Re: [PATCH] libnvdimm: Add a NULL entry to 'nvdimm_firmware_attributes'
On Fri, 2020-08-14 at 10:10 -0700, Ira Weiny wrote:
> On Fri, Aug 14, 2020 at 08:35:09PM +0530, Vaibhav Jain wrote:
> > We recently discovered a kernel oops with the 'papr_scm' module while
> > booting a ppc64 phyp guest, with the following back-trace:
> >
> > BUG: Kernel NULL pointer dereference on write at 0x0188
> > Faulting instruction address: 0xc05d7084
> > Oops: Kernel access of bad area, sig: 11 [#1]
> >
> > Call Trace:
> > internal_create_group+0x128/0x4c0 (unreliable)
> > internal_create_groups.part.4+0x70/0x130
> > device_add+0x458/0x9c0
> > nd_async_device_register+0x28/0xa0 [libnvdimm]
> > async_run_entry_fn+0x78/0x1f0
> > process_one_work+0x2c0/0x5b0
> > worker_thread+0x88/0x650
> > kthread+0x1a8/0x1b0
> > ret_from_kernel_thread+0x5c/0x6c
> >
> > A bisect led to commit 48001ea50d17f ("PM, libnvdimm: Add runtime
> > firmware activation support") and on investigation we discovered that
> > the newly introduced 'struct attribute *nvdimm_firmware_attributes[]'
> > is missing a terminating NULL entry. This causes a loop in sysfs's
> > 'create_files()' to read garbage beyond the bounds of
> > 'nvdimm_firmware_attributes' and trigger the oops.
> >
> > Fixes: 48001ea50d17f ("PM, libnvdimm: Add runtime firmware activation support")
> > Reported-by: Sandipan Das
> > Signed-off-by: Vaibhav Jain
>
> Reviewed-by: Ira Weiny

Thanks Vaibhav and Ira. I see this was also reported and fixed by Zqiang a
couple of days ago. I'll pick that, merge these trailers, and add it to the
fixes queue.
> >
> > ---
> > drivers/nvdimm/dimm_devs.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > index 61374def51555..b59032e0859b7 100644
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -529,6 +529,7 @@ static DEVICE_ATTR_ADMIN_RW(activate);
> > static struct attribute *nvdimm_firmware_attributes[] = {
> > &dev_attr_activate.attr,
> > &dev_attr_result.attr,
> > + NULL,
> > };
> >
> > static umode_t nvdimm_firmware_visible(struct kobject *kobj, struct attribute *a, int n)
> > --
> > 2.26.2
Re: [PATCH] mm/nvdimm: Fix endian conversion issues
On Fri, 2019-06-07 at 12:17 +0530, Aneesh Kumar K.V wrote:
> The nd_label->dpa issue was observed when trying to enable a namespace
> created with a little-endian kernel on a big-endian kernel. That made me run
> `sparse` on the rest of the code, and the other changes are the result of
> that.
>
> Signed-off-by: Aneesh Kumar K.V
> ---
> drivers/nvdimm/btt.c | 8 
> drivers/nvdimm/namespace_devs.c | 7 ---
> 2 files changed, 8 insertions(+), 7 deletions(-)

The two BTT fixes seem like they may apply to stable as well; the problematic
code was introduced in relatively recent reworks/fixes. Perhaps:

Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
Fixes: 9dedc73a4658 ("libnvdimm/btt: Fix LBA masking during 'free list' population")

Other than that, these look good to me.

Reviewed-by: Vishal Verma

> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 4671776f5623..4ac0f5dde467 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -400,9 +400,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
> arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
> if (++(arena->freelist[lane].seq) == 4)
> arena->freelist[lane].seq = 1;
> - if (ent_e_flag(ent->old_map))
> + if (ent_e_flag(le32_to_cpu(ent->old_map)))
> arena->freelist[lane].has_err = 1;
> - arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
> + arena->freelist[lane].block = ent_lba(le32_to_cpu(ent->old_map));
>
> return ret;
> }
> @@ -568,8 +568,8 @@ static int btt_freelist_init(struct arena_info *arena)
> * FIXME: if error clearing fails during init, we want to make
> * the BTT read-only
> */
> - if (ent_e_flag(log_new.old_map) &&
> - !ent_normal(log_new.old_map)) {
> + if (ent_e_flag(le32_to_cpu(log_new.old_map)) &&
> + !ent_normal(le32_to_cpu(log_new.old_map))) {
> arena->freelist[i].has_err = 1;
> ret = arena_clear_freelist_error(arena, i);
> if (ret)
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index c4c5a191b1d6..500c37db496a 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1995,7 +1995,7 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
> nd_mapping = &nd_region->mapping[i];
> label_ent = list_first_entry_or_null(&nd_mapping->labels,
> typeof(*label_ent), list);
> - label0 = label_ent ? label_ent->label : 0;
> + label0 = label_ent ? label_ent->label : NULL;
>
> if (!label0) {
> WARN_ON(1);
> @@ -2330,8 +2330,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
> continue;
>
> /* skip labels that describe extents outside of the region */
> - if (nd_label->dpa < nd_mapping->start || nd_label->dpa > map_end)
> - continue;
> + if (__le64_to_cpu(nd_label->dpa) < nd_mapping->start ||
> + __le64_to_cpu(nd_label->dpa) > map_end)
> + continue;
>
> i = add_namespace_resource(nd_region, nd_label, devs, count);
> if (i < 0)