Re: [PATCH v7 7/7] mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter

2023-08-02 Thread Verma, Vishal L
On Wed, 2023-08-02 at 21:27 +0530, Aneesh Kumar K V wrote:
> On 8/2/23 9:24 PM, David Hildenbrand wrote:
> > On 02.08.23 17:50, Michal Hocko wrote:
> > > On Wed 02-08-23 10:15:04, Aneesh Kumar K V wrote:
> > > > On 8/1/23 4:20 PM, Michal Hocko wrote:
> > > > > On Tue 01-08-23 14:58:29, Aneesh Kumar K V wrote:
> > > > > > On 8/1/23 2:28 PM, Michal Hocko wrote:
> > > > > > > On Tue 01-08-23 10:11:16, Aneesh Kumar K.V wrote:
> > > > > > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > > > > > hotplug done after the mode update will use the new
> > > > > > > > memmap_on_memory value.
> > > > > > > 
> > > > > > > Well, this is a user space kABI extension and as such you should
> > > > > > > spend more words about the usecase. Why we could live with this
> > > > > > > static and now need dynamic?
> > > > > > > 
> > > > > > 
> > > > > > This enables easy testing of the memmap_on_memory feature without a
> > > > > > kernel reboot.
> > > > > 
> > > > > Testing alone is rather weak argument to be honest.
> > > > > 
> > > > > > I also expect people wanting to use that when they find dax kmem
> > > > > > memory online failing because of struct page allocation failures[1].
> > > > > > User could reboot back with the memmap_on_memory=y kernel parameter.
> > > > > > But being able to enable it via sysfs makes the feature much more
> > > > > > useful.
> > > > > 
> > > > > Sure it can be useful but that holds for any feature, right. The main
> > > > > question is whether this is worth maintaining. The current
> > > > > implementation seems rather trivial, which is an argument to have it,
> > > > > but are there any risks long term? Have you evaluated a potential long
> > > > > term maintenance cost? There is no easy way to go back and disable it
> > > > > later on without breaking some userspace.
> > > > > 
> > > > > All that should be in the changelog!
> > > > 
> > > > I updated it as below.
> > > > 
> > > > mm/memory_hotplug: Enable runtime update of memmap_on_memory parameter
> > > > 
> > > > Allow updating memmap_on_memory mode after the kernel boot. Memory
> > > > hotplug done after the mode update will use the new memmap_on_memory
> > > > value.
> > > > 
> > > > It is now possible to test the memmap_on_memory feature easily without
> > > > the need for a kernel reboot. Additionally, for those encountering
> > > > struct page allocation failures while using dax kmem memory online, this
> > > > feature may prove useful. Instead of rebooting with the
> > > > memmap_on_memory=y kernel parameter, users can now enable it via sysfs,
> > > > which greatly enhances its usefulness.
> > > 
> > > 
> > > I do not really see a solid argument why rebooting is really a problem
> > > TBH. Also is the global policy knob really the right fit for existing
> > > hotplug usecases? In other words, if we really want to make
> > > memmap_on_memory more flexible would it make more sense to have it per
> > > memory block property instead (the global knob being the default if
> > > admin doesn't specify it differently).
> > 
> > Per memory block isn't possible, due to the creation order. Also, I think 
> > it's not the right approach.
> > 
> > I thought about driver toggles. At least for dax/kmem people are looking 
> > into that:
> > 
> > https://lkml.kernel.org/r/20230801-vv-kmem_memmap-v3-2-406e9aaf5...@intel.com
> > 
> > Where that can also be toggled per device.
> > 
> 
> That still is dependent on the global memmap_on_memory being enabled, right?
> We could make the dax facility independent of the global feature toggle?

Correct - the latest version of the patches David linked does depend on the
global memmap_on_memory param. Since kmem's behavior for dax devices is
set up to be turned on / off dynamically via sysfs, it would be nice to
have a similar facility for this flag.

I did try making dax independent of memmap_on_memory in my
first posting:

https://lore.kernel.org/nvdimm/20230613-vv-kmem_memmap-v1-1-f6de9c6af...@intel.com/

We can certainly revisit that if it looks more suitable.
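For readers following along, the workflow this patch enables would look roughly like the following sketch (requires root, and assumes a kernel with the patch applied; the sysfs path follows the existing memory_hotplug module-parameter convention):

```shell
# Read the current memmap_on_memory policy
cat /sys/module/memory_hotplug/parameters/memmap_on_memory

# Flip it at runtime instead of rebooting with memmap_on_memory=y
echo Y > /sys/module/memory_hotplug/parameters/memmap_on_memory

# Memory hotplugged after this point places its struct pages (the memmap)
# on the hotplugged range itself; already-plugged blocks are unaffected
```

Note that, as discussed in the thread, the new value only applies to hotplug operations performed after the write.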


Re: [PATCH v7 6/7] mm/memory_hotplug: Embed vmem_altmap details in memory block

2023-08-01 Thread Verma, Vishal L
On Tue, 2023-08-01 at 10:11 +0530, Aneesh Kumar K.V wrote:
> With memmap on memory, some architectures need more details w.r.t. the
> altmap, such as base_pfn, end_pfn, etc., to unmap vmemmap memory. Instead
> of computing them again when we remove a memory block, embed vmem_altmap
> details in struct memory_block if we are using the memmap on memory block
> feature.
> 
> Acked-by: David Hildenbrand 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/base/memory.c  | 27 +
>  include/linux/memory.h |  8 ++
>  mm/memory_hotplug.c    | 55 ++
>  3 files changed, 53 insertions(+), 37 deletions(-)
> 


> @@ -2136,10 +2148,10 @@ EXPORT_SYMBOL(try_offline_node);
>  
>  static int __ref try_remove_memory(u64 start, u64 size)
>  {
> -   struct vmem_altmap mhp_altmap = {};
> -   struct vmem_altmap *altmap = NULL;
> -   unsigned long nr_vmemmap_pages;
> +   int ret;

Minor nit - there is already an 'int rc' below - just use that, or
rename it to 'ret' if that's better for consistency.

> +   struct memory_block *mem;
> int rc = 0, nid = NUMA_NO_NODE;
> +   struct vmem_altmap *altmap = NULL;
>  
> BUG_ON(check_hotplug_memory_range(start, size));
>  
> @@ -2161,25 +2173,20 @@ static int __ref try_remove_memory(u64 start, u64 size)
>  * the same granularity it was added - a single memory block.
>  */
> if (mhp_memmap_on_memory()) {
> -   nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
> -                                         get_nr_vmemmap_pages_cb);
> -   if (nr_vmemmap_pages) {
> +   ret = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
> +   if (ret) {
> if (size != memory_block_size_bytes()) {
> pr_warn("Refuse to remove %#llx - %#llx,"
> "wrong granularity\n",
> start, start + size);
> return -EINVAL;
> }
> -
> +   altmap = mem->altmap;
> /*
> -    * Let remove_pmd_table->free_hugepage_table do the
> -    * right thing if we used vmem_altmap when hot-adding
> -    * the range.
> +    * Mark altmap NULL so that we can add a debug
> +    * check on memblock free.
>  */
> -   mhp_altmap.base_pfn = PHYS_PFN(start);
> -   mhp_altmap.free = nr_vmemmap_pages;
> -   mhp_altmap.alloc = nr_vmemmap_pages;
> -   altmap = &mhp_altmap;
> +   mem->altmap = NULL;
> }
> }
>  


Re: [PATCH] libnvdimm: Add a NULL entry to 'nvdimm_firmware_attributes'

2020-08-14 Thread Verma, Vishal L
On Fri, 2020-08-14 at 10:10 -0700, Ira Weiny wrote:
> On Fri, Aug 14, 2020 at 08:35:09PM +0530, Vaibhav Jain wrote:
> > We recently discovered a kernel oops with 'papr_scm' module while
> > booting ppc64 phyp guest with following back-trace:
> > 
> > BUG: Kernel NULL pointer dereference on write at 0x0188
> > Faulting instruction address: 0xc05d7084
> > Oops: Kernel access of bad area, sig: 11 [#1]
> > 
> > Call Trace:
> >  internal_create_group+0x128/0x4c0 (unreliable)
> >  internal_create_groups.part.4+0x70/0x130
> >  device_add+0x458/0x9c0
> >  nd_async_device_register+0x28/0xa0 [libnvdimm]
> >  async_run_entry_fn+0x78/0x1f0
> >  process_one_work+0x2c0/0x5b0
> >  worker_thread+0x88/0x650
> >  kthread+0x1a8/0x1b0
> >  ret_from_kernel_thread+0x5c/0x6c
> > 
> > A bisect led to the 'commit 48001ea50d17f ("PM, libnvdimm: Add runtime
> > firmware activation support")' and on investigation discovered that
> > the newly introduced 'struct attribute *nvdimm_firmware_attributes[]'
> > is missing a terminating NULL entry in the array. This causes a loop
> > in sysfs's 'create_files()' to read garbage beyond bounds of
> > 'nvdimm_firmware_attributes' and trigger the oops.
> > 
> > Fixes: 48001ea50d17f ("PM, libnvdimm: Add runtime firmware activation support")
> > Reported-by: Sandipan Das 
> > Signed-off-by: Vaibhav Jain 
> 
> Reviewed-by: Ira Weiny 

Thanks Vaibhav and Ira. I see this was also reported and fixed by Zqiang
a couple days ago. I'll pick that, merge these trailers and add it to
the fixes queue.

> 
> > ---
> >  drivers/nvdimm/dimm_devs.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
> > index 61374def51555..b59032e0859b7 100644
> > --- a/drivers/nvdimm/dimm_devs.c
> > +++ b/drivers/nvdimm/dimm_devs.c
> > @@ -529,6 +529,7 @@ static DEVICE_ATTR_ADMIN_RW(activate);
> >  static struct attribute *nvdimm_firmware_attributes[] = {
> > &dev_attr_activate.attr,
> > &dev_attr_result.attr,
> > +   NULL,
> >  };
> >  
> >  static umode_t nvdimm_firmware_visible(struct kobject *kobj, struct attribute *a, int n)
> > -- 
> > 2.26.2
> > 


Re: [PATCH] mm/nvdimm: Fix endian conversion issues 

2019-06-10 Thread Verma, Vishal L
On Fri, 2019-06-07 at 12:17 +0530, Aneesh Kumar K.V wrote:
> The nd_label->dpa issue was observed when trying to enable a namespace
> created with a little-endian kernel on a big-endian kernel. That made me
> run `sparse` on the rest of the code, and the other changes are the result
> of that.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/btt.c| 8 
>  drivers/nvdimm/namespace_devs.c | 7 ---
>  2 files changed, 8 insertions(+), 7 deletions(-)

The two BTT fixes seem like they may apply to stable as well, the
problematic code was introduced in relatively recent reworks/fixes.
Perhaps -

Fixes: d9b83c756953 ("libnvdimm, btt: rework error clearing")
Fixes: 9dedc73a4658 ("libnvdimm/btt: Fix LBA masking during 'free list' population")

Other than that, these look good to me.
Reviewed-by: Vishal Verma 

> 
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 4671776f5623..4ac0f5dde467 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -400,9 +400,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub,
>   arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
>   if (++(arena->freelist[lane].seq) == 4)
>   arena->freelist[lane].seq = 1;
> - if (ent_e_flag(ent->old_map))
> + if (ent_e_flag(le32_to_cpu(ent->old_map)))
>   arena->freelist[lane].has_err = 1;
> - arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
> + arena->freelist[lane].block = ent_lba(le32_to_cpu(ent->old_map));
>  
>   return ret;
>  }
> @@ -568,8 +568,8 @@ static int btt_freelist_init(struct arena_info *arena)
>* FIXME: if error clearing fails during init, we want to make
>* the BTT read-only
>*/
> - if (ent_e_flag(log_new.old_map) &&
> - !ent_normal(log_new.old_map)) {
> + if (ent_e_flag(le32_to_cpu(log_new.old_map)) &&
> + !ent_normal(le32_to_cpu(log_new.old_map))) {
>   arena->freelist[i].has_err = 1;
>   ret = arena_clear_freelist_error(arena, i);
>   if (ret)
> diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> index c4c5a191b1d6..500c37db496a 100644
> --- a/drivers/nvdimm/namespace_devs.c
> +++ b/drivers/nvdimm/namespace_devs.c
> @@ -1995,7 +1995,7 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
>   nd_mapping = &nd_region->mapping[i];
>   label_ent = list_first_entry_or_null(&nd_mapping->labels,
>   typeof(*label_ent), list);
> - label0 = label_ent ? label_ent->label : 0;
> + label0 = label_ent ? label_ent->label : NULL;
>  
>   if (!label0) {
>   WARN_ON(1);
> @@ -2330,8 +2330,9 @@ static struct device **scan_labels(struct nd_region *nd_region)
>   continue;
>  
>   /* skip labels that describe extents outside of the region */
> - if (nd_label->dpa < nd_mapping->start || nd_label->dpa > map_end)
> - continue;
> + if (__le64_to_cpu(nd_label->dpa) < nd_mapping->start ||
> + __le64_to_cpu(nd_label->dpa) > map_end)
> + continue;
>  
>   i = add_namespace_resource(nd_region, nd_label, devs, count);
>   if (i < 0)