[PATCH] nvdimm: Explicitly include correct DT includes

2023-07-14 Thread Rob Herring
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Signed-off-by: Rob Herring 
---
 drivers/nvdimm/of_pmem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
index 10dbdcdfb9ce..1b9f5b8a6167 100644
--- a/drivers/nvdimm/of_pmem.c
+++ b/drivers/nvdimm/of_pmem.c
@@ -2,11 +2,11 @@
 
 #define pr_fmt(fmt) "of_pmem: " fmt
 
-#include <linux/of_platform.h>
-#include <linux/of_address.h>
+#include <linux/of.h>
 #include <linux/libnvdimm.h>
 #include <linux/module.h>
 #include <linux/ioport.h>
+#include <linux/platform_device.h>
 #include <linux/slab.h>
 
 struct of_pmem_private {
-- 
2.40.1




Re: [PATCH v7 0/9] Remove .notify callback in acpi_device_ops

2023-07-14 Thread Rafael J. Wysocki
On Mon, Jul 3, 2023 at 10:03 AM Michal Wilczynski
 wrote:
>
> *** IMPORTANT ***
> This is part 1 - only drivers in acpi directory to ease up review
> process. Rest of the drivers will be handled in separate patchsets.
>
> Currently drivers support ACPI event handlers by defining a .notify
> callback in acpi_device_ops. This solution is suboptimal, as the event
> handler installer registers the intermediary function acpi_notify_device
> as the handler for every driver. This approach also requires an extra
> 'flags' variable for specifying the event types the driver wants to
> subscribe to. Additionally, this is pre-work required to align
> acpi_driver with platform_driver and eventually replace acpi_driver
> with platform_driver.
>
> Remove the .notify callback from acpi_device_ops. Replace it with each
> driver installing and removing its own event handlers.
>
> This is part 1 - only drivers in acpi directory to ease up review
> process.
>
> v7:
>  - fix warning by declaring acpi_nfit_remove_notify_handler() as static
>
> v6:
>  - dropped the unnecessary RCT changes in all drivers, as they're not
>    the purpose of this patch series
>  - changed error label names to simplify them
>  - dropped the commit that removed a comma
>  - squashed the commit moving nfit code
>  - improved the nfit driver to use devm instead of .remove()
>  - rebased, as Rafael's changes [1] are already merged
>
> v5:
>  - rebased on top of Rafael's changes [1], which weren't merged yet
>  - fixed rollback in multiple drivers so they don't leak resources on
>    failure
>  - made this part 1, meaning only drivers in the acpi directory; the
>    rest of the drivers will be handled in separate patchsets to ease
>    up review
>
> v4:
>  - added one commit for previously missed driver sony-laptop,
>refactored return statements, added NULL check for event installer
> v3:
>  - lkp still reported some failures for eeepc, fujitsu and
>    toshiba_bluetooth; fixed those
> v2:
>  - fix compilation errors for drivers
>
> [1]: https://lore.kernel.org/linux-acpi/1847933.atdPhlSkOF@kreacher/
>
> Michal Wilczynski (9):
>   acpi/bus: Introduce wrappers for ACPICA event handler install/remove
>   acpi/bus: Set driver_data to NULL every time .add() fails
>   acpi/ac: Move handler installing logic to driver
>   acpi/video: Move handler installing logic to driver
>   acpi/battery: Move handler installing logic to driver
>   acpi/hed: Move handler installing logic to driver
>   acpi/nfit: Move handler installing logic to driver
>   acpi/nfit: Remove unnecessary .remove callback
>   acpi/thermal: Move handler installing logic to driver

Dan hasn't spoken up yet, but I went ahead and queued up the patches
(with some modifications) for 6.6.

I've edited the subjects and rewritten the changelogs and I've
adjusted some white space around function calls (nothing major).

Thanks!



Re: [PATCH 0/3] mm: use memmap_on_memory semantics for dax/kmem

2023-07-14 Thread Jeff Moyer
David Hildenbrand  writes:

> On 13.07.23 21:12, Jeff Moyer wrote:
>> David Hildenbrand  writes:
>>
>>> On 16.06.23 00:00, Vishal Verma wrote:
>>>> The dax/kmem driver can potentially hot-add large amounts of memory
>>>> originating from CXL memory expanders, or NVDIMMs, or other 'device
>>>> memories'. There is a chance there isn't enough regular system memory
>>>> available to fit the memmap for this new memory. It's therefore
>>>> desirable, if all other conditions are met, for the kmem managed memory
>>>> to place its memmap on the newly added memory itself.
>>>>
>>>> Arrange for this by first allowing for a module parameter override for
>>>> the mhp_supports_memmap_on_memory() test using a flag, adjusting the
>>>> only other caller of this interface in drivers/acpi/acpi_memoryhotplug.c,
>>>> exporting the symbol so it can be called by kmem.c, and finally changing
>>>> the kmem driver to add_memory() in chunks of memory_block_size_bytes().
>>>
>>> 1) Why is the override a requirement here? Just let the admin
>>> configure it and then add conditional support for kmem.
>>>
>>> 2) I recall that there are cases where we don't want the memmap to
>>> land on slow memory (which online_movable would achieve). Just imagine
>>> the slow PMEM case. So this might need another configuration knob on
>>> the kmem side.
>>
>> From my memory, the case where you don't want the memmap to land on
>> *persistent memory* is when the device is small (such as NVDIMM-N), and
>> you want to reserve as much space as possible for the application data.
>> This has nothing to do with the speed of access.
>
> Now that you mention it, I also do remember the origin of the altmap --
> to achieve exactly that: place the memmap on the device.
>
> commit 4b94ffdc4163bae1ec73b6e977ffb7a7da3d06d3
> Author: Dan Williams 
> Date:   Fri Jan 15 16:56:22 2016 -0800
>
> x86, mm: introduce vmem_altmap to augment vmemmap_populate()
>   In support of providing struct page for large persistent memory
> capacities, use struct vmem_altmap to change the default policy for
> allocating memory for the memmap array.  The default vmemmap_populate()
> allocates page table storage area from the page allocator.  Given
> persistent memory capacities relative to DRAM it may not be feasible to
> store the memmap in 'System Memory'.  Instead vmem_altmap represents
> pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
> requests.
>
> In PFN_MODE_PMEM (and only then), we use the altmap (don't see a way to
> configure it).

Configuration is done at pmem namespace creation time.  The metadata for
the namespace indicates where the memmap resides.  See the
ndctl-create-namespace man page:

   -M, --map=
   A pmem namespace in "fsdax" or "devdax" mode requires allocation of
   per-page metadata. The allocation can be drawn from either:

   ·   "mem": typical system memory

   ·   "dev": persistent memory reserved from the namespace

   Given relative capacities of "Persistent Memory" to "System
   RAM" the allocation defaults to reserving space out of the
   namespace directly ("--map=dev"). The overhead is 64-bytes per
   4K (16GB per 1TB) on x86.

> BUT that case is completely different from the "System RAM" mode. The memmap
> of an NVDIMM in pmem mode is barely used by core-mm (i.e., not the buddy).

Right.  (btw, I don't think system ram mode existed back then.)

> In comparison, if the buddy and everybody else works on the memmap in
> "System RAM", it's much more significant if that resides on slow memory.

Agreed.

> Looking at
>
> commit 9b6e63cbf85b89b2dbffa4955dbf2df8250e5375
> Author: Michal Hocko 
> Date:   Tue Oct 3 16:16:19 2017 -0700
>
> mm, page_alloc: add scheduling point to memmap_init_zone
>   memmap_init_zone gets a pfn range to initialize and it can be really
> large resulting in a soft lockup on non-preemptible kernels
>   NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [kworker/u642:5:1720]
>   [...]
>   task: 88ecd7e902c0 ti: 88eca4e5 task.ti: 88eca4e5
>   RIP: move_pfn_range_to_zone+0x185/0x1d0
>   [...]
>   Call Trace:
> devm_memremap_pages+0x2c7/0x430
> pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
> nvdimm_bus_probe+0x64/0x110 [libnvdimm]
>
>
> It's hard to tell if that was only required due to the memmap for these
> devices being that large, or also partially because access to the memmap
> is slow enough that it makes a real difference.

I believe the main driver was the size.  At the time, Intel was
advertising 3TiB/socket for pmem.  I can't remember the exact DRAM
configuration sizes from the time.

> I recall that we're also often using ZONE_MOVABLE on such slow memory
> to not end up placing other kernel data structures on there: especially,
> user space page tables as I've been told.

Part of the issu

Re: [PATCH 0/3] mm: use memmap_on_memory semantics for dax/kmem

2023-07-14 Thread David Hildenbrand

On 13.07.23 21:12, Jeff Moyer wrote:

> David Hildenbrand  writes:


>> On 16.06.23 00:00, Vishal Verma wrote:

>>> The dax/kmem driver can potentially hot-add large amounts of memory
>>> originating from CXL memory expanders, or NVDIMMs, or other 'device
>>> memories'. There is a chance there isn't enough regular system memory
>>> available to fit the memmap for this new memory. It's therefore
>>> desirable, if all other conditions are met, for the kmem managed memory
>>> to place its memmap on the newly added memory itself.
>>>
>>> Arrange for this by first allowing for a module parameter override for
>>> the mhp_supports_memmap_on_memory() test using a flag, adjusting the
>>> only other caller of this interface in drivers/acpi/acpi_memoryhotplug.c,
>>> exporting the symbol so it can be called by kmem.c, and finally changing
>>> the kmem driver to add_memory() in chunks of memory_block_size_bytes().


>> 1) Why is the override a requirement here? Just let the admin
>> configure it and then add conditional support for kmem.
>>
>> 2) I recall that there are cases where we don't want the memmap to
>> land on slow memory (which online_movable would achieve). Just imagine
>> the slow PMEM case. So this might need another configuration knob on
>> the kmem side.


> From my memory, the case where you don't want the memmap to land on
> *persistent memory* is when the device is small (such as NVDIMM-N), and
> you want to reserve as much space as possible for the application data.
> This has nothing to do with the speed of access.


Now that you mention it, I also do remember the origin of the altmap --
to achieve exactly that: place the memmap on the device.

commit 4b94ffdc4163bae1ec73b6e977ffb7a7da3d06d3
Author: Dan Williams 
Date:   Fri Jan 15 16:56:22 2016 -0800

x86, mm: introduce vmem_altmap to augment vmemmap_populate()

In support of providing struct page for large persistent memory
capacities, use struct vmem_altmap to change the default policy for
allocating memory for the memmap array.  The default vmemmap_populate()
allocates page table storage area from the page allocator.  Given
persistent memory capacities relative to DRAM it may not be feasible to
store the memmap in 'System Memory'.  Instead vmem_altmap represents
pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
requests.

In PFN_MODE_PMEM (and only then), we use the altmap (don't see a way to
configure it).


BUT that case is completely different from the "System RAM" mode. The memmap
of an NVDIMM in pmem mode is barely used by core-mm (i.e., not the buddy).

In comparison, if the buddy and everybody else works on the memmap in
"System RAM", it's much more significant if that resides on slow memory.


Looking at

commit 9b6e63cbf85b89b2dbffa4955dbf2df8250e5375
Author: Michal Hocko 
Date:   Tue Oct 3 16:16:19 2017 -0700

mm, page_alloc: add scheduling point to memmap_init_zone

memmap_init_zone gets a pfn range to initialize and it can be really
large resulting in a soft lockup on non-preemptible kernels

  NMI watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [kworker/u642:5:1720]

  [...]
  task: 88ecd7e902c0 ti: 88eca4e5 task.ti: 88eca4e5
  RIP: move_pfn_range_to_zone+0x185/0x1d0
  [...]
  Call Trace:
devm_memremap_pages+0x2c7/0x430
pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
nvdimm_bus_probe+0x64/0x110 [libnvdimm]


It's hard to tell if that was only required due to the memmap for these
devices being that large, or also partially because access to the memmap
is slow enough that it makes a real difference.


I recall that we're also often using ZONE_MOVABLE on such slow memory
to not end up placing other kernel data structures on there: especially,
user space page tables as I've been told.


@Dan, any insight on the performance aspects when placing the memmap on
(slow) memory and having that memory be consumed by the buddy where we
frequently operate on the memmap?

--
Cheers,

David / dhildenb