On 15.10.25 10:27, Kevin Brodsky wrote:
The implementation of the lazy MMU mode is currently entirely
arch-specific; core code directly calls arch helpers:
arch_{enter,leave}_lazy_mmu_mode().

We are about to introduce support for nested lazy MMU sections.
As things stand we'd have to duplicate that logic in every arch
implementing lazy_mmu - adding to a fair amount of logic
already duplicated across lazy_mmu implementations.

This patch therefore introduces a new generic layer that calls the
existing arch_* helpers. Two pairs of calls are introduced:

* lazy_mmu_mode_enable() ... lazy_mmu_mode_disable()
     This is the standard case where the mode is enabled for a given
     block of code by surrounding it with enable() and disable()
     calls.

* lazy_mmu_mode_pause() ... lazy_mmu_mode_resume()
     This is for situations where the mode is temporarily disabled
     by first calling pause() and then resume() (e.g. to prevent any
     batching from occurring in a critical section).

The documentation in <linux/pgtable.h> will be updated in a
subsequent patch.

No functional change should be introduced at this stage.
The implementation of enable()/resume() and disable()/pause() is
currently identical, but nesting support will change that.
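
To make the shape of the generic layer concrete, here is a minimal
user-space sketch, assuming the wrappers are thin pass-throughs to the
arch helpers as the description above suggests. The arch_* bodies, the
counters and demo_section() are hypothetical stand-ins for illustration,
not the code from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the arch helpers: they only track state. */
static int arch_enter_calls;
static int arch_leave_calls;
static int lazy_mmu_active;	/* 1 while the mocked mode is on */

static void arch_enter_lazy_mmu_mode(void)
{
	assert(!lazy_mmu_active);	/* no nesting at this stage */
	lazy_mmu_active = 1;
	arch_enter_calls++;
}

static void arch_leave_lazy_mmu_mode(void)
{
	assert(lazy_mmu_active);
	lazy_mmu_active = 0;
	arch_leave_calls++;
}

/*
 * Generic layer (assumed shape): enable()/resume() and disable()/pause()
 * are identical for now and simply forward to the arch helpers.
 */
static inline void lazy_mmu_mode_enable(void)  { arch_enter_lazy_mmu_mode(); }
static inline void lazy_mmu_mode_disable(void) { arch_leave_lazy_mmu_mode(); }
static inline void lazy_mmu_mode_pause(void)   { arch_leave_lazy_mmu_mode(); }
static inline void lazy_mmu_mode_resume(void)  { arch_enter_lazy_mmu_mode(); }

/*
 * Hypothetical caller: a lazy MMU section with a critical part in the
 * middle where batching must not happen. Returns whether the mode was
 * active inside the critical part.
 */
static int demo_section(void)
{
	int active_in_critical;

	lazy_mmu_mode_enable();
	/* ... batched page table updates ... */

	lazy_mmu_mode_pause();
	active_in_critical = lazy_mmu_active;
	lazy_mmu_mode_resume();

	/* ... more batched updates ... */
	lazy_mmu_mode_disable();

	return active_in_critical;
}
```

Keeping pause()/resume() as separate names, even though they map to the
same helpers today, lets call sites express intent before nesting support
gives the two pairs different behaviour.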

Most of the call sites have been updated using the following
Coccinelle script:

@@
@@
{
...
- arch_enter_lazy_mmu_mode();
+ lazy_mmu_mode_enable();
...
- arch_leave_lazy_mmu_mode();
+ lazy_mmu_mode_disable();
...
}

@@
@@
{
...
- arch_leave_lazy_mmu_mode();
+ lazy_mmu_mode_pause();
...
- arch_enter_lazy_mmu_mode();
+ lazy_mmu_mode_resume();
...
}

A couple of cases are noteworthy:

* madvise_*_pte_range() call arch_leave() in multiple paths, some
   followed by an immediate exit/rescheduling and some followed by a
   conditional exit. These functions assume that they are called
   with lazy MMU disabled and we cannot simply use pause()/resume()
   to address that. This patch leaves the situation unchanged by
   calling enable()/disable() in all cases.

I'm confused, the function simply

(a) enables lazy mmu
(b) does something on the page table
(c) disables lazy mmu
(d) does something expensive (split folio -> takes sleepable locks,
    flushes tlb)
(e) goes back to (a)

Why would we use enable/disable instead?
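
For reference, the (a)-(e) pattern above can be sketched in user space
as follows. This is not the madvise_*_pte_range() code: walk_range(),
expensive_step() and the counters are hypothetical, and the lazy MMU
calls are mocked to only flip a flag.

```c
#include <assert.h>

static int lazy_active;			/* mocked lazy MMU state */
static int expensive_while_lazy;	/* must stay 0 */

/* Mocked generic helpers: they only track whether the mode is on. */
static void lazy_mmu_mode_enable(void)  { lazy_active = 1; }
static void lazy_mmu_mode_disable(void) { lazy_active = 0; }

/* Hypothetical expensive step (think: split folio, sleepable locks). */
static void expensive_step(void)
{
	if (lazy_active)
		expensive_while_lazy++;
}

/*
 * Process nr_entries in chunks of at most 'batch' entries (batch > 0).
 * Returns the number of times the loop went through (a)..(e).
 */
static int walk_range(int nr_entries, int batch)
{
	int rounds = 0;

	while (nr_entries > 0) {
		int chunk = nr_entries < batch ? nr_entries : batch;

		lazy_mmu_mode_enable();		/* (a) */
		nr_entries -= chunk;		/* (b) page table work */
		lazy_mmu_mode_disable();	/* (c) */

		if (nr_entries > 0)
			expensive_step();	/* (d) runs with lazy MMU off */

		rounds++;			/* (e) back to (a) */
	}

	return rounds;
}
```

In this sketch the expensive step always runs with the mode off, which
is the property the loop relies on.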


* x86/Xen is currently the only case where explicit handling is
   required for lazy MMU when context-switching. This is purely an
   implementation detail and using the generic lazy_mmu_mode_*
   functions would cause trouble when nesting support is introduced,
   because the generic functions must be called from the current task.
   For that reason we still use arch_leave() and arch_enter() there.

How does this interact with patch #11?


Note: x86 calls arch_flush_lazy_mmu_mode() unconditionally in a few
places, but only defines it if PARAVIRT_XXL is selected, and we are
removing the fallback in <linux/pgtable.h>. Add a new fallback
definition to <asm/pgtable.h> to keep things building.

I can see a call in __kernel_map_pages() and arch_kmap_local_post_map()/arch_kmap_local_post_unmap().

I guess that is ... harmless/irrelevant in the context of this series?

[...]


--
Cheers

David / dhildenb

