Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag

2024-03-07 Thread Julia Lawall



On Wed, 6 Mar 2024, Vincent Guittot wrote:

> Hi Prateek,
>
> Adding Julia who could be interested in this patchset. Your patchset
> should trigger idle load balance instead of newly idle load balance
> now when the polling is used. This was one reason for not migrating
> task in idle CPU

My situation is roughly as follows:

The machine is an Intel 6130 with two sockets and 32 hardware threads
(subsequently referred to as cores) per socket.  The test is bt.B of the
OpenMP version of the NAS benchmark suite.  Initially there is one
thread per core.  NUMA balancing occurs, resulting in a move, and thus 31
threads on one socket and 33 on the other.

Load balancing should result in the idle core pulling one of the threads
from the other socket.  But that doesn't happen in normal load balancing,
because all 33 threads on the overloaded socket are considered to have a
preference for that socket.  Active balancing could pull a thread, but it
is not triggered because the idle core is seen as being newly idle.

The question is then why a core that has been idle for up to multiple
seconds is continually seen as newly idle.  Every 4ms, a scheduler tick
submits some work to try to load balance.  This submission process
previously broke out of the idle loop due to a need_resched, hence the
same issue as involved in this patch series.  The need_resched caused
invocation of schedule, which would then see that there was no task to
pick, making the core be considered to be newly idle.  The classification
as newly idle doesn't take into account whether any task was running prior
to the call to schedule.

The load balancing work that was submitted every 4ms is also a NOP due a
test for need_resched.

This patch series no longer makes need resched be the only way out of the
idle loop.  Without the need resched, the load balancing work that is
submitted every 4ms can actually try to do load balancing.  The core is
not newly idle, so active balancing could in principle occur.  But now
nothing happens because the work is run by ksoftirqd.  The presence of
ksoftirqd on the idle core means that the core is no longer idle.  Thus
there is no more need for load balancing.

So this patch series in itself doesn't solve the problem.  I did 500 runs
with this patch series and 500 runs with the Linux kernel that this patch
series builds on, and there is essentially no difference in the
performance.

julia


>
> On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak  wrote:
> >
> > Hello everyone,
> >
> > Before jumping into the issue, let me clarify the Cc list. Everyone have
> > been cc'ed on Patch 0 through Patch 3. Respective arch maintainers,
> > reviewers, and committers returned by scripts/get_maintainer.pl have
> > been cc'ed on the respective arch side changes. Scheduler and CPU Idle
> > maintainers and reviewers have been included for the entire series. If I
> > have missed anyone, please do add them. If you would like to be dropped
> > from the cc list, wholly or partially, for the future iterations, please
> > do let me know.
> >
> > With that out of the way ...
> >
> > Problem statement
> > =
> >
> > When measuring IPI throughput using a modified version of Anton
> > Blanchard's ipistorm benchmark [1], configured to measure time taken to
> > perform a fixed number of smp_call_function_single() (with wait set to
> > 1), an increase in benchmark time was observed between v5.7 and the
> > current upstream release (v6.7-rc6 at the time of encounter).
> >
> > Bisection pointed to commit b2a02fc43a1f ("smp: Optimize
> > send_call_function_single_ipi()") as the reason behind this increase in
> > runtime.
> >
> >
> > Experiments
> > ===
> >
> > Since the commit cannot be cleanly reverted on top of the current
> > tip:sched/core, the effects of the optimizations were reverted by:
> >
> > 1. Removing the check for call_function_single_prep_ipi() in
> >send_call_function_single_ipi(). With this change
> >send_call_function_single_ipi() always calls
> >arch_send_call_function_single_ipi()
> >
> > 2. Removing the call to flush_smp_call_function_queue() in do_idle()
> >since every smp_call_function, with (1.), would unconditionally send
> >an IPI to an idle CPU in TIF_POLLING mode.
> >
> > Following is the diff of the above described changes which will be
> > henceforth referred to as the "revert":
> >
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index 31231925f1ec..735184d98c0f 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -332,11 +332,6 @@ static void do_idle(void)
> >  */
> > smp_mb__after_atomic();
> >
> > -   /*
> > -* RCU relies on this call to be done outside of an RCU read-side
> > -* critical section.
> > -*/
> > -   flush_smp_call_function_queue();
> > schedule_idle();
> >
> > if (unlikely(klp_patch_pending(current)))
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index 

Re: [PATCH v2 3/3] arch: define CONFIG_PAGE_SIZE_*KB on all architectures

2024-03-07 Thread Andreas Larsson
On 2024-03-06 15:14, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> Most architectures only support a single hardcoded page size. In order
> to ensure that each one of these sets the corresponding Kconfig symbols,
> change over the PAGE_SHIFT definition to the common one and allow
> only the hardware page size to be selected.
> 
> Acked-by: Guo Ren 
> Acked-by: Heiko Carstens 
> Acked-by: Stafford Horne 
> Acked-by: Johannes Berg 
> Signed-off-by: Arnd Bergmann 
> ---
> No changes from v1

>  arch/sparc/Kconfig | 2 ++
>  arch/sparc/include/asm/page_32.h   | 2 +-
>  arch/sparc/include/asm/page_64.h   | 3 +--

Acked-by: Andreas Larsson 

Thanks,
Andreas




Re: [PATCH v2 2/3] arch: simplify architecture specific page size configuration

2024-03-07 Thread Michael Ellerman
Arnd Bergmann  writes:
> From: Arnd Bergmann 
>
> arc, arm64, parisc and powerpc all have their own Kconfig symbols
> in place of the common CONFIG_PAGE_SIZE_4KB symbols. Change these
> so the common symbols are the ones that are actually used, while
> leaving the arhcitecture specific ones as the user visible
> place for configuring it, to avoid breaking user configs.
>
> Reviewed-by: Christophe Leroy  (powerpc32)
> Acked-by: Catalin Marinas 
> Acked-by: Helge Deller  # parisc
> Signed-off-by: Arnd Bergmann 
> ---
> No changes from v1
>
>  arch/arc/Kconfig  |  3 +++
>  arch/arc/include/uapi/asm/page.h  |  6 ++
>  arch/arm64/Kconfig| 29 +
>  arch/arm64/include/asm/page-def.h |  2 +-
>  arch/parisc/Kconfig   |  3 +++
>  arch/parisc/include/asm/page.h| 10 +-
>  arch/powerpc/Kconfig  | 31 ++-
>  arch/powerpc/include/asm/page.h   |  2 +-
>  scripts/gdb/linux/constants.py.in |  2 +-
>  scripts/gdb/linux/mm.py   |  2 +-
>  10 files changed, 32 insertions(+), 58 deletions(-)

Acked-by: Michael Ellerman  (powerpc)

cheers