Re: [PATCH v2 00/14] Introducing TIF_NOTIFY_IPI flag

2024-06-18 Thread K Prateek Nayak

Hello Chenyu,

On 6/18/2024 1:19 PM, Chen Yu wrote:

[..snip..]



Vincent [5] pointed out a case where the idle load kick will fail to
run on an idle CPU since the IPI handler launching the ILB will check
for need_resched(). In such cases, the idle CPU relies on
newidle_balance() to pull tasks towards itself.


Is this the need_resched() in _nohz_idle_balance() ? Should we change
this to 'need_resched() && (rq->nr_running || rq->ttwu_pending)' or
something along those lines?


It's not only this but also the need_resched() in do_idle(), which exits
the loop to look for tasks to schedule



I mean, it's fairly trivial to figure out if there really is going to be
work there.


Using an alternate flag instead of NEED_RESCHED to indicate a pending
IPI was suggested as the correct approach to solve this problem on the
same thread.


So adding per-arch changes for this seems like something we shouldn't
unless there really is no other sane options.

That is, I really think we should start with something like the below
and then fix any fallout from that.


The main problem is that need_resched becomes somewhat meaningless
because it no longer only means "I need to resched a task", and we have
to add more tests around it, even for those not using polling


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0935f9d4bb7b..cfa45338ae97 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5799,7 +5800,7 @@ static inline struct task_struct *
 __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	const struct sched_class *class;
-	struct task_struct *p;
+	struct task_struct *p = NULL;
 
 	/*
	 * Optimization: we know that if all tasks are in the fair class we can
@@ -5810,9 +5811,11 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	if (likely(!sched_class_above(prev->sched_class, &fair_sched_class) &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 
-		p = pick_next_task_fair(rq, prev, rf);
-		if (unlikely(p == RETRY_TASK))
-			goto restart;
+		if (rq->nr_running) {


How do you make the difference between a spurious need_resched() because
of polling and a CPU becoming idle? Isn't rq->nr_running null in both
cases?
In the latter case, we need to call sched_balance_newidle() but not in the former


Not sure if I understand correctly: if the goal of smp_call_function_single()
is to kick the idle CPU without forcing it to launch
schedule()->sched_balance_newidle(), can we set _TIF_POLLING_NRFLAG rather
than _TIF_NEED_RESCHED in set_nr_if_polling()? I think writing any value to
the monitored address would wake up the idle CPU, and _TIF_POLLING_NRFLAG
will be cleared once that idle CPU exits the idle loop, so we don't
introduce an arch-wide flag.

Although this might work for MWAIT, there is no way for the generic idle
path to know if there is a pending interrupt within a TIF_POLLING_NRFLAG
section. do_idle() sets TIF_POLLING_NRFLAG and relies on a bunch of
need_resched() checks along the way to bail early until finally doing a
current_clr_polling_and_test() before handing off to the cpuidle driver
in call_cpuidle(). I believe this section will necessarily need the sender
to indicate a pending interrupt via TIF_NEED_RESCHED flag to enable the
early bail out before going into the cpuidle driver since this case cannot
be considered the same as a break from MWAIT.
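
For reference, the window in question looks roughly like this (a simplified
sketch paraphrasing do_idle() in kernel/sched/idle.c; most details elided):

    __current_set_polling();            /* enter the TIF_POLLING_NRFLAG section */

    while (!need_resched()) {           /* the bail-out points all test NEED_RESCHED */
            local_irq_disable();
            ...
            cpuidle_idle_call();        /* eventually does
                                         * current_clr_polling_and_test() just
                                         * before handing off to the driver */
    }

    __current_clr_polling();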



I see, this is a good point. So you mean that with only TIF_POLLING_NRFLAG
there is a possibility that the 'IPI kick CPU out of idle' is lost after
the CPU enters do_idle() and before it finally enters the idle state,
while setting _TIF_NEED_RESCHED could help the do_idle() loop detect the
pending request more easily.


Yup, that is correct.


BTW, before commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()"), the loss of an IPI after entering
do_idle() and before entering the driver idle state was also possible,
right (the local irq is disabled)?


From what I understand, the IPI remains pending until the interrupts
are enabled again. Before the optimization, the interrupts would be
disabled all the way until the instruction that is used to put the CPU
to sleep, which is what __sti_mwait() and native_safe_halt() do. The
CPU would have received the IPI then and broken out of idle before
Peter's optimization went in. There is an elaborate comment on this in
the do_idle() function above the call to local_irq_disable(). In commit
edc8fc01f608 ("x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer
reprogram") Peter describes a case of actually missing the break from
an interrupt as the driver enabled interrupts much earlier than
executing the sleep instruction.
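
For reference, this is roughly how the wake instruction keeps that window
closed (a sketch along the lines of __sti_mwait() in
arch/x86/include/asm/mwait.h; not the exact kernel code):

    /*
     * "sti" only takes effect after the instruction that follows it, so
     * interrupts are re-enabled and mwait is entered in one step: a
     * pending IPI can only be delivered once the CPU is already in
     * mwait, where it immediately breaks the CPU out of the sleep.
     */
    static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
    {
            asm volatile("sti; mwait" :: "a" (eax), "c" (ecx));
    }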

Since the CPU was in TIF_POLLING_NRFLAG state, one could simply get away
by setting TIF_NEED_RESCHED and not sending an actual IPI which the
need_resched() checks in the idle path would catch and the
flush_smp_call_function_queue() on 

Re: [PATCH v2 00/14] Introducing TIF_NOTIFY_IPI flag

2024-06-17 Thread K Prateek Nayak

Hello Chenyu,

On 6/14/2024 10:01 PM, Chen Yu wrote:

On 2024-06-14 at 12:48:37 +0200, Vincent Guittot wrote:

On Fri, 14 Jun 2024 at 11:28, Peter Zijlstra  wrote:


On Thu, Jun 13, 2024 at 06:15:59PM +, K Prateek Nayak wrote:

Effects of call_function_single_prep_ipi()
===========================================

To pull a TIF_POLLING thread out of idle to process an IPI, the sender
sets the TIF_NEED_RESCHED bit in the idle task's thread info in
call_function_single_prep_ipi() and avoids sending an actual IPI to the
target. As a result, the scheduler expects a task to be enqueued when
exiting the idle path. This is not the case with non-polling idle states
where the idle CPU exits the non-polling idle state to process the
interrupt, and since need_resched() returns false, soon goes back to
idle again.
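
For context, the sender side of this optimization boils down to the
following (a sketch paraphrasing send_call_function_single_ipi() in
kernel/smp.c and call_function_single_prep_ipi() in kernel/sched/core.c;
details elided):

    void send_call_function_single_ipi(int cpu)
    {
            /*
             * call_function_single_prep_ipi() attempts set_nr_if_polling():
             * if the target advertises TIF_POLLING_NRFLAG, the NEED_RESCHED
             * store to the monitored thread_info word wakes mwait() and the
             * IPI is elided; otherwise a real IPI must be sent.
             */
            if (call_function_single_prep_ipi(cpu))
                    arch_send_call_function_single_ipi(cpu);
    }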

When TIF_NEED_RESCHED flag is set, do_idle() will call schedule_idle(),
a large part of which runs with local IRQ disabled. In case of ipistorm,
when measuring IPI throughput, this large IRQ disabled section delays
processing of IPIs. Further auditing revealed that in absence of any
runnable tasks, pick_next_task_fair(), which is called from the
pick_next_task() fast path, will always call newidle_balance() in this
scenario, further increasing the time spent in the IRQ disabled section.

Following is the crude visualization of the problem with relevant
functions expanded:
--------------------------------------------------------------------
CPU0                                            CPU1
====                                            ====
                                                do_idle() {
                                                        __current_set_polling();
                                                        ...
                                                        monitor(addr);
                                                        if (!need_resched())
                                                                mwait() {
                                                                /* Waiting */
smp_call_function_single(CPU1, func, wait = 1) {                ...
        ...                                                     ...
        set_nr_if_polling(CPU1) {                               ...
                /* Realizes CPU1 is polling */                  ...
                try_cmpxchg(addr,                               ...
                            &val,                               ...
                            val | _TIF_NEED_RESCHED);           ...
        } /* Does not send an IPI */                            ...
        ...                                     } /* mwait exit due to write at addr */
        csd_lock_wait() {                       }
        /* Waiting */                           preempt_set_need_resched();
        ...                                     __current_clr_polling();
        ...                                     flush_smp_call_function_queue() {
        ...                                             func();
        } /* End of wait */                     }
}                                               schedule_idle() {
                                                        ...
                                                        local_irq_disable();
smp_call_function_single(CPU1, func, wait = 1) {        ...
        ...                                             ...
        arch_send_call_function_single_ipi(CPU1);       ...
                        \                               ...
                         \                              newidle_balance() {
                          \                                     ...
                           /* Delay */                          ...
                          \                             }
                           \                            ...
                            \-->                        local_irq_enable();
                                                        /* Processes the IPI */
--------------------------------------------------------------------


Skipping newidle_balance()
==========================

In an earlier attempt to solve the challenge of the long IRQ disabled
section, newidle_balance() was skipped when a CPU waking up from idle
was found to have no runnable tasks, and was transitioning back to
idle [2]. Tim [3] and David [4] had pointed out that newidle_balance()
may be viable for CPUs that are idling with tick enabled, where the
newidle_balance() has the opportunity to pull tasks onto the idle CPU.


I don't think we should be rely

Re: [PATCH v2 00/14] Introducing TIF_NOTIFY_IPI flag

2024-06-17 Thread K Prateek Nayak

Hello Vincent, Peter,

On 6/16/2024 8:27 PM, Vincent Guittot wrote:

On Sat, 15 Jun 2024 at 03:28, Peter Zijlstra  wrote:


On Fri, Jun 14, 2024 at 12:48:37PM +0200, Vincent Guittot wrote:

On Fri, 14 Jun 2024 at 11:28, Peter Zijlstra  wrote:



Vincent [5] pointed out a case where the idle load kick will fail to
run on an idle CPU since the IPI handler launching the ILB will check
for need_resched(). In such cases, the idle CPU relies on
newidle_balance() to pull tasks towards itself.


Is this the need_resched() in _nohz_idle_balance() ? Should we change
this to 'need_resched() && (rq->nr_running || rq->ttwu_pending)' or
something along those lines?


It's not only this but also in do_idle() as well which exits the loop
to look for tasks to schedule


Is that really a problem? Reading the initial email the problem seems to
be newidle balance, not hitting schedule. Schedule should be fairly
quick if there's nothing to do, no?


There are 2 problems:
- Because of NEED_RESCHED being set, we go through the full schedule
path for no reason and we finally do a sched_balance_newidle()


Peter's patch up in the thread seems to improve the above case by
speeding up the schedule() loop similar to the very first solution
I tried with
https://lore.kernel.org/lkml/20240119084548.2788-1-kprateek.na...@amd.com/

I do see same level of improvements (if not better) with Peter's
SM_IDLE solution:

  ==========================================
  Test          : ipistorm (modified)
  Units         : Normalized runtime
  Interpretation: Lower is better
  Statistic     : AMean
  ==========================================
  kernel:                               time [pct imp]
  tip:sched/core                        1.00 [baseline]
  tip:sched/core + revert               0.40 [60.26%]
  tip:sched/core + TIF_NOTIFY_IPI       0.46 [54.88%]
  tip:sched/core + SM_IDLE              0.38 [72.64%]


- Because of need_resched being set to wake up the cpu, we will not
kick the softirq to run the nohz idle load balance and get a chance to
pull a task on an idle CPU


However, this issue with need_resched() still remains. Any
need_resched() check within an interrupt context will return true if the
target CPU is perceived to be in a polling idle state by the sender as a
result of the optimization in commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()").

If TIF_POLLING_NRFLAG is defined by an arch, do_idle() will set the
flag until the path hits call_cpuidle() where the flag is cleared just
before handing off the state entry to the cpuidle driver. An incoming
interrupt in this window will allow the idle path to bail early and
return before calling the driver specific routine since it'll be
indicated by TIF_NEED_RESCHED being set in the idle task's thread info.
Beyond that point, the cpuidle driver handles the idle entry.
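
The gate in question is roughly the following (a simplified sketch of
call_cpuidle() from kernel/sched/idle.c):

    static int call_cpuidle(struct cpuidle_driver *drv,
                            struct cpuidle_device *dev, int next_state)
    {
            /*
             * Clear TIF_POLLING_NRFLAG and re-check NEED_RESCHED: a sender
             * that raced with us and only wrote the flag (no IPI) must be
             * caught here, because past this point only a real interrupt
             * can wake the CPU.
             */
            if (current_clr_polling_and_test()) {
                    dev->last_residency_ns = 0;
                    local_irq_enable();
                    return -EBUSY;
            }

            return cpuidle_enter(drv, dev, next_state);
    }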

I think an arch may define TIF_POLLING_NRFLAG just to utilize this
optimization in the generic idle path to answer Vincent's observation
on ARM32 having TIF_POLLING_NRFLAG.






I mean, it's fairly trivial to figure out if there really is going to be
work there.


Using an alternate flag instead of NEED_RESCHED to indicate a pending
IPI was suggested as the correct approach to solve this problem on the
same thread.


So adding per-arch changes for this seems like something we shouldn't
unless there really is no other sane options.

That is, I really think we should start with something like the below
and then fix any fallout from that.


The main problem is that need_resched becomes somewhat meaningless
because it no longer only means "I need to resched a task", and we have
to add more tests around it, even for those not using polling


True, however we already had some of that by having the wakeup list,
that made nr_running less 'reliable'.

The thing is, most architectures seem to have the TIF_POLLING_NRFLAG
bit, even if their main idle routine isn't actually using it, much of


Yes, I'm surprised that Arm arch has the TIF_POLLING_NRFLAG whereas it
has never been supported by the arch


the idle loop until it hits the arch idle will be having it set and will
thus tickle these cases *sometimes*.
[..snip..]


--
Thanks and Regards,
Prateek


Re: [PATCH v2 00/14] Introducing TIF_NOTIFY_IPI flag

2024-06-17 Thread K Prateek Nayak

Hello Russell,

On 6/15/2024 7:56 PM, Russell King (Oracle) wrote:

On Thu, Jun 13, 2024 at 06:15:59PM +, K Prateek Nayak wrote:

o Dropping the ARM results since I never got my hands on the ARM64
   system I used in my last testing. If I do manage to get my hands on it
   again, I'll rerun the experiments and share the results on the thread.
   To test the case where TIF_NOTIFY_IPI is not enabled for a particular
   architecture, I applied the series only until Patch 3 and tested the
   same on my x86 machine with a WARN_ON_ONCE() in do_idle() to check if
   tif_notify_ipi() ever returns true, and then repeated the same with
   Patch 4 applied.


Confused. ARM (32-bit) or ARM64? You patch 32-bit ARM, but you don't
touch 64-bit Arm. "ARM" on its own in the context above to me suggests
32-bit, since you refer to ARM64 later.



In my first RFC posting, I had shared the results for ipistorm on an
ARM64 server [1]. Vincent and Linus Walleij brought to my attention that
ARM32 and ARM64 do not share the thread info flags and I probably saw a
one-off behavior during my testing. Since then, it has been slightly
challenging to get my hands on that machine again in a stable condition
to see if there was any scenario that I might have missed but I tried a
bunch of things on my x86 machine to confirm that an arch that does not
define the TIF_NOTIFY_IPI would not hit these changes.

Rest assured, Patch 5 is for ARM32 machines that currently define
TIF_POLLING_NRFLAG.

[1] https://lore.kernel.org/lkml/20240220171457.703-6-kprateek.na...@amd.com/

--
Thanks and Regards,
Prateek


[PATCH v2 03/14] sched/core: Use TIF_NOTIFY_IPI to notify an idle CPU in TIF_POLLING mode of pending IPI

2024-06-13 Thread K Prateek Nayak
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Gautham R. Shenoy 
Co-developed-by: K Prateek Nayak 
Signed-off-by: K Prateek Nayak 
---
v1..v2:
o Updated benchmark numbers.
---
 include/linux/sched/idle.h |  8 
 kernel/sched/core.c| 41 ++
 kernel/sched/idle.c| 16 +++
 3 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 497518b84e8d..4757a6ab5c2c 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -58,8 +58,8 @@ static __always_inline bool __must_check current_set_polling_and_test(void)
__current_set_polling();
 
/*
-* Polling state must be visible before we test NEED_RESCHED,
-* paired by resched_curr()
+* Polling state must be visible before we test NEED_RESCHED or
+* NOTIFY_IPI paired by resched_curr() or notify_ipi_if_polling()
 */
smp_mb__after_atomic();
 
@@ -71,8 +71,8 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
__current_clr_polling();
 
/*
-* Polling state must be visible before we test NEED_RESCHED,
-* paired by resched_curr()
+* Polling state must be visible before we test NEED_RESCHED or
+* NOTIFY_IPI paired by resched_curr() or notify_ipi_if_polling()
 */
smp_mb__after_atomic();
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0935f9d4bb7b..bb01b063320b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -911,12 +911,30 @@ static inline bool set_nr_and_not_polling(struct task_struct *p)
 }
 
 /*
- * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
+ * Certain architectures that support TIF_POLLING_NRFLAG may not support
+ * TIF_NOTIFY_IPI to notify an idle CPU in TIF_POLLING mode of a pending
+ * IPI. On such architectures, set TIF_NEED_RESCHED instead to wake the
+ * idle CPU and process the pending IPI.
+ */
+#ifdef _TIF_NOTIFY_IPI
+#define _TIF_WAKE_FLAG _TIF_NOTIFY_IPI
+#else
+#define _TIF_WAKE_FLAG _TIF_NEED_RESCHED
+#endif
+
+/*
+ * Atomically set TIF_WAKE_FLAG when TIF_POLLING_NRFLAG is set.
+ *
+ * On architectures that define TIF_NOTIFY_IPI, the same is set in the
+ * idle task's thread_info to pull the CPU out of idle and process
+ * the pending interrupt. On architectures that don't support
+ * TIF_NOTIFY_IPI, TIF_NEED_RESCHED is set instead to notify the
+ * pending IPI.
  *
- * If this returns true, then the idle task promises to call
- * sched_ttwu_pending() and reschedule soon.
+ * If this returns true, then the idle task promises to process the
+ * call function soon.
  */
-static bool set_nr_if_polling(struct task_struct *p)
+static bool notify_ipi_if_polling(struct task_struct *p)
 {
struct thread_info *ti = task_thread_info(p);
typeof(ti->flags) val = READ_ONCE(ti->flags);
@@ -924,9 +942,16 @@ static bool set_nr_if_polling(struct task_struct *p)
do {
if (!(val & _TIF_POLLING_NRFLAG))
return false;
-   if (val & _TIF_NEED_RESCHED)
+   /*
+* If TIF_NEED_RESCHED flag is set in addition to
+* TIF_POLLING_NRFLAG, the CPU will soon fall out of
+* idle. Since flush_smp_call_function_queue() is called
+* soon after the idle exit, setting TIF_WAKE_FLAG is
+* not necessary.
+*/
+   if (val & (_TIF_NEED_RESCHED | _TIF_WAKE_FLAG))
return true;
-	} while (!try_cmpxchg(&ti->flags, &val, val | _TIF_NEED_RESCHED));
+	} while (!try_cmpxchg(&ti->flags, &val, val | _TIF_WAKE_FLAG));
 
return true;
 }
@@ -939,7 +964,7 @@ static inline bool set_nr_and_not_polling(struct task_struct *p)
 }
 
 #ifdef CONFIG_SMP
-static inline bool set_nr_if_polling(struct task_struct *p)
+static inline bool notify_ipi_if_polling(struct task_struct *p)
 {
return false;
 }
@@ -3710,7 +3735,7 @@ void sched_ttwu_pending(void *arg)
  */
 bool call_function_single_prep_ipi(int cpu)
 {
-   if (set_nr_if_polling(cpu_rq(cpu)->idle)) {
+   if (notify_ipi_if_polling(cpu_rq(cpu)->idle)) {
trace_sched_wake_idle_without_ipi(cpu);
return false;
}
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 7de94df5d477..6748735156a7 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -329,13 +329,13 @@ static void do_idle(void)
}
 
/*
-* Since we fell out of the loop above, we know TIF_NEED_RESCHED must
-* be set, propagate it into PREEMPT_NEED_RESCHED.
+* Since we fell out of the loop above, TIF_NEED_RESCHED may be set.
+* Propagate it into PREEMPT_NEED_RESCHED.

[PATCH v2 02/14] sched: Define a need_resched_or_ipi() helper and use it treewide

2024-06-13 Thread K Prateek Nayak
From: "Gautham R. Shenoy" 

Currently TIF_NEED_RESCHED is being overloaded, to wakeup an idle CPU in
TIF_POLLING mode to service an IPI even if there are no new tasks being
woken up on the said CPU.

In preparation of a proper fix, introduce a new helper
"need_resched_or_ipi()" which is intended to return true if either
the TIF_NEED_RESCHED flag or if TIF_NOTIFY_IPI flag is set. Use this
helper function in place of need_resched() in idle loops where
TIF_POLLING_NRFLAG is set.

To preserve bisectability and avoid unbreakable idle loops, all the
need_resched() checks within TIF_POLLING_NRFLAG sections have been
replaced tree-wide with the need_resched_or_ipi() check.

[ prateek: Replaced some of the missed out occurrences of
  need_resched() within a TIF_POLLING sections with
  need_resched_or_ipi() ]
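
The helper itself is trivial; a sketch of its assumed shape, mirroring the
tif_notify_ipi() helper added in the previous patch:

    static __always_inline bool need_resched_or_ipi(void)
    {
            return unlikely(tif_need_resched() || tif_notify_ipi());
    }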

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Russell King 
Cc: Guo Ren 
Cc: Michal Simek 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Naveen N. Rao" 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: "David S. Miller" 
Cc: Andreas Larsson 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: Andrew Donnellan 
Cc: Benjamin Gray 
Cc: Frederic Weisbecker 
Cc: Xin Li 
Cc: Kees Cook 
Cc: Rick Edgecombe 
Cc: Tony Battersby 
Cc: Bjorn Helgaas 
Cc: Brian Gerst 
Cc: Leonardo Bras 
Cc: Imran Khan 
Cc: "Paul E. McKenney" 
Cc: Rik van Riel 
Cc: Tim Chen 
Cc: David Vernet 
Cc: Julia Lawall 
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Gautham R. Shenoy 
Co-developed-by: K Prateek Nayak 
Signed-off-by: K Prateek Nayak 
---
v1..v2:
o Fixed a conflict with commit edc8fc01f608 ("x86: Fix
  CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram") that touched
  mwait_idle_with_hints() in arch/x86/include/asm/mwait.h
---
 arch/x86/include/asm/mwait.h  | 2 +-
 arch/x86/kernel/process.c | 2 +-
 drivers/cpuidle/cpuidle-powernv.c | 2 +-
 drivers/cpuidle/cpuidle-pseries.c | 2 +-
 drivers/cpuidle/poll_state.c  | 2 +-
 include/linux/sched.h | 5 +
 include/linux/sched/idle.h| 4 ++--
 kernel/sched/idle.c   | 7 ---
 8 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 920426d691ce..3fa6f0bbd74f 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -125,7 +125,7 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 
 	__monitor((void *)&current_thread_info()->flags, 0, 0);
 
-   if (!need_resched()) {
+   if (!need_resched_or_ipi()) {
if (ecx & 1) {
__mwait(eax, ecx);
} else {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8441147eb5e..dd73cd6f735c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -901,7 +901,7 @@ static __cpuidle void mwait_idle(void)
}
 
 	__monitor((void *)&current_thread_info()->flags, 0, 0);
-   if (!need_resched()) {
+   if (!need_resched_or_ipi()) {
__sti_mwait(0, 0);
raw_local_irq_disable();
}
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 9ebedd972df0..77c3bb371f56 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -79,7 +79,7 @@ static int snooze_loop(struct cpuidle_device *dev,
dev->poll_time_limit = false;
ppc64_runlatch_off();
HMT_very_low();
-   while (!need_resched()) {
+   while (!need_resched_or_ipi()) {
if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
/*
 * Task has not woken up but we are exiting the polling
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 14db9b7d985d..4f2b490f8b73 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -46,7 +46,7 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,

[PATCH v2 01/14] thread_info: Add helpers to test and clear TIF_NOTIFY_IPI

2024-06-13 Thread K Prateek Nayak
From: "Gautham R. Shenoy" 

Introduce the notion of the TIF_NOTIFY_IPI flag. When a processor in
TIF_POLLING mode needs to process an IPI, the sender sets the NEED_RESCHED
bit in the idle task's thread_info to pull the target out of idle and
avoids sending an interrupt to the idle CPU. When NEED_RESCHED is set, the
scheduler assumes that a new task has been queued on the idle CPU and
calls schedule_idle(); however, an IPI on an idle CPU will not necessarily
end up waking a task on the said CPU. To avoid spurious calls to
schedule_idle() that assume an IPI on an idle CPU will always wake a task
on the said CPU, TIF_NOTIFY_IPI will be used to pull a TIF_POLLING CPU out
of idle.

Since the IPI handlers are processed before the call to schedule_idle(),
schedule_idle() will be called only if one of the handlers has woken up
a new task on the CPU and has set NEED_RESCHED.
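
The ordering relied upon above is that of the idle-exit path (a simplified
sketch of the tail of do_idle() in kernel/sched/idle.c):

    __current_clr_polling();
    ...
    flush_smp_call_function_queue();    /* IPI handlers run here; a real
                                         * wakeup sets NEED_RESCHED */
    schedule_idle();                    /* only finds work if a handler
                                         * above queued a task */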

Add tif_notify_ipi() and current_clr_notify_ipi() helpers to test if
TIF_NOTIFY_IPI is set in the current task's thread_info, and to clear it
respectively. These interfaces will be used in subsequent patches as
TIF_NOTIFY_IPI notion is integrated in the scheduler and in the idle
path.

[ prateek: Split the changes into a separate patch, add commit log ]

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Russell King 
Cc: Guo Ren 
Cc: Michal Simek 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Naveen N. Rao" 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: "David S. Miller" 
Cc: Andreas Larsson 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: Andrew Donnellan 
Cc: Benjamin Gray 
Cc: Frederic Weisbecker 
Cc: Xin Li 
Cc: Kees Cook 
Cc: Rick Edgecombe 
Cc: Tony Battersby 
Cc: Bjorn Helgaas 
Cc: Brian Gerst 
Cc: Leonardo Bras 
Cc: Imran Khan 
Cc: "Paul E. McKenney" 
Cc: Rik van Riel 
Cc: Tim Chen 
Cc: David Vernet 
Cc: Julia Lawall 
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Gautham R. Shenoy 
Co-developed-by: K Prateek Nayak 
Signed-off-by: K Prateek Nayak 
---
v1..v2:
o No changes.
---
 include/linux/thread_info.h | 43 +
 1 file changed, 43 insertions(+)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 9ea0b28068f4..1e10dd8c0227 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -195,6 +195,49 @@ static __always_inline bool tif_need_resched(void)
 
 #endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */
 
+#ifdef TIF_NOTIFY_IPI
+
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
+
+static __always_inline bool tif_notify_ipi(void)
+{
+   return arch_test_bit(TIF_NOTIFY_IPI,
+			     (unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void current_clr_notify_ipi(void)
+{
+   arch_clear_bit(TIF_NOTIFY_IPI,
+		       (unsigned long *)(&current_thread_info()->flags));
+}
+
+#else
+
+static __always_inline bool tif_notify_ipi(void)
+{
+   return test_bit(TIF_NOTIFY_IPI,
+			(unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void current_clr_notify_ipi(void)
+{
+   clear_bit(TIF_NOTIFY_IPI,
+		  (unsigned long *)(&current_thread_info()->flags));
+}
+
+#endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */
+
+#else /* !TIF_NOTIFY_IPI */
+
+static __always_inline bool tif_notify_ipi(void)
+{
+   return false;
+}
+
+static __always_inline void current_clr_notify_ipi(void) { }
+
+#endif /* TIF_NOTIFY_IPI */
+
 #ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
 static inline int arch_within_stack_frames(const void * const stack,
   const void * const stackend,
-- 
2.34.1



[PATCH v2 00/14] Introducing TIF_NOTIFY_IPI flag

2024-06-13 Thread K Prateek Nayak
...
        set_nr_if_polling(CPU1) {                               ...
                /* Realizes CPU1 is polling */                  ...
                try_cmpxchg(addr,                               ...
                            &val,                               ...
                            val | _TIF_NOTIFY_IPI);             ...
        } /* Does not send an IPI */                            ...
        ...                                     } /* mwait exit due to write at addr */
        csd_lock_wait() {                       ...
        /* Waiting */                           preempt_fold_need_resched(); /* fold if NEED_RESCHED */
        ...                                     __current_clr_polling();
        ...                                     flush_smp_call_function_queue() {
        ...                                             func(); /* Will set NEED_RESCHED if sched_ttwu_pending() */
        } /* End of wait */                     }
}                                               if (need_resched()) {
                                                        schedule_idle();
smp_call_function_single(CPU1, func, wait = 1) {        }
        ...                                             ... /* IRQs remain enabled */
        arch_send_call_function_single_ipi(CPU1); --->  /* Processes the IPI */
--------------------------------------------------------------------

Results
=======

With the TIF_NOTIFY_IPI, the time taken to complete a fixed set of IPIs
using ipistorm improves drastically and is close to the numbers seen with
the revert. Following are the numbers from the same dual socket 3rd
Generation EPYC system (2 x 64C/128T) (boost on, C2 disabled) running
ipistorm between CPU8 and CPU16:

cmdline: insmod ipistorm.ko numipi=10 single=1 offset=8 cpulist=8 wait=1

  ==========================================
  Test          : ipistorm (modified)
  Units         : Normalized runtime
  Interpretation: Lower is better
  Statistic     : AMean
  ==========================================
  kernel:                               time [pct imp]
  tip:sched/core                        1.00 [baseline]
  tip:sched/core + revert               0.40 [60.26]
  tip:sched/core + TIF_NOTIFY_IPI       0.46 [54.88]

netperf and tbench results with the patch match the results on tip on
the dual socket 3rd Generation AMD system (2 x 64C/128T). Additionally,
hackbench, stream, and schbench too were tested, with results from the
patched kernel matching that of the tip.


Additional benefits
===================

In nohz_csd_func(), the need_resched() check returns true when an idle
CPU in TIF_POLLING mode is woken up to do an idle load balance which
leads to the idle load balance bailing out early today since
send_call_function_single_ipi() ends up setting the TIF_NEED_RESCHED
flag to put the CPU out of idle and the flag is not cleared until
__schedule() is called much later in the call path.

With TIF_NOTIFY_IPI, this is no longer the case since TIF_NEED_RESCHED
is only set if there is a genuine need to call schedule() and not used
in an overloaded manner to notify a pending IPI.
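
Concretely, the early bail-out happens in the ILB kick handler (a sketch
along the lines of nohz_csd_func() in kernel/sched/fair.c; simplified):

    static void nohz_csd_func(void *info)
    {
            struct rq *rq = info;
            int cpu = cpu_of(rq);
            unsigned int flags;

            /* ... flags are fetched from nohz_flags(cpu) ... */

            /*
             * With the NEED_RESCHED overload, this need_resched() is true
             * whenever the CPU was woken via set_nr_if_polling(), so the
             * kick is abandoned even though no task is queued; with
             * TIF_NOTIFY_IPI it only triggers on a genuine wakeup.
             */
            rq->idle_balance = idle_cpu(cpu);
            if (rq->idle_balance && !need_resched()) {
                    rq->nohz_idle_balance = flags;
                    raise_softirq_irqoff(SCHED_SOFTIRQ);
            }
    }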

Links
=====

[1] https://github.com/antonblanchard/ipistorm
[2] https://lore.kernel.org/lkml/20240119084548.2788-1-kprateek.na...@amd.com/
[3] 
https://lore.kernel.org/lkml/b4f5ac150685456cf45a342e3bb1f28cdd557a53.ca...@linux.intel.com/
[4] https://lore.kernel.org/lkml/20240123211756.GA221793@maniforge/
[5] 
https://lore.kernel.org/lkml/cakftptc446lo9catpp7pexdklhhqfobuy-jmgc7agohy4hs...@mail.gmail.com/

This series is based on tip:sched/core at commit c793a62823d1
("sched/core: Drop spinlocks on contention iff kernel is preemptible")
--
Gautham R. Shenoy (4):
  thread_info: Add helpers to test and clear TIF_NOTIFY_IPI
  sched: Define a need_resched_or_ipi() helper and use it treewide
  sched/core: Use TIF_NOTIFY_IPI to notify an idle CPU in TIF_POLLING
    mode of pending IPI
  x86/thread_info: Introduce TIF_NOTIFY_IPI flag

K Prateek Nayak (10):
  arm/thread_info: Introduce TIF_NOTIFY_IPI flag
  alpha/thread_info: Introduce TIF_NOTIFY_IPI flag
  openrisc/thread_info: Introduce TIF_NOTIFY_IPI flag
  powerpc/thread_info: Introduce TIF_NOTIFY_IPI flag
  sh/thread_info: Introduce TIF_NOTIFY_IPI flag
  sparc/thread_info: Introduce TIF_NOTIFY_IPI flag
  csky/thread_info: Introduce TIF_NOTIFY_IPI flag
  parisc/thread_info: Introduce TIF_NOTIFY_IPI flag
  nios2/thread_info: Introduce TIF_NOTIFY_IPI flag
  microblaze/thread_info: Introduce TIF_NOTIFY_IPI flag
--
Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Russell King 
Cc: Guo Ren 
Cc: Michal Simek 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: S

[PATCH v2 08/14] powerpc/thread_info: Introduce TIF_NOTIFY_IPI flag

2024-06-13 Thread K Prateek Nayak
Add support for TIF_NOTIFY_IPI on PowerPC. With TIF_NOTIFY_IPI, a sender
sending an IPI to an idle CPU in TIF_POLLING mode will set the
TIF_NOTIFY_IPI flag in the target's idle task's thread_info to pull the
CPU out of idle, as opposed to setting TIF_NEED_RESCHED previously. This
avoids spurious calls to schedule_idle() in cases where an IPI does not
necessarily wake up a task on the idle CPU.

Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Naveen N. Rao" 
Cc: Benjamin Gray 
Cc: Andrew Donnellan 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: K Prateek Nayak 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux...@vger.kernel.org
Signed-off-by: K Prateek Nayak 
---
v1..v2:
o No changes.
---
 arch/powerpc/include/asm/thread_info.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 15c5691dd218..9545e164463b 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -103,6 +103,7 @@ void arch_setup_new_exec(void);
 #define TIF_PATCH_PENDING  6   /* pending live patching update */
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SINGLESTEP 8   /* singlestepping active */
+#define TIF_NOTIFY_IPI	9	/* Pending IPI on TIF_POLLING idle CPU */
 #define TIF_SECCOMP10  /* secure computing */
 #define TIF_RESTOREALL 11  /* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR12  /* Force successful syscall return */
@@ -129,6 +130,7 @@ void arch_setup_new_exec(void);
 #define _TIF_PATCH_PENDING	(1<<TIF_PATCH_PENDING)
+#define _TIF_NOTIFY_IPI		(1<<TIF_NOTIFY_IPI)

Re: [RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag

2024-03-06 Thread K Prateek Nayak
Hello Vincent,

Thank you for taking a look at the series.

On 3/6/2024 3:29 PM, Vincent Guittot wrote:
> Hi Prateek,
> 
> Adding Julia who could be interested in this patchset. Your patchset
> should trigger idle load balance instead of newly idle load balance
> now when the polling is used. This was one reason for not migrating
> task in idle CPU

Thank you.

> 
> On Tue, 20 Feb 2024 at 18:15, K Prateek Nayak  wrote:
>>
>> Hello everyone,
>>
>> [..snip..]
>>
>>
>> Skipping newidle_balance()
>> ==========================
>>
>> In an earlier attempt to solve the challenge of the long IRQ disabled
>> section, newidle_balance() was skipped when a CPU waking up from idle
>> was found to have no runnable tasks, and was transitioning back to
>> idle [2]. Tim [3] and David [4] had pointed out that newidle_balance()
>> may be viable for CPUs that are idling with tick enabled, where the
>> newidle_balance() has the opportunity to pull tasks onto the idle CPU.
>>
>> Vincent [5] pointed out a case where the idle load kick will fail to
>> run on an idle CPU since the IPI handler launching the ILB will check
>> for need_resched(). In such cases, the idle CPU relies on
>> newidle_balance() to pull tasks towards itself.
> 
> Calling newidle_balance() instead of the normal idle load balance
> prevents the CPU from pulling tasks from other groups

Thank you for the correction.

> 
>>
>> Using an alternate flag instead of NEED_RESCHED to indicate a pending
>> IPI was suggested as the correct approach to solve this problem on the
>> same thread.
>>
>>
>> Proposed solution: TIF_NOTIFY_IPI
>> =================================
>>
>> Instead of reusing TIF_NEED_RESCHED bit to pull an TIF_POLLING CPU out
>> of idle, TIF_NOTIFY_IPI is a newly introduced flag that
>> call_function_single_prep_ipi() sets on a target TIF_POLLING CPU to
>> indicate a pending IPI, which the idle CPU promises to process soon.
>>
>> On architectures that do not support the TIF_NOTIFY_IPI flag (this
>> series only adds support for x86 and ARM processors for now),
> 
> I'm surprised that you are mentioning ARM processors because they
> don't use TIF_POLLING.

Yup I just realised that after Linus Walleij pointed it out on the
thread.

> 
>> call_function_single_prep_ipi() will fallback to setting
>> TIF_NEED_RESCHED bit to pull the TIF_POLLING CPU out of idle.
>>
>> Since the pending IPI handlers are processed before the call to
>> schedule_idle() in do_idle(), schedule_idle() will only be called if the
>> IPI handler have woken / migrated a new task on the idle CPU and has set
>> TIF_NEED_RESCHED bit to indicate the same. This avoids running into the
>> long IRQ disabled section in schedule_idle() unnecessarily, and any
>> need_resched() check within a call function will accurately notify if a
>> task is waiting for CPU time on the CPU handling the IPI.
>>
>> Following is the crude visualization of how the situation changes with
>> the newly introduced TIF_NOTIFY_IPI flag:
>> --------------------------------------------------------------------
>> CPU0                                            CPU1
>> ====                                            ====
>>                                                 do_idle() {
>>                                                         __current_set_polling();
>>                                                         ...
>>                                                         monitor(addr);
>>                                                         if (!need_resched_or_ipi())
>>                                                                 mwait() {
>>                                                                 /* Waiting */
>> smp_call_function_single(CPU1, func, wait = 1) {        ...
>>         ...                                             ...
>>         set_nr_if_polling(CPU1) {                       ...
>>                 /* Realizes CPU1 is polling */          ...
>>                 try_cmpxchg(addr,                       ...
>>                             &val,                       ...
>>                             val | _TIF_NOTIFY_IPI);     ...
>>         } /* Does not send an IPI */                    ...

Re: [RFC] sched/eevdf: sched feature to dismiss lag on wakeup

2024-02-28 Thread K Prateek Nayak
(+ Xuewen Yan, Ke Wang)

Hello Tobias,

On 2/28/2024 9:40 PM, Tobias Huschle wrote:
> The previously used CFS scheduler gave tasks that were woken up an
> enhanced chance to see runtime immediately by deducting a certain value
> from its vruntime on runqueue placement during wakeup.
> 
> This property was used by some, at least vhost, to ensure that certain
> kworkers are scheduled immediately after being woken up. The EEVDF
> scheduler does not support this so far. Instead, if such a woken up
> entity carries a negative lag from its previous execution, it will have
> to wait for the current time slice to finish, which affects the
> performance of the process expecting the immediate execution negatively.
> 
> To address this issue, implement EEVDF strategy #2 for rejoining
> entities, which dismisses the lag from previous execution and allows
> the woken up task to run immediately (if no other entities are deemed
> to be preferred for scheduling by EEVDF).
> 
> The vruntime is decremented by an additional value of 1 to make sure
> that the woken up task gets to actually run. This is of course not
> following strategy #2 in an exact manner but guarantees the expected
> behavior for the scenario described above. Without the additional
> decrement, the performance goes south even more. So there are some
> side effects I could not get my head around yet.
> 
> Questions:
> 1. The kworker getting its negative lag occurs in the following scenario
>- kworker and a cgroup are supposed to execute on the same CPU
>- one task within the cgroup is executing and wakes up the kworker
>- kworker with 0 lag, gets picked immediately and finishes its
>  execution within ~5000ns
>- on dequeue, kworker gets assigned a negative lag
>Is this expected behavior? With this short execution time, I would
>expect the kworker to be fine.
>For a more detailed discussion on this symptom, please see:
>https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/

Does the lag clamping path from Xuewen Yan [1] work for the vhost case
mentioned in the thread? Instead of placing the task just behind the
0-lag point, clamping the lag seems to be a more principled approach since
EEVDF already does it in update_entity_lag().

If the lag is still too large, maybe the above coupled with Peter's
delayed dequeue patch can help [2] (Note: tree is prone to force
updates)

[1] https://lore.kernel.org/lkml/20240130080643.1828-1-xuewen@unisoc.com/
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf=e62ef63a888c97188a977daddb72b61548da8417
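
For reference, the dequeue-side clamp that already exists looks roughly
like this (paraphrasing update_entity_lag() in kernel/sched/fair.c; the
wakeup-side clamp in [1] applies the same kind of bound at placement):

    static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
    {
            s64 vlag, limit;

            vlag = avg_vruntime(cfs_rq) - se->vruntime;
            limit = calc_delta_fair(max_t(u64, 2 * se->slice, TICK_NSEC), se);

            /* Bound how much lag an entity can carry out of dequeue */
            se->vlag = clamp(vlag, -limit, limit);
    }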

> 2. The proposed code change of course only addresses the symptom. Am I
>assuming correctly that this is in general the exepected behavior and
>that the task waking up the kworker should rather do an explicit
>reschedule of itself to grant the kworker time to execute?
>In the vhost case, this is currently attempted through a cond_resched
>which is not doing anything because the need_resched flag is not set.
> 
> Feedback and opinions would be highly appreciated.
> 
> Signed-off-by: Tobias Huschle 
> ---
>  kernel/sched/fair.c | 5 +
>  kernel/sched/features.h | 1 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 533547e3c90a..c20ae6d62961 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5239,6 +5239,11 @@ place_entity(struct cfs_rq *cfs_rq, struct 
> sched_entity *se, int flags)
>   lag = div_s64(lag, load);
>   }
>  
> + if (sched_feat(NOLAG_WAKEUP) && (flags & ENQUEUE_WAKEUP)) {
> + se->vlag = 0;
> + lag = 1;
> + }
> +
>   se->vruntime = vruntime - lag;
>  
>   /*
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 143f55df890b..d3118e7568b4 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -7,6 +7,7 @@
>  SCHED_FEAT(PLACE_LAG, true)
>  SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
>  SCHED_FEAT(RUN_TO_PARITY, true)
> +SCHED_FEAT(NOLAG_WAKEUP, true)
>  
>  /*
>   * Prefer to schedule the task we woke last (assuming it failed

--
Thanks and Regards,
Prateek


[RFC PATCH 08/14] powerpc/thread_info: Introduce TIF_NOTIFY_IPI flag

2024-02-20 Thread K Prateek Nayak
Add support for TIF_NOTIFY_IPI on PowerPC. With TIF_NOTIFY_IPI, a sender
sending an IPI to an idle CPU in TIF_POLLING mode will set the
TIF_NOTIFY_IPI flag in the target's idle task's thread_info to pull the
CPU out of idle, as opposed to setting TIF_NEED_RESCHED previously. This
avoids spurious calls to schedule_idle() in cases where an IPI does not
necessarily wake up a task on the idle CPU.

Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: Andrew Donnellan 
Cc: K Prateek Nayak 
Cc: Nicholas Miehlbradt 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux...@vger.kernel.org
Signed-off-by: K Prateek Nayak 
---
 arch/powerpc/include/asm/thread_info.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index bf5dde1a4114..b48db55192e0 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -103,6 +103,7 @@ void arch_setup_new_exec(void);
 #define TIF_PATCH_PENDING  6   /* pending live patching update */
 #define TIF_SYSCALL_AUDIT  7   /* syscall auditing active */
 #define TIF_SINGLESTEP 8   /* singlestepping active */
+#define TIF_NOTIFY_IPI	9	/* Pending IPI on TIF_POLLING idle CPU */
 #define TIF_SECCOMP10  /* secure computing */
 #define TIF_RESTOREALL 11  /* Restore all regs (implies NOERROR) */
 #define TIF_NOERROR12  /* Force successful syscall return */
@@ -129,6 +130,7 @@ void arch_setup_new_exec(void);
 #define _TIF_PATCH_PENDING	(1<<TIF_PATCH_PENDING)
+#define _TIF_NOTIFY_IPI		(1<<TIF_NOTIFY_IPI)

[RFC PATCH 03/14] sched/core: Use TIF_NOTIFY_IPI to notify an idle CPU in TIF_POLLING mode of pending IPI

2024-02-20 Thread K Prateek Nayak
c: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: Al Viro 
Cc: Linus Walleij 
Cc: Ard Biesheuvel 
Cc: Andrew Donnellan 
Cc: Nicholas Miehlbradt 
Cc: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Josh Poimboeuf 
Cc: "Kirill A. Shutemov" 
Cc: Rick Edgecombe 
Cc: Tony Battersby 
Cc: Brian Gerst 
Cc: Tim Chen 
Cc: David Vernet 
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
Signed-off-by: Gautham R. Shenoy 
Co-developed-by: K Prateek Nayak 
Signed-off-by: K Prateek Nayak 
---
 include/linux/sched/idle.h |  8 
 kernel/sched/core.c| 41 ++
 kernel/sched/idle.c| 16 +++
 3 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index d739ab810e00..c22312087c30 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -58,8 +58,8 @@ static __always_inline bool __must_check current_set_polling_and_test(void)
__current_set_polling();
 
/*
-* Polling state must be visible before we test NEED_RESCHED,
-* paired by resched_curr()
+* Polling state must be visible before we test NEED_RESCHED or
+* NOTIFY_IPI paired by resched_curr() or notify_ipi_if_polling()
 */
smp_mb__after_atomic();
 
@@ -71,8 +71,8 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
__current_clr_polling();
 
/*
-* Polling state must be visible before we test NEED_RESCHED,
-* paired by resched_curr()
+* Polling state must be visible before we test NEED_RESCHED or
+* NOTIFY_IPI paired by resched_curr() or notify_ipi_if_polling()
 */
smp_mb__after_atomic();
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index db4be4921e7f..6fb6e5b75724 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -909,12 +909,30 @@ static inline bool set_nr_and_not_polling(struct task_struct *p)
 }
 
 /*
- * Atomically set TIF_NEED_RESCHED if TIF_POLLING_NRFLAG is set.
+ * Certain architectures that support TIF_POLLING_NRFLAG may not support
+ * TIF_NOTIFY_IPI to notify an idle CPU in TIF_POLLING mode of a pending
+ * IPI. On such architectures, set TIF_NEED_RESCHED instead to wake the
+ * idle CPU and process the pending IPI.
+ */
+#ifdef _TIF_NOTIFY_IPI
+#define _TIF_WAKE_FLAG _TIF_NOTIFY_IPI
+#else
+#define _TIF_WAKE_FLAG _TIF_NEED_RESCHED
+#endif
+
+/*
+ * Atomically set TIF_WAKE_FLAG when TIF_POLLING_NRFLAG is set.
+ *
+ * On architectures that define TIF_NOTIFY_IPI, the same is set in the
+ * idle task's thread_info to pull the CPU out of idle and process
+ * the pending interrupt. On architectures that don't support
+ * TIF_NOTIFY_IPI, TIF_NEED_RESCHED is set instead to notify the
+ * pending IPI.
  *
- * If this returns true, then the idle task promises to call
- * sched_ttwu_pending() and reschedule soon.
+ * If this returns true, then the idle task promises to process the
+ * call function soon.
  */
-static bool set_nr_if_polling(struct task_struct *p)
+static bool notify_ipi_if_polling(struct task_struct *p)
 {
struct thread_info *ti = task_thread_info(p);
typeof(ti->flags) val = READ_ONCE(ti->flags);
@@ -922,9 +940,16 @@ static bool set_nr_if_polling(struct task_struct *p)
do {
if (!(val & _TIF_POLLING_NRFLAG))
return false;
-   if (val & _TIF_NEED_RESCHED)
+   /*
+* If TIF_NEED_RESCHED flag is set in addition to
+* TIF_POLLING_NRFLAG, the CPU will soon fall out of
+* idle. Since flush_smp_call_function_queue() is called
+* soon after the idle exit, setting TIF_WAKE_FLAG is
+* not necessary.
+*/
+   if (val & (_TIF_NEED_RESCHED | _TIF_WAKE_FLAG))
return true;
-	} while (!try_cmpxchg(&ti->flags, &val, val | _TIF_NEED_RESCHED));
+	} while (!try_cmpxchg(&ti->flags, &val, val | _TIF_WAKE_FLAG));
 
return true;
 }
@@ -937,7 +962,7 @@ static inline bool set_nr_and_not_polling(struct task_struct *p)
 }
 
 #ifdef CONFIG_SMP
-static inline bool

[RFC PATCH 02/14] sched: Define a need_resched_or_ipi() helper and use it treewide

2024-02-20 Thread K Prateek Nayak
From: "Gautham R. Shenoy" 

Currently TIF_NEED_RESCHED is being overloaded, to wakeup an idle CPU in
TIF_POLLING mode to service an IPI even if there are no new tasks being
woken up on the said CPU.

In preparation of a proper fix, introduce a new helper
"need_resched_or_ipi()" which is intended to return true if either
the TIF_NEED_RESCHED flag or if TIF_NOTIFY_IPI flag is set. Use this
helper function in place of need_resched() in idle loops where
TIF_POLLING_NRFLAG is set.

To preserve bisectability and avoid unbreakable idle loops, all the
need_resched() checks within TIF_POLLING_NRFLAG sections have been
replaced tree-wide with the need_resched_or_ipi() check.

[ prateek: Replaced some of the missed out occurrences of
  need_resched() within a TIF_POLLING sections with
  need_resched_or_ipi() ]

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Russell King 
Cc: Guo Ren 
Cc: Michal Simek 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: Al Viro 
Cc: Linus Walleij 
Cc: Ard Biesheuvel 
Cc: Andrew Donnellan 
Cc: Nicholas Miehlbradt 
Cc: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Josh Poimboeuf 
Cc: "Kirill A. Shutemov" 
Cc: Rick Edgecombe 
Cc: Tony Battersby 
Cc: Brian Gerst 
Cc: Tim Chen 
Cc: David Vernet 
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
Signed-off-by: Gautham R. Shenoy 
Co-developed-by: K Prateek Nayak 
Signed-off-by: K Prateek Nayak 
---
 arch/x86/include/asm/mwait.h  | 2 +-
 arch/x86/kernel/process.c | 2 +-
 drivers/cpuidle/cpuidle-powernv.c | 2 +-
 drivers/cpuidle/cpuidle-pseries.c | 2 +-
 drivers/cpuidle/poll_state.c  | 2 +-
 include/linux/sched.h | 5 +
 include/linux/sched/idle.h| 4 ++--
 kernel/sched/idle.c   | 7 ---
 8 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..ac1370143407 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -115,7 +115,7 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
}
 
 	__monitor((void *)&current_thread_info()->flags, 0, 0);
-   if (!need_resched())
+   if (!need_resched_or_ipi())
__mwait(eax, ecx);
}
current_clr_polling();
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b6f4e8399fca..ca6cb7e28cba 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -925,7 +925,7 @@ static __cpuidle void mwait_idle(void)
}
 
 	__monitor((void *)&current_thread_info()->flags, 0, 0);
-   if (!need_resched()) {
+   if (!need_resched_or_ipi()) {
__sti_mwait(0, 0);
raw_local_irq_disable();
}
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 9ebedd972df0..77c3bb371f56 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -79,7 +79,7 @@ static int snooze_loop(struct cpuidle_device *dev,
dev->poll_time_limit = false;
ppc64_runlatch_off();
HMT_very_low();
-   while (!need_resched()) {
+   while (!need_resched_or_ipi()) {
if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
/*
 * Task has not woken up but we are exiting the polling
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 14db9b7d985d..4f2b490f8b73 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -46,7 +46,7 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
snooze_exit_time = get_tb() + snooze_timeout;
dev->poll_time_limit = false;
 
-   while (!need_resched()) {
+   while (!need_resched_or_ipi()) {
HMT_low();
HMT

[RFC PATCH 01/14] thread_info: Add helpers to test and clear TIF_NOTIFY_IPI

2024-02-20 Thread K Prateek Nayak
From: "Gautham R. Shenoy" 

Introduce the notion of the TIF_NOTIFY_IPI flag. When a processor in
TIF_POLLING mode needs to process an IPI, the sender sets the NEED_RESCHED
bit in the idle task's thread_info to pull the target out of idle and
avoids sending an interrupt to the idle CPU. When NEED_RESCHED is set, the
scheduler assumes that a new task has been queued on the idle CPU and
calls schedule_idle(); however, an IPI on an idle CPU will not necessarily
end up waking a task on the said CPU. To avoid spurious calls to
schedule_idle() that assume an IPI on an idle CPU will always wake a task
on the said CPU, TIF_NOTIFY_IPI will be used to pull a TIF_POLLING CPU out
of idle.

Since the IPI handlers are processed before the call to schedule_idle(),
schedule_idle() will be called only if one of the handlers has woken up
a new task on the CPU and has set NEED_RESCHED.

Add tif_notify_ipi() and current_clr_notify_ipi() helpers to test if
TIF_NOTIFY_IPI is set in the current task's thread_info, and to clear it
respectively. These interfaces will be used in subsequent patches as
TIF_NOTIFY_IPI notion is integrated in the scheduler and in the idle
path.

[ prateek: Split the changes into a separate patch, add commit log ]

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Russell King 
Cc: Guo Ren 
Cc: Michal Simek 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: Al Viro 
Cc: Linus Walleij 
Cc: Ard Biesheuvel 
Cc: Andrew Donnellan 
Cc: Nicholas Miehlbradt 
Cc: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Josh Poimboeuf 
Cc: "Kirill A. Shutemov" 
Cc: Rick Edgecombe 
Cc: Tony Battersby 
Cc: Brian Gerst 
Cc: Tim Chen 
Cc: David Vernet 
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
Signed-off-by: Gautham R. Shenoy 
Co-developed-by: K Prateek Nayak 
Signed-off-by: K Prateek Nayak 
---
 include/linux/thread_info.h | 43 +
 1 file changed, 43 insertions(+)

diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index 9ea0b28068f4..1e10dd8c0227 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -195,6 +195,49 @@ static __always_inline bool tif_need_resched(void)
 
 #endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */
 
+#ifdef TIF_NOTIFY_IPI
+
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
+
+static __always_inline bool tif_notify_ipi(void)
+{
+   return arch_test_bit(TIF_NOTIFY_IPI,
+			(unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void current_clr_notify_ipi(void)
+{
+   arch_clear_bit(TIF_NOTIFY_IPI,
+		  (unsigned long *)(&current_thread_info()->flags));
+}
+
+#else
+
+static __always_inline bool tif_notify_ipi(void)
+{
+   return test_bit(TIF_NOTIFY_IPI,
+		   (unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void current_clr_notify_ipi(void)
+{
+   clear_bit(TIF_NOTIFY_IPI,
+	     (unsigned long *)(&current_thread_info()->flags));
+}
+
+#endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H */
+
+#else /* !TIF_NOTIFY_IPI */
+
+static __always_inline bool tif_notify_ipi(void)
+{
+   return false;
+}
+
+static __always_inline void current_clr_notify_ipi(void) { }
+
+#endif /* TIF_NOTIFY_IPI */
+
 #ifndef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
 static inline int arch_within_stack_frames(const void * const stack,
   const void * const stackend,
-- 
2.34.1



[RFC PATCH 00/14] Introducing TIF_NOTIFY_IPI flag

2024-02-20 Thread K Prateek Nayak
With TIF_NOTIFY_IPI, the time taken to complete a fixed set of IPIs
using ipistorm improves drastically. Following are the numbers from the
same dual socket 3rd Generation EPYC system (2 x 64C/128T) (boost on,
C2 disabled) running ipistorm between CPU8 and CPU16:

cmdline: insmod ipistorm.ko numipi=10 single=1 offset=8 cpulist=8 wait=1

  ==========================================
  Test          : ipistorm (modified)
  Units         : Normalized runtime
  Interpretation: Lower is better
  Statistic     : AMean
  ==========================================
  kernel:                               time [pct imp]
  tip:sched/core                        1.00 [0.00]
  tip:sched/core + revert               0.81 [19.36]
  tip:sched/core + TIF_NOTIFY_IPI       0.20 [80.99]

Same experiment was repeated on an dual socket ARM server (2 x 64C)
which too saw a significant improvement in the ipistorm performance:

  ==========================================
  Test          : ipistorm (modified)
  Units         : Normalized runtime
  Interpretation: Lower is better
  Statistic     : AMean
  ==========================================
  kernel:                               time [pct imp]
  tip:sched/core                        1.00 [0.00]
  tip:sched/core + TIF_NOTIFY_IPI       0.41 [59.29]

netperf and tbench results with the patch match the results on tip on
the dual socket 3rd Generation AMD system (2 x 64C/128T). Additionally,
hackbench, stream, and schbench too were tested, with results from the
patched kernel matching that of the tip.


Future Work
===========

Evaluate impact of newidle_balance() when scheduler tick hits an idle
CPU. The call to newidle_balance() will be skipped with the
TIF_NOTIFY_IPI solution similar to [2]. Counter argument for the case is
that if the idle state did not set the TIF_POLLING bit, the idle CPU
would not have called schedule_idle() unless the IPI handler set the
NEED_RESCHED bit.


Links
=====

[1] https://github.com/antonblanchard/ipistorm
[2] https://lore.kernel.org/lkml/20240119084548.2788-1-kprateek.na...@amd.com/
[3] 
https://lore.kernel.org/lkml/b4f5ac150685456cf45a342e3bb1f28cdd557a53.ca...@linux.intel.com/
[4] https://lore.kernel.org/lkml/20240123211756.GA221793@maniforge/
[5] 
https://lore.kernel.org/lkml/cakftptc446lo9catpp7pexdklhhqfobuy-jmgc7agohy4hs...@mail.gmail.com/

This series is based on tip:sched/core at tag "sched-core-2024-01-08".
---
Gautham R. Shenoy (4):
  thread_info: Add helpers to test and clear TIF_NOTIFY_IPI
  sched: Define a need_resched_or_ipi() helper and use it treewide
  sched/core: Use TIF_NOTIFY_IPI to notify an idle CPU in TIF_POLLING
mode of pending IPI
  x86/thread_info: Introduce TIF_NOTIFY_IPI flag

K Prateek Nayak (10):
  arm/thread_info: Introduce TIF_NOTIFY_IPI flag
  alpha/thread_info: Introduce TIF_NOTIFY_IPI flag
  openrisc/thread_info: Introduce TIF_NOTIFY_IPI flag
  powerpc/thread_info: Introduce TIF_NOTIFY_IPI flag
  sh/thread_info: Introduce TIF_NOTIFY_IPI flag
  sparc/thread_info: Introduce TIF_NOTIFY_IPI flag
  csky/thread_info: Introduce TIF_NOTIFY_IPI flag
  parisc/thread_info: Introduce TIF_NOTIFY_IPI flag
  nios2/thread_info: Introduce TIF_NOTIFY_IPI flag
  microblaze/thread_info: Introduce TIF_NOTIFY_IPI flag
---
Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Russell King 
Cc: Guo Ren 
Cc: Michal Simek 
Cc: Dinh Nguyen 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: "Rafael J. Wysocki" 
Cc: Daniel Lezcano 
Cc: Peter Zijlstra 
Cc: Juri Lelli 
Cc: Vincent Guittot 
Cc: Dietmar Eggemann 
Cc: Steven Rostedt 
Cc: Ben Segall 
Cc: Mel Gorman 
Cc: Daniel Bristot de Oliveira 
Cc: Valentin Schneider 
Cc: Al Viro 
Cc: Linus Walleij 
Cc: Ard Biesheuvel 
Cc: Andrew Donnellan 
Cc: Nicholas Miehlbradt 
Cc: Andrew Morton 
Cc: Arnd Bergmann 
Cc: Josh Poimboeuf 
Cc: "Kirill A. Shutemov" 
Cc: Rick Edgecombe 
Cc: Tony Battersby 
Cc: Brian Gerst 
Cc: Tim Chen 
Cc: David Vernet 
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-openr...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@vger.kernel.org
---
 arch/alpha/include/asm/thread_info.h  |  2 ++
 arch/arm/include/asm/thread_info.h|  3 ++
 arch/csky/include/asm/thread_info.h   |  2 ++
 arch/microblaze/include/asm/thread_info.h |  2 +