ia64 removal (was: Re: lockref scalability on x86-64 vs cpu_relax)

2023-01-12 Thread Ard Biesheuvel
On Fri, 13 Jan 2023 at 01:31, Luck, Tony  wrote:
>
> > Yeah, if it was ia64-only, it's a non-issue these days. It's dead and
> > in pure maintenance mode from a kernel perspective (if even that).
>
> There's not much "simultaneous" in the SMT on ia64. One thread in a
> spin loop will hog the core until the h/w switches to the other thread some
> number of cycles (hundreds, thousands? I really can't remember). So I
> was pretty generous with dropping cpu_relax() into any kind of spin loop.
>
> Is it time yet for:
>
> $ git rm -r arch/ia64
>

Hi Tony,

Can I take that as an ack on [0]? The EFI subsystem has evolved
substantially over the years, and there is really no way to do any
IA64 testing beyond build testing, so from that perspective, dropping
it entirely would be welcomed.

Thanks,
Ard.



[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=remove-ia64


Re: lockref scalability on x86-64 vs cpu_relax

2023-01-12 Thread Nicholas Piggin
On Fri Jan 13, 2023 at 2:15 PM AEST, Linus Torvalds wrote:
> On Thu, Jan 12, 2023 at 9:20 PM Nicholas Piggin  wrote:
> >
> > Actually what we'd really want is an arch specific implementation of
> > lockref.
>
> The problem is mainly that then you need to generate the asm versions
> of all those different CMPXCHG_LOOP() variants.
>
> They are all fairly simple, though, and it wouldn't be hard to make
> the current lib/lockref.c just be the generic fallback if you don't
> have an arch-specific one.

Yeah, it doesn't look too onerous so it's probably worth seeing what
the code and some numbers look like here.

> And even if you do have the arch-specific LL/SC version, you'd still
> want the generic fallback for the case where a spinlock isn't a single
> word any more (which happens when the spinlock debugging options are
> on).

You're right, good point.

Thanks,
Nick


Re: lockref scalability on x86-64 vs cpu_relax

2023-01-12 Thread Linus Torvalds
On Thu, Jan 12, 2023 at 9:20 PM Nicholas Piggin  wrote:
>
> Actually what we'd really want is an arch specific implementation of
> lockref.

The problem is mainly that then you need to generate the asm versions
of all those different CMPXCHG_LOOP() variants.

They are all fairly simple, though, and it wouldn't be hard to make
the current lib/lockref.c just be the generic fallback if you don't
have an arch-specific one.

And even if you do have the arch-specific LL/SC version, you'd still
want the generic fallback for the case where a spinlock isn't a single
word any more (which happens when the spinlock debugging options are
on).
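
[ For readers without the tree handy: the loop under discussion is the
  CMPXCHG_LOOP() macro in lib/lockref.c. A paraphrased sketch, not the
  verbatim kernel source:

	#define CMPXCHG_LOOP(CODE, SUCCESS) do {				\
		int retry = 100;						\
		struct lockref old;						\
		old.lock_count = READ_ONCE(lockref->lock_count);		\
		while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \
			struct lockref new = old;				\
			CODE							\
			/* on failure, 'old' is refreshed for the retry */	\
			if (likely(try_cmpxchg64_relaxed(&lockref->lock_count,	\
							 &old.lock_count,	\
							 new.lock_count))) {	\
				SUCCESS;					\
			}							\
			if (!--retry)						\
				break;						\
			cpu_relax();	/* the call being debated */		\
		}								\
	} while (0)

  An arch-specific version would have to supply the equivalent of each
  instantiation of this macro (get, put, put-or-lock, ...). ]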

Linus


Re: lockref scalability on x86-64 vs cpu_relax

2023-01-12 Thread Linus Torvalds
On Thu, Jan 12, 2023 at 7:12 PM Mateusz Guzik  wrote:
>
> I did not want to make such a change without redoing the ThunderX2
> benchmark, or at least something else arm64-y. I may be able to bench it
> tomorrow on whatever arm-y stuff can be found on Amazon's EC2, assuming
> no arm64 people show up with their results.

I don't think ThunderX2 itself is particularly interesting, but sure,
it would be good to have numbers for some modern arm64 cores.

The newer Amazon EC2 cores (Graviton 2/3) sound more relevant (or
Ampere?)  The more different architecture numbers we'd have for that
"remove cpu_relax()", the better.

   Linus


Re: lockref scalability on x86-64 vs cpu_relax

2023-01-12 Thread Nicholas Piggin
On Fri Jan 13, 2023 at 10:13 AM AEST, Linus Torvalds wrote:
> [ Adding linux-arch, which is relevant but not very specific, and the
> arm64 and powerpc maintainers that are the more specific cases for an
> architecture where this might actually matter.
>
>   See
>
> 
> https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=w...@mail.gmail.com/
>
>   for original full email, but it might be sufficiently clear just
> from this heavily cut-down context too ]
>
> Side note on your access() changes - if it turns out that you can
> remove all the cred games, we should possibly then revert my old
> commit d7852fbd0f04 ("access: avoid the RCU grace period for the
> temporary subjective credentials") which avoided the biggest issue
> with the unnecessary cred switching.
>
> I *think* access() is the only user of that special 'non_rcu' thing,
> but it is possible that the whole 'non_rcu' thing ends up mattering
> for cases where the cred actually does change because euid != uid (ie
> suid programs), so this would need a bit more effort to do performance
> testing on.
>
> On Thu, Jan 12, 2023 at 5:36 PM Mateusz Guzik  wrote:
> >
> > To my understanding on said architecture failed cmpxchg still grants you
> > exclusive access to the cacheline, making immediate retry preferable
> > when trying to inc/dec unless a certain value is found.
>
> I actually suspect that is _always_ the case - this is not like a
> contended spinlock where we want to pause because we're waiting for
> the value to change and become unlocked, this cmpxchg loop is likely
> always better off just retrying with the new value.

Yes this should be true for powerpc (POWER CPUs, at least).

> That said, the "likely always better off" is purely about performance.
>
> So I have this suspicion that the reason Tony added the cpu_relax()
> was simply not about performance, but about other issues, like
> fairness in SMT situations.
>
> That said, even from a fairness perspective the cpu_relax() sounds a
> bit odd and unlikely - we're literally yielding when we lost a race,
> so it hurts the _loser_, not the winner, and thus might make fairness
> worse too.

Worse is that we've also actually just *won* a race when the cmpxchg
comes back, i.e., to get the line exclusive in our cache. Then
we'll just sit there waiting and probably holding off other snoopers
for a while.

I don't see much of a fairness concern really. If there's a lot of
contention here we'll be stalled for a long time waiting on the line,
so SMT heuristics had better send resources to other threads making
better progress anyway as it should for any cache miss situation. So
this loop shouldn't be hogging up a lot of resources from the other
thread(s).

> I dunno.  Tony may have some memory of what the issue was.
>
> > ... without numbers attached to it. Given the above linked thread it
> > looks like the arch this was targeting was itanium, not x86-64, but
> > the change landed for everyone.
>
> Yeah, if it was ia64-only, it's a non-issue these days. It's dead and
> in pure maintenance mode from a kernel perspective (if even that).
>
> > Later it was further augmented with:
> > commit 893a7d32e8e04ca4d6c882336b26ed660ca0a48d
> > Author: Jan Glauber 
> > Date:   Wed Jun 5 15:48:49 2019 +0200
> >
> > lockref: Limit number of cmpxchg loop retries
> > [snip]
> > With the retry limit the performance of an open-close testcase
> > improved between 60-70% on ThunderX2.
> >
> > While the benchmark was specifically on ThunderX2, the change once more
> > was made for all archs.
>
> Actually, in that case I did ask for the test to be run on x86
> hardware too, and exactly like you found:
>
> > I should note in my tests the retry limit was never reached fwiw.
>
> the max loop retry number just isn't an issue. It fundamentally only
> affects extremely unfair platforms, so it's arguably always the right
> thing to do.
>
> So it may be "ThunderX2 specific" in that that is where it was
> noticed, but I think we can safely just consider the max loop thing to
> be a generic safety net that hopefully simply never triggers in
> practice on any sane platform.
>

If there are a lot of threads contending, I'm sure x86, POWER, and
probably most CPUs could quite possibly starve here for hundreds if
not thousands of iterations or more.

And I'm not really a fan of scattering random implementation specific
crutches ad hoc throughout our primitives. At least it could be specific
to the arch where it matters.

Interesting that it improves performance so much though. I wonder why?
Hitting the limit will take the lock and that will cause all other CPUs
to drop out of the "fast" path so it will degenerate to a spinlock.
queued spinlock is pretty scalable but it really shouldn't be more
scalable than an atomic OP. I bet this cpu_relax isn't helping, and
probably the ll/sc implementation of the cmpxchg primitive doesn't help either.
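
[ For reference, the degeneration described above: when CMPXCHG_LOOP()
  gives up, the lockref operations fall back to the embedded spinlock,
  as in this sketch of lib/lockref.c's lockref_get():

	void lockref_get(struct lockref *lockref)
	{
		CMPXCHG_LOOP(
			new.count++;
		,
			return;
		);

		/* slow path: every CPU that sees the lock held drops
		 * out of the cmpxchg fast path too */
		spin_lock(&lockref->lock);
		lockref->count++;
		spin_unlock(&lockref->lock);
	}
]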

I reckon if you removed the cpu_relax there, big x86 systems 

Re: lockref scalability on x86-64 vs cpu_relax

2023-01-12 Thread Linus Torvalds
On Thu, Jan 12, 2023 at 6:31 PM Luck, Tony  wrote:
>
> There's not much "simultaneous" in the SMT on ia64.

Oh, I forgot about the whole SoEMT fiasco.

Yeah, that might make ia64 act a bit differently here.

But I don't think anybody cares any more, so I don't think that merits
making this a per-architecture choice.

The s390 people hated cpu_relax() here, but for them it was really
because it was bad *everywhere*, and they just made it a no-op (see
commit 22b6430d3665 "locking/core, s390: Make cpu_relax() a barrier
again"). There had been a (failed) attempt at "cpu_relax_lowlatency()"
for the s390 issues.
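
[ For reference, what s390 ended up with after that commit is
  essentially a plain compiler barrier (sketch):

	static inline void cpu_relax(void)
	{
		barrier();	/* no hypervisor yield any more */
	}

  with the expensive diagnose-based yield split out, at the time, into
  a separate cpu_relax_yield() helper for the few callers that really
  wanted it. ]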

  Linus


RE: lockref scalability on x86-64 vs cpu_relax

2023-01-12 Thread Luck, Tony
> Yeah, if it was ia64-only, it's a non-issue these days. It's dead and
> in pure maintenance mode from a kernel perspective (if even that).

There's not much "simultaneous" in the SMT on ia64. One thread in a
spin loop will hog the core until the h/w switches to the other thread some
number of cycles (hundreds, thousands? I really can't remember). So I
was pretty generous with dropping cpu_relax() into any kind of spin loop.

Is it time yet for:

$ git rm -r arch/ia64

-Tony


Re: lockref scalability on x86-64 vs cpu_relax

2023-01-12 Thread Linus Torvalds
[ Adding linux-arch, which is relevant but not very specific, and the
arm64 and powerpc maintainers that are the more specific cases for an
architecture where this might actually matter.

  See


https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=w...@mail.gmail.com/

  for original full email, but it might be sufficiently clear just
from this heavily cut-down context too ]

Side note on your access() changes - if it turns out that you can
remove all the cred games, we should possibly then revert my old
commit d7852fbd0f04 ("access: avoid the RCU grace period for the
temporary subjective credentials") which avoided the biggest issue
with the unnecessary cred switching.

I *think* access() is the only user of that special 'non_rcu' thing,
but it is possible that the whole 'non_rcu' thing ends up mattering
for cases where the cred actually does change because euid != uid (ie
suid programs), so this would need a bit more effort to do performance
testing on.

On Thu, Jan 12, 2023 at 5:36 PM Mateusz Guzik  wrote:
>
> To my understanding on said architecture failed cmpxchg still grants you
> exclusive access to the cacheline, making immediate retry preferable
> when trying to inc/dec unless a certain value is found.

I actually suspect that is _always_ the case - this is not like a
contended spinlock where we want to pause because we're waiting for
the value to change and become unlocked, this cmpxchg loop is likely
always better off just retrying with the new value.

That said, the "likely always better off" is purely about performance.

So I have this suspicion that the reason Tony added the cpu_relax()
was simply not about performance, but about other issues, like
fairness in SMT situations.

That said, even from a fairness perspective the cpu_relax() sounds a
bit odd and unlikely - we're literally yielding when we lost a race,
so it hurts the _loser_, not the winner, and thus might make fairness
worse too.

I dunno.  Tony may have some memory of what the issue was.

> ... without numbers attached to it. Given the above linked thread it
> looks like the arch this was targeting was itanium, not x86-64, but
> the change landed for everyone.

Yeah, if it was ia64-only, it's a non-issue these days. It's dead and
in pure maintenance mode from a kernel perspective (if even that).

> Later it was further augmented with:
> commit 893a7d32e8e04ca4d6c882336b26ed660ca0a48d
> Author: Jan Glauber 
> Date:   Wed Jun 5 15:48:49 2019 +0200
>
> lockref: Limit number of cmpxchg loop retries
> [snip]
> With the retry limit the performance of an open-close testcase
> improved between 60-70% on ThunderX2.
>
> While the benchmark was specifically on ThunderX2, the change once more
> was made for all archs.

Actually, in that case I did ask for the test to be run on x86
hardware too, and exactly like you found:

> I should note in my tests the retry limit was never reached fwiw.

the max loop retry number just isn't an issue. It fundamentally only
affects extremely unfair platforms, so it's arguably always the right
thing to do.

So it may be "ThunderX2 specific" in that that is where it was
noticed, but I think we can safely just consider the max loop thing to
be a generic safety net that hopefully simply never triggers in
practice on any sane platform.

> All that said, I think the thing to do here is to replace cpu_relax
> with a dedicated arch-dependent macro, akin to the following:

I would actually prefer just removing it entirely and see if somebody
else hollers. You have the numbers to prove it hurts on real hardware,
and I don't think we have any numbers to the contrary.

So I think it's better to trust the numbers and remove it as a
failure, than say "let's just remove it on x86-64 and leave everybody
else with the potentially broken code"

Because I do think that a cmpxchg loop that updates the value it
compares and exchanges is fundamentally different from a "busy-loop,
trying to read while locked", and with your numbers as ammunition, I
think it's better to just remove that cpu_relax() entirely.

Then other architectures can try to run their numbers, and only *if*
it then turns out that they have a reason to do something else should
we make this conditional and different on different architectures.

Let's try to keep the code as common as possible until we have hard
evidence for special cases, in other words.

 Linus


[PATCH 4/8] perf/core: Add perf_sample_save_brstack() helper

2023-01-12 Thread Namhyung Kim
When it saves the branch stack to the perf sample data, it needs to
update the sample flags and the dynamic size.  To ensure this,
add the perf_sample_save_brstack() helper and convert all call sites.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: x...@kernel.org
Suggested-by: Peter Zijlstra 
Signed-off-by: Namhyung Kim 
---
 arch/powerpc/perf/core-book3s.c |  3 +-
 arch/x86/events/amd/core.c  |  6 +--
 arch/x86/events/intel/core.c|  6 +--
 arch/x86/events/intel/ds.c  |  9 ++---
 include/linux/perf_event.h  | 66 -
 kernel/events/core.c| 16 +++-
 6 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bf318dd9b709..8c1f7def596e 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2313,8 +2313,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
struct cpu_hw_events *cpuhw;
cpuhw = this_cpu_ptr(&cpu_hw_events);
power_pmu_bhrb_read(event, cpuhw);
-   data.br_stack = &cpuhw->bhrb_stack;
-   data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+   perf_sample_save_brstack(&data, event, &cpuhw->bhrb_stack);
}
 
if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index d6f3703e4119..463f3eb8bbd7 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -928,10 +928,8 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
if (!x86_perf_event_set_period(event))
continue;
 
-   if (has_branch_stack(event)) {
-   data.br_stack = &cpuc->lbr_stack;
-   data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
-   }
+   if (has_branch_stack(event))
+   perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 29d2d0411caf..14f0a746257d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3036,10 +3036,8 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)

perf_sample_data_init(&data, 0, event->hw.last_period);

-   if (has_branch_stack(event)) {
-   data.br_stack = &cpuc->lbr_stack;
-   data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
-   }
+   if (has_branch_stack(event))
+   perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 158cf845fc80..07c8a2cdc3ee 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1720,10 +1720,8 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
data->sample_flags |= PERF_SAMPLE_TIME;
}

-   if (has_branch_stack(event)) {
-   data->br_stack = &cpuc->lbr_stack;
-   data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
-   }
+   if (has_branch_stack(event))
+   perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
}
 
 static void adaptive_pebs_save_regs(struct pt_regs *regs,
@@ -1883,8 +1881,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,

if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
-   data->br_stack = &cpuc->lbr_stack;
-   data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+   perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
}
}
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 569dfac5887f..7db0e9cc2682 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1102,6 +1102,31 @@ extern u64 perf_event_read_value(struct perf_event *event,

extern struct perf_callchain_entry *perf_callchain(struct perf_event *event, struct pt_regs *regs);
 
+static inline bool branch_sample_no_flags(const struct perf_event *event)
+{
+   return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_NO_FLAGS;
+}
+
+static inline bool branch_sample_no_cycles(const struct perf_event *event)
+{
+   return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_NO_CYCLES;
+}
+
+static inline bool branch_sample_type(const struct perf_event *event)
+{
+   return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_TYPE_SAVE;
+}
+
+static inline bool branch_sample_hw_index(const struct perf_event *event)
+{
+   return event->attr.branch_sample_type & 
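
[ The hunk is cut off above; reconstructed from the call sites and the
  commit message (set br_stack, the dynamic size and the sample flags
  together), the new helper looks roughly like this -- a sketch, not
  the verbatim patch:

	static inline void perf_sample_save_brstack(struct perf_sample_data *data,
						    struct perf_event *event,
						    struct perf_branch_stack *brs)
	{
		int size = sizeof(u64); /* nr */

		if (branch_sample_hw_index(event))
			size += sizeof(u64);
		size += brs->nr * sizeof(struct perf_branch_entry);

		data->br_stack = brs;
		data->dyn_size += size;
		data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
	}
]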

Re: [PATCH net-next 03/10] net: mdio: mux-bcm-iproc: Separate C22 and C45 transactions

2023-01-12 Thread Florian Fainelli

On 1/12/23 07:15, Michael Walle wrote:

From: Andrew Lunn 

The MDIO mux broadcom iproc can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
Apparently, in the c45 case, the reg value including the MII_ADDR_C45
bit is written to the hardware. It looks weird that a "random" software
bit is written to a register. Florian, is that correct? Also, with this
patch this flag isn't set anymore.


We should be masking the MII_ADDR_C45 bit because the register at
MDIO_ADDR_OFFSET only defines bits 0 through 20 as read/write, with the
bits above being read-only. In practice, this probably makes no
difference and causes no harm.
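
[ Illustration of the suggested masking; the GENMASK value is an
  assumption derived from the bits-0..20 description above, not taken
  from the driver source:

	/* keep only the documented read/write address bits [20:0] */
	reg &= GENMASK(20, 0);
]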

--
Florian



Re: [PATCH v3 41/51] cpuidle,clk: Remove trace_.*_rcuidle()

2023-01-12 Thread Stephen Boyd
Quoting Peter Zijlstra (2023-01-12 11:43:55)
> OMAP was the one and only user.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Ulf Hansson 
> Acked-by: Rafael J. Wysocki 
> Acked-by: Frederic Weisbecker 
> Tested-by: Tony Lindgren 
> Tested-by: Ulf Hansson 
> ---

Acked-by: Stephen Boyd 


[PATCH v3 37/51] cpuidle,omap3: Push RCU-idle into omap_sram_idle()

2023-01-12 Thread Peter Zijlstra
OMAP3 uses full SoC suspend modes as idle states, as such it needs the
whole power-domain and clock-domain code from the idle path.

All that code is not suitable to run with RCU disabled, as such push
RCU-idle deeper still.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Tony Lindgren 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/mach-omap2/cpuidle34xx.c |4 +---
 arch/arm/mach-omap2/pm.h  |2 +-
 arch/arm/mach-omap2/pm34xx.c  |   12 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,9 +133,7 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
-   ct_cpuidle_enter();
-   omap_sram_idle();
-   ct_cpuidle_exit();
+   omap_sram_idle(true);
 
/*
 * Call idle CPU PM enter notifier chain to restore
--- a/arch/arm/mach-omap2/pm.h
+++ b/arch/arm/mach-omap2/pm.h
@@ -29,7 +29,7 @@ static inline int omap4_idle_init(void)
 
 extern void *omap3_secure_ram_storage;
 extern void omap3_pm_off_mode_enable(int);
-extern void omap_sram_idle(void);
+extern void omap_sram_idle(bool rcuidle);
 extern int omap_pm_clkdms_setup(struct clockdomain *clkdm, void *unused);
 
 #if defined(CONFIG_PM_OPP)
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -174,7 +175,7 @@ static int omap34xx_do_sram_idle(unsigne
return 0;
 }
 
-void omap_sram_idle(void)
+void omap_sram_idle(bool rcuidle)
 {
/* Variable to tell what needs to be saved and restored
 * in omap_sram_idle*/
@@ -254,11 +255,18 @@ void omap_sram_idle(void)
 */
if (save_state)
omap34xx_save_context(omap3_arm_context);
+
+   if (rcuidle)
+   ct_cpuidle_enter();
+
if (save_state == 1 || save_state == 3)
cpu_suspend(save_state, omap34xx_do_sram_idle);
else
omap34xx_do_sram_idle(save_state);
 
+   if (rcuidle)
+   ct_cpuidle_exit();
+
/* Restore normal SDRC POWER settings */
if (cpu_is_omap3430() && omap_rev() >= OMAP3430_REV_ES3_0 &&
(omap_type() == OMAP2_DEVICE_TYPE_EMU ||
@@ -316,7 +324,7 @@ static int omap3_pm_suspend(void)
 
omap3_intc_suspend();
 
-   omap_sram_idle();
+   omap_sram_idle(false);
 
 restore:
/* Restore next_pwrsts */




[PATCH v3 40/51] cpuidle,powerdomain: Remove trace_.*_rcuidle()

2023-01-12 Thread Peter Zijlstra
OMAP was the one and only user.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Ulf Hansson 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/mach-omap2/powerdomain.c |   10 +-
 drivers/base/power/runtime.c  |   24 
 2 files changed, 17 insertions(+), 17 deletions(-)

--- a/arch/arm/mach-omap2/powerdomain.c
+++ b/arch/arm/mach-omap2/powerdomain.c
@@ -187,9 +187,9 @@ static int _pwrdm_state_switch(struct po
trace_state = (PWRDM_TRACE_STATES_FLAG |
   ((next & OMAP_POWERSTATE_MASK) << 8) |
   ((prev & OMAP_POWERSTATE_MASK) << 0));
-   trace_power_domain_target_rcuidle(pwrdm->name,
- trace_state,
- raw_smp_processor_id());
+   trace_power_domain_target(pwrdm->name,
+ trace_state,
+ raw_smp_processor_id());
}
break;
default:
@@ -541,8 +541,8 @@ int pwrdm_set_next_pwrst(struct powerdom
 
if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
/* Trace the pwrdm desired target state */
-   trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
- raw_smp_processor_id());
+   trace_power_domain_target(pwrdm->name, pwrst,
+ raw_smp_processor_id());
/* Program the pwrdm desired target state */
ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
}
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -442,7 +442,7 @@ static int rpm_idle(struct device *dev,
int (*callback)(struct device *);
int retval;
 
-   trace_rpm_idle_rcuidle(dev, rpmflags);
+   trace_rpm_idle(dev, rpmflags);
retval = rpm_check_suspend_allowed(dev);
if (retval < 0)
;   /* Conditions are wrong. */
@@ -481,7 +481,7 @@ static int rpm_idle(struct device *dev,
dev->power.request_pending = true;
queue_work(pm_wq, &dev->power.work);
}
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, 0);
+   trace_rpm_return_int(dev, _THIS_IP_, 0);
return 0;
}
 
@@ -493,7 +493,7 @@ static int rpm_idle(struct device *dev,
wake_up_all(&dev->power.wait_queue);
 
  out:
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
return retval ? retval : rpm_suspend(dev, rpmflags | RPM_AUTO);
 }
 
@@ -557,7 +557,7 @@ static int rpm_suspend(struct device *de
struct device *parent = NULL;
int retval;
 
-   trace_rpm_suspend_rcuidle(dev, rpmflags);
+   trace_rpm_suspend(dev, rpmflags);
 
  repeat:
retval = rpm_check_suspend_allowed(dev);
@@ -708,7 +708,7 @@ static int rpm_suspend(struct device *de
}
 
  out:
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
 
return retval;
 
@@ -760,7 +760,7 @@ static int rpm_resume(struct device *dev
struct device *parent = NULL;
int retval = 0;
 
-   trace_rpm_resume_rcuidle(dev, rpmflags);
+   trace_rpm_resume(dev, rpmflags);
 
  repeat:
if (dev->power.runtime_error) {
@@ -925,7 +925,7 @@ static int rpm_resume(struct device *dev
spin_lock_irq(&dev->power.lock);
}
 
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
 
return retval;
 }
@@ -1081,7 +1081,7 @@ int __pm_runtime_idle(struct device *dev
if (retval < 0) {
return retval;
} else if (retval > 0) {
-   trace_rpm_usage_rcuidle(dev, rpmflags);
+   trace_rpm_usage(dev, rpmflags);
return 0;
}
}
@@ -1119,7 +1119,7 @@ int __pm_runtime_suspend(struct device *
if (retval < 0) {
return retval;
} else if (retval > 0) {
-   trace_rpm_usage_rcuidle(dev, rpmflags);
+   trace_rpm_usage(dev, rpmflags);
return 0;
}
}
@@ -1202,7 +1202,7 @@ int pm_runtime_get_if_active(struct devi
} else {
retval = atomic_inc_not_zero(&dev->power.usage_count);
}
-   trace_rpm_usage_rcuidle(dev, 0);
+   trace_rpm_usage(dev, 0);
spin_unlock_irqrestore(&dev->power.lock, flags);
 
return retval;
@@ -1566,7 +1566,7 @@ void pm_runtime_allow(struct device 

[PATCH v3 12/51] cpuidle,dt: Push RCU-idle into driver

2023-01-12 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Notably: this converts all dt_init_idle_driver() and
__CPU_PM_CPU_IDLE_ENTER() users, for they are inextricably intertwined.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/acpi/processor_idle.c|2 ++
 drivers/cpuidle/cpuidle-arm.c|1 +
 drivers/cpuidle/cpuidle-big_little.c |8 ++--
 drivers/cpuidle/cpuidle-psci.c   |1 +
 drivers/cpuidle/cpuidle-qcom-spm.c   |1 +
 drivers/cpuidle/cpuidle-riscv-sbi.c  |1 +
 drivers/cpuidle/dt_idle_states.c |2 +-
 include/linux/cpuidle.h  |2 ++
 8 files changed, 15 insertions(+), 3 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1219,6 +1219,8 @@ static int acpi_processor_setup_lpi_stat
state->target_residency = lpi->min_residency;
if (lpi->arch_flags)
state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+   if (i != 0 && lpi->entry_method == ACPI_CSTATE_FFH)
+   state->flags |= CPUIDLE_FLAG_RCU_IDLE;
state->enter = acpi_idle_lpi_enter;
drv->safe_state_index = i;
}
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -64,7 +64,8 @@ static struct cpuidle_driver bl_idle_lit
.enter  = bl_enter_powerdown,
.exit_latency   = 700,
.target_residency   = 2500,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C1",
.desc   = "ARM little-cluster power down",
},
@@ -85,7 +86,8 @@ static struct cpuidle_driver bl_idle_big
.enter  = bl_enter_powerdown,
.exit_latency   = 500,
.target_residency   = 2000,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C1",
.desc   = "ARM big-cluster power down",
},
@@ -124,11 +126,13 @@ static int bl_enter_powerdown(struct cpu
struct cpuidle_driver *drv, int idx)
 {
cpu_pm_enter();
+   ct_idle_enter();
 
cpu_suspend(0, bl_powerdown_finisher);
 
/* signals the MCPM core that CPU is out of low power state */
mcpm_cpu_powered_up();
+   ct_idle_exit();
 
cpu_pm_exit();
 
--- a/drivers/cpuidle/dt_idle_states.c
+++ b/drivers/cpuidle/dt_idle_states.c
@@ -77,7 +77,7 @@ static int init_state_node(struct cpuidl
if (err)
desc = state_node->name;
 
-   idle_state->flags = 0;
+   idle_state->flags = CPUIDLE_FLAG_RCU_IDLE;
if (of_property_read_bool(state_node, "local-timer-stop"))
idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
/*
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -289,7 +289,9 @@ extern s64 cpuidle_governor_latency_req(
if (!is_retention)  \
__ret =  cpu_pm_enter();\
if (!__ret) {   \
+   ct_idle_enter();\
__ret = low_level_idle_enter(state);\
+   ct_idle_exit(); \
if (!is_retention)  \
cpu_pm_exit();  \
}   \




[PATCH v3 35/51] trace,hardirq: No moar _rcuidle() tracing

2023-01-12 Thread Peter Zijlstra
Robot reported that trace_hardirqs_{on,off}() tickle the forbidden
_rcuidle() tracepoint through local_irq_{en,dis}able().

For 'sane' configs, these calls will only happen with RCU enabled and
as such can use the regular tracepoint. This also means it's possible
to trace them from NMI context again.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/trace/trace_preemptirq.c |   21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -20,6 +20,15 @@
 static DEFINE_PER_CPU(int, tracing_irq_cpu);
 
 /*
+ * ...
+ */
+#ifdef CONFIG_ARCH_WANTS_NO_INSTR
+#define trace(point)   trace_##point
+#else
+#define trace(point)   if (!in_nmi()) trace_##point##_rcuidle
+#endif
+
+/*
  * Like trace_hardirqs_on() but without the lockdep invocation. This is
  * used in the low level entry code where the ordering vs. RCU is important
  * and lockdep uses a staged approach which splits the lockdep hardirq
@@ -28,8 +37,7 @@ static DEFINE_PER_CPU(int, tracing_irq_c
 void trace_hardirqs_on_prepare(void)
 {
if (this_cpu_read(tracing_irq_cpu)) {
-   if (!in_nmi())
-   trace_irq_enable(CALLER_ADDR0, CALLER_ADDR1);
+   trace(irq_enable)(CALLER_ADDR0, CALLER_ADDR1);
tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
this_cpu_write(tracing_irq_cpu, 0);
}
@@ -40,8 +48,7 @@ NOKPROBE_SYMBOL(trace_hardirqs_on_prepar
 void trace_hardirqs_on(void)
 {
if (this_cpu_read(tracing_irq_cpu)) {
-   if (!in_nmi())
-   trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+   trace(irq_enable)(CALLER_ADDR0, CALLER_ADDR1);
tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
this_cpu_write(tracing_irq_cpu, 0);
}
@@ -63,8 +70,7 @@ void trace_hardirqs_off_finish(void)
if (!this_cpu_read(tracing_irq_cpu)) {
this_cpu_write(tracing_irq_cpu, 1);
tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
-   if (!in_nmi())
-   trace_irq_disable(CALLER_ADDR0, CALLER_ADDR1);
+   trace(irq_disable)(CALLER_ADDR0, CALLER_ADDR1);
}
 
 }
@@ -78,8 +84,7 @@ void trace_hardirqs_off(void)
if (!this_cpu_read(tracing_irq_cpu)) {
this_cpu_write(tracing_irq_cpu, 1);
tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
-   if (!in_nmi())
-   trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+   trace(irq_disable)(CALLER_ADDR0, CALLER_ADDR1);
}
 }
 EXPORT_SYMBOL(trace_hardirqs_off);
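
[ To make the converted call sites easier to read: with
  CONFIG_ARCH_WANTS_NO_INSTR set,

	trace(irq_enable)(CALLER_ADDR0, CALLER_ADDR1);

  expands to the regular tracepoint,

	trace_irq_enable(CALLER_ADDR0, CALLER_ADDR1);

  and otherwise to the NMI-guarded _rcuidle variant,

	if (!in_nmi()) trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
]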




[PATCH v3 19/51] cpuidle,intel_idle: Fix CPUIDLE_FLAG_INIT_XSTATE

2023-01-12 Thread Peter Zijlstra
vmlinux.o: warning: objtool: intel_idle_s2idle+0xd5: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_xstate+0x11: call to fpu_idle_fpregs() leaves .noinstr.text section
vmlinux.o: warning: objtool: fpu_idle_fpregs+0x9: call to xfeatures_in_use() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/include/asm/fpu/xcr.h   |4 ++--
 arch/x86/include/asm/special_insns.h |2 +-
 arch/x86/kernel/fpu/core.c   |4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/fpu/xcr.h
+++ b/arch/x86/include/asm/fpu/xcr.h
@@ -5,7 +5,7 @@
 #define XCR_XFEATURE_ENABLED_MASK  0x
 #define XCR_XFEATURE_IN_USE_MASK   0x0001
 
-static inline u64 xgetbv(u32 index)
+static __always_inline u64 xgetbv(u32 index)
 {
u32 eax, edx;
 
@@ -27,7 +27,7 @@ static inline void xsetbv(u32 index, u64
  *
  * Callers should check X86_FEATURE_XGETBV1.
  */
-static inline u64 xfeatures_in_use(void)
+static __always_inline u64 xfeatures_in_use(void)
 {
return xgetbv(XCR_XFEATURE_IN_USE_MASK);
 }
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -295,7 +295,7 @@ static inline int enqcmds(void __iomem *
return 0;
 }
 
-static inline void tile_release(void)
+static __always_inline void tile_release(void)
 {
/*
 * Instruction opcode for TILERELEASE; supported in binutils
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -856,12 +856,12 @@ int fpu__exception_code(struct fpu *fpu,
  * Initialize register state that may prevent from entering low-power idle.
  * This function will be invoked from the cpuidle driver only when needed.
  */
-void fpu_idle_fpregs(void)
+noinstr void fpu_idle_fpregs(void)
 {
/* Note: AMX_TILE being enabled implies XGETBV1 support */
if (cpu_feature_enabled(X86_FEATURE_AMX_TILE) &&
(xfeatures_in_use() & XFEATURE_MASK_XTILE)) {
tile_release();
-   fpregs_deactivate(&current->thread.fpu);
+   __this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
 }




[PATCH v3 02/51] x86/idle: Replace x86_idle with a static_call

2023-01-12 Thread Peter Zijlstra
Typical boot time setup; no need to suffer an indirect call for that.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Frederic Weisbecker 
Reviewed-by: Rafael J. Wysocki 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/kernel/process.c |   50 +-
 1 file changed, 28 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -692,7 +693,23 @@ void __switch_to_xtra(struct task_struct
 unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-static void (*x86_idle)(void);
+/*
+ * We use this if we don't have any better idle routine..
+ */
+void __cpuidle default_idle(void)
+{
+   raw_safe_halt();
+}
+#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
+EXPORT_SYMBOL(default_idle);
+#endif
+
+DEFINE_STATIC_CALL_NULL(x86_idle, default_idle);
+
+static bool x86_idle_set(void)
+{
+   return !!static_call_query(x86_idle);
+}
 
 #ifndef CONFIG_SMP
 static inline void play_dead(void)
@@ -715,28 +732,17 @@ void arch_cpu_idle_dead(void)
 /*
  * Called from the generic idle code.
  */
-void arch_cpu_idle(void)
-{
-   x86_idle();
-}
-
-/*
- * We use this if we don't have any better idle routine..
- */
-void __cpuidle default_idle(void)
+void __cpuidle arch_cpu_idle(void)
 {
-   raw_safe_halt();
+   static_call(x86_idle)();
 }
-#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
-EXPORT_SYMBOL(default_idle);
-#endif
 
 #ifdef CONFIG_XEN
 bool xen_set_default_idle(void)
 {
-   bool ret = !!x86_idle;
+   bool ret = x86_idle_set();
 
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
 
return ret;
 }
@@ -859,20 +865,20 @@ void select_idle_routine(const struct cp
if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
 #endif
-   if (x86_idle || boot_option_idle_override == IDLE_POLL)
+   if (x86_idle_set() || boot_option_idle_override == IDLE_POLL)
return;
 
if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
pr_info("using AMD E400 aware idle routine\n");
-   x86_idle = amd_e400_idle;
+   static_call_update(x86_idle, amd_e400_idle);
} else if (prefer_mwait_c1_over_halt(c)) {
pr_info("using mwait in idle threads\n");
-   x86_idle = mwait_idle;
+   static_call_update(x86_idle, mwait_idle);
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
pr_info("using TDX aware idle routine\n");
-   x86_idle = tdx_safe_halt;
+   static_call_update(x86_idle, tdx_safe_halt);
} else
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
 }
 
 void amd_e400_c1e_apic_setup(void)
@@ -925,7 +931,7 @@ static int __init idle_setup(char *str)
 * To continue to load the CPU idle driver, don't touch
 * the boot_option_idle_override.
 */
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
boot_option_idle_override = IDLE_HALT;
} else if (!strcmp(str, "nomwait")) {
/*




[PATCH v3 00/51] cpuidle,rcu: Clean up the mess

2023-01-12 Thread Peter Zijlstra
Hi All!

The (hopefully) final respin of cpuidle vs rcu cleanup patches. Barring any
objections I'll be queueing these patches in tip/sched/core in the next few
days.

v2: https://lkml.kernel.org/r/20220919095939.761690...@infradead.org

These here patches clean up the mess that is cpuidle vs rcuidle.

At the end of the ride there's only one RCU_NONIDLE user left:

  arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());

And I know Mark has been prodding that with something sharp.

The last version was tested by a number of people and I'm hoping to not have
broken anything in the meantime ;-)


Changes since v2:

 - rebased to v6.2-rc3; as available at:
 git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/idle

 - folded: 
https://lkml.kernel.org/r/y3ubwyny15etu...@hirez.programming.kicks-ass.net
   which makes the ARM cpuidle index 0 consistently not use
   CPUIDLE_FLAG_RCU_IDLE, as requested by Ulf.

 - added a few more __always_inline to empty stub functions as found by the
   robot.

 - Used _RET_IP_ instead of _THIS_IP_ in a few places because of:
   https://github.com/ClangBuiltLinux/linux/issues/263

 - Added new patches to address various robot reports:

 #35:  trace,hardirq: No moar _rcuidle() tracing
 #47:  cpuidle: Ensure ct_cpuidle_enter() is always called from 
noinstr/__cpuidle
 #48:  cpuidle,arch: Mark all ct_cpuidle_enter() callers __cpuidle
 #49:  cpuidle,arch: Mark all regular cpuidle_state::enter methods __cpuidle
 #50:  cpuidle: Comments about noinstr/__cpuidle
 #51:  context_tracking: Fix noinstr vs KASAN


---
 arch/alpha/kernel/process.c   |  1 -
 arch/alpha/kernel/vmlinux.lds.S   |  1 -
 arch/arc/kernel/process.c |  3 ++
 arch/arc/kernel/vmlinux.lds.S |  1 -
 arch/arm/include/asm/vmlinux.lds.h|  1 -
 arch/arm/kernel/cpuidle.c |  4 +-
 arch/arm/kernel/process.c |  1 -
 arch/arm/kernel/smp.c |  6 +--
 arch/arm/mach-davinci/cpuidle.c   |  4 +-
 arch/arm/mach-gemini/board-dt.c   |  3 +-
 arch/arm/mach-imx/cpuidle-imx5.c  |  4 +-
 arch/arm/mach-imx/cpuidle-imx6q.c |  8 ++--
 arch/arm/mach-imx/cpuidle-imx6sl.c|  4 +-
 arch/arm/mach-imx/cpuidle-imx6sx.c|  9 ++--
 arch/arm/mach-imx/cpuidle-imx7ulp.c   |  4 +-
 arch/arm/mach-omap2/common.h  |  6 ++-
 arch/arm/mach-omap2/cpuidle34xx.c | 16 ++-
 arch/arm/mach-omap2/cpuidle44xx.c | 29 +++--
 arch/arm/mach-omap2/omap-mpuss-lowpower.c | 12 +-
 arch/arm/mach-omap2/pm.h  |  2 +-
 arch/arm/mach-omap2/pm24xx.c  | 51 +-
 arch/arm/mach-omap2/pm34xx.c  | 14 +--
 arch/arm/mach-omap2/pm44xx.c  |  2 +-
 arch/arm/mach-omap2/powerdomain.c | 10 ++---
 arch/arm/mach-s3c/cpuidle-s3c64xx.c   |  5 +--
 arch/arm64/kernel/cpuidle.c   |  2 +-
 arch/arm64/kernel/idle.c  |  1 -
 arch/arm64/kernel/smp.c   |  4 +-
 arch/arm64/kernel/vmlinux.lds.S   |  1 -
 arch/csky/kernel/process.c|  1 -
 arch/csky/kernel/smp.c|  2 +-
 arch/csky/kernel/vmlinux.lds.S|  1 -
 arch/hexagon/kernel/process.c |  1 -
 arch/hexagon/kernel/vmlinux.lds.S |  1 -
 arch/ia64/kernel/process.c|  1 +
 arch/ia64/kernel/vmlinux.lds.S|  1 -
 arch/loongarch/kernel/idle.c  |  1 +
 arch/loongarch/kernel/vmlinux.lds.S   |  1 -
 arch/m68k/kernel/vmlinux-nommu.lds|  1 -
 arch/m68k/kernel/vmlinux-std.lds  |  1 -
 arch/m68k/kernel/vmlinux-sun3.lds |  1 -
 arch/microblaze/kernel/process.c  |  1 -
 arch/microblaze/kernel/vmlinux.lds.S  |  1 -
 arch/mips/kernel/idle.c   | 14 +++
 arch/mips/kernel/vmlinux.lds.S|  1 -
 arch/nios2/kernel/process.c   |  1 -
 arch/nios2/kernel/vmlinux.lds.S   |  1 -
 arch/openrisc/kernel/process.c|  1 +
 arch/openrisc/kernel/vmlinux.lds.S|  1 -
 arch/parisc/kernel/process.c  |  2 -
 arch/parisc/kernel/vmlinux.lds.S  |  1 -
 arch/powerpc/kernel/idle.c|  5 +--
 arch/powerpc/kernel/vmlinux.lds.S |  1 -
 arch/riscv/kernel/process.c   |  1 -
 arch/riscv/kernel/vmlinux-xip.lds.S   |  1 -
 arch/riscv/kernel/vmlinux.lds.S   |  1 -
 arch/s390/kernel/idle.c   |  1 -
 arch/s390/kernel/vmlinux.lds.S|  1 -
 arch/sh/kernel/idle.c |  1 +
 arch/sh/kernel/vmlinux.lds.S  |  1 -
 arch/sparc/kernel/leon_pmc.c  |  4 ++
 arch/sparc/kernel/process_32.c|  1 -
 arch/sparc/kernel/process_64.c|  3 +-
 arch/sparc/kernel/vmlinux.lds.S   |  1 -
 arch/um/kernel/dyn.lds.S  |  1 -
 arch/um/kernel/process.c 

[PATCH v3 09/51] cpuidle,omap3: Push RCU-idle into driver

2023-01-12 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Notably the cpu_pm_*() calls implicitly re-enable RCU for a bit.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Frederic Weisbecker 
Reviewed-by: Tony Lindgren 
Acked-by: Rafael J. Wysocki 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/mach-omap2/cpuidle34xx.c |   16 
 1 file changed, 16 insertions(+)

--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,7 +133,9 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
+   ct_idle_enter();
omap_sram_idle();
+   ct_idle_exit();
 
/*
 * Call idle CPU PM enter notifier chain to restore
@@ -265,6 +267,7 @@ static struct cpuidle_driver omap3_idle_
.owner= THIS_MODULE,
.states = {
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 2 + 2,
.target_residency = 5,
@@ -272,6 +275,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 10 + 10,
.target_residency = 30,
@@ -279,6 +283,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 50 + 50,
.target_residency = 300,
@@ -286,6 +291,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU RET + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 1500 + 1800,
.target_residency = 4000,
@@ -293,6 +299,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU OFF + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 2500 + 7500,
.target_residency = 12000,
@@ -300,6 +307,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU RET + CORE RET",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 3000 + 8500,
.target_residency = 15000,
@@ -307,6 +315,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU OFF + CORE RET",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 1 + 3,
.target_residency = 3,
@@ -328,6 +337,7 @@ static struct cpuidle_driver omap3430_id
.owner= THIS_MODULE,
.states = {
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 110 + 162,
.target_residency = 5,
@@ -335,6 +345,7 @@ static struct cpuidle_driver omap3430_id
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 106 + 180,
.target_residency = 309,
@@ -342,6 +353,7 @@ static struct cpuidle_driver omap3430_id
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 107 + 410,
.target_residency = 46057,
@@ -349,6 +361,7 @@ static struct cpuidle_driver omap3430_id
.desc = "MPU RET + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter

[PATCH v3 16/51] cpuidle: Annotate poll_idle()

2023-01-12 Thread Peter Zijlstra
The __cpuidle functions will become a noinstr class, as such they need
explicit annotations.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/cpuidle/poll_state.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -13,7 +13,10 @@
 static int __cpuidle poll_idle(struct cpuidle_device *dev,
   struct cpuidle_driver *drv, int index)
 {
-   u64 time_start = local_clock();
+   u64 time_start;
+
+   instrumentation_begin();
+   time_start = local_clock();
 
dev->poll_time_limit = false;
 
@@ -39,6 +42,7 @@ static int __cpuidle poll_idle(struct cp
raw_local_irq_disable();
 
current_clr_polling();
+   instrumentation_end();
 
return index;
 }




[PATCH v3 50/51] cpuidle: Comments about noinstr/__cpuidle

2023-01-12 Thread Peter Zijlstra
Add a few words on noinstr / __cpuidle usage.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle.c  |   12 
 include/linux/compiler_types.h |   10 ++
 2 files changed, 22 insertions(+)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -252,6 +252,18 @@ noinstr int cpuidle_enter_state(struct c
instrumentation_begin();
}
 
+   /*
+* NOTE!!
+*
+* For cpuidle_state::enter() methods that do *NOT* set
+* CPUIDLE_FLAG_RCU_IDLE RCU will be disabled here and these functions
+* must be marked either noinstr or __cpuidle.
+*
+* For cpuidle_state::enter() methods that *DO* set
+* CPUIDLE_FLAG_RCU_IDLE this isn't required, but they must mark the
+* function calling ct_cpuidle_enter() as noinstr/__cpuidle and all
+* functions called within the RCU-idle region.
+*/
entered_state = target_state->enter(dev, drv, index);
 
if (WARN_ONCE(!irqs_disabled(), "%ps leaked IRQ state", target_state->enter))
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -233,6 +233,16 @@ struct ftrace_likely_data {
 
 #define noinstr __noinstr_section(".noinstr.text")
 
+/*
+ * The __cpuidle section is used twofold:
+ *
+ *  1) the original use -- identifying if a CPU is 'stuck' in idle state based
+ * on its instruction pointer. See cpu_in_idle().
+ *
+ *  2) suppressing instrumentation around where cpuidle disables RCU; where the
+ * function isn't strictly required for #1, this is interchangeable with
+ * noinstr.
+ */
 #define __cpuidle __noinstr_section(".cpuidle.text")
 
 #endif /* __KERNEL__ */




[PATCH v3 13/51] cpuidle: Fix ct_idle_*() usage

2023-01-12 Thread Peter Zijlstra
The whole disable-RCU, enable-IRQs dance is very intricate since
changing IRQ state is traced, which depends on RCU.

Add two helpers for the cpuidle case that mirror the entry code.
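
[ The helpers themselves land in include/linux/cpuidle.h (hunk not
  quoted below); roughly, they wrap ct_idle_enter()/ct_idle_exit() with
  the lockdep and tracing IRQ bookkeeping done in entry-code order. A
  sketch of the enter side, paraphrased rather than verbatim:

	static __always_inline void ct_cpuidle_enter(void)
	{
		lockdep_assert_irqs_disabled();
		/*
		 * Idle may (briefly) re-enable IRQs; do the IRQ tracing
		 * before RCU is switched off, like the entry code does.
		 */
		trace_hardirqs_on_prepare();
		lockdep_hardirqs_on_prepare();
		instrumentation_end();
		ct_idle_enter();
		lockdep_hardirqs_on(_RET_IP_);
	}
]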

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/mach-imx/cpuidle-imx6q.c|4 +--
 arch/arm/mach-imx/cpuidle-imx6sx.c   |4 +--
 arch/arm/mach-omap2/cpuidle34xx.c|4 +--
 arch/arm/mach-omap2/cpuidle44xx.c|8 +++---
 drivers/acpi/processor_idle.c|8 --
 drivers/cpuidle/cpuidle-big_little.c |4 +--
 drivers/cpuidle/cpuidle-mvebu-v7.c   |4 +--
 drivers/cpuidle/cpuidle-psci.c   |4 +--
 drivers/cpuidle/cpuidle-riscv-sbi.c  |4 +--
 drivers/cpuidle/cpuidle-tegra.c  |8 +++---
 drivers/cpuidle/cpuidle.c|   11 
 include/linux/clockchips.h   |4 +--
 include/linux/cpuidle.h  |   34 --
 kernel/sched/idle.c  |   45 ++-
 kernel/time/tick-broadcast.c |6 +++-
 15 files changed, 86 insertions(+), 66 deletions(-)

--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -25,9 +25,9 @@ static int imx6q_enter_wait(struct cpuid
imx6_set_lpm(WAIT_UNCLOCKED);
raw_spin_unlock(&cpuidle_lock);
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
cpu_do_idle();
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
raw_spin_lock(&cpuidle_lock);
if (num_idle_cpus-- == num_online_cpus())
--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -47,9 +47,9 @@ static int imx6sx_enter_wait(struct cpui
cpu_pm_enter();
cpu_cluster_pm_enter();
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
cpu_suspend(0, imx6sx_idle_finish);
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
cpu_cluster_pm_exit();
cpu_pm_exit();
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,9 +133,9 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
-   ct_idle_enter();
+   ct_cpuidle_enter();
omap_sram_idle();
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
/*
 * Call idle CPU PM enter notifier chain to restore
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,9 +105,9 @@ static int omap_enter_idle_smp(struct cp
}
raw_spin_unlock_irqrestore(&mpu_lock, flag);
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
raw_spin_lock_irqsave(&mpu_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -186,10 +186,10 @@ static int omap_enter_idle_coupled(struc
}
}
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
cpu_done[dev->cpu] = true;
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -642,6 +642,8 @@ static int __cpuidle acpi_idle_enter_bm(
 */
bool dis_bm = pr->flags.bm_control;
 
+   instrumentation_begin();
+
/* If we can skip BM, demote to a safe state. */
if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
dis_bm = false;
@@ -663,11 +665,11 @@ static int __cpuidle acpi_idle_enter_bm(
raw_spin_unlock(&c3_lock);
}
 
-   ct_idle_enter();
+   ct_cpuidle_enter();
 
acpi_idle_do_entry(cx);
 
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
/* Re-enable bus master arbitration */
if (dis_bm) {
@@ -677,6 +679,8 @@ static int __cpuidle acpi_idle_enter_bm(
raw_spin_unlock(&c3_lock);
}
 
+   instrumentation_end();
+
return index;
 }
 
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -126,13 +126,13 @@ static int bl_enter_powerdown(struct cpu
struct cpuidle_driver *drv, int idx)
 {
cpu_pm_enter();
-   ct_idle_enter();
+   ct_cpuidle_enter();
 
cpu_suspend(0, bl_powerdown_finisher);
 
/* signals the MCPM core that CPU is out of low power state */
mcpm_cpu_powered_up();
-   ct_idle_exit();
+   ct_cpuidle_exit();
 
cpu_pm_exit();
 
--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -36,9 +36,9 @@ static int mvebu_v7_enter_idle(struct cp
if (drv->states[index].flags & MVEBU_V7_FLAG_DEEP_IDLE)
deepidle = true;
 
-   

[PATCH v3 34/51] trace: WARN on rcuidle

2023-01-12 Thread Peter Zijlstra
ARCH_WANTS_NO_INSTR (a superset of CONFIG_GENERIC_ENTRY) disallows any
and all tracing when RCU isn't enabled.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 include/linux/tracepoint.h |   15 +--
 kernel/trace/trace.c   |3 +++
 2 files changed, 16 insertions(+), 2 deletions(-)

--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -178,6 +178,17 @@ static inline struct tracepoint *tracepo
 #endif /* CONFIG_HAVE_STATIC_CALL */
 
 /*
+ * ARCH_WANTS_NO_INSTR archs are expected to have sanitized entry and idle
+ * code that disallow any/all tracing/instrumentation when RCU isn't watching.
+ */
+#ifdef CONFIG_ARCH_WANTS_NO_INSTR
+#define RCUIDLE_COND(rcuidle)  (rcuidle)
+#else
+/* srcu can't be used from NMI */
+#define RCUIDLE_COND(rcuidle)  (rcuidle && in_nmi())
+#endif
+
+/*
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
  */
@@ -188,8 +199,8 @@ static inline struct tracepoint *tracepo
if (!(cond))\
return; \
\
-   /* srcu can't be used from NMI */   \
-   WARN_ON_ONCE(rcuidle && in_nmi());  \
+   if (WARN_ON_ONCE(RCUIDLE_COND(rcuidle)))\
+   return; \
\
/* keep srcu and sched-rcu usage consistent */  \
preempt_disable_notrace();  \
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3119,6 +3119,9 @@ void __trace_stack(struct trace_array *t
return;
}
 
+   if (WARN_ON_ONCE(IS_ENABLED(CONFIG_GENERIC_ENTRY)))
+   return;
+
/*
 * When an NMI triggers, RCU is enabled via ct_nmi_enter(),
 * but if the above rcu_is_watching() failed, then the NMI




[PATCH v3 29/51] cpuidle,tdx: Make tdx noinstr clean

2023-01-12 Thread Peter Zijlstra
vmlinux.o: warning: objtool: __halt+0x2c: call to hcall_func.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: __halt+0x3f: call to __tdx_hypercall() leaves .noinstr.text section
vmlinux.o: warning: objtool: __tdx_hypercall+0x66: call to __tdx_hypercall_failed() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/boot/compressed/vmlinux.lds.S |1 +
 arch/x86/coco/tdx/tdcall.S |2 ++
 arch/x86/coco/tdx/tdx.c|5 +++--
 3 files changed, 6 insertions(+), 2 deletions(-)

--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -34,6 +34,7 @@ SECTIONS
_text = .;  /* Text */
*(.text)
*(.text.*)
+   *(.noinstr.text)
_etext = . ;
}
.rodata : {
--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -31,6 +31,8 @@
  TDX_R12 | TDX_R13 | \
  TDX_R14 | TDX_R15 )
 
+.section .noinstr.text, "ax"
+
 /*
  * __tdx_module_call()  - Used by TDX guests to request services from
  * the TDX module (does not include VMM services) using TDCALL instruction.
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -53,8 +53,9 @@ static inline u64 _tdx_hypercall(u64 fn,
 }
 
 /* Called from __tdx_hypercall() for unrecoverable failure */
-void __tdx_hypercall_failed(void)
+noinstr void __tdx_hypercall_failed(void)
 {
+   instrumentation_begin();
panic("TDVMCALL failed. TDX module bug?");
 }
 
@@ -64,7 +65,7 @@ void __tdx_hypercall_failed(void)
  * Reusing the KVM EXIT_REASON macros makes it easier to connect the host and
  * guest sides of these calls.
  */
-static u64 hcall_func(u64 exit_reason)
+static __always_inline u64 hcall_func(u64 exit_reason)
 {
return exit_reason;
 }




[PATCH v3 20/51] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IBRS

2023-01-12 Thread Peter Zijlstra
vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/kernel/cpu/bugs.c |2 +-
 drivers/idle/intel_idle.c  |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -79,7 +79,7 @@ void write_spec_ctrl_current(u64 val, bo
wrmsrl(MSR_IA32_SPEC_CTRL, val);
 }
 
-u64 spec_ctrl_current(void)
+noinstr u64 spec_ctrl_current(void)
 {
return this_cpu_read(x86_spec_ctrl_current);
 }
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -181,12 +181,12 @@ static __cpuidle int intel_idle_ibrs(str
int ret;
 
if (smt_active)
-   wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+   native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
 
ret = __intel_idle(dev, drv, index);
 
if (smt_active)
-   wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+   native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
 
return ret;
 }




[PATCH v3 22/51] x86/tdx: Remove TDX_HCALL_ISSUE_STI

2023-01-12 Thread Peter Zijlstra
Now that arch_cpu_idle() is expected to return with IRQs disabled,
avoid the useless STI/CLI dance.

Per the specs this is supposed to work, but nobody has yet relied upon
this behaviour, so broken implementations are possible.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/coco/tdx/tdcall.S|   13 -
 arch/x86/coco/tdx/tdx.c   |   23 ---
 arch/x86/include/asm/shared/tdx.h |1 -
 3 files changed, 4 insertions(+), 33 deletions(-)

--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -139,19 +139,6 @@ SYM_FUNC_START(__tdx_hypercall)
 
movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
-   /*
-* For the idle loop STI needs to be called directly before the TDCALL
-* that enters idle (EXIT_REASON_HLT case). STI instruction enables
-* interrupts only one instruction later. If there is a window between
-* STI and the instruction that emulates the HALT state, there is a
-* chance for interrupts to happen in this window, which can delay the
-* HLT operation indefinitely. Since this is the not the desired
-* result, conditionally call STI before TDCALL.
-*/
-   testq $TDX_HCALL_ISSUE_STI, %rsi
-   jz .Lskip_sti
-   sti
-.Lskip_sti:
tdcall
 
/*
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -169,7 +169,7 @@ static int ve_instr_len(struct ve_info *
}
 }
 
-static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
+static u64 __cpuidle __halt(const bool irq_disabled)
 {
struct tdx_hypercall_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
@@ -189,20 +189,14 @@ static u64 __cpuidle __halt(const bool i
 * can keep the vCPU in virtual HLT, even if an IRQ is
 * pending, without hanging/breaking the guest.
 */
-   return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
+   return __tdx_hypercall(&args, 0);
 }
 
 static int handle_halt(struct ve_info *ve)
 {
-   /*
-* Since non safe halt is mainly used in CPU offlining
-* and the guest will always stay in the halt state, don't
-* call the STI instruction (set do_sti as false).
-*/
const bool irq_disabled = irqs_disabled();
-   const bool do_sti = false;
 
-   if (__halt(irq_disabled, do_sti))
+   if (__halt(irq_disabled))
return -EIO;
 
return ve_instr_len(ve);
@@ -210,22 +204,13 @@ static int handle_halt(struct ve_info *v
 
 void __cpuidle tdx_safe_halt(void)
 {
-/*
- * For do_sti=true case, __tdx_hypercall() function enables
- * interrupts using the STI instruction before the TDCALL. So
- * set irq_disabled as false.
- */
const bool irq_disabled = false;
-   const bool do_sti = true;
 
/*
 * Use WARN_ONCE() to report the failure.
 */
-   if (__halt(irq_disabled, do_sti))
+   if (__halt(irq_disabled))
WARN_ONCE(1, "HLT instruction emulation failed\n");
-
-   /* XXX I can't make sense of what @do_sti actually does */
-   raw_local_irq_disable();
 }
 
 static int read_msr(struct pt_regs *regs, struct ve_info *ve)
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -8,7 +8,6 @@
 #define TDX_HYPERCALL_STANDARD  0
 
 #define TDX_HCALL_HAS_OUTPUT   BIT(0)
-#define TDX_HCALL_ISSUE_STIBIT(1)
 
 #define TDX_CPUID_LEAF_ID  0x21
 #define TDX_IDENT  "IntelTDX"




[PATCH v3 51/51] context_tracking: Fix noinstr vs KASAN

2023-01-12 Thread Peter Zijlstra
vmlinux.o: warning: objtool: __ct_user_enter+0x72: call to 
__kasan_check_write() leaves .noinstr.text section
vmlinux.o: warning: objtool: __ct_user_exit+0x47: call to __kasan_check_write() 
leaves .noinstr.text section
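
For context: the instrumented atomic_*() wrappers add sanitizer hooks
around the raw arch_atomic_*() ops, roughly like the following sketch
(simplified from the generated include/linux/atomic/atomic-instrumented.h):

static __always_inline void
atomic_set(atomic_t *v, int i)
{
	instrument_atomic_write(v, sizeof(*v));	/* KASAN/KCSAN hook */
	arch_atomic_set(v, i);			/* raw, uninstrumented op */
}

Calling arch_atomic_*() directly therefore keeps the __kasan_check_*()
calls out of the noinstr region.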

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/context_tracking.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -510,7 +510,7 @@ void noinstr __ct_user_enter(enum ctx_st
 * In this we case we don't care about any 
concurrency/ordering.
 */
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE))
-   atomic_set(&ct->state, state);
+   arch_atomic_set(&ct->state, state);
} else {
/*
 * Even if context tracking is disabled on this CPU, 
because it's outside
@@ -527,7 +527,7 @@ void noinstr __ct_user_enter(enum ctx_st
 */
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) {
/* Tracking for vtime only, no concurrent RCU 
EQS accounting */
-   atomic_set(&ct->state, state);
+   arch_atomic_set(&ct->state, state);
} else {
/*
 * Tracking for vtime and RCU EQS. Make sure we 
don't race
@@ -535,7 +535,7 @@ void noinstr __ct_user_enter(enum ctx_st
 * RCU only requires RCU_DYNTICKS_IDX 
increments to be fully
 * ordered.
 */
-   atomic_add(state, &ct->state);
+   arch_atomic_add(state, &ct->state);
}
}
}
@@ -630,12 +630,12 @@ void noinstr __ct_user_exit(enum ctx_sta
 * In this we case we don't care about any 
concurrency/ordering.
 */
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE))
-   atomic_set(&ct->state, CONTEXT_KERNEL);
+   arch_atomic_set(&ct->state, CONTEXT_KERNEL);
 
} else {
if (!IS_ENABLED(CONFIG_CONTEXT_TRACKING_IDLE)) {
/* Tracking for vtime only, no concurrent RCU 
EQS accounting */
-   atomic_set(&ct->state, CONTEXT_KERNEL);
+   arch_atomic_set(&ct->state, CONTEXT_KERNEL);
} else {
/*
 * Tracking for vtime and RCU EQS. Make sure we 
don't race
@@ -643,7 +643,7 @@ void noinstr __ct_user_exit(enum ctx_sta
 * RCU only requires RCU_DYNTICKS_IDX 
increments to be fully
 * ordered.
 */
-   atomic_sub(state, &ct->state);
+   arch_atomic_sub(state, &ct->state);
}
}
}




[PATCH v3 39/51] arm,omap2: Use WFI for omap2_pm_idle()

2023-01-12 Thread Peter Zijlstra
arch_cpu_idle() is a very simple idle interface: it exposes only a
single idle state and is expected to neither require RCU nor do any
tracing/instrumentation.

As such, omap2_pm_idle() is not a valid implementation. Replace it
with a simple (shallow) omap2_do_wfi() call.

OMAP2 doesn't have a cpuidle driver; adding one would be the way to
(re)gain the other idle states.
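
For reference, the shallow state this reduces to is a bare Wait For
Interrupt; a minimal sketch (the MCR form is the pre-ARMv7 encoding
of WFI that the OMAP2 code uses):

	/* ARMv6 CP15 encoding of WFI; ARMv7+ spells this "wfi" */
	asm volatile("mcr p15, 0, %0, c7, c0, 4"
		     : : "r" (0) : "memory", "cc");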

Suggested-by: Tony Lindgren 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/mach-omap2/pm24xx.c |   51 +--
 1 file changed, 2 insertions(+), 49 deletions(-)

--- a/arch/arm/mach-omap2/pm24xx.c
+++ b/arch/arm/mach-omap2/pm24xx.c
@@ -116,50 +116,12 @@ static int omap2_enter_full_retention(vo
 
 static int sti_console_enabled;
 
-static int omap2_allow_mpu_retention(void)
-{
-   if (!omap2xxx_cm_mpu_retention_allowed())
-   return 0;
-   if (sti_console_enabled)
-   return 0;
-
-   return 1;
-}
-
-static void omap2_enter_mpu_retention(void)
+static void omap2_do_wfi(void)
 {
const int zero = 0;
 
-   /* The peripherals seem not to be able to wake up the MPU when
-* it is in retention mode. */
-   if (omap2_allow_mpu_retention()) {
-   /* REVISIT: These write to reserved bits? */
-   omap_prm_clear_mod_irqs(CORE_MOD, PM_WKST1, ~0);
-   omap_prm_clear_mod_irqs(CORE_MOD, OMAP24XX_PM_WKST2, ~0);
-   omap_prm_clear_mod_irqs(WKUP_MOD, PM_WKST, ~0);
-
-   /* Try to enter MPU retention */
-   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_RET);
-
-   } else {
-   /* Block MPU retention */
-   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
-   }
-
/* WFI */
asm("mcr p15, 0, %0, c7, c0, 4" : : "r" (zero) : "memory", "cc");
-
-   pwrdm_set_next_pwrst(mpu_pwrdm, PWRDM_POWER_ON);
-}
-
-static int omap2_can_sleep(void)
-{
-   if (omap2xxx_cm_fclks_active())
-   return 0;
-   if (__clk_is_enabled(osc_ck))
-   return 0;
-
-   return 1;
 }
 
 static void omap2_pm_idle(void)
@@ -169,16 +131,7 @@ static void omap2_pm_idle(void)
if (omap_irq_pending())
return;
 
-   error = cpu_cluster_pm_enter();
-   if (error || !omap2_can_sleep()) {
-   omap2_enter_mpu_retention();
-   goto out_cpu_cluster_pm;
-   }
-
-   omap2_enter_full_retention();
-
-out_cpu_cluster_pm:
-   cpu_cluster_pm_exit();
+   omap2_do_wfi();
 }
 
 static void __init prcm_setup_regs(void)




[PATCH v3 44/51] entry,kasan,x86: Disallow overriding mem*() functions

2023-01-12 Thread Peter Zijlstra
KASAN cannot just hijack the mem*() functions; it needs to emit
__asan_mem*() variants if it wants instrumentation (other sanitizers
already do this).

vmlinux.o: warning: objtool: sync_regs+0x24: call to memcpy() leaves 
.noinstr.text section
vmlinux.o: warning: objtool: vc_switch_off_ist+0xbe: call to memcpy() leaves 
.noinstr.text section
vmlinux.o: warning: objtool: fixup_bad_iret+0x36: call to memset() leaves 
.noinstr.text section
vmlinux.o: warning: objtool: __sev_get_ghcb+0xa0: call to memcpy() leaves 
.noinstr.text section
vmlinux.o: warning: objtool: __sev_put_ghcb+0x35: call to memcpy() leaves 
.noinstr.text section

Remove the weak aliases to ensure nobody hijacks these functions and
add them to the noinstr section.
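
Conceptually, a KASAN-capable compiler then rewrites instrumented call
sites to the checked wrappers, while noinstr code keeps the raw calls;
a sketch, not actual compiler output:

	/* in instrumented code the compiler emits: */
	__asan_memcpy(dst, src, len);	/* checks shadow, then __memcpy() */

	/* in .noinstr.text only the uninstrumented alias is reachable: */
	__memcpy(dst, src, len);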

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/lib/memcpy_64.S  |5 ++---
 arch/x86/lib/memmove_64.S |4 +++-
 arch/x86/lib/memset_64.S  |4 +++-
 mm/kasan/kasan.h  |4 
 mm/kasan/shadow.c |   38 ++
 tools/objtool/check.c |3 +++
 6 files changed, 53 insertions(+), 5 deletions(-)

--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -7,7 +7,7 @@
 #include 
 #include 
 
-.pushsection .noinstr.text, "ax"
+.section .noinstr.text, "ax"
 
 /*
  * We build a jump to memcpy_orig by default which gets NOPped out on
@@ -42,7 +42,7 @@ SYM_FUNC_START(__memcpy)
 SYM_FUNC_END(__memcpy)
 EXPORT_SYMBOL(__memcpy)
 
-SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy)
+SYM_FUNC_ALIAS(memcpy, __memcpy)
 EXPORT_SYMBOL(memcpy)
 
 /*
@@ -183,4 +183,3 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
RET
 SYM_FUNC_END(memcpy_orig)
 
-.popsection
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -13,6 +13,8 @@
 
 #undef memmove
 
+.section .noinstr.text, "ax"
+
 /*
  * Implement memmove(). This can handle overlap between src and dst.
  *
@@ -213,5 +215,5 @@ SYM_FUNC_START(__memmove)
 SYM_FUNC_END(__memmove)
 EXPORT_SYMBOL(__memmove)
 
-SYM_FUNC_ALIAS_WEAK(memmove, __memmove)
+SYM_FUNC_ALIAS(memmove, __memmove)
 EXPORT_SYMBOL(memmove)
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+.section .noinstr.text, "ax"
+
 /*
  * ISO C memset - set a memory block to a byte value. This function uses fast
  * string to get better performance than the original function. The code is
@@ -43,7 +45,7 @@ SYM_FUNC_START(__memset)
 SYM_FUNC_END(__memset)
 EXPORT_SYMBOL(__memset)
 
-SYM_FUNC_ALIAS_WEAK(memset, __memset)
+SYM_FUNC_ALIAS(memset, __memset)
 EXPORT_SYMBOL(memset)
 
 /*
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -551,6 +551,10 @@ void __asan_set_shadow_f3(const void *ad
 void __asan_set_shadow_f5(const void *addr, size_t size);
 void __asan_set_shadow_f8(const void *addr, size_t size);
 
+void *__asan_memset(void *addr, int c, size_t len);
+void *__asan_memmove(void *dest, const void *src, size_t len);
+void *__asan_memcpy(void *dest, const void *src, size_t len);
+
 void __hwasan_load1_noabort(unsigned long addr);
 void __hwasan_store1_noabort(unsigned long addr);
 void __hwasan_load2_noabort(unsigned long addr);
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -38,6 +38,12 @@ bool __kasan_check_write(const volatile
 }
 EXPORT_SYMBOL(__kasan_check_write);
 
+#ifndef CONFIG_GENERIC_ENTRY
+/*
+ * CONFIG_GENERIC_ENTRY relies on compiler emitted mem*() calls to not be
+ * instrumented. KASAN enabled toolchains should emit __asan_mem*() functions
+ * for the sites they want to instrument.
+ */
 #undef memset
 void *memset(void *addr, int c, size_t len)
 {
@@ -68,6 +74,38 @@ void *memcpy(void *dest, const void *src
 
return __memcpy(dest, src, len);
 }
+#endif
+
+void *__asan_memset(void *addr, int c, size_t len)
+{
+   if (!kasan_check_range((unsigned long)addr, len, true, _RET_IP_))
+   return NULL;
+
+   return __memset(addr, c, len);
+}
+EXPORT_SYMBOL(__asan_memset);
+
+#ifdef __HAVE_ARCH_MEMMOVE
+void *__asan_memmove(void *dest, const void *src, size_t len)
+{
+   if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) ||
+   !kasan_check_range((unsigned long)dest, len, true, _RET_IP_))
+   return NULL;
+
+   return __memmove(dest, src, len);
+}
+EXPORT_SYMBOL(__asan_memmove);
+#endif
+
+void *__asan_memcpy(void *dest, const void *src, size_t len)
+{
+   if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) ||
+   !kasan_check_range((unsigned long)dest, len, true, _RET_IP_))
+   return NULL;
+
+   return __memcpy(dest, src, len);
+}
+EXPORT_SYMBOL(__asan_memcpy);
 
 void kasan_poison(const void *addr, size_t size, u8 value, bool init)
 {
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -956,6 +956,9 @@ static const char *uaccess_safe_builtin[
"__asan_store16_noabort",
"__kasan_check_read",
"__kasan_check_write",
+   "__asan_memset",
+   "__asan_memmove",
+   "__asan_memcpy",

[PATCH v3 01/51] x86/perf/amd: Remove tracing from perf_lopwr_cb()

2023-01-12 Thread Peter Zijlstra
The perf_lopwr_cb() is called from the idle routines; there is no RCU
there, so we must not enter tracing.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/events/amd/brs.c |   13 +
 arch/x86/include/asm/perf_event.h |2 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -41,18 +41,15 @@ static inline unsigned int brs_to(int id
return MSR_AMD_SAMP_BR_FROM + 2 * idx + 1;
 }
 
-static inline void set_debug_extn_cfg(u64 val)
+static __always_inline void set_debug_extn_cfg(u64 val)
 {
/* bits[4:3] must always be set to 11b */
-   wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
+   __wrmsr(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
 }
 
-static inline u64 get_debug_extn_cfg(void)
+static __always_inline u64 get_debug_extn_cfg(void)
 {
-   u64 val;
-
-   rdmsrl(MSR_AMD_DBG_EXTN_CFG, val);
-   return val;
+   return __rdmsr(MSR_AMD_DBG_EXTN_CFG);
 }
 
 static bool __init amd_brs_detect(void)
@@ -338,7 +335,7 @@ void amd_pmu_brs_sched_task(struct perf_
  * called from ACPI processor_idle.c or acpi_pad.c
  * with interrupts disabled
  */
-void perf_amd_brs_lopwr_cb(bool lopwr_in)
+void noinstr perf_amd_brs_lopwr_cb(bool lopwr_in)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
union amd_debug_extn_cfg cfg;
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -554,7 +554,7 @@ extern void perf_amd_brs_lopwr_cb(bool l
 
 DECLARE_STATIC_CALL(perf_lopwr_cb, perf_amd_brs_lopwr_cb);
 
-static inline void perf_lopwr_cb(bool lopwr_in)
+static __always_inline void perf_lopwr_cb(bool lopwr_in)
 {
static_call_mod(perf_lopwr_cb)(lopwr_in);
 }




[PATCH v3 41/51] cpuidle,clk: Remove trace_.*_rcuidle()

2023-01-12 Thread Peter Zijlstra
OMAP was the one and only user.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Ulf Hansson 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/clk/clk.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -978,12 +978,12 @@ static void clk_core_disable(struct clk_
if (--core->enable_count > 0)
return;
 
-   trace_clk_disable_rcuidle(core);
+   trace_clk_disable(core);
 
if (core->ops->disable)
core->ops->disable(core->hw);
 
-   trace_clk_disable_complete_rcuidle(core);
+   trace_clk_disable_complete(core);
 
clk_core_disable(core->parent);
 }
@@ -1037,12 +1037,12 @@ static int clk_core_enable(struct clk_co
if (ret)
return ret;
 
-   trace_clk_enable_rcuidle(core);
+   trace_clk_enable(core);
 
if (core->ops->enable)
ret = core->ops->enable(core->hw);
 
-   trace_clk_enable_complete_rcuidle(core);
+   trace_clk_enable_complete(core);
 
if (ret) {
clk_core_disable(core->parent);




[PATCH v3 18/51] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*

2023-01-12 Thread Peter Zijlstra
  vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to 
trace_hardirqs_off() leaves .noinstr.text section

As per commit 32d4fd5751ea ("cpuidle,intel_idle: Fix
CPUIDLE_FLAG_IRQ_ENABLE"):

  "must not have tracing in idle functions"

Clearly people can't read and tinker along until the splat disappears.
This straight up reverts commit d295ad34f236 ("intel_idle: Fix false
positive RCU splats due to incorrect hardirqs state").

It doesn't re-introduce the problem because preceding patches fixed it
properly.

Fixes: d295ad34f236 ("intel_idle: Fix false positive RCU splats due to 
incorrect hardirqs state")
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/idle/intel_idle.c |8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -168,13 +168,7 @@ static __cpuidle int intel_idle_irq(stru
 
raw_local_irq_enable();
ret = __intel_idle(dev, drv, index);
-
-   /*
-* The lockdep hardirqs state may be changed to 'on' with timer
-* tick interrupt followed by __do_softirq(). Use local_irq_disable()
-* to keep the hardirqs state correct.
-*/
-   local_irq_disable();
+   raw_local_irq_disable();
 
return ret;
 }




[PATCH v3 26/51] time/tick-broadcast: Remove RCU_NONIDLE usage

2023-01-12 Thread Peter Zijlstra
No callers left that have already disabled RCU.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Mark Rutland 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 kernel/time/tick-broadcast-hrtimer.c |   29 -
 1 file changed, 12 insertions(+), 17 deletions(-)

--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -56,25 +56,20 @@ static int bc_set_next(ktime_t expires,
 * hrtimer callback function is currently running, then
 * hrtimer_start() cannot move it and the timer stays on the CPU on
 * which it is assigned at the moment.
+*/
+   hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
+   /*
+* The core tick broadcast mode expects bc->bound_on to be set
+* correctly to prevent a CPU which has the broadcast hrtimer
+* armed from going deep idle.
 *
-* As this can be called from idle code, the hrtimer_start()
-* invocation has to be wrapped with RCU_NONIDLE() as
-* hrtimer_start() can call into tracing.
+* As tick_broadcast_lock is held, nothing can change the cpu
+* base which was just established in hrtimer_start() above. So
+* the below access is safe even without holding the hrtimer
+* base lock.
 */
-   RCU_NONIDLE( {
-   hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED_HARD);
-   /*
-* The core tick broadcast mode expects bc->bound_on to be set
-* correctly to prevent a CPU which has the broadcast hrtimer
-* armed from going deep idle.
-*
-* As tick_broadcast_lock is held, nothing can change the cpu
-* base which was just established in hrtimer_start() above. So
-* the below access is safe even without holding the hrtimer
-* base lock.
-*/
-   bc->bound_on = bctimer.base->cpu_base->cpu;
-   } );
+   bc->bound_on = bctimer.base->cpu_base->cpu;
+
return 0;
 }
 




[PATCH v3 38/51] cpuidle,omap4: Push RCU-idle into omap4_enter_lowpower()

2023-01-12 Thread Peter Zijlstra
From: Tony Lindgren 

OMAP4 uses full SoC suspend modes as idle states, as such it needs the
whole power-domain and clock-domain code from the idle path.

None of that code is suitable to run with RCU disabled; as such, push
RCU-idle deeper still.

Signed-off-by: Tony Lindgren 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
Link: https://lkml.kernel.org/r/yqcv6crsnkusw...@atomide.com
---
 arch/arm/mach-omap2/common.h  |6 --
 arch/arm/mach-omap2/cpuidle44xx.c |8 ++--
 arch/arm/mach-omap2/omap-mpuss-lowpower.c |   12 +++-
 arch/arm/mach-omap2/pm44xx.c  |2 +-
 4 files changed, 18 insertions(+), 10 deletions(-)

--- a/arch/arm/mach-omap2/common.h
+++ b/arch/arm/mach-omap2/common.h
@@ -284,11 +284,13 @@ extern u32 omap4_get_cpu1_ns_pa_addr(voi
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PM)
 extern int omap4_mpuss_init(void);
-extern int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state);
+extern int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+   bool rcuidle);
 extern int omap4_hotplug_cpu(unsigned int cpu, unsigned int power_state);
 #else
 static inline int omap4_enter_lowpower(unsigned int cpu,
-   unsigned int power_state)
+   unsigned int power_state,
+   bool rcuidle)
 {
cpu_do_idle();
return 0;
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,9 +105,7 @@ static int omap_enter_idle_smp(struct cp
}
raw_spin_unlock_irqrestore(&mpu_lock, flag);
 
-   ct_cpuidle_enter();
-   omap4_enter_lowpower(dev->cpu, cx->cpu_state);
-   ct_cpuidle_exit();
+   omap4_enter_lowpower(dev->cpu, cx->cpu_state, true);
 
raw_spin_lock_irqsave(&mpu_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -186,10 +184,8 @@ static int omap_enter_idle_coupled(struc
}
}
 
-   ct_cpuidle_enter();
-   omap4_enter_lowpower(dev->cpu, cx->cpu_state);
+   omap4_enter_lowpower(dev->cpu, cx->cpu_state, true);
cpu_done[dev->cpu] = true;
-   ct_cpuidle_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
--- a/arch/arm/mach-omap2/omap-mpuss-lowpower.c
+++ b/arch/arm/mach-omap2/omap-mpuss-lowpower.c
@@ -33,6 +33,7 @@
  * and first to wake-up when MPUSS low power states are excercised
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -214,6 +215,7 @@ static void __init save_l2x0_context(voi
  * of OMAP4 MPUSS subsystem
  * @cpu : CPU ID
  * @power_state: Low power state.
+ * @rcuidle: RCU needs to be idled
  *
  * MPUSS states for the context save:
  * save_state =
@@ -222,7 +224,8 @@ static void __init save_l2x0_context(voi
  * 2 - CPUx L1 and logic lost + GIC lost: MPUSS OSWR
  * 3 - CPUx L1 and logic lost + GIC + L2 lost: DEVICE OFF
  */
-int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state)
+int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+bool rcuidle)
 {
struct omap4_cpu_pm_info *pm_info = &per_cpu(omap4_pm_info, cpu);
unsigned int save_state = 0, cpu_logic_state = PWRDM_POWER_RET;
@@ -268,6 +271,10 @@ int omap4_enter_lowpower(unsigned int cp
cpu_clear_prev_logic_pwrst(cpu);
pwrdm_set_next_pwrst(pm_info->pwrdm, power_state);
pwrdm_set_logic_retst(pm_info->pwrdm, cpu_logic_state);
+
+   if (rcuidle)
+   ct_cpuidle_enter();
+
set_cpu_wakeup_addr(cpu, __pa_symbol(omap_pm_ops.resume));
omap_pm_ops.scu_prepare(cpu, power_state);
l2x0_pwrst_prepare(cpu, save_state);
@@ -283,6 +290,9 @@ int omap4_enter_lowpower(unsigned int cp
if (IS_PM44XX_ERRATUM(PM_OMAP4_ROM_SMP_BOOT_ERRATUM_GICD) && cpu)
gic_dist_enable();
 
+   if (rcuidle)
+   ct_cpuidle_exit();
+
/*
 * Restore the CPUx power state to ON otherwise CPUx
 * power domain can transitions to programmed low power
--- a/arch/arm/mach-omap2/pm44xx.c
+++ b/arch/arm/mach-omap2/pm44xx.c
@@ -76,7 +76,7 @@ static int omap4_pm_suspend(void)
 * domain CSWR is not supported by hardware.
 * More details can be found in OMAP4430 TRM section 4.3.4.2.
 */
-   omap4_enter_lowpower(cpu_id, cpu_suspend_state);
+   omap4_enter_lowpower(cpu_id, cpu_suspend_state, false);
 
/* Restore next powerdomain state */
list_for_each_entry(pwrst, _list, node) {




[PATCH v3 48/51] cpuidle,arch: Mark all ct_cpuidle_enter() callers __cpuidle

2023-01-12 Thread Peter Zijlstra
For all cpuidle drivers that use CPUIDLE_FLAG_RCU_IDLE, ensure that
all functions that call ct_cpuidle_enter() are marked __cpuidle.

( due to lack of noinstr validation on these platforms it is entirely
  possible this isn't complete )
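
For reference, __cpuidle ends up defined in this series as a
noinstr-style section attribute, so objtool can validate .cpuidle.text
much like it validates .noinstr.text; a sketch:

#define __cpuidle __noinstr_section(".cpuidle.text")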

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-imx/cpuidle-imx6q.c |4 ++--
 arch/arm/mach-imx/cpuidle-imx6sx.c|4 ++--
 arch/arm/mach-omap2/omap-mpuss-lowpower.c |4 ++--
 arch/arm/mach-omap2/pm34xx.c  |2 +-
 arch/arm64/kernel/cpuidle.c   |2 +-
 drivers/cpuidle/cpuidle-arm.c |4 ++--
 drivers/cpuidle/cpuidle-big_little.c  |4 ++--
 drivers/cpuidle/cpuidle-mvebu-v7.c|6 +++---
 drivers/cpuidle/cpuidle-psci.c|   17 ++---
 drivers/cpuidle/cpuidle-qcom-spm.c|4 ++--
 drivers/cpuidle/cpuidle-riscv-sbi.c   |   10 +-
 drivers/cpuidle/cpuidle-tegra.c   |   10 +-
 12 files changed, 33 insertions(+), 38 deletions(-)

--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -17,8 +17,8 @@
 static int num_idle_cpus = 0;
 static DEFINE_RAW_SPINLOCK(cpuidle_lock);
 
-static int imx6q_enter_wait(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+static __cpuidle int imx6q_enter_wait(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
 {
raw_spin_lock(_lock);
if (++num_idle_cpus == num_online_cpus())
--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -30,8 +30,8 @@ static int imx6sx_idle_finish(unsigned l
return 0;
 }
 
-static int imx6sx_enter_wait(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+static __cpuidle int imx6sx_enter_wait(struct cpuidle_device *dev,
+  struct cpuidle_driver *drv, int index)
 {
imx6_set_lpm(WAIT_UNCLOCKED);
 
--- a/arch/arm/mach-omap2/omap-mpuss-lowpower.c
+++ b/arch/arm/mach-omap2/omap-mpuss-lowpower.c
@@ -224,8 +224,8 @@ static void __init save_l2x0_context(voi
  * 2 - CPUx L1 and logic lost + GIC lost: MPUSS OSWR
  * 3 - CPUx L1 and logic lost + GIC + L2 lost: DEVICE OFF
  */
-int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
-bool rcuidle)
+__cpuidle int omap4_enter_lowpower(unsigned int cpu, unsigned int power_state,
+  bool rcuidle)
 {
struct omap4_cpu_pm_info *pm_info = &per_cpu(omap4_pm_info, cpu);
unsigned int save_state = 0, cpu_logic_state = PWRDM_POWER_RET;
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -175,7 +175,7 @@ static int omap34xx_do_sram_idle(unsigne
return 0;
 }
 
-void omap_sram_idle(bool rcuidle)
+__cpuidle void omap_sram_idle(bool rcuidle)
 {
/* Variable to tell what needs to be saved and restored
 * in omap_sram_idle*/
--- a/arch/arm64/kernel/cpuidle.c
+++ b/arch/arm64/kernel/cpuidle.c
@@ -62,7 +62,7 @@ int acpi_processor_ffh_lpi_probe(unsigne
return psci_acpi_cpu_init_idle(cpu);
 }
 
-int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
+__cpuidle int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi)
 {
u32 state = lpi->address;
 
--- a/drivers/cpuidle/cpuidle-arm.c
+++ b/drivers/cpuidle/cpuidle-arm.c
@@ -31,8 +31,8 @@
  * Called from the CPUidle framework to program the device to the
  * specified target state selected by the governor.
  */
-static int arm_enter_idle_state(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int idx)
+static __cpuidle int arm_enter_idle_state(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int idx)
 {
/*
 * Pass idle state index to arm_cpuidle_suspend which in turn
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -122,8 +122,8 @@ static int notrace bl_powerdown_finisher
  * Called from the CPUidle framework to program the device to the
  * specified target state selected by the governor.
  */
-static int bl_enter_powerdown(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int idx)
+static __cpuidle int bl_enter_powerdown(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int idx)
 {
cpu_pm_enter();
ct_cpuidle_enter();
--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -25,9 +25,9 @@
 
 static int (*mvebu_v7_cpu_suspend)(int);
 
-static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv,
-   int index)
+static __cpuidle int mvebu_v7_enter_idle(struct cpuidle_device *dev,
+struct cpuidle_driver *drv,
+int index)

[PATCH v3 33/51] trace: Remove trace_hardirqs_{on,off}_caller()

2023-01-12 Thread Peter Zijlstra
Per commit 56e62a737028 ("s390: convert to generic entry") the last
and only callers of trace_hardirqs_{on,off}_caller() went away; clean
them up.

Cc: Sven Schnelle 
Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/trace/trace_preemptirq.c |   29 -
 1 file changed, 29 deletions(-)

--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -84,35 +84,6 @@ void trace_hardirqs_off(void)
 }
 EXPORT_SYMBOL(trace_hardirqs_off);
 NOKPROBE_SYMBOL(trace_hardirqs_off);
-
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
-{
-   if (this_cpu_read(tracing_irq_cpu)) {
-   if (!in_nmi())
-   trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
-   tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
-   this_cpu_write(tracing_irq_cpu, 0);
-   }
-
-   lockdep_hardirqs_on_prepare();
-   lockdep_hardirqs_on(caller_addr);
-}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-NOKPROBE_SYMBOL(trace_hardirqs_on_caller);
-
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
-{
-   lockdep_hardirqs_off(caller_addr);
-
-   if (!this_cpu_read(tracing_irq_cpu)) {
-   this_cpu_write(tracing_irq_cpu, 1);
-   tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
-   if (!in_nmi())
-   trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
-   }
-}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
-NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 #ifdef CONFIG_TRACE_PREEMPT_TOGGLE




[PATCH v3 42/51] ubsan: Fix objtool UACCESS warns

2023-01-12 Thread Peter Zijlstra
clang-14 allyesconfig gives:

vmlinux.o: warning: objtool: emulator_cmpxchg_emulated+0x705: call to 
__ubsan_handle_load_invalid_value() with UACCESS enabled
vmlinux.o: warning: objtool: paging64_update_accessed_dirty_bits+0x39e: call to 
__ubsan_handle_load_invalid_value() with UACCESS enabled
vmlinux.o: warning: objtool: paging32_update_accessed_dirty_bits+0x390: call to 
__ubsan_handle_load_invalid_value() with UACCESS enabled
vmlinux.o: warning: objtool: ept_update_accessed_dirty_bits+0x43f: call to 
__ubsan_handle_load_invalid_value() with UACCESS enabled

Add the required eflags save/restore and whitelist the thing.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 lib/ubsan.c   |5 -
 tools/objtool/check.c |1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

--- a/lib/ubsan.c
+++ b/lib/ubsan.c
@@ -340,9 +340,10 @@ void __ubsan_handle_load_invalid_value(v
 {
struct invalid_value_data *data = _data;
char val_str[VALUE_LENGTH];
+   unsigned long ua_flags = user_access_save();
 
if (suppress_report(>location))
-   return;
+   goto out;
 
ubsan_prologue(>location, "invalid-load");
 
@@ -352,6 +353,8 @@ void __ubsan_handle_load_invalid_value(v
val_str, data->type->type_name);
 
ubsan_epilogue();
+out:
+   user_access_restore(ua_flags);
 }
 EXPORT_SYMBOL(__ubsan_handle_load_invalid_value);
 
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1068,6 +1068,7 @@ static const char *uaccess_safe_builtin[
"__ubsan_handle_type_mismatch",
"__ubsan_handle_type_mismatch_v1",
"__ubsan_handle_shift_out_of_bounds",
+   "__ubsan_handle_load_invalid_value",
/* misc */
"csum_partial_copy_generic",
"copy_mc_fragile",




[PATCH v3 08/51] cpuidle,imx6: Push RCU-idle into driver

2023-01-12 Thread Peter Zijlstra
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Notably both cpu_pm_enter() and cpu_cluster_pm_enter() implicitly
re-enable RCU.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Frederic Weisbecker 
Acked-by: Rafael J. Wysocki 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/mach-imx/cpuidle-imx6sx.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -47,7 +47,9 @@ static int imx6sx_enter_wait(struct cpui
cpu_pm_enter();
cpu_cluster_pm_enter();
 
+   ct_idle_enter();
cpu_suspend(0, imx6sx_idle_finish);
+   ct_idle_exit();
 
cpu_cluster_pm_exit();
cpu_pm_exit();
@@ -87,7 +89,8 @@ static struct cpuidle_driver imx6sx_cpui
 */
.exit_latency = 300,
.target_residency = 500,
-   .flags = CPUIDLE_FLAG_TIMER_STOP,
+   .flags = CPUIDLE_FLAG_TIMER_STOP |
+CPUIDLE_FLAG_RCU_IDLE,
.enter = imx6sx_enter_wait,
.name = "LOW-POWER-IDLE",
.desc = "ARM power off",




[PATCH v3 25/51] printk: Remove trace_.*_rcuidle() usage

2023-01-12 Thread Peter Zijlstra
The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console
tracepoint"), was printk usage from the cpuidle path where RCU was
already disabled.

Per the patches earlier in this series, this is no longer the case.

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Sergey Senozhatsky 
Acked-by: Petr Mladek 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 kernel/printk/printk.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2238,7 +2238,7 @@ static u16 printk_sprint(char *text, u16
}
}
 
-   trace_console_rcuidle(text, text_len);
+   trace_console(text, text_len);
 
return text_len;
 }




[PATCH v3 47/51] cpuidle: Ensure ct_cpuidle_enter() is always called from noinstr/__cpuidle

2023-01-12 Thread Peter Zijlstra
Tracing (kprobes included) and other compiler instrumentation rely
on a normal kernel runtime. Therefore all functions that disable RCU
should be noinstr, as should all functions that are called while RCU
is disabled.
is disabled.
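
The resulting shape follows the usual noinstr pattern; a minimal
sketch with a made-up function name:

noinstr int some_idle_entry(void)
{
	instrumentation_begin();
	/* tracing, kprobes, sanitizers are allowed here */
	instrumentation_end();

	ct_cpuidle_enter();	/* RCU off: no instrumentation below */
	/* low-level idle entry */
	ct_cpuidle_exit();

	return 0;
}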

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle.c |   37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -137,11 +137,13 @@ int cpuidle_find_deepest_state(struct cp
 }
 
 #ifdef CONFIG_SUSPEND
-static void enter_s2idle_proper(struct cpuidle_driver *drv,
-   struct cpuidle_device *dev, int index)
+static noinstr void enter_s2idle_proper(struct cpuidle_driver *drv,
+struct cpuidle_device *dev, int index)
 {
-   ktime_t time_start, time_end;
struct cpuidle_state *target_state = &drv->states[index];
+   ktime_t time_start, time_end;
+
+   instrumentation_begin();
 
time_start = ns_to_ktime(local_clock());
 
@@ -152,13 +154,18 @@ static void enter_s2idle_proper(struct c
 * suspended is generally unsafe.
 */
stop_critical_timings();
-   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
+   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
ct_cpuidle_enter();
+   /* Annotate away the indirect call */
+   instrumentation_begin();
+   }
target_state->enter_s2idle(dev, drv, index);
if (WARN_ON_ONCE(!irqs_disabled()))
raw_local_irq_disable();
-   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
+   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
+   instrumentation_end();
ct_cpuidle_exit();
+   }
tick_unfreeze();
start_critical_timings();
 
@@ -166,6 +173,7 @@ static void enter_s2idle_proper(struct c
 
dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
dev->states_usage[index].s2idle_usage++;
+   instrumentation_end();
 }
 
 /**
@@ -200,8 +208,9 @@ int cpuidle_enter_s2idle(struct cpuidle_
  * @drv: cpuidle driver for this cpu
  * @index: index into the states table in @drv of the state to enter
  */
-int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
-   int index)
+noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
+struct cpuidle_driver *drv,
+int index)
 {
int entered_state;
 
@@ -209,6 +218,8 @@ int cpuidle_enter_state(struct cpuidle_d
bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
ktime_t time_start, time_end;
 
+   instrumentation_begin();
+
/*
 * Tell the time framework to switch to a broadcast timer because our
 * local timer will be shut down.  If a local timer is used from another
@@ -235,15 +246,21 @@ int cpuidle_enter_state(struct cpuidle_d
time_start = ns_to_ktime(local_clock());
 
stop_critical_timings();
-   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
+   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
ct_cpuidle_enter();
+   /* Annotate away the indirect call */
+   instrumentation_begin();
+   }
 
entered_state = target_state->enter(dev, drv, index);
+
if (WARN_ONCE(!irqs_disabled(), "%ps leaked IRQ state", target_state->enter))
raw_local_irq_disable();
 
-   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
+   if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
+   instrumentation_end();
ct_cpuidle_exit();
+   }
start_critical_timings();
 
sched_clock_idle_wakeup_event();
@@ -306,6 +323,8 @@ int cpuidle_enter_state(struct cpuidle_d
dev->states_usage[index].rejected++;
}
 
+   instrumentation_end();
+
return entered_state;
 }
 




[PATCH v3 43/51] intel_idle: Add force_irq_on module param

2023-01-12 Thread Peter Zijlstra
For testing purposes.
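
Usage is via the kernel command line, assuming intel_idle is built in:

	intel_idle.force_irq_on=1

which forces every state through the intel_idle_irq() entry point so
the IRQ-enabled path can be exercised on hardware that would not
normally select it.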

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/idle/intel_idle.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1787,6 +1787,9 @@ static bool __init intel_idle_verify_cst
return true;
 }
 
+static bool force_irq_on __read_mostly;
+module_param(force_irq_on, bool, 0444);
+
 static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 {
int cstate;
@@ -1838,8 +1841,10 @@ static void __init intel_idle_init_cstat
/* Structure copy. */
drv->states[drv->state_count] = cpuidle_state_table[cstate];
 
-   if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE)
+   if ((cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE) || force_irq_on) {
+   printk("intel_idle: forced intel_idle_irq for state %d\n", cstate);
drv->states[drv->state_count].enter = intel_idle_irq;
+   }
 
if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IBRS) {




[PATCH v3 49/51] cpuidle,arch: Mark all regular cpuidle_state::enter methods __cpuidle

2023-01-12 Thread Peter Zijlstra
For all cpuidle drivers that do not use CPUIDLE_FLAG_RCU_IDLE (iow,
the simple ones) make sure all the functions are marked __cpuidle.

( due to lack of noinstr validation on these platforms it is entirely
  possible this isn't complete )

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/kernel/cpuidle.c   |4 ++--
 arch/arm/mach-davinci/cpuidle.c |4 ++--
 arch/arm/mach-imx/cpuidle-imx5.c|4 ++--
 arch/arm/mach-imx/cpuidle-imx6sl.c  |4 ++--
 arch/arm/mach-imx/cpuidle-imx7ulp.c |4 ++--
 arch/arm/mach-s3c/cpuidle-s3c64xx.c |5 ++---
 arch/mips/kernel/idle.c |6 +++---
 7 files changed, 15 insertions(+), 16 deletions(-)

--- a/arch/arm/kernel/cpuidle.c
+++ b/arch/arm/kernel/cpuidle.c
@@ -26,8 +26,8 @@ static struct cpuidle_ops cpuidle_ops[NR
  *
  * Returns the index passed as parameter
  */
-int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+__cpuidle int arm_cpuidle_simple_enter(struct cpuidle_device *dev, struct
+  cpuidle_driver *drv, int index)
 {
cpu_do_idle();
 
--- a/arch/arm/mach-davinci/cpuidle.c
+++ b/arch/arm/mach-davinci/cpuidle.c
@@ -44,8 +44,8 @@ static void davinci_save_ddr_power(int e
 }
 
 /* Actual code that puts the SoC in different idle states */
-static int davinci_enter_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+static __cpuidle int davinci_enter_idle(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int index)
 {
davinci_save_ddr_power(1, ddr2_pdown);
cpu_do_idle();
--- a/arch/arm/mach-imx/cpuidle-imx5.c
+++ b/arch/arm/mach-imx/cpuidle-imx5.c
@@ -8,8 +8,8 @@
 #include 
 #include "cpuidle.h"
 
-static int imx5_cpuidle_enter(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+static __cpuidle int imx5_cpuidle_enter(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int index)
 {
arm_pm_idle();
return index;
--- a/arch/arm/mach-imx/cpuidle-imx6sl.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sl.c
@@ -11,8 +11,8 @@
 #include "common.h"
 #include "cpuidle.h"
 
-static int imx6sl_enter_wait(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+static __cpuidle int imx6sl_enter_wait(struct cpuidle_device *dev,
+  struct cpuidle_driver *drv, int index)
 {
imx6_set_lpm(WAIT_UNCLOCKED);
/*
--- a/arch/arm/mach-imx/cpuidle-imx7ulp.c
+++ b/arch/arm/mach-imx/cpuidle-imx7ulp.c
@@ -12,8 +12,8 @@
 #include "common.h"
 #include "cpuidle.h"
 
-static int imx7ulp_enter_wait(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+static __cpuidle int imx7ulp_enter_wait(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int index)
 {
if (index == 1)
imx7ulp_set_lpm(ULP_PM_WAIT);
--- a/arch/arm/mach-s3c/cpuidle-s3c64xx.c
+++ b/arch/arm/mach-s3c/cpuidle-s3c64xx.c
@@ -19,9 +19,8 @@
 #include "regs-sys-s3c64xx.h"
 #include "regs-syscon-power-s3c64xx.h"
 
-static int s3c64xx_enter_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv,
- int index)
+static __cpuidle int s3c64xx_enter_idle(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int index)
 {
unsigned long tmp;
 
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -241,7 +241,7 @@ void __init check_wait(void)
}
 }
 
-void arch_cpu_idle(void)
+__cpuidle void arch_cpu_idle(void)
 {
if (cpu_wait)
cpu_wait();
@@ -249,8 +249,8 @@ void arch_cpu_idle(void)
 
 #ifdef CONFIG_CPU_IDLE
 
-int mips_cpuidle_wait_enter(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+__cpuidle int mips_cpuidle_wait_enter(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
 {
arch_cpu_idle();
return index;




[PATCH v3 32/51] cpuidle,acpi: Make noinstr clean

2023-01-12 Thread Peter Zijlstra
vmlinux.o: warning: objtool: io_idle+0xc: call to __inb.isra.0() leaves 
.noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0xfe: call to num_online_cpus() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0x115: call to 
acpi_idle_fallback_to_c1.isra.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/x86/include/asm/shared/io.h |4 ++--
 drivers/acpi/processor_idle.c|2 +-
 include/linux/cpumask.h  |4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/shared/io.h
+++ b/arch/x86/include/asm/shared/io.h
@@ -5,13 +5,13 @@
 #include 
 
 #define BUILDIO(bwl, bw, type) \
-static inline void __out##bwl(type value, u16 port)\
+static __always_inline void __out##bwl(type value, u16 port)   \
 {  \
asm volatile("out" #bwl " %" #bw "0, %w1"   \
 : : "a"(value), "Nd"(port));   \
 }  \
\
-static inline type __in##bwl(u16 port) \
+static __always_inline type __in##bwl(u16 port)			\
 {  \
type value; \
asm volatile("in" #bwl " %w1, %" #bw "0"\
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -593,7 +593,7 @@ static int acpi_idle_play_dead(struct cp
return 0;
 }
 
-static bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
+static __always_inline bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
 {
return IS_ENABLED(CONFIG_HOTPLUG_CPU) && !pr->flags.has_cst &&
!(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED);
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -908,9 +908,9 @@ static inline const struct cpumask *get_
  * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
  * region.
  */
-static inline unsigned int num_online_cpus(void)
+static __always_inline unsigned int num_online_cpus(void)
 {
-   return atomic_read(&__num_online_cpus);
+   return arch_atomic_read(&__num_online_cpus);
 }
 #define num_possible_cpus()cpumask_weight(cpu_possible_mask)
 #define num_present_cpus() cpumask_weight(cpu_present_mask)




[PATCH v3 46/51] arm64,riscv,perf: Remove RCU_NONIDLE() usage

2023-01-12 Thread Peter Zijlstra
The PM notifiers should no longer be run with RCU disabled (per the
previous patches); as such, this hack is no longer required either.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/perf/arm_pmu.c   |   11 +--
 drivers/perf/riscv_pmu_sbi.c |8 +---
 2 files changed, 2 insertions(+), 17 deletions(-)

--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -762,17 +762,8 @@ static void cpu_pm_pmu_setup(struct arm_
case CPU_PM_ENTER_FAILED:
 /*
  * Restore and enable the counter.
- * armpmu_start() indirectly calls
- *
- * perf_event_update_userpage()
- *
- * that requires RCU read locking to be functional,
- * wrap the call within RCU_NONIDLE to make the
- * RCU subsystem aware this cpu is not idle from
- * an RCU perspective for the armpmu_start() call
- * duration.
  */
-   RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));
+   armpmu_start(event, PERF_EF_RELOAD);
break;
default:
break;
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -747,14 +747,8 @@ static int riscv_pm_pmu_notify(struct no
case CPU_PM_ENTER_FAILED:
/*
 * Restore and enable the counter.
-*
-* Requires RCU read locking to be functional,
-* wrap the call within RCU_NONIDLE to make the
-* RCU subsystem aware this cpu is not idle from
-* an RCU perspective for the riscv_pmu_start() call
-* duration.
 */
-   RCU_NONIDLE(riscv_pmu_start(event, PERF_EF_RELOAD));
+   riscv_pmu_start(event, PERF_EF_RELOAD);
break;
default:
break;




[PATCH v3 21/51] arch/idle: Change arch_cpu_idle() IRQ behaviour

2023-01-12 Thread Peter Zijlstra
Currently, arch_cpu_idle() is called with IRQs disabled, but returns
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.
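
A minimal sketch of the shape being removed (names simplified, not
the actual generic idle loop):

	raw_local_irq_disable();
	arch_cpu_idle();		/* old contract: returns with IRQs on */
	raw_local_irq_disable();	/* pointless enable->disable dance */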

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Gautham R. Shenoy 
Acked-by: Mark Rutland  [arm64]
Acked-by: Rafael J. Wysocki 
Acked-by: Guo Ren 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/alpha/kernel/process.c  |1 -
 arch/arc/kernel/process.c|3 +++
 arch/arm/kernel/process.c|1 -
 arch/arm/mach-gemini/board-dt.c  |3 ++-
 arch/arm64/kernel/idle.c |1 -
 arch/csky/kernel/process.c   |1 -
 arch/csky/kernel/smp.c   |2 +-
 arch/hexagon/kernel/process.c|1 -
 arch/ia64/kernel/process.c   |1 +
 arch/loongarch/kernel/idle.c |1 +
 arch/microblaze/kernel/process.c |1 -
 arch/mips/kernel/idle.c  |8 +++-
 arch/nios2/kernel/process.c  |1 -
 arch/openrisc/kernel/process.c   |1 +
 arch/parisc/kernel/process.c |2 --
 arch/powerpc/kernel/idle.c   |5 ++---
 arch/riscv/kernel/process.c  |1 -
 arch/s390/kernel/idle.c  |1 -
 arch/sh/kernel/idle.c|1 +
 arch/sparc/kernel/leon_pmc.c |4 
 arch/sparc/kernel/process_32.c   |1 -
 arch/sparc/kernel/process_64.c   |3 ++-
 arch/um/kernel/process.c |1 -
 arch/x86/coco/tdx/tdx.c  |3 +++
 arch/x86/kernel/process.c|   15 ---
 arch/xtensa/kernel/process.c |1 +
 kernel/sched/idle.c  |2 --
 27 files changed, 29 insertions(+), 37 deletions(-)

--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
wtint(0);
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -114,6 +114,8 @@ void arch_cpu_idle(void)
"sleep %0   \n"
:
:"I"(arg)); /* can't be "r" has to be embedded const */
+
+   raw_local_irq_disable();
 }
 
 #else  /* ARC700 */
@@ -122,6 +124,7 @@ void arch_cpu_idle(void)
 {
/* sleep, but enable both set E1/E2 (levels of interrupts) before 
committing */
__asm__ __volatile__("sleep 0x3 \n");
+   raw_local_irq_disable();
 }
 
 #endif
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -78,7 +78,6 @@ void arch_cpu_idle(void)
arm_pm_idle();
else
cpu_do_idle();
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
--- a/arch/arm/mach-gemini/board-dt.c
+++ b/arch/arm/mach-gemini/board-dt.c
@@ -42,8 +42,9 @@ static void gemini_idle(void)
 */
 
/* FIXME: Enabling interrupts here is racy! */
-   local_irq_enable();
+   raw_local_irq_enable();
cpu_do_idle();
+   raw_local_irq_disable();
 }
 
 static void __init gemini_init_machine(void)
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
 * tricks
 */
cpu_do_idle();
-   raw_local_irq_enable();
 }
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -100,6 +100,5 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
asm volatile("stop\n");
 #endif
-   raw_local_irq_enable();
 }
 #endif
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -309,7 +309,7 @@ void arch_cpu_idle_dead(void)
while (!secondary_stack)
arch_cpu_idle();
 
-   local_irq_disable();
+   raw_local_irq_disable();
 
asm volatile(
"movsp, %0\n"
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -44,7 +44,6 @@ void arch_cpu_idle(void)
 {
__vmwait();
/*  interrupts wake us up, but irqs are still disabled */
-   raw_local_irq_enable();
 }
 
 /*
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -242,6 +242,7 @@ void arch_cpu_idle(void)
(*mark_idle)(1);
 
raw_safe_halt();
+   raw_local_irq_disable();
 
if (mark_idle)
(*mark_idle)(0);
--- a/arch/loongarch/kernel/idle.c
+++ b/arch/loongarch/kernel/idle.c
@@ -13,4 +13,5 @@ void __cpuidle arch_cpu_idle(void)
 {
raw_local_irq_enable();
__arch_cpu_idle(); /* idle instruction needs irq enabled */
+   raw_local_irq_disable();
 }
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -140,5 +140,4 @@ int dump_fpu(struct pt_regs *regs, elf_f
 
 void arch_cpu_idle(void)
 {
-   raw_local_irq_enable();
 

[PATCH v3 45/51] sched: Always inline __this_cpu_preempt_check()

2023-01-12 Thread Peter Zijlstra
vmlinux.o: warning: objtool: in_entry_stack+0x9: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: default_do_nmi+0x10: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: fpu_idle_fpregs+0x41: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: kvm_read_and_reset_apf_flags+0x1: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: lockdep_hardirqs_on+0xb0: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: lockdep_hardirqs_off+0xae: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_nmi_enter+0x69: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_nmi_exit+0x32: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0x9: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0x43: call to 
__this_cpu_preempt_check() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0x45: call to 
__this_cpu_preempt_check() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 include/linux/percpu-defs.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -310,7 +310,7 @@ extern void __bad_size_call_parameter(vo
 #ifdef CONFIG_DEBUG_PREEMPT
 extern void __this_cpu_preempt_check(const char *op);
 #else
-static inline void __this_cpu_preempt_check(const char *op) { }
+static __always_inline void __this_cpu_preempt_check(const char *op) { }
 #endif
 
 #define __pcpu_size_call_return(stem, variable)
\




[PATCH v3 36/51] cpuidle,omap3: Use WFI for omap3_pm_idle()

2023-01-12 Thread Peter Zijlstra
arch_cpu_idle() is a very simple idle interface: it exposes only a
single idle state and is expected to neither require RCU nor do any
tracing/instrumentation.

As such, omap_sram_idle() is not a valid implementation. Replace it
with the simple (shallow) omap3_do_wfi() call, leaving the more
complicated idle states to the cpuidle driver.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Tony Lindgren 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/mach-omap2/pm34xx.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -294,7 +294,7 @@ static void omap3_pm_idle(void)
if (omap_irq_pending())
return;
 
-   omap_sram_idle();
+   omap3_do_wfi();
 }
 
 #ifdef CONFIG_SUSPEND




[PATCH v3 04/51] cpuidle: Move IRQ state validation

2023-01-12 Thread Peter Zijlstra
Make cpuidle_enter_state() consistent with the s2idle variant and
verify ->enter() always returns with interrupts disabled.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/cpuidle/cpuidle.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -236,7 +236,11 @@ int cpuidle_enter_state(struct cpuidle_d
stop_critical_timings();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
ct_idle_enter();
+
entered_state = target_state->enter(dev, drv, index);
+   if (WARN_ONCE(!irqs_disabled(), "%ps leaked IRQ state", target_state->enter))
+   raw_local_irq_disable();
+
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
ct_idle_exit();
start_critical_timings();
@@ -248,12 +252,8 @@ int cpuidle_enter_state(struct cpuidle_d
/* The cpu is no longer idle or about to enter idle. */
sched_idle_set_state(NULL);
 
-   if (broadcast) {
-   if (WARN_ON_ONCE(!irqs_disabled()))
-   local_irq_disable();
-
+   if (broadcast)
tick_broadcast_exit();
-   }
 
if (!cpuidle_state_is_coupled(drv, index))
local_irq_enable();




[PATCH v3 03/51] cpuidle/poll: Ensure IRQ state is invariant

2023-01-12 Thread Peter Zijlstra
cpuidle_state::enter() methods should be IRQ invariant.

Additionally make sure to use raw_local_irq_*() methods since this
cpuidle callback will be called with RCU already disabled.
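
For context, the traced variants expand to tracepoint calls that
themselves need RCU; a sketch of the CONFIG_TRACE_IRQFLAGS definition
from include/linux/irqflags.h:

#define local_irq_enable()			\
	do {					\
		trace_hardirqs_on();		\
		raw_local_irq_enable();		\
	} while (0)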

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Rafael J. Wysocki 
Reviewed-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/cpuidle/poll_state.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -17,7 +17,7 @@ static int __cpuidle poll_idle(struct cp
 
dev->poll_time_limit = false;
 
-   local_irq_enable();
+   raw_local_irq_enable();
if (!current_set_polling_and_test()) {
unsigned int loop_count = 0;
u64 limit;
@@ -36,6 +36,8 @@ static int __cpuidle poll_idle(struct cp
}
}
}
+   raw_local_irq_disable();
+
current_clr_polling();
 
return index;




[PATCH v3 23/51] arm,smp: Remove trace_.*_rcuidle() usage

2023-01-12 Thread Peter Zijlstra
None of these functions should ever be run with RCU disabled anymore.

Specifically, do_handle_IPI() is only called from handle_IPI() which
explicitly does irq_enter()/irq_exit() which ensures RCU is watching.

The problem with smp_cross_call() was, per commit 7c64cc0531fa ("arm: Use
_rcuidle for smp_cross_call() tracepoints"), that
cpuidle_enter_state_coupled() already had RCU disabled, but that's
long been fixed by commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
deeper into the idle path").

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Ulf Hansson 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 arch/arm/kernel/smp.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -639,7 +639,7 @@ static void do_handle_IPI(int ipinr)
unsigned int cpu = smp_processor_id();
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_entry_rcuidle(ipi_types[ipinr]);
+   trace_ipi_entry(ipi_types[ipinr]);
 
switch (ipinr) {
case IPI_WAKEUP:
@@ -686,7 +686,7 @@ static void do_handle_IPI(int ipinr)
}
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_exit_rcuidle(ipi_types[ipinr]);
+   trace_ipi_exit(ipi_types[ipinr]);
 }
 
 /* Legacy version, should go away once all irqchips have been converted */
@@ -709,7 +709,7 @@ static irqreturn_t ipi_handler(int irq,
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
 {
-   trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
+   trace_ipi_raise(target, ipi_types[ipinr]);
__ipi_send_mask(ipi_desc[ipinr], target);
 }
 




[PATCH v3 15/51] acpi_idle: Remove tracing

2023-01-12 Thread Peter Zijlstra
All the idle routines are called with RCU disabled, as such there must
not be any tracing inside.

While there; clean-up the io-port idle thing.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 drivers/acpi/processor_idle.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -109,8 +109,8 @@ static const struct dmi_system_id proces
 static void __cpuidle acpi_safe_halt(void)
 {
if (!tif_need_resched()) {
-   safe_halt();
-   local_irq_disable();
+   raw_safe_halt();
+   raw_local_irq_disable();
}
 }
 
@@ -525,8 +525,11 @@ static int acpi_idle_bm_check(void)
return bm_status;
 }
 
-static void wait_for_freeze(void)
+static __cpuidle void io_idle(unsigned long addr)
 {
+   /* IO port based C-state */
+   inb(addr);
+
 #ifdef CONFIG_X86
/* No delay is needed if we are in guest */
if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
@@ -571,9 +574,7 @@ static void __cpuidle acpi_idle_do_entry
} else if (cx->entry_method == ACPI_CSTATE_HALT) {
acpi_safe_halt();
} else {
-   /* IO port based C-state */
-   inb(cx->address);
-   wait_for_freeze();
+   io_idle(cx->address);
}
 
perf_lopwr_cb(false);
@@ -595,8 +596,7 @@ static int acpi_idle_play_dead(struct cp
if (cx->entry_method == ACPI_CSTATE_HALT)
safe_halt();
else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
-   inb(cx->address);
-   wait_for_freeze();
+   io_idle(cx->address);
} else
return -ENODEV;
 




[PATCH v3 27/51] cpuidle,sched: Remove annotations from TIF_{POLLING_NRFLAG,NEED_RESCHED}

2023-01-12 Thread Peter Zijlstra
vmlinux.o: warning: objtool: mwait_idle+0x5: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to 
test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0xbc: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to 
current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to 
current_set_polling_and_test() leaves .noinstr.text section

vmlinux.o: warning: objtool: intel_idle+0xa6: call to current_clr_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xbf: call to current_clr_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xa1: call to 
current_clr_polling() leaves .noinstr.text section

vmlinux.o: warning: objtool: mwait_idle+0xe: call to __current_set_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to 
__current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to 
test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0xbc: call to __current_set_polling() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to 
__current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to 
__current_set_polling() leaves .noinstr.text section

vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to 
test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0x73: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0x91: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0x78: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_safe_halt+0xf: call to 
test_ti_thread_flag.constprop.0() leaves .noinstr.text section
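
The rule objtool enforces here: a function in .noinstr.text may only
call code that is also in .noinstr.text or that is fully inlined into
it. Hence the switch below to __always_inline plus the uninstrumented
arch_*() bitops instead of the instrumented set_bit()/clear_bit()
helpers. A hypothetical sketch of the safe shape (the TIF bit number
and flags pointer are placeholders):

static __always_inline void set_polling_sketch(unsigned long *flags)
{
	/* arch_set_bit() skips the KASAN/KCSAN hooks, so the inlined
	 * body remains legal inside a noinstr caller. */
	arch_set_bit(3 /* e.g. TIF_POLLING_NRFLAG */, flags);
}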

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Rafael J. Wysocki 
Acked-by: Frederic Weisbecker 
Tested-by: Tony Lindgren 
Tested-by: Ulf Hansson 
---
 include/linux/sched/idle.h  |   40 ++--
 include/linux/thread_info.h |   18 +-
 2 files changed, 47 insertions(+), 11 deletions(-)

--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -23,12 +23,37 @@ static inline void wake_up_if_idle(int c
  */
 #ifdef TIF_POLLING_NRFLAG
 
-static inline void __current_set_polling(void)
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_ATOMIC_H
+
+static __always_inline void __current_set_polling(void)
 {
-   set_thread_flag(TIF_POLLING_NRFLAG);
+   arch_set_bit(TIF_POLLING_NRFLAG,
+(unsigned long *)(&current_thread_info()->flags));
 }
 
-static inline bool __must_check current_set_polling_and_test(void)
+static __always_inline void __current_clr_polling(void)
+{
+   arch_clear_bit(TIF_POLLING_NRFLAG,
+  (unsigned long *)(&current_thread_info()->flags));
+}
+
+#else
+
+static __always_inline void __current_set_polling(void)
+{
+   set_bit(TIF_POLLING_NRFLAG,
+   (unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void __current_clr_polling(void)
+{
+   clear_bit(TIF_POLLING_NRFLAG,
+ (unsigned long *)(&current_thread_info()->flags));
+}
+
+#endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_ATOMIC_H */
+
+static __always_inline bool __must_check current_set_polling_and_test(void)
 {
__current_set_polling();
 
@@ -41,12 +66,7 @@ static inline bool __must_check current_
return unlikely(tif_need_resched());
 }
 
-static inline void __current_clr_polling(void)
-{
-   clear_thread_flag(TIF_POLLING_NRFLAG);
-}
-
-static inline bool __must_check current_clr_polling_and_test(void)
+static __always_inline bool __must_check current_clr_polling_and_test(void)
 {
__current_clr_polling();
 
@@ -73,7 +93,7 @@ static inline bool __must_check current_
 }
 #endif
 
-static inline void current_clr_polling(void)
+static __always_inline void current_clr_polling(void)
 {
__current_clr_polling();
 
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -177,7 +177,23 @@ static __always_inline unsigned long rea
clear_ti_thread_flag(task_thread_info(t), TIF_##fl)
 #endif /* !CONFIG_GENERIC_ENTRY */
 
-#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_NON_ATOMIC_H
+
+static __always_inline bool tif_need_resched(void)
+{
+   return arch_test_bit(TIF_NEED_RESCHED,
+

Re: [PATCH 1/1] PCI: layerscape: Add EP mode support for ls1028a

2023-01-12 Thread Lorenzo Pieralisi
On Mon, Jan 09, 2023 at 03:41:31PM +0000, Frank Li wrote:
> > 
> > From: Xiaowei Bao 
> > 
> > Add PCIe EP mode support for ls1028a.
> > 
> > Signed-off-by: Xiaowei Bao 
> > Signed-off-by: Hou Zhiqiang 
> > ---
> > 
> > All other patches were already accepted by the maintainer in
> > https://lore.kernel.org/lkml/2022223457.10599-1-leoyang...@nxp.com/
> > 
> > But missed this one.
> > 
> > Re-post.
> > 
> 
> Ping.

You must sign it off since you obviously are in the patch delivery chain:

https://docs.kernel.org/process/submitting-patches.html



Re: [PATCH v9 02/10] dt-bindings: phy: Add Lynx 10G phy binding

2023-01-12 Thread Rob Herring


On Thu, 29 Dec 2022 19:01:31 -0500, Sean Anderson wrote:
> This adds a binding for the SerDes module found on QorIQ processors.
> Each phy is a subnode of the top-level device, possibly supporting
> multiple lanes and protocols. This "thick" #phy-cells is used to
> allow for better organization of parameters. Note that the particular
> parameters necessary to select a protocol-controller/lane combination
> vary across different SoCs, and even within different SerDes on the same
> SoC.
> 
> The driver is designed to be able to completely reconfigure lanes at
> runtime. Generally, the phy consumer can select the appropriate
> protocol using set_mode.
> 
> There are two PLLs, each of which can be used as the master clock for
> each lane. Each PLL has its own reference. For the moment they are
> required, because it simplifies the driver implementation. Absent
> reference clocks can be modeled by a fixed-clock with a rate of 0.
> 
> Signed-off-by: Sean Anderson 
> ---
> 
> Changes in v9:
> - Add fsl,unused-lanes-reserved to allow for a gradual transition
>   between firmware and Linux control of the SerDes
> - Change phy-type back to fsl,type, as I was getting the error
> '#phy-cells' is a dependency of 'phy-type'
> 
> Changes in v7:
> - Use double quotes everywhere in yaml
> 
> Changes in v6:
> - fsl,type -> phy-type
> 
> Changes in v4:
> - Use subnodes to describe lane configuration, instead of describing
>   PCCRs. This is the same style used by phy-cadence-sierra et al.
> 
> Changes in v3:
> - Manually expand yaml references
> - Add mode configuration to device tree
> 
> Changes in v2:
> - Rename to fsl,lynx-10g.yaml
> - Refer to the device in the documentation, rather than the binding
> - Move compatible first
> - Document phy cells in the description
> - Allow a value of 1 for phy-cells. This allows for compatibility with
>   the similar (but according to Ioana Ciornei different enough) lynx-28g
>   binding.
> - Remove minItems
> - Use list for clock-names
> - Fix example binding having too many cells in regs
> - Add #clock-cells. This will allow using assigned-clocks* to configure
>   the PLLs.
> - Document the structure of the compatible strings
> 
>  .../devicetree/bindings/phy/fsl,lynx-10g.yaml | 248 ++
>  1 file changed, 248 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml
> 

Reviewed-by: Rob Herring 


[PATCH 1/1] PCI: layerscape: Add the workaround for A-010305

2023-01-12 Thread Frank Li
From: Xiaowei Bao 

When a link down or hot reset event occurs, the PCI Express EP
controller's Link Capabilities Register should retain the values of
the Maximum Link Width and Supported Link Speed configured by RCW.

Signed-off-by: Xiaowei Bao 
Signed-off-by: Hou Zhiqiang 
Signed-off-by: Frank Li 
---
 .../pci/controller/dwc/pci-layerscape-ep.c| 112 +-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index ed5cfc9408d9..1b884854c18e 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -18,6 +18,22 @@
 
 #include "pcie-designware.h"
 
#define PCIE_LINK_CAP  0x7C /* PCIe Link Capabilities */
+#define MAX_LINK_SP_MASK   0x0F
+#define MAX_LINK_W_MASK0x3F
+#define MAX_LINK_W_SHIFT   4
+
+/* PEX PFa PCIE pme and message interrupt registers*/
+#define PEX_PF0_PME_MES_DR 0xC0020
+#define PEX_PF0_PME_MES_DR_LUD (1 << 7)
+#define PEX_PF0_PME_MES_DR_LDD (1 << 9)
+#define PEX_PF0_PME_MES_DR_HRD (1 << 10)
+
+#define PEX_PF0_PME_MES_IER0xC0028
+#define PEX_PF0_PME_MES_IER_LUDIE  (1 << 7)
+#define PEX_PF0_PME_MES_IER_LDDIE  (1 << 9)
+#define PEX_PF0_PME_MES_IER_HRDIE  (1 << 10)
+
 #define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
 
 struct ls_pcie_ep_drvdata {
@@ -30,8 +46,90 @@ struct ls_pcie_ep {
struct dw_pcie  *pci;
struct pci_epc_features *ls_epc;
const struct ls_pcie_ep_drvdata *drvdata;
+   u8  max_speed;
+   u8  max_width;
+   boolbig_endian;
+   int irq;
 };
 
+static u32 ls_lut_readl(struct ls_pcie_ep *pcie, u32 offset)
+{
+   struct dw_pcie *pci = pcie->pci;
+
+   if (pcie->big_endian)
+   return ioread32be(pci->dbi_base + offset);
+   else
+   return ioread32(pci->dbi_base + offset);
+}
+
+static void ls_lut_writel(struct ls_pcie_ep *pcie, u32 offset,
+ u32 value)
+{
+   struct dw_pcie *pci = pcie->pci;
+
+   if (pcie->big_endian)
+   iowrite32be(value, pci->dbi_base + offset);
+   else
+   iowrite32(value, pci->dbi_base + offset);
+}
+
+static irqreturn_t ls_pcie_ep_event_handler(int irq, void *dev_id)
+{
+   struct ls_pcie_ep *pcie = (struct ls_pcie_ep *)dev_id;
+   struct dw_pcie *pci = pcie->pci;
+   u32 val;
+
+   val = ls_lut_readl(pcie, PEX_PF0_PME_MES_DR);
+   if (!val)
+   return IRQ_NONE;
+
+   if (val & PEX_PF0_PME_MES_DR_LUD)
+   dev_info(pci->dev, "Detect the link up state !\n");
+   else if (val & PEX_PF0_PME_MES_DR_LDD)
+   dev_info(pci->dev, "Detect the link down state !\n");
+   else if (val & PEX_PF0_PME_MES_DR_HRD)
+   dev_info(pci->dev, "Detect the hot reset state !\n");
+
+   dw_pcie_dbi_ro_wr_en(pci);
+   dw_pcie_writew_dbi(pci, PCIE_LINK_CAP,
+  (pcie->max_width << MAX_LINK_W_SHIFT) |
+  pcie->max_speed);
+   dw_pcie_dbi_ro_wr_dis(pci);
+
+   ls_lut_writel(pcie, PEX_PF0_PME_MES_DR, val);
+
+   return IRQ_HANDLED;
+}
+
+static int ls_pcie_ep_interrupt_init(struct ls_pcie_ep *pcie,
+struct platform_device *pdev)
+{
+   u32 val;
+   int ret;
+
+   pcie->irq = platform_get_irq_byname(pdev, "pme");
+   if (pcie->irq < 0) {
+   dev_err(>dev, "Can't get 'pme' irq.\n");
+   return pcie->irq;
+   }
+
+   ret = devm_request_irq(&pdev->dev, pcie->irq,
+  ls_pcie_ep_event_handler, IRQF_SHARED,
+  pdev->name, pcie);
+   if (ret) {
+   dev_err(>dev, "Can't register PCIe IRQ.\n");
+   return ret;
+   }
+
+   /* Enable interrupts */
+   val = ls_lut_readl(pcie, PEX_PF0_PME_MES_IER);
+   val |=  PEX_PF0_PME_MES_IER_LDDIE | PEX_PF0_PME_MES_IER_HRDIE |
+   PEX_PF0_PME_MES_IER_LUDIE;
+   ls_lut_writel(pcie, PEX_PF0_PME_MES_IER, val);
+
+   return 0;
+}
+
 static const struct pci_epc_features*
 ls_pcie_ep_get_features(struct dw_pcie_ep *ep)
 {
@@ -125,6 +223,7 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
struct ls_pcie_ep *pcie;
struct pci_epc_features *ls_epc;
struct resource *dbi_base;
+   int ret;
 
pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
if (!pcie)
@@ -155,9 +254,20 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
 
	pci->ep.ops = &ls_pcie_ep_ops;
 
+   pcie->big_endian = of_property_read_bool(dev->of_node, "big-endian");
+
+   pcie->max_speed = 

[PATCH 1/1] PCI: layerscape: Set 64-bit DMA mask

2023-01-12 Thread Frank Li
From: Guanhua Gao 

Set DMA mask and coherent DMA mask to enable 64-bit addressing.

Signed-off-by: Guanhua Gao 
Signed-off-by: Hou Zhiqiang 
Signed-off-by: Frank Li 
---
 drivers/pci/controller/dwc/pci-layerscape-ep.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 1b884854c18e..c19e7ec58b05 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -261,6 +261,10 @@ static int __init ls_pcie_ep_probe(struct platform_device 
*pdev)
pcie->max_width = (dw_pcie_readw_dbi(pci, PCIE_LINK_CAP) >>
  MAX_LINK_W_SHIFT) & MAX_LINK_W_MASK;
 
+   /* set 64-bit DMA mask and coherent DMA mask */
+   if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
+   dev_warn(dev, "Failed to set 64-bit DMA mask.\n");
+
platform_set_drvdata(pdev, pcie);
 
	ret = dw_pcie_ep_init(&pci->ep);
-- 
2.34.1



[PATCH v2 1/1] PCI: layerscape: Add EP mode support for ls1028a

2023-01-12 Thread Frank Li
From: Xiaowei Bao 

Add PCIe EP mode support for ls1028a.

Signed-off-by: Xiaowei Bao 
Signed-off-by: Hou Zhiqiang 
Signed-off-by: Frank Li 
Acked-by:  Roy Zang 

---

Added 
Signed-off-by: Frank Li 
Acked-by:  Roy Zang 


All other patches were already accepted by the maintainer in 
https://lore.kernel.org/lkml/2022223457.10599-1-leoyang...@nxp.com/

But missed this one.

Re-post.

 drivers/pci/controller/dwc/pci-layerscape-ep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index ad99707b3b99..ed5cfc9408d9 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -112,6 +112,7 @@ static const struct ls_pcie_ep_drvdata lx2_ep_drvdata = {
 static const struct of_device_id ls_pcie_ep_of_match[] = {
{ .compatible = "fsl,ls1046a-pcie-ep", .data = _ep_drvdata },
{ .compatible = "fsl,ls1088a-pcie-ep", .data = _ep_drvdata },
+   { .compatible = "fsl,ls1028a-pcie-ep", .data = _ep_drvdata },
{ .compatible = "fsl,ls2088a-pcie-ep", .data = _ep_drvdata },
{ .compatible = "fsl,lx2160ar2-pcie-ep", .data = _ep_drvdata },
{ },
-- 
2.34.1



Re: [PATCH v2 07/14] powerpc/vdso: Improve linker flags

2023-01-12 Thread Sedat Dilek
On Thu, Jan 12, 2023 at 7:21 PM Nathan Chancellor  wrote:
>
> Hi Sedat,
>
> On Thu, Jan 12, 2023 at 07:02:30PM +0100, Sedat Dilek wrote:
> > On Thu, Jan 12, 2023 at 4:06 AM Nathan Chancellor  wrote:
> > >
> > > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there
> > > are several warnings in the PowerPC vDSO:
> > >
> > >   clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused 
> > > [-Werror,-Wunused-command-line-argument]
> > >   clang-16: error: -Wl,--hash-style=both: 'linker' input unused 
> > > [-Werror,-Wunused-command-line-argument]
> > >   clang-16: error: argument unused during compilation: '-shared' 
> > > [-Werror,-Wunused-command-line-argument]
> > >
> > >   clang-16: error: argument unused during compilation: '-nostdinc' 
> > > [-Werror,-Wunused-command-line-argument]
> > >   clang-16: error: argument unused during compilation: '-Wa,-maltivec' 
> > > [-Werror,-Wunused-command-line-argument]
> > >
> > > The first group of warnings point out that linker flags were being added
> > > to all invocations of $(CC), even though they will only be used during
> > > the final vDSO link. Move those flags to ldflags-y.
> > >
> > > The second group of warnings are compiler or assembler flags that will
> > > be unused during linking. Filter them out from KBUILD_CFLAGS so that
> > > they are not used during linking.
> > >
> > > Additionally, '-z noexecstack' was added directly to the ld_and_check
> > > rule in commit 1d53c0192b15 ("powerpc/vdso: link with -z noexecstack")
> > > but now that there is a common ldflags variable, it can be moved there.
> > >
> > > Signed-off-by: Nathan Chancellor 
> > > Reviewed-by: Nick Desaulniers 
> > > ---
> > > Cc: m...@ellerman.id.au
> > > Cc: npig...@gmail.com
> > > Cc: christophe.le...@csgroup.eu
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > ---
> > >  arch/powerpc/kernel/vdso/Makefile | 18 +++---
> > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/powerpc/kernel/vdso/Makefile 
> > > b/arch/powerpc/kernel/vdso/Makefile
> > > index 45c0cc5d34b6..4337b3aa9171 100644
> > > --- a/arch/powerpc/kernel/vdso/Makefile
> > > +++ b/arch/powerpc/kernel/vdso/Makefile
> > > @@ -47,13 +47,17 @@ KCOV_INSTRUMENT := n
> > >  UBSAN_SANITIZE := n
> > >  KASAN_SANITIZE := n
> > >
> > > -ccflags-y := -shared -fno-common -fno-builtin -nostdlib 
> > > -Wl,--hash-style=both
> > > -ccflags-$(CONFIG_LD_IS_LLD) += $(call 
> > > cc-option,--ld-path=$(LD),-fuse-ld=lld)
> > > -
> > > -CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32
> > > +ccflags-y := -fno-common -fno-builtin
> > > +ldflags-y := -Wl,--hash-style=both -nostdlib -shared -z noexecstack
> > > +ldflags-$(CONFIG_LD_IS_LLD) += $(call 
> > > cc-option,--ld-path=$(LD),-fuse-ld=lld)
> > > +# Filter flags that clang will warn are unused for linking
> > > +ldflags-y += $(filter-out $(CC_FLAGS_FTRACE) -Wa$(comma)%, 
> > > $(KBUILD_CFLAGS))
> > > +
> > > +CC32FLAGS := -m32
> > > +LD32FLAGS := -Wl,-soname=linux-vdso32.so.1
> > >  AS32FLAGS := -D__VDSO32__
> > >
> > > -CC64FLAGS := -Wl,-soname=linux-vdso64.so.1
> >
> > Set CC64FLAGS := -m64 ?
>
> I do not think it is necessary. ldflags-y is filtered from
> KBUILD_CFLAGS, which should already include '-m64' (search for
> 'HAS_BIARCH' in arch/powerpc/Makefile). We would have seen a problem
> with this already if a 32-bit target (powerpc-linux-gnu-) CROSS_COMPILE
> value were used, since $(c_flags) uses the main kernel's CROSS_COMPILE value.
>

Happy new 2023 Nathan,

Those vdso Makefiles are hard to read.

Looks like x86/vdso explicitly sets -m32 and filter-out -m64 for the
32-bit case.

Best regards,
-Sedat-

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/vdso/Makefile

> > > +LD64FLAGS := -Wl,-soname=linux-vdso64.so.1
> > >  AS64FLAGS := -D__VDSO64__
> > >
> > >  targets += vdso32.lds
> > > @@ -92,14 +96,14 @@ include/generated/vdso64-offsets.h: 
> > > $(obj)/vdso64.so.dbg FORCE
> > >
> > >  # actual build commands
> > >  quiet_cmd_vdso32ld_and_check = VDSO32L $@
> > > -  cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ 
> > > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; 
> > > $(cmd_vdso_check)
> > > +  cmd_vdso32ld_and_check = $(VDSOCC) $(ldflags-y) $(CC32FLAGS) 
> > > $(LD32FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^); 
> > > $(cmd_vdso_check)
> > >  quiet_cmd_vdso32as = VDSO32A $@
> > >cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c 
> > > -o $@ $<
> > >  quiet_cmd_vdso32cc = VDSO32C $@
> > >cmd_vdso32cc = $(VDSOCC) $(c_flags) $(CC32FLAGS) -c -o $@ $<
> > >
> > >  quiet_cmd_vdso64ld_and_check = VDSO64L $@
> > > -  cmd_vdso64ld_and_check = $(VDSOCC) $(c_flags) $(CC64FLAGS) -o $@ 
> > > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; 
> > > $(cmd_vdso_check)
> > > +  cmd_vdso64ld_and_check = $(VDSOCC) $(ldflags-y) $(CC64FLAGS) 
> > > $(LD64FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter 

Re: [PATCH V2] PCI/AER: Configure ECRC only if AER is native

2023-01-12 Thread Bjorn Helgaas
On Thu, Jan 12, 2023 at 12:51:11PM +0530, Vidya Sagar wrote:
> As the ECRC configuration bits are part of AER registers, configure
> ECRC only if AER is natively owned by the kernel.
> 
> Signed-off-by: Vidya Sagar 

Applied to pci/aer for v6.3, thanks!

> ---
> v2:
> * Updated kernel-parameters.txt document based on Bjorn's suggestion
> 
>  Documentation/admin-guide/kernel-parameters.txt | 4 +++-
>  drivers/pci/pcie/aer.c  | 3 +++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 426fa892d311..8f85a1230525 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4242,7 +4242,9 @@
>   specified, e.g., 12@pci:8086:9c22:103c:198f
>   for 4096-byte alignment.
>   ecrc=   Enable/disable PCIe ECRC (transaction layer
> - end-to-end CRC checking).
> + end-to-end CRC checking). Only effective if
> + OS has native AER control (either granted by
> + ACPI _OSC or forced via "pcie_ports=native")
>   bios: Use BIOS/firmware settings. This is
>   the default.
>   off: Turn ECRC off
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..730b47bdcdef 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -184,6 +184,9 @@ static int disable_ecrc_checking(struct pci_dev *dev)
>   */
>  void pcie_set_ecrc_checking(struct pci_dev *dev)
>  {
> + if (!pcie_aer_is_native(dev))
> + return;
> +
>   switch (ecrc_policy) {
>   case ECRC_POLICY_DEFAULT:
>   return;
> -- 
> 2.17.1
> 


Re: [PATCH] kallsyms: Fix scheduling with interrupts disabled in self-test

2023-01-12 Thread Luis Chamberlain
On Thu, Jan 12, 2023 at 08:54:26PM +1000, Nicholas Piggin wrote:
> kallsyms_on_each* may schedule so must not be called with interrupts
> disabled. The iteration function could disable interrupts, but this
> also changes lookup_symbol() to match the change to the other timing
> code.
> 
> Reported-by: Erhard F. 
> Link: 
> https://lore.kernel.org/all/bug-216902-206...@https.bugzilla.kernel.org%2F/
> Reported-by: kernel test robot 
> Link: 
> https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.s...@intel.com
> Fixes: 30f3bb09778d ("kallsyms: Add self-test facility")
> Signed-off-by: Nicholas Piggin 
> ---

Thanks Nicholas!

Petr had just suggested removing this aspect of the selftests, the
performance test, as it's specific to the config, it doesn't run many
times to get an average, and odd things on a system can create different
metrics. Zhen Lei had given up on fixing it and has a patch to instead
remove this part of the selftest.

I still find value in keeping it, but Petr, would like your opinion on
this fix, if we were to keep it.

  Luis

>  kernel/kallsyms_selftest.c | 21 ++---
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
> index f35d9cc1aab1..bfbc12da3326 100644
> --- a/kernel/kallsyms_selftest.c
> +++ b/kernel/kallsyms_selftest.c
> @@ -157,14 +157,11 @@ static void test_kallsyms_compression_ratio(void)
>  static int lookup_name(void *data, const char *name, struct module *mod, 
> unsigned long addr)
>  {
>   u64 t0, t1, t;
> - unsigned long flags;
>   struct test_stat *stat = (struct test_stat *)data;
>  
> - local_irq_save(flags);
> - t0 = sched_clock();
> + t0 = ktime_get_ns();
>   (void)kallsyms_lookup_name(name);
> - t1 = sched_clock();
> - local_irq_restore(flags);
> + t1 = ktime_get_ns();
>  
>   t = t1 - t0;
>   if (t < stat->min)
> @@ -234,18 +231,15 @@ static int find_symbol(void *data, const char *name, 
> struct module *mod, unsigne
>  static void test_perf_kallsyms_on_each_symbol(void)
>  {
>   u64 t0, t1;
> - unsigned long flags;
>   struct test_stat stat;
>  
>   memset(&stat, 0, sizeof(stat));
>   stat.max = INT_MAX;
>   stat.name = stub_name;
>   stat.perf = 1;
> - local_irq_save(flags);
> - t0 = sched_clock();
> + t0 = ktime_get_ns();
> + kallsyms_on_each_symbol(find_symbol, &stat);
> - t1 = sched_clock();
> - local_irq_restore(flags);
> + t1 = ktime_get_ns();
>   pr_info("kallsyms_on_each_symbol() traverse all: %lld ns\n", t1 - t0);
>  }
>  
> @@ -270,17 +264,14 @@ static int match_symbol(void *data, unsigned long addr)
>  static void test_perf_kallsyms_on_each_match_symbol(void)
>  {
>   u64 t0, t1;
> - unsigned long flags;
>   struct test_stat stat;
>  
>   memset(&stat, 0, sizeof(stat));
>   stat.max = INT_MAX;
>   stat.name = stub_name;
> - local_irq_save(flags);
> - t0 = sched_clock();
> + t0 = ktime_get_ns();
> + kallsyms_on_each_match_symbol(match_symbol, stat.name, &stat);
> - t1 = sched_clock();
> - local_irq_restore(flags);
> + t1 = ktime_get_ns();
>   pr_info("kallsyms_on_each_match_symbol() traverse all: %lld ns\n", t1 - 
> t0);
>  }
>  
> -- 
> 2.37.2
> 


Re: [PATCH V1] PCI/AER: Configure ECRC only if AER is native

2023-01-12 Thread Bjorn Helgaas
On Wed, Jan 11, 2023 at 03:27:51PM -0800, Sathyanarayanan Kuppuswamy wrote:
> On 1/11/23 3:10 PM, Bjorn Helgaas wrote:
> > On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
> >> On 1/11/23 12:31 PM, Vidya Sagar wrote:
> >>> As the ECRC configuration bits are part of AER registers, configure
> >>> ECRC only if AER is natively owned by the kernel.
> >>
> >> ecrc command line option takes "bios/on/off" as possible options. It
> >> does not clarify whether "on/off" choices can only be used if AER is
> >> owned by OS or it can override the ownership of ECRC configuration 
> >> similar to pcie_ports=native option. Maybe that needs to be clarified.
> > 
> > Good point, what do you think of an update like this:
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index 6cfa6e3996cf..f7b40a439194 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4296,7 +4296,9 @@
> > specified, e.g., 12@pci:8086:9c22:103c:198f
> > for 4096-byte alignment.
> > ecrc=   Enable/disable PCIe ECRC (transaction layer
> > -   end-to-end CRC checking).
> > +   end-to-end CRC checking).  Only effective
> > +   if OS has native AER control (either granted by
> > +   ACPI _OSC or forced via "pcie_ports=native").
> > bios: Use BIOS/firmware settings. This is
> > the default.
> > off: Turn ECRC off
> 
> Looks fine. But do we even need the "bios" option? Since it is the default
> value, I am not sure why we need to list it as an option again. IMO
> this could be removed.

I agree, it seems pointless.

> > I don't know whether the "ecrc=" parameter is really needed.  If we
> > were adding it today, I would ask "why not enable ECRC wherever it is
> > supported?"  If there are devices where it's broken, we could always
> > add quirks to disable it on a case-by-case basis.
> 
> Checking the original patch which added it, it looks like the intention
> is to give an option to choose performance over integrity.
> 
> commit 43c16408842b0eeb367c23a6fa540ce69f99e347
> Author: Andrew Patterson 
> Date:   Wed Apr 22 16:52:09 2009 -0600
> 
> PCI: Add support for turning PCIe ECRC on or off
> 
> Adds support for PCI Express transaction layer end-to-end CRC checking
> (ECRC).  This patch will enable/disable ECRC checking by setting/clearing
> the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
> support ECRC.
> 
> The ECRC setting is controlled by the "pci=ecrc=" command-line
> option. If this option is not set or is set to 'bios", the enable and
> generation bits are left in whatever state that firmware/BIOS set them to.
> The "off" setting turns them off, and the "on" option turns them on (if 
> the
> device supports it).
> 
> Turning ECRC on or off can be a data integrity versus performance
> tradeoff.  In theory, turning it on will catch more data errors, turning
> it off means possibly better performance since CRC does not need to be
> calculated by the PCIe hardware and packet sizes are reduced.

Ah, right, and I think I was even part of the conversation when this
was added :)

I'm not sure I would make the same choice today, though.  IMHO it's
kind of hard to defend choosing performance over data integrity.

If a platform really wants to sacrifice integrity for performance, it
could retain control of AER, and after Vidya's patch, Linux will leave
the ECRC configuration alone.

Straw-man: If Linux owns AER and ECRC is supported, enable ECRC by
default.  Retain "ecrc=off" to turn it off, but drop a note in dmesg
and taint the kernel.
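
Roughly, as a sketch (not a submitted patch; enable_ecrc_checking() and
disable_ecrc_checking() are the existing static helpers in
drivers/pci/pcie/aer.c, and ecrc_strawman() is a made-up name):

static void ecrc_strawman(struct pci_dev *dev)
{
	if (!pcie_aer_is_native(dev))
		return;		/* firmware owns AER: leave ECRC alone */

	if (ecrc_policy == ECRC_POLICY_OFF) {
		pci_warn(dev, "ECRC disabled on the command line\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
		disable_ecrc_checking(dev);
	} else {
		enable_ecrc_checking(dev);	/* default: integrity first */
	}
}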

Bjorn


Re: [PATCH v2 07/14] powerpc/vdso: Improve linker flags

2023-01-12 Thread Nathan Chancellor
Hi Sedat,

On Thu, Jan 12, 2023 at 07:02:30PM +0100, Sedat Dilek wrote:
> On Thu, Jan 12, 2023 at 4:06 AM Nathan Chancellor  wrote:
> >
> > When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there
> > are several warnings in the PowerPC vDSO:
> >
> >   clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused 
> > [-Werror,-Wunused-command-line-argument]
> >   clang-16: error: -Wl,--hash-style=both: 'linker' input unused 
> > [-Werror,-Wunused-command-line-argument]
> >   clang-16: error: argument unused during compilation: '-shared' 
> > [-Werror,-Wunused-command-line-argument]
> >
> >   clang-16: error: argument unused during compilation: '-nostdinc' 
> > [-Werror,-Wunused-command-line-argument]
> >   clang-16: error: argument unused during compilation: '-Wa,-maltivec' 
> > [-Werror,-Wunused-command-line-argument]
> >
> > The first group of warnings point out that linker flags were being added
> > to all invocations of $(CC), even though they will only be used during
> > the final vDSO link. Move those flags to ldflags-y.
> >
> > The second group of warnings are compiler or assembler flags that will
> > be unused during linking. Filter them out from KBUILD_CFLAGS so that
> > they are not used during linking.
> >
> > Additionally, '-z noexecstack' was added directly to the ld_and_check
> > rule in commit 1d53c0192b15 ("powerpc/vdso: link with -z noexecstack")
> > but now that there is a common ldflags variable, it can be moved there.
> >
> > Signed-off-by: Nathan Chancellor 
> > Reviewed-by: Nick Desaulniers 
> > ---
> > Cc: m...@ellerman.id.au
> > Cc: npig...@gmail.com
> > Cc: christophe.le...@csgroup.eu
> > Cc: linuxppc-dev@lists.ozlabs.org
> > ---
> >  arch/powerpc/kernel/vdso/Makefile | 18 +++---
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/vdso/Makefile 
> > b/arch/powerpc/kernel/vdso/Makefile
> > index 45c0cc5d34b6..4337b3aa9171 100644
> > --- a/arch/powerpc/kernel/vdso/Makefile
> > +++ b/arch/powerpc/kernel/vdso/Makefile
> > @@ -47,13 +47,17 @@ KCOV_INSTRUMENT := n
> >  UBSAN_SANITIZE := n
> >  KASAN_SANITIZE := n
> >
> > -ccflags-y := -shared -fno-common -fno-builtin -nostdlib 
> > -Wl,--hash-style=both
> > -ccflags-$(CONFIG_LD_IS_LLD) += $(call 
> > cc-option,--ld-path=$(LD),-fuse-ld=lld)
> > -
> > -CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32
> > +ccflags-y := -fno-common -fno-builtin
> > +ldflags-y := -Wl,--hash-style=both -nostdlib -shared -z noexecstack
> > +ldflags-$(CONFIG_LD_IS_LLD) += $(call 
> > cc-option,--ld-path=$(LD),-fuse-ld=lld)
> > +# Filter flags that clang will warn are unused for linking
> > +ldflags-y += $(filter-out $(CC_FLAGS_FTRACE) -Wa$(comma)%, 
> > $(KBUILD_CFLAGS))
> > +
> > +CC32FLAGS := -m32
> > +LD32FLAGS := -Wl,-soname=linux-vdso32.so.1
> >  AS32FLAGS := -D__VDSO32__
> >
> > -CC64FLAGS := -Wl,-soname=linux-vdso64.so.1
> 
> Set CC64FLAGS := -m64 ?

I do not think it is necessary. ldflags-y is filtered from
KBUILD_CFLAGS, which should already include '-m64' (search for
'HAS_BIARCH' in arch/powerpc/Makefile). We would have seen a problem
with this already if a 32-bit target (powerpc-linux-gnu-) CROSS_COMPILE
value were used, since $(c_flags) uses the main kernel's CROSS_COMPILE value.

> > +LD64FLAGS := -Wl,-soname=linux-vdso64.so.1
> >  AS64FLAGS := -D__VDSO64__
> >
> >  targets += vdso32.lds
> > @@ -92,14 +96,14 @@ include/generated/vdso64-offsets.h: 
> > $(obj)/vdso64.so.dbg FORCE
> >
> >  # actual build commands
> >  quiet_cmd_vdso32ld_and_check = VDSO32L $@
> > -  cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ 
> > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check)
> > +  cmd_vdso32ld_and_check = $(VDSOCC) $(ldflags-y) $(CC32FLAGS) 
> > $(LD32FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^); 
> > $(cmd_vdso_check)
> >  quiet_cmd_vdso32as = VDSO32A $@
> >cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c -o 
> > $@ $<
> >  quiet_cmd_vdso32cc = VDSO32C $@
> >cmd_vdso32cc = $(VDSOCC) $(c_flags) $(CC32FLAGS) -c -o $@ $<
> >
> >  quiet_cmd_vdso64ld_and_check = VDSO64L $@
> > -  cmd_vdso64ld_and_check = $(VDSOCC) $(c_flags) $(CC64FLAGS) -o $@ 
> > -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check)
> > +  cmd_vdso64ld_and_check = $(VDSOCC) $(ldflags-y) $(CC64FLAGS) 
> > $(LD64FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^); 
> > $(cmd_vdso_check)
> 
> If no CC64FLAGS := xxx is set, this can go?

Good catch! CC64FLAGS can be removed. Masahiro, I am happy to send a v3
when I am back online next week but if you are able to fix it up during
application, please feel free to do so (once the PowerPC folks give
their Acks of course).

> >  quiet_cmd_vdso64as = VDSO64A $@
> >cmd_vdso64as = $(VDSOCC) $(a_flags) $(CC64FLAGS) $(AS64FLAGS) -c -o 
> > $@ $<
> >
> >
> > --
> > 2.39.0
> >

Thanks for the review, cheers!
Nathan


Re: [PATCH v2 07/14] powerpc/vdso: Improve linker flags

2023-01-12 Thread Sedat Dilek
On Thu, Jan 12, 2023 at 4:06 AM Nathan Chancellor  wrote:
>
> When clang's -Qunused-arguments is dropped from KBUILD_CPPFLAGS, there
> are several warnings in the PowerPC vDSO:
>
>   clang-16: error: -Wl,-soname=linux-vdso32.so.1: 'linker' input unused 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: -Wl,--hash-style=both: 'linker' input unused 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: argument unused during compilation: '-shared' 
> [-Werror,-Wunused-command-line-argument]
>
>   clang-16: error: argument unused during compilation: '-nostdinc' 
> [-Werror,-Wunused-command-line-argument]
>   clang-16: error: argument unused during compilation: '-Wa,-maltivec' 
> [-Werror,-Wunused-command-line-argument]
>
> The first group of warnings point out that linker flags were being added
> to all invocations of $(CC), even though they will only be used during
> the final vDSO link. Move those flags to ldflags-y.
>
> The second group of warnings are compiler or assembler flags that will
> be unused during linking. Filter them out from KBUILD_CFLAGS so that
> they are not used during linking.
>
> Additionally, '-z noexecstack' was added directly to the ld_and_check
> rule in commit 1d53c0192b15 ("powerpc/vdso: link with -z noexecstack")
> but now that there is a common ldflags variable, it can be moved there.
>
> Signed-off-by: Nathan Chancellor 
> Reviewed-by: Nick Desaulniers 
> ---
> Cc: m...@ellerman.id.au
> Cc: npig...@gmail.com
> Cc: christophe.le...@csgroup.eu
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/powerpc/kernel/vdso/Makefile | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/kernel/vdso/Makefile 
> b/arch/powerpc/kernel/vdso/Makefile
> index 45c0cc5d34b6..4337b3aa9171 100644
> --- a/arch/powerpc/kernel/vdso/Makefile
> +++ b/arch/powerpc/kernel/vdso/Makefile
> @@ -47,13 +47,17 @@ KCOV_INSTRUMENT := n
>  UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>
> -ccflags-y := -shared -fno-common -fno-builtin -nostdlib -Wl,--hash-style=both
> -ccflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld)
> -
> -CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32
> +ccflags-y := -fno-common -fno-builtin
> +ldflags-y := -Wl,--hash-style=both -nostdlib -shared -z noexecstack
> +ldflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld)
> +# Filter flags that clang will warn are unused for linking
> +ldflags-y += $(filter-out $(CC_FLAGS_FTRACE) -Wa$(comma)%, $(KBUILD_CFLAGS))
> +
> +CC32FLAGS := -m32
> +LD32FLAGS := -Wl,-soname=linux-vdso32.so.1
>  AS32FLAGS := -D__VDSO32__
>
> -CC64FLAGS := -Wl,-soname=linux-vdso64.so.1

Set CC64FLAGS := -m64 ?

> +LD64FLAGS := -Wl,-soname=linux-vdso64.so.1
>  AS64FLAGS := -D__VDSO64__
>
>  targets += vdso32.lds
> @@ -92,14 +96,14 @@ include/generated/vdso64-offsets.h: $(obj)/vdso64.so.dbg 
> FORCE
>
>  # actual build commands
>  quiet_cmd_vdso32ld_and_check = VDSO32L $@
> -  cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ 
> -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check)
> +  cmd_vdso32ld_and_check = $(VDSOCC) $(ldflags-y) $(CC32FLAGS) 
> $(LD32FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^); 
> $(cmd_vdso_check)
>  quiet_cmd_vdso32as = VDSO32A $@
>cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) $(AS32FLAGS) -c -o $@ 
> $<
>  quiet_cmd_vdso32cc = VDSO32C $@
>cmd_vdso32cc = $(VDSOCC) $(c_flags) $(CC32FLAGS) -c -o $@ $<
>
>  quiet_cmd_vdso64ld_and_check = VDSO64L $@
> -  cmd_vdso64ld_and_check = $(VDSOCC) $(c_flags) $(CC64FLAGS) -o $@ 
> -Wl,-T$(filter %.lds,$^) $(filter %.o,$^) -z noexecstack ; $(cmd_vdso_check)
> +  cmd_vdso64ld_and_check = $(VDSOCC) $(ldflags-y) $(CC64FLAGS) 
> $(LD64FLAGS) -o $@ -Wl,-T$(filter %.lds,$^) $(filter %.o,$^); 
> $(cmd_vdso_check)

If no CC64FLAGS := xxx is set, this can go?

-Sedat-

>  quiet_cmd_vdso64as = VDSO64A $@
>cmd_vdso64as = $(VDSOCC) $(a_flags) $(CC64FLAGS) $(AS64FLAGS) -c -o $@ 
> $<
>
>
> --
> 2.39.0
>


Re: [PATCH] powerpc/rtas: upgrade internal arch spinlocks

2023-01-12 Thread Nathan Lynch
Laurent Dufour  writes:
> On 10/01/2023 05:42:55, Nathan Lynch wrote:
>> --- a/arch/powerpc/include/asm/rtas-types.h
>> +++ b/arch/powerpc/include/asm/rtas-types.h
>> @@ -18,7 +18,7 @@ struct rtas_t {
>>  unsigned long entry;/* physical address pointer */
>>  unsigned long base; /* physical address pointer */
>>  unsigned long size;
>> -arch_spinlock_t lock;
>> +raw_spinlock_t lock;
>>  struct rtas_args args;
>>  struct device_node *dev;/* virtual address pointer */
>>  };
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index deded51a7978..a834726f18e3 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -61,7 +61,7 @@ static inline void do_enter_rtas(unsigned long args)
>>  }
>>  
>>  struct rtas_t rtas = {
>> -.lock = __ARCH_SPIN_LOCK_UNLOCKED
>> +.lock = __RAW_SPIN_LOCK_UNLOCKED(rtas.lock),
>>  };
>>  EXPORT_SYMBOL(rtas);
>
> This is not the scope of this patch, but the RTAS's lock is externalized
> through the structure rtas_t, while it is only used in that file.
>
> I think, this would be good, in case of future change about that lock, and
> in order to not break KABI, to move it out of that structure, and to define
> it statically in that file.

Thanks for pointing this out.

/* rtas-types.h */
struct rtas_t {
unsigned long entry;/* physical address pointer */
unsigned long base; /* physical address pointer */
unsigned long size;
raw_spinlock_t lock;
struct rtas_args args;
struct device_node *dev;/* virtual address pointer */
};

/* rtas.h */
extern struct rtas_t rtas;

There's C and asm code outside of rtas.c that accesses rtas.entry,
rtas.base, rtas.size, and rtas.dev. But as you say, rtas.lock is used
only in rtas.c, and it's hard to imagine any legitimate external
use. This applies to the args member as well, since accesses must occur
under the lock.

Making the lock and args private to rtas.c seems desirable on its own,
so I think that should be done first as a cleanup, followed by the
riskier arch -> raw lock conversion.

I'll tentatively plan on doing that for a v2, pending further comments.
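
Roughly the shape I have in mind, illustrative only and names not
final: the lock and args become file-local to rtas.c, while the members
other code actually uses stay in the exported struct.

/* rtas.c */
static raw_spinlock_t rtas_lock = __RAW_SPIN_LOCK_UNLOCKED(rtas_lock);
static struct rtas_args rtas_args;	/* accessed only under rtas_lock */

struct rtas_t rtas = {
	/* entry, base, size, dev remain visible to existing C/asm users */
};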


[PATCH] soc/fsl/qe: fix usb.c build errors

2023-01-12 Thread Randy Dunlap
Fix build errors in soc/fsl/qe/usb.c when QUICC_ENGINE is not set.
This happens when PPC_EP88XC is set, which selects CPM1 & CPM.
When CPM is set, USB_FSL_QE can be set without QUICC_ENGINE
being set. When USB_FSL_QE is set, QE_USB defaults to y, which
causes build errors when QUICC_ENGINE is not set. Making
QE_USB depend on QUICC_ENGINE prevents QE_USB from defaulting to y.

Fixes these build errors:

drivers/soc/fsl/qe/usb.o: in function `qe_usb_clock_set':
usb.c:(.text+0x1e): undefined reference to `qe_immr'
powerpc-linux-ld: usb.c:(.text+0x2a): undefined reference to `qe_immr'
powerpc-linux-ld: usb.c:(.text+0xbc): undefined reference to `qe_setbrg'
powerpc-linux-ld: usb.c:(.text+0xca): undefined reference to `cmxgcr_lock'
powerpc-linux-ld: usb.c:(.text+0xce): undefined reference to `cmxgcr_lock'

Fixes: 5e41486c408e ("powerpc/QE: add support for QE USB clocks routing")
Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Link: https://lore.kernel.org/all/202301101500.pillnv6r-...@intel.com/
Suggested-by: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Leo Li 
Cc: Masahiro Yamada 
Cc: Nicolas Schier 
Cc: Qiang Zhao 
Cc: linuxppc-dev 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Anton Vorontsov 
Cc: Kumar Gala 
---
 drivers/soc/fsl/qe/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/fsl/qe/Kconfig b/drivers/soc/fsl/qe/Kconfig
--- a/drivers/soc/fsl/qe/Kconfig
+++ b/drivers/soc/fsl/qe/Kconfig
@@ -39,6 +39,7 @@ config QE_TDM
 
 config QE_USB
bool
+   depends on QUICC_ENGINE
default y if USB_FSL_QE
help
  QE USB Controller support


[PATCH net-next 02/10] net: mdio: i2c: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The MDIO over I2C bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.
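
For reference, from a consumer's point of view the split looks like the
hedged example below: mdiobus_c45_read() takes the device address as a
separate argument rather than encoding it into the register number with
MII_ADDR_C45 (constants from include/uapi/linux/mdio.h; the function
name here is made up):

int read_pma_id_sketch(struct mii_bus *bus, int phy_addr)
{
	/* Clause 45 read of the PMA/PMD device identifier register */
	return mdiobus_c45_read(bus, phy_addr, MDIO_MMD_PMAPMD, MDIO_DEVID1);
}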

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
 drivers/net/mdio/mdio-i2c.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mdio/mdio-i2c.c b/drivers/net/mdio/mdio-i2c.c
index bf8bf5e20faf..9577a1842997 100644
--- a/drivers/net/mdio/mdio-i2c.c
+++ b/drivers/net/mdio/mdio-i2c.c
@@ -30,7 +30,8 @@ static unsigned int i2c_mii_phy_addr(int phy_id)
return phy_id + 0x40;
 }
 
-static int i2c_mii_read_default(struct mii_bus *bus, int phy_id, int reg)
+static int i2c_mii_read_default_c45(struct mii_bus *bus, int phy_id, int devad,
+   int reg)
 {
struct i2c_adapter *i2c = bus->priv;
struct i2c_msg msgs[2];
@@ -41,8 +42,8 @@ static int i2c_mii_read_default(struct mii_bus *bus, int 
phy_id, int reg)
		return 0xffff;
 
p = addr;
-   if (reg & MII_ADDR_C45) {
-   *p++ = 0x20 | ((reg >> 16) & 31);
+   if (devad >= 0) {
+   *p++ = 0x20 | devad;
*p++ = reg >> 8;
}
*p++ = reg;
@@ -64,8 +65,8 @@ static int i2c_mii_read_default(struct mii_bus *bus, int 
phy_id, int reg)
return data[0] << 8 | data[1];
 }
 
-static int i2c_mii_write_default(struct mii_bus *bus, int phy_id, int reg,
-u16 val)
+static int i2c_mii_write_default_c45(struct mii_bus *bus, int phy_id,
+int devad, int reg, u16 val)
 {
struct i2c_adapter *i2c = bus->priv;
struct i2c_msg msg;
@@ -76,8 +77,8 @@ static int i2c_mii_write_default(struct mii_bus *bus, int 
phy_id, int reg,
return 0;
 
p = data;
-   if (reg & MII_ADDR_C45) {
-   *p++ = (reg >> 16) & 31;
+   if (devad >= 0) {
+   *p++ = devad;
*p++ = reg >> 8;
}
*p++ = reg;
@@ -94,6 +95,17 @@ static int i2c_mii_write_default(struct mii_bus *bus, int 
phy_id, int reg,
return ret < 0 ? ret : 0;
 }
 
+static int i2c_mii_read_default_c22(struct mii_bus *bus, int phy_id, int reg)
+{
+   return i2c_mii_read_default_c45(bus, phy_id, -1, reg);
+}
+
+static int i2c_mii_write_default_c22(struct mii_bus *bus, int phy_id, int reg,
+u16 val)
+{
+   return i2c_mii_write_default_c45(bus, phy_id, -1, reg, val);
+}
+
 /* RollBall SFPs do not access internal PHY via I2C address 0x56, but
  * instead via address 0x51, when SFP page is set to 0x03 and password to
 * 0xffffffff.
@@ -403,8 +415,10 @@ struct mii_bus *mdio_i2c_alloc(struct device *parent, 
struct i2c_adapter *i2c,
mii->write = i2c_mii_write_rollball;
break;
default:
-   mii->read = i2c_mii_read_default;
-   mii->write = i2c_mii_write_default;
+   mii->read = i2c_mii_read_default_c22;
+   mii->write = i2c_mii_write_default_c22;
+   mii->read_c45 = i2c_mii_read_default_c45;
+   mii->write_c45 = i2c_mii_write_default_c45;
break;
}
 

-- 
2.30.2


[PATCH net-next 01/10] net: mdio: cavium: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The cavium IP can perform both C22 and C45 transfers.  Create separate
functions for each and register the C45 versions in both the octeon
and thunder bus driver.
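
For reference, the per-driver registration pattern this series applies,
sketched with placeholder foo_* names (the hunks below are the
cavium/octeon/thunder instance of it). Previously one bus->read/write
pair demultiplexed on MII_ADDR_C45; now Clause 22 and Clause 45 each
get dedicated mii_bus ops:

	bus->read      = foo_mdiobus_read_c22;
	bus->write     = foo_mdiobus_write_c22;
	bus->read_c45  = foo_mdiobus_read_c45;	/* new API members */
	bus->write_c45 = foo_mdiobus_write_c45;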

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
 drivers/net/mdio/mdio-cavium.c  | 111 +---
 drivers/net/mdio/mdio-cavium.h  |   9 +++-
 drivers/net/mdio/mdio-octeon.c  |   6 ++-
 drivers/net/mdio/mdio-thunder.c |   6 ++-
 4 files changed, 95 insertions(+), 37 deletions(-)

diff --git a/drivers/net/mdio/mdio-cavium.c b/drivers/net/mdio/mdio-cavium.c
index 95ce274c1be1..fd81546a4d3d 100644
--- a/drivers/net/mdio/mdio-cavium.c
+++ b/drivers/net/mdio/mdio-cavium.c
@@ -26,7 +26,7 @@ static void cavium_mdiobus_set_mode(struct cavium_mdiobus *p,
 }
 
 static int cavium_mdiobus_c45_addr(struct cavium_mdiobus *p,
-  int phy_id, int regnum)
+  int phy_id, int devad, int regnum)
 {
union cvmx_smix_cmd smi_cmd;
union cvmx_smix_wr_dat smi_wr;
@@ -38,12 +38,10 @@ static int cavium_mdiobus_c45_addr(struct cavium_mdiobus *p,
	smi_wr.s.dat = regnum & 0xffff;
oct_mdio_writeq(smi_wr.u64, p->register_base + SMI_WR_DAT);
 
-   regnum = (regnum >> 16) & 0x1f;
-
smi_cmd.u64 = 0;
smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_45_ADDRESS */
smi_cmd.s.phy_adr = phy_id;
-   smi_cmd.s.reg_adr = regnum;
+   smi_cmd.s.reg_adr = devad;
oct_mdio_writeq(smi_cmd.u64, p->register_base + SMI_CMD);
 
do {
@@ -59,28 +57,51 @@ static int cavium_mdiobus_c45_addr(struct cavium_mdiobus *p,
return 0;
 }
 
-int cavium_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum)
+int cavium_mdiobus_read_c22(struct mii_bus *bus, int phy_id, int regnum)
 {
struct cavium_mdiobus *p = bus->priv;
union cvmx_smix_cmd smi_cmd;
union cvmx_smix_rd_dat smi_rd;
-   unsigned int op = 1; /* MDIO_CLAUSE_22_READ */
int timeout = 1000;
 
-   if (regnum & MII_ADDR_C45) {
-   int r = cavium_mdiobus_c45_addr(p, phy_id, regnum);
+   cavium_mdiobus_set_mode(p, C22);
+
+   smi_cmd.u64 = 0;
+   smi_cmd.s.phy_op = 1; /* MDIO_CLAUSE_22_READ */;
+   smi_cmd.s.phy_adr = phy_id;
+   smi_cmd.s.reg_adr = regnum;
+   oct_mdio_writeq(smi_cmd.u64, p->register_base + SMI_CMD);
+
+   do {
+   /* Wait 1000 clocks so we don't saturate the RSL bus
+* doing reads.
+*/
+   __delay(1000);
+   smi_rd.u64 = oct_mdio_readq(p->register_base + SMI_RD_DAT);
+   } while (smi_rd.s.pending && --timeout);
+
+   if (smi_rd.s.val)
+   return smi_rd.s.dat;
+   else
+   return -EIO;
+}
+EXPORT_SYMBOL(cavium_mdiobus_read_c22);
 
-   if (r < 0)
-   return r;
+int cavium_mdiobus_read_c45(struct mii_bus *bus, int phy_id, int devad,
+   int regnum)
+{
+   struct cavium_mdiobus *p = bus->priv;
+   union cvmx_smix_cmd smi_cmd;
+   union cvmx_smix_rd_dat smi_rd;
+   int timeout = 1000;
+   int r;
 
-   regnum = (regnum >> 16) & 0x1f;
-   op = 3; /* MDIO_CLAUSE_45_READ */
-   } else {
-   cavium_mdiobus_set_mode(p, C22);
-   }
+   r = cavium_mdiobus_c45_addr(p, phy_id, devad, regnum);
+   if (r < 0)
+   return r;
 
smi_cmd.u64 = 0;
-   smi_cmd.s.phy_op = op;
+   smi_cmd.s.phy_op = 3; /* MDIO_CLAUSE_45_READ */
smi_cmd.s.phy_adr = phy_id;
smi_cmd.s.reg_adr = regnum;
oct_mdio_writeq(smi_cmd.u64, p->register_base + SMI_CMD);
@@ -98,36 +119,64 @@ int cavium_mdiobus_read(struct mii_bus *bus, int phy_id, 
int regnum)
else
return -EIO;
 }
-EXPORT_SYMBOL(cavium_mdiobus_read);
+EXPORT_SYMBOL(cavium_mdiobus_read_c45);
 
-int cavium_mdiobus_write(struct mii_bus *bus, int phy_id, int regnum, u16 val)
+int cavium_mdiobus_write_c22(struct mii_bus *bus, int phy_id, int regnum,
+u16 val)
 {
struct cavium_mdiobus *p = bus->priv;
union cvmx_smix_cmd smi_cmd;
union cvmx_smix_wr_dat smi_wr;
-   unsigned int op = 0; /* MDIO_CLAUSE_22_WRITE */
int timeout = 1000;
 
-   if (regnum & MII_ADDR_C45) {
-   int r = cavium_mdiobus_c45_addr(p, phy_id, regnum);
+   cavium_mdiobus_set_mode(p, C22);
 
-   if (r < 0)
-   return r;
+   smi_wr.u64 = 0;
+   smi_wr.s.dat = val;
+   oct_mdio_writeq(smi_wr.u64, p->register_base + SMI_WR_DAT);
 
-   regnum = (regnum >> 16) & 0x1f;
-   op = 1; /* MDIO_CLAUSE_45_WRITE */
-   } else {
-   cavium_mdiobus_set_mode(p, C22);
-   }
+   smi_cmd.u64 = 0;
+   smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_22_WRITE */;
+   smi_cmd.s.phy_adr = phy_id;
+   smi_cmd.s.reg_adr = regnum;
+   

[PATCH net-next 04/10] net: mdio: aspeed: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The aspeed MDIO bus driver can perform both C22 and C45 transfers.
Modify the existing C45 functions to take the devad as a parameter,
and remove the wrappers so there are individual C22 and C45 functions. Add
the C45 functions to the new API calls.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
 drivers/net/mdio/mdio-aspeed.c | 47 +++---
 1 file changed, 12 insertions(+), 35 deletions(-)

diff --git a/drivers/net/mdio/mdio-aspeed.c b/drivers/net/mdio/mdio-aspeed.c
index 944d005d2bd1..2f4bbda5e56c 100644
--- a/drivers/net/mdio/mdio-aspeed.c
+++ b/drivers/net/mdio/mdio-aspeed.c
@@ -104,61 +104,36 @@ static int aspeed_mdio_write_c22(struct mii_bus *bus, int 
addr, int regnum,
  addr, regnum, val);
 }
 
-static int aspeed_mdio_read_c45(struct mii_bus *bus, int addr, int regnum)
+static int aspeed_mdio_read_c45(struct mii_bus *bus, int addr, int devad,
+   int regnum)
 {
-   u8 c45_dev = (regnum >> 16) & 0x1F;
-   u16 c45_addr = regnum & 0xffff;
int rc;
 
rc = aspeed_mdio_op(bus, ASPEED_MDIO_CTRL_ST_C45, MDIO_C45_OP_ADDR,
-   addr, c45_dev, c45_addr);
+   addr, devad, regnum);
if (rc < 0)
return rc;
 
rc = aspeed_mdio_op(bus, ASPEED_MDIO_CTRL_ST_C45, MDIO_C45_OP_READ,
-   addr, c45_dev, 0);
+   addr, devad, 0);
if (rc < 0)
return rc;
 
return aspeed_mdio_get_data(bus);
 }
 
-static int aspeed_mdio_write_c45(struct mii_bus *bus, int addr, int regnum,
-u16 val)
+static int aspeed_mdio_write_c45(struct mii_bus *bus, int addr, int devad,
+int regnum, u16 val)
 {
-   u8 c45_dev = (regnum >> 16) & 0x1F;
-   u16 c45_addr = regnum & 0xffff;
int rc;
 
rc = aspeed_mdio_op(bus, ASPEED_MDIO_CTRL_ST_C45, MDIO_C45_OP_ADDR,
-   addr, c45_dev, c45_addr);
+   addr, devad, regnum);
if (rc < 0)
return rc;
 
return aspeed_mdio_op(bus, ASPEED_MDIO_CTRL_ST_C45, MDIO_C45_OP_WRITE,
- addr, c45_dev, val);
-}
-
-static int aspeed_mdio_read(struct mii_bus *bus, int addr, int regnum)
-{
-   dev_dbg(&bus->dev, "%s: addr: %d, regnum: %d\n", __func__, addr,
-   regnum);
-
-   if (regnum & MII_ADDR_C45)
-   return aspeed_mdio_read_c45(bus, addr, regnum);
-
-   return aspeed_mdio_read_c22(bus, addr, regnum);
-}
-
-static int aspeed_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 
val)
-{
-   dev_dbg(&bus->dev, "%s: addr: %d, regnum: %d, val: 0x%x\n",
-   __func__, addr, regnum, val);
-
-   if (regnum & MII_ADDR_C45)
-   return aspeed_mdio_write_c45(bus, addr, regnum, val);
-
-   return aspeed_mdio_write_c22(bus, addr, regnum, val);
+ addr, devad, val);
 }
 
 static int aspeed_mdio_probe(struct platform_device *pdev)
@@ -185,8 +160,10 @@ static int aspeed_mdio_probe(struct platform_device *pdev)
bus->name = DRV_NAME;
snprintf(bus->id, MII_BUS_ID_SIZE, "%s%d", pdev->name, pdev->id);
	bus->parent = &pdev->dev;
-   bus->read = aspeed_mdio_read;
-   bus->write = aspeed_mdio_write;
+   bus->read = aspeed_mdio_read_c22;
+   bus->write = aspeed_mdio_write_c22;
+   bus->read_c45 = aspeed_mdio_read_c45;
+   bus->write_c45 = aspeed_mdio_write_c45;
bus->probe_capabilities = MDIOBUS_C22_C45;
 
rc = of_mdiobus_register(bus, pdev->dev.of_node);

-- 
2.30.2


[PATCH net-next 09/10] net: stmmac: Separate C22 and C45 transactions for xgmac

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The stmmac MDIO bus driver in variant gmac4 can perform both C22 and
C45 transfers. Create separate functions for each and register the
C45 versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 200 +++---
 1 file changed, 138 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 4836a40df1af..d2cb22f49ce5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -234,8 +234,29 @@ static int stmmac_xgmac2_mdio_write_c45(struct mii_bus 
*bus, int phyaddr,
phydata);
 }
 
+static int stmmac_mdio_read(struct stmmac_priv *priv, int data, u32 value)
+{
+   unsigned int mii_address = priv->hw->mii.addr;
+   unsigned int mii_data = priv->hw->mii.data;
+   u32 v;
+
+   if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
+  100, 10000))
+   return -EBUSY;
+
+   writel(data, priv->ioaddr + mii_data);
+   writel(value, priv->ioaddr + mii_address);
+
+   if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
+  100, 10000))
+   return -EBUSY;
+
+   /* Read the data from the MII data register */
+   return readl(priv->ioaddr + mii_data) & MII_DATA_MASK;
+}
+
 /**
- * stmmac_mdio_read
+ * stmmac_mdio_read_c22
  * @bus: points to the mii_bus structure
  * @phyaddr: MII addr
  * @phyreg: MII reg
@@ -244,15 +265,12 @@ static int stmmac_xgmac2_mdio_write_c45(struct mii_bus 
*bus, int phyaddr,
  * accessing the PHY registers.
  * Fortunately, it seems this has no drawback for the 7109 MAC.
  */
-static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
+static int stmmac_mdio_read_c22(struct mii_bus *bus, int phyaddr, int phyreg)
 {
struct net_device *ndev = bus->priv;
struct stmmac_priv *priv = netdev_priv(ndev);
-   unsigned int mii_address = priv->hw->mii.addr;
-   unsigned int mii_data = priv->hw->mii.data;
u32 value = MII_BUSY;
int data = 0;
-   u32 v;
 
data = pm_runtime_resume_and_get(priv->device);
if (data < 0)
@@ -265,60 +283,94 @@ static int stmmac_mdio_read(struct mii_bus *bus, int 
phyaddr, int phyreg)
& priv->hw->mii.clk_csr_mask;
if (priv->plat->has_gmac4) {
value |= MII_GMAC4_READ;
-   if (phyreg & MII_ADDR_C45) {
-   value |= MII_GMAC4_C45E;
-   value &= ~priv->hw->mii.reg_mask;
-   value |= ((phyreg >> MII_DEVADDR_C45_SHIFT) <<
-  priv->hw->mii.reg_shift) &
-  priv->hw->mii.reg_mask;
-
-   data |= (phyreg & MII_REGADDR_C45_MASK) <<
-   MII_GMAC4_REG_ADDR_SHIFT;
-   }
}
 
-   if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
-  100, 10000)) {
-   data = -EBUSY;
-   goto err_disable_clks;
-   }
+   data = stmmac_mdio_read(priv, data, value);
 
-   writel(data, priv->ioaddr + mii_data);
-   writel(value, priv->ioaddr + mii_address);
+   pm_runtime_put(priv->device);
 
-   if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
-  100, 10000)) {
-   data = -EBUSY;
-   goto err_disable_clks;
+   return data;
+}
+
+/**
+ * stmmac_mdio_read_c45
+ * @bus: points to the mii_bus structure
+ * @phyaddr: MII addr
+ * @devad: device address to read
+ * @phyreg: MII reg
+ * Description: it reads data from the MII register from within the phy device.
+ * For the 7111 GMAC, we must set the bit 0 in the MII address register while
+ * accessing the PHY registers.
+ * Fortunately, it seems this has no drawback for the 7109 MAC.
+ */
+static int stmmac_mdio_read_c45(struct mii_bus *bus, int phyaddr, int devad,
+   int phyreg)
+{
+   struct net_device *ndev = bus->priv;
+   struct stmmac_priv *priv = netdev_priv(ndev);
+   u32 value = MII_BUSY;
+   int data = 0;
+
+   data = pm_runtime_get_sync(priv->device);
+   if (data < 0) {
+   pm_runtime_put_noidle(priv->device);
+   return data;
}
 
-   /* Read the data from the MII data register */
-   data = (int)readl(priv->ioaddr + mii_data) & MII_DATA_MASK;
+   value |= (phyaddr << priv->hw->mii.addr_shift)
+   & priv->hw->mii.addr_mask;
+   value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
+   value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
+   & 

[PATCH net-next 05/10] net: mdio: ipq4019: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The ipq4019 driver can perform both C22 and C45 transfers.  Create
separate functions for each and register the C45 versions using the
new driver API calls.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
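A note for reviewers: both the C22 and C45 paths below start by
programming the same mode bit in MDIO_MODE_REG. If this grows any
further, it could be factored into a helper along these lines (an
untested sketch built on the driver's existing defines, not part of
this patch):

	static void ipq4019_mdio_set_mode(struct ipq4019_mdio_data *priv,
					  bool c45)
	{
		unsigned int data;

		data = readl(priv->membase + MDIO_MODE_REG);
		if (c45)
			data |= MDIO_MODE_C45;
		else
			data &= ~MDIO_MODE_C45;
		writel(data, priv->membase + MDIO_MODE_REG);
	}
---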
 drivers/net/mdio/mdio-ipq4019.c | 154 +++-
 1 file changed, 90 insertions(+), 64 deletions(-)

diff --git a/drivers/net/mdio/mdio-ipq4019.c b/drivers/net/mdio/mdio-ipq4019.c
index 4eba5a91075c..78b93de636f5 100644
--- a/drivers/net/mdio/mdio-ipq4019.c
+++ b/drivers/net/mdio/mdio-ipq4019.c
@@ -53,7 +53,8 @@ static int ipq4019_mdio_wait_busy(struct mii_bus *bus)
  IPQ4019_MDIO_SLEEP, IPQ4019_MDIO_TIMEOUT);
 }
 
-static int ipq4019_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+static int ipq4019_mdio_read_c45(struct mii_bus *bus, int mii_id, int mmd,
+int reg)
 {
struct ipq4019_mdio_data *priv = bus->priv;
unsigned int data;
@@ -62,61 +63,71 @@ static int ipq4019_mdio_read(struct mii_bus *bus, int 
mii_id, int regnum)
if (ipq4019_mdio_wait_busy(bus))
return -ETIMEDOUT;
 
-   /* Clause 45 support */
-   if (regnum & MII_ADDR_C45) {
-   unsigned int mmd = (regnum >> 16) & 0x1F;
-   unsigned int reg = regnum & 0xffff;
+   data = readl(priv->membase + MDIO_MODE_REG);
 
-   /* Enter Clause 45 mode */
-   data = readl(priv->membase + MDIO_MODE_REG);
+   data |= MDIO_MODE_C45;
 
-   data |= MDIO_MODE_C45;
+   writel(data, priv->membase + MDIO_MODE_REG);
 
-   writel(data, priv->membase + MDIO_MODE_REG);
+   /* issue the phy address and mmd */
+   writel((mii_id << 8) | mmd, priv->membase + MDIO_ADDR_REG);
 
-   /* issue the phy address and mmd */
-   writel((mii_id << 8) | mmd, priv->membase + MDIO_ADDR_REG);
+   /* issue reg */
+   writel(reg, priv->membase + MDIO_DATA_WRITE_REG);
 
-   /* issue reg */
-   writel(reg, priv->membase + MDIO_DATA_WRITE_REG);
+   cmd = MDIO_CMD_ACCESS_START | MDIO_CMD_ACCESS_CODE_C45_ADDR;
 
-   cmd = MDIO_CMD_ACCESS_START | MDIO_CMD_ACCESS_CODE_C45_ADDR;
-   } else {
-   /* Enter Clause 22 mode */
-   data = readl(priv->membase + MDIO_MODE_REG);
+   /* issue read command */
+   writel(cmd, priv->membase + MDIO_CMD_REG);
 
-   data &= ~MDIO_MODE_C45;
+   /* Wait read complete */
+   if (ipq4019_mdio_wait_busy(bus))
+   return -ETIMEDOUT;
 
-   writel(data, priv->membase + MDIO_MODE_REG);
+   cmd = MDIO_CMD_ACCESS_START | MDIO_CMD_ACCESS_CODE_C45_READ;
 
-   /* issue the phy address and reg */
-   writel((mii_id << 8) | regnum, priv->membase + MDIO_ADDR_REG);
+   writel(cmd, priv->membase + MDIO_CMD_REG);
 
-   cmd = MDIO_CMD_ACCESS_START | MDIO_CMD_ACCESS_CODE_READ;
-   }
+   if (ipq4019_mdio_wait_busy(bus))
+   return -ETIMEDOUT;
 
-   /* issue read command */
-   writel(cmd, priv->membase + MDIO_CMD_REG);
+   /* Read and return data */
+   return readl(priv->membase + MDIO_DATA_READ_REG);
+}
+
+static int ipq4019_mdio_read_c22(struct mii_bus *bus, int mii_id, int regnum)
+{
+   struct ipq4019_mdio_data *priv = bus->priv;
+   unsigned int data;
+   unsigned int cmd;
 
-   /* Wait read complete */
if (ipq4019_mdio_wait_busy(bus))
return -ETIMEDOUT;
 
-   if (regnum & MII_ADDR_C45) {
-   cmd = MDIO_CMD_ACCESS_START | MDIO_CMD_ACCESS_CODE_C45_READ;
+   data = readl(priv->membase + MDIO_MODE_REG);
 
-   writel(cmd, priv->membase + MDIO_CMD_REG);
+   data &= ~MDIO_MODE_C45;
 
-   if (ipq4019_mdio_wait_busy(bus))
-   return -ETIMEDOUT;
-   }
+   writel(data, priv->membase + MDIO_MODE_REG);
+
+   /* issue the phy address and reg */
+   writel((mii_id << 8) | regnum, priv->membase + MDIO_ADDR_REG);
+
+   cmd = MDIO_CMD_ACCESS_START | MDIO_CMD_ACCESS_CODE_READ;
+
+   /* issue read command */
+   writel(cmd, priv->membase + MDIO_CMD_REG);
+
+   /* Wait read complete */
+   if (ipq4019_mdio_wait_busy(bus))
+   return -ETIMEDOUT;
 
/* Read and return data */
return readl(priv->membase + MDIO_DATA_READ_REG);
 }
 
-static int ipq4019_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
-u16 value)
+static int ipq4019_mdio_write_c45(struct mii_bus *bus, int mii_id, int mmd,
+ int reg, u16 value)
 {
struct ipq4019_mdio_data *priv = bus->priv;
unsigned int data;
@@ -125,50 +136,63 @@ static int ipq4019_mdio_write(struct mii_bus *bus, int 
mii_id, int regnum,
if (ipq4019_mdio_wait_busy(bus))

[PATCH net-next 03/10] net: mdio: mux-bcm-iproc: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The MDIO mux broadcom iproc can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
Apparently, in the c45 case, the reg value including the MII_ADDR_C45
bit is written to the hardware. It looks weird that a "random" software
bit is written to a register. Florian, is that correct? Also, with this
patch that flag isn't set anymore.
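
To spell out the difference (my reading; MII_ADDR_C45 is BIT(30) from
include/uapi/linux/mdio.h):

	/* before: regnum arrived with the flag packed in and was
	 * written out to the hardware as-is
	 */
	regnum = MII_ADDR_C45 | (devad << 16) | regad;	/* bit 30 set */

	/* after: rebuilt from the separate C45 arguments */
	regnum = (devad << 16) | regad;			/* bit 30 clear */

So if the hardware really did latch bit 30 before, its behaviour
changes with this patch.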
---
 drivers/net/mdio/mdio-mux-bcm-iproc.c | 54 ---
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mdio/mdio-mux-bcm-iproc.c 
b/drivers/net/mdio/mdio-mux-bcm-iproc.c
index 014c0baedbd2..956d54846b62 100644
--- a/drivers/net/mdio/mdio-mux-bcm-iproc.c
+++ b/drivers/net/mdio/mdio-mux-bcm-iproc.c
@@ -98,7 +98,7 @@ static int iproc_mdio_wait_for_idle(void __iomem *base, bool 
result)
  * Return value: Successful Read operation returns read reg values and write
  *  operation returns 0. Failure operation returns negative error code.
  */
-static int start_miim_ops(void __iomem *base,
+static int start_miim_ops(void __iomem *base, bool c45,
  u16 phyid, u32 reg, u16 val, u32 op)
 {
u32 param;
@@ -112,7 +112,7 @@ static int start_miim_ops(void __iomem *base,
param = readl(base + MDIO_PARAM_OFFSET);
param |= phyid << MDIO_PARAM_PHY_ID;
param |= val << MDIO_PARAM_PHY_DATA;
-   if (reg & MII_ADDR_C45)
+   if (c45)
param |= BIT(MDIO_PARAM_C45_SEL);
 
writel(param, base + MDIO_PARAM_OFFSET);
@@ -131,28 +131,58 @@ static int start_miim_ops(void __iomem *base,
return ret;
 }
 
-static int iproc_mdiomux_read(struct mii_bus *bus, int phyid, int reg)
+static int iproc_mdiomux_read_c22(struct mii_bus *bus, int phyid, int reg)
 {
struct iproc_mdiomux_desc *md = bus->priv;
int ret;
 
-   ret = start_miim_ops(md->base, phyid, reg, 0, MDIO_CTRL_READ_OP);
+   ret = start_miim_ops(md->base, false, phyid, reg, 0, MDIO_CTRL_READ_OP);
if (ret < 0)
-   dev_err(&bus->dev, "mdiomux read operation failed!!!");
+   dev_err(&bus->dev, "mdiomux c22 read operation failed!!!");
 
return ret;
 }
 
-static int iproc_mdiomux_write(struct mii_bus *bus,
-  int phyid, int reg, u16 val)
+static int iproc_mdiomux_read_c45(struct mii_bus *bus, int phyid, int devad,
+ int reg)
+{
+   struct iproc_mdiomux_desc *md = bus->priv;
+   int ret;
+
+   ret = start_miim_ops(md->base, true, phyid, reg | devad << 16, 0,
+MDIO_CTRL_READ_OP);
+   if (ret < 0)
+   dev_err(&bus->dev, "mdiomux read c45 operation failed!!!");
+
+   return ret;
+}
+
+static int iproc_mdiomux_write_c22(struct mii_bus *bus,
+  int phyid, int reg, u16 val)
+{
+   struct iproc_mdiomux_desc *md = bus->priv;
+   int ret;
+
+   /* Write val at reg offset */
+   ret = start_miim_ops(md->base, false, phyid, reg, val,
+MDIO_CTRL_WRITE_OP);
+   if (ret < 0)
+   dev_err(&bus->dev, "mdiomux write c22 operation failed!!!");
+
+   return ret;
+}
+
+static int iproc_mdiomux_write_c45(struct mii_bus *bus,
+  int phyid, int devad, int reg, u16 val)
 {
struct iproc_mdiomux_desc *md = bus->priv;
int ret;
 
/* Write val at reg offset */
-   ret = start_miim_ops(md->base, phyid, reg, val, MDIO_CTRL_WRITE_OP);
+   ret = start_miim_ops(md->base, true, phyid, reg | devad << 16, val,
+MDIO_CTRL_WRITE_OP);
if (ret < 0)
-   dev_err(&bus->dev, "mdiomux write operation failed!!!");
+   dev_err(&bus->dev, "mdiomux write c45 operation failed!!!");
 
return ret;
 }
@@ -223,8 +253,10 @@ static int mdio_mux_iproc_probe(struct platform_device 
*pdev)
bus->name = "iProc MDIO mux bus";
snprintf(bus->id, MII_BUS_ID_SIZE, "%s-%d", pdev->name, pdev->id);
bus->parent = &pdev->dev;
-   bus->read = iproc_mdiomux_read;
-   bus->write = iproc_mdiomux_write;
+   bus->read = iproc_mdiomux_read_c22;
+   bus->write = iproc_mdiomux_write_c22;
+   bus->read_c45 = iproc_mdiomux_read_c45;
+   bus->write_c45 = iproc_mdiomux_write_c45;
 
bus->phy_mask = ~0;
bus->dev.of_node = pdev->dev.of_node;

-- 
2.30.2


[PATCH net-next 06/10] net: ethernet: mtk_eth_soc: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The mediatek bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
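As a reading aid: after this conversion a C45 access from phylib lands
in the new function directly instead of being demultiplexed on
MII_ADDR_C45 inside the driver, e.g. (sketch; the register number is
made up):

	/* ends up in _mtk_mdio_read_c45(eth, addr, MDIO_MMD_VEND1, 0x1234) */
	val = mdiobus_c45_read(eth->mii_bus, addr, MDIO_MMD_VEND1, 0x1234);
---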
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 178 +---
 1 file changed, 112 insertions(+), 66 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index e3de9a53b2d9..dc50e0b227a6 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -215,8 +215,8 @@ static int mtk_mdio_busy_wait(struct mtk_eth *eth)
return -ETIMEDOUT;
 }
 
-static int _mtk_mdio_write(struct mtk_eth *eth, u32 phy_addr, u32 phy_reg,
-  u32 write_data)
+static int _mtk_mdio_write_c22(struct mtk_eth *eth, u32 phy_addr, u32 phy_reg,
+  u32 write_data)
 {
int ret;
 
@@ -224,35 +224,13 @@ static int _mtk_mdio_write(struct mtk_eth *eth, u32 
phy_addr, u32 phy_reg,
if (ret < 0)
return ret;
 
-   if (phy_reg & MII_ADDR_C45) {
-   mtk_w32(eth, PHY_IAC_ACCESS |
-PHY_IAC_START_C45 |
-PHY_IAC_CMD_C45_ADDR |
-PHY_IAC_REG(mdiobus_c45_devad(phy_reg)) |
-PHY_IAC_ADDR(phy_addr) |
-PHY_IAC_DATA(mdiobus_c45_regad(phy_reg)),
-   MTK_PHY_IAC);
-
-   ret = mtk_mdio_busy_wait(eth);
-   if (ret < 0)
-   return ret;
-
-   mtk_w32(eth, PHY_IAC_ACCESS |
-PHY_IAC_START_C45 |
-PHY_IAC_CMD_WRITE |
-PHY_IAC_REG(mdiobus_c45_devad(phy_reg)) |
-PHY_IAC_ADDR(phy_addr) |
-PHY_IAC_DATA(write_data),
-   MTK_PHY_IAC);
-   } else {
-   mtk_w32(eth, PHY_IAC_ACCESS |
-PHY_IAC_START_C22 |
-PHY_IAC_CMD_WRITE |
-PHY_IAC_REG(phy_reg) |
-PHY_IAC_ADDR(phy_addr) |
-PHY_IAC_DATA(write_data),
-   MTK_PHY_IAC);
-   }
+   mtk_w32(eth, PHY_IAC_ACCESS |
+   PHY_IAC_START_C22 |
+   PHY_IAC_CMD_WRITE |
+   PHY_IAC_REG(phy_reg) |
+   PHY_IAC_ADDR(phy_addr) |
+   PHY_IAC_DATA(write_data),
+   MTK_PHY_IAC);
 
ret = mtk_mdio_busy_wait(eth);
if (ret < 0)
@@ -261,7 +239,8 @@ static int _mtk_mdio_write(struct mtk_eth *eth, u32 
phy_addr, u32 phy_reg,
return 0;
 }
 
-static int _mtk_mdio_read(struct mtk_eth *eth, u32 phy_addr, u32 phy_reg)
+static int _mtk_mdio_write_c45(struct mtk_eth *eth, u32 phy_addr,
+  u32 devad, u32 phy_reg, u32 write_data)
 {
int ret;
 
@@ -269,33 +248,82 @@ static int _mtk_mdio_read(struct mtk_eth *eth, u32 
phy_addr, u32 phy_reg)
if (ret < 0)
return ret;
 
-   if (phy_reg & MII_ADDR_C45) {
-   mtk_w32(eth, PHY_IAC_ACCESS |
-PHY_IAC_START_C45 |
-PHY_IAC_CMD_C45_ADDR |
-PHY_IAC_REG(mdiobus_c45_devad(phy_reg)) |
-PHY_IAC_ADDR(phy_addr) |
-PHY_IAC_DATA(mdiobus_c45_regad(phy_reg)),
-   MTK_PHY_IAC);
-
-   ret = mtk_mdio_busy_wait(eth);
-   if (ret < 0)
-   return ret;
-
-   mtk_w32(eth, PHY_IAC_ACCESS |
-PHY_IAC_START_C45 |
-PHY_IAC_CMD_C45_READ |
-PHY_IAC_REG(mdiobus_c45_devad(phy_reg)) |
-PHY_IAC_ADDR(phy_addr),
-   MTK_PHY_IAC);
-   } else {
-   mtk_w32(eth, PHY_IAC_ACCESS |
-PHY_IAC_START_C22 |
-PHY_IAC_CMD_C22_READ |
-PHY_IAC_REG(phy_reg) |
-PHY_IAC_ADDR(phy_addr),
-   MTK_PHY_IAC);
-   }
+   mtk_w32(eth, PHY_IAC_ACCESS |
+   PHY_IAC_START_C45 |
+   PHY_IAC_CMD_C45_ADDR |
+   PHY_IAC_REG(devad) |
+   PHY_IAC_ADDR(phy_addr) |
+   PHY_IAC_DATA(phy_reg),
+   MTK_PHY_IAC);
+
+   ret = mtk_mdio_busy_wait(eth);
+   if (ret < 0)
+   return ret;
+
+   mtk_w32(eth, PHY_IAC_ACCESS |
+   PHY_IAC_START_C45 |
+   PHY_IAC_CMD_WRITE |
+   PHY_IAC_REG(devad) |
+   PHY_IAC_ADDR(phy_addr) |
+   PHY_IAC_DATA(write_data),
+   

[PATCH net-next 07/10] net: lan743x: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The microchip lan743x MDIO bus driver can perform both C22 and C45
transfers in some variants. Create separate functions for each and
register the C45 versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
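The "in some variants" above means the registration stays conditional;
the idea is roughly this (a sketch only, the exact condition used in
the patch may differ):

	bus->read = lan743x_mdiobus_read_c22;
	bus->write = lan743x_mdiobus_write_c22;
	if (adapter->is_pci11x1x) {
		/* only the PCI11010/PCI11414 parts do C45 */
		bus->read_c45 = lan743x_mdiobus_read_c45;
		bus->write_c45 = lan743x_mdiobus_write_c45;
	}
---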
 drivers/net/ethernet/microchip/lan743x_main.c | 106 +-
 1 file changed, 51 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c 
b/drivers/net/ethernet/microchip/lan743x_main.c
index 534840f9a7ca..e205edf477de 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -792,7 +792,7 @@ static int lan743x_mac_mii_wait_till_not_busy(struct 
lan743x_adapter *adapter)
  !(data & MAC_MII_ACC_MII_BUSY_), 0, 100);
 }
 
-static int lan743x_mdiobus_read(struct mii_bus *bus, int phy_id, int index)
+static int lan743x_mdiobus_read_c22(struct mii_bus *bus, int phy_id, int index)
 {
struct lan743x_adapter *adapter = bus->priv;
u32 val, mii_access;
@@ -814,8 +814,8 @@ static int lan743x_mdiobus_read(struct mii_bus *bus, int 
phy_id, int index)
return (int)(val & 0xFFFF);
 }
 
-static int lan743x_mdiobus_write(struct mii_bus *bus,
-int phy_id, int index, u16 regval)
+static int lan743x_mdiobus_write_c22(struct mii_bus *bus,
+int phy_id, int index, u16 regval)
 {
struct lan743x_adapter *adapter = bus->priv;
u32 val, mii_access;
@@ -835,12 +835,10 @@ static int lan743x_mdiobus_write(struct mii_bus *bus,
return ret;
 }
 
-static u32 lan743x_mac_mmd_access(int id, int index, int op)
+static u32 lan743x_mac_mmd_access(int id, int dev_addr, int op)
 {
-   u16 dev_addr;
u32 ret;
 
-   dev_addr = (index >> 16) & 0x1f;
ret = (id << MAC_MII_ACC_PHY_ADDR_SHIFT_) &
MAC_MII_ACC_PHY_ADDR_MASK_;
ret |= (dev_addr << MAC_MII_ACC_MIIMMD_SHIFT_) &
@@ -858,7 +856,8 @@ static u32 lan743x_mac_mmd_access(int id, int index, int op)
return ret;
 }
 
-static int lan743x_mdiobus_c45_read(struct mii_bus *bus, int phy_id, int index)
+static int lan743x_mdiobus_read_c45(struct mii_bus *bus, int phy_id,
+   int dev_addr, int index)
 {
struct lan743x_adapter *adapter = bus->priv;
u32 mmd_access;
@@ -868,32 +867,30 @@ static int lan743x_mdiobus_c45_read(struct mii_bus *bus, 
int phy_id, int index)
ret = lan743x_mac_mii_wait_till_not_busy(adapter);
if (ret < 0)
return ret;
-   if (index & MII_ADDR_C45) {
-   /* Load Register Address */
-   lan743x_csr_write(adapter, MAC_MII_DATA, (u32)(index & 0xFFFF));
-   mmd_access = lan743x_mac_mmd_access(phy_id, index,
-   MMD_ACCESS_ADDRESS);
-   lan743x_csr_write(adapter, MAC_MII_ACC, mmd_access);
-   ret = lan743x_mac_mii_wait_till_not_busy(adapter);
-   if (ret < 0)
-   return ret;
-   /* Read Data */
-   mmd_access = lan743x_mac_mmd_access(phy_id, index,
-   MMD_ACCESS_READ);
-   lan743x_csr_write(adapter, MAC_MII_ACC, mmd_access);
-   ret = lan743x_mac_mii_wait_till_not_busy(adapter);
-   if (ret < 0)
-   return ret;
-   ret = lan743x_csr_read(adapter, MAC_MII_DATA);
-   return (int)(ret & 0xFFFF);
-   }
 
-   ret = lan743x_mdiobus_read(bus, phy_id, index);
-   return ret;
+   /* Load Register Address */
+   lan743x_csr_write(adapter, MAC_MII_DATA, index);
+   mmd_access = lan743x_mac_mmd_access(phy_id, dev_addr,
+   MMD_ACCESS_ADDRESS);
+   lan743x_csr_write(adapter, MAC_MII_ACC, mmd_access);
+   ret = lan743x_mac_mii_wait_till_not_busy(adapter);
+   if (ret < 0)
+   return ret;
+
+   /* Read Data */
+   mmd_access = lan743x_mac_mmd_access(phy_id, dev_addr,
+   MMD_ACCESS_READ);
+   lan743x_csr_write(adapter, MAC_MII_ACC, mmd_access);
+   ret = lan743x_mac_mii_wait_till_not_busy(adapter);
+   if (ret < 0)
+   return ret;
+
+   ret = lan743x_csr_read(adapter, MAC_MII_DATA);
+   return (int)(ret & 0xFFFF);
 }
 
-static int lan743x_mdiobus_c45_write(struct mii_bus *bus,
-int phy_id, int index, u16 regval)
+static int lan743x_mdiobus_write_c45(struct mii_bus *bus, int phy_id,
+int dev_addr, int index, u16 regval)
 {
struct lan743x_adapter *adapter = bus->priv;
u32 mmd_access;
@@ -903,26 +900,23 @@ static int lan743x_mdiobus_c45_write(struct mii_bus *bus,
ret = 

[PATCH net-next 10/10] enetc: Separate C22 and C45 transactions

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The enetc MDIO bus driver can perform both C22 and C45 transfers.
Create separate functions for each and register the C45 versions using
the new API calls where appropriate.

This driver is shared with the Felix DSA switch, so update that at the
same time.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
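Since the enetc_mdio.h hunk is easy to miss in the noise, the exported
API after this patch boils down to the following (signatures
reconstructed from the implementations below, as the header hunk is
truncated here):

	int enetc_mdio_read_c22(struct mii_bus *bus, int phy_id, int regnum);
	int enetc_mdio_write_c22(struct mii_bus *bus, int phy_id, int regnum,
				 u16 value);
	int enetc_mdio_read_c45(struct mii_bus *bus, int phy_id, int devad,
				int regnum);
	int enetc_mdio_write_c45(struct mii_bus *bus, int phy_id, int devad,
				 int regnum, u16 value);
---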
 drivers/net/dsa/ocelot/felix_vsc9959.c |   6 +-
 drivers/net/ethernet/freescale/enetc/enetc_mdio.c  | 119 +++--
 .../net/ethernet/freescale/enetc/enetc_pci_mdio.c  |   6 +-
 drivers/net/ethernet/freescale/enetc/enetc_pf.c|  12 ++-
 include/linux/fsl/enetc_mdio.h |  21 +++-
 5 files changed, 121 insertions(+), 43 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c 
b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 01ac70fd7ddf..cbcc457499f3 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -954,8 +954,10 @@ static int vsc9959_mdio_bus_alloc(struct ocelot *ocelot)
return -ENOMEM;
 
bus->name = "VSC9959 internal MDIO bus";
-   bus->read = enetc_mdio_read;
-   bus->write = enetc_mdio_write;
+   bus->read = enetc_mdio_read_c22;
+   bus->write = enetc_mdio_write_c22;
+   bus->read_c45 = enetc_mdio_read_c45;
+   bus->write_c45 = enetc_mdio_write_c45;
bus->parent = dev;
mdio_priv = bus->priv;
mdio_priv->hw = hw;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_mdio.c 
b/drivers/net/ethernet/freescale/enetc/enetc_mdio.c
index 1c8f5cc6dec4..998aaa394e9c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_mdio.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_mdio.c
@@ -55,7 +55,8 @@ static int enetc_mdio_wait_complete(struct enetc_mdio_priv 
*mdio_priv)
  is_busy, !is_busy, 10, 10 * 1000);
 }
 
-int enetc_mdio_write(struct mii_bus *bus, int phy_id, int regnum, u16 value)
+int enetc_mdio_write_c22(struct mii_bus *bus, int phy_id, int regnum,
+u16 value)
 {
struct enetc_mdio_priv *mdio_priv = bus->priv;
u32 mdio_ctl, mdio_cfg;
@@ -63,14 +64,39 @@ int enetc_mdio_write(struct mii_bus *bus, int phy_id, int 
regnum, u16 value)
int ret;
 
mdio_cfg = ENETC_EMDIO_CFG;
-   if (regnum & MII_ADDR_C45) {
-   dev_addr = (regnum >> 16) & 0x1f;
-   mdio_cfg |= MDIO_CFG_ENC45;
-   } else {
-   /* clause 22 (ie 1G) */
-   dev_addr = regnum & 0x1f;
-   mdio_cfg &= ~MDIO_CFG_ENC45;
-   }
+   dev_addr = regnum & 0x1f;
+   mdio_cfg &= ~MDIO_CFG_ENC45;
+
+   enetc_mdio_wr(mdio_priv, ENETC_MDIO_CFG, mdio_cfg);
+
+   ret = enetc_mdio_wait_complete(mdio_priv);
+   if (ret)
+   return ret;
+
+   /* set port and dev addr */
+   mdio_ctl = MDIO_CTL_PORT_ADDR(phy_id) | MDIO_CTL_DEV_ADDR(dev_addr);
+   enetc_mdio_wr(mdio_priv, ENETC_MDIO_CTL, mdio_ctl);
+
+   /* write the value */
+   enetc_mdio_wr(mdio_priv, ENETC_MDIO_DATA, value);
+
+   ret = enetc_mdio_wait_complete(mdio_priv);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(enetc_mdio_write_c22);
+
+int enetc_mdio_write_c45(struct mii_bus *bus, int phy_id, int dev_addr,
+int regnum, u16 value)
+{
+   struct enetc_mdio_priv *mdio_priv = bus->priv;
+   u32 mdio_ctl, mdio_cfg;
+   int ret;
+
+   mdio_cfg = ENETC_EMDIO_CFG;
+   mdio_cfg |= MDIO_CFG_ENC45;
 
enetc_mdio_wr(mdio_priv, ENETC_MDIO_CFG, mdio_cfg);
 
@@ -83,13 +109,11 @@ int enetc_mdio_write(struct mii_bus *bus, int phy_id, int 
regnum, u16 value)
enetc_mdio_wr(mdio_priv, ENETC_MDIO_CTL, mdio_ctl);
 
/* set the register address */
-   if (regnum & MII_ADDR_C45) {
-   enetc_mdio_wr(mdio_priv, ENETC_MDIO_ADDR, regnum & 0xffff);
+   enetc_mdio_wr(mdio_priv, ENETC_MDIO_ADDR, regnum & 0xffff);
 
-   ret = enetc_mdio_wait_complete(mdio_priv);
-   if (ret)
-   return ret;
-   }
+   ret = enetc_mdio_wait_complete(mdio_priv);
+   if (ret)
+   return ret;
 
/* write the value */
enetc_mdio_wr(mdio_priv, ENETC_MDIO_DATA, value);
@@ -100,9 +124,9 @@ int enetc_mdio_write(struct mii_bus *bus, int phy_id, int 
regnum, u16 value)
 
return 0;
 }
-EXPORT_SYMBOL_GPL(enetc_mdio_write);
+EXPORT_SYMBOL_GPL(enetc_mdio_write_c45);
 
-int enetc_mdio_read(struct mii_bus *bus, int phy_id, int regnum)
+int enetc_mdio_read_c22(struct mii_bus *bus, int phy_id, int regnum)
 {
struct enetc_mdio_priv *mdio_priv = bus->priv;
u32 mdio_ctl, mdio_cfg;
@@ -110,14 +134,51 @@ int enetc_mdio_read(struct mii_bus *bus, int phy_id, int 
regnum)
int ret;
 
mdio_cfg = ENETC_EMDIO_CFG;
-   if (regnum & MII_ADDR_C45) {
-   dev_addr = (regnum >> 16) & 0x1f;
-   mdio_cfg 

[PATCH net-next 08/10] net: stmmac: Separate C22 and C45 transactions for xgmac2

2023-01-12 Thread Michael Walle
From: Andrew Lunn 

The stmicro stmmac xgmac2 MDIO bus driver can perform both C22 and C45
transfers. Create separate functions for each and register the C45
versions using the new API calls where appropriate.

Signed-off-by: Andrew Lunn 
Signed-off-by: Michael Walle 
---
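For reference, the address word built by the two format helpers below,
per the defines at the top of the file (my summary, not new code):

	/* C22: 5-bit register number, PHY address in bits 20:16 */
	addr = (phyaddr << MII_XGMAC_PA_SHIFT) | (phyreg & 0x1f);

	/* C45: 16-bit register number plus the device address in bits 25:21 */
	addr = (phyaddr << MII_XGMAC_PA_SHIFT) | (phyreg & 0xffff) |
	       (devad << MII_XGMAC_DA_SHIFT);
---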
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 131 +-
 1 file changed, 81 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 5f177ea80725..4836a40df1af 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -45,8 +45,8 @@
 #define MII_XGMAC_PA_SHIFT 16
 #define MII_XGMAC_DA_SHIFT 21
 
-static int stmmac_xgmac2_c45_format(struct stmmac_priv *priv, int phyaddr,
-   int phyreg, u32 *hw_addr)
+static void stmmac_xgmac2_c45_format(struct stmmac_priv *priv, int phyaddr,
+int devad, int phyreg, u32 *hw_addr)
 {
u32 tmp;
 
@@ -56,19 +56,14 @@ static int stmmac_xgmac2_c45_format(struct stmmac_priv 
*priv, int phyaddr,
writel(tmp, priv->ioaddr + XGMAC_MDIO_C22P);
 
*hw_addr = (phyaddr << MII_XGMAC_PA_SHIFT) | (phyreg & 0xffff);
-   *hw_addr |= (phyreg >> MII_DEVADDR_C45_SHIFT) << MII_XGMAC_DA_SHIFT;
-   return 0;
+   *hw_addr |= devad << MII_XGMAC_DA_SHIFT;
 }
 
-static int stmmac_xgmac2_c22_format(struct stmmac_priv *priv, int phyaddr,
-   int phyreg, u32 *hw_addr)
+static void stmmac_xgmac2_c22_format(struct stmmac_priv *priv, int phyaddr,
+int phyreg, u32 *hw_addr)
 {
u32 tmp;
 
-   /* HW does not support C22 addr >= 4 */
-   if (phyaddr > MII_XGMAC_MAX_C22ADDR)
-   return -ENODEV;
-
/* Set port as Clause 22 */
tmp = readl(priv->ioaddr + XGMAC_MDIO_C22P);
tmp &= ~MII_XGMAC_C22P_MASK;
@@ -76,16 +71,14 @@ static int stmmac_xgmac2_c22_format(struct stmmac_priv 
*priv, int phyaddr,
writel(tmp, priv->ioaddr + XGMAC_MDIO_C22P);
 
*hw_addr = (phyaddr << MII_XGMAC_PA_SHIFT) | (phyreg & 0x1f);
-   return 0;
 }
 
-static int stmmac_xgmac2_mdio_read(struct mii_bus *bus, int phyaddr, int 
phyreg)
+static int stmmac_xgmac2_mdio_read(struct stmmac_priv *priv, u32 addr,
+  u32 value)
 {
-   struct net_device *ndev = bus->priv;
-   struct stmmac_priv *priv = netdev_priv(ndev);
unsigned int mii_address = priv->hw->mii.addr;
unsigned int mii_data = priv->hw->mii.data;
-   u32 tmp, addr, value = MII_XGMAC_BUSY;
+   u32 tmp;
int ret;
 
ret = pm_runtime_resume_and_get(priv->device);
@@ -99,20 +92,6 @@ static int stmmac_xgmac2_mdio_read(struct mii_bus *bus, int 
phyaddr, int phyreg)
goto err_disable_clks;
}
 
-   if (phyreg & MII_ADDR_C45) {
-   phyreg &= ~MII_ADDR_C45;
-
-   ret = stmmac_xgmac2_c45_format(priv, phyaddr, phyreg, &addr);
-   if (ret)
-   goto err_disable_clks;
-   } else {
-   ret = stmmac_xgmac2_c22_format(priv, phyaddr, phyreg, &addr);
-   if (ret)
-   goto err_disable_clks;
-
-   value |= MII_XGMAC_SADDR;
-   }
-
value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
& priv->hw->mii.clk_csr_mask;
value |= MII_XGMAC_READ;
@@ -144,14 +123,44 @@ static int stmmac_xgmac2_mdio_read(struct mii_bus *bus, 
int phyaddr, int phyreg)
return ret;
 }
 
-static int stmmac_xgmac2_mdio_write(struct mii_bus *bus, int phyaddr,
-   int phyreg, u16 phydata)
+static int stmmac_xgmac2_mdio_read_c22(struct mii_bus *bus, int phyaddr,
+  int phyreg)
 {
struct net_device *ndev = bus->priv;
-   struct stmmac_priv *priv = netdev_priv(ndev);
+   struct stmmac_priv *priv;
+   u32 addr;
+
+   priv = netdev_priv(ndev);
+
+   /* HW does not support C22 addr >= 4 */
+   if (phyaddr > MII_XGMAC_MAX_C22ADDR)
+   return -ENODEV;
+
+   stmmac_xgmac2_c22_format(priv, phyaddr, phyreg, &addr);
+
+   return stmmac_xgmac2_mdio_read(priv, addr, MII_XGMAC_BUSY);
+}
+
+static int stmmac_xgmac2_mdio_read_c45(struct mii_bus *bus, int phyaddr,
+  int devad, int phyreg)
+{
+   struct net_device *ndev = bus->priv;
+   struct stmmac_priv *priv;
+   u32 addr;
+
+   priv = netdev_priv(ndev);
+
+   stmmac_xgmac2_c45_format(priv, phyaddr, devad, phyreg, &addr);
+
+   return stmmac_xgmac2_mdio_read(priv, addr, MII_XGMAC_BUSY);
+}
+
+static int stmmac_xgmac2_mdio_write(struct stmmac_priv *priv, u32 addr,
+   u32 value, u16 phydata)
+{
unsigned int mii_address = priv->hw->mii.addr;

[PATCH net-next 00/10] net: mdio: Continue separating C22 and C45

2023-01-12 Thread Michael Walle
I've picked up this older series from Andrew and rebased it onto
the latest net-next.

This is the second patch set in the series; it separates the C22 and
C45 MDIO bus transactions in the API exposed to the MDIO bus drivers.
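
Every conversion in this set has the same shape. A minimal sketch of a
converted bus driver (a made-up "foo" device with invented register
offsets; busy-polling, error handling and the write side omitted):

	struct foo_priv {
		void __iomem *base;
	};

	static int foo_mdio_read_c22(struct mii_bus *bus, int addr, int regnum)
	{
		struct foo_priv *priv = bus->priv;

		/* regnum is a plain 5-bit C22 register number */
		writel((addr << 8) | regnum, priv->base + FOO_ADDR_REG);
		return readl(priv->base + FOO_DATA_REG) & 0xffff;
	}

	static int foo_mdio_read_c45(struct mii_bus *bus, int addr, int devad,
				     int regnum)
	{
		struct foo_priv *priv = bus->priv;

		/* devad and the 16-bit register number now arrive as
		 * separate arguments; nothing is decoded out of
		 * MII_ADDR_C45 anymore
		 */
		writel((addr << 8) | devad, priv->base + FOO_ADDR_REG);
		writel(regnum, priv->base + FOO_REGAD_REG);
		return readl(priv->base + FOO_DATA_REG) & 0xffff;
	}

	/* at probe time: */
	bus->read = foo_mdio_read_c22;
	bus->read_c45 = foo_mdio_read_c45;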

Signed-off-by: Michael Walle 



To: Heiner Kallweit 
To: Russell King 
To: "David S. Miller" 
To: Eric Dumazet 
To: Jakub Kicinski 
To: Paolo Abeni 
To: Ray Jui 
To: Scott Branden 
To: Broadcom internal kernel review list 
To: Joel Stanley 
To: Andrew Jeffery 
To: Felix Fietkau 
To: John Crispin 
To: Sean Wang 
To: Mark Lee 
To: Lorenzo Bianconi 
To: Matthias Brugger 
To: Bryan Whitehead 
To: unglinuxdri...@microchip.com
To: Giuseppe Cavallaro 
To: Alexandre Torgue 
To: Jose Abreu 
To: Maxime Coquelin 
To: Vladimir Oltean 
To: Claudiu Manoil 
To: Alexandre Belloni 
To: Florian Fainelli 
To: Li Yang 
Cc: net...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-asp...@lists.ozlabs.org
Cc: linux-media...@lists.infradead.org
Cc: linux-st...@st-md-mailman.stormreply.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Andrew Lunn 
Signed-off-by: Michael Walle 

---
Andrew Lunn (10):
  net: mdio: cavium: Separate C22 and C45 transactions
  net: mdio: i2c: Separate C22 and C45 transactions
  net: mdio: mux-bcm-iproc: Separate C22 and C45 transactions
  net: mdio: aspeed: Separate C22 and C45 transactions
  net: mdio: ipq4019: Separate C22 and C45 transactions
  net: ethernet: mtk_eth_soc: Separate C22 and C45 transactions
  net: lan743x: Separate C22 and C45 transactions
  net: stmmac: Separate C22 and C45 transactions for xgmac2
  net: stmmac: Separate C22 and C45 transactions for xgmac
  enetc: Separate C22 and C45 transactions

 drivers/net/dsa/ocelot/felix_vsc9959.c |   6 +-
 drivers/net/ethernet/freescale/enetc/enetc_mdio.c  | 119 ++--
 .../net/ethernet/freescale/enetc/enetc_pci_mdio.c  |   6 +-
 drivers/net/ethernet/freescale/enetc/enetc_pf.c|  12 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.c| 178 +++
 drivers/net/ethernet/microchip/lan743x_main.c  | 106 ---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 331 ++---
 drivers/net/mdio/mdio-aspeed.c |  47 +--
 drivers/net/mdio/mdio-cavium.c | 111 +--
 drivers/net/mdio/mdio-cavium.h |   9 +-
 drivers/net/mdio/mdio-i2c.c|  32 +-
 drivers/net/mdio/mdio-ipq4019.c| 154 ++
 drivers/net/mdio/mdio-mux-bcm-iproc.c  |  54 +++-
 drivers/net/mdio/mdio-octeon.c |   6 +-
 drivers/net/mdio/mdio-thunder.c|   6 +-
 include/linux/fsl/enetc_mdio.h |  21 +-
 16 files changed, 766 insertions(+), 432 deletions(-)
---
base-commit: 0a093b2893c711d82622a9ab27da4f1172821336
change-id: 20230112-net-next-c45-seperation-part-2-1b8fbb144687

Best regards,
-- 
Michael Walle 


Re: [PATCH] powerpc/rtas: upgrade internal arch spinlocks

2023-01-12 Thread Laurent Dufour
On 10/01/2023 05:42:55, Nathan Lynch wrote:
> At the time commit f97bb36f705d ("powerpc/rtas: Turn rtas lock into a
> raw spinlock") was written, the spinlock lockup detection code called
> __delay(), which will not make progress if the timebase is not
> advancing. Since the interprocessor timebase synchronization sequence
> for chrp, cell, and some now-unsupported Power models can temporarily
> freeze the timebase through an RTAS function (freeze-time-base), the
> lock that serializes most RTAS calls was converted to arch_spinlock_t
> to prevent kernel hangs in the lockup detection code.
> 
> However, commit bc88c10d7e69 ("locking/spinlock/debug: Remove spinlock
> lockup detection code") removed that inconvenient property from the
> lock debug code several years ago. So now it should be safe to
> reintroduce generic locks into the RTAS support code, primarily to
> increase lockdep coverage.
> 
> Making rtas.lock a spinlock_t would violate lock type nesting rules
> because it can be acquired while holding raw locks, e.g. pci_lock and
> irq_desc->lock. So convert it to raw_spinlock_t. There's no apparent
> reason not to upgrade timebase_lock as well.
> 
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/include/asm/rtas-types.h |  2 +-
>  arch/powerpc/kernel/rtas.c| 52 ---
>  2 files changed, 15 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas-types.h 
> b/arch/powerpc/include/asm/rtas-types.h
> index 8df6235d64d1..a58f96eb2d19 100644
> --- a/arch/powerpc/include/asm/rtas-types.h
> +++ b/arch/powerpc/include/asm/rtas-types.h
> @@ -18,7 +18,7 @@ struct rtas_t {
>   unsigned long entry;/* physical address pointer */
>   unsigned long base; /* physical address pointer */
>   unsigned long size;
> - arch_spinlock_t lock;
> + raw_spinlock_t lock;
>   struct rtas_args args;
>   struct device_node *dev;/* virtual address pointer */
>  };
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index deded51a7978..a834726f18e3 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -61,7 +61,7 @@ static inline void do_enter_rtas(unsigned long args)
>  }
>  
>  struct rtas_t rtas = {
> - .lock = __ARCH_SPIN_LOCK_UNLOCKED
> + .lock = __RAW_SPIN_LOCK_UNLOCKED(rtas.lock),
>  };
>  EXPORT_SYMBOL(rtas);

This is beyond the scope of this patch, but the RTAS lock is
externalized through the rtas_t structure, while it is only used in
that file.

I think it would be good, in case of future changes to that lock, and
in order not to break KABI, to move it out of that structure and to
define it statically in that file.
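
Something like this is what I have in mind (untested sketch):

	/* in arch/powerpc/kernel/rtas.c, with the lock dropped from
	 * struct rtas_t
	 */
	static raw_spinlock_t rtas_lock = __RAW_SPIN_LOCK_UNLOCKED(rtas_lock);

and then raw_spin_lock_irqsave(&rtas_lock, flags) below instead of
taking &rtas.lock.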

Otherwise, looks good to me.

Reviewed-by: Laurent Dufour 

>  
> @@ -80,28 +80,6 @@ unsigned long rtas_rmo_buf;
>  void (*rtas_flash_term_hook)(int);
>  EXPORT_SYMBOL(rtas_flash_term_hook);
>  
> -/* RTAS use home made raw locking instead of spin_lock_irqsave
> - * because those can be called from within really nasty contexts
> - * such as having the timebase stopped which would lockup with
> - * normal locks and spinlock debugging enabled
> - */
> -static unsigned long lock_rtas(void)
> -{
> - unsigned long flags;
> -
> - local_irq_save(flags);
> - preempt_disable();
> - arch_spin_lock(&rtas.lock);
> - return flags;
> -}
> -
> -static void unlock_rtas(unsigned long flags)
> -{
> - arch_spin_unlock(&rtas.lock);
> - local_irq_restore(flags);
> - preempt_enable();
> -}
> -
>  /*
>   * call_rtas_display_status and call_rtas_display_status_delay
>   * are designed only for very early low-level debugging, which
> @@ -109,14 +87,14 @@ static void unlock_rtas(unsigned long flags)
>   */
>  static void call_rtas_display_status(unsigned char c)
>  {
> - unsigned long s;
> + unsigned long flags;
>  
>   if (!rtas.base)
>   return;
>  
> - s = lock_rtas();
> + raw_spin_lock_irqsave(&rtas.lock, flags);
>   rtas_call_unlocked(&args, 10, 1, 1, NULL, c);
> - unlock_rtas(s);
> + raw_spin_unlock_irqrestore(&rtas.lock, flags);
>  }
>  
>  static void call_rtas_display_status_delay(char c)
> @@ -534,7 +512,7 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>  {
>   va_list list;
>   int i;
> - unsigned long s;
> + unsigned long flags;
>   struct rtas_args *rtas_args;
>   char *buff_copy = NULL;
>   int ret;
> @@ -557,8 +535,7 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>   return -1;
>   }
>  
> - s = lock_rtas();
> -
> + raw_spin_lock_irqsave(&rtas.lock, flags);
>   /* We use the global rtas args buffer */
>   rtas_args = &rtas.args;
>  
> @@ -576,7 +553,7 @@ int rtas_call(int token, int nargs, int nret, int 
> *outputs, ...)
>   outputs[i] = be32_to_cpu(rtas_args->rets[i+1]);
>   ret = (nret > 0)? be32_to_cpu(rtas_args->rets[0]): 0;
>  
> - unlock_rtas(s);
> + raw_spin_unlock_irqrestore(&rtas.lock, flags);
>  

[PATCH] kallsyms: Fix scheduling with interrupts disabled in self-test

2023-01-12 Thread Nicholas Piggin
kallsyms_on_each*() may schedule, so it must not be called with
interrupts disabled. The iteration function could disable interrupts
itself, but this patch instead drops the interrupt disabling and
changes lookup_name() to match the change to the other timing code.

Reported-by: Erhard F. 
Link: 
https://lore.kernel.org/all/bug-216902-206...@https.bugzilla.kernel.org%2F/
Reported-by: kernel test robot 
Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.s...@intel.com
Fixes: 30f3bb09778d ("kallsyms: Add self-test facility")
Signed-off-by: Nicholas Piggin 
---
 kernel/kallsyms_selftest.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
index f35d9cc1aab1..bfbc12da3326 100644
--- a/kernel/kallsyms_selftest.c
+++ b/kernel/kallsyms_selftest.c
@@ -157,14 +157,11 @@ static void test_kallsyms_compression_ratio(void)
 static int lookup_name(void *data, const char *name, struct module *mod, 
unsigned long addr)
 {
u64 t0, t1, t;
-   unsigned long flags;
struct test_stat *stat = (struct test_stat *)data;
 
-   local_irq_save(flags);
-   t0 = sched_clock();
+   t0 = ktime_get_ns();
(void)kallsyms_lookup_name(name);
-   t1 = sched_clock();
-   local_irq_restore(flags);
+   t1 = ktime_get_ns();
 
t = t1 - t0;
if (t < stat->min)
@@ -234,18 +231,15 @@ static int find_symbol(void *data, const char *name, 
struct module *mod, unsigne
 static void test_perf_kallsyms_on_each_symbol(void)
 {
u64 t0, t1;
-   unsigned long flags;
struct test_stat stat;
 
memset(&stat, 0, sizeof(stat));
stat.max = INT_MAX;
stat.name = stub_name;
stat.perf = 1;
-   local_irq_save(flags);
-   t0 = sched_clock();
+   t0 = ktime_get_ns();
kallsyms_on_each_symbol(find_symbol, &stat);
-   t1 = sched_clock();
-   local_irq_restore(flags);
+   t1 = ktime_get_ns();
pr_info("kallsyms_on_each_symbol() traverse all: %lld ns\n", t1 - t0);
 }
 
@@ -270,17 +264,14 @@ static int match_symbol(void *data, unsigned long addr)
 static void test_perf_kallsyms_on_each_match_symbol(void)
 {
u64 t0, t1;
-   unsigned long flags;
struct test_stat stat;
 
memset(&stat, 0, sizeof(stat));
stat.max = INT_MAX;
stat.name = stub_name;
-   local_irq_save(flags);
-   t0 = sched_clock();
+   t0 = ktime_get_ns();
kallsyms_on_each_match_symbol(match_symbol, stat.name, &stat);
-   t1 = sched_clock();
-   local_irq_restore(flags);
+   t1 = ktime_get_ns();
pr_info("kallsyms_on_each_match_symbol() traverse all: %lld ns\n", t1 - 
t0);
 }
 
-- 
2.37.2