Re: [PATCH 1/3] x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n

2024-04-12 Thread Stephen Rothwell
Hi Sean,

I noticed this commit in linux-next.

On Tue,  9 Apr 2024 10:51:05 -0700 Sean Christopherson  
wrote:
>
> Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built
> with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly
> states that disabling SPECULATION_MITIGATIONS is supposed to turn off all
> mitigations by default.
> 
>   │ If you say N, all mitigations will be disabled. You really
>   │ should know what you are doing to say so.
> 
> As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in
> some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
> 
> Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Sean Christopherson 
> ---
>  kernel/cpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 8f6affd051f7..07ad53b7f119 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -3207,7 +3207,8 @@ enum cpu_mitigations {
>  };
>  
>  static enum cpu_mitigations cpu_mitigations __ro_after_init =
> - CPU_MITIGATIONS_AUTO;
> + IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
> +  CPU_MITIGATIONS_OFF;
>  
>  static int __init mitigations_parse_cmdline(char *arg)
>  {
> -- 
> 2.44.0.478.gd926399ef9-goog
> 

I noticed because it turned off all mitigations for my PowerPC qemu
boot tests - probably because CONFIG_SPECULATION_MITIGATIONS only
exists in arch/x86/Kconfig ... thus for other architectures that have
cpu mitigations, this will always default them to off, right?

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v12 8/8] PCI: endpoint: Remove "core_init_notifier" flag

2024-04-12 Thread Bjorn Helgaas
On Wed, Mar 27, 2024 at 02:43:37PM +0530, Manivannan Sadhasivam wrote:
> "core_init_notifier" flag is set by the glue drivers requiring refclk from
> the host to complete the DWC core initialization. Also, those drivers will
> send a notification to the EPF drivers once the initialization is fully
> completed using the pci_epc_init_notify() API. Only then, the EPF drivers
> will start functioning.
> 
> For the rest of the drivers, which generate refclk locally, EPF drivers
> will start functioning right after binding with them. EPF drivers rely on
> the 'core_init_notifier' flag to differentiate between the two kinds of
> drivers.
> Unfortunately, this creates two different flows for the EPF drivers.
> 
> So to avoid that, let's get rid of the "core_init_notifier" flag and follow
> a single initialization flow for the EPF drivers. This is done by calling
> the dw_pcie_ep_init_notify() from all glue drivers after the completion of
> dw_pcie_ep_init_registers() API. This will allow all the glue drivers to
> send the notification to the EPF drivers once the initialization is fully
> completed.

Thanks for doing this!  I think this is a significantly nicer
solution than core_init_notifier was.

One question: both qcom and tegra194 call dw_pcie_ep_init_registers()
from an interrupt handler, but they register that handler in a
different order with respect to dw_pcie_ep_init().

I don't know what actually starts the process that leads to the
interrupt, but if it's dw_pcie_ep_init(), then one of these (qcom, I
think) must be racy:

  qcom_pcie_ep_probe
dw_pcie_ep_init <- A
qcom_pcie_ep_enable_irq_resources
  devm_request_threaded_irq(qcom_pcie_ep_perst_irq_thread)  <- B

  qcom_pcie_ep_perst_irq_thread
qcom_pcie_perst_deassert
  dw_pcie_ep_init_registers

  tegra_pcie_dw_probe
tegra_pcie_config_ep
  devm_request_threaded_irq(tegra_pcie_ep_pex_rst_irq)  <- B
  dw_pcie_ep_init   <- A

  tegra_pcie_ep_pex_rst_irq
pex_ep_event_pex_rst_deassert
  dw_pcie_ep_init_registers

Whatever the right answer is, I think qcom and tegra194 should both
order dw_pcie_ep_init() and the devm_request_threaded_irq() the same
way.

Bjorn


Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread David Hildenbrand

On 11.04.24 18:55, Paolo Bonzini wrote:

On Mon, Apr 8, 2024 at 3:56 PM Peter Xu  wrote:

Paolo,

I may be missing a bunch of details here (as I still remember some change_pte
patches previously on the list..), but I'm not sure whether we considered
enabling it?  Asking because I remember Andrea used to have a custom tree
maintaining that part:

https://github.com/aagit/aa/commit/c761078df7a77d13ddfaeebe56a0f4bc128b1968


The patch enables it only for KSM, so it would still require a bunch
of cleanups, for example I also would still use set_pte_at() in all
the places that are not KSM. This would at least fix the issue with
the poor documentation of where to use set_pte_at_notify() vs
set_pte_at().

With regard to the implementation, I like the idea of disabling the
invalidation on the MMU notifier side, but I would rather have
MMU_NOTIFIER_CHANGE_PTE as a separate field in the range instead of
overloading the event field.


Maybe it can't be enabled for some reason that I overlooked in the current
tree, or we just decided not to?


I have just learnt about the patch, nobody had ever mentioned it even
though it's almost 2 years old... It's a lot of code though and no one


I assume Andrea used it on his tree where he also has a version of 
"randprotect" (even included in that commit subject) to mitigate a KSM 
security issue that was reported by some security researchers [1] a 
while ago. From what I recall, the industry did not end up caring about 
that security issue that much.


IIUC, with "randprotect" we get a lot more R/O protection even when not 
de-duplicating a page -- thus the name. Likely, the reporter mentioned 
in the commit is a researcher who played with Andrea's fix for the 
security issue. But I'm just speculating at this point :)



has ever reported an issue for over 10 years, so I think it's easiest
to just rip the code out.


Yes. Can always be readded in a possibly cleaner fashion (like you note 
above), when deemed necessary and we are willing to support it.


[1] https://gruss.cc/files/remote_dedup.pdf

--
Cheers,

David / dhildenb



Re: [PATCH v12 2/8] PCI: dwc: ep: Add Kernel-doc comments for APIs

2024-04-12 Thread Bjorn Helgaas
On Wed, Mar 27, 2024 at 02:43:31PM +0530, Manivannan Sadhasivam wrote:
> All of the APIs are missing the Kernel-doc comments. Hence, add them.

> + * dw_pcie_ep_reset_bar - Reset endpoint BAR

Apparently this resets @bar for every function of the device, so it's
not just a single BAR?

> + * dw_pcie_ep_raise_intx_irq - Raise INTx IRQ to the host
> + * @ep: DWC EP device
> + * @func_no: Function number of the endpoint
> + *
> + * Return: 0 if success, errono otherwise.

s/errono/errno/ (another instance below)

Bjorn


[RFC PATCH] powerpc: Optimise barriers for fully ordered atomics

2024-04-12 Thread Nicholas Piggin
"Fully ordered" atomics (RMW that return a value) are said to have a
full barrier before and after the atomic operation. This is implemented
as:

  hwsync
  larx
  ...
  stcx.
  bne-
  hwsync

This is slow on POWER processors because hwsync and stcx. require a
round-trip to the nest (~= L2 cache). The hwsyncs can be avoided with
the sequence:

  lwsync
  larx
  ...
  stcx.
  bne-
  isync

lwsync prevents all reorderings except store/load reordering, so the
larx could be execued ahead of a prior store becoming visible. However
the stcx. is a store, so it is ordered by the lwsync against all prior
access and if the value in memory had been modified since the larx, it
will fail. So the point at which the larx executes is not a concern
because the stcx. always verifies the memory was unchanged.

The isync prevents subsequent instructions being executed before the
stcx. executes, and stcx. is necessarily visible to the system after
it executes, so there is no opportunity for it (or prior stores, thanks
to lwsync) to become visible after a subsequent load or store.

This sequence requires only one L2 round-trip and so is around 2x faster
measured on a POWER10 with back-to-back atomic ops on cached memory.

[ Remains to be seen if this is always faster when there is other activity
going on, and if it's faster on non-POWER CPUs or perhaps older ones
like 970 that might not optimise isync so much. ]

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/synch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/synch.h b/arch/powerpc/include/asm/synch.h
index b0b4c64870d7..0b1718eb9a40 100644
--- a/arch/powerpc/include/asm/synch.h
+++ b/arch/powerpc/include/asm/synch.h
@@ -60,8 +60,8 @@ static inline void ppc_after_tlbiel_barrier(void)
MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup);
 #define PPC_ACQUIRE_BARRIER "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
 #define PPC_RELEASE_BARRIER stringify_in_c(LWSYNC) "\n"
-#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(sync) "\n"
-#define PPC_ATOMIC_EXIT_BARRIER "\n" stringify_in_c(sync) "\n"
+#define PPC_ATOMIC_ENTRY_BARRIER "\n" stringify_in_c(LWSYNC) "\n"
+#define PPC_ATOMIC_EXIT_BARRIER "\n" stringify_in_c(isync) "\n"
 #else
 #define PPC_ACQUIRE_BARRIER
 #define PPC_RELEASE_BARRIER
-- 
2.43.0



Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread Sean Christopherson
On Fri, Apr 12, 2024, Marc Zyngier wrote:
> On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon  wrote:
> > On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > Also, if you're in the business of hacking the MMU notifier code, it
> > would be really great to change the .clear_flush_young() callback so
> > that the architecture could handle the TLB invalidation. At the moment,
> > the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> > being set by kvm_handle_hva_range(), whereas we could do a much
> > lighter-weight and targeted TLBI in the architecture page-table code
> > when we actually update the ptes for small ranges.
> 
> Indeed, and I was looking at this earlier this week as it has a pretty
> devastating effect with NV (it blows the shadow S2 for that VMID, with
> costly consequences).
> 
> In general, it feels like the TLB invalidation should stay with the
> code that deals with the page tables, as it has a pretty good idea of
> what needs to be invalidated and how -- specially on architectures
> that have a HW-broadcast facility like arm64.

Would this be roughly on par with an in-line flush on arm64?  The simpler, more
straightforward solution would be to let architectures override flush_on_ret,
but I would prefer something like the below as x86 can also utilize a
range-based flush when running as a nested hypervisor.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff0a20565f90..b65116294efe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
struct kvm_gfn_range gfn_range;
struct kvm_memory_slot *slot;
struct kvm_memslots *slots;
+   bool need_flush = false;
int i, idx;
 
if (WARN_ON_ONCE(range->end <= range->start))
@@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
break;
}
r.ret |= range->handler(kvm, &gfn_range);
+
+   /*
+* Use a precise gfn-based TLB flush when possible, as
+* most mmu_notifier events affect a small-ish range.
+* Fall back to a full TLB flush if the gfn-based flush
+* fails, and don't bother trying the gfn-based flush
+* if a full flush is already pending.
+*/
+   if (range->flush_on_ret && !need_flush && r.ret &&
+   kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start,
+gfn_range.end - gfn_range.start + 1))
+   need_flush = true;
}
}
 
-   if (range->flush_on_ret && r.ret)
+   if (need_flush)
kvm_flush_remote_tlbs(kvm);
 
if (r.found_memslot)



Re: [RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx

2024-04-12 Thread Peter Xu
On Fri, Apr 12, 2024 at 02:08:03PM +, Christophe Leroy wrote:
> 
> 
> On 11/04/2024 at 18:15, Peter Xu wrote:
> > On Mon, Mar 25, 2024 at 01:38:40PM -0300, Jason Gunthorpe wrote:
> >> On Mon, Mar 25, 2024 at 03:55:53PM +0100, Christophe Leroy wrote:
> >>> This series reimplements hugepages with hugepd on powerpc 8xx.
> >>>
> >>> Unlike most architectures, powerpc 8xx HW requires a two-level
> >>> pagetable topology for all page sizes. So a leaf PMD-contig approach
> >>> is not feasible as such.
> >>>
> >>> Possible sizes are 4k, 16k, 512k and 8M.
> >>>
> >>> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
> >>> must point to a single entry level-2 page table. Until now that was
> >>> done using hugepd. This series changes it to use standard page tables
> >>> where the entry is replicated 1024 times on each of the two pagetables
> >>> referred to by the two associated PMD entries for that 8M page.
> >>>
> >>> At the moment it has to look into each helper to know if the
> >>> hugepage ptep is a PTE or a PMD in order to know whether it is an 8M
> >>> page or a lower size. I hope this can be handled by core-mm in the future.
> >>>
> >>> There are probably several ways to implement stuff, so feedback is
> >>> very welcome.
> >>
> >> I thought it looks pretty good!
> > 
> > I second it.
> > 
> > I saw the discussions in patch 1.  Christophe, I suppose you're exploring
> > the big hammer over hugepd, and perhaps went already with the 32bit pmd
> > solution for nohash/32bit challenge you mentioned?
> > 
> > I'm trying to position my next step; it seems like at least I should not
> > add any more hugepd code, so should I go with ARCH_HAS_HUGEPD checks,
> > or are you going to have an RFC soon that I can base on top of?
> 
> Depends on what you expect by "soon".
> 
> I sure won't be able to send any RFC before end of April.
> 
> Should be possible to have something during May.

That's good enough, thanks.  I'll see what is the best I can do.

Then do you think I can leave p4d/pgd leaves alone?  Please check the other
email where I'm not sure whether pgd leaves ever existed for any of
PowerPC.  So far the plan is to teach the pgtable walkers to recognize
pud and lower levels for all leaves.  Then if Power can switch from
hugepd to this it should just work.

Even if pgd exists (then something I overlooked..), I'm wondering whether
we can push that downwards to be either pud/pmd (and looks like we all
agree p4d is never used on Power).  That may involve some pgtable
operations moving from pgd level to lower, e.g. my pure imagination would
look like starting with:

#define PTE_INDEX_SIZE  PTE_SHIFT
#define PMD_INDEX_SIZE  0
#define PUD_INDEX_SIZE  0
#define PGD_INDEX_SIZE  (32 - PGDIR_SHIFT)

To:

#define PTE_INDEX_SIZE  PTE_SHIFT
#define PMD_INDEX_SIZE  (32 - PMD_SHIFT)
#define PUD_INDEX_SIZE  0
#define PGD_INDEX_SIZE  0

And the rest will need care too.  I hope moving downward is easier
(e.g. the walker should always exist for lower levels but not always for
higher levels), but I actually have little idea whether there are any
other implications, so please bear with me on stupid mistakes.

I just hope pgd leaves don't exist already, then I think it'll be simpler.

Thanks,

-- 
Peter Xu



Re: [PATCH v3 00/12] mm/gup: Unify hugetlb, part 2

2024-04-12 Thread Christophe Leroy


On 10/04/2024 at 21:58, Peter Xu wrote:
>>
>> e500 has two modes: 32 bits and 64 bits.
>>
>> For 32 bits:
>>
>> 8xx is the only one handling it through HW-assisted pagetable walk hence
>> requiring a 2-level whatever the pagesize is.
> 
> Hmm I think maybe finally I get it..
> 
> I think the confusion came from when I saw there's always such level-2
> table described in Figure 8-5 of the manual:
> 
> https://www.nxp.com/docs/en/reference-manual/MPC860UM.pdf

Yes indeed that figure is confusing.

Table 8-1 gives a pretty good idea of what is required. We only use 
MD_CTR[TWAM] = 1

> 
> So I suppose you meant for 8M, the PowerPC 8xx system hardware will be
> aware of such 8M pgtable (from level-1's entry, where it has bit 28-29 set
> 011b), then it won't ever read anything starting from "Level-2 Descriptor
> 1" (but only read the only entry "Level-2 Descriptor 0"), so fundamentally
> hugepd format must look like such for 8xx?
> 
> But then perhaps it's still compatible with cont-pte because the rest
> entries (pte index 1+) will simply be ignored by the hardware?

Yes, still compatible with CONT-PTE, although things become tricky 
because you need two page tables to get the full 8M, so that's a kind of 
cont-PMD down to PTE level, as you can see in my RFC series.

> 
>>
>> On e500 it is all software so pages 2M and larger should be cont-PGD (by
>> the way I'm a bit puzzled that on arches that have only 2 levels, ie PGD
>> and PTE, the PGD entries are populated by a function called PMD_populate()).
> 
> Yeah.. I am also wondering whether pgd_populate() could also work there
> (perhaps with some trivial changes, or maybe not even needed..), as when
> p4d/pud/pmd levels are missing, linux should just do something like an
> enforced cast from pgd_t* -> pmd_t* in this case.
> 
> I think currently they're already not pgd, as __find_linux_pte() already
> skipped pgd unconditionally:
> 
>   pgdp = pgdir + pgd_index(ea);
>   p4dp = p4d_offset(pgdp, ea);
> 

Yes, that's what is confusing: some parts of the code consider we have 
only a PGD and a PT, while other parts consider we have only a PMD and a PT.

>>
>> Current situation for 8xx is illustrated here:
>> https://github.com/linuxppc/wiki/wiki/Huge-pages#8xx
>>
>> I also tried to better illustrate e500/32 here:
>> https://github.com/linuxppc/wiki/wiki/Huge-pages#e500
>>
>> For 64 bits:
>> We have PTE/PMD/PUD/PGD, no P4D
>>
>> See arch/powerpc/include/asm/nohash/64/pgtable-4k.h
> 
> We don't have anything that is above pud in this category, right?  That's
> what I read from your wiki (and thanks for providing that in the first
> place; helps a lot for me to understand how it works on PowerPC).

Yes thanks to Michael and Aneesh who initiated that Wiki page.

> 
> I want to make sure I can move on without caring about p4d/pgd leaves like
> we do right now, even after we remove hugepd for good.  In that case, since
> p4d is always missing, it's about whether "pud|pmd|pte_leaf()" can also
> cover the pgd ones when that day comes, iiuc.

I guess so but I'd like Aneesh and/or Michael to confirm as I'm not an 
expert on PPC64.

Christophe


Re: [RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx

2024-04-12 Thread Christophe Leroy


Le 11/04/2024 à 18:15, Peter Xu a écrit :
> On Mon, Mar 25, 2024 at 01:38:40PM -0300, Jason Gunthorpe wrote:
>> On Mon, Mar 25, 2024 at 03:55:53PM +0100, Christophe Leroy wrote:
>>> This series reimplements hugepages with hugepd on powerpc 8xx.
>>>
>>> Unlike most architectures, powerpc 8xx HW requires a two-level
>>> pagetable topology for all page sizes. So a leaf PMD-contig approach
>>> is not feasible as such.
>>>
>>> Possible sizes are 4k, 16k, 512k and 8M.
>>>
>>> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
>>> must point to a single entry level-2 page table. Until now that was
>>> done using hugepd. This series changes it to use standard page tables
>>> where the entry is replicated 1024 times on each of the two pagetables
> >>> referred to by the two associated PMD entries for that 8M page.
>>>
>>> At the moment it has to look into each helper to know if the
> >>> hugepage ptep is a PTE or a PMD in order to know whether it is an 8M
> >>> page or a lower size. I hope this can be handled by core-mm in the future.
>>>
>>> There are probably several ways to implement stuff, so feedback is
>>> very welcome.
>>
>> I thought it looks pretty good!
> 
> I second it.
> 
> I saw the discussions in patch 1.  Christophe, I suppose you're exploring
> the big hammer over hugepd, and perhaps went already with the 32bit pmd
> solution for nohash/32bit challenge you mentioned?
> 
> I'm trying to position my next step; it seems like at least I should not
> add any more hugepd code, so should I go with ARCH_HAS_HUGEPD checks,
> or are you going to have an RFC soon that I can base on top of?

Depends on what you expect by "soon".

I sure won't be able to send any RFC before end of April.

Should be possible to have something during May.

Christophe


[PATCH v3 2/2] PCI: Create helper to print TLP Header and Prefix Log

2024-04-12 Thread Ilpo Järvinen
Add pcie_print_tlp_log() helper to print TLP Header and Prefix Log.
Print End-End Prefixes only if they are non-zero.

Consolidate the few places which currently print TLP using custom
formatting.

The first attempt used pr_cont() instead of building a string first, but
it turns out pr_cont() is not compatible with pci_err() and prints on a
separate line. When I asked about this, Andy Shevchenko suggested that
pr_cont() should not be used in the first place (so that it can
eventually be removed), so the string is now built first instead.

Signed-off-by: Ilpo Järvinen 
---
 drivers/pci/pci.c  | 32 
 drivers/pci/pcie/aer.c | 10 ++
 drivers/pci/pcie/dpc.c |  5 +
 include/linux/aer.h|  2 ++
 4 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af230e6e5557..54d4872d14b8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -9,6 +9,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1116,6 +1117,37 @@ int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
 }
 EXPORT_SYMBOL_GPL(pcie_read_tlp_log);
 
+/**
+ * pcie_print_tlp_log - Print TLP Header / Prefix Log contents
+ * @dev:   PCIe device
+ * @tlp_log:   TLP Log structure
+ * @pfx:   Internal string prefix (for indentation)
+ *
+ * Prints TLP Header and Prefix Log information held by @tlp_log.
+ */
+void pcie_print_tlp_log(const struct pci_dev *dev,
+   const struct pcie_tlp_log *tlp_log, const char *pfx)
+{
+   char buf[(10 + 1) * (4 + ARRAY_SIZE(tlp_log->prefix)) + 14 + 1];
+   unsigned int i;
+   int len;
+
+   len = scnprintf(buf, sizeof(buf), "%#010x %#010x %#010x %#010x",
+   tlp_log->dw[0], tlp_log->dw[1], tlp_log->dw[2],
+   tlp_log->dw[3]);
+
+   if (tlp_log->prefix[0])
+   len += scnprintf(buf + len, sizeof(buf) - len, " E-E Prefixes:");
+   for (i = 0; i < ARRAY_SIZE(tlp_log->prefix); i++) {
+   if (!tlp_log->prefix[i])
+   break;
+   len += scnprintf(buf + len, sizeof(buf) - len,
+" %#010x", tlp_log->prefix[i]);
+   }
+
+   pci_err(dev, "%sTLP Header: %s\n", pfx, buf);
+}
+
 /**
  * pci_restore_bars - restore a device's BAR values (e.g. after wake-up)
  * @dev: PCI device to have its BARs restored
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ecc1dea5a208..efb9e728fe94 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -664,12 +664,6 @@ static void pci_rootport_aer_stats_incr(struct pci_dev *pdev,
}
 }
 
-static void __print_tlp_header(struct pci_dev *dev, struct pcie_tlp_log *t)
-{
-   pci_err(dev, "  TLP Header: %08x %08x %08x %08x\n",
-   t->dw[0], t->dw[1], t->dw[2], t->dw[3]);
-}
-
 static void __aer_print_error(struct pci_dev *dev,
  struct aer_err_info *info)
 {
@@ -724,7 +718,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
__aer_print_error(dev, info);
 
if (info->tlp_header_valid)
-   __print_tlp_header(dev, &info->tlp);
+   pcie_print_tlp_log(dev, &info->tlp, "  ");
 
 out:
if (info->id && info->error_dev_num > 1 && info->id == id)
@@ -796,7 +790,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
aer->uncor_severity);
 
if (tlp_header_valid)
-   __print_tlp_header(dev, &aer->header_log);
+   pcie_print_tlp_log(dev, &aer->header_log, "  ");
 
trace_aer_event(dev_name(&dev->dev), (status & ~mask),
aer_severity, tlp_header_valid, &aer->header_log);
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 80b1456f95fe..3f8e3b6c7948 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -229,10 +229,7 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
pcie_read_tlp_log(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG,
  cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG,
dpc_tlp_log_len(pdev), &tlp_log);
-   pci_err(pdev, "TLP Header: %#010x %#010x %#010x %#010x\n",
-   tlp_log.dw[0], tlp_log.dw[1], tlp_log.dw[2], tlp_log.dw[3]);
-   for (i = 0; i < pdev->dpc_rp_log_size - 5; i++)
-   pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i,
-   tlp_log.prefix[i]);
+   pcie_print_tlp_log(pdev, &tlp_log, "");
 
if (pdev->dpc_rp_log_size < 5)
goto clear_status;
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 2484056feb8d..1e8c61deca65 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -41,6 +41,8 @@ struct aer_capability_regs {
 int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
  unsigned int tlp_len, struct pcie_tlp_log *log);
 unsigned int aer_tlp_log_len(struct pci_dev *dev);
+void pcie_print_tlp_log(const 

[PATCH v3 1/2] PCI: Add TLP Prefix reading into pcie_read_tlp_log()

2024-04-12 Thread Ilpo Järvinen
pcie_read_tlp_log() handles only 4 TLP Header Log DWORDs but TLP Prefix
Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present.

Generalize pcie_read_tlp_log() and struct pcie_tlp_log to handle also
TLP Prefix Log. The layout of relevant registers in AER and DPC
Capability is not identical because the offsets of TLP Header Log and
TLP Prefix Log vary so the callers must pass the offsets to
pcie_read_tlp_log().

Convert eetlp_prefix_path into an integer called eetlp_prefix_max and
make it available also when CONFIG_PCI_PASID is not configured, so that
the number of E-E Prefixes can be determined.

Signed-off-by: Ilpo Järvinen 
---
 drivers/pci/ats.c |  2 +-
 drivers/pci/pci.c | 34 --
 drivers/pci/pcie/aer.c|  4 +++-
 drivers/pci/pcie/dpc.c| 22 +++---
 drivers/pci/probe.c   | 14 +-
 include/linux/aer.h   |  5 -
 include/linux/pci.h   |  2 +-
 include/uapi/linux/pci_regs.h |  2 ++
 8 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index c570892b2090..e13433dcfc82 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -377,7 +377,7 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (WARN_ON(pdev->pasid_enabled))
return -EBUSY;
 
-   if (!pdev->eetlp_prefix_path && !pdev->pasid_no_tlp)
+   if (!pdev->eetlp_prefix_max && !pdev->pasid_no_tlp)
return -EINVAL;
 
if (!pasid)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e5f243dd4288..af230e6e5557 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1066,26 +1066,48 @@ static void pci_enable_acs(struct pci_dev *dev)
pci_disable_acs_redir(dev);
 }
 
+/**
+ * aer_tlp_log_len - Calculates TLP Header/Prefix Log length
+ * @dev:   PCIe device
+ *
+ * Return: TLP Header/Prefix Log length
+ */
+unsigned int aer_tlp_log_len(struct pci_dev *dev)
+{
+   return 4 + dev->eetlp_prefix_max;
+}
+
 /**
  * pcie_read_tlp_log - read TLP Header Log
  * @dev: PCIe device
  * @where: PCI Config offset of TLP Header Log
+ * @where2: PCI Config offset of TLP Prefix Log
+ * @tlp_len: TLP Log length (in DWORDs)
  * @tlp_log: TLP Log structure to fill
  *
  * Fill @tlp_log from TLP Header Log registers, e.g., AER or DPC.
  *
  * Return: 0 on success and filled TLP Log structure, <0 on error.
  */
-int pcie_read_tlp_log(struct pci_dev *dev, int where,
- struct pcie_tlp_log *tlp_log)
+int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2,
+ unsigned int tlp_len, struct pcie_tlp_log *tlp_log)
 {
-   int i, ret;
+   unsigned int i;
+   int off, ret;
+   u32 *to;
 
memset(tlp_log, 0, sizeof(*tlp_log));
 
-   for (i = 0; i < 4; i++) {
-   ret = pci_read_config_dword(dev, where + i * 4,
-   &tlp_log->dw[i]);
+   for (i = 0; i < tlp_len; i++) {
+   if (i < 4) {
+   to = &tlp_log->dw[i];
+   off = where + i * 4;
+   } else {
+   to = &tlp_log->prefix[i - 4];
+   off = where2 + (i - 4) * 4;
+   }
+
+   ret = pci_read_config_dword(dev, off, to);
if (ret)
return pcibios_err_to_errno(ret);
}
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ac6293c24976..ecc1dea5a208 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1245,7 +1245,9 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 
if (info->status & AER_LOG_TLP_MASKS) {
info->tlp_header_valid = 1;
-   pcie_read_tlp_log(dev, aer + PCI_ERR_HEADER_LOG, &info->tlp);
+   pcie_read_tlp_log(dev, aer + PCI_ERR_HEADER_LOG,
+ aer + PCI_ERR_PREFIX_LOG,
+ aer_tlp_log_len(dev), &info->tlp);
}
}
 
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a668820696dc..80b1456f95fe 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -187,10 +187,19 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
return ret;
 }
 
+static unsigned int dpc_tlp_log_len(struct pci_dev *pdev)
+{
+   /* Remove ImpSpec Log register from the count */
+   if (pdev->dpc_rp_log_size >= 5)
+   return pdev->dpc_rp_log_size - 1;
+
+   return pdev->dpc_rp_log_size;
+}
+
 static void dpc_process_rp_pio_error(struct pci_dev *pdev)
 {
u16 cap = pdev->dpc_cap, dpc_status, first_error;
-   u32 status, mask, sev, syserr, exc, log, prefix;
+   u32 status, mask, sev, syserr, exc, log;
struct pcie_tlp_log tlp_log;
int i;
 
@@ -217,20 +226,19 @@ static void dpc_process_rp_pio_error(struct pci_dev *pdev)
 

[PATCH v3 0/2] PCI: Consolidate TLP Log reading and printing

2024-04-12 Thread Ilpo Järvinen
This series has the remaining patches of the AER & DPC TLP Log handling
consolidation.

v3:
- Small rewording in a commit message

v2:
- Don't add EXPORT()s
- Don't include ixgbe changes
- Don't use pr_cont() as it's incompatible with pci_err() and according
  to Andy Shevchenko should not be used in the first place

Ilpo Järvinen (2):
  PCI: Add TLP Prefix reading into pcie_read_tlp_log()
  PCI: Create helper to print TLP Header and Prefix Log

 drivers/pci/ats.c |  2 +-
 drivers/pci/pci.c | 66 +++
 drivers/pci/pcie/aer.c| 14 +++-
 drivers/pci/pcie/dpc.c| 23 +++-
 drivers/pci/probe.c   | 14 +---
 include/linux/aer.h   |  7 +++-
 include/linux/pci.h   |  2 +-
 include/uapi/linux/pci_regs.h |  2 ++
 8 files changed, 98 insertions(+), 32 deletions(-)

-- 
2.39.2



Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread Marc Zyngier
On Fri, 12 Apr 2024 11:44:09 +0100,
Will Deacon  wrote:
> 
> On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index dc04bc767865..ff17849be9f4 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> > return false;
> >  }
> >  
> > -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > -{
> > -   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> > -
> > -   if (!kvm->arch.mmu.pgt)
> > -   return false;
> > -
> > -   WARN_ON(range->end - range->start != 1);
> > -
> > -   /*
> > -* If the page isn't tagged, defer to user_mem_abort() for sanitising
> > -* the MTE tags. The S2 pte should have been unmapped by
> > -* mmu_notifier_invalidate_range_end().
> > -*/
> > -   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> > -   return false;
> > -
> > -   /*
> > -* We've moved a page around, probably through CoW, so let's treat
> > -* it just like a translation fault and the map handler will clean
> > -* the cache to the PoC.
> > -*
> > -* The MMU notifiers will have unmapped a huge PMD before calling
> > -* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> > -* therefore we never need to clear out a huge PMD through this
> > -* calling path and a memcache is not required.
> > -*/
> > -   kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> > -  PAGE_SIZE, __pfn_to_phys(pfn),
> > -  KVM_PGTABLE_PROT_R, NULL, 0);
> > -
> > -   return false;
> > -}
> > -
> >  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> > u64 size = (range->end - range->start) << PAGE_SHIFT;
> 
> Thanks. It's nice to see this code retire:
> 
> Acked-by: Will Deacon 
> 
> Also, if you're in the business of hacking the MMU notifier code, it
> would be really great to change the .clear_flush_young() callback so
> that the architecture could handle the TLB invalidation. At the moment,
> the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
> being set by kvm_handle_hva_range(), whereas we could do a much
> lighter-weight and targeted TLBI in the architecture page-table code
> when we actually update the ptes for small ranges.

Indeed, and I was looking at this earlier this week as it has a pretty
devastating effect with NV (it blows the shadow S2 for that VMID, with
costly consequences).

In general, it feels like the TLB invalidation should stay with the
code that deals with the page tables, as it has a pretty good idea of
what needs to be invalidated and how -- especially on architectures
that have a HW-broadcast facility like arm64.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify()

2024-04-12 Thread Marc Zyngier
On Fri, 05 Apr 2024 12:58:11 +0100,
Paolo Bonzini  wrote:
> 
> The .change_pte() MMU notifier callback was intended as an optimization
> and for this reason it was initially called without a surrounding
> mmu_notifier_invalidate_range_{start,end}() pair.  It was only ever
> implemented by KVM (which was also the original user of MMU notifiers)
> and the rules on when to call set_pte_at_notify() rather than set_pte_at()
> have always been pretty obscure.
> 
> It may seem a miracle that it has never caused any hard to trigger
> bugs, but there's a good reason for that: KVM's implementation has
> been nonfunctional for a good part of its existence.  Already in
> 2012, commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with
> invalidate_range_start and invalidate_range_end", 2012-10-09) changed the
> .change_pte() callback to occur within an invalidate_range_start/end()
> pair; and because KVM unmaps the sPTEs during .invalidate_range_start(),
> .change_pte() has no hope of finding a sPTE to change.
> 
> Therefore, all the code for .change_pte() can be removed from both KVM
> and mm/, and set_pte_at_notify() can be replaced with just set_pte_at().
> 
> Please review!  Also feel free to take the KVM patches through the mm
> tree, as I don't expect any conflicts.
> 
> Thanks,
> 
> Paolo
> 
> Paolo Bonzini (4):
>   KVM: delete .change_pte MMU notifier callback
>   KVM: remove unused argument of kvm_handle_hva_range()
>   mmu_notifier: remove the .change_pte() callback
>   mm: replace set_pte_at_notify() with just set_pte_at()
> 
>  arch/arm64/kvm/mmu.c  | 34 -
>  arch/loongarch/include/asm/kvm_host.h |  1 -
>  arch/loongarch/kvm/mmu.c  | 32 
>  arch/mips/kvm/mmu.c   | 30 ---
>  arch/powerpc/include/asm/kvm_ppc.h|  1 -
>  arch/powerpc/kvm/book3s.c |  5 ---
>  arch/powerpc/kvm/book3s.h |  1 -
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
>  arch/powerpc/kvm/book3s_hv.c  |  1 -
>  arch/powerpc/kvm/book3s_pr.c  |  7 
>  arch/powerpc/kvm/e500_mmu_host.c  |  6 ---
>  arch/riscv/kvm/mmu.c  | 20 --
>  arch/x86/kvm/mmu/mmu.c| 54 +--
>  arch/x86/kvm/mmu/spte.c   | 16 
>  arch/x86/kvm/mmu/spte.h   |  2 -
>  arch/x86/kvm/mmu/tdp_mmu.c| 46 ---
>  arch/x86/kvm/mmu/tdp_mmu.h|  1 -
>  include/linux/kvm_host.h  |  2 -
>  include/linux/mmu_notifier.h  | 44 --
>  include/trace/events/kvm.h| 15 
>  kernel/events/uprobes.c   |  5 +--
>  mm/ksm.c  |  4 +-
>  mm/memory.c   |  7 +---
>  mm/migrate_device.c   |  8 +---
>  mm/mmu_notifier.c | 17 -
>  virt/kvm/kvm_main.c   | 50 +
>  26 files changed, 10 insertions(+), 411 deletions(-)
> 

Reviewed-by: Marc Zyngier 

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 1/1] Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig

2024-04-12 Thread Michael Ellerman
Vignesh Balasubramanian  writes:
> "ARCH_HAVE_EXTRA_ELF_NOTES" enables an extra note section in the
> core dump. Kconfig variable is preferred over ARCH_HAVE_* macro.
>
> Co-developed-by: Jini Susan George 
> Signed-off-by: Jini Susan George 
> Signed-off-by: Vignesh Balasubramanian 
> ---
>  arch/Kconfig   | 9 +
>  arch/powerpc/Kconfig   | 1 +
>  arch/powerpc/include/asm/elf.h | 2 --
>  include/linux/elf.h| 2 +-
>  4 files changed, 11 insertions(+), 3 deletions(-)

Acked-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/arch/Kconfig b/arch/Kconfig
> index 9f066785bb71..143f021c8a76 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -502,6 +502,15 @@ config MMU_LAZY_TLB_SHOOTDOWN
>  config ARCH_HAVE_NMI_SAFE_CMPXCHG
>   bool
>  
> +config ARCH_HAVE_EXTRA_ELF_NOTES
> + bool
> + help
> +   An architecture should select this in order to enable adding an
> +   arch-specific ELF note section to core files. It must provide two
> +   functions: elf_coredump_extra_notes_size() and
> +   elf_coredump_extra_notes_write() which are invoked by the ELF core
> +   dumper.
> +
>  config ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
>   bool
>  
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 1c4be3373686..c45fa9d7fb76 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -156,6 +156,7 @@ config PPC
>   select ARCH_HAS_UACCESS_FLUSHCACHE
>   select ARCH_HAS_UBSAN
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
> + select ARCH_HAVE_EXTRA_ELF_NOTES   if SPU_BASE
>   select ARCH_KEEP_MEMBLOCK
>   select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
>   select ARCH_MIGHT_HAVE_PC_PARPORT
> diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
> index 79f1c480b5eb..bb4b9d3e 100644
> --- a/arch/powerpc/include/asm/elf.h
> +++ b/arch/powerpc/include/asm/elf.h
> @@ -127,8 +127,6 @@ extern int arch_setup_additional_pages(struct 
> linux_binprm *bprm,
> >  /* Notes used in ET_CORE. Note name is "SPU/<fd>/<filename>". */
>  #define NT_SPU   1
>  
> -#define ARCH_HAVE_EXTRA_ELF_NOTES
> -
>  #endif /* CONFIG_SPU_BASE */
>  
>  #ifdef CONFIG_PPC64
> diff --git a/include/linux/elf.h b/include/linux/elf.h
> index c9a46c4e183b..5c402788da19 100644
> --- a/include/linux/elf.h
> +++ b/include/linux/elf.h
> @@ -65,7 +65,7 @@ extern Elf64_Dyn _DYNAMIC [];
>  struct file;
>  struct coredump_params;
>  
> -#ifndef ARCH_HAVE_EXTRA_ELF_NOTES
> +#ifndef CONFIG_ARCH_HAVE_EXTRA_ELF_NOTES
>  static inline int elf_coredump_extra_notes_size(void) { return 0; }
>  static inline int elf_coredump_extra_notes_write(struct coredump_params 
> *cprm) { return 0; }
>  #else
> -- 
> 2.34.1


Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-12 Thread Will Deacon
On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index dc04bc767865..ff17849be9f4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
> kvm_gfn_range *range)
>   return false;
>  }
>  
> -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> -{
> - kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> -
> - if (!kvm->arch.mmu.pgt)
> - return false;
> -
> - WARN_ON(range->end - range->start != 1);
> -
> - /*
> -  * If the page isn't tagged, defer to user_mem_abort() for sanitising
> -  * the MTE tags. The S2 pte should have been unmapped by
> -  * mmu_notifier_invalidate_range_end().
> -  */
> - if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> - return false;
> -
> - /*
> -  * We've moved a page around, probably through CoW, so let's treat
> -  * it just like a translation fault and the map handler will clean
> -  * the cache to the PoC.
> -  *
> -  * The MMU notifiers will have unmapped a huge PMD before calling
> -  * ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> -  * therefore we never need to clear out a huge PMD through this
> -  * calling path and a memcache is not required.
> -  */
> - kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> -PAGE_SIZE, __pfn_to_phys(pfn),
> -KVM_PGTABLE_PROT_R, NULL, 0);
> -
> - return false;
> -}
> -
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
>   u64 size = (range->end - range->start) << PAGE_SHIFT;

Thanks. It's nice to see this code retire:

Acked-by: Will Deacon 

Also, if you're in the business of hacking the MMU notifier code, it
would be really great to change the .clear_flush_young() callback so
that the architecture could handle the TLB invalidation. At the moment,
the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
being set by kvm_handle_hva_range(), whereas we could do a much
lighter-weight and targeted TLBI in the architecture page-table code
when we actually update the ptes for small ranges.

Will


[PATCH v2 1/2] powerpc/pseries: Add pool idle time at LPAR boot

2024-04-12 Thread Shrikanth Hegde
When there are no options specified for lparstat, it is expected to
give reports since LPAR (Logical Partition) boot.

APP (Available Processor Pool) is an indicator of how many cores in the
shared pool are free to use in a Shared Processor LPAR (SPLPAR). APP is
derived using pool_idle_time, which is obtained using the H_PIC call.

The interval-based reports show the correct APP value, while the
since-boot report shows very high APP values. This happens because, in
that case, APP is obtained by dividing the pool idle time by the LPAR
uptime. Since the pool idle time is reported by the PowerVM hypervisor
since its own boot, it need not align with the LPAR boot.

To fix that, export the boot pool idle time in lparcfg; powerpc-utils
will use this info to derive APP as below for since-boot reports.

APP = (pool idle time - boot pool idle time) / (uptime * timebase)

Results: Observe APP values.
== Shared LPAR 
lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00

reboot
stress-ng --cpu=$(nproc) -t 600
sleep 600
So in this case APP is expected to be close to 37 - 6 = 31.

== 6.9-rc1 and lparstat 1.3.10  =
%user  %sys  %wait  %idle  physc  %entc  lbusy       app      vcsw  phint
-----  ----  -----  -----  -----  -----  -----  --------  --------  -----
47.48  0.01   0.00  52.51   0.00   0.00  47.49  69099.72  54154721

=== With this patch and powerpc-utils patch to do the above equation ===
%user  %sys  %wait  %idle  physc  %entc  lbusy    app      vcsw  phint
-----  ----  -----  -----  -----  -----  -----  -----  --------  -----
47.48  0.01   0.00  52.51   5.73  47.75  47.49  31.21  54175321
=

Note: physc and purr/idle purr being inaccurate is handled in a
separate patch in the powerpc-utils tree.

Signed-off-by: Shrikanth Hegde 
---
 arch/powerpc/platforms/pseries/lparcfg.c | 39 ++--
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index f73c4d1c26af..5c2a3e802a02 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -170,20 +170,24 @@ static void show_gpci_data(struct seq_file *m)
kfree(buf);
 }

-static unsigned h_pic(unsigned long *pool_idle_time,
- unsigned long *num_procs)
+static long h_pic(unsigned long *pool_idle_time,
+ unsigned long *num_procs)
 {
-   unsigned long rc;
-   unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+   long rc;
+   unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = {0};

rc = plpar_hcall(H_PIC, retbuf);

-   *pool_idle_time = retbuf[0];
-   *num_procs = retbuf[1];
+   if (pool_idle_time)
+   *pool_idle_time = retbuf[0];
+   if (num_procs)
+   *num_procs = retbuf[1];

return rc;
 }

+unsigned long boot_pool_idle_time;
+
 /*
  * parse_ppp_data
  * Parse out the data returned from h_get_ppp and h_pic
@@ -215,9 +219,15 @@ static void parse_ppp_data(struct seq_file *m)
seq_printf(m, "pool_capacity=%d\n",
   ppp_data.active_procs_in_pool * 100);

-   h_pic(&pool_idle_time, &pool_procs);
-   seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
-   seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
+   /* In case h_pic call is not successful, this would result in
+* APP values being wrong in tools like lparstat.
+*/
+
+   if (h_pic(&pool_idle_time, &pool_procs) == H_SUCCESS) {
+   seq_printf(m, "pool_idle_time=%ld\n", pool_idle_time);
+   seq_printf(m, "pool_num_procs=%ld\n", pool_procs);
+   seq_printf(m, "boot_pool_idle_time=%ld\n", 
boot_pool_idle_time);
+   }
}

seq_printf(m, "unallocated_capacity_weight=%d\n",
@@ -792,6 +802,7 @@ static const struct proc_ops lparcfg_proc_ops = {
 static int __init lparcfg_init(void)
 {
umode_t mode = 0444;
+   long retval;

/* Allow writing if we have FW_FEATURE_SPLPAR */
if (firmware_has_feature(FW_FEATURE_SPLPAR))
@@ -801,6 +812,16 @@ static int __init lparcfg_init(void)
printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
return -EIO;
}
+
+   /* If this call fails, it would result in APP values
+* being wrong for since boot reports of lparstat
+*/
+   retval = h_pic(&boot_pool_idle_time, NULL);
+
+   if (retval != H_SUCCESS)
+   pr_debug("H_PIC failed during lparcfg init retval: %ld\n",
+retval);
+
return 0;
 }
 machine_device_initcall(pseries, lparcfg_init);
--
2.39.3



[PATCH v2 2/2] powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp

2024-04-12 Thread Shrikanth Hegde
A couple of minor fixes:

- hcall return values are long. Fix that for h_get_mpp, h_get_ppp and
parse_ppp_data.

- If an hcall fails, the values set should at least be zero rather than
left uninitialized. Fix that for h_get_mpp and h_get_ppp.

Signed-off-by: Shrikanth Hegde 
---
 arch/powerpc/include/asm/hvcall.h| 2 +-
 arch/powerpc/platforms/pseries/lpar.c| 6 +++---
 arch/powerpc/platforms/pseries/lparcfg.c | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index a41e542ba94d..3d642139b900 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -570,7 +570,7 @@ struct hvcall_mpp_data {
unsigned long backing_mem;
 };

-int h_get_mpp(struct hvcall_mpp_data *);
+long h_get_mpp(struct hvcall_mpp_data *mpp_data);

 struct hvcall_mpp_x_data {
unsigned long coalesced_bytes;
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 4e9916bb03d7..c1d8bee8f701 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1886,10 +1886,10 @@ notrace void __trace_hcall_exit(long opcode, long 
retval, unsigned long *retbuf)
  * h_get_mpp
  * H_GET_MPP hcall returns info in 7 parms
  */
-int h_get_mpp(struct hvcall_mpp_data *mpp_data)
+long h_get_mpp(struct hvcall_mpp_data *mpp_data)
 {
-   int rc;
-   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
+   long rc;

rc = plpar_hcall9(H_GET_MPP, retbuf);

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index 5c2a3e802a02..ed2176d8a866 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -113,8 +113,8 @@ struct hvcall_ppp_data {
  */
 static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data)
 {
-   unsigned long rc;
-   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+   unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
+   long rc;

rc = plpar_hcall9(H_GET_PPP, retbuf);

@@ -197,7 +197,7 @@ static void parse_ppp_data(struct seq_file *m)
struct hvcall_ppp_data ppp_data;
struct device_node *root;
const __be32 *perf_level;
-   int rc;
+   long rc;

rc = h_get_ppp(&ppp_data);
if (rc)
--
2.39.3



[PATCH v2 0/2] powerpc/pseries: Fixes for lparstat boot reports

2024-04-12 Thread Shrikanth Hegde
Currently, the lparstat reports that show stats since LPAR boot are
wrong for some fields. The PIC (Pool Idle Count) needs to be stored at
boot for accurate reporting. PATCH 1 does that.

While there, it was noticed that the hcall return value is long and that
both h_get_ppp and h_get_mpp could leave values uninitialized if the
hcall fails. PATCH 2 fixes that.

v1 -> v2:
- Nathan pointed out the issues surrounding the h_pic call. Addressed
those.
- Added a pr_debug if h_pic fails during lparcfg_init
- If h_pic fails while reading lparcfg, the related fields are not exported.
- Added failure checks for h_get_mpp, h_get_ppp calls as well.

v1: https://lore.kernel.org/all/20240405101340.149171-1-sshe...@linux.ibm.com/

Shrikanth Hegde (2):
  powerpc/pseries: Add pool idle time at LPAR boot
  powerpc/pseries: Add fail related checks for h_get_mpp and h_get_ppp

 arch/powerpc/include/asm/hvcall.h|  2 +-
 arch/powerpc/platforms/pseries/lpar.c|  6 ++--
 arch/powerpc/platforms/pseries/lparcfg.c | 45 +---
 3 files changed, 37 insertions(+), 16 deletions(-)

--
2.39.3



Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-12 Thread Ingo Molnar


* Mike Rapoport  wrote:

> +/**
> + * enum execmem_type - types of executable memory ranges
> + *
> + * There are several subsystems that allocate executable memory.
> + * Architectures define different restrictions on placement,
> + * permissions, alignment and other parameters for memory that can be used
> + * by these subsystems.
> + * Types in this enum identify subsystems that allocate executable memory
> + * and let architectures define parameters for ranges suitable for
> + * allocations by each subsystem.
> + *
> + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> + * are not explcitly defined.
> + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> + * @EXECMEM_KPROBES: parameters for kprobes
> + * @EXECMEM_FTRACE: parameters for ftrace
> + * @EXECMEM_BPF: parameters for BPF
> + * @EXECMEM_TYPE_MAX:
> + */
> +enum execmem_type {
> + EXECMEM_DEFAULT,
> + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> + EXECMEM_KPROBES,
> + EXECMEM_FTRACE,
> + EXECMEM_BPF,
> + EXECMEM_TYPE_MAX,
> +};

s/explcitly
 /explicitly

Thanks,

Ingo


Re: [RFC PATCH 5/7] x86/module: perpare module loading for ROX allocations of text

2024-04-12 Thread Ingo Molnar


* Mike Rapoport  wrote:

>   for (s = start; s < end; s++) {
>   void *addr = (void *)s + *s;
> + void *wr_addr = addr + module_writable_offset(mod, addr);

So instead of repeating this pattern in a dozen of places, why not use a 
simpler method:

void *wr_addr = module_writable_address(mod, addr);

or so, since we have to pass 'addr' to the module code anyway.

The text patching code is pretty complex already.

Thanks,

Ingo


Re: Re: [PATCH] tty: hvc: wakeup hvc console immediately when needed

2024-04-12 Thread li.hao40
> On 12. 04. 24, 5:38, li.ha...@zte.com.cn wrote:
> > From: Li Hao 
> > 
> > Cancel the do_wakeup flag in hvc_struct, and change it to immediately
> > wake up tty when hp->n_outbuf is 0 in hvc_push().
> > 
> > When we receive a key input character, the interrupt handling function
> > hvc_handle_interrupt() will be executed, and the echo thread
> > flush_to_ldisc() will be added to the queue.
> > 
> > If the user is currently using tcsetattr(), a hang may occur. tcsetattr()
> > enters kernel and waits for hp->n_outbuf to become 0 via
> > tty_wait_until_sent(). If the echo thread finishes executing before
> > reaching tty_wait_until_sent (for example, put_chars() takes too long),
> > it means that, while the wakeup condition (hp->do_wakeup = 1) is met,
> > tty_wait_until_sent() cannot be woken up (it missed the tty_wakeup() of
> > this round's tty_poll). Unless the next key input character comes,
> > hvc_poll will be executed, and tty_wakeup() will be performed through
> > the do_wakeup flag.
> > 
> > Signed-off-by: Li Hao 
> > ---
> >   drivers/tty/hvc/hvc_console.c | 12 +---
> >   drivers/tty/hvc/hvc_console.h |  1 -
> >   2 files changed, 5 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> > index cd1f657f7..2fa90d938 100644
> > --- a/drivers/tty/hvc/hvc_console.c
> > +++ b/drivers/tty/hvc/hvc_console.c
> > @@ -476,11 +476,13 @@ static void hvc_hangup(struct tty_struct *tty)
> >   static int hvc_push(struct hvc_struct *hp)
> >   {
> >   int n;
> > +struct tty_struct *tty;
> > 
> >   n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
> > +tty = tty_port_tty_get(&hp->port);
> >   if (n <= 0) {
> >   if (n == 0 || n == -EAGAIN) {
> > -hp->do_wakeup = 1;
> > +tty_wakeup(tty);
> 
> What if tty is NULL? Did you intent to use tty_port_tty_wakeup() instead?
> 
> thanks,
> -- 
> js
> suse labs

Thank you for your prompt reply.
tty_port_tty_wakeup() is better; it removes the need to check whether tty is NULL in hvc_push().

Li Hao


Re: [PATCH] tty: hvc: wakeup hvc console immediately when needed

2024-04-12 Thread Jiri Slaby

On 12. 04. 24, 5:38, li.ha...@zte.com.cn wrote:

From: Li Hao 

Cancel the do_wakeup flag in hvc_struct, and change it to immediately
wake up tty when hp->n_outbuf is 0 in hvc_push().

When we receive a key input character, the interrupt handling function
hvc_handle_interrupt() will be executed, and the echo thread
flush_to_ldisc() will be added to the queue.

If the user is currently using tcsetattr(), a hang may occur. tcsetattr()
enters kernel and waits for hp->n_outbuf to become 0 via
tty_wait_until_sent(). If the echo thread finishes executing before
reaching tty_wait_until_sent (for example, put_chars() takes too long),
it means that, while the wakeup condition (hp->do_wakeup = 1) is met,
tty_wait_until_sent() cannot be woken up (it missed the tty_wakeup() of
this round's tty_poll). Unless the next key input character comes,
hvc_poll will be executed, and tty_wakeup() will be performed through
the do_wakeup flag.

Signed-off-by: Li Hao 
---
  drivers/tty/hvc/hvc_console.c | 12 +---
  drivers/tty/hvc/hvc_console.h |  1 -
  2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index cd1f657f7..2fa90d938 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -476,11 +476,13 @@ static void hvc_hangup(struct tty_struct *tty)
  static int hvc_push(struct hvc_struct *hp)
  {
int n;
+   struct tty_struct *tty;

n = hp->ops->put_chars(hp->vtermno, hp->outbuf, hp->n_outbuf);
+   tty = tty_port_tty_get(&hp->port);
if (n <= 0) {
if (n == 0 || n == -EAGAIN) {
-   hp->do_wakeup = 1;
+   tty_wakeup(tty);


What if tty is NULL? Did you intent to use tty_port_tty_wakeup() instead?

thanks,
--
js
suse labs



Re: [PATCH 14/64] i2c: cpm: reword according to newest specification

2024-04-12 Thread Jochen Friedrich

out_8(&cpm->i2c_reg->i2mod, 0x00);
-   out_8(&cpm->i2c_reg->i2com, I2COM_MASTER);/* Master mode */
+   out_8(&cpm->i2c_reg->i2com, I2COM_MASTER);/* Host mode */

I2COM_MASTER might be coming from the datasheet.

Maybe we can just drop the comment? The value we write is pretty
self-explaining.

indeed.

Andi


I also agree. You might add my Acked-by here.

Jochen



Re: [PATCH v8 6/6] docs: trusted-encrypted: add DCP as new trust source

2024-04-12 Thread Herbert Xu
On Wed, Apr 03, 2024 at 06:47:51PM +0300, Jarkko Sakkinen wrote:
>
> Reviewed-by: Jarkko Sakkinen 
> 
> I can only test that this does not break a machine without the
> hardware feature.

Please feel free to take this through your tree.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH 1/1] Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig

2024-04-12 Thread Vignesh Balasubramanian
"ARCH_HAVE_EXTRA_ELF_NOTES" enables an extra note section in the
core dump. Kconfig variable is preferred over ARCH_HAVE_* macro.

Co-developed-by: Jini Susan George 
Signed-off-by: Jini Susan George 
Signed-off-by: Vignesh Balasubramanian 
---
 arch/Kconfig   | 9 +
 arch/powerpc/Kconfig   | 1 +
 arch/powerpc/include/asm/elf.h | 2 --
 include/linux/elf.h| 2 +-
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71..143f021c8a76 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -502,6 +502,15 @@ config MMU_LAZY_TLB_SHOOTDOWN
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
bool
 
+config ARCH_HAVE_EXTRA_ELF_NOTES
+   bool
+   help
+ An architecture should select this in order to enable adding an
+ arch-specific ELF note section to core files. It must provide two
+ functions: elf_coredump_extra_notes_size() and
+ elf_coredump_extra_notes_write() which are invoked by the ELF core
+ dumper.
+
 config ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
bool
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..c45fa9d7fb76 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -156,6 +156,7 @@ config PPC
select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_UBSAN
select ARCH_HAVE_NMI_SAFE_CMPXCHG
+   select ARCH_HAVE_EXTRA_ELF_NOTES   if SPU_BASE
select ARCH_KEEP_MEMBLOCK
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index 79f1c480b5eb..bb4b9d3e 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -127,8 +127,6 @@ extern int arch_setup_additional_pages(struct linux_binprm 
*bprm,
 /* Notes used in ET_CORE. Note name is "SPU/<fd>/<filename>". */
 #define NT_SPU 1
 
-#define ARCH_HAVE_EXTRA_ELF_NOTES
-
 #endif /* CONFIG_SPU_BASE */
 
 #ifdef CONFIG_PPC64
diff --git a/include/linux/elf.h b/include/linux/elf.h
index c9a46c4e183b..5c402788da19 100644
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -65,7 +65,7 @@ extern Elf64_Dyn _DYNAMIC [];
 struct file;
 struct coredump_params;
 
-#ifndef ARCH_HAVE_EXTRA_ELF_NOTES
+#ifndef CONFIG_ARCH_HAVE_EXTRA_ELF_NOTES
 static inline int elf_coredump_extra_notes_size(void) { return 0; }
 static inline int elf_coredump_extra_notes_write(struct coredump_params *cprm) 
{ return 0; }
 #else
-- 
2.34.1



[PATCH 0/1] Replace the macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig

2024-04-12 Thread Vignesh Balasubramanian
This patch replaces the macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig
as discussed here
https://lore.kernel.org/lkml/ca+55afxdk9_cmo4spymgg_wq+_g5e_v6o-hetq_nts-q1zj...@mail.gmail.com/
It is a pre-requisite patch for the review
https://lore.kernel.org/lkml/20240314112359.50713-1-vigba...@amd.com/
I have split this patch as suggested in the review comment
https://lore.kernel.org/lkml/87o7bg31jd.fsf@mail.lhotse/


Vignesh Balasubramanian (1):
  Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig

 arch/Kconfig   | 9 +
 arch/powerpc/Kconfig   | 1 +
 arch/powerpc/include/asm/elf.h | 2 --
 include/linux/elf.h| 2 +-
 4 files changed, 11 insertions(+), 3 deletions(-)

-- 
2.34.1



Re: [RFC PATCH 2/7] mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations

2024-04-12 Thread Christophe Leroy


Le 11/04/2024 à 18:05, Mike Rapoport a écrit :
> From: "Mike Rapoport (IBM)" 
> 
> vmalloc allocations with VM_ALLOW_HUGE_VMAP that do not explictly
> specify node ID will use huge pages only if size_per_node is larger than
> PMD_SIZE.
> Still the actual allocated memory is not distributed between nodes and
> there is no advantage in such approach.
> On the contrary, BPF allocates PMD_SIZE * num_possible_nodes() for each
> new bpf_prog_pack, while it could do with PMD_SIZE'ed packs.
> 
> Don't account for number of nodes for VM_ALLOW_HUGE_VMAP with
> NUMA_NO_NODE and use huge pages whenever the requested allocation size
> is larger than PMD_SIZE.

Patch looks ok but message is confusing. We also use huge pages at PTE 
size, for instance 512k pages or 16k pages on powerpc 8xx, while 
PMD_SIZE is 4M.

Christophe

> 
> Signed-off-by: Mike Rapoport (IBM) 
> ---
>   mm/vmalloc.c | 9 ++---
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..5fc8b514e457 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3737,8 +3737,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
> long align,
>   }
>   
>   if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
> - unsigned long size_per_node;
> -
>   /*
>* Try huge pages. Only try for PAGE_KERNEL allocations,
>* others like modules don't yet expect huge pages in
> @@ -3746,13 +3744,10 @@ void *__vmalloc_node_range(unsigned long size, 
> unsigned long align,
>* supporting them.
>*/
>   
> - size_per_node = size;
> - if (node == NUMA_NO_NODE)
> - size_per_node /= num_online_nodes();
> - if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
> + if (arch_vmap_pmd_supported(prot) && size >= PMD_SIZE)
>   shift = PMD_SHIFT;
>   else
> - shift = arch_vmap_pte_supported_shift(size_per_node);
> + shift = arch_vmap_pte_supported_shift(size);
>   
>   align = max(real_align, 1UL << shift);
>   size = ALIGN(real_size, 1UL << shift);


Re: [PATCH] cpufreq: Convert to exit callback returning void

2024-04-12 Thread Viresh Kumar
On 10-04-24, 06:22, Lizhe wrote:
> The exit() callback function returns an int value, which leads many
> driver authors to mistakenly believe that error handling can be
> performed by returning an error code. However, the returned value is
> ignored. To improve this situation, it is proposed to change the
> return type of the exit() callback function to void.
> 
> Signed-off-by: Lizhe 
> ---
>  drivers/cpufreq/acpi-cpufreq.c | 4 +---
>  drivers/cpufreq/amd-pstate.c   | 7 ++-
>  drivers/cpufreq/apple-soc-cpufreq.c| 4 +---
>  drivers/cpufreq/bmips-cpufreq.c| 4 +---
>  drivers/cpufreq/cppc_cpufreq.c | 3 +--
>  drivers/cpufreq/cpufreq-dt.c   | 3 +--
>  drivers/cpufreq/e_powersaver.c | 3 +--
>  drivers/cpufreq/intel_pstate.c | 4 +---
>  drivers/cpufreq/mediatek-cpufreq-hw.c  | 4 +---
>  drivers/cpufreq/mediatek-cpufreq.c | 4 +---
>  drivers/cpufreq/omap-cpufreq.c | 3 +--
>  drivers/cpufreq/pasemi-cpufreq.c   | 6 ++
>  drivers/cpufreq/powernow-k6.c  | 3 +--
>  drivers/cpufreq/powernow-k7.c  | 3 +--
>  drivers/cpufreq/powernow-k8.c  | 4 +---
>  drivers/cpufreq/powernv-cpufreq.c  | 4 +---
>  drivers/cpufreq/ppc_cbe_cpufreq.c  | 3 +--
>  drivers/cpufreq/qcom-cpufreq-hw.c  | 4 +---
>  drivers/cpufreq/qoriq-cpufreq.c| 4 +---
>  drivers/cpufreq/scmi-cpufreq.c | 4 +---
>  drivers/cpufreq/scpi-cpufreq.c | 4 +---
>  drivers/cpufreq/sh-cpufreq.c   | 4 +---
>  drivers/cpufreq/sparc-us2e-cpufreq.c   | 3 +--
>  drivers/cpufreq/sparc-us3-cpufreq.c| 3 +--
>  drivers/cpufreq/speedstep-centrino.c   | 4 +---
>  drivers/cpufreq/tegra194-cpufreq.c | 4 +---
>  drivers/cpufreq/vexpress-spc-cpufreq.c | 3 +--
>  27 files changed, 29 insertions(+), 74 deletions(-)

I have discarded all emails with the following subject line:

"cpufreq: Convert to exit callback returning void".

While you have sent decent patches for removing the empty exit callbacks, the
way you have handled these changes is not correct.

Don't send any patches for now; please wait and understand what is being
asked of you.

This change you are trying to make is okay and sensible, but you can not send
random patches to the list just like that. You are wasting everyone's time here
including yourself.

Now what we expect here is a single commit (with version history) which changes
all the users of the exit() callback (each and every cpufreq driver) along with
cpufreq.h and cpufreq.c. That change should compile fine and not break the
build for any platform.

Please don't send more of these patches unless this is done.

-- 
viresh