Re: [PATCH] kprobes: Enable tracing for monolithic kernel images

2022-06-08 Thread Jarkko Sakkinen


On Wed, Jun 08, 2022 at 06:27:51PM +0200, Ard Biesheuvel wrote:
> Hello Jarkko,
> 
> On Wed, 8 Jun 2022 at 02:02, Jarkko Sakkinen  wrote:
> >
> > Tracing with kprobes while running a monolithic kernel is currently
> > impossible because CONFIG_KPROBES depends on CONFIG_MODULES.  This
> > dependency is a result of kprobes code using the module allocator for the
> > trampoline code.
> >
> > Detaching kprobes from modules helps to squeeze down the user space,
> > e.g. when developing new core kernel features, while still having all
> > the nice tracing capabilities.
> >
> > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> > code with CONFIG_MODULES.
> >
> > As a result, kprobes can be used with a monolithic kernel.
> 
> I think I may have mentioned this the previous time as well, but I
> don't think this is the right approach.

OK, I apologize for my ignorance. It's been a while.

> Kprobes uses alloc_insn_page() to allocate executable memory, but the
> requirements for this memory are radically different compared to
> loadable modules, which need to be within an arch-specific distance of
> the core kernel, need KASAN backing etc etc.
> 
> This is why arm64, for instance, does not implement alloc_insn_page()
> in terms of module_alloc() [and likely does not belong in this patch
> for that reason]
> 
> Is there any reason kprobes cannot simply use vmalloc()?

All archs except nios2 use vmalloc() in the end for module_alloc().
nios2 uses kmalloc() for reasons that I'm not aware of, but it does
not support kprobes in the first place.

Based on this, I think that could work out just fine.

I could cope with that.

BR, Jarkko


Re: [PATCH] kprobes: Enable tracing for monolithic kernel images

2022-06-08 Thread Christoph Hellwig


On Wed, Jun 08, 2022 at 01:26:19PM -0700, Luis Chamberlain wrote:
> No, that was removed because it has only one user.

That is only part of the story.  The other part is that the overall
kernel simply does not have any business allocating executable memory.
Executable memory is a very special concept for modules or module-like
code like kprobes, and should not be exposed as a general concept.

Especially as executable memory really should not also be writable
for security reasons.  In other words, we should actually never
allocate executable memory, ever.  We might seal memory and then
mark it executable after having written to it, which is how modules
and kprobes are implemented on all modern Linux ports anyway.
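As a user-space analogy only (in-kernel code uses its own allocators and page-attribute helpers, not mmap()), the write-then-seal pattern described above can be sketched with POSIX mmap() and mprotect(): the mapping is writable while it is filled and is then switched to read+execute, so it is never writable and executable at the same time. The helper name alloc_sealed_exec() is hypothetical, and the copied bytes are never executed in this sketch.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict C modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: copy `len` bytes into a fresh anonymous mapping,
 * then drop write permission and add execute permission.
 * Returns the mapping, or NULL on failure.
 */
static void *alloc_sealed_exec(const void *code, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	/* Fill the buffer while it is writable but not executable. */
	memcpy(p, code, len);

	/* Seal it: from here on it is executable but no longer writable. */
	if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

On systems whose security policy forbids executable anonymous memory (for example an SELinux policy denying execmem), the mprotect() call fails and the helper returns NULL.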


Re: [PATCH RFC v1 3/7] swiotlb-xen: support highmem for xen specific code

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:49PM -0700, Dongli Zhang wrote:
> @@ -109,19 +110,25 @@ int xen_swiotlb_fixup(void *buf, unsigned long nslabs, bool high)
>   int rc;
>   unsigned int order = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT);
>   unsigned int i, dma_bits = order + PAGE_SHIFT;
> + unsigned int max_dma_bits = MAX_DMA32_BITS;
>   dma_addr_t dma_handle;
>   phys_addr_t p = virt_to_phys(buf);
>  
>   BUILD_BUG_ON(IO_TLB_SEGSIZE & (IO_TLB_SEGSIZE - 1));
>   BUG_ON(nslabs % IO_TLB_SEGSIZE);
>  
> + if (high) {
> + dma_bits = MAX_DMA64_BITS;
> + max_dma_bits = MAX_DMA64_BITS;
> + }
> +

I think you really want to pass the addressing bits or mask to the
remap callback and not do magic with a 'high' flag here.
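A minimal user-space sketch of that suggestion, assuming the fixup path receives the maximum number of addressing bits directly instead of a 'high' flag. demo_create_contiguous_region() is a hypothetical stand-in for xen_create_contiguous_region() that only succeeds once the requested limit is wide enough; the retry loop mirrors the one in xen_swiotlb_fixup(). Starting at 18 bits assumes 4 KiB pages and the default segment size (order 6 + PAGE_SHIFT 12).

```c
#include <errno.h>

/* Hypothetical: pretend the hypervisor can only satisfy requests whose
 * address limit is at least this many bits wide. */
static unsigned int demo_available_bits = 34;

/* Hypothetical stand-in for xen_create_contiguous_region(). */
static int demo_create_contiguous_region(unsigned int address_bits)
{
	return address_bits >= demo_available_bits ? 0 : -ENOMEM;
}

/*
 * Fix up one segment, widening the address limit one bit at a time until
 * the request succeeds or max_dma_bits is reached.  The caller passes the
 * limit itself rather than a 'high' flag.  Returns the number of bits that
 * worked, or a negative errno.
 */
static int demo_fixup_segment(unsigned int dma_bits, unsigned int max_dma_bits)
{
	int rc;

	do {
		rc = demo_create_contiguous_region(dma_bits);
	} while (rc && dma_bits++ < max_dma_bits);

	return rc ? rc : (int)dma_bits;
}
```

With this shape, the default pool would pass 32 and a 64-bit pool 64, and the callback no longer needs to know which pool it is fixing up.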


Re: [PATCH RFC v1 7/7] swiotlb: fix the slot_addr() overflow

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:53PM -0700, Dongli Zhang wrote:
> +#define slot_addr(start, idx)  ((start) + \
> + (((unsigned long)idx) << IO_TLB_SHIFT))

Please just convert it to an inline function.
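A minimal user-space sketch of slot_addr() as an inline function, assuming a 64-bit physical address type; demo_phys_addr_t is a hypothetical stand-in for phys_addr_t, and IO_TLB_SHIFT is 11 as in include/linux/swiotlb.h. Widening the index before the shift is what avoids the overflow: in 32-bit int arithmetic, idx << 11 exceeds INT_MAX from idx = 2^20 onwards (2^20 << 11 = 2^31), i.e. beyond the first 2 GiB of a pool.

```c
#include <stdint.h>

#define IO_TLB_SHIFT 11   /* 2 KiB per slot */

typedef uint64_t demo_phys_addr_t;   /* hypothetical stand-in for phys_addr_t */

/* Address of slot `idx` in a pool starting at `start`.  The index is widened
 * to the address type before shifting, so large pools do not overflow. */
static inline demo_phys_addr_t slot_addr(demo_phys_addr_t start, unsigned int idx)
{
	return start + ((demo_phys_addr_t)idx << IO_TLB_SHIFT);
}
```

As a function, its arguments are also type-checked, which the macro form does not provide.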


Re: [PATCH RFC v1 6/7] virtio: use io_tlb_high_mem if it is active

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:52PM -0700, Dongli Zhang wrote:
>  /* Unique numbering for virtio devices. */
> @@ -241,6 +243,12 @@ static int virtio_dev_probe(struct device *_d)
>   u64 device_features;
>   u64 driver_features;
>   u64 driver_features_legacy;
> + struct device *parent = dev->dev.parent;
> + u64 dma_mask = min_not_zero(*parent->dma_mask,
> + parent->bus_dma_limit);
> +
> + if (dma_mask == DMA_BIT_MASK(64))
> + swiotlb_use_high(parent);

The driver already very clearly communicated its addressing
requirements.  The underlying swiotlb code needs to transparently
pick the right pool.
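A minimal sketch of what picking the pool "under the hood" could look like, using hypothetical, simplified types (demo_pool, demo_dev) rather than the real struct io_tlb_mem and struct device. The limit is the same min_not_zero(dma_mask, bus_dma_limit) value the quoted patch computes, but the decision lives in the bounce-buffer layer, so the driver calls nothing. Preferring the 64-bit pool whenever the device can reach its last byte is an assumption of this sketch; a real implementation would also need a fallback when that pool is full.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bounce-buffer pool covering [start, end). */
struct demo_pool {
	const char *name;
	uint64_t start;
	uint64_t end;            /* exclusive, like io_tlb_mem.end */
};

/* Hypothetical, simplified device: only the limits the driver already set. */
struct demo_dev {
	uint64_t dma_mask;
	uint64_t bus_dma_limit;  /* 0 means "no extra bus limit" */
};

static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Choose a pool from the device's own addressing limit. */
static const struct demo_pool *pick_pool(const struct demo_dev *dev,
					 const struct demo_pool *dflt,
					 const struct demo_pool *high)
{
	uint64_t limit = min_not_zero_u64(dev->dma_mask, dev->bus_dma_limit);

	/* Use the high pool only if it exists and every byte is reachable. */
	if (high && high->end > high->start && high->end - 1 <= limit)
		return high;
	return dflt;
}
```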



Re: [PATCH RFC v1 5/7] swiotlb: add interface to set dev->dma_io_tlb_mem

2022-06-08 Thread Christoph Hellwig
This should be handled under the hood without the driver even knowing.


Re: [PATCH RFC v1 4/7] swiotlb: to implement io_tlb_high_mem

2022-06-08 Thread Christoph Hellwig
All this really needs to be hidden under the hood.


Re: [PATCH RFC v1 1/7] swiotlb: introduce the highmem swiotlb buffer

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:47PM -0700, Dongli Zhang wrote:
> @@ -109,6 +109,7 @@ struct io_tlb_mem {
>   } *slots;
>  };
>  extern struct io_tlb_mem io_tlb_default_mem;
> +extern struct io_tlb_mem io_tlb_high_mem;

This should not be exposed.

> +extern bool swiotlb_high_active(void);

And this should not even exist.

> +static unsigned long high_nslabs;

And I don't think "high" is a good name here to start with.  That
suggests highmem, which we are not using here.


Re: [PATCH V2] ASoC: imx-audmux: remove unnecessary check of clk_disable_unprepare/clk_prepare_enable

2022-06-08 Thread Shengjiu Wang
On Mon, Jun 6, 2022 at 11:37 AM  wrote:

> From: Minghao Chi 
>
> Because clk_disable_unprepare/clk_prepare_enable already check for a NULL clock
> parameter, the additional checks are unnecessary, so just remove them.
>
> Reported-by: Zeal Robot 
> Signed-off-by: Minghao Chi 
>

Acked-by: Shengjiu Wang 

Best regards
Wang Shengjiu

> ---
> v1->v2:
> remove the check of audmux_clk before "clk_prepare_enable"
>  sound/soc/fsl/imx-audmux.c | 22 --
>  1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/sound/soc/fsl/imx-audmux.c b/sound/soc/fsl/imx-audmux.c
> index dfa05d40b276..3ba82adace42 100644
> --- a/sound/soc/fsl/imx-audmux.c
> +++ b/sound/soc/fsl/imx-audmux.c
> @@ -62,17 +62,14 @@ static ssize_t audmux_read_file(struct file *file, char __user *user_buf,
> uintptr_t port = (uintptr_t)file->private_data;
> u32 pdcr, ptcr;
>
> -   if (audmux_clk) {
> -   ret = clk_prepare_enable(audmux_clk);
> -   if (ret)
> -   return ret;
> -   }
> +   ret = clk_prepare_enable(audmux_clk);
> +   if (ret)
> +   return ret;
>
> ptcr = readl(audmux_base + IMX_AUDMUX_V2_PTCR(port));
> pdcr = readl(audmux_base + IMX_AUDMUX_V2_PDCR(port));
>
> -   if (audmux_clk)
> -   clk_disable_unprepare(audmux_clk);
> +   clk_disable_unprepare(audmux_clk);
>
> buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
> if (!buf)
> @@ -209,17 +206,14 @@ int imx_audmux_v2_configure_port(unsigned int port, unsigned int ptcr,
> if (!audmux_base)
> return -ENOSYS;
>
> -   if (audmux_clk) {
> -   ret = clk_prepare_enable(audmux_clk);
> -   if (ret)
> -   return ret;
> -   }
> +   ret = clk_prepare_enable(audmux_clk);
> +   if (ret)
> +   return ret;
>
> writel(ptcr, audmux_base + IMX_AUDMUX_V2_PTCR(port));
> writel(pdcr, audmux_base + IMX_AUDMUX_V2_PDCR(port));
>
> -   if (audmux_clk)
> -   clk_disable_unprepare(audmux_clk);
> +   clk_disable_unprepare(audmux_clk);
>
> return 0;
>  }
> --
> 2.25.1
>
>
>
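The checks can go because, as the commit message says, the clk API already handles a NULL clock. A minimal user-space model of that contract, using a hypothetical demo_clk type and demo_* helpers rather than the real clk framework: enabling a NULL clock returns 0 and disabling it does nothing, so a caller like audmux_read_file() can drop its own NULL checks.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct clk. */
struct demo_clk {
	int enable_count;
};

/* A NULL clock is treated as a valid dummy clock: succeed, do nothing. */
static int demo_clk_prepare_enable(struct demo_clk *clk)
{
	if (!clk)
		return 0;
	clk->enable_count++;
	return 0;
}

/* Disabling a NULL clock is a no-op. */
static void demo_clk_disable_unprepare(struct demo_clk *clk)
{
	if (!clk)
		return;
	clk->enable_count--;
}

/* Caller shaped like the patched code: no NULL check around the clk calls. */
static int demo_read_port(struct demo_clk *audmux_clk)
{
	int ret = demo_clk_prepare_enable(audmux_clk);

	if (ret)
		return ret;
	/* ... register reads would go here ... */
	demo_clk_disable_unprepare(audmux_clk);
	return 0;
}
```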


[PATCH RFC v1 0/7] swiotlb: extra 64-bit buffer for dev->dma_io_tlb_mem

2022-06-08 Thread Dongli Zhang
Hello,

I previously sent a patchset adding a 64-bit buffer, and people thought it
was the same as Restricted DMA. However, a 64-bit buffer is still not supported.

https://lore.kernel.org/all/20210203233709.19819-1-dongli.zh...@oracle.com/

This RFC introduces an extra swiotlb buffer, allocated with the SWIOTLB_ANY
flag, to support a 64-bit swiotlb.

The core ideas are:

1. Create an extra io_tlb_mem with the SWIOTLB_ANY flag.

2. Set dev->dma_io_tlb_mem to either the default or the extra io_tlb_mem,
   depending on the DMA mask.


Could you please help with the following questions about this RFC?

- Is it fine to create the extra io_tlb_mem?

- Which one is better: to create a separate variable for the extra
  io_tlb_mem, or make it an array of two io_tlb_mem?

- Should I set dev->dma_io_tlb_mem in each driver (e.g., the virtio driver,
  as in this patchset) based on the value of
  min_not_zero(*dev->dma_mask, dev->bus_dma_limit), or at a higher level
  (e.g., post PCI driver)?


This patchset demonstrates that the idea works. Since this is just an
RFC, I have only tested virtio-blk on qemu-7.0 with swiotlb forced. It has
not been tested in an AMD SEV environment.

qemu-system-x86_64 -cpu host -name debug-threads=on \
-smp 8 -m 16G -machine q35,accel=kvm -vnc :5 -hda boot.img \
-kernel mainline-linux/arch/x86_64/boot/bzImage \
-append "root=/dev/sda1 init=/sbin/init text console=ttyS0 loglevel=7 swiotlb=327680,3145728,force" \
-device virtio-blk-pci,id=vblk0,num-queues=8,drive=drive0,disable-legacy=on,iommu_platform=true \
-drive file=test.raw,if=none,id=drive0,cache=none \
-net nic -net user,hostfwd=tcp::5025-:22 -serial stdio


The kernel command line "swiotlb=327680,3145728,force" allocates 640MB for
the default swiotlb and 6GB for the extra swiotlb.

[2.826676] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[2.826693] software IO TLB: default mapped [mem 0x3700-0x5f00] (640MB)
[2.826697] software IO TLB: high mapped [mem 0x0002edc8-0x00046dc8] (6144MB)

The highmem swiotlb is being used by virtio-blk.

$ cat /sys/kernel/debug/swiotlb/swiotlb-hi/io_tlb_nslabs 
3145728
$ cat /sys/kernel/debug/swiotlb/swiotlb-hi/io_tlb_used 
8960


Dongli Zhang (7):
  swiotlb: introduce the highmem swiotlb buffer
  swiotlb: change the signature of remap function
  swiotlb-xen: support highmem for xen specific code
  swiotlb: to implement io_tlb_high_mem
  swiotlb: add interface to set dev->dma_io_tlb_mem
  virtio: use io_tlb_high_mem if it is active
  swiotlb: fix the slot_addr() overflow

 arch/powerpc/kernel/dma-swiotlb.c      |   8 +-
 arch/x86/include/asm/xen/swiotlb-xen.h |   2 +-
 arch/x86/kernel/pci-dma.c              |   5 +-
 drivers/virtio/virtio.c                |   8 ++
 drivers/xen/swiotlb-xen.c              |  16 +++-
 include/linux/swiotlb.h                |  14 ++-
 kernel/dma/swiotlb.c                   | 136 +---
 7 files changed, 145 insertions(+), 44 deletions(-)

Thank you very much for feedback and suggestion!

Dongli Zhang




[PATCH RFC v1 3/7] swiotlb-xen: support highmem for xen specific code

2022-06-08 Thread Dongli Zhang
While swiotlb-xen mostly relies on the generic swiotlb API to initialize
and use the swiotlb, this patch adds highmem swiotlb support to the
swiotlb-xen specific code.

For example, xen_swiotlb_fixup() may ask the hypervisor to provide
64-bit memory pages for the swiotlb buffer.

Cc: Konrad Wilk 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 drivers/xen/swiotlb-xen.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 339f46e21053..d15321e9f9db 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -38,7 +38,8 @@
 #include 
 
 #include 
-#define MAX_DMA_BITS 32
+#define MAX_DMA32_BITS 32
+#define MAX_DMA64_BITS 64
 
 /*
  * Quick lookup value of the bus address of the IOTLB.
@@ -109,19 +110,25 @@ int xen_swiotlb_fixup(void *buf, unsigned long nslabs, bool high)
int rc;
unsigned int order = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT);
unsigned int i, dma_bits = order + PAGE_SHIFT;
+   unsigned int max_dma_bits = MAX_DMA32_BITS;
dma_addr_t dma_handle;
phys_addr_t p = virt_to_phys(buf);
 
BUILD_BUG_ON(IO_TLB_SEGSIZE & (IO_TLB_SEGSIZE - 1));
BUG_ON(nslabs % IO_TLB_SEGSIZE);
 
+   if (high) {
+   dma_bits = MAX_DMA64_BITS;
+   max_dma_bits = MAX_DMA64_BITS;
+   }
+
i = 0;
do {
do {
rc = xen_create_contiguous_region(
p + (i << IO_TLB_SHIFT), order,
dma_bits, &dma_handle);
-   } while (rc && dma_bits++ < MAX_DMA_BITS);
+   } while (rc && dma_bits++ < max_dma_bits);
if (rc)
return rc;
 
@@ -381,7 +388,8 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask ||
+  xen_phys_to_dma(hwdev, io_tlb_high_mem.end - 1) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
-- 
2.17.1



[PATCH RFC v1 1/7] swiotlb: introduce the highmem swiotlb buffer

2022-06-08 Thread Dongli Zhang
Currently, the virtio driver is not able to use 4+ GB of memory when the
swiotlb is forced, e.g., when AMD SEV is involved.

Fortunately, commit 8ba2ed1be90f ("swiotlb: add a SWIOTLB_ANY flag to lift
the low memory restriction") introduced the SWIOTLB_ANY flag to allocate
the swiotlb buffer from high memory.

In addition to the default 'io_tlb_default_mem', this patch introduces an
extra 'io_tlb_high_mem', which future patches allocate with the SWIOTLB_ANY
flag. E.g., the user may configure the extra highmem swiotlb buffer via
"swiotlb=327680,4194304" to allocate 8GB of memory.

In the future, the driver will be able to choose between
'io_tlb_default_mem' and 'io_tlb_high_mem'.

The highmem swiotlb is enabled by the user if io_tlb_high_mem is set. It can
be actively used if swiotlb_high_active() returns true.

The kernel command line "swiotlb=32768,3145728,force" allocates 64MB
for the default swiotlb and 6GB for the extra highmem swiotlb.

Cc: Konrad Wilk 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c    | 16
 2 files changed, 18 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..e67e605af2dd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -109,6 +109,7 @@ struct io_tlb_mem {
} *slots;
 };
 extern struct io_tlb_mem io_tlb_default_mem;
+extern struct io_tlb_mem io_tlb_high_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
@@ -164,6 +165,7 @@ static inline void swiotlb_adjust_size(unsigned long size)
 }
 #endif /* CONFIG_SWIOTLB */
 
+extern bool swiotlb_high_active(void);
 extern void swiotlb_print_info(void);
 
 #ifdef CONFIG_DMA_RESTRICTED_POOL
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..569bc30e7b7a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -66,10 +66,12 @@ static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
 struct io_tlb_mem io_tlb_default_mem;
+struct io_tlb_mem io_tlb_high_mem;
 
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long high_nslabs;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -81,6 +83,15 @@ setup_io_tlb_npages(char *str)
}
if (*str == ',')
++str;
+
+   if (isdigit(*str)) {
+   /* avoid tail segment of size < IO_TLB_SEGSIZE */
+   high_nslabs =
+   ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
+   }
+   if (*str == ',')
+   ++str;
+
if (!strcmp(str, "force"))
swiotlb_force_bounce = true;
else if (!strcmp(str, "noforce"))
@@ -90,6 +101,11 @@ setup_io_tlb_npages(char *str)
 }
 early_param("swiotlb", setup_io_tlb_npages);
 
+bool swiotlb_high_active(void)
+{
+   return high_nslabs && io_tlb_high_mem.nslabs;
+}
+
 unsigned int swiotlb_max_segment(void)
 {
if (!io_tlb_default_mem.nslabs)
-- 
2.17.1
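A user-space sketch of how the extended "swiotlb=" option in this patch is parsed, plus the size arithmetic behind the figures quoted in the commit messages. parse_swiotlb_opt() and struct swiotlb_opts are hypothetical; strtoul() stands in for simple_strtoul(), and IO_TLB_SHIFT (11, i.e. 2 KiB slabs) and IO_TLB_SEGSIZE (128) are taken from include/linux/swiotlb.h.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define IO_TLB_SHIFT   11    /* 2 KiB per slab */
#define IO_TLB_SEGSIZE 128
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Hypothetical container for the parsed option; zero-initialize before use. */
struct swiotlb_opts {
	unsigned long default_nslabs;
	unsigned long high_nslabs;
	bool force_bounce;
	bool force_disable;
};

/* Parse "<default_nslabs>,<high_nslabs>,<force|noforce>"; every field is optional. */
static void parse_swiotlb_opt(const char *s, struct swiotlb_opts *o)
{
	char *end;

	if (isdigit((unsigned char)*s)) {
		/* Round up to avoid a tail segment smaller than IO_TLB_SEGSIZE. */
		o->default_nslabs = ALIGN_UP(strtoul(s, &end, 0), IO_TLB_SEGSIZE);
		s = end;
	}
	if (*s == ',')
		s++;
	if (isdigit((unsigned char)*s)) {
		o->high_nslabs = ALIGN_UP(strtoul(s, &end, 0), IO_TLB_SEGSIZE);
		s = end;
	}
	if (*s == ',')
		s++;
	if (!strcmp(s, "force"))
		o->force_bounce = true;
	else if (!strcmp(s, "noforce"))
		o->force_disable = true;
}

/* Buffer size in bytes for a given number of slabs. */
static unsigned long long nslabs_to_bytes(unsigned long nslabs)
{
	return (unsigned long long)nslabs << IO_TLB_SHIFT;
}
```

For example, "327680,3145728,force" yields 327680 default slabs (640 MiB) and 3145728 high slabs (6 GiB), matching the boot log in the cover letter.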



[PATCH RFC v1 7/7] swiotlb: fix the slot_addr() overflow

2022-06-08 Thread Dongli Zhang
Since the swiotlb slot index is a signed integer, "((idx) << IO_TLB_SHIFT)"
yields an incorrect value once the shifted index no longer fits in an int.
As a result, slot_addr() returns a value smaller than the expected one.

For example, the 'tlb_addr' generated in swiotlb_tbl_map_single() may be
smaller than expected. As a result, swiotlb_bounce() accesses the wrong
swiotlb slot.

Cc: Konrad Wilk 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 kernel/dma/swiotlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0dcdd25ea95d..c64e557de55c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -531,7 +531,8 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
}
 }
 
-#define slot_addr(start, idx)  ((start) + ((idx) << IO_TLB_SHIFT))
+#define slot_addr(start, idx)  ((start) + \
+   (((unsigned long)idx) << IO_TLB_SHIFT))
 
 /*
  * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
-- 
2.17.1



[PATCH RFC v1 2/7] swiotlb: change the signature of remap function

2022-06-08 Thread Dongli Zhang
Add a new argument 'high' to the remap function so that it can remap the
swiotlb buffer depending on whether the swiotlb is 32-bit or 64-bit.

Currently the only remap function is xen_swiotlb_fixup().

Cc: Konrad Wilk 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 arch/x86/include/asm/xen/swiotlb-xen.h | 2 +-
 drivers/xen/swiotlb-xen.c              | 2 +-
 include/linux/swiotlb.h                | 4 ++--
 kernel/dma/swiotlb.c                   | 8
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/xen/swiotlb-xen.h b/arch/x86/include/asm/xen/swiotlb-xen.h
index 77a2d19cc990..a54eae15605e 100644
--- a/arch/x86/include/asm/xen/swiotlb-xen.h
+++ b/arch/x86/include/asm/xen/swiotlb-xen.h
@@ -8,7 +8,7 @@ extern int pci_xen_swiotlb_init_late(void);
 static inline int pci_xen_swiotlb_init_late(void) { return -ENXIO; }
 #endif
 
-int xen_swiotlb_fixup(void *buf, unsigned long nslabs);
+int xen_swiotlb_fixup(void *buf, unsigned long nslabs, bool high);
 int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
unsigned int address_bits,
dma_addr_t *dma_handle);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..339f46e21053 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -104,7 +104,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 }
 
 #ifdef CONFIG_X86
-int xen_swiotlb_fixup(void *buf, unsigned long nslabs)
+int xen_swiotlb_fixup(void *buf, unsigned long nslabs, bool high)
 {
int rc;
unsigned int order = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT);
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index e67e605af2dd..e61c074c55eb 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -36,9 +36,9 @@ struct scatterlist;
 
 unsigned long swiotlb_size_or_default(void);
 void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
-   int (*remap)(void *tlb, unsigned long nslabs));
+   int (*remap)(void *tlb, unsigned long nslabs, bool high));
 int swiotlb_init_late(size_t size, gfp_t gfp_mask,
-   int (*remap)(void *tlb, unsigned long nslabs));
+   int (*remap)(void *tlb, unsigned long nslabs, bool high));
 extern void __init swiotlb_update_mem_attributes(void);
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 569bc30e7b7a..793ca7f9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -245,7 +245,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
  * structures for the software IO TLB used to implement the DMA API.
  */
 void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
-   int (*remap)(void *tlb, unsigned long nslabs))
+   int (*remap)(void *tlb, unsigned long nslabs, bool high))
 {
struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long nslabs = default_nslabs;
@@ -274,7 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
return;
}
 
-   if (remap && remap(tlb, nslabs) < 0) {
+   if (remap && remap(tlb, nslabs, false) < 0) {
memblock_free(tlb, PAGE_ALIGN(bytes));
 
nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
@@ -307,7 +307,7 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
  * This should be just like above, but with some error catching.
  */
 int swiotlb_init_late(size_t size, gfp_t gfp_mask,
-   int (*remap)(void *tlb, unsigned long nslabs))
+   int (*remap)(void *tlb, unsigned long nslabs, bool high))
 {
struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
@@ -337,7 +337,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
return -ENOMEM;
 
if (remap)
-   rc = remap(vstart, nslabs);
+   rc = remap(vstart, nslabs, false);
if (rc) {
free_pages((unsigned long)vstart, order);
 
-- 
2.17.1



[PATCH RFC v1 6/7] virtio: use io_tlb_high_mem if it is active

2022-06-08 Thread Dongli Zhang
When the swiotlb is forced (e.g., when AMD SEV is involved), the virtio
driver is not able to use 4+ GB of memory. Therefore, make the virtio driver
use 'io_tlb_high_mem' as its swiotlb.

Cc: Konrad Wilk 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 drivers/virtio/virtio.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ef04a96942bf..d9ebe3940e2d 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -5,6 +5,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /* Unique numbering for virtio devices. */
@@ -241,6 +243,12 @@ static int virtio_dev_probe(struct device *_d)
u64 device_features;
u64 driver_features;
u64 driver_features_legacy;
+   struct device *parent = dev->dev.parent;
+   u64 dma_mask = min_not_zero(*parent->dma_mask,
+   parent->bus_dma_limit);
+
+   if (dma_mask == DMA_BIT_MASK(64))
+   swiotlb_use_high(parent);
 
/* We have a driver! */
virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
-- 
2.17.1



[PATCH RFC v1 4/7] swiotlb: to implement io_tlb_high_mem

2022-06-08 Thread Dongli Zhang
This patch implements the extra 'io_tlb_high_mem'. In the future, device
drivers may choose either 'io_tlb_default_mem' or 'io_tlb_high_mem' as
dev->dma_io_tlb_mem.

The highmem buffer is regarded as active if
(high_nslabs && io_tlb_high_mem.nslabs) evaluates to true.

Cc: Konrad Wilk 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 arch/powerpc/kernel/dma-swiotlb.c |   8 ++-
 arch/x86/kernel/pci-dma.c         |   5 +-
 include/linux/swiotlb.h           |   2 +-
 kernel/dma/swiotlb.c              | 103 +-
 4 files changed, 84 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index ba256c37bcc0..f18694881264 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -20,9 +20,11 @@ void __init swiotlb_detect_4g(void)
 
 static int __init check_swiotlb_enabled(void)
 {
-   if (ppc_swiotlb_enable)
-   swiotlb_print_info();
-   else
+   if (ppc_swiotlb_enable) {
+   swiotlb_print_info(false);
+   if (swiotlb_high_active())
+   swiotlb_print_info(true);
+   } else
swiotlb_exit();
 
return 0;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 30bbe4abb5d6..1504b349b312 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -196,7 +196,10 @@ static int __init pci_iommu_init(void)
/* An IOMMU turned us off. */
if (x86_swiotlb_enable) {
pr_info("PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n");
-   swiotlb_print_info();
+
+   swiotlb_print_info(false);
+   if (swiotlb_high_active())
+   swiotlb_print_info(true);
} else {
swiotlb_exit();
}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index e61c074c55eb..8196bf961aab 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -166,7 +166,7 @@ static inline void swiotlb_adjust_size(unsigned long size)
 #endif /* CONFIG_SWIOTLB */
 
 extern bool swiotlb_high_active(void);
-extern void swiotlb_print_info(void);
+extern void swiotlb_print_info(bool high);
 
 #ifdef CONFIG_DMA_RESTRICTED_POOL
 struct page *swiotlb_alloc(struct device *dev, size_t size);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 793ca7f9..ff82b281ce01 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -101,6 +101,21 @@ setup_io_tlb_npages(char *str)
 }
 early_param("swiotlb", setup_io_tlb_npages);
 
+static struct io_tlb_mem *io_tlb_mem_get(bool high)
+{
+   return high ? &io_tlb_high_mem : &io_tlb_default_mem;
+}
+
+static unsigned long nslabs_get(bool high)
+{
+   return high ? high_nslabs : default_nslabs;
+}
+
+static char *swiotlb_name_get(bool high)
+{
+   return high ? "high" : "default";
+}
+
 bool swiotlb_high_active(void)
 {
return high_nslabs && io_tlb_high_mem.nslabs;
@@ -133,17 +148,18 @@ void __init swiotlb_adjust_size(unsigned long size)
pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
 }
 
-void swiotlb_print_info(void)
+void swiotlb_print_info(bool high)
 {
-   struct io_tlb_mem *mem = &io_tlb_default_mem;
+   struct io_tlb_mem *mem = io_tlb_mem_get(high);
 
if (!mem->nslabs) {
pr_warn("No low mem\n");
return;
}
 
-   pr_info("mapped [mem %pa-%pa] (%luMB)\n", &mem->start, &mem->end,
-  (mem->nslabs << IO_TLB_SHIFT) >> 20);
+   pr_info("%s mapped [mem %pa-%pa] (%luMB)\n",
+   swiotlb_name_get(high), &mem->start, &mem->end,
+   (mem->nslabs << IO_TLB_SHIFT) >> 20);
 }
 
 static inline unsigned long io_tlb_offset(unsigned long val)
@@ -184,15 +200,9 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
 }
 #endif
 
-/*
- * Early SWIOTLB allocation may be too early to allow an architecture to
- * perform the desired operations.  This function allows the architecture to
- * call SWIOTLB when the operations are possible.  It needs to be called
- * before the SWIOTLB memory is used.
- */
-void __init swiotlb_update_mem_attributes(void)
+static void __init __swiotlb_update_mem_attributes(bool high)
 {
-   struct io_tlb_mem *mem = &io_tlb_default_mem;
+   struct io_tlb_mem *mem = io_tlb_mem_get(high);
void *vaddr;
unsigned long bytes;
 
@@ -207,6 +217,19 @@ void __init swiotlb_update_mem_attributes(void)
mem->vaddr = vaddr;
 }
 
+/*
+ * Early SWIOTLB allocation may be too early to allow an architecture to
+ * perform the desired operations.  This function allows the architecture to
+ * call SWIOTLB when the operations are possible.  It needs to be called
+ * before the SWIOTLB memory is used.
+ */
+void __init swiotlb_update_mem_attributes(void)
+{
+   __swiotlb_update_mem_attributes(false);
+   if (swiotlb_high_active())
+   

[PATCH RFC v1 5/7] swiotlb: add interface to set dev->dma_io_tlb_mem

2022-06-08 Thread Dongli Zhang
The interface re-configures dev->dma_io_tlb_mem conditionally, if the
'io_tlb_high_mem' is active.

Cc: Konrad Wilk 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c    | 10 ++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8196bf961aab..78217d8bbee2 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -131,6 +131,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+bool swiotlb_use_high(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -163,6 +164,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline bool swiotlb_use_high(struct device *dev)
+{
+   return false;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern bool swiotlb_high_active(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ff82b281ce01..0dcdd25ea95d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -121,6 +121,16 @@ bool swiotlb_high_active(void)
return high_nslabs && io_tlb_high_mem.nslabs;
 }
 
+bool swiotlb_use_high(struct device *dev)
+{
+   if (!swiotlb_high_active())
+   return false;
+
+   dev->dma_io_tlb_mem = &io_tlb_high_mem;
+   return true;
+}
+EXPORT_SYMBOL_GPL(swiotlb_use_high);
+
 unsigned int swiotlb_max_segment(void)
 {
if (!io_tlb_default_mem.nslabs)
-- 
2.17.1



Re: [PATCH 20/36] arch/idle: Change arch_cpu_idle() IRQ behaviour

2022-06-08 Thread Arnd Bergmann


On Wed, Jun 8, 2022 at 4:27 PM Peter Zijlstra  wrote:
>
> Currently, arch_cpu_idle() is called with IRQs disabled, but returns
> with IRQs enabled.
>
> However, the very first thing the generic code does after calling
> arch_cpu_idle() is raw_local_irq_disable(). This means that
> architectures that can idle with IRQs disabled end up doing a
> pointless 'enable-disable' dance.
>
> Therefore, push this IRQ disabling into the idle function, meaning
> that those architectures can avoid the pointless IRQ state flipping.
>
> Signed-off-by: Peter Zijlstra (Intel) 

I think you now need to add a raw_local_irq_disable(); in loongarch
as well.
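To make the contract change concrete, here is a small userspace model in C. It is not kernel code: the IRQ flag, the flip counter and all function names are hypothetical stand-ins. It counts IRQ state transitions per idle-loop pass. Under the old contract, an architecture that could idle with IRQs off still pays an enable followed by the generic disable. Under the new one it pays nothing. An architecture whose idle primitive wakes with IRQs on has to add the disable itself, which is what the loongarch remark is about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag; starts disabled,
 * matching the state in which the generic idle loop calls into the arch. */
static bool irqs_on = false;
static int flips; /* number of IRQ state transitions observed */

static void model_irq_enable(void)  { if (!irqs_on) flips++; irqs_on = true; }
static void model_irq_disable(void) { if (irqs_on)  flips++; irqs_on = false; }

/* Old contract: arch_cpu_idle() returns with IRQs enabled, even on an
 * architecture that could have idled with IRQs disabled. */
static void old_arch_cpu_idle(void)
{
	model_irq_enable();
}

/* New contract, arch that can idle with IRQs disabled: nothing to flip. */
static void new_arch_cpu_idle(void)
{
}

/* New contract, arch whose idle primitive returns with IRQs enabled:
 * it must disable them again itself before returning. */
static void new_arch_cpu_idle_irq_on_wake(void)
{
	model_irq_enable();   /* side effect of the idle instruction */
	model_irq_disable();  /* the added raw_local_irq_disable() */
}

/* One pass of the generic idle loop; returns the number of flips. */
static int idle_step(void (*arch_idle)(void), bool generic_disables)
{
	flips = 0;
	arch_idle();
	if (generic_disables)
		model_irq_disable(); /* old generic code: raw_local_irq_disable() */
	assert(!irqs_on);        /* either way, the loop resumes with IRQs off */
	return flips;
}
```

For example, idle_step(old_arch_cpu_idle, true) reports two transitions, while idle_step(new_arch_cpu_idle, false) reports none.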

   Arnd


Re: [PATCH 33/36] cpuidle,omap3: Use WFI for omap3_pm_idle()

2022-06-08 Thread Arnd Bergmann


On Wed, Jun 8, 2022 at 4:27 PM Peter Zijlstra  wrote:
>
> arch_cpu_idle() is a very simple idle interface and exposes only a
> single idle state and is expected to not require RCU and not do any
> tracing/instrumentation.
>
> As such, omap_sram_idle() is not a valid implementation. Replace it
> with the simple (shallow) omap3_do_wfi() call. Leaving the more
> complicated idle states for the cpuidle driver.
>
> Signed-off-by: Peter Zijlstra (Intel) 

I see similar code in omap2:

omap2_pm_idle()
 -> omap2_enter_full_retention()
 -> omap2_sram_suspend()

Is that code path safe to use without RCU or does it need a similar change?

Arnd


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-08 Thread Luis Chamberlain


On Wed, Jun 08, 2022 at 11:20:53AM -0700, Song Liu wrote:
> On Wed, Jun 8, 2022 at 9:12 AM Song Liu  wrote:
> > On Wed, Jun 8, 2022 at 7:21 AM Masami Hiramatsu  wrote:
> > > On Wed, 8 Jun 2022 08:25:38 +0300
> > > Jarkko Sakkinen  wrote:
> > > > On Wed, Jun 08, 2022 at 10:35:42AM +0800, Guo Ren wrote:
> > > > > On Wed, Jun 8, 2022 at 8:02 AM Jarkko Sakkinen  wrote:
> > > > > > As the result, kprobes can be used with a monolithic kernel.
> > > > > It's strange when MODULES is n, but vmlinux still obtains module_alloc.
> > > > >
> > > > > Maybe we need a kprobe_alloc, right?
> > > >
> > > > Perhaps not the best name but at least it documents the fact that
> > > > they use the same allocator.
> > > >
> > > > Few years ago I carved up something "half-way there" for kprobes,
> > > > and I used the name text_alloc() [*].
> > > >
> > > > [*] 
> > > > https://lore.kernel.org/all/20200724050553.1724168-1-jarkko.sakki...@linux.intel.com/
> > >
> > > Yeah, I remember that. Thank you for updating your patch!
> > > I think the idea (split module_alloc() from CONFIG_MODULE) is good to me.
> > > If module support maintainers think this name is not good, you may be
> > > able to rename it as text_alloc() and make the module_alloc() as a
> > > wrapper of it.
> >
> > IIUC, most users of module_alloc() use it to allocate memory for text, except
> > that module code uses it for both text and data. Therefore, I guess calling it
> > text_alloc() is not 100% accurate until we change the module code (to use
> > a different API to allocate memory for data).
> 
> Git history showed me
> 
> 7a0e27b2a0ce mm: remove vmalloc_exec
> 
> I guess we are somehow going back in time...

No, that was removed because it has only one user. The real hard work
to generalize vmalloc_exec() with all the arch special sauce was not
done.

To do this properly architectures must be able to override it. We can
use the old vmalloc_exec() or text_alloc(). I think vmalloc_exec() is
more in line with mm stuff, but it would be our first __weak mm call
from what I can tell.
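For illustration, the usual way to let architectures override a generic helper is a weak default definition that a strong per-arch definition replaces at link time (the kernel spells the attribute __weak). Below is a minimal userspace sketch, assuming the text_alloc() name discussed above. malloc() stands in for whatever executable-memory allocator a real implementation would use, and text_free() is a hypothetical counterpart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical generic default for an executable-memory allocator.
 * Because it is weak, an architecture can supply a strong definition with
 * the same signature in another object file, and the linker will pick that
 * one instead of this one.
 */
__attribute__((weak)) void *text_alloc(size_t size)
{
	/* Stand-in only: a real allocator would return memory that can be made
	 * executable and that satisfies arch placement constraints. */
	return malloc(size);
}

/* Hypothetical matching release function, also overridable. */
__attribute__((weak)) void text_free(void *p)
{
	free(p);
}
```

An architecture would then provide its own non-weak text_alloc() in arch code, and the linker would select it over the generic one.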

Anyway patches welcomed.

  Luis


Re: [PATCH 36/36] cpuidle,clk: Remove trace_.*_rcuidle()

2022-06-08 Thread Stephen Boyd
Quoting Peter Zijlstra (2022-06-08 07:27:59)
> OMAP was the one and only user.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---

Acked-by: Stephen Boyd 


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-08 Thread Song Liu


On Wed, Jun 8, 2022 at 9:12 AM Song Liu  wrote:
>
> On Wed, Jun 8, 2022 at 7:21 AM Masami Hiramatsu  wrote:
> >
> > Hi Jarkko,
> >
> > On Wed, 8 Jun 2022 08:25:38 +0300
> > Jarkko Sakkinen  wrote:
> >
> > > On Wed, Jun 08, 2022 at 10:35:42AM +0800, Guo Ren wrote:
> > > > .
> > > >
> > > > On Wed, Jun 8, 2022 at 8:02 AM Jarkko Sakkinen  wrote:
> > > > >
> > > > > Tracing with kprobes while running a monolithic kernel is currently
> > > > > impossible because CONFIG_KPROBES is dependent of CONFIG_MODULES.  This
> > > > > dependency is a result of kprobes code using the module allocator for the
> > > > > trampoline code.
> > > > >
> > > > > Detaching kprobes from modules helps to squeeze down the user space,
> > > > > e.g. when developing new core kernel features, while still having all
> > > > > the nice tracing capabilities.
> > > > >
> > > > > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > > > > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > > > > or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> > > > > code with CONFIG_MODULES.
> > > > >
> > > > > As the result, kprobes can be used with a monolithic kernel.
> > > > It's strange when MODULES is n, but vmlinux still obtains module_alloc.
> > > >
> > > > Maybe we need a kprobe_alloc, right?
> > >
> > > Perhaps not the best name but at least it documents the fact that
> > > they use the same allocator.
> > >
> > > Few years ago I carved up something "half-way there" for kprobes,
> > > and I used the name text_alloc() [*].
> > >
> > > [*] 
> > > https://lore.kernel.org/all/20200724050553.1724168-1-jarkko.sakki...@linux.intel.com/
> >
> > Yeah, I remember that. Thank you for updating your patch!
> > I think the idea (split module_alloc() from CONFIG_MODULE) is good to me.
> > If module support maintainers think this name is not good, you may be
> > able to rename it as text_alloc() and make the module_alloc() as a
> > wrapper of it.
>
> IIUC, most users of module_alloc() use it to allocate memory for text, except
> that module code uses it for both text and data. Therefore, I guess calling it
> text_alloc() is not 100% accurate until we change the module code (to use
> a different API to allocate memory for data).

Git history showed me

7a0e27b2a0ce mm: remove vmalloc_exec

I guess we are somehow going back in time...

Song

>
> Thanks,
> Song
>
> >
> > Acked-by: Masami Hiramatsu (Google) 
> > for kprobe side.
> >
> > Thank you,
> >
> > --
> > Masami Hiramatsu (Google) 


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-08 Thread Song Liu


On Wed, Jun 8, 2022 at 9:28 AM Ard Biesheuvel  wrote:
>
> Hello Jarkko,
>
> On Wed, 8 Jun 2022 at 02:02, Jarkko Sakkinen  wrote:
> >
> > Tracing with kprobes while running a monolithic kernel is currently
> > impossible because CONFIG_KPROBES is dependent of CONFIG_MODULES.  This
> > dependency is a result of kprobes code using the module allocator for the
> > trampoline code.
> >
> > Detaching kprobes from modules helps to squeeze down the user space,
> > e.g. when developing new core kernel features, while still having all
> > the nice tracing capabilities.
> >
> > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> > code with CONFIG_MODULES.
> >
> > As the result, kprobes can be used with a monolithic kernel.
>
> I think I may have mentioned this the previous time as well, but I
> don't think this is the right approach.
>
> Kprobes uses alloc_insn_page() to allocate executable memory, but the
> requirements for this memory are radically different compared to
> loadable modules, which need to be within an arch-specific distance of
> the core kernel, need KASAN backing etc etc.

I think the requirement to be within a certain distance of the core kernel
is the same for kprobes' alloc_insn_page() and for modules, no?

Thanks,
Song

>
> This is why arm64, for instance, does not implement alloc_insn_page()
> in terms of module_alloc() [and likely does not belong in this patch
> for that reason]



>
> Is there any reason kprobes cannot simply use vmalloc()?
>


Re: [PATCH 02/36] x86/idle: Replace x86_idle with a static_call

2022-06-08 Thread Rafael J. Wysocki


On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> Typical boot time setup; no need to suffer an indirect call for that.
>
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Frederic Weisbecker 

Reviewed-by: Rafael J. Wysocki 
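The patch relies on static_call_query() returning NULL until static_call_update() installs a target, so x86_idle_set() can tell whether a routine was already chosen. The following userspace model shows only that query/update/select-once logic. It uses a plain function pointer, so, unlike the real static_call() (which patches the call site into a direct call), it still pays the indirect call the patch removes. All names in it are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the query/update semantics only. A plain function
 * pointer stands in for the static call, so this still makes an indirect
 * call; the real static_call() patches the call site instead.
 */
typedef void (*idle_fn_t)(void);

enum { RAN_NONE, RAN_DEFAULT, RAN_MWAIT };
static int last_ran = RAN_NONE;       /* which routine the last idle used */

static idle_fn_t model_x86_idle;      /* ~ DEFINE_STATIC_CALL_NULL: starts unset */

static void model_default_idle(void) { last_ran = RAN_DEFAULT; }
static void model_mwait_idle(void)   { last_ran = RAN_MWAIT; }

/* ~ x86_idle_set(): has a routine been installed yet? */
static bool model_x86_idle_set(void)
{
	return model_x86_idle != NULL;   /* ~ static_call_query() != NULL */
}

/* ~ select_idle_routine(): pick once at boot, keep any earlier choice. */
static void model_select_idle_routine(bool prefer_mwait)
{
	if (model_x86_idle_set())
		return;
	/* ~ static_call_update(x86_idle, ...) */
	model_x86_idle = prefer_mwait ? model_mwait_idle : model_default_idle;
}

/* ~ arch_cpu_idle(); in this model only called after selection. */
static void model_arch_cpu_idle(void)
{
	model_x86_idle();                /* ~ static_call(x86_idle)() */
}
```

A second call to model_select_idle_routine() leaves the first choice in place, mirroring the early return in select_idle_routine().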

> ---
>  arch/x86/kernel/process.c |   50 +-
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -692,7 +693,23 @@ void __switch_to_xtra(struct task_struct
>  unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
>  EXPORT_SYMBOL(boot_option_idle_override);
>
> -static void (*x86_idle)(void);
> +/*
> + * We use this if we don't have any better idle routine..
> + */
> +void __cpuidle default_idle(void)
> +{
> +   raw_safe_halt();
> +}
> +#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
> +EXPORT_SYMBOL(default_idle);
> +#endif
> +
> +DEFINE_STATIC_CALL_NULL(x86_idle, default_idle);
> +
> +static bool x86_idle_set(void)
> +{
> +   return !!static_call_query(x86_idle);
> +}
>
>  #ifndef CONFIG_SMP
>  static inline void play_dead(void)
> @@ -715,28 +732,17 @@ void arch_cpu_idle_dead(void)
>  /*
>   * Called from the generic idle code.
>   */
> -void arch_cpu_idle(void)
> -{
> -   x86_idle();
> -}
> -
> -/*
> - * We use this if we don't have any better idle routine..
> - */
> -void __cpuidle default_idle(void)
> +void __cpuidle arch_cpu_idle(void)
>  {
> -   raw_safe_halt();
> +   static_call(x86_idle)();
>  }
> -#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
> -EXPORT_SYMBOL(default_idle);
> -#endif
>
>  #ifdef CONFIG_XEN
>  bool xen_set_default_idle(void)
>  {
> -   bool ret = !!x86_idle;
> +   bool ret = x86_idle_set();
>
> -   x86_idle = default_idle;
> +   static_call_update(x86_idle, default_idle);
>
> return ret;
>  }
> @@ -859,20 +865,20 @@ void select_idle_routine(const struct cp
> if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
> pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
>  #endif
> -   if (x86_idle || boot_option_idle_override == IDLE_POLL)
> +   if (x86_idle_set() || boot_option_idle_override == IDLE_POLL)
> return;
>
> if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
> pr_info("using AMD E400 aware idle routine\n");
> -   x86_idle = amd_e400_idle;
> +   static_call_update(x86_idle, amd_e400_idle);
> } else if (prefer_mwait_c1_over_halt(c)) {
> pr_info("using mwait in idle threads\n");
> -   x86_idle = mwait_idle;
> +   static_call_update(x86_idle, mwait_idle);
> } else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> pr_info("using TDX aware idle routine\n");
> -   x86_idle = tdx_safe_halt;
> +   static_call_update(x86_idle, tdx_safe_halt);
> } else
> -   x86_idle = default_idle;
> +   static_call_update(x86_idle, default_idle);
>  }
>
>  void amd_e400_c1e_apic_setup(void)
> @@ -925,7 +931,7 @@ static int __init idle_setup(char *str)
>  * To continue to load the CPU idle driver, 

Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-08 Thread Ard Biesheuvel


Hello Jarkko,

On Wed, 8 Jun 2022 at 02:02, Jarkko Sakkinen  wrote:
>
> Tracing with kprobes while running a monolithic kernel is currently
> impossible because CONFIG_KPROBES is dependent of CONFIG_MODULES.  This
> dependency is a result of kprobes code using the module allocator for the
> trampoline code.
>
> Detaching kprobes from modules helps to squeeze down the user space,
> e.g. when developing new core kernel features, while still having all
> the nice tracing capabilities.
>
> For kernel/ and arch/*, move module_alloc() and module_memfree() to
> module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> code with CONFIG_MODULES.
>
> As the result, kprobes can be used with a monolithic kernel.

I think I may have mentioned this the previous time as well, but I
don't think this is the right approach.

Kprobes uses alloc_insn_page() to allocate executable memory, but the
requirements for this memory are radically different compared to
loadable modules, which need to be within an arch-specific distance of
the core kernel, need KASAN backing etc etc.
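As an example of what an arch-specific distance constraint looks like, consider an x86-64 "jmp rel32". It encodes a signed 32-bit displacement measured from the end of the 5-byte instruction, so its target must lie within roughly +/-2 GiB. The sketch below checks that. The addresses in the test are illustrative values from the default 4-level x86-64 layout: kernel text near 0xffffffff81000000, and vmalloc space starting at 0xffffc90000000000. Whether kprobes' instruction pages need such a constraint on a given arch is the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: can a 5-byte x86-64 "jmp rel32" located at insn_addr reach target?
 * The displacement is relative to the end of the instruction and must fit
 * in a signed 32-bit integer.
 */
static bool x86_rel32_reachable(uint64_t insn_addr, uint64_t target)
{
	const uint64_t insn_len = 5;  /* opcode E9 + 4-byte displacement */
	/* Unsigned subtraction wraps; reinterpret the result as signed. */
	int64_t disp = (int64_t)(target - (insn_addr + insn_len));

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With those example values, a target one page away from kernel text is reachable, while the start of the vmalloc area is not. This is why, on x86-64, memory from the generic vmalloc range cannot in general be the target of such a branch from kernel text.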

This is why arm64, for instance, does not implement alloc_insn_page()
in terms of module_alloc() [and likely does not belong in this patch
for that reason]

Is there any reason kprobes cannot simply use vmalloc()?


>
> Signed-off-by: Jarkko Sakkinen 
> ---
> Tested with the help of BuildRoot and QEMU:
> - arm (function tracer)
> - arm64 (function tracer)
> - mips (function tracer)
> - powerpc (function tracer)
> - riscv (function tracer)
> - s390 (function tracer)
> - sparc (function tracer)
> - x86 (function tracer)
> - sh (function tracer, for the "pure" kernel/modules_alloc.c path)
> ---
>  arch/Kconfig   |  1 -
>  arch/arm/kernel/Makefile   |  5 +++
>  arch/arm/kernel/module.c   | 32 
>  arch/arm/kernel/module_alloc.c | 42 
>  arch/arm64/kernel/Makefile |  5 +++
>  arch/arm64/kernel/module.c | 47 ---
>  arch/arm64/kernel/module_alloc.c   | 57 
>  arch/mips/kernel/Makefile  |  5 +++
>  arch/mips/kernel/module.c  |  9 -
>  arch/mips/kernel/module_alloc.c| 18 +
>  arch/parisc/kernel/Makefile|  5 +++
>  arch/parisc/kernel/module.c| 11 --
>  arch/parisc/kernel/module_alloc.c  | 23 +++
>  arch/powerpc/kernel/Makefile   |  5 +++
>  arch/powerpc/kernel/module.c   | 37 --
>  arch/powerpc/kernel/module_alloc.c | 47 +++
>  arch/riscv/kernel/Makefile |  5 +++
>  arch/riscv/kernel/module.c | 10 -
>  arch/riscv/kernel/module_alloc.c   | 19 ++
>  arch/s390/kernel/Makefile  |  5 +++
>  arch/s390/kernel/module.c  | 17 -
>  arch/s390/kernel/module_alloc.c| 33 
>  arch/sparc/kernel/Makefile |  5 +++
>  arch/sparc/kernel/module.c | 30 ---
>  arch/sparc/kernel/module_alloc.c   | 39 +++
>  arch/x86/kernel/Makefile   |  5 +++
>  arch/x86/kernel/module.c   | 50 
>  arch/x86/kernel/module_alloc.c | 61 ++
>  kernel/Makefile|  5 +++
>  kernel/kprobes.c   | 10 +
>  kernel/module/main.c   | 17 -
>  kernel/module_alloc.c  | 26 +
>  kernel/trace/trace_kprobe.c| 10 -
>  33 files changed, 434 insertions(+), 262 deletions(-)
>  create mode 100644 arch/arm/kernel/module_alloc.c
>  create mode 100644 arch/arm64/kernel/module_alloc.c
>  create mode 100644 arch/mips/kernel/module_alloc.c
>  create mode 100644 arch/parisc/kernel/module_alloc.c
>  create mode 100644 arch/powerpc/kernel/module_alloc.c
>  create mode 100644 arch/riscv/kernel/module_alloc.c
>  create mode 100644 arch/s390/kernel/module_alloc.c
>  create mode 100644 arch/sparc/kernel/module_alloc.c
>  create mode 100644 

Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-08 Thread Song Liu


On Wed, Jun 8, 2022 at 7:21 AM Masami Hiramatsu  wrote:
>
> Hi Jarkko,
>
> On Wed, 8 Jun 2022 08:25:38 +0300
> Jarkko Sakkinen  wrote:
>
> > On Wed, Jun 08, 2022 at 10:35:42AM +0800, Guo Ren wrote:
> > > .
> > >
> > > On Wed, Jun 8, 2022 at 8:02 AM Jarkko Sakkinen  wrote:
> > > >
> > > > Tracing with kprobes while running a monolithic kernel is currently
> > > > impossible because CONFIG_KPROBES is dependent of CONFIG_MODULES.  This
> > > > dependency is a result of kprobes code using the module allocator for the
> > > > trampoline code.
> > > >
> > > > Detaching kprobes from modules helps to squeeze down the user space,
> > > > e.g. when developing new core kernel features, while still having all
> > > > the nice tracing capabilities.
> > > >
> > > > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > > > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > > > or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> > > > code with CONFIG_MODULES.
> > > >
> > > > As the result, kprobes can be used with a monolithic kernel.
> > > It's strange when MODULES is n, but vmlinux still obtains module_alloc.
> > >
> > > Maybe we need a kprobe_alloc, right?
> >
> > Perhaps not the best name but at least it documents the fact that
> > they use the same allocator.
> >
> > Few years ago I carved up something "half-way there" for kprobes,
> > and I used the name text_alloc() [*].
> >
> > [*] 
> > https://lore.kernel.org/all/20200724050553.1724168-1-jarkko.sakki...@linux.intel.com/
>
> Yeah, I remember that. Thank you for updating your patch!
> I think the idea (split module_alloc() from CONFIG_MODULE) is good to me.
> If module support maintainers think this name is not good, you may be
> able to rename it as text_alloc() and make the module_alloc() as a
> wrapper of it.

IIUC, most users of module_alloc() use it to allocate memory for text, except
that module code uses it for both text and data. Therefore, I guess calling it
text_alloc() is not 100% accurate until we change the module code (to use
a different API to allocate memory for data).
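Here is a sketch of the split described above. The names text_alloc() and data_alloc() are hypothetical; only module_alloc() exists today. Text users get text_alloc(), module data gets a separate data_alloc(), and module_alloc() survives as a thin wrapper, as Masami suggested. Allocation is reduced to malloc() plus a tag so the example stays runnable in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tag so the sketch can show which path served a request. */
enum alloc_kind { ALLOC_TEXT, ALLOC_DATA };

struct region {
	enum alloc_kind kind;
	size_t size;
};

static struct region *region_new(enum alloc_kind kind, size_t size)
{
	struct region *r = malloc(sizeof(*r));

	if (r) {
		r->kind = kind;
		r->size = size;
	}
	return r;
}

/* Hypothetical: memory that will hold instructions (kprobes, module text). */
static struct region *text_alloc(size_t size)
{
	return region_new(ALLOC_TEXT, size);
}

/* Hypothetical: memory for module data, which never needs to be executable. */
static struct region *data_alloc(size_t size)
{
	return region_new(ALLOC_DATA, size);
}

/* module_alloc() kept as a thin wrapper around the text allocator. */
static struct region *module_alloc(size_t size)
{
	return text_alloc(size);
}
```

Until the module loader actually routes its data through something like data_alloc(), module_alloc() keeps serving both kinds. That is why the text_alloc() name would not yet be fully accurate.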

Thanks,
Song

>
> Acked-by: Masami Hiramatsu (Google) 
> for kprobe side.
>
> Thank you,
>
> --
> Masami Hiramatsu (Google) 


Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-06-08 Thread Rafael J. Wysocki


On Wed, Jun 8, 2022 at 5:48 PM Peter Zijlstra  wrote:
>
> On Wed, Jun 08, 2022 at 05:01:05PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
> > >
> > > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > > Xeons") wrecked intel_idle in two ways:
> > >
> > >  - must not have tracing in idle functions
> > >  - must return with IRQs disabled
> > >
> > > Additionally, it added a branch for no good reason.
> > >
> > > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> > > Signed-off-by: Peter Zijlstra (Intel) 
> >
> > Acked-by: Rafael J. Wysocki 
> >
> > And do I think correctly that this can be applied without the rest of
> > the series?
>
> Yeah, I don't think this relies on any of the preceding patches. If you
> want to route this through the pm/fixes tree that's fine.

OK, thanks, applied (and I moved the intel_idle() kerneldoc so it is
next to the function to avoid the docs build warning).
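The applied fix splits intel_idle() into a plain callback and an intel_idle_irq() variant. The variant enables IRQs around MWAIT and disables them again before returning. The quoted hunk does not show how the variant is chosen. The userspace model below assumes it is selected once when the state table is set up, which is one way to avoid testing CPUIDLE_FLAG_IRQ_ENABLE on every idle entry. The IRQ flag, the MODEL_ constant and all model_ functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FLAG_IRQ_ENABLE 0x1u  /* stand-in for CPUIDLE_FLAG_IRQ_ENABLE */

static bool irqs_on;  /* modelled interrupt-enable flag, off on entry */

struct model_state {
	unsigned int flags;
	int (*enter)(int index);
};

/* Stand-in for mwait_idle_with_hints(): no IRQ-state changes. */
static int model_mwait(int index) { return index; }

/* Plain variant: entered and left with IRQs disabled. */
static int model_intel_idle(int index)
{
	return model_mwait(index);
}

/* IRQ variant: enable around the wait, disable again before returning. */
static int model_intel_idle_irq(int index)
{
	int ret;

	irqs_on = true;    /* ~ raw_local_irq_enable() */
	ret = model_mwait(index);
	irqs_on = false;   /* ~ raw_local_irq_disable() */
	return ret;
}

/* Choose the callback once at setup instead of branching on every entry. */
static void model_init_state(struct model_state *s)
{
	s->enter = (s->flags & MODEL_FLAG_IRQ_ENABLE) ? model_intel_idle_irq
	                                               : model_intel_idle;
}
```

Either callback returns with the modelled IRQ flag cleared, which is the "must return with IRQs disabled" requirement from the changelog.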


Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-06-08 Thread Peter Zijlstra


On Wed, Jun 08, 2022 at 05:01:05PM +0200, Rafael J. Wysocki wrote:
> On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
> >
> > Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> > Xeons") wrecked intel_idle in two ways:
> >
> >  - must not have tracing in idle functions
> >  - must return with IRQs disabled
> >
> > Additionally, it added a branch for no good reason.
> >
> > Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> > Signed-off-by: Peter Zijlstra (Intel) 
> 
> Acked-by: Rafael J. Wysocki 
> 
> And do I think correctly that this can be applied without the rest of
> the series?

Yeah, I don't think this relies on any of the preceding patches. If you
want to route this through the pm/fixes tree that's fine.

Thanks!


Re: [PATCH 34/36] cpuidle,omap3: Push RCU-idle into omap_sram_idle()

2022-06-08 Thread Peter Zijlstra
On Wed, Jun 08, 2022 at 04:27:57PM +0200, Peter Zijlstra wrote:
> @@ -254,11 +255,18 @@ void omap_sram_idle(void)
>*/
>   if (save_state)
>   omap34xx_save_context(omap3_arm_context);
> +
> + if (rcuidle)
> + cpuidle_rcu_enter();
> +
>   if (save_state == 1 || save_state == 3)
>   cpu_suspend(save_state, omap34xx_do_sram_idle);
>   else
>   omap34xx_do_sram_idle(save_state);
>  
> + if (rcuidle)
> + rcuidle_rcu_exit();

*sigh* so much for this having been exposed to the robots for >2 days :/


Re: [PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-06-08 Thread Rafael J. Wysocki


On Wed, Jun 8, 2022 at 4:47 PM Peter Zijlstra  wrote:
>
> Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
> Xeons") wrecked intel_idle in two ways:
>
>  - must not have tracing in idle functions
>  - must return with IRQs disabled
>
> Additionally, it added a branch for no good reason.
>
> Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
> Signed-off-by: Peter Zijlstra (Intel) 

Acked-by: Rafael J. Wysocki 

And do I think correctly that this can be applied without the rest of
the series?

> ---
>  drivers/idle/intel_idle.c |   48 +++---
>  1 file changed, 37 insertions(+), 11 deletions(-)
>
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -129,21 +137,37 @@ static unsigned int mwait_substates __in
>   *
>   * Must be called under local_irq_disable().
>   */
> +
> -static __cpuidle int intel_idle(struct cpuidle_device *dev,
> -   struct cpuidle_driver *drv, int index)
> +static __always_inline int __intel_idle(struct cpuidle_device *dev,
> +   struct cpuidle_driver *drv, int index)
>  {
> struct cpuidle_state *state = &drv->states[index];
> unsigned long eax = flg2MWAIT(state->flags);
> unsigned long ecx = 1; /* break on interrupt flag */
>
> -   if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE)
> -   local_irq_enable();
> -
> mwait_idle_with_hints(eax, ecx);
>
> return index;
>  }
>
> +static __cpuidle int intel_idle(struct cpuidle_device *dev,
> +   struct cpuidle_driver *drv, int index)
> +{
> +   return __intel_idle(dev, drv, index);
> +}
> +
> +static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
> +   struct cpuidle_driver *drv, int index)
> +{
> +   int ret;
> +
> +   raw_local_irq_enable();
> +   ret = __intel_idle(dev, drv, index);
> +   raw_local_irq_disable();
> +
> +   return ret;
> +}
> +
>  /**
>   * intel_idle_s2idle - Ask the processor to enter the given idle state.
>   * @dev: cpuidle device of the target CPU.
> @@ -1801,6 +1824,9 @@ static void __init intel_idle_init_cstat
> /* Structure copy. */
> drv->states[drv->state_count] = cpuidle_state_table[cstate];
>
> +   if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE)
> +   drv->states[drv->state_count].enter = intel_idle_irq;
> +
> if ((disabled_states_mask & BIT(drv->state_count)) ||
> ((icpu->use_acpi || force_use_acpi) &&
>  intel_idle_off_by_default(mwait_hint) &&
>
>
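The invariant this patch enforces is that a cpuidle_state::enter() callback is called with interrupts disabled and must return with them disabled, even if it enables them for the wait itself. A minimal userspace sketch of the resulting split follows. It assumes a plain bool stands in for the CPU interrupt flag; every model_* name and MODEL_FLAG_IRQ_ENABLE are hypothetical analogues of __intel_idle(), intel_idle(), intel_idle_irq() and CPUIDLE_FLAG_IRQ_ENABLE, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool irqs_enabled;

/* Model raw_local_irq_enable()/raw_local_irq_disable(): no tracing, only the flag. */
static void model_irq_enable(void)  { irqs_enabled = true; }
static void model_irq_disable(void) { irqs_enabled = false; }

#define MODEL_FLAG_IRQ_ENABLE 0x1u	/* hypothetical analogue of CPUIDLE_FLAG_IRQ_ENABLE */

struct model_state {
	unsigned int flags;
	int (*enter)(int index);
};

/* Core idle body, analogous to __intel_idle(): never touches IRQ state. */
static int model_idle_core(int index)
{
	/* mwait would happen here */
	return index;
}

/* Plain variant: IRQs stay disabled throughout. */
static int model_idle(int index)
{
	return model_idle_core(index);
}

/* IRQ-enable variant: enable for the wait, restore the invariant before returning. */
static int model_idle_irq(int index)
{
	int ret;

	model_irq_enable();
	ret = model_idle_core(index);
	model_irq_disable();
	return ret;
}

/* Pick the callback once at init time instead of branching on every idle entry. */
static void model_init_state(struct model_state *s, unsigned int flags)
{
	s->flags = flags;
	s->enter = (flags & MODEL_FLAG_IRQ_ENABLE) ? model_idle_irq : model_idle;
}
```

Choosing the callback once, at init time, is what lets the patch drop the per-entry branch mentioned in the commit message.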


[PATCH 15/36] cpuidle,cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()

2022-06-08 Thread Peter Zijlstra
, Arnd Bergmann , ulli.kr...@googlemail.com, vgu...@kernel.org, 
linux-...@vger.kernel.org, j...@joshtriplett.org, rost...@goodmis.org, 
r...@vger.kernel.org, b...@alien8.de, bc...@quicinc.com, 
tsbog...@alpha.franken.de, linux-par...@vger.kernel.org, sudeep.ho...@arm.com, 
shawn...@kernel.org, da...@davemloft.net, dal...@libc.org, t...@atomide.com, 
amakha...@vmware.com, bjorn.anders...@linaro.org, h...@zytor.com, 
sparcli...@vger.kernel.org, linux-hexa...@vger.kernel.org, 
linux-ri...@lists.infradead.org, anton.iva...@cambridgegreys.com, 
jo...@southpole.se, yury.no...@gmail.com, rich...@nod.at, x...@kernel.org, 
li...@armlinux.org.uk, mi...@redhat.com, a...@eecs.berkeley.edu, 
paul...@kernel.org, h...@linux.ibm.com, stefan.kristians...@saunalahti.fi, 
openr...@lists.librecores.org, paul.walms...@sifive.com, 
linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, jgr...@suse.com, 
mon...@monstr.eu, linux-m...@vger.kernel.org, palmer@dabbelt.com, a...@brainfault.org, i...@jurassic.park.msu.ru, 
johan...@sipsolutions.net, linuxppc-dev@lists.ozlabs.org


All callers should still have RCU enabled.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/cpu_pm.c |9 -
 1 file changed, 9 deletions(-)

--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -30,16 +30,9 @@ static int cpu_pm_notify(enum cpu_pm_eve
 {
int ret;
 
-   /*
-* This introduces a RCU read critical section, which could be
-* disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
-* this.
-*/
-   rcu_irq_enter_irqson();
rcu_read_lock();
ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
rcu_read_unlock();
-   rcu_irq_exit_irqson();
 
return notifier_to_errno(ret);
 }
@@ -49,11 +42,9 @@ static int cpu_pm_notify_robust(enum cpu
unsigned long flags;
int ret;
 
-   rcu_irq_enter_irqson();
raw_spin_lock_irqsave(&cpu_pm_notifier.lock, flags);
ret = raw_notifier_call_chain_robust(&cpu_pm_notifier.chain, event_up, event_down, NULL);
raw_spin_unlock_irqrestore(&cpu_pm_notifier.lock, flags);
-   rcu_irq_exit_irqson();
 
return notifier_to_errno(ret);
 }




[PATCH 31/36] cpuidle,acpi: Make noinstr clean

2022-06-08 Thread Peter Zijlstra


vmlinux.o: warning: objtool: io_idle+0xc: call to __inb.isra.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0xfe: call to num_online_cpus() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_idle_enter+0x115: call to acpi_idle_fallback_to_c1.isra.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/shared/io.h |4 ++--
 drivers/acpi/processor_idle.c|2 +-
 include/linux/cpumask.h  |4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/include/asm/shared/io.h
+++ b/arch/x86/include/asm/shared/io.h
@@ -5,13 +5,13 @@
 #include 
 
 #define BUILDIO(bwl, bw, type) \
-static inline void __out##bwl(type value, u16 port)\
+static __always_inline void __out##bwl(type value, u16 port)   \
 {  \
asm volatile("out" #bwl " %" #bw "0, %w1"   \
 : : "a"(value), "Nd"(port));   \
 }  \
\
-static inline type __in##bwl(u16 port) \
+static __always_inline type __in##bwl(u16 port)   \
 {  \
type value; \
asm volatile("in" #bwl " %w1, %" #bw "0"\
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -593,7 +593,7 @@ static int acpi_idle_play_dead(struct cp
return 0;
 }
 
-static bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
+static __always_inline bool acpi_idle_fallback_to_c1(struct acpi_processor *pr)
 {
return IS_ENABLED(CONFIG_HOTPLUG_CPU) && !pr->flags.has_cst &&
!(acpi_gbl_FADT.flags & ACPI_FADT_C2_MP_SUPPORTED);
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -908,9 +908,9 @@ static inline const struct cpumask *get_
  * concurrent CPU hotplug operations unless invoked from a cpuhp_lock held
  * region.
  */
-static inline unsigned int num_online_cpus(void)
+static __always_inline unsigned int num_online_cpus(void)
 {
-   return atomic_read(&__num_online_cpus);
+   return arch_atomic_read(&__num_online_cpus);
 }
 #define num_possible_cpus() cpumask_weight(cpu_possible_mask)
 #define num_present_cpus() cpumask_weight(cpu_present_mask)
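A userspace illustration of why these helpers become __always_inline: objtool warns whenever code in .noinstr.text calls an out-of-line function, and a plain `inline` is only a hint the compiler may ignore (the `.isra.0` clones in the warnings above are such out-of-line copies). This sketch assumes GCC or Clang; `model_num_online` and the model_* helpers are hypothetical, and a C11 relaxed atomic load merely stands in for arch_atomic_read(), the uninstrumented accessor that replaces atomic_read().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for __num_online_cpus. */
static atomic_uint model_num_online = 4;

/*
 * always_inline forces the body into every caller, so no call instruction
 * (and no out-of-line clone in ordinary .text) is emitted; a plain
 * "inline" leaves that decision to the compiler.
 */
static inline __attribute__((always_inline)) unsigned int model_num_online_cpus(void)
{
	/* Relaxed load: stands in for the uninstrumented arch_atomic_read(). */
	return atomic_load_explicit(&model_num_online, memory_order_relaxed);
}

/* Stand-in for a noinstr caller such as acpi_idle_enter(). */
static unsigned int model_idle_enter(void)
{
	return model_num_online_cpus() > 1 ? 1u : 0u;
}
```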




[PATCH 12/36] cpuidle,omap2: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again, some *four* times, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/cpuidle44xx.c |   29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,7 +105,9 @@ static int omap_enter_idle_smp(struct cp
}
raw_spin_unlock_irqrestore(_lock, flag);
 
+   rcu_idle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
+   rcu_idle_exit();
 
raw_spin_lock_irqsave(_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -151,10 +153,10 @@ static int omap_enter_idle_coupled(struc
 (cx->mpu_logic_state == PWRDM_POWER_OFF);
 
/* Enter broadcast mode for periodic timers */
-   RCU_NONIDLE(tick_broadcast_enable());
+   tick_broadcast_enable();
 
/* Enter broadcast mode for one-shot timers */
-   RCU_NONIDLE(tick_broadcast_enter());
+   tick_broadcast_enter();
 
/*
 * Call idle CPU PM enter notifier chain so that
@@ -166,7 +168,7 @@ static int omap_enter_idle_coupled(struc
 
if (dev->cpu == 0) {
pwrdm_set_logic_retst(mpu_pd, cx->mpu_logic_state);
-   RCU_NONIDLE(omap_set_pwrdm_state(mpu_pd, cx->mpu_state));
+   omap_set_pwrdm_state(mpu_pd, cx->mpu_state);
 
/*
 * Call idle CPU cluster PM enter notifier chain
@@ -178,14 +180,16 @@ static int omap_enter_idle_coupled(struc
index = 0;
cx = state_ptr + index;
pwrdm_set_logic_retst(mpu_pd, 
cx->mpu_logic_state);
-   RCU_NONIDLE(omap_set_pwrdm_state(mpu_pd, 
cx->mpu_state));
+   omap_set_pwrdm_state(mpu_pd, cx->mpu_state);
mpuss_can_lose_context = 0;
}
}
}
 
+   rcu_idle_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
cpu_done[dev->cpu] = true;
+   rcu_idle_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
@@ -194,9 +198,9 @@ static int omap_enter_idle_coupled(struc
mpuss_can_lose_context)
gic_dist_disable();
 
-   RCU_NONIDLE(clkdm_deny_idle(cpu_clkdm[1]));
-   RCU_NONIDLE(omap_set_pwrdm_state(cpu_pd[1], PWRDM_POWER_ON));
-   RCU_NONIDLE(clkdm_allow_idle(cpu_clkdm[1]));
+   clkdm_deny_idle(cpu_clkdm[1]);
+   omap_set_pwrdm_state(cpu_pd[1], PWRDM_POWER_ON);
+   clkdm_allow_idle(cpu_clkdm[1]);
 
if (IS_PM44XX_ERRATUM(PM_OMAP4_ROM_SMP_BOOT_ERRATUM_GICD) &&
mpuss_can_lose_context) {
@@ -222,7 +226,7 @@ static int omap_enter_idle_coupled(struc
cpu_pm_exit();
 
 cpu_pm_out:
-   RCU_NONIDLE(tick_broadcast_exit());
+   tick_broadcast_exit();
 
 fail:
cpuidle_coupled_parallel_barrier(dev, _barrier);
@@ -247,7 +251,8 @@ static struct cpuidle_driver omap4_idle_
/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
.exit_latency = 328 + 440,
.target_residency = 960,
-   .flags = CPUIDLE_FLAG_COUPLED,
+   .flags = CPUIDLE_FLAG_COUPLED |
+CPUIDLE_FLAG_RCU_IDLE,
.enter = omap_enter_idle_coupled,
.name = "C2",
.desc = "CPUx OFF, MPUSS CSWR",
@@ 
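The "some *four* times" in the commit message can be made concrete with a toy model that counts extended-quiescent-state transitions. This is a userspace sketch assuming a bool tracks whether RCU is watching; the model_* functions and MODEL_RCU_NONIDLE() are hypothetical analogues of rcu_idle_enter()/rcu_idle_exit() and RCU_NONIDLE(), and the loop of four calls only approximates the tick-broadcast and power-domain calls in omap_enter_idle_coupled().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: true while RCU is watching this CPU. */
static bool rcu_watching = true;
static int eqs_transitions;	/* counts every entry into and exit from RCU-idle */

static void model_rcu_idle_enter(void) { assert(rcu_watching); rcu_watching = false; eqs_transitions++; }
static void model_rcu_idle_exit(void)  { assert(!rcu_watching); rcu_watching = true; eqs_transitions++; }

/* Models RCU_NONIDLE(): briefly make RCU watch again around one statement. */
#define MODEL_RCU_NONIDLE(stmt) \
	do { model_rcu_idle_exit(); stmt; model_rcu_idle_enter(); } while (0)

/* Work that must run with RCU watching (tick broadcast, power-domain setup, ...). */
static void needs_rcu(void) { assert(rcu_watching); }

/* The actual low-power entry runs with RCU not watching. */
static void enter_lowpower(void) { assert(!rcu_watching); }

/* Old scheme: the core goes RCU-idle first, the driver pops back out four times. */
static int old_scheme(void)
{
	eqs_transitions = 0;
	model_rcu_idle_enter();			/* done by the cpuidle core */
	for (int i = 0; i < 4; i++)
		MODEL_RCU_NONIDLE(needs_rcu());
	enter_lowpower();
	model_rcu_idle_exit();			/* done by the cpuidle core */
	return eqs_transitions;
}

/* New scheme: the driver does its setup first, going RCU-idle only around the low-power call. */
static int new_scheme(void)
{
	eqs_transitions = 0;
	for (int i = 0; i < 4; i++)
		needs_rcu();
	model_rcu_idle_enter();
	enter_lowpower();
	model_rcu_idle_exit();
	return eqs_transitions;
}
```

In this model the old flow costs ten transitions (one enter, four exit/enter pairs, one exit) against two for the new one.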

[PATCH 03/36] cpuidle/poll: Ensure IRQ state is invariant

2022-06-08 Thread Peter Zijlstra


cpuidle_state::enter() methods should be IRQ invariant.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/poll_state.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -17,7 +17,7 @@ static int __cpuidle poll_idle(struct cp
 
dev->poll_time_limit = false;
 
-   local_irq_enable();
+   raw_local_irq_enable();
if (!current_set_polling_and_test()) {
unsigned int loop_count = 0;
u64 limit;
@@ -36,6 +36,8 @@ static int __cpuidle poll_idle(struct cp
}
}
}
+   raw_local_irq_disable();
+
current_clr_polling();
 
return index;




[PATCH 16/36] rcu: Fix rcu_idle_exit()

2022-06-08 Thread Peter Zijlstra


Current rcu_idle_exit() is terminally broken because it uses
local_irq_{save,restore}(), which are traced, and tracing uses RCU.

However, now that all the callers are sure to have IRQs disabled, we
can remove these calls.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Paul E. McKenney 
---
 kernel/rcu/tree.c |9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -659,7 +659,7 @@ static noinstr void rcu_eqs_enter(bool u
  * If you add or remove a call to rcu_idle_enter(), be sure to test with
  * CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_idle_enter(void)
+void noinstr rcu_idle_enter(void)
 {
lockdep_assert_irqs_disabled();
rcu_eqs_enter(false);
@@ -896,13 +896,10 @@ static void noinstr rcu_eqs_exit(bool us
  * If you add or remove a call to rcu_idle_exit(), be sure to test with
  * CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_idle_exit(void)
+void noinstr rcu_idle_exit(void)
 {
-   unsigned long flags;
-
-   local_irq_save(flags);
+   lockdep_assert_irqs_disabled();
rcu_eqs_exit(false);
-   local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_exit);
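A userspace sketch of the contract change, assuming booleans model the IRQ flag and the RCU extended quiescent state; all names are hypothetical. The point is that the new rcu_idle_exit() only asserts its precondition instead of establishing it with local_irq_save()/local_irq_restore().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the interrupt flag and "CPU is in an RCU extended quiescent state". */
static bool irqs_off = true;
static bool in_eqs;

/* Models lockdep_assert_irqs_disabled(): checks the IRQ state but never changes it. */
static void model_assert_irqs_disabled(void)
{
	assert(irqs_off);
}

/* Models rcu_eqs_exit(false): RCU starts watching this CPU again. */
static void model_rcu_eqs_exit(void)
{
	in_eqs = false;
}

/*
 * New shape of rcu_idle_exit(): the caller guarantees IRQs are off, so the
 * function only asserts it. The old shape wrapped the same call in
 * local_irq_save()/local_irq_restore(); per the commit message those are
 * traced, and the tracer needs RCU, which is not yet watching at that point.
 */
static void model_rcu_idle_exit(void)
{
	model_assert_irqs_disabled();
	model_rcu_eqs_exit();
}
```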
 




[PATCH 29/36] cpuidle,xenpv: Make more PARAVIRT_XXL noinstr clean

2022-06-08 Thread Peter Zijlstra


vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section
vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section
vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/paravirt.h  |6 --
 arch/x86/include/asm/special_insns.h |4 ++--
 arch/x86/include/asm/xen/hypercall.h |2 +-
 arch/x86/kernel/paravirt.c   |   14 --
 arch/x86/xen/enlighten_pv.c  |2 +-
 arch/x86/xen/irq.c   |2 +-
 6 files changed, 21 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -168,7 +168,7 @@ static inline void __write_cr4(unsigned
PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-static inline void arch_safe_halt(void)
+static __always_inline void arch_safe_halt(void)
 {
PVOP_VCALL0(irq.safe_halt);
 }
@@ -178,7 +178,9 @@ static inline void halt(void)
PVOP_VCALL0(irq.halt);
 }
 
-static inline void wbinvd(void)
+extern noinstr void pv_native_wbinvd(void);
+
+static __always_inline void wbinvd(void)
 {
PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV));
 }
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -115,7 +115,7 @@ static inline void wrpkru(u32 pkru)
 }
 #endif
 
-static inline void native_wbinvd(void)
+static __always_inline void native_wbinvd(void)
 {
asm volatile("wbinvd": : :"memory");
 }
@@ -179,7 +179,7 @@ static inline void __write_cr4(unsigned
native_write_cr4(x);
 }
 
-static inline void wbinvd(void)
+static __always_inline void wbinvd(void)
 {
native_wbinvd();
 }
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -382,7 +382,7 @@ MULTI_stack_switch(struct multicall_entr
 }
 #endif
 
-static inline int
+static __always_inline int
 HYPERVISOR_sched_op(int cmd, void *arg)
 {
return _hypercall2(int, sched_op, cmd, arg);
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -233,6 +233,11 @@ static noinstr void pv_native_set_debugr
native_set_debugreg(regno, val);
 }
 
+noinstr void pv_native_wbinvd(void)
+{
+   native_wbinvd();
+}
+
 static noinstr void pv_native_irq_enable(void)
 {
native_irq_enable();
@@ -242,6 +247,11 @@ static noinstr void pv_native_irq_disabl
 {
native_irq_disable();
 }
+
+static noinstr void pv_native_safe_halt(void)
+{
+   native_safe_halt();
+}
 #endif
 
 enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
@@ -273,7 +283,7 @@ struct paravirt_patch_template pv_ops =
.cpu.read_cr0   = native_read_cr0,
.cpu.write_cr0  = native_write_cr0,
.cpu.write_cr4  = native_write_cr4,
-   .cpu.wbinvd = native_wbinvd,
+   .cpu.wbinvd = pv_native_wbinvd,
.cpu.read_msr   = native_read_msr,
.cpu.write_msr  = native_write_msr,
.cpu.read_msr_safe  = native_read_msr_safe,
@@ -307,7 +317,7 @@ struct paravirt_patch_template pv_ops =
.irq.save_fl= __PV_IS_CALLEE_SAVE(native_save_fl),
.irq.irq_disable= __PV_IS_CALLEE_SAVE(pv_native_irq_disable),
.irq.irq_enable = __PV_IS_CALLEE_SAVE(pv_native_irq_enable),
-   .irq.safe_halt  = native_safe_halt,
+   .irq.safe_halt  = pv_native_safe_halt,
.irq.halt   = native_halt,
 #endif /* CONFIG_PARAVIRT_XXL */
 
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1019,7 +1019,7 @@ static const typeof(pv_ops) 
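Once native_wbinvd() and friends are __always_inline, a pv_ops slot still needs an out-of-line function whose address can be taken. Pointing the table at the inline helper directly would presumably make the compiler emit a copy in ordinary .text, so the patch installs small noinstr wrappers such as pv_native_wbinvd() and pv_native_safe_halt() instead. A userspace sketch of that shape, assuming a counter stands in for the wbinvd instruction; struct model_pv_ops and the model_* functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical ops table, loosely modelled on pv_ops.cpu.wbinvd. */
struct model_pv_ops {
	void (*wbinvd)(void);
};

static int flushes;	/* stands in for the effect of the wbinvd instruction */

/* Inline helper for direct callers; its address is never taken. */
static inline __attribute__((always_inline)) void model_native_wbinvd(void)
{
	flushes++;
}

/*
 * Out-of-line wrapper whose address goes into the table; in the kernel this
 * is the noinstr pv_native_wbinvd(), placed in .noinstr.text.
 */
static void model_pv_native_wbinvd(void)
{
	model_native_wbinvd();
}

static struct model_pv_ops model_ops = {
	.wbinvd = model_pv_native_wbinvd,
};
```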

[PATCH 13/36] cpuidle,dt: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Notably: this converts all dt_init_idle_driver() and
__CPU_PM_CPU_IDLE_ENTER() users, for they are inextricably intertwined.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/cpuidle34xx.c|4 ++--
 drivers/acpi/processor_idle.c|2 ++
 drivers/cpuidle/cpuidle-arm.c|1 +
 drivers/cpuidle/cpuidle-big_little.c |8 ++--
 drivers/cpuidle/cpuidle-psci.c   |1 +
 drivers/cpuidle/cpuidle-qcom-spm.c   |1 +
 drivers/cpuidle/cpuidle-riscv-sbi.c  |1 +
 drivers/cpuidle/dt_idle_states.c |2 +-
 include/linux/cpuidle.h  |4 
 9 files changed, 19 insertions(+), 5 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -1200,6 +1200,8 @@ static int acpi_processor_setup_lpi_stat
state->target_residency = lpi->min_residency;
if (lpi->arch_flags)
state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+   if (lpi->entry_method == ACPI_CSTATE_FFH)
+   state->flags |= CPUIDLE_FLAG_RCU_IDLE;
state->enter = acpi_idle_lpi_enter;
drv->safe_state_index = i;
}
--- a/drivers/cpuidle/cpuidle-arm.c
+++ b/drivers/cpuidle/cpuidle-arm.c
@@ -53,6 +53,7 @@ static struct cpuidle_driver arm_idle_dr
 * handler for idle state index 0.
 */
.states[0] = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = arm_enter_idle_state,
.exit_latency   = 1,
.target_residency   = 1,
--- a/drivers/cpuidle/cpuidle-big_little.c
+++ b/drivers/cpuidle/cpuidle-big_little.c
@@ -64,7 +64,8 @@ static struct cpuidle_driver bl_idle_lit
.enter  = bl_enter_powerdown,
.exit_latency   = 700,
.target_residency   = 2500,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C1",
.desc   = "ARM little-cluster power down",
},
@@ -85,7 +86,8 @@ static struct cpuidle_driver bl_idle_big
.enter  = bl_enter_powerdown,
.exit_latency   = 500,
.target_residency   = 2000,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C1",
.desc   = "ARM big-cluster power down",
},
@@ -124,11 +126,13 @@ static int bl_enter_powerdown(struct cpu
struct cpuidle_driver *drv, int idx)
 {
cpu_pm_enter();
+   rcu_idle_enter();
 
cpu_suspend(0, bl_powerdown_finisher);
 
/* signals the MCPM core that CPU is out of low power state */
mcpm_cpu_powered_up();
+   rcu_idle_exit();
 
cpu_pm_exit();
 
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -357,6 +357,7 @@ static int psci_idle_init_cpu(struct dev
 * PSCI idle states relies on architectural WFI to be represented as
 * state index 0.
 */
+   drv->states[0].flags = CPUIDLE_FLAG_RCU_IDLE;
drv->states[0].enter = psci_enter_idle_state;
drv->states[0].exit_latency = 1;
drv->states[0].target_residency = 1;
--- a/drivers/cpuidle/cpuidle-qcom-spm.c
+++ 
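The core-side half of this change is cut off in the excerpt above. The sketch below assumes that the cpuidle core skips its own rcu_idle_enter()/rcu_idle_exit() for states flagged CPUIDLE_FLAG_RCU_IDLE and lets such drivers bracket only the low-power call themselves (the Tegra patch's `do_rcu` check follows the same idea). It is a userspace model; MODEL_FLAG_RCU_IDLE and all model_* and *_enter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FLAG_RCU_IDLE 0x1u	/* hypothetical analogue of CPUIDLE_FLAG_RCU_IDLE */

static bool rcu_watching = true;
static int rcu_idle_entries;

static void model_rcu_idle_enter(void) { assert(rcu_watching); rcu_watching = false; rcu_idle_entries++; }
static void model_rcu_idle_exit(void)  { assert(!rcu_watching); rcu_watching = true; }

struct model_state {
	unsigned int flags;
	int (*enter)(int index);
};

/* A driver that handles RCU-idle itself, after work that needs RCU. */
static int driver_enter(int index)
{
	assert(rcu_watching);	/* e.g. cpu_pm_enter() notifiers still see RCU */
	model_rcu_idle_enter();
	/* low-power entry would happen here */
	model_rcu_idle_exit();
	return index;
}

/* A simple state that leaves RCU-idle handling to the core. */
static int simple_enter(int index)
{
	assert(!rcu_watching);
	return index;
}

/* Core: only toggle RCU-idle when the state has not claimed that job. */
static int model_core_enter(const struct model_state *s, int index)
{
	bool core_rcu = !(s->flags & MODEL_FLAG_RCU_IDLE);
	int ret;

	if (core_rcu)
		model_rcu_idle_enter();
	ret = s->enter(index);
	if (core_rcu)
		model_rcu_idle_exit();
	return ret;
}
```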

[PATCH 11/36] cpuidle,armada: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-mvebu-v7.c |7 +++
 1 file changed, 7 insertions(+)

--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -36,7 +36,10 @@ static int mvebu_v7_enter_idle(struct cp
if (drv->states[index].flags & MVEBU_V7_FLAG_DEEP_IDLE)
deepidle = true;
 
+   rcu_idle_enter();
ret = mvebu_v7_cpu_suspend(deepidle);
+   rcu_idle_exit();
+
cpu_pm_exit();
 
if (ret)
@@ -49,6 +52,7 @@ static struct cpuidle_driver armadaxp_id
.name   = "armada_xp_idle",
.states[0]  = ARM_CPUIDLE_WFI_STATE,
.states[1]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 100,
.power_usage= 50,
@@ -57,6 +61,7 @@ static struct cpuidle_driver armadaxp_id
.desc   = "CPU power down",
},
.states[2]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 1000,
.power_usage= 5,
@@ -72,6 +77,7 @@ static struct cpuidle_driver armada370_i
.name   = "armada_370_idle",
.states[0]  = ARM_CPUIDLE_WFI_STATE,
.states[1]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 100,
.power_usage= 5,
@@ -87,6 +93,7 @@ static struct cpuidle_driver armada38x_i
.name   = "armada_38x_idle",
.states[0]  = ARM_CPUIDLE_WFI_STATE,
.states[1]  = {
+   .flags  = CPUIDLE_FLAG_RCU_IDLE,
.enter  = mvebu_v7_enter_idle,
.exit_latency   = 10,
.power_usage= 5,




[PATCH 07/36] cpuidle,tegra: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-tegra.c |   21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

--- a/drivers/cpuidle/cpuidle-tegra.c
+++ b/drivers/cpuidle/cpuidle-tegra.c
@@ -180,9 +180,11 @@ static int tegra_cpuidle_state_enter(str
}
 
local_fiq_disable();
-   RCU_NONIDLE(tegra_pm_set_cpu_in_lp2());
+   tegra_pm_set_cpu_in_lp2();
cpu_pm_enter();
 
+   rcu_idle_enter();
+
switch (index) {
case TEGRA_C7:
err = tegra_cpuidle_c7_enter();
@@ -197,8 +199,10 @@ static int tegra_cpuidle_state_enter(str
break;
}
 
+   rcu_idle_exit();
+
cpu_pm_exit();
-   RCU_NONIDLE(tegra_pm_clear_cpu_in_lp2());
+   tegra_pm_clear_cpu_in_lp2();
local_fiq_enable();
 
return err ?: index;
@@ -226,6 +230,7 @@ static int tegra_cpuidle_enter(struct cp
   struct cpuidle_driver *drv,
   int index)
 {
+   bool do_rcu = drv->states[index].flags & CPUIDLE_FLAG_RCU_IDLE;
unsigned int cpu = cpu_logical_map(dev->cpu);
int ret;
 
@@ -233,9 +238,13 @@ static int tegra_cpuidle_enter(struct cp
if (dev->states_usage[index].disable)
return -1;
 
-   if (index == TEGRA_C1)
+   if (index == TEGRA_C1) {
+   if (do_rcu)
+   rcu_idle_enter();
ret = arm_cpuidle_simple_enter(dev, drv, index);
-   else
+   if (do_rcu)
+   rcu_idle_exit();
+   } else
ret = tegra_cpuidle_state_enter(dev, index, cpu);
 
if (ret < 0) {
@@ -285,7 +294,8 @@ static struct cpuidle_driver tegra_idle_
.exit_latency   = 2000,
.target_residency   = 2200,
.power_usage= 100,
-   .flags  = CPUIDLE_FLAG_TIMER_STOP,
+   .flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE,
.name   = "C7",
.desc   = "CPU core powered off",
},
@@ -295,6 +305,7 @@ static struct cpuidle_driver tegra_idle_
.target_residency   = 1,
.power_usage= 0,
.flags  = CPUIDLE_FLAG_TIMER_STOP |
+ CPUIDLE_FLAG_RCU_IDLE   |
  CPUIDLE_FLAG_COUPLED,
.name   = "CC6",
.desc   = "CPU cluster powered off",




[PATCH 33/36] cpuidle,omap3: Use WFI for omap3_pm_idle()

2022-06-08 Thread Peter Zijlstra


arch_cpu_idle() is a very simple idle interface: it exposes only a
single idle state and is expected to neither require RCU nor do any
tracing/instrumentation.

As such, omap_sram_idle() is not a valid implementation. Replace it
with the simple (shallow) omap3_do_wfi() call, leaving the more
complicated idle states to the cpuidle driver.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/pm34xx.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -294,7 +294,7 @@ static void omap3_pm_idle(void)
if (omap_irq_pending())
return;
 
-   omap_sram_idle();
+   omap3_do_wfi();
 }
 
 #ifdef CONFIG_SUSPEND




[PATCH 17/36] acpi_idle: Remove tracing

2022-06-08 Thread Peter Zijlstra


All the idle routines are called with RCU disabled; as such, there must
not be any tracing inside them.
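The rule above can be illustrated with a toy userspace model. This is not kernel code, and every name in it is hypothetical. `rcu_watching` stands in for RCU state, and `trace_event()` stands in for a tracepoint, which is only valid while RCU is watching. The traced helper mirrors `local_irq_disable()` and the raw helper mirrors `raw_local_irq_disable()`. Only the raw one is safe once the caller has stopped RCU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: none of these names are kernel APIs. */
static bool rcu_watching = true;   /* RCU watches outside the idle region */
static bool irqs_enabled;
static int bad_traces;             /* tracepoints hit while RCU was not watching */

/* Stand-in for a tracepoint: only legal while RCU is watching. */
static void trace_event(void)
{
	if (!rcu_watching)
		bad_traces++;
}

/* Traced variant, analogous to local_irq_disable(). */
static void model_irq_disable(void)
{
	irqs_enabled = false;
	trace_event();
}

/* Raw variant, analogous to raw_local_irq_disable(): no tracepoint. */
static void model_raw_irq_disable(void)
{
	irqs_enabled = false;
}

/* Idle routine entered after the caller has stopped RCU. */
static int model_idle(bool use_raw)
{
	bad_traces = 0;
	rcu_watching = false;      /* caller stopped RCU before idling */
	irqs_enabled = true;       /* pretend the halt woke with IRQs on */
	if (use_raw)
		model_raw_irq_disable();
	else
		model_irq_disable();
	rcu_watching = true;       /* caller restarts RCU afterwards */
	return bad_traces;
}
```

`model_idle(false)` records one tracepoint hit while RCU was not watching, whereas `model_idle(true)` records none. That difference is why `acpi_safe_halt()` is switched to the `raw_` helpers.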

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/acpi/processor_idle.c |   24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -108,8 +108,8 @@ static const struct dmi_system_id proces
 static void __cpuidle acpi_safe_halt(void)
 {
if (!tif_need_resched()) {
-   safe_halt();
-   local_irq_disable();
+   raw_safe_halt();
+   raw_local_irq_disable();
}
 }
 
@@ -524,16 +524,21 @@ static int acpi_idle_bm_check(void)
return bm_status;
 }
 
-static void wait_for_freeze(void)
+static __cpuidle void io_idle(unsigned long addr)
 {
+   /* IO port based C-state */
+   inb(addr);
+
 #ifdef CONFIG_X86
/* No delay is needed if we are in guest */
if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
return;
 #endif
-   /* Dummy wait op - must do something useless after P_LVL2 read
-  because chipsets cannot guarantee that STPCLK# signal
-  gets asserted in time to freeze execution properly. */
+   /*
+* Dummy wait op - must do something useless after P_LVL2 read
+* because chipsets cannot guarantee that STPCLK# signal
+* gets asserted in time to freeze execution properly.
+*/
inl(acpi_gbl_FADT.xpm_timer_block.address);
 }
 
@@ -553,9 +558,7 @@ static void __cpuidle acpi_idle_do_entry
} else if (cx->entry_method == ACPI_CSTATE_HALT) {
acpi_safe_halt();
} else {
-   /* IO port based C-state */
-   inb(cx->address);
-   wait_for_freeze();
+   io_idle(cx->address);
}
 
perf_lopwr_cb(false);
@@ -577,8 +580,7 @@ static int acpi_idle_play_dead(struct cp
if (cx->entry_method == ACPI_CSTATE_HALT)
safe_halt();
else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
-   inb(cx->address);
-   wait_for_freeze();
+   io_idle(cx->address);
} else
return -ENODEV;
 




[PATCH 00/36] cpuidle,rcu: Cleanup the mess

2022-06-08 Thread Peter Zijlstra


Hi All! (omg so many)

These here few patches mostly clear out the utter mess that is cpuidle vs 
rcuidle.

At the end of the ride there are only 2 real RCU_NONIDLE() users left:

  arch/arm64/kernel/suspend.c:RCU_NONIDLE(__cpu_suspend_exit());
  drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD));
  kernel/cfi.c:   RCU_NONIDLE({

(the CFI one is likely dead in the kCFI rewrite) and there are only a handful
of trace_.*_rcuidle() left:

  kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
  kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
  kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
  kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
  kernel/trace/trace_preemptirq.c: trace_preempt_enable_rcuidle(a0, a1);
  kernel/trace/trace_preemptirq.c: trace_preempt_disable_rcuidle(a0, a1);

All of them are in 'deprecated' code that is unused for GENERIC_ENTRY.

I've touched a _lot_ of code that I can't test and likely broken some of it :/
In particular, the whole ARM cpuidle stuff was quite involved with OMAP being
the absolute 'winner'.

I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to
GENERIC_ENTRY. I've also got a note that says ARM64 can probably do a WFE based
idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs.

---
 arch/alpha/kernel/process.c  |1 
 arch/alpha/kernel/vmlinux.lds.S  |1 
 arch/arc/kernel/process.c|3 +
 arch/arc/kernel/vmlinux.lds.S|1 
 arch/arm/include/asm/vmlinux.lds.h   |1 
 arch/arm/kernel/process.c|1 
 arch/arm/kernel/smp.c|6 +--
 arch/arm/mach-gemini/board-dt.c  |3 +
 arch/arm/mach-imx/cpuidle-imx6q.c|4 +-
 arch/arm/mach-imx/cpuidle-imx6sx.c   |5 ++
 arch/arm/mach-omap2/cpuidle34xx.c|   16 
 arch/arm/mach-omap2/cpuidle44xx.c|   29 +--
 arch/arm/mach-omap2/pm.h |2 -
 arch/arm/mach-omap2/pm34xx.c |   14 +--
 arch/arm/mach-omap2/powerdomain.c|   10 ++---
 arch/arm64/kernel/idle.c |1 
 arch/arm64/kernel/smp.c  |4 +-
 arch/arm64/kernel/vmlinux.lds.S  |1 
 arch/csky/kernel/process.c   |1 
 arch/csky/kernel/smp.c   |2 -
 arch/csky/kernel/vmlinux.lds.S   |1 
 arch/hexagon/kernel/process.c|1 
 arch/hexagon/kernel/vmlinux.lds.S|1 
 arch/ia64/kernel/process.c   |1 
 arch/ia64/kernel/vmlinux.lds.S   |1 
 arch/loongarch/kernel/vmlinux.lds.S  |1 
 arch/m68k/kernel/vmlinux-nommu.lds   |1 
 arch/m68k/kernel/vmlinux-std.lds |1 
 arch/m68k/kernel/vmlinux-sun3.lds|1 
 arch/microblaze/kernel/process.c |1 
 arch/microblaze/kernel/vmlinux.lds.S |1 
 arch/mips/kernel/idle.c  |8 +---
 arch/mips/kernel/vmlinux.lds.S   |1 
 arch/nios2/kernel/process.c  |1 
 arch/nios2/kernel/vmlinux.lds.S  |1 
 arch/openrisc/kernel/process.c   |1 
 arch/openrisc/kernel/vmlinux.lds.S   |1 
 arch/parisc/kernel/process.c |2 -
 arch/parisc/kernel/vmlinux.lds.S |1 
 arch/powerpc/kernel/idle.c   |5 +-
 arch/powerpc/kernel/vmlinux.lds.S|1 
 arch/riscv/kernel/process.c  |1 
 arch/riscv/kernel/vmlinux-xip.lds.S  |1 
 arch/riscv/kernel/vmlinux.lds.S  |1 
 arch/s390/kernel/idle.c  |1 
 arch/s390/kernel/vmlinux.lds.S   |1 
 

[PATCH 30/36] cpuidle,nospec: Make noinstr clean

2022-06-08 Thread Peter Zijlstra


vmlinux.o: warning: objtool: mwait_idle+0x47: call to mds_idle_clear_cpu_buffers() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xa2: call to mds_idle_clear_cpu_buffers() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0x91: call to mds_idle_clear_cpu_buffers() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0x8c: call to mds_idle_clear_cpu_buffers() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xaa: call to mds_idle_clear_cpu_buffers() leaves .noinstr.text section
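The warnings above arise because `static inline` is only a hint. The compiler may emit an out-of-line copy of the helper in ordinary `.text` (the `__monitor.constprop.0` symbol in the next patch is such a copy), and a call to it from `.noinstr.text` then leaves the section. Forcing inlining keeps the helper's body inside the caller.

The userspace sketch below shows the pattern. The macros are local stand-ins, assumed to be simplified versions of the kernel's `__always_inline` and `noinstr`. The real `noinstr` also disables tracing and sanitizer instrumentation. The sketch requires GCC or Clang targeting ELF.

```c
#include <assert.h>

/* Local stand-ins (simplified, hypothetical names). */
#define my_always_inline inline __attribute__((__always_inline__))
#define my_noinstr __attribute__((__noinline__, __section__(".noinstr.text")))

static int buffers_cleared;

/* Forced inline: the body is emitted into each caller, so a noinstr
 * caller never calls out to a copy living in ordinary .text. */
static my_always_inline void clear_cpu_buffers_model(void)
{
	buffers_cleared++;
}

/* Placed in .noinstr.text; the helper above is inlined into it. */
my_noinstr int idle_model(void)
{
	clear_cpu_buffers_model();
	return buffers_cleared;
}
```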

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/nospec-branch.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -310,7 +310,7 @@ static __always_inline void mds_user_cle
  *
  * Clear CPU buffers if the corresponding static key is enabled
  */
-static inline void mds_idle_clear_cpu_buffers(void)
+static __always_inline void mds_idle_clear_cpu_buffers(void)
 {
if (static_branch_likely(_idle_clear))
mds_clear_cpu_buffers();




[PATCH 27/36] cpuidle,mwait: Make noinstr clean

2022-06-08 Thread Peter Zijlstra


vmlinux.o: warning: objtool: intel_idle_s2idle+0x6e: call to __monitor.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0x8c: call to __monitor.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0x73: call to __monitor.constprop.0() leaves .noinstr.text section

vmlinux.o: warning: objtool: mwait_idle+0x88: call to clflush() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/mwait.h |   12 ++--
 arch/x86/include/asm/special_insns.h |2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -25,7 +25,7 @@
 #define TPAUSE_C01_STATE   1
 #define TPAUSE_C02_STATE   0
 
-static inline void __monitor(const void *eax, unsigned long ecx,
+static __always_inline void __monitor(const void *eax, unsigned long ecx,
 unsigned long edx)
 {
/* "monitor %eax, %ecx, %edx;" */
@@ -33,7 +33,7 @@ static inline void __monitor(const void
 :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
-static inline void __monitorx(const void *eax, unsigned long ecx,
+static __always_inline void __monitorx(const void *eax, unsigned long ecx,
  unsigned long edx)
 {
/* "monitorx %eax, %ecx, %edx;" */
@@ -41,7 +41,7 @@ static inline void __monitorx(const void
 :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
-static inline void __mwait(unsigned long eax, unsigned long ecx)
+static __always_inline void __mwait(unsigned long eax, unsigned long ecx)
 {
mds_idle_clear_cpu_buffers();
 
@@ -76,8 +76,8 @@ static inline void __mwait(unsigned long
  * EAX (logical) address to monitor
  * ECX #GP if not zero
  */
-static inline void __mwaitx(unsigned long eax, unsigned long ebx,
-   unsigned long ecx)
+static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
+unsigned long ecx)
 {
/* No MDS buffer clear as this is AMD/HYGON only */
 
@@ -86,7 +86,7 @@ static inline void __mwaitx(unsigned lon
 :: "a" (eax), "b" (ebx), "c" (ecx));
 }
 
-static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
+static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
mds_idle_clear_cpu_buffers();
/* "mwait %eax, %ecx;" */
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -196,7 +196,7 @@ static inline void load_gs_index(unsigne
 
 #endif /* CONFIG_PARAVIRT_XXL */
 
-static inline void clflush(volatile void *__p)
+static __always_inline void clflush(volatile void *__p)
 {
asm volatile("clflush %0" : "+m" (*(volatile char __force *)__p));
 }




[PATCH 23/36] arm64,smp: Remove trace_.*_rcuidle() usage

2022-06-08 Thread Peter Zijlstra


Ever since commit d3afc7f12987 ("arm64: Allow IPIs to be handled as
normal interrupts"), this function has been called in regular IRQ context,
so the _rcuidle() tracepoint variants are not needed.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm64/kernel/smp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -865,7 +865,7 @@ static void do_handle_IPI(int ipinr)
unsigned int cpu = smp_processor_id();
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_entry_rcuidle(ipi_types[ipinr]);
+   trace_ipi_entry(ipi_types[ipinr]);
 
switch (ipinr) {
case IPI_RESCHEDULE:
@@ -914,7 +914,7 @@ static void do_handle_IPI(int ipinr)
}
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_exit_rcuidle(ipi_types[ipinr]);
+   trace_ipi_exit(ipi_types[ipinr]);
 }
 
 static irqreturn_t ipi_handler(int irq, void *data)




[PATCH 25/36] time/tick-broadcast: Remove RCU_NONIDLE usage

2022-06-08 Thread Peter Zijlstra


There are no callers left that have already disabled RCU.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/time/tick-broadcast-hrtimer.c |   29 -
 1 file changed, 12 insertions(+), 17 deletions(-)

--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -56,25 +56,20 @@ static int bc_set_next(ktime_t expires,
 * hrtimer callback function is currently running, then
 * hrtimer_start() cannot move it and the timer stays on the CPU on
 * which it is assigned at the moment.
+*/
+   hrtimer_start(, expires, HRTIMER_MODE_ABS_PINNED_HARD);
+   /*
+* The core tick broadcast mode expects bc->bound_on to be set
+* correctly to prevent a CPU which has the broadcast hrtimer
+* armed from going deep idle.
 *
-* As this can be called from idle code, the hrtimer_start()
-* invocation has to be wrapped with RCU_NONIDLE() as
-* hrtimer_start() can call into tracing.
+* As tick_broadcast_lock is held, nothing can change the cpu
+* base which was just established in hrtimer_start() above. So
+* the below access is safe even without holding the hrtimer
+* base lock.
 */
-   RCU_NONIDLE( {
-   hrtimer_start(, expires, HRTIMER_MODE_ABS_PINNED_HARD);
-   /*
-* The core tick broadcast mode expects bc->bound_on to be set
-* correctly to prevent a CPU which has the broadcast hrtimer
-* armed from going deep idle.
-*
-* As tick_broadcast_lock is held, nothing can change the cpu
-* base which was just established in hrtimer_start() above. So
-* the below access is safe even without holding the hrtimer
-* base lock.
-*/
-   bc->bound_on = bctimer.base->cpu_base->cpu;
-   } );
+   bc->bound_on = bctimer.base->cpu_base->cpu;
+
return 0;
 }
 




[PATCH 18/36] cpuidle: Annotate poll_idle()

2022-06-08 Thread Peter Zijlstra


The __cpuidle functions will become a noinstr class; as such, they need
explicit annotations.
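This is a runtime analogy for the `instrumentation_begin()`/`instrumentation_end()` annotations added to `poll_idle()` below. In the kernel, objtool checks these markers statically at build time, not at run time, so the code is only a model. It flags a call to instrumentable code (here a stand-in for `local_clock()`) made from a noinstr function unless the call sits inside an annotated region. All names are hypothetical.

```c
#include <assert.h>

/* Runtime analogy only; the kernel relies on objtool for this check. */
static int instr_depth;
static int violations;

static void model_instrumentation_begin(void) { instr_depth++; }
static void model_instrumentation_end(void)   { instr_depth--; }

/* Stand-in for an instrumentable call such as local_clock(). */
static unsigned long model_local_clock(void)
{
	if (instr_depth == 0)
		violations++;      /* called from noinstr code without annotation */
	return 42;
}

/* Mirrors the shape of the patched poll_idle(). */
static int model_poll_idle(int index, int annotate)
{
	unsigned long time_start;

	if (annotate)
		model_instrumentation_begin();
	time_start = model_local_clock();
	(void)time_start;
	/* ... the polling loop would go here ... */
	if (annotate)
		model_instrumentation_end();
	return index;
}
```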

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/poll_state.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -13,7 +13,10 @@
 static int __cpuidle poll_idle(struct cpuidle_device *dev,
   struct cpuidle_driver *drv, int index)
 {
-   u64 time_start = local_clock();
+   u64 time_start;
+
+   instrumentation_begin();
+   time_start = local_clock();
 
dev->poll_time_limit = false;
 
@@ -39,6 +42,7 @@ static int __cpuidle poll_idle(struct cp
raw_local_irq_disable();
 
current_clr_polling();
+   instrumentation_end();
 
return index;
 }




[PATCH 35/36] cpuidle,powerdomain: Remove trace_.*_rcuidle()

2022-06-08 Thread Peter Zijlstra


OMAP was the one and only user.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/powerdomain.c |   10 +-
 drivers/base/power/runtime.c  |   24 
 2 files changed, 17 insertions(+), 17 deletions(-)

--- a/arch/arm/mach-omap2/powerdomain.c
+++ b/arch/arm/mach-omap2/powerdomain.c
@@ -187,9 +187,9 @@ static int _pwrdm_state_switch(struct po
trace_state = (PWRDM_TRACE_STATES_FLAG |
   ((next & OMAP_POWERSTATE_MASK) << 8) |
   ((prev & OMAP_POWERSTATE_MASK) << 0));
-   trace_power_domain_target_rcuidle(pwrdm->name,
- trace_state,
- raw_smp_processor_id());
+   trace_power_domain_target(pwrdm->name,
+ trace_state,
+ raw_smp_processor_id());
}
break;
default:
@@ -541,8 +541,8 @@ int pwrdm_set_next_pwrst(struct powerdom
 
if (arch_pwrdm && arch_pwrdm->pwrdm_set_next_pwrst) {
/* Trace the pwrdm desired target state */
-   trace_power_domain_target_rcuidle(pwrdm->name, pwrst,
- raw_smp_processor_id());
+   trace_power_domain_target(pwrdm->name, pwrst,
+ raw_smp_processor_id());
/* Program the pwrdm desired target state */
ret = arch_pwrdm->pwrdm_set_next_pwrst(pwrdm, pwrst);
}
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -442,7 +442,7 @@ static int rpm_idle(struct device *dev,
int (*callback)(struct device *);
int retval;
 
-   trace_rpm_idle_rcuidle(dev, rpmflags);
+   trace_rpm_idle(dev, rpmflags);
retval = rpm_check_suspend_allowed(dev);
if (retval < 0)
;   /* Conditions are wrong. */
@@ -481,7 +481,7 @@ static int rpm_idle(struct device *dev,
dev->power.request_pending = true;
queue_work(pm_wq, >power.work);
}
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, 0);
+   trace_rpm_return_int(dev, _THIS_IP_, 0);
return 0;
}
 
@@ -493,7 +493,7 @@ static int rpm_idle(struct device *dev,
wake_up_all(>power.wait_queue);
 
  out:
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
return retval ? retval : rpm_suspend(dev, rpmflags | RPM_AUTO);
 }
 
@@ -557,7 +557,7 @@ static int rpm_suspend(struct device *de
struct device *parent = NULL;
int retval;
 
-   trace_rpm_suspend_rcuidle(dev, rpmflags);
+   trace_rpm_suspend(dev, rpmflags);
 
  repeat:
retval = rpm_check_suspend_allowed(dev);
@@ -708,7 +708,7 @@ static int rpm_suspend(struct device *de
}
 
  out:
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
 
return retval;
 
@@ -760,7 +760,7 @@ static int rpm_resume(struct device *dev
struct device *parent = NULL;
int retval = 0;
 
-   trace_rpm_resume_rcuidle(dev, rpmflags);
+   trace_rpm_resume(dev, rpmflags);
 
  repeat:
if (dev->power.runtime_error) {
@@ -925,7 +925,7 @@ static int rpm_resume(struct device *dev
spin_lock_irq(>power.lock);
}
 
-   trace_rpm_return_int_rcuidle(dev, _THIS_IP_, retval);
+   trace_rpm_return_int(dev, _THIS_IP_, retval);
 
return retval;

[PATCH 14/36] cpuidle: Fix rcu_idle_*() usage

2022-06-08 Thread Peter Zijlstra


The whole disable-RCU, enable-IRQs dance is very intricate, since
changing the IRQ state is traced and tracing depends on RCU.

Add two helpers for the cpuidle case that mirror the entry code.
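Below is a toy userspace model of the ordering problem. It assumes the helpers follow the entry-code pattern:

- trace the IRQ-on transition while RCU is still watching,
- stop RCU,
- leave the actual IRQ enable to an untraced (raw) operation inside the idle routine.

All `model_*` names are hypothetical. The real `cpuidle_rcu_enter()`/`cpuidle_rcu_exit()` bodies are not shown in this excerpt and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; all names are hypothetical, not kernel APIs. */
static bool rcu_watching = true;
static bool irqs_enabled;
static int bad_traces;

static void trace_irq_state(void)
{
	if (!rcu_watching)
		bad_traces++;          /* tracing needs RCU to be watching */
}

/* Naive order: stop RCU first, then do a traced IRQ enable. */
static void naive_enter(void)
{
	rcu_watching = false;
	irqs_enabled = true;
	trace_irq_state();       /* too late: RCU is no longer watching */
}

/* Entry-code order: trace the IRQ-on transition while RCU watches,
 * then stop RCU; the idle routine flips IRQs with a raw operation. */
static void model_cpuidle_rcu_enter(void)
{
	trace_irq_state();
	rcu_watching = false;
}

static void model_raw_idle(void)
{
	irqs_enabled = true;     /* raw enable inside idle: no tracepoint */
	irqs_enabled = false;    /* wake up and return with IRQs disabled */
}

static void model_cpuidle_rcu_exit(void)
{
	rcu_watching = true;
}

static int run_idle(bool naive)
{
	bad_traces = 0;
	if (naive) {
		naive_enter();
		irqs_enabled = false;
		rcu_watching = true;
	} else {
		model_cpuidle_rcu_enter();
		model_raw_idle();
		model_cpuidle_rcu_exit();
	}
	return bad_traces;
}
```

The naive order records a tracepoint hit while RCU is not watching. The entry-code order records none.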

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-imx/cpuidle-imx6q.c|4 +--
 arch/arm/mach-imx/cpuidle-imx6sx.c   |4 +--
 arch/arm/mach-omap2/cpuidle34xx.c|4 +--
 arch/arm/mach-omap2/cpuidle44xx.c|8 +++---
 drivers/acpi/processor_idle.c|   18 --
 drivers/cpuidle/cpuidle-big_little.c |4 +--
 drivers/cpuidle/cpuidle-mvebu-v7.c   |4 +--
 drivers/cpuidle/cpuidle-psci.c   |4 +--
 drivers/cpuidle/cpuidle-riscv-sbi.c  |4 +--
 drivers/cpuidle/cpuidle-tegra.c  |8 +++---
 drivers/cpuidle/cpuidle.c|   11 
 include/linux/cpuidle.h  |   37 +---
 kernel/sched/idle.c  |   45 ++-
 kernel/time/tick-broadcast.c |6 +++-
 14 files changed, 90 insertions(+), 71 deletions(-)

--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -24,9 +24,9 @@ static int imx6q_enter_wait(struct cpuid
imx6_set_lpm(WAIT_UNCLOCKED);
raw_spin_unlock(_lock);
 
-   rcu_idle_enter();
+   cpuidle_rcu_enter();
cpu_do_idle();
-   rcu_idle_exit();
+   cpuidle_rcu_exit();
 
raw_spin_lock(_lock);
if (num_idle_cpus-- == num_online_cpus())
--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -47,9 +47,9 @@ static int imx6sx_enter_wait(struct cpui
cpu_pm_enter();
cpu_cluster_pm_enter();
 
-   rcu_idle_enter();
+   cpuidle_rcu_enter();
cpu_suspend(0, imx6sx_idle_finish);
-   rcu_idle_exit();
+   cpuidle_rcu_exit();
 
cpu_cluster_pm_exit();
cpu_pm_exit();
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,9 +133,9 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
-   rcu_idle_enter();
+   cpuidle_rcu_enter();
omap_sram_idle();
-   rcu_idle_exit();
+   cpuidle_rcu_exit();
 
/*
 * Call idle CPU PM enter notifier chain to restore
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -105,9 +105,9 @@ static int omap_enter_idle_smp(struct cp
}
raw_spin_unlock_irqrestore(_lock, flag);
 
-   rcu_idle_enter();
+   cpuidle_rcu_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
-   rcu_idle_exit();
+   cpuidle_rcu_exit();
 
raw_spin_lock_irqsave(_lock, flag);
if (cx->mpu_state_vote == num_online_cpus())
@@ -186,10 +186,10 @@ static int omap_enter_idle_coupled(struc
}
}
 
-   rcu_idle_enter();
+   cpuidle_rcu_enter();
omap4_enter_lowpower(dev->cpu, cx->cpu_state);
cpu_done[dev->cpu] = true;
-   rcu_idle_exit();
+   cpuidle_rcu_exit();
 
/* Wakeup CPU1 only if it is not offlined */
if (dev->cpu == 0 && cpumask_test_cpu(1, cpu_online_mask)) {
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -607,7 +607,7 @@ static DEFINE_RAW_SPINLOCK(c3_lock);
  * @cx: Target state context
  * @index: index of target state
  */
-static int acpi_idle_enter_bm(struct cpuidle_driver *drv,
+static noinstr int acpi_idle_enter_bm(struct cpuidle_driver *drv,
   struct acpi_processor *pr,
   struct acpi_processor_cx *cx,
   int index)
@@ -626,6 +626,8 @@ static int acpi_idle_enter_bm(struct cpu

[PATCH 20/36] arch/idle: Change arch_cpu_idle() IRQ behaviour

2022-06-08 Thread Peter Zijlstra
, Arnd Bergmann , ulli.kr...@googlemail.com, vgu...@kernel.org, 
linux-...@vger.kernel.org, j...@joshtriplett.org, rost...@goodmis.org, 
r...@vger.kernel.org, b...@alien8.de, bc...@quicinc.com, 
tsbog...@alpha.franken.de, linux-par...@vger.kernel.org, sudeep.ho...@arm.com, 
shawn...@kernel.org, da...@davemloft.net, dal...@libc.org, t...@atomide.com, 
amakha...@vmware.com, bjorn.anders...@linaro.org, h...@zytor.com, 
sparcli...@vger.kernel.org, linux-hexa...@vger.kernel.org, 
linux-ri...@lists.infradead.org, anton.iva...@cambridgegreys.com, 
jo...@southpole.se, yury.no...@gmail.com, rich...@nod.at, x...@kernel.org, 
li...@armlinux.org.uk, mi...@redhat.com, a...@eecs.berkeley.edu, 
paul...@kernel.org, h...@linux.ibm.com, stefan.kristians...@saunalahti.fi, 
openr...@lists.librecores.org, paul.walms...@sifive.com, 
linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, jgr...@suse.com, 
mon...@monstr.eu, linux-m...@vger.kernel.org, palmer@dabbelt.com, 
a...@brainfault.org, i...@jurassic.park.msu.ru, 
johan...@sipsolutions.net, linuxppc-dev@lists.ozlabs.org


Currently, arch_cpu_idle() is called with IRQs disabled but returns
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.
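A minimal userspace sketch of the contract change, assuming a plain bool can stand in for the CPU's IRQ-enable flag. All names (model_irq_enable(), old_arch_cpu_idle(), run_old(), ...) are hypothetical and do not exist in the kernel; the sketch only counts how often the flag flips during one idle pass under the old and the new contract.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a plain bool stands in for the CPU's IRQ-enable flag. */
static bool irqs_on;
static int irq_flips; /* number of times irqs_on actually changed */

static void model_irq_enable(void)
{
	if (!irqs_on)
		irq_flips++;
	irqs_on = true;
}

static void model_irq_disable(void)
{
	if (irqs_on)
		irq_flips++;
	irqs_on = false;
}

/* Old contract: entered with IRQs off, returns with IRQs on. */
static void old_arch_cpu_idle(void)
{
	/* ...wait for an interrupt here... */
	model_irq_enable();
}

/* New contract: entered and returns with IRQs off. */
static void new_arch_cpu_idle(void)
{
	/* an arch that can idle with IRQs disabled just waits here */
}

/* Old generic path: disable IRQs again right after the arch hook. */
static int run_old(void)
{
	irqs_on = false;
	irq_flips = 0;
	old_arch_cpu_idle();
	model_irq_disable();
	return irq_flips;   /* enable + disable = 2 flips */
}

/* New generic path: nothing to undo, IRQs are still off. */
static int run_new(void)
{
	irqs_on = false;
	irq_flips = 0;
	new_arch_cpu_idle();
	assert(!irqs_on);
	return irq_flips;   /* no flips at all */
}
```

Architectures whose idle instruction implicitly enables interrupts (ARC's sleep, for example) still flip twice, but the re-disable now lives next to the instruction that caused it.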

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/alpha/kernel/process.c      |    1 -
 arch/arc/kernel/process.c        |    3 +++
 arch/arm/kernel/process.c        |    1 -
 arch/arm/mach-gemini/board-dt.c  |    3 ++-
 arch/arm64/kernel/idle.c         |    1 -
 arch/csky/kernel/process.c       |    1 -
 arch/csky/kernel/smp.c           |    2 +-
 arch/hexagon/kernel/process.c    |    1 -
 arch/ia64/kernel/process.c       |    1 +
 arch/microblaze/kernel/process.c |    1 -
 arch/mips/kernel/idle.c          |    8 +++-
 arch/nios2/kernel/process.c      |    1 -
 arch/openrisc/kernel/process.c   |    1 +
 arch/parisc/kernel/process.c     |    2 --
 arch/powerpc/kernel/idle.c       |    5 ++---
 arch/riscv/kernel/process.c      |    1 -
 arch/s390/kernel/idle.c          |    1 -
 arch/sh/kernel/idle.c            |    1 +
 arch/sparc/kernel/leon_pmc.c     |    4
 arch/sparc/kernel/process_32.c   |    1 -
 arch/sparc/kernel/process_64.c   |    3 ++-
 arch/um/kernel/process.c         |    1 -
 arch/x86/coco/tdx/tdx.c          |    3 +++
 arch/x86/kernel/process.c        |   15 ---
 arch/xtensa/kernel/process.c     |    1 +
 kernel/sched/idle.c              |    2 --
 26 files changed, 28 insertions(+), 37 deletions(-)

--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
wtint(0);
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -114,6 +114,8 @@ void arch_cpu_idle(void)
"sleep %0   \n"
:
:"I"(arg)); /* can't be "r" has to be embedded const */
+
+   raw_local_irq_disable();
 }
 
 #else  /* ARC700 */
@@ -122,6 +124,7 @@ void arch_cpu_idle(void)
 {
/* sleep, but enable both set E1/E2 (levels of interrupts) before committing */
__asm__ __volatile__("sleep 0x3 \n");
+   raw_local_irq_disable();
 }
 
 #endif
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -78,7 +78,6 @@ void arch_cpu_idle(void)
arm_pm_idle();
else
cpu_do_idle();
-   raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
--- a/arch/arm/mach-gemini/board-dt.c
+++ b/arch/arm/mach-gemini/board-dt.c
@@ -42,8 +42,9 @@ static void gemini_idle(void)
 */
 
/* FIXME: Enabling interrupts here is racy! */
-   local_irq_enable();
+   raw_local_irq_enable();
cpu_do_idle();
+   raw_local_irq_disable();
 }
 
 static void __init gemini_init_machine(void)
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
 * tricks
 */
cpu_do_idle();
-   raw_local_irq_enable();
 }
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -101,6 +101,5 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
asm volatile("stop\n");
 #endif
-   raw_local_irq_enable();
 }
 #endif
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -314,7 +314,7 @@ void arch_cpu_idle_dead(void)
while (!secondary_stack)
arch_cpu_idle();
 
-   local_irq_disable();
+   raw_local_irq_disable();
 
asm volatile(
"mov 

[PATCH 19/36] objtool/idle: Validate __cpuidle code as noinstr

2022-06-08 Thread Peter Zijlstra


Idle code is very much like entry code in that RCU isn't available
there. As such, have objtool validate __cpuidle code as noinstr.
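The rule being enforced can be sketched as a toy model. This is only an approximation under stated assumptions: the real objtool analyses the object code of vmlinux.o and is far more involved, while struct call_site and check_noinstr() below are hypothetical. The core idea is that a function placed in .noinstr.text may not call out of that section except between instrumentation_begin() and instrumentation_end().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified description of one call site. */
struct call_site {
	const char *caller;
	bool caller_noinstr;   /* caller is placed in .noinstr.text */
	bool in_instr_region;  /* call sits between instrumentation_begin()/end() */
	const char *callee;
	bool callee_noinstr;   /* callee is itself placed in .noinstr.text */
};

/* Report calls that leave .noinstr.text outside an annotated region. */
static int check_noinstr(const struct call_site *sites, int n)
{
	int warnings = 0;

	for (int i = 0; i < n; i++) {
		const struct call_site *s = &sites[i];

		if (s->caller_noinstr && !s->callee_noinstr && !s->in_instr_region) {
			printf("warning: %s: call to %s() leaves .noinstr.text section\n",
			       s->caller, s->callee);
			warnings++;
		}
	}
	return warnings;
}
```

Fixes then take one of two forms: move the callee into .noinstr.text (or inline it), or explicitly mark the call as allowed with an instrumentation_begin()/instrumentation_end() region.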

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/alpha/kernel/vmlinux.lds.S      |    1 -
 arch/arc/kernel/vmlinux.lds.S        |    1 -
 arch/arm/include/asm/vmlinux.lds.h   |    1 -
 arch/arm64/kernel/vmlinux.lds.S      |    1 -
 arch/csky/kernel/vmlinux.lds.S       |    1 -
 arch/hexagon/kernel/vmlinux.lds.S    |    1 -
 arch/ia64/kernel/vmlinux.lds.S       |    1 -
 arch/loongarch/kernel/vmlinux.lds.S  |    1 -
 arch/m68k/kernel/vmlinux-nommu.lds   |    1 -
 arch/m68k/kernel/vmlinux-std.lds     |    1 -
 arch/m68k/kernel/vmlinux-sun3.lds    |    1 -
 arch/microblaze/kernel/vmlinux.lds.S |    1 -
 arch/mips/kernel/vmlinux.lds.S       |    1 -
 arch/nios2/kernel/vmlinux.lds.S      |    1 -
 arch/openrisc/kernel/vmlinux.lds.S   |    1 -
 arch/parisc/kernel/vmlinux.lds.S     |    1 -
 arch/powerpc/kernel/vmlinux.lds.S    |    1 -
 arch/riscv/kernel/vmlinux-xip.lds.S  |    1 -
 arch/riscv/kernel/vmlinux.lds.S      |    1 -
 arch/s390/kernel/vmlinux.lds.S       |    1 -
 arch/sh/kernel/vmlinux.lds.S         |    1 -
 arch/sparc/kernel/vmlinux.lds.S      |    1 -
 arch/um/kernel/dyn.lds.S             |    1 -
 arch/um/kernel/uml.lds.S             |    1 -
 arch/x86/include/asm/irqflags.h      |   11 ---
 arch/x86/include/asm/mwait.h         |    2 +-
 arch/x86/kernel/vmlinux.lds.S        |    1 -
 arch/xtensa/kernel/vmlinux.lds.S     |    1 -
 include/asm-generic/vmlinux.lds.h    |    9 +++--
 include/linux/compiler_types.h       |    8 ++--
 include/linux/cpu.h                  |    3 ---
 kernel/module/main.c                 |    2 ++
 kernel/sched/idle.c                  |   15 +--
 tools/objtool/check.c                |   15 ++-
 34 files changed, 43 insertions(+), 48 deletions(-)

--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -27,7 +27,6 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -85,7 +85,6 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -96,7 +96,6 @@
SOFTIRQENTRY_TEXT   \
TEXT_TEXT   \
SCHED_TEXT  \
-   CPUIDLE_TEXT\
LOCK_TEXT   \
KPROBES_TEXT\
ARM_STUBS_TEXT  \
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -163,7 +163,6 @@ SECTIONS
ENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
HYPERVISOR_TEXT
--- a/arch/csky/kernel/vmlinux.lds.S
+++ b/arch/csky/kernel/vmlinux.lds.S
@@ -38,7 +38,6 @@ SECTIONS
SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
-   CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ 

[PATCH 02/36] x86/idle: Replace x86_idle with a static_call

2022-06-08 Thread Peter Zijlstra


The idle routine is selected once during boot; there is no need to
suffer an indirect call on every idle entry for that.
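A userspace sketch of the selection logic only, assuming a function pointer can model the static call's observable behaviour. It cannot model the point of the patch: the kernel's static_call patches the call instruction itself, so no pointer is loaded at call time. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*idle_fn)(void);

/*
 * Userspace stand-in only: the kernel's static_call rewrites the call
 * site, whereas this model still loads a pointer.
 */
static idle_fn model_x86_idle;            /* starts NULL, like DEFINE_STATIC_CALL_NULL() */

static void model_default_idle(void) { }
static void model_mwait_idle(void) { }

static bool model_x86_idle_set(void)      /* like !!static_call_query() */
{
	return model_x86_idle != NULL;
}

static void model_x86_idle_update(idle_fn fn)  /* like static_call_update() */
{
	model_x86_idle = fn;
}

/* Same shape as select_idle_routine(): pick once, keep an earlier choice. */
static void model_select_idle_routine(bool idle_poll, bool prefer_mwait)
{
	if (model_x86_idle_set() || idle_poll)
		return;
	model_x86_idle_update(prefer_mwait ? model_mwait_idle : model_default_idle);
}
```

An earlier override (the idle= boot parameter or xen_set_default_idle()) wins because select_idle_routine() returns early once a target is set.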

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Frederic Weisbecker 
---
 arch/x86/kernel/process.c |   50 +-
 1 file changed, 28 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -692,7 +693,23 @@ void __switch_to_xtra(struct task_struct
 unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-static void (*x86_idle)(void);
+/*
+ * We use this if we don't have any better idle routine..
+ */
+void __cpuidle default_idle(void)
+{
+   raw_safe_halt();
+}
+#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
+EXPORT_SYMBOL(default_idle);
+#endif
+
+DEFINE_STATIC_CALL_NULL(x86_idle, default_idle);
+
+static bool x86_idle_set(void)
+{
+   return !!static_call_query(x86_idle);
+}
 
 #ifndef CONFIG_SMP
 static inline void play_dead(void)
@@ -715,28 +732,17 @@ void arch_cpu_idle_dead(void)
 /*
  * Called from the generic idle code.
  */
-void arch_cpu_idle(void)
-{
-   x86_idle();
-}
-
-/*
- * We use this if we don't have any better idle routine..
- */
-void __cpuidle default_idle(void)
+void __cpuidle arch_cpu_idle(void)
 {
-   raw_safe_halt();
+   static_call(x86_idle)();
 }
-#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
-EXPORT_SYMBOL(default_idle);
-#endif
 
 #ifdef CONFIG_XEN
 bool xen_set_default_idle(void)
 {
-   bool ret = !!x86_idle;
+   bool ret = x86_idle_set();
 
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
 
return ret;
 }
@@ -859,20 +865,20 @@ void select_idle_routine(const struct cp
if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
 #endif
-   if (x86_idle || boot_option_idle_override == IDLE_POLL)
+   if (x86_idle_set() || boot_option_idle_override == IDLE_POLL)
return;
 
if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
pr_info("using AMD E400 aware idle routine\n");
-   x86_idle = amd_e400_idle;
+   static_call_update(x86_idle, amd_e400_idle);
} else if (prefer_mwait_c1_over_halt(c)) {
pr_info("using mwait in idle threads\n");
-   x86_idle = mwait_idle;
+   static_call_update(x86_idle, mwait_idle);
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
pr_info("using TDX aware idle routine\n");
-   x86_idle = tdx_safe_halt;
+   static_call_update(x86_idle, tdx_safe_halt);
} else
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
 }
 
 void amd_e400_c1e_apic_setup(void)
@@ -925,7 +931,7 @@ static int __init idle_setup(char *str)
 * To continue to load the CPU idle driver, don't touch
 * the boot_option_idle_override.
 */
-   x86_idle = default_idle;
+   static_call_update(x86_idle, default_idle);
boot_option_idle_override = IDLE_HALT;
} else if (!strcmp(str, "nomwait")) {
/*




[PATCH 21/36] x86/tdx: Remove TDX_HCALL_ISSUE_STI

2022-06-08 Thread Peter Zijlstra
, Arnd Bergmann , ulli.kr...@googlemail.com, vgu...@kernel.org, 
linux-...@vger.kernel.org, j...@joshtriplett.org, rost...@goodmis.org, 
r...@vger.kernel.org, b...@alien8.de, bc...@quicinc.com, 
tsbog...@alpha.franken.de, linux-par...@vger.kernel.org, sudeep.ho...@arm.com, 
shawn...@kernel.org, da...@davemloft.net, kirill.shute...@linux.intel.com, 
dal...@libc.org, t...@atomide.com, amakha...@vmware.com, 
bjorn.anders...@linaro.org, h...@zytor.com, sparcli...@vger.kernel.org, 
linux-hexa...@vger.kernel.org, linux-ri...@lists.infradead.org, Isaku Yamahata 
, anton.iva...@cambridgegreys.com, 
jo...@southpole.se, yury.no...@gmail.com, rich...@nod.at, x...@kernel.org, 
li...@armlinux.org.uk, mi...@redhat.com, a...@eecs.berkeley.edu, 
paul...@kernel.org, h...@linux.ibm.com, stefan.kristians...@saunalahti.fi, 
openr...@lists.librecores.org, paul.walms...@sifive.com, 
linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, 
jgr...@suse.com, mon...@monstr.eu, linux-m...@vger.kernel.org, 
pal...@dabbelt.com, a...@brainfault.org, i...@jurassic.park.msu.ru, 
johan...@sipsolutions.net, linuxppc-dev@lists.ozlabs.org


Now that arch_cpu_idle() is expected to return with IRQs disabled,
avoid the useless STI/CLI dance.

Per the specs this is supposed to work, but nobody has relied upon
this behaviour yet, so broken implementations are possible.

Cc: Isaku Yamahata 
Cc: kirill.shute...@linux.intel.com
Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/coco/tdx/tdcall.S        |   13 -
 arch/x86/coco/tdx/tdx.c           |   23 ---
 arch/x86/include/asm/shared/tdx.h |    1 -
 3 files changed, 4 insertions(+), 33 deletions(-)

--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -139,19 +139,6 @@ SYM_FUNC_START(__tdx_hypercall)
 
movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
 
-   /*
-* For the idle loop STI needs to be called directly before the TDCALL
-* that enters idle (EXIT_REASON_HLT case). STI instruction enables
-* interrupts only one instruction later. If there is a window between
-* STI and the instruction that emulates the HALT state, there is a
-* chance for interrupts to happen in this window, which can delay the
-* HLT operation indefinitely. Since this is the not the desired
-* result, conditionally call STI before TDCALL.
-*/
-   testq $TDX_HCALL_ISSUE_STI, %rsi
-   jz .Lskip_sti
-   sti
-.Lskip_sti:
tdcall
 
/*
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -124,7 +124,7 @@ static u64 get_cc_mask(void)
return BIT_ULL(gpa_width - 1);
 }
 
-static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
+static u64 __cpuidle __halt(const bool irq_disabled)
 {
struct tdx_hypercall_args args = {
.r10 = TDX_HYPERCALL_STANDARD,
@@ -144,20 +144,14 @@ static u64 __cpuidle __halt(const bool i
 * can keep the vCPU in virtual HLT, even if an IRQ is
 * pending, without hanging/breaking the guest.
 */
-   return __tdx_hypercall(, do_sti ? TDX_HCALL_ISSUE_STI : 0);
+   return __tdx_hypercall(, 0);
 }
 
 static bool handle_halt(void)
 {
-   /*
-* Since non safe halt is mainly used in CPU offlining
-* and the guest will always stay in the halt state, don't
-* call the STI instruction (set do_sti as false).
-*/
const bool irq_disabled = irqs_disabled();
-   const bool do_sti = false;
 
-   if (__halt(irq_disabled, do_sti))
+   if (__halt(irq_disabled))
return false;
 
return true;
@@ -165,22 +159,13 @@ static bool handle_halt(void)
 
 void __cpuidle tdx_safe_halt(void)
 {
-/*
- * For do_sti=true case, __tdx_hypercall() function enables
- * interrupts using the STI instruction before the TDCALL. So
- * set irq_disabled as false.
- */
const bool irq_disabled = false;
-   const bool do_sti = true;
 
/*
 * Use WARN_ONCE() to report the failure.
 */
-   if (__halt(irq_disabled, do_sti))
+   if (__halt(irq_disabled))
WARN_ONCE(1, "HLT instruction emulation failed\n");
-
-   /* XXX I can't make sense of what @do_sti actually does */
-   raw_local_irq_disable();
 }
 
 static bool read_msr(struct pt_regs *regs)
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -8,7 +8,6 @@
 #define TDX_HYPERCALL_STANDARD  0
 
 #define TDX_HCALL_HAS_OUTPUT   BIT(0)
-#define TDX_HCALL_ISSUE_STI    BIT(1)
 
 #define TDX_CPUID_LEAF_ID  0x21
 #define TDX_IDENT  "IntelTDX"




[PATCH 04/36] cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE

2022-06-08 Thread Peter Zijlstra


Commit c227233ad64c ("intel_idle: enable interrupts before C1 on
Xeons") wrecked intel_idle in two ways:

 - it called the traced local_irq_enable(), but idle functions must
   not have tracing
 - it returned with IRQs enabled, but idle functions must return with
   IRQs disabled

Additionally, it added a branch for no good reason.
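A userspace sketch of the resulting shape, assuming a bool models the IRQ flag and a counter models MWAIT. MODEL_FLAG_IRQ_ENABLE and all model_* functions are hypothetical stand-ins for CPUIDLE_FLAG_IRQ_ENABLE, __intel_idle(), intel_idle() and intel_idle_irq(). The per-entry flag test disappears because the wrapper is chosen once at init time, and the IRQ-enabling variant restores IRQs-off before returning.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FLAG_IRQ_ENABLE 0x1u   /* hypothetical stand-in for CPUIDLE_FLAG_IRQ_ENABLE */

typedef int (*enter_fn)(int index);

static bool irqs_on;      /* models the CPU IRQ-enable flag */
static int mwait_calls;   /* counts simulated MWAIT executions */

/* Stands in for __intel_idle(): no flag test, no IRQ fiddling. */
static inline int model_core_idle(int index)
{
	mwait_calls++;
	return index;
}

static int model_intel_idle(int index)
{
	return model_core_idle(index);
}

/* Wrapper for IRQ-enabled states: restores IRQs-off before returning. */
static int model_intel_idle_irq(int index)
{
	int ret;

	irqs_on = true;                  /* raw_local_irq_enable() */
	ret = model_core_idle(index);
	irqs_on = false;                 /* raw_local_irq_disable() */
	return ret;
}

/* Chosen once at init time instead of branching on every idle entry. */
static enter_fn model_pick_enter(unsigned int flags)
{
	return (flags & MODEL_FLAG_IRQ_ENABLE) ? model_intel_idle_irq : model_intel_idle;
}
```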

Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/idle/intel_idle.c |   48 +++---
 1 file changed, 37 insertions(+), 11 deletions(-)

--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -129,21 +137,37 @@ static unsigned int mwait_substates __in
  *
  * Must be called under local_irq_disable().
  */
+
-static __cpuidle int intel_idle(struct cpuidle_device *dev,
-   struct cpuidle_driver *drv, int index)
+static __always_inline int __intel_idle(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int index)
 {
struct cpuidle_state *state = >states[index];
unsigned long eax = flg2MWAIT(state->flags);
unsigned long ecx = 1; /* break on interrupt flag */
 
-   if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE)
-   local_irq_enable();
-
mwait_idle_with_hints(eax, ecx);
 
return index;
 }
 
+static __cpuidle int intel_idle(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int index)
+{
+   return __intel_idle(dev, drv, index);
+}
+
+static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
+   struct cpuidle_driver *drv, int index)
+{
+   int ret;
+
+   raw_local_irq_enable();
+   ret = __intel_idle(dev, drv, index);
+   raw_local_irq_disable();
+
+   return ret;
+}
+
 /**
  * intel_idle_s2idle - Ask the processor to enter the given idle state.
  * @dev: cpuidle device of the target CPU.
@@ -1801,6 +1824,9 @@ static void __init intel_idle_init_cstat
/* Structure copy. */
drv->states[drv->state_count] = cpuidle_state_table[cstate];
 
+   if (cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IRQ_ENABLE)
+   drv->states[drv->state_count].enter = intel_idle_irq;
+
if ((disabled_states_mask & BIT(drv->state_count)) ||
((icpu->use_acpi || force_use_acpi) &&
 intel_idle_off_by_default(mwait_hint) &&




[PATCH 28/36] cpuidle,tdx: Make tdx noinstr clean

2022-06-08 Thread Peter Zijlstra


vmlinux.o: warning: objtool: __halt+0x2c: call to hcall_func.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: __halt+0x3f: call to __tdx_hypercall() leaves .noinstr.text section
vmlinux.o: warning: objtool: __tdx_hypercall+0x66: call to __tdx_hypercall_failed() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/coco/tdx/tdcall.S |    2 ++
 arch/x86/coco/tdx/tdx.c    |    5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

--- a/arch/x86/coco/tdx/tdcall.S
+++ b/arch/x86/coco/tdx/tdcall.S
@@ -31,6 +31,8 @@
  TDX_R12 | TDX_R13 | \
  TDX_R14 | TDX_R15 )
 
+.section .noinstr.text, "ax"
+
 /*
  * __tdx_module_call()  - Used by TDX guests to request services from
  * the TDX module (does not include VMM services) using TDCALL instruction.
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -53,8 +53,9 @@ static inline u64 _tdx_hypercall(u64 fn,
 }
 
 /* Called from __tdx_hypercall() for unrecoverable failure */
-void __tdx_hypercall_failed(void)
+noinstr void __tdx_hypercall_failed(void)
 {
+   instrumentation_begin();
panic("TDVMCALL failed. TDX module bug?");
 }
 
@@ -64,7 +65,7 @@ void __tdx_hypercall_failed(void)
  * Reusing the KVM EXIT_REASON macros makes it easier to connect the host and
  * guest sides of these calls.
  */
-static u64 hcall_func(u64 exit_reason)
+static __always_inline u64 hcall_func(u64 exit_reason)
 {
return exit_reason;
 }




[PATCH 06/36] cpuidle,riscv: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.
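A userspace sketch of why this is daft, assuming a bool models whether RCU is watching and that rcu_irq_enter_irqson()/rcu_irq_exit_irqson() can be approximated as plain on/off switches. MODEL_FLAG_RCU_IDLE and all model_* functions are hypothetical. The sketch counts RCU state switches on one idle entry for the old and new driver layouts.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FLAG_RCU_IDLE 0x1u   /* hypothetical stand-in for CPUIDLE_FLAG_RCU_IDLE */

static bool rcu_watching = true;
static int rcu_transitions;        /* counts RCU watching <-> idle switches */

static void model_set_watching(bool on)
{
	if (rcu_watching != on)
		rcu_transitions++;
	rcu_watching = on;
}

/* Runtime PM callbacks may use RCU, so RCU must be watching here. */
static void model_pm_callback(void)
{
	assert(rcu_watching);
}

/* Old driver: RCU is already idle, so it is turned back on around each PM call. */
static void model_old_driver_enter(void)
{
	model_set_watching(true);          /* rcu_irq_enter_irqson() */
	model_pm_callback();               /* suspend the power domain */
	model_set_watching(false);         /* rcu_irq_exit_irqson() */
	/* sbi_suspend() would run here */
	model_set_watching(true);
	model_pm_callback();               /* resume the power domain */
	model_set_watching(false);
}

/* New driver: owns the RCU-idle window and keeps it tight around the suspend. */
static void model_new_driver_enter(void)
{
	model_pm_callback();
	model_set_watching(false);         /* rcu_idle_enter() */
	/* sbi_suspend() would run here */
	model_set_watching(true);          /* rcu_idle_exit() */
	model_pm_callback();
}

/* Shape of cpuidle_enter_state(): RCU is only touched if the state did not opt in. */
static int model_core_enter(unsigned int flags, void (*enter)(void))
{
	rcu_transitions = 0;
	if (!(flags & MODEL_FLAG_RCU_IDLE))
		model_set_watching(false);
	enter();
	if (!(flags & MODEL_FLAG_RCU_IDLE))
		model_set_watching(true);
	return rcu_transitions;
}
```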

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-riscv-sbi.c |    9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/cpuidle/cpuidle-riscv-sbi.c
+++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
@@ -116,12 +116,12 @@ static int __sbi_enter_domain_idle_state
return -1;
 
/* Do runtime PM to manage a hierarchical CPU toplogy. */
-   rcu_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_suspend(pd_dev);
else
pm_runtime_put_sync_suspend(pd_dev);
-   rcu_irq_exit_irqson();
+
+   rcu_idle_enter();
 
if (sbi_is_domain_state_available())
state = sbi_get_domain_state();
@@ -130,12 +130,12 @@ static int __sbi_enter_domain_idle_state
 
ret = sbi_suspend(state) ? -1 : idx;
 
-   rcu_irq_enter_irqson();
+   rcu_idle_exit();
+
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
pm_runtime_get_sync(pd_dev);
-   rcu_irq_exit_irqson();
 
cpu_pm_exit();
 
@@ -246,6 +246,7 @@ static int sbi_dt_cpu_init_topology(stru
 * of a shared state for the domain, assumes the domain states are all
 * deeper states.
 */
+   drv->states[state_count - 1].flags |= CPUIDLE_FLAG_RCU_IDLE;
drv->states[state_count - 1].enter = sbi_enter_domain_idle_state;
drv->states[state_count - 1].enter_s2idle =
sbi_enter_s2idle_domain_idle_state;




[PATCH 01/36] x86/perf/amd: Remove tracing from perf_lopwr_cb()

2022-06-08 Thread Peter Zijlstra


perf_lopwr_cb() is called from the idle routines, where RCU is not
watching, so we must not enter tracing.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/events/amd/brs.c         |   13 +
 arch/x86/include/asm/perf_event.h |    2 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -41,18 +41,15 @@ static inline unsigned int brs_to(int id
return MSR_AMD_SAMP_BR_FROM + 2 * idx + 1;
 }
 
-static inline void set_debug_extn_cfg(u64 val)
+static __always_inline void set_debug_extn_cfg(u64 val)
 {
/* bits[4:3] must always be set to 11b */
-   wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
+   __wrmsr(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
 }
 
-static inline u64 get_debug_extn_cfg(void)
+static __always_inline u64 get_debug_extn_cfg(void)
 {
-   u64 val;
-
-   rdmsrl(MSR_AMD_DBG_EXTN_CFG, val);
-   return val;
+   return __rdmsr(MSR_AMD_DBG_EXTN_CFG);
 }
 
 static bool __init amd_brs_detect(void)
@@ -338,7 +335,7 @@ void amd_pmu_brs_sched_task(struct perf_
  * called from ACPI processor_idle.c or acpi_pad.c
  * with interrupts disabled
  */
-void perf_amd_brs_lopwr_cb(bool lopwr_in)
+void noinstr perf_amd_brs_lopwr_cb(bool lopwr_in)
 {
struct cpu_hw_events *cpuc = this_cpu_ptr(_hw_events);
union amd_debug_extn_cfg cfg;
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -554,7 +554,7 @@ extern void perf_amd_brs_lopwr_cb(bool l
 
 DECLARE_STATIC_CALL(perf_lopwr_cb, perf_amd_brs_lopwr_cb);
 
-static inline void perf_lopwr_cb(bool lopwr_in)
+static __always_inline void perf_lopwr_cb(bool lopwr_in)
 {
static_call_mod(perf_lopwr_cb)(lopwr_in);
 }
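The new set_debug_extn_cfg() passes the 64-bit value to __wrmsr() as separate 32-bit low and high halves. The arithmetic can be checked with a small sketch; model_split_debug_extn_cfg() is a hypothetical helper that mirrors only the expressions used in the patch. Note that << binds tighter than |, so val | 3ULL << 3 ORs in 0x18 (bits 4:3 = 11b), which only ever affects the low half.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: computes the (low, high) halves that the patched
 * set_debug_extn_cfg() hands to __wrmsr().
 */
static void model_split_debug_extn_cfg(uint64_t val, uint32_t *low, uint32_t *high)
{
	/* '<<' binds tighter than '|', so this ORs in 0x18 (bits 4:3 = 11b). */
	*low = (uint32_t)(val | 3ULL << 3);   /* truncated to the low 32 bits */
	*high = (uint32_t)(val >> 32);        /* bits 4:3 never reach the high half */
}
```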




[PATCH 32/36] ftrace: WARN on rcuidle

2022-06-08 Thread Peter Zijlstra


CONFIG_GENERIC_ENTRY disallows any and all tracing when RCU isn't
watching.

XXX if s390 (the only other GENERIC_ENTRY user as of this writing)
isn't comfortable with this, we could switch to
HAVE_NOINSTR_VALIDATION which is x86_64 only atm.
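The effect of RCUIDLE_COND() on the tracepoint guard can be tabulated with a small sketch. It is a runtime model, assuming a bool can replace the compile-time CONFIG_GENERIC_ENTRY choice; model_rcuidle_cond() and model_trace_event() are hypothetical. With generic entry, any rcuidle event warns and is dropped; otherwise only the NMI case does, and with this patch that case now also returns instead of continuing.

```c
#include <assert.h>
#include <stdbool.h>

/* Runtime stand-in for the compile-time RCUIDLE_COND() selection. */
static bool model_rcuidle_cond(bool generic_entry, bool rcuidle, bool in_nmi)
{
	return generic_entry ? rcuidle : (rcuidle && in_nmi);
}

static int warnings;   /* counts where WARN_ON_ONCE() would fire */

/* Returns true if the tracepoint would actually be emitted. */
static bool model_trace_event(bool generic_entry, bool rcuidle, bool in_nmi)
{
	if (model_rcuidle_cond(generic_entry, rcuidle, in_nmi)) {
		warnings++;
		return false;  /* the patched macro now returns instead of tracing */
	}
	return true;
}
```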

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/tracepoint.h |   13 -
 kernel/trace/trace.c       |    3 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -178,6 +178,16 @@ static inline struct tracepoint *tracepo
 #endif /* CONFIG_HAVE_STATIC_CALL */
 
 /*
+ * CONFIG_GENERIC_ENTRY archs are expected to have sanitized entry and idle
+ * code that disallow any/all tracing/instrumentation when RCU isn't watching.
+ */
+#ifdef CONFIG_GENERIC_ENTRY
+#define RCUIDLE_COND(rcuidle)  (rcuidle)
+#else
+#define RCUIDLE_COND(rcuidle)  (rcuidle && in_nmi())
+#endif
+
+/*
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
  */
@@ -189,7 +199,8 @@ static inline struct tracepoint *tracepo
return; \
\
/* srcu can't be used from NMI */   \
-   WARN_ON_ONCE(rcuidle && in_nmi());  \
+   if (WARN_ON_ONCE(RCUIDLE_COND(rcuidle)))\
+   return; \
\
/* keep srcu and sched-rcu usage consistent */  \
preempt_disable_notrace();  \
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3104,6 +3104,9 @@ void __trace_stack(struct trace_array *t
return;
}
 
+   if (WARN_ON_ONCE(IS_ENABLED(CONFIG_GENERIC_ENTRY)))
+   return;
+
/*
 * When an NMI triggers, RCU is enabled via rcu_nmi_enter(),
 * but if the above rcu_is_watching() failed, then the NMI




[PATCH 05/36] cpuidle: Move IRQ state validation

2022-06-08 Thread Peter Zijlstra
, Arnd Bergmann , ulli.kr...@googlemail.com, vgu...@kernel.org, 
linux-...@vger.kernel.org, j...@joshtriplett.org, rost...@goodmis.org, 
r...@vger.kernel.org, b...@alien8.de, bc...@quicinc.com, 
tsbog...@alpha.franken.de, linux-par...@vger.kernel.org, sudeep.ho...@arm.com, 
shawn...@kernel.org, da...@davemloft.net, dal...@libc.org, t...@atomide.com, 
amakha...@vmware.com, bjorn.anders...@linaro.org, h...@zytor.com, 
sparcli...@vger.kernel.org, linux-hexa...@vger.kernel.org, 
linux-ri...@lists.infradead.org, anton.iva...@cambridgegreys.com, 
jo...@southpole.se, yury.no...@gmail.com, rich...@nod.at, x...@kernel.org, 
li...@armlinux.org.uk, mi...@redhat.com, a...@eecs.berkeley.edu, 
paul...@kernel.org, h...@linux.ibm.com, stefan.kristians...@saunalahti.fi, 
openr...@lists.librecores.org, paul.walms...@sifive.com, 
linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, jgr...@suse.com, 
mon...@monstr.eu, linux-m...@vger.kernel.org, palmer@dabbelt.com, a...@brainfault.org, i...@jurassic.park.msu.ru, 
johan...@sipsolutions.net, linuxppc-dev@lists.ozlabs.org


Make cpuidle_enter_state() consistent with the s2idle variant and
verify ->enter() always returns with interrupts disabled.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -234,7 +234,11 @@ int cpuidle_enter_state(struct cpuidle_d
stop_critical_timings();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter();
+
entered_state = target_state->enter(dev, drv, index);
+   if (WARN_ONCE(!irqs_disabled(), "%ps leaked IRQ state", target_state->enter))
+   raw_local_irq_disable();
+
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_exit();
start_critical_timings();
@@ -246,12 +250,8 @@ int cpuidle_enter_state(struct cpuidle_d
/* The cpu is no longer idle or about to enter idle. */
sched_idle_set_state(NULL);
 
-   if (broadcast) {
-   if (WARN_ON_ONCE(!irqs_disabled()))
-   local_irq_disable();
-
+   if (broadcast)
tick_broadcast_exit();
-   }
 
if (!cpuidle_state_is_coupled(drv, index))
local_irq_enable();
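The check added after ->enter() can be sketched in userspace C. All names are hypothetical stand-ins. irqs_off models the CPU interrupt flag, leak_count counts how often the condition fired, and the static warned flag mimics the print-once behaviour of WARN_ONCE(). The model assumes, as the patch does, that idle states are entered with interrupts disabled and that a leaking callback is repaired by disabling them again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins; none of these are kernel APIs. */
static bool irqs_off = true;	/* models the CPU interrupt-disable flag */
static int leak_count;		/* times an ->enter() callback leaked IRQs on */

typedef int (*enter_fn)(int index);

static int well_behaved_enter(int index)
{
	return index;		/* returns with interrupts still disabled */
}

static int leaky_enter(int index)
{
	irqs_off = false;	/* bug: returns with interrupts enabled */
	return index;
}

/* Models the check this patch adds right after target_state->enter(). */
static int enter_state_model(enter_fn enter, int index)
{
	static bool warned;	/* WARN_ONCE() only prints the first time */
	int entered;

	irqs_off = true;	/* idle states are entered with IRQs disabled */
	entered = enter(index);
	if (!irqs_off) {
		leak_count++;
		if (!warned) {
			fprintf(stderr, "enter callback leaked IRQ state\n");
			warned = true;
		}
		irqs_off = true;	/* like raw_local_irq_disable() */
	}
	return entered;
}
```

Because the repair now happens unconditionally right after ->enter(), the later broadcast path no longer needs its own WARN_ON_ONCE()/local_irq_disable() pair.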




[PATCH 26/36] cpuidle,sched: Remove annotations from TIF_{POLLING_NRFLAG,NEED_RESCHED}

2022-06-08 Thread Peter Zijlstra


vmlinux.o: warning: objtool: mwait_idle+0x5: call to current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0xbc: call to current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to current_set_polling_and_test() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to current_set_polling_and_test() leaves .noinstr.text section

vmlinux.o: warning: objtool: intel_idle+0xa6: call to current_clr_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xbf: call to current_clr_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xa1: call to current_clr_polling() leaves .noinstr.text section

vmlinux.o: warning: objtool: mwait_idle+0xe: call to __current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_processor_ffh_cstate_enter+0xc5: call to __current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0xbc: call to __current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0xea: call to __current_set_polling() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0xb4: call to __current_set_polling() leaves .noinstr.text section

vmlinux.o: warning: objtool: cpu_idle_poll.isra.0+0x73: call to test_ti_thread_flag() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_s2idle+0x73: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle_irq+0x91: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: intel_idle+0x78: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: acpi_safe_halt+0xf: call to test_ti_thread_flag.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/sched/idle.h  |   40 ++--
 include/linux/thread_info.h |   18 +-
 2 files changed, 47 insertions(+), 11 deletions(-)

--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -23,12 +23,37 @@ static inline void wake_up_if_idle(int c
  */
 #ifdef TIF_POLLING_NRFLAG
 
-static inline void __current_set_polling(void)
+#ifdef _ASM_GENERIC_BITOPS_INSTRUMENTED_ATOMIC_H
+
+static __always_inline void __current_set_polling(void)
 {
-   set_thread_flag(TIF_POLLING_NRFLAG);
+   arch_set_bit(TIF_POLLING_NRFLAG,
+(unsigned long *)(&current_thread_info()->flags));
 }
 
-static inline bool __must_check current_set_polling_and_test(void)
+static __always_inline void __current_clr_polling(void)
+{
+   arch_clear_bit(TIF_POLLING_NRFLAG,
+  (unsigned long *)(&current_thread_info()->flags));
+}
+
+#else
+
+static __always_inline void __current_set_polling(void)
+{
+   set_bit(TIF_POLLING_NRFLAG,
+   (unsigned long *)(&current_thread_info()->flags));
+}
+
+static __always_inline void __current_clr_polling(void)
+{
+   clear_bit(TIF_POLLING_NRFLAG,
+ (unsigned long *)(&current_thread_info()->flags));
+}
+
+#endif /* _ASM_GENERIC_BITOPS_INSTRUMENTED_ATOMIC_H */
+
+static __always_inline bool __must_check current_set_polling_and_test(void)
 {
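The non-instrumented polling-bit helpers reduce to atomic bit operations on the thread_info flags word. This is a minimal userspace sketch that uses C11 <stdatomic.h> as a stand-in for arch_set_bit()/arch_clear_bit(). The MODEL_* bit numbers and model_ti_flags are hypothetical, because the real TIF_* values are per-architecture. The fence before the test mirrors the full barrier that current_set_polling_and_test() needs between setting the polling bit and checking for a pending reschedule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; real TIF_* values differ per architecture. */
#define MODEL_TIF_POLLING_NRFLAG 21
#define MODEL_TIF_NEED_RESCHED   3

/* Stand-in for current_thread_info()->flags. */
static _Atomic unsigned long model_ti_flags;

static inline void model_set_polling(void)
{
	atomic_fetch_or(&model_ti_flags, 1UL << MODEL_TIF_POLLING_NRFLAG);
}

static inline void model_clr_polling(void)
{
	atomic_fetch_and(&model_ti_flags, ~(1UL << MODEL_TIF_POLLING_NRFLAG));
}

/* Set polling, then (after a full barrier) check for a pending reschedule. */
static inline bool model_set_polling_and_test(void)
{
	model_set_polling();
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load(&model_ti_flags) &
		(1UL << MODEL_TIF_NEED_RESCHED)) != 0;
}
```

The point of the patch is that these helpers must be __always_inline and use uninstrumented bitops, so that nothing in them can leave .noinstr.text.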

[PATCH 08/36] cpuidle,psci: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/cpuidle/cpuidle-psci.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -69,12 +69,12 @@ static int __psci_enter_domain_idle_stat
return -1;
 
/* Do runtime PM to manage a hierarchical CPU toplogy. */
-   rcu_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_suspend(pd_dev);
else
pm_runtime_put_sync_suspend(pd_dev);
-   rcu_irq_exit_irqson();
+
+   rcu_idle_enter();
 
state = psci_get_domain_state();
if (!state)
@@ -82,12 +82,12 @@ static int __psci_enter_domain_idle_stat
 
ret = psci_cpu_suspend_enter(state) ? -1 : idx;
 
-   rcu_irq_enter_irqson();
+   rcu_idle_exit();
+
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
pm_runtime_get_sync(pd_dev);
-   rcu_irq_exit_irqson();
 
cpu_pm_exit();
 
@@ -240,6 +240,7 @@ static int psci_dt_cpu_init_topology(str
 * of a shared state for the domain, assumes the domain states are all
 * deeper states.
 */
+   drv->states[state_count - 1].flags |= CPUIDLE_FLAG_RCU_IDLE;
drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
drv->states[state_count - 1].enter_s2idle = psci_enter_s2idle_domain_idle_state;
psci_cpuidle_use_cpuhp = true;
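The reordering in __psci_enter_domain_idle_state() can be checked with a small userspace model. This is a sketch under stated assumptions. Every model_* name is hypothetical, and model_runtime_pm() stands in for both the genpd and runtime-PM calls. It also assumes those calls must run while RCU is watching. After the patch, RCU-idle covers only the firmware suspend call.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model; every name here is hypothetical. */
static bool rcu_watching = true;
static int pm_calls_while_idle;	/* PM calls made with RCU not watching */

static void model_rcu_idle_enter(void)
{
	assert(rcu_watching);
	rcu_watching = false;
}

static void model_rcu_idle_exit(void)
{
	assert(!rcu_watching);
	rcu_watching = true;
}

/* Runtime PM may take locks and hit tracepoints, so it needs RCU. */
static void model_runtime_pm(void)
{
	if (!rcu_watching)
		pm_calls_while_idle++;
}

static int model_psci_suspend(void)
{
	return 0;		/* 0 means the suspend call succeeded */
}

/* Mirrors the reordered __psci_enter_domain_idle_state(). */
static int model_enter_domain_idle(int idx)
{
	int ret;

	model_runtime_pm();	/* suspend the PM domain, RCU watching */
	model_rcu_idle_enter();
	ret = model_psci_suspend() ? -1 : idx;
	model_rcu_idle_exit();
	model_runtime_pm();	/* resume the PM domain, RCU watching */
	return ret;
}
```

Compared with the old rcu_irq_enter_irqson()/rcu_irq_exit_irqson() brackets, the driver no longer has to re-enable RCU temporarily around each PM call.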




[PATCH 10/36] cpuidle,omap3: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/cpuidle34xx.c |   16 
 1 file changed, 16 insertions(+)

--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,7 +133,9 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
+   rcu_idle_enter();
omap_sram_idle();
+   rcu_idle_exit();
 
/*
 * Call idle CPU PM enter notifier chain to restore
@@ -265,6 +267,7 @@ static struct cpuidle_driver omap3_idle_
.owner= THIS_MODULE,
.states = {
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 2 + 2,
.target_residency = 5,
@@ -272,6 +275,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 10 + 10,
.target_residency = 30,
@@ -279,6 +283,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU ON + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 50 + 50,
.target_residency = 300,
@@ -286,6 +291,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU RET + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 1500 + 1800,
.target_residency = 4000,
@@ -293,6 +299,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU OFF + CORE ON",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 2500 + 7500,
.target_residency = 12000,
@@ -300,6 +307,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU RET + CORE RET",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 3000 + 8500,
.target_residency = 15000,
@@ -307,6 +315,7 @@ static struct cpuidle_driver omap3_idle_
.desc = "MPU OFF + CORE RET",
},
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 1 + 3,
.target_residency = 3,
@@ -328,6 +337,7 @@ static struct cpuidle_driver omap3430_id
.owner= THIS_MODULE,
.states = {
{
+   .flags= CPUIDLE_FLAG_RCU_IDLE,
.enter= omap3_enter_idle_bm,
.exit_latency = 110 + 162,
.target_residency = 5,
@@ -335,6 +345,7 @@ static struct cpuidle_driver 
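When CPUIDLE_FLAG_RCU_IDLE is set, the cpuidle core skips its own rcu_idle_enter()/rcu_idle_exit() (see the flag checks in cpuidle_enter_state() in patch 05 above) and leaves that to the driver. The following is a minimal userspace sketch of that dispatch. MODEL_FLAG_RCU_IDLE, struct model_state and the counters are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; the kernel defines CPUIDLE_FLAG_RCU_IDLE itself. */
#define MODEL_FLAG_RCU_IDLE (1u << 0)

struct model_state {
	unsigned int flags;
	void (*enter)(void);
};

static int rcu_idle_depth;		/* > 0 means RCU is not watching */
static bool driver_saw_rcu_watching;	/* recorded on entry to ->enter() */

static void model_rcu_idle_enter(void) { rcu_idle_depth++; }
static void model_rcu_idle_exit(void)  { rcu_idle_depth--; }

/* Legacy driver: expects the core to have entered RCU-idle already. */
static void legacy_enter(void)
{
	driver_saw_rcu_watching = (rcu_idle_depth == 0);
	/* ... wait-for-interrupt would happen here ... */
}

/* Converted driver, like omap3_enter_idle(): does RCU-idle itself. */
static void flagged_enter(void)
{
	driver_saw_rcu_watching = (rcu_idle_depth == 0);
	/* ... PM notifiers and clock/power-domain code can run here ... */
	model_rcu_idle_enter();
	/* ... wait-for-interrupt ... */
	model_rcu_idle_exit();
}

/* Core side, mirroring the flag checks in cpuidle_enter_state(). */
static void model_cpuidle_enter(const struct model_state *s)
{
	bool core_does_rcu = !(s->flags & MODEL_FLAG_RCU_IDLE);

	if (core_does_rcu)
		model_rcu_idle_enter();
	s->enter();
	if (core_does_rcu)
		model_rcu_idle_exit();
}
```

A flagged driver therefore starts with RCU watching and can run its notifier and domain code normally before going RCU-idle itself.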

[PATCH 09/36] cpuidle,imx6: Push RCU-idle into driver

2022-06-08 Thread Peter Zijlstra


Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is daft.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-imx/cpuidle-imx6sx.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/arm/mach-imx/cpuidle-imx6sx.c
+++ b/arch/arm/mach-imx/cpuidle-imx6sx.c
@@ -47,7 +47,9 @@ static int imx6sx_enter_wait(struct cpui
cpu_pm_enter();
cpu_cluster_pm_enter();
 
+   rcu_idle_enter();
cpu_suspend(0, imx6sx_idle_finish);
+   rcu_idle_exit();
 
cpu_cluster_pm_exit();
cpu_pm_exit();
@@ -87,7 +89,8 @@ static struct cpuidle_driver imx6sx_cpui
 */
.exit_latency = 300,
.target_residency = 500,
-   .flags = CPUIDLE_FLAG_TIMER_STOP,
+   .flags = CPUIDLE_FLAG_TIMER_STOP |
+CPUIDLE_FLAG_RCU_IDLE,
.enter = imx6sx_enter_wait,
.name = "LOW-POWER-IDLE",
.desc = "ARM power off",




[PATCH 34/36] cpuidle,omap3: Push RCU-idle into omap_sram_idle()

2022-06-08 Thread Peter Zijlstra


OMAP3 uses full SoC suspend modes as idle states; as such, it needs the
whole power-domain and clock-domain code from the idle path.

None of that code is suitable to run with RCU disabled, so push
RCU-idle deeper still.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/mach-omap2/cpuidle34xx.c |4 +---
 arch/arm/mach-omap2/pm.h  |2 +-
 arch/arm/mach-omap2/pm34xx.c  |   12 ++--
 3 files changed, 12 insertions(+), 6 deletions(-)

--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -133,9 +133,7 @@ static int omap3_enter_idle(struct cpuid
}
 
/* Execute ARM wfi */
-   cpuidle_rcu_enter();
-   omap_sram_idle();
-   cpuidle_rcu_exit();
+   omap_sram_idle(true);
 
/*
 * Call idle CPU PM enter notifier chain to restore
--- a/arch/arm/mach-omap2/pm.h
+++ b/arch/arm/mach-omap2/pm.h
@@ -29,7 +29,7 @@ static inline int omap4_idle_init(void)
 
 extern void *omap3_secure_ram_storage;
 extern void omap3_pm_off_mode_enable(int);
-extern void omap_sram_idle(void);
+extern void omap_sram_idle(bool rcuidle);
 extern int omap_pm_clkdms_setup(struct clockdomain *clkdm, void *unused);
 
 #if defined(CONFIG_PM_OPP)
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -174,7 +175,7 @@ static int omap34xx_do_sram_idle(unsigne
return 0;
 }
 
-void omap_sram_idle(void)
+void omap_sram_idle(bool rcuidle)
 {
/* Variable to tell what needs to be saved and restored
 * in omap_sram_idle*/
@@ -254,11 +255,18 @@ void omap_sram_idle(void)
 */
if (save_state)
omap34xx_save_context(omap3_arm_context);
+
+   if (rcuidle)
+   cpuidle_rcu_enter();
+
if (save_state == 1 || save_state == 3)
cpu_suspend(save_state, omap34xx_do_sram_idle);
else
omap34xx_do_sram_idle(save_state);
 
+   if (rcuidle)
+   cpuidle_rcu_exit();
+
/* Restore normal SDRC POWER settings */
if (cpu_is_omap3430() && omap_rev() >= OMAP3430_REV_ES3_0 &&
(omap_type() == OMAP2_DEVICE_TYPE_EMU ||
@@ -316,7 +324,7 @@ static int omap3_pm_suspend(void)
 
omap3_intc_suspend();
 
-   omap_sram_idle();
+   omap_sram_idle(false);
 
 restore:
/* Restore next_pwrsts */
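The new rcuidle parameter lets each caller of omap_sram_idle() decide whether RCU-idle is entered, and limits it to the low-level suspend itself. Here is a minimal userspace sketch with hypothetical model_* names. The flags record whether RCU was watching at each step. The cpuidle path passes true and the system-suspend path passes false. Context save and restore run with RCU watching in both cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model; every name here is hypothetical. */
static bool rcu_watching = true;
static bool rcu_at_save;	/* was RCU watching during context save? */
static bool rcu_at_wfi;		/* was RCU watching at the low-level idle? */

static void model_cpuidle_rcu_enter(void) { rcu_watching = false; }
static void model_cpuidle_rcu_exit(void)  { rcu_watching = true; }

/* Stand-in for the power/clock-domain and context-save work. */
static void model_save_context(void) { rcu_at_save = rcu_watching; }

/* Stand-in for cpu_suspend()/omap34xx_do_sram_idle(). */
static void model_do_sram_idle(void) { rcu_at_wfi = rcu_watching; }

static void model_omap_sram_idle(bool rcuidle)
{
	model_save_context();
	if (rcuidle)
		model_cpuidle_rcu_enter();
	model_do_sram_idle();
	if (rcuidle)
		model_cpuidle_rcu_exit();
	/* SDRC and power-domain restore then runs with RCU watching again. */
}
```

Passing a flag avoids duplicating omap_sram_idle() for the idle and suspend paths, at the cost of one extra parameter.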




[PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-08 Thread Peter Zijlstra


The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console
tracepoint"), was printk usage from the cpuidle path where RCU was
already disabled.

Per the patches earlier in this series, this is no longer the case.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/printk/printk.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2238,7 +2238,7 @@ static u16 printk_sprint(char *text, u16
}
}
 
-   trace_console_rcuidle(text, text_len);
+   trace_console(text, text_len);
 
return text_len;
 }




[PATCH 36/36] cpuidle,clk: Remove trace_.*_rcuidle()

2022-06-08 Thread Peter Zijlstra


OMAP was the one and only user.

Signed-off-by: Peter Zijlstra (Intel) 
---
 drivers/clk/clk.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -978,12 +978,12 @@ static void clk_core_disable(struct clk_
if (--core->enable_count > 0)
return;
 
-   trace_clk_disable_rcuidle(core);
+   trace_clk_disable(core);
 
if (core->ops->disable)
core->ops->disable(core->hw);
 
-   trace_clk_disable_complete_rcuidle(core);
+   trace_clk_disable_complete(core);
 
clk_core_disable(core->parent);
 }
@@ -1037,12 +1037,12 @@ static int clk_core_enable(struct clk_co
if (ret)
return ret;
 
-   trace_clk_enable_rcuidle(core);
+   trace_clk_enable(core);
 
if (core->ops->enable)
ret = core->ops->enable(core->hw);
 
-   trace_clk_enable_complete_rcuidle(core);
+   trace_clk_enable_complete(core);
 
if (ret) {
clk_core_disable(core->parent);




[PATCH 22/36] arm,smp: Remove trace_.*_rcuidle() usage

2022-06-08 Thread Peter Zijlstra


None of these functions should ever be run with RCU disabled anymore.

Specifically, do_handle_IPI() is only called from handle_IPI(), which
explicitly calls irq_enter()/irq_exit(), ensuring RCU is watching.

The problem with smp_cross_call() was, per commit 7c64cc0531fa ("arm: Use
_rcuidle for smp_cross_call() tracepoints"), that
cpuidle_enter_state_coupled() already had RCU disabled, but that's
long been fixed by commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle
deeper into the idle path").

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/arm/kernel/smp.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -639,7 +639,7 @@ static void do_handle_IPI(int ipinr)
unsigned int cpu = smp_processor_id();
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_entry_rcuidle(ipi_types[ipinr]);
+   trace_ipi_entry(ipi_types[ipinr]);
 
switch (ipinr) {
case IPI_WAKEUP:
@@ -686,7 +686,7 @@ static void do_handle_IPI(int ipinr)
}
 
if ((unsigned)ipinr < NR_IPI)
-   trace_ipi_exit_rcuidle(ipi_types[ipinr]);
+   trace_ipi_exit(ipi_types[ipinr]);
 }
 
 /* Legacy version, should go away once all irqchips have been converted */
@@ -709,7 +709,7 @@ static irqreturn_t ipi_handler(int irq,
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
 {
-   trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
+   trace_ipi_raise(target, ipi_types[ipinr]);
__ipi_send_mask(ipi_desc[ipinr], target);
 }
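The argument for do_handle_IPI() is that irq_enter() makes RCU watch even when the interrupted CPU was RCU-idle. The sketch below models this in userspace C. The counters and model_* names are hypothetical. The assert() in model_trace_ipi_entry() stands in for the requirement that a plain (non-_rcuidle) tracepoint runs only while RCU is watching.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model; the names and counters are hypothetical. */
static bool cpu_rcu_idle;	/* CPU is RCU-idle, e.g. in the idle loop */
static int irq_nesting;		/* > 0 while inside irq_enter()/irq_exit() */
static int trace_events;

static bool model_rcu_is_watching(void)
{
	return !cpu_rcu_idle || irq_nesting > 0;
}

static void model_irq_enter(void) { irq_nesting++; }
static void model_irq_exit(void)  { irq_nesting--; }

/* A plain (non-_rcuidle) tracepoint requires RCU to be watching. */
static void model_trace_ipi_entry(void)
{
	assert(model_rcu_is_watching());
	trace_events++;
}

/* Like handle_IPI() wrapping do_handle_IPI(). */
static void model_handle_ipi(void)
{
	model_irq_enter();
	model_trace_ipi_entry();
	model_irq_exit();
}
```

Even with the CPU RCU-idle, the IPI path traces safely. Once the handler returns, RCU stops watching again.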
 




Re: [PATCH] kprobes: Enable tracing for monolithic kernel images

2022-06-08 Thread Google
ernel.org>, Masahiro Yamada , Jarkko Sakkinen 
, Sami Tolvanen , "Naveen N. Rao" 
, Marco Elver , Kees Cook 
, Steven Rostedt , Nathan 
Chancellor , "Russell King \(Oracle\)" 
, Mark Brown , Borislav Petkov 
, Alexander Egorenkov , Thomas 
Bogendoerfer , Parisc List 
, Nathaniel McCallum , 
Dmitry Torokhov , "David S. Miller" 
, "Kirill A. Shutemov" , 
Tobias Huschle , "Peter Zijlstra \(Intel\)" 
, "H. Peter Anvin" , sparclinux 
, Tiezhu Yang , Miroslav 
Benes , Chen Zhongjin , Ard Biesheuvel 
, the arch/x86 maintainers , Russell King 
, linux-riscv , Ingo 
Molnar , Aaron Tomlin , Albert Ou 
, Heiko Carstens , Liao Chang 
, Paul Walmsley , Josh 
Poimboeuf , Thomas Richter , "open 
list:BROADCOM NVRAM DRIVER" , Changbin Du 
, Palmer Dabbelt , linuxppc-dev 
, linux-modu...@vger.kernel.org


Hi Jarkko,

On Wed, 8 Jun 2022 08:25:38 +0300
Jarkko Sakkinen  wrote:

> On Wed, Jun 08, 2022 at 10:35:42AM +0800, Guo Ren wrote:
> > .
> > 
> > On Wed, Jun 8, 2022 at 8:02 AM Jarkko Sakkinen  wrote:
> > >
> > > Tracing with kprobes while running a monolithic kernel is currently
> > > impossible because CONFIG_KPROBES is dependent on CONFIG_MODULES.  This
> > > dependency is a result of kprobes code using the module allocator for the
> > > trampoline code.
> > >
> > > Detaching kprobes from modules helps to squeeze down the user space,
> > > e.g. when developing new core kernel features, while still having all
> > > the nice tracing capabilities.
> > >
> > > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > > or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> > > code with CONFIG_MODULES.
> > >
> > > As a result, kprobes can be used with a monolithic kernel.
> > It's strange that vmlinux still contains module_alloc when MODULES is n.
> > 
> > Maybe we need a kprobe_alloc, right?
> 
> Perhaps not the best name but at least it documents the fact that
> they use the same allocator.
> 
> A few years ago I carved out something "half-way there" for kprobes,
> and I used the name text_alloc() [*].
> 
> [*] 
> https://lore.kernel.org/all/20200724050553.1724168-1-jarkko.sakki...@linux.intel.com/
>  

Yeah, I remember that. Thank you for updating your patch!
I think the idea (splitting module_alloc() from CONFIG_MODULES) is good.
If the module support maintainers don't like this name, you could
rename it to text_alloc() and make module_alloc() a wrapper around
it.

Acked-by: Masami Hiramatsu (Google) 
for kprobe side.

Thank you,

-- 
Masami Hiramatsu (Google) 
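Masami's suggestion can be pictured as a generic allocator that is built when either option is enabled, with module_alloc() kept as a thin wrapper for modules only. This is a hypothetical sketch, not the patch. The MODEL_CONFIG_* macros, alloc_insn_page_model(), and the use of malloc() in place of an architecture-specific executable-memory allocator are all assumptions. text_alloc() is only the name from Jarkko's 2020 series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical config: kprobes enabled, modules disabled (monolithic). */
#define MODEL_CONFIG_KPROBES 1

#if defined(MODEL_CONFIG_MODULES) || defined(MODEL_CONFIG_KPROBES)
/* Generic allocator; malloc() stands in for arch executable memory. */
static void *text_alloc(size_t size)
{
	return malloc(size);
}

static void text_free(void *p)
{
	free(p);
}
#endif

#ifdef MODEL_CONFIG_MODULES
/* Only modules keep the old name, as a thin wrapper. */
static void *module_alloc(size_t size)
{
	return text_alloc(size);
}
#endif

#ifdef MODEL_CONFIG_KPROBES
/* A kprobes trampoline page needs only the generic allocator. */
static void *alloc_insn_page_model(void)
{
	return text_alloc(4096);
}
#endif
```

With this split, a kernel built without module support never references module_alloc() at all, which addresses the naming concern raised above.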


Re: [PATCH V12 02/20] uapi: always define F_GETLK64/F_SETLK64/F_SETLKW64 in fcntl.h

2022-06-08 Thread Eugene Syromiatnikov
On Tue, Apr 05, 2022 at 03:12:56PM +0800, guo...@kernel.org wrote:
> From: Christoph Hellwig 
> 
> Note that before this change they were never visible to userspace due
> to the fact that CONFIG_64BIT is only set for kernel builds.

> -#ifndef CONFIG_64BIT
> +#if __BITS_PER_LONG == 32 || defined(__KERNEL__)

Actually, it's quite the opposite: the "ifndef" usage made them available at
all times to userspace, and this change has actually broken building strace
with the latest kernel headers [1][2].  There could be some debate about
whether these F_*64 definitions should be exposed to 64-bit userspace
applications, but there seems to have been no harm in it (they had already
been exposed for quite some time), and they are useful, at least for strace,
for tracing compat applications.

[1] 
https://github.com/strace/strace/runs/6779763146?check_suite_focus=true#step:4:3222
[2] 
https://pipelines.actions.githubusercontent.com/serviceHosts/e5309ebd-8a2f-43f4-a212-b52080275b5d/_apis/pipelines/1/runs/1473/signedlogcontent/12?urlExpires=2022-06-08T09%3A37%3A13.9248496Z=HMACV1=fIT7vd0O4NNRwzwKWLXY4UVZBIIF3XiVI9skAsGvV0I%3D
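For the compat-tracing use case, a tracer that cannot count on the uapi header exposing F_GETLK64 and friends could carry its own table. This is a minimal sketch that assumes the values from include/uapi/asm-generic/fcntl.h (12, 13, 14). Some architectures override those values. The GENERIC_* macros and compat_fcntl_cmd_name() are hypothetical names, not strace code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: values from include/uapi/asm-generic/fcntl.h. Some
 * architectures override them in their own asm/fcntl.h, so a real
 * tracer would need per-architecture tables.
 */
#define GENERIC_F_GETLK64  12
#define GENERIC_F_SETLK64  13
#define GENERIC_F_SETLKW64 14

/* Name an fcntl command issued by a 32-bit (compat) tracee. */
static const char *compat_fcntl_cmd_name(int cmd)
{
	switch (cmd) {
	case GENERIC_F_GETLK64:  return "F_GETLK64";
	case GENERIC_F_SETLK64:  return "F_SETLK64";
	case GENERIC_F_SETLKW64: return "F_SETLKW64";
	default:                 return NULL;
	}
}
```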



Re: [PATCH v4,1/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr

2022-06-08 Thread wenhu wang

-邮件原件-
发件人: Christophe Leroy  
发送时间: 2022年5月28日 15:03
收件人: 王文虎 ; ker...@vivo.com
抄送: Greg Kroah-Hartman ; Arnd Bergmann 
; Scott Wood ; Michael Ellerman 
; Randy Dunlap ; 
linuxppc-dev@lists.ozlabs.org
主题: Re: [PATCH v4,1/5] powerpc: sysdev: fix compile error for fsl_85xx_l2ctlr



On 15/03/2022 at 13:45, Christophe Leroy wrote:
> 
> 
> On 24/04/2020 at 10:58, Wang Wenhu wrote:
>> Include "linux/of_address.h" to fix the compile error for
>> mpc85xx_l2ctlr_of_probe() when compiling fsl_85xx_cache_sram.c.
>>
>>    CC  arch/powerpc/sysdev/fsl_85xx_l2ctlr.o
>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c: In function
>> ‘mpc85xx_l2ctlr_of_probe’:
>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:11: error: implicit 
>> declaration of function ‘of_iomap’; did you mean ‘pci_iomap’?
>> [-Werror=implicit-function-declaration]
>>    l2ctlr = of_iomap(dev->dev.of_node, 0);
>>     ^~~~
>>     pci_iomap
>> arch/powerpc/sysdev/fsl_85xx_l2ctlr.c:90:9: error: assignment makes 
>> pointer from integer without a cast [-Werror=int-conversion]
>>    l2ctlr = of_iomap(dev->dev.of_node, 0);
>>   ^
>> cc1: all warnings being treated as errors
>> scripts/Makefile.build:267: recipe for target 
>> 'arch/powerpc/sysdev/fsl_85xx_l2ctlr.o' failed
>> make[2]: *** [arch/powerpc/sysdev/fsl_85xx_l2ctlr.o] Error 1
>>
>> Cc: Greg Kroah-Hartman 
>> Cc: Arnd Bergmann 
>> Cc: Christophe Leroy 
>> Cc: Scott Wood 
>> Cc: Michael Ellerman 
>> Cc: Randy Dunlap 
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Fixes: 6db92cc9d07d ("powerpc/85xx: add cache-sram support")
>> Reviewed-by: Christophe Leroy 
>> Signed-off-by: Wang Wenhu 
> 
> Is there still an interest for this series ?
> 
> I see there is even a v5 at
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=172421;state=*
> although I can't find it in my emails.
> 
> If so do you plan to send any update of it at some point ?
> 
> Otherwise, as CONFIG_FSL_85XX_CACHE_SRAM is not user selectable and no 
> driver selects it, I think the time has come to remove it completely.
> 
Thanks for the reference. As the v5 series failed to reach an agreement, we
currently use it out of tree. I will try a new version with the whole driver
implemented as a UIO module.
So it's OK to remove the version here.

CONFIG_FSL_85XX_CACHE_SRAM has now been removed.

See
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=dc21ed2aef4150fc2fcf58227a4ff24502015c03


Re: [PATCH] kprobes: Enable tracing for monolithic kernel images

2022-06-08 Thread Jarkko Sakkinen


On Wed, Jun 08, 2022 at 10:35:42AM +0800, Guo Ren wrote:
> 
> On Wed, Jun 8, 2022 at 8:02 AM Jarkko Sakkinen  wrote:
> >
> > Tracing with kprobes while running a monolithic kernel is currently
> > impossible because CONFIG_KPROBES depends on CONFIG_MODULES.  This
> > dependency is a result of kprobes code using the module allocator for the
> > trampoline code.
> >
> > Detaching kprobes from modules helps to squeeze down the user space,
> > e.g. when developing new core kernel features, while still having all
> > the nice tracing capabilities.
> >
> > For kernel/ and arch/*, move module_alloc() and module_memfree() to
> > module_alloc.c, and compile as part of vmlinux when either CONFIG_MODULES
> > or CONFIG_KPROBES is enabled.  In addition, flag kernel module specific
> > code with CONFIG_MODULES.
> >
> > As a result, kprobes can be used with a monolithic kernel.
> It's strange that when MODULES is n, vmlinux still contains module_alloc.
> 
> Maybe we need a kprobe_alloc, right?

Perhaps not the best name but at least it documents the fact that
they use the same allocator.

A few years ago I carved up something "half-way there" for kprobes,
and I used the name text_alloc() [*].

[*] https://lore.kernel.org/all/20200724050553.1724168-1-jarkko.sakki...@linux.intel.com/

BR, Jarkko


[Bug 216041] Stack overflow at boot (do_IRQ: stack overflow: 1984) on a PowerMac G4 DP, KASAN debug build

2022-06-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216041

--- Comment #7 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 301129
  --> https://bugzilla.kernel.org/attachment.cgi?id=301129&action=edit
kernel .config (5.19-rc1, Outline KASAN + patches, PowerMac G4 DP)

I tried to re-investigate this issue with a KASAN build of v5.19-rc1, but it
seems it's not quite there.

I applied the 2 patches "powerpc-kasan-Force-thread-size-increase-with-KASAN"
and
"v2-powerpc-irq-Increase-stack_overflow-detection-limit-when-KASAN-is-enabled"
on top of v5.19-rc1, but the resulting kernel does not boot. It starts booting
but gets stuck on a white screen reading

"done
found display: /pci@f000/ATY,AlteracParent@10/ATY,Alterac_B@1, opening..."

Kernel with same config but with KFENCE instead of KASAN boots fine (see bug
#216095).

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 216095] sysfs: cannot create duplicate filename '/devices/platform/of-display'

2022-06-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216095

--- Comment #1 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 301128
  --> https://bugzilla.kernel.org/attachment.cgi?id=301128&action=edit
kernel .config (5.19-rc1, PowerMac G4 DP)


[Bug 216095] New: sysfs: cannot create duplicate filename '/devices/platform/of-display'

2022-06-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216095

            Bug ID: 216095
           Summary: sysfs: cannot create duplicate filename
                    '/devices/platform/of-display'
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 5.19-rc1
          Hardware: PPC-32
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-32
          Assignee: platform_ppc...@kernel-bugs.osdl.org
          Reporter: erhar...@mailbox.org
        Regression: No

Created attachment 301127
  --> https://bugzilla.kernel.org/attachment.cgi?id=301127&action=edit
dmesg (5.19-rc1, PowerMac G4 DP)

[...]
sysfs: cannot create duplicate filename '/devices/platform/of-display'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-PMacG4+ #3
Call Trace:
[e9025cc0] [c05984d0] dump_stack_lvl+0x60/0x90 (unreliable)
[e9025ce0] [c02f043c] sysfs_warn_dup+0x64/0x84
[e9025d00] [c02f05cc] sysfs_create_dir_ns+0xfc/0x118
[e9025d30] [c059ffa4] kobject_add_internal+0x114/0x2f0
[e9025d60] [c05a0790] kobject_add+0x80/0xf0
[e9025da0] [c064c3d8] device_add+0x114/0x94c
[e9025e10] [c06f197c] of_platform_device_create_pdata+0xb8/0x144
[e9025e40] [c0c43bb4] of_platform_default_populate_init+0x284/0x2f4
[e9025e70] [c0007a94] do_one_initcall+0x50/0x294
[e9025ee0] [c0c03ff0] kernel_init_freeable+0x228/0x334
[e9025f20] [c0007efc] kernel_init+0x28/0x144
[e9025f40] [c0019334] ret_from_kernel_thread+0x5c/0x64
kobject_add_internal failed for of-display with -EEXIST, don't try to register
things with the same name in the same directory.


Re: [PATCH] PCI/ERR: handle disconnected devices in report_error_detected

2022-06-08 Thread Bjorn Helgaas
On Wed, Jun 01, 2022 at 09:40:24AM +0200, Christoph Hellwig wrote:
> When a device is already unplugged by pciehp by the time that the AER
> handler is invoked, the PCIe device will already be in the
> pci_channel_io_perm_failure state.  In that case we should simply
> return PCI_ERS_RESULT_DISCONNECT instead of trying to do a state
> transition that will fail.
> 
> Also untangle the state transition failure from the lack of methods to
> improve the debugging output in case it ever happens again.
> 
> Signed-off-by: Christoph Hellwig 

Applied with Sathy's reviewed-by to pci/err for v5.20, thanks!

> ---
>  drivers/pci/pcie/err.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 0c5a143025af4..59c90d04a609a 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -55,10 +55,14 @@ static int report_error_detected(struct pci_dev *dev,
>  
>  	device_lock(&dev->dev);
>  	pdrv = dev->driver;
> -	if (!pci_dev_set_io_state(dev, state) ||
> -	    !pdrv ||
> -	    !pdrv->err_handler ||
> -	    !pdrv->err_handler->error_detected) {
> +	if (pci_dev_is_disconnected(dev)) {
> +		vote = PCI_ERS_RESULT_DISCONNECT;
> +	} else if (!pci_dev_set_io_state(dev, state)) {
> +		pci_info(dev, "can't recover (state transition %u -> %u invalid)\n",
> +			dev->error_state, state);
> +		vote = PCI_ERS_RESULT_NONE;
> +	} else if (!pdrv || !pdrv->err_handler ||
> +		   !pdrv->err_handler->error_detected) {
>  		/*
>  		 * If any device in the subtree does not have an error_detected
>  		 * callback, PCI_ERS_RESULT_NO_AER_DRIVER prevents subsequent
> -- 
> 2.30.2
> 


Re: ppc64le bzImage and Build_id elf note

2022-06-08 Thread Segher Boessenkool
On Wed, Jun 08, 2022 at 09:52:25PM +1000, Michael Ellerman wrote:
> What's the motivation for using the zImage?

You cannot boot a vmlinux from OF directly, with most PowerPC OF
implementations.  You *can* boot a zImage directly (with some
trickery sometimes).  That is one reason to use zImage, probably not
a very important reason for most people though.


Segher


Re: [PATCH] backlight: Use backlight helper

2022-06-08 Thread Daniel Thompson
^^^
Subject seems a bit generic...

On Tue, Jun 07, 2022 at 08:34:11PM +0200, Stephen Kitt wrote:
> diff --git a/drivers/macintosh/via-pmu-backlight.c b/drivers/macintosh/via-pmu-backlight.c
> index 2194016122d2..c2d87e7fa85b 100644
> --- a/drivers/macintosh/via-pmu-backlight.c
> +++ b/drivers/macintosh/via-pmu-backlight.c
> @@ -71,12 +71,7 @@ static int pmu_backlight_get_level_brightness(int level)
>  static int __pmu_backlight_update_status(struct backlight_device *bd)
>  {
>  	struct adb_request req;
> -	int level = bd->props.brightness;
> -
> -
> -	if (bd->props.power != FB_BLANK_UNBLANK ||
> -	    bd->props.fb_blank != FB_BLANK_UNBLANK)
> -		level = 0;
> +	int level = backlight_get_brightness(bd);

Other than that LGTM.


Daniel.


Re: ppc64le bzImage and Build_id elf note

2022-06-08 Thread Michael Ellerman
Donald Zickus  writes:
> Hi Michael,
>
> I am working on two packaging issues with Fedora and CKI that I am hoping
> you can give me some guidance on.
>
> 1 - Fedora has always packaged an eu-strip'd vmlinux file for powerpc.  The
> other arches we support use native compressed images.  I was looking into
> using powerpc's zImage (pseries) binary to remove the powerpc workarounds
> in our rpm spec file.

What's the motivation for using the zImage?

My naive hope was that as more advanced boot loaders become the norm we
could eventually get rid of the zImage.

It's generally a pain to work with and a bit crufty. It also doesn't
get as much testing as booting the vmlinux, so I'd be a little wary of
switching to it.

There are also multiple zImages (and others), although admittedly for the
platforms that Fedora supports the zImage.pseries should work (I think).

> However, the rpmbuild fails because it can't find a build-id with
> eu-readelf -n zImage.  Sure enough the build-id is found in vmlinux and
> vmlinux.stripped but disappears with vmlinux.stripped.gz.

Looks like other arches use objcopy rather than strip, maybe that's it?

> I had hoped
> arch/powerpc/boot/addnote would stick it back in but it doesn't (I am
> ignorant of how addnote works).

addnote adds some notes that firmware needs to read, it doesn't do
anything else, though maybe it could.

> eu-readelf -n  data
> vmlinux:
>
> Displaying notes found in: .notes
>   Owner                 Data size  Description
>   GNU                   0x0014     NT_GNU_BUILD_ID (unique build ID bitstring)
>     Build ID: b4c026d72ead7b4316a221cddb7f2b10d75fb313
>   Linux                 0x0004     func
>     description data: 00 00 00 00
>   Linux                 0x0001     OPEN
>     description data: 00
>   PowerPC               0x0004     NT_VERSION (version)
>     description data: 01 00 00 00
>
> zImage:
>
> Displaying notes found at file offset 0x0158 with length 0x002c:
>   Owner                 Data size  Description
>   PowerPC               0x0018     Unknown note type: (0x1275)
>     description data: ff ff ff ff 02 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff 00 00 40 00
>
> Displaying notes found at file offset 0x0184 with length 0x0044:
>   Owner                 Data size  Description
>   IBM,RPA-Client-[...]  0x0020     Unknown note type: (0x1275)
>     description data: 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 28 00 00 00 01 ff ff ff ff 00 00 00 00 00 00 00 01
>
> Is this something that can be addressed?  Or should I/we expect the
> build-id to never make it into the zImage and just continue with our
> current vmlinux process?

Maybe :)

Is it correct for the build-id to be copied into the zImage anyway? It's
a different binary so shouldn't it have a different build-id?

If you have a zImage and a vmlinux with the same build-id isn't that
going to confuse debugging tools?

> 2 - CKI builds kernels using 'make targz-pkg'.  The arches we support
> (x86_64, s390, aarch64) provide compressed binaries to package using
> KBUILD_IMAGE or a specific entry in scripts/package/buildtar.  As a result,
> because powerpc doesn't have a KBUILD_IMAGE variable defined, the script
> builds vmlinux and cp's that to vmlinux-kbuild.  The problem with powerpc is
> that vmlinux for us is huge ( >256MB) and cp'ing that to vmlinux-kbuild
> occupies > 512MB of /boot and our distro runs out of disk space on that
> partition.

Is that just because it has debug info built in? I thought the distro
solution for that was doing split debug info?

> I was hoping to add a patch to arch/powerpc/Makefile that defines
> KBUILD_IMAGE:=$(boot)/zImage (mimicking arch/s390), which I believe would
> solve our problem.  However, that circles back to our first problem above.
>
> Thoughts?  Help?

Happy to try and help, though see my concerns at the top about using zImage.

cheers


Re: [PATCH 5/6] scsi: remove stale BusLogic driver

2022-06-08 Thread Arnd Bergmann
On Mon, Jun 6, 2022 at 6:35 PM Khalid Aziz  wrote:
> On 6/6/22 02:41, Arnd Bergmann wrote: From: Arnd Bergmann
>
> I would say no to removing BusLogic driver. Virtualbox is another
> consumer of this driver. This driver is very old but I would rather fix
> the issues than remove it while it still has users.

Maciej already offered to help fix the driver, so I think it will be ok.

On the other hand, it sounds like VirtualBox users should not actually try to
use the BusLogic driver with modern Linux guests. From what I can tell
from the documentation [1], VirtualBox only provides this emulation because it
was shipped with early versions of VMware and is supported by Windows 2000
and earlier, but not actually on any modern Windows guest. The VMware
documentation in turn explicitly says that BusLogic does not work with 64-bit
guests [2], presumably this applies to both Windows and Linux guests.

Arnd

[1] https://www.virtualbox.org/manual/ch05.html#harddiskcontrollers
[2] https://kb.vmware.com/s/article/2010470