Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Michael Ellerman
"Nicholas Piggin"  writes:
> On Fri Oct 7, 2022 at 9:23 AM AEST, Segher Boessenkool wrote:
>> On Fri, Oct 07, 2022 at 07:56:09AM +1000, Nicholas Piggin wrote:
>> > On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
>> > > On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
...
>> > > > +# No AltiVec or VSX or MMA instructions when building kernel
>> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
>> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
>> > > > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
>> > >
>> > > MMA code is never generated unless the code asks for it explicitly.
>> > > This is fundamental, not just an implementation side effect.
>> > 
>> > Well, now it double won't be generated :)
>>
>> Yeah, but there are many other things you can unnecessarily disable as
>> well!  :-)
>>
>> VMX and VSX are disabled here because the compiler *will* use those
>> registers if it feels like it (that is, if it thinks that will be
>> faster).  MMA is a very different beast: the compiler can never know if
>> it will be faster, to start with.
>
> True, but now I don't have to find the exact clause and have my lawyer
> confirm that it definitely probably won't change in future and break
> things.

Right. If someone asks "does the kernel ever use MMA instructions?" we
can just point at that line and we have a definite answer. No need to
audit the behaviour of all GCC and Clang versions ever released.

cheers


[PATCH] powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU

2022-10-06 Thread Haren Myneni


The hypervisor assigns VAS (Virtual Accelerator Switchboard)
windows based on the cores configured in the LPAR, and the kernel
uses the OF reconfig notifier to reconfigure VAS windows on DLPAR
CPU events. For a partition in shared CPU mode, however, the
hypervisor assigns VAS windows based on the CPU entitled capacity,
not on the number of vCPUs. When the user changes the CPU entitled
capacity for the partition, drmgr uses the /proc/ppc64/lparcfg
interface to notify the kernel.

This patch adds the following changes to update VAS resources
for shared mode:
- Call the VAS window reconfig from lparcfg_write()
- Ignore reconfig changes in the VAS notifier
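
For illustration (the value is made up, not part of the patch): in
shared mode the reconfig is driven by an entitled-capacity update
written through the same lparcfg interface that drmgr already uses,
e.g.

  echo "partition_entitled_capacity=200" > /proc/ppc64/lparcfg

and with this patch a successful update also calls
pseries_vas_dlpar_cpu() to resize the VAS window count.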

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/lparcfg.c | 15 
 arch/powerpc/platforms/pseries/vas.c | 44 
 arch/powerpc/platforms/pseries/vas.h |  5 +++
 3 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index 507dc0b5987d..23f9c96e9abc 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -35,6 +35,7 @@
 #include 
 
 #include "pseries.h"
+#include "vas.h"   /* pseries_vas_dlpar_cpu() */
 
 /*
  * This isn't a module but we expose that to userspace
@@ -748,6 +749,20 @@ static ssize_t lparcfg_write(struct file *file, const char __user * buf,
return -EINVAL;
 
retval = update_ppp(new_entitled_ptr, NULL);
+
+   if (retval == H_SUCCESS || retval == H_CONSTRAINED) {
+   /*
+* The hypervisor assigns VAS resources based
+* on entitled capacity for shared mode.
+* Reconfig VAS windows based on DLPAR CPU events.
+* Returns success, -EIO, -EINVAL, or -ENOMEM
+*/
+   retval = pseries_vas_dlpar_cpu();
+   if (!retval)
+   retval = count;
+
+   return retval;
+   }
} else if (!strcmp(kbuf, "capacity_weight")) {
char *endp;
*new_weight_ptr = (u8) simple_strtoul(tmp, &endp, 10);
diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 93f87ac126df..bf15586f2af7 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -857,6 +857,26 @@ int vas_reconfig_capabilties(u8 type, int new_nr_creds)
mutex_unlock(&vas_pseries_mutex);
return rc;
 }
+
+int pseries_vas_dlpar_cpu(void)
+{
+   int new_nr_creds, rc = 0;
+
+   rc = h_query_vas_capabilities(H_QUERY_VAS_CAPABILITIES,
+   vascaps[VAS_GZIP_DEF_FEAT_TYPE].feat,
(u64)virt_to_phys(&hv_cop_caps));
+   if (!rc) {
+   new_nr_creds = be16_to_cpu(hv_cop_caps.target_lpar_creds);
+   rc = vas_reconfig_capabilties(VAS_GZIP_DEF_FEAT_TYPE,
+   new_nr_creds);
+   }
+
+   if (rc)
+   pr_err("Failed reconfig VAS capabilities with DLPAR\n");
+
+   return rc;
+}
+
 /*
  * Total number of default credits available (target_credits)
  * in LPAR depends on number of cores configured. It varies based on
@@ -871,7 +891,15 @@ static int pseries_vas_notifier(struct notifier_block *nb,
struct of_reconfig_data *rd = data;
struct device_node *dn = rd->dn;
const __be32 *intserv = NULL;
-   int new_nr_creds, len, rc = 0;
+   int len;
+
+   /*
+* For shared CPU partition, the hypervisor assigns total credits
+* based on entitled core capacity. So updating VAS windows will
+* be called from lparcfg_write().
+*/
+   if (is_shared_processor())
+   return NOTIFY_OK;
 
if ((action == OF_RECONFIG_ATTACH_NODE) ||
(action == OF_RECONFIG_DETACH_NODE))
@@ -883,19 +911,7 @@ static int pseries_vas_notifier(struct notifier_block *nb,
if (!intserv)
return NOTIFY_OK;
 
-   rc = h_query_vas_capabilities(H_QUERY_VAS_CAPABILITIES,
-   vascaps[VAS_GZIP_DEF_FEAT_TYPE].feat,
(u64)virt_to_phys(&hv_cop_caps));
-   if (!rc) {
-   new_nr_creds = be16_to_cpu(hv_cop_caps.target_lpar_creds);
-   rc = vas_reconfig_capabilties(VAS_GZIP_DEF_FEAT_TYPE,
-   new_nr_creds);
-   }
-
-   if (rc)
-   pr_err("Failed reconfig VAS capabilities with DLPAR\n");
-
-   return rc;
+   return pseries_vas_dlpar_cpu();
 }
 
 static struct notifier_block pseries_vas_nb = {
diff --git a/arch/powerpc/platforms/pseries/vas.h b/arch/powerpc/platforms/pseries/vas.h
index a2cb12a31c17..7115043ec488 100644
--- a/arch/powerpc/platforms/pseries/vas.h
+++ b/arch/powerpc/platforms/pseries/vas.h

[powerpc:next] BUILD SUCCESS 94746890202cf18e5266b4de77895243e55b0a79

2022-10-06 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: 94746890202cf18e5266b4de77895243e55b0a79  powerpc: Don't add 
__powerpc_ prefix to syscall entry points

elapsed time: 720m

configs tested: 67
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arc defconfig
um i386_defconfig
s390 allmodconfig
um   x86_64_defconfig
alpha   defconfig
s390defconfig
s390 allyesconfig
arm defconfig
x86_64  defconfig
x86_64   rhel-8.3
i386 randconfig-a011-20221003
i386 randconfig-a012-20221003
powerpc   allnoconfig
i386 randconfig-a013-20221003
i386 randconfig-a015-20221003
x86_64   randconfig-a011-20221003
i386 randconfig-a016-20221003
x86_64   randconfig-a012-20221003
i386 randconfig-a014-20221003
x86_64  rhel-8.3-func
x86_64rhel-8.3-kselftests
x86_64   randconfig-a013-20221003
x86_64   allyesconfig
x86_64   randconfig-a015-20221003
x86_64   randconfig-a016-20221003
x86_64   randconfig-a014-20221003
x86_64   rhel-8.3-syz
x86_64 rhel-8.3-kunit
x86_64   rhel-8.3-kvm
i386defconfig
arm64allyesconfig
arm  allyesconfig
sh   allmodconfig
riscvrandconfig-r042-20221003
mips allyesconfig
arc  randconfig-r043-20221003
powerpc  allmodconfig
arc   allnoconfig
i386 allyesconfig
arc  randconfig-r043-20221002
alpha allnoconfig
s390 randconfig-r044-20221003
riscv allnoconfig
csky  allnoconfig
ia64 allmodconfig
alphaallyesconfig
m68k allmodconfig
arc  allyesconfig
m68k allyesconfig

clang tested configs:
x86_64   randconfig-a005-20221003
x86_64   randconfig-a004-20221003
x86_64   randconfig-a001-20221003
hexagon  randconfig-r041-20221003
x86_64   randconfig-a002-20221003
riscvrandconfig-r042-20221002
x86_64   randconfig-a003-20221003
x86_64   randconfig-a006-20221003
hexagon  randconfig-r041-20221002
s390 randconfig-r044-20221002
hexagon  randconfig-r045-20221002
hexagon  randconfig-r045-20221003
i386 randconfig-a006-20221003
i386 randconfig-a003-20221003
i386 randconfig-a001-20221003
i386 randconfig-a004-20221003
i386 randconfig-a002-20221003
i386 randconfig-a005-20221003

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


[PATCH v3] i2c/pasemi: PASemi I2C controller IRQ enablement

2022-10-06 Thread Arminder Singh
This patch adds IRQ support to the PASemi I2C controller driver to
increase the performance of I2C transactions on platforms with PASemi
I2C controllers. While primarily intended for Apple silicon platforms,
it should also make it possible to enable IRQ support on older PASemi
hardware if the need arises.

Signed-off-by: Arminder Singh 
---
This version of the patch has been tested on an M1 Ultra Mac Studio,
as well as an M1 MacBook Pro, and userspace launches successfully
while using the IRQ path for I2C transactions.

This version of the patch only contains fixes to the whitespace and
alignment issues found in v2 of the patch, and as such the testing that
Christian Zigotsky did on PASemi hardware for v2 of the patch also applies
to this version of the patch as well.
(See v2 patch email thread for the "Tested-by" tag)

v2 to v3 changes:
 - Fixed some whitespace and alignment issues found in v2 of the patch

v1 to v2 changes:
 - moved completion setup from pasemi_platform_i2c_probe to
   pasemi_i2c_common_probe to allow PASemi and Apple platforms to share
   common completion setup code in case PASemi hardware gets IRQ support
   added
 - initialized the status variable in pasemi_smb_waitready when going down
   the non-IRQ path
 - removed an unnecessary cast of dev_id in the IRQ handler
 - fixed alignment of struct member names in i2c-pasemi-core.h
   (addresses Christophe's feedback in the original submission)
 - IRQs are now disabled after the wait_for_completion_timeout call
   instead of inside the IRQ handler
   (prevents the IRQ from going off after the completion times out)
 - changed the request_irq call to a devm_request_irq call to obviate
   the need for a remove function and a free_irq call
   (thanks to Sven for pointing this out in the original submission)
 - added a reinit_completion call to pasemi_reset 
   as a failsafe to prevent missed interrupts from causing the completion
   to never complete (thanks to Arnd Bergmann for pointing this out)
 - removed the bitmask variable in favor of just using the value
   directly (it wasn't used anywhere else)

v2 linked here: 
https://lore.kernel.org/linux-i2c/mn2pr01mb535821c8058c7814b2f8eedf9f...@mn2pr01mb5358.prod.exchangelabs.com/
v1 linked here: 
https://lore.kernel.org/linux-i2c/mn2pr01mb535838492432c910f2381f929f...@mn2pr01mb5358.prod.exchangelabs.com/T/#m11b3504c2667517aad7521514c99ca0e07a9381f

Hopefully the patch is good to go this time around!
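
For reviewers without the platform hunk in front of them, the IRQ
hookup described in the changelog might look roughly like this (a
sketch only; apart from pasemi_irq_handler() and smbus->use_irq the
names here are assumptions, not verbatim from the patch; see
i2c-pasemi-platform.c in the full patch for the real code):

#include <linux/interrupt.h>
#include <linux/platform_device.h>

static int pasemi_platform_hook_irq(struct platform_device *pdev,
                                    struct pasemi_smbus *smbus)
{
        int irq = platform_get_irq_optional(pdev, 0);
        int err;

        if (irq <= 0)
                return 0;       /* no IRQ described: keep the polling path */

        /* devm_ variant: no free_irq or remove function needed */
        err = devm_request_irq(smbus->dev, irq, pasemi_irq_handler, 0,
                               "pasemi_apple_i2c", smbus);
        if (err)
                return err;

        smbus->use_irq = 1;
        return 0;
}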

 drivers/i2c/busses/i2c-pasemi-core.c | 29 
 drivers/i2c/busses/i2c-pasemi-core.h |  5 
 drivers/i2c/busses/i2c-pasemi-platform.c |  6 +
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-pasemi-core.c b/drivers/i2c/busses/i2c-pasemi-core.c
index 9028ffb58cc0..4855144b370e 100644
--- a/drivers/i2c/busses/i2c-pasemi-core.c
+++ b/drivers/i2c/busses/i2c-pasemi-core.c
@@ -21,6 +21,7 @@
 #define REG_MTXFIFO0x00
 #define REG_MRXFIFO0x04
 #define REG_SMSTA  0x14
+#define REG_IMASK  0x18
 #define REG_CTL0x1c
 #define REG_REV0x28
 
@@ -66,6 +67,7 @@ static void pasemi_reset(struct pasemi_smbus *smbus)
val |= CTL_EN;
 
reg_write(smbus, REG_CTL, val);
+   reinit_completion(&smbus->irq_completion);
 }
 
 static void pasemi_smb_clear(struct pasemi_smbus *smbus)
@@ -81,11 +83,18 @@ static int pasemi_smb_waitready(struct pasemi_smbus *smbus)
int timeout = 10;
unsigned int status;
 
-   status = reg_read(smbus, REG_SMSTA);
-
-   while (!(status & SMSTA_XEN) && timeout--) {
-   msleep(1);
+   if (smbus->use_irq) {
+   reinit_completion(&smbus->irq_completion);
+   reg_write(smbus, REG_IMASK, SMSTA_XEN | SMSTA_MTN);
+   wait_for_completion_timeout(&smbus->irq_completion, msecs_to_jiffies(10));
+   reg_write(smbus, REG_IMASK, 0);
status = reg_read(smbus, REG_SMSTA);
+   } else {
+   status = reg_read(smbus, REG_SMSTA);
+   while (!(status & SMSTA_XEN) && timeout--) {
+   msleep(1);
+   status = reg_read(smbus, REG_SMSTA);
+   }
}
 
/* Got NACK? */
@@ -344,10 +353,14 @@ int pasemi_i2c_common_probe(struct pasemi_smbus *smbus)
 
/* set up the sysfs linkage to our parent device */
smbus->adapter.dev.parent = smbus->dev;
+   smbus->use_irq = 0;
+   init_completion(&smbus->irq_completion);
 
if (smbus->hw_rev != PASEMI_HW_REV_PCI)
smbus->hw_rev = reg_read(smbus, REG_REV);
 
+   reg_write(smbus, REG_IMASK, 0);
+
pasemi_reset(smbus);
 
error = devm_i2c_add_adapter(smbus->dev, &smbus->adapter);
@@ -356,3 +369,11 @@ int pasemi_i2c_common_probe(struct pasemi_smbus *smbus)
 
return 0;
 }
+
+irqreturn_t pasemi_irq_handler(int irq, void *dev_id)
+{
+   struct pasemi_smbus *smbus = dev_id;
+
+   complete(&smbus->irq_completion);
+
+   return IRQ_HANDLED;
+}

Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Nicholas Piggin
On Fri Oct 7, 2022 at 9:23 AM AEST, Segher Boessenkool wrote:
> On Fri, Oct 07, 2022 at 07:56:09AM +1000, Nicholas Piggin wrote:
> > On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
> > > On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> > > > This adds basic POWER10_CPU option, which builds with -mcpu=power10.
> > >
> > > > +# No prefix or pcrel
> > > > +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> > > > +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
> > >
> > > Why do you disable all prefixed insns?  What goes wrong if you don't?
> > 
> > Potentially things like kprobes.
>
> So mention that?  "This patch is due to an abundance of caution".

Well it's in next now. I did say *basic*, I'm sure not changing the ABI
or adding prefix instructions isn't too mysterious.

>
> What I meant to ask is if you *saw* something going wrong, not if you
> can imagine something going wrong.  I can imagine ten gazillion things
> going wrong, that is not why I asked :-)
>
> > > Same question for pcrel.  I'm sure you want to optimise it better, but
> > > it's not clear to me how it fails now?
> > 
> > For pcrel addressing? Bootstrapping the C environment is one, the
> > module dynamic linker is another.
>
> I don't know what either of those mean.

arch/powerpc/kernel/head_64.S and arch/powerpc/kernel/module_64.c

Can discuss in the pcrel patch series thread if you would like to know
more.

>
> > Some details in this series.
> > 
> > https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-September/248521.html
>
> I've watched that series with great interest, but I don't remember
> anything like that?  Are you referring to the commentary in 7/7?
> "Definitely ftrace and probes, possibly BPF and KVM have some breakage.
> I haven't looked closely yet."...  This doesn't mean much does it :-)
> It can be a triviality or two.  Or it could be a massive roadblock.
>
> Just say in a comment where you disable stuff that it is to prevent
> possible problems, this is a WIP, that kind of thing?  Otherwise other
> people (like me :-) ) will read it and think there must be some deeper
> reason.  Like, changing code to work with pcrel is hard or a lot of
> work -- it isn't :-)  As you say in 0/7 yourself btw!
>

I will describe the limitations and issues a bit more in the changelogs
of the patches that enable prefix and pcrel when I submit them as
non-RFC candidates. It would probably not be too hard to get things to
a workable state that could be merged.

> > > > +# No AltiVec or VSX or MMA instructions when building kernel
> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> > > >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> > > > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
> > >
> > > MMA code is never generated unless the code asks for it explicitly.
> > > This is fundamental, not just an implementation side effect.
> > 
> > Well, now it double won't be generated :)
>
> Yeah, but there are many other things you can unnecessarily disable as
> well!  :-)
>
> VMX and VSX are disabled here because the compiler *will* use those
> registers if it feels like it (that is, if it thinks that will be
> faster).  MMA is a very different beast: the compiler can never know if
> it will be faster, to start with.

True, but now I don't have to find the exact clause and have my lawyer
confirm that it definitely probably won't change in future and break
things.

Thanks,
Nick


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
On Fri, Oct 07, 2022 at 07:56:09AM +1000, Nicholas Piggin wrote:
> On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
> > On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> > > This adds basic POWER10_CPU option, which builds with -mcpu=power10.
> >
> > > +# No prefix or pcrel
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
> >
> > Why do you disable all prefixed insns?  What goes wrong if you don't?
> 
> Potentially things like kprobes.

So mention that?  "This patch is due to an abundance of caution".

What I meant to ask is if you *saw* something going wrong, not if you
can imagine something going wrong.  I can imagine ten gazillion things
going wrong, that is not why I asked :-)

> > Same question for pcrel.  I'm sure you want to optimise it better, but
> > it's not clear to me how it fails now?
> 
> For pcrel addressing? Bootstrapping the C environment is one, the
> module dynamic linker is another.

I don't know what either of those mean.

> Some details in this series.
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-September/248521.html

I've watched that series with great interest, but I don't remember
anything like that?  Are you referring to the commentary in 7/7?
"Definitely ftrace and probes, possibly BPF and KVM have some breakage.
I haven't looked closely yet."...  This doesn't mean much does it :-)
It can be a triviality or two.  Or it could be a massive roadblock.

Just say in a comment where you disable stuff that it is to prevent
possible problems, this is a WIP, that kind of thing?  Otherwise other
people (like me :-) ) will read it and think there must be some deeper
reason.  Like, changing code to work with pcrel is hard or a lot of
work -- it isn't :-)  As you say in 0/7 yourself btw!

> > > +# No AltiVec or VSX or MMA instructions when building kernel
> > >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> > >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> > > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
> >
> > MMA code is never generated unless the code asks for it explicitly.
> > This is fundamental, not just an implementation side effect.
> 
> Well, now it double won't be generated :)

Yeah, but there are many other things you can unnecessarily disable as
well!  :-)

VMX and VSX are disabled here because the compiler *will* use those
registers if it feels like it (that is, if it thinks that will be
faster).  MMA is a very different beast: the compiler can never know if
it will be faster, to start with.
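
To make the distinction concrete, a minimal sketch (assuming a GCC or
clang build with -mcpu=power10; this is an illustration, not code from
the kernel):

#include <altivec.h>

/* The compiler may use VMX/VSX registers for this loop on its own
 * initiative, which is why the kernel passes -mno-altivec/-mno-vsx: */
void scale(double *d, const double *s, int n)
{
        for (int i = 0; i < n; i++)
                d[i] = 2.0 * s[i];
}

/* MMA, by contrast, is only ever emitted for explicit requests: */
void rank1_update(__vector_quad *acc, vector unsigned char a,
                  vector unsigned char b)
{
        __builtin_mma_xvf32ger(acc, a, b);      /* explicit MMA builtin */
}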


Segher


Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Christophe Leroy




Le 06/10/2022 à 19:31, Christophe Leroy a écrit :
> Le 06/10/2022 à 19:24, Jason A. Donenfeld a écrit :
>> Hi Christophe,
>>
>> On Thu, Oct 6, 2022 at 11:21 AM Christophe Leroy
>>  wrote:
>>> Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
>>>> The prandom_u32() function has been a deprecated inline wrapper around
>>>> get_random_u32() for several releases now, and compiles down to the
>>>> exact same code. Replace the deprecated wrapper with a direct call to
>>>> the real function. The same also applies to get_random_int(), which is
>>>> just a wrapper around get_random_u32().
>>>>
>>>> Reviewed-by: Kees Cook 
>>>> Acked-by: Toke Høiland-Jørgensen  # for sch_cake
>>>> Acked-by: Chuck Lever  # for nfsd
>>>> Reviewed-by: Jan Kara  # for ext4
>>>> Signed-off-by: Jason A. Donenfeld 
>>>> ---
>>>
>>>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>>>> index 0fbda89cd1bb..9c4c15afbbe8 100644
>>>> --- a/arch/powerpc/kernel/process.c
>>>> +++ b/arch/powerpc/kernel/process.c
>>>> @@ -2308,6 +2308,6 @@ void notrace __ppc64_runlatch_off(void)
>>>>    unsigned long arch_align_stack(unsigned long sp)
>>>>    {
>>>>    if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
>>>> - sp -= get_random_int() & ~PAGE_MASK;
>>>> + sp -= get_random_u32() & ~PAGE_MASK;
>>>>    return sp & ~0xf;
>>>
>>> Isn't that a candidate for prandom_u32_max() ?
>>>
>>> Note that sp is deemed to be 16 bytes aligned at all time.
>>
>> Yes, probably. It seemed non-trivial to think about, so I didn't. But
>> let's see here... maybe it's not too bad:
>>
>> If PAGE_MASK is always ~(PAGE_SIZE-1), then ~PAGE_MASK is
>> (PAGE_SIZE-1), so prandom_u32_max(PAGE_SIZE) should yield the same
>> thing? Is that accurate? And holds across platforms (this comes up a
>> few places)? If so, I'll do that for a v4.
>
> On powerpc it is always (from arch/powerpc/include/asm/page.h) :
>
> /*
>  * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if we
>  * assign PAGE_MASK to a larger type it gets extended the way we want
>  * (i.e. with 1s in the high bits)
>  */
> #define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))
>
> #define PAGE_SIZE    (1UL << PAGE_SHIFT)
>
> So it would work I guess.

But taking into account that sp must remain 16 bytes aligned, would it
be better to do something like ?

	sp -= prandom_u32_max(PAGE_SIZE >> 4) << 4;

	return sp;
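
As a quick userspace sanity check of these constructions (a sketch,
assuming 4K pages and the multiply-shift definition of
prandom_u32_max(); the two forms draw different values from the same
input, but both are uniform over the same range):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))

static uint32_t xorshift32(uint32_t *s)        /* stand-in RNG */
{
        *s ^= *s << 13; *s ^= *s >> 17; *s ^= *s << 5;
        return *s;
}

int main(void)
{
        uint32_t s = 0x12345678;

        for (int i = 0; i < 1000000; i++) {
                uint32_t r = xorshift32(&s);
                /* current form: */
                uint32_t masked = r & ~PAGE_MASK;
                /* prandom_u32_max(PAGE_SIZE): */
                uint32_t bounded = (uint32_t)(((uint64_t)r * PAGE_SIZE) >> 32);
                /* 16-byte aligned variant suggested above: */
                uint32_t aligned = (uint32_t)(((uint64_t)r * (PAGE_SIZE >> 4)) >> 32) << 4;

                assert(masked < PAGE_SIZE && bounded < PAGE_SIZE && aligned < PAGE_SIZE);
                assert((aligned & 0xf) == 0);
        }
        puts("all offsets in [0, PAGE_SIZE); aligned variant keeps 16-byte alignment");
        return 0;
}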




Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Christophe Leroy


Le 06/10/2022 à 19:24, Jason A. Donenfeld a écrit :
> Hi Christophe,
> 
> On Thu, Oct 6, 2022 at 11:21 AM Christophe Leroy
>  wrote:
>> Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
>>> The prandom_u32() function has been a deprecated inline wrapper around
>>> get_random_u32() for several releases now, and compiles down to the
>>> exact same code. Replace the deprecated wrapper with a direct call to
>>> the real function. The same also applies to get_random_int(), which is
>>> just a wrapper around get_random_u32().
>>>
>>> Reviewed-by: Kees Cook 
>>> Acked-by: Toke Høiland-Jørgensen  # for sch_cake
>>> Acked-by: Chuck Lever  # for nfsd
>>> Reviewed-by: Jan Kara  # for ext4
>>> Signed-off-by: Jason A. Donenfeld 
>>> ---
>>
>>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>>> index 0fbda89cd1bb..9c4c15afbbe8 100644
>>> --- a/arch/powerpc/kernel/process.c
>>> +++ b/arch/powerpc/kernel/process.c
>>> @@ -2308,6 +2308,6 @@ void notrace __ppc64_runlatch_off(void)
>>>unsigned long arch_align_stack(unsigned long sp)
>>>{
>>>if (!(current->personality & ADDR_NO_RANDOMIZE) && 
>>> randomize_va_space)
>>> - sp -= get_random_int() & ~PAGE_MASK;
>>> + sp -= get_random_u32() & ~PAGE_MASK;
>>>return sp & ~0xf;
>>
>> Isn't that a candidate for prandom_u32_max() ?
>>
>> Note that sp is deemed to be 16 bytes aligned at all time.
> 
> Yes, probably. It seemed non-trivial to think about, so I didn't. But
> let's see here... maybe it's not too bad:
> 
> If PAGE_MASK is always ~(PAGE_SIZE-1), then ~PAGE_MASK is
> (PAGE_SIZE-1), so prandom_u32_max(PAGE_SIZE) should yield the same
> thing? Is that accurate? And holds across platforms (this comes up a
> few places)? If so, I'll do that for a v4.
> 

On powerpc it is always (from arch/powerpc/include/asm/page.h) :

/*
  * Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if we
  * assign PAGE_MASK to a larger type it gets extended the way we want
  * (i.e. with 1s in the high bits)
  */
#define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))

#define PAGE_SIZE   (1UL << PAGE_SHIFT)


So it would work I guess.

Christophe

Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jason A. Donenfeld
Hi Christophe,

On Thu, Oct 6, 2022 at 11:21 AM Christophe Leroy
 wrote:
> Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
> > The prandom_u32() function has been a deprecated inline wrapper around
> > get_random_u32() for several releases now, and compiles down to the
> > exact same code. Replace the deprecated wrapper with a direct call to
> > the real function. The same also applies to get_random_int(), which is
> > just a wrapper around get_random_u32().
> >
> > Reviewed-by: Kees Cook 
> > Acked-by: Toke Høiland-Jørgensen  # for sch_cake
> > Acked-by: Chuck Lever  # for nfsd
> > Reviewed-by: Jan Kara  # for ext4
> > Signed-off-by: Jason A. Donenfeld 
> > ---
>
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 0fbda89cd1bb..9c4c15afbbe8 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -2308,6 +2308,6 @@ void notrace __ppc64_runlatch_off(void)
> >   unsigned long arch_align_stack(unsigned long sp)
> >   {
> >   if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
> > - sp -= get_random_int() & ~PAGE_MASK;
> > + sp -= get_random_u32() & ~PAGE_MASK;
> >   return sp & ~0xf;
>
> Isn't that a candidate for prandom_u32_max() ?
>
> Note that sp is deemed to be 16 bytes aligned at all time.

Yes, probably. It seemed non-trivial to think about, so I didn't. But
let's see here... maybe it's not too bad:

If PAGE_MASK is always ~(PAGE_SIZE-1), then ~PAGE_MASK is
(PAGE_SIZE-1), so prandom_u32_max(PAGE_SIZE) should yield the same
thing? Is that accurate? And holds across platforms (this comes up a
few places)? If so, I'll do that for a v4.

Jason


Re: [PATCH v3 4/5] treewide: use get_random_bytes when possible

2022-10-06 Thread Christophe Leroy


Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
> The prandom_bytes() function has been a deprecated inline wrapper around
> get_random_bytes() for several releases now, and compiles down to the
> exact same code. Replace the deprecated wrapper with a direct call to
> the real function.
> 
> Reviewed-by: Kees Cook 
> Signed-off-by: Jason A. Donenfeld 

Reviewed-by: Christophe Leroy  (Powerpc part)

> ---
>   arch/powerpc/crypto/crc-vpmsum_test.c   |  2 +-
> 
> diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c b/arch/powerpc/crypto/crc-vpmsum_test.c
> index c1c1ef9457fb..273c527868db 100644
> --- a/arch/powerpc/crypto/crc-vpmsum_test.c
> +++ b/arch/powerpc/crypto/crc-vpmsum_test.c
> @@ -82,7 +82,7 @@ static int __init crc_test_init(void)
>   
>   if (len <= offset)
>   continue;
> - prandom_bytes(data, len);
> + get_random_bytes(data, len);
>   len -= offset;
>   
>   crypto_shash_update(crct10dif_shash, data+offset, len);

Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Christophe Leroy


Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
> The prandom_u32() function has been a deprecated inline wrapper around
> get_random_u32() for several releases now, and compiles down to the
> exact same code. Replace the deprecated wrapper with a direct call to
> the real function. The same also applies to get_random_int(), which is
> just a wrapper around get_random_u32().
> 
> Reviewed-by: Kees Cook 
> Acked-by: Toke Høiland-Jørgensen  # for sch_cake
> Acked-by: Chuck Lever  # for nfsd
> Reviewed-by: Jan Kara  # for ext4
> Signed-off-by: Jason A. Donenfeld 
> ---

> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 0fbda89cd1bb..9c4c15afbbe8 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2308,6 +2308,6 @@ void notrace __ppc64_runlatch_off(void)
>   unsigned long arch_align_stack(unsigned long sp)
>   {
>   if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
> - sp -= get_random_int() & ~PAGE_MASK;
> + sp -= get_random_u32() & ~PAGE_MASK;
>   return sp & ~0xf;

Isn't that a candidate for prandom_u32_max() ?

Note that sp is deemed to be 16 bytes aligned at all time.


Christophe

[PATCH v3 5/5] prandom: remove unused functions

2022-10-06 Thread Jason A. Donenfeld
With no callers left of prandom_u32() and prandom_bytes(), as well as
get_random_int(), remove these deprecated wrappers, in favor of
get_random_u32() and get_random_bytes().

Reviewed-by: Kees Cook 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c   | 11 +--
 include/linux/prandom.h | 12 
 include/linux/random.h  |  5 -
 3 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 01acf235f263..2fe28eeb2f38 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -97,7 +97,7 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
  * Returns whether or not the input pool has been seeded and thus guaranteed
  * to supply cryptographically secure random numbers. This applies to: the
  * /dev/urandom device, the get_random_bytes function, and the get_random_{u8,
- * u16,u32,u64,int,long} family of functions.
+ * u16,u32,u64,long} family of functions.
  *
  * Returns: true if the input pool has been seeded.
  *  false if the input pool has not been seeded.
@@ -161,15 +161,14 @@ EXPORT_SYMBOL(wait_for_random_bytes);
  * u16 get_random_u16()
  * u32 get_random_u32()
  * u64 get_random_u64()
- * unsigned int get_random_int()
  * unsigned long get_random_long()
  *
  * These interfaces will return the requested number of random bytes
  * into the given buffer or as a return value. This is equivalent to
- * a read from /dev/urandom. The u8, u16, u32, u64, int, and long
- * family of functions may be higher performance for one-off random
- * integers, because they do a bit of buffering and do not invoke
- * reseeding until the buffer is emptied.
+ * a read from /dev/urandom. The u8, u16, u32, u64, long family of
+ * functions may be higher performance for one-off random integers,
+ * because they do a bit of buffering and do not invoke reseeding
+ * until the buffer is emptied.
  *
  */
 
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index 78db003bc290..e0a0759dd09c 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -12,18 +12,6 @@
 #include 
 #include 
 
-/* Deprecated: use get_random_u32 instead. */
-static inline u32 prandom_u32(void)
-{
-   return get_random_u32();
-}
-
-/* Deprecated: use get_random_bytes instead. */
-static inline void prandom_bytes(void *buf, size_t nbytes)
-{
-   return get_random_bytes(buf, nbytes);
-}
-
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
diff --git a/include/linux/random.h b/include/linux/random.h
index 08322f700cdc..147a5e0d0b8e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -42,10 +42,6 @@ u8 get_random_u8(void);
 u16 get_random_u16(void);
 u32 get_random_u32(void);
 u64 get_random_u64(void);
-static inline unsigned int get_random_int(void)
-{
-   return get_random_u32();
-}
 static inline unsigned long get_random_long(void)
 {
 #if BITS_PER_LONG == 64
@@ -100,7 +96,6 @@ declare_get_random_var_wait(u8, u8)
 declare_get_random_var_wait(u16, u16)
 declare_get_random_var_wait(u32, u32)
 declare_get_random_var_wait(u64, u32)
-declare_get_random_var_wait(int, unsigned int)
 declare_get_random_var_wait(long, unsigned long)
 #undef declare_get_random_var
 
-- 
2.37.3



[PATCH v3 4/5] treewide: use get_random_bytes when possible

2022-10-06 Thread Jason A. Donenfeld
The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function.

Reviewed-by: Kees Cook 
Signed-off-by: Jason A. Donenfeld 
---
 arch/powerpc/crypto/crc-vpmsum_test.c   |  2 +-
 block/blk-crypto-fallback.c |  2 +-
 crypto/async_tx/raid6test.c |  2 +-
 drivers/dma/dmatest.c   |  2 +-
 drivers/mtd/nand/raw/nandsim.c  |  2 +-
 drivers/mtd/tests/mtd_nandecctest.c |  2 +-
 drivers/mtd/tests/speedtest.c   |  2 +-
 drivers/mtd/tests/stresstest.c  |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   |  2 +-
 drivers/net/ethernet/rocker/rocker_main.c   |  2 +-
 drivers/net/wireguard/selftest/allowedips.c | 12 ++--
 fs/ubifs/debug.c|  2 +-
 kernel/kcsan/selftest.c |  2 +-
 lib/random32.c  |  2 +-
 lib/test_objagg.c   |  2 +-
 lib/uuid.c  |  2 +-
 net/ipv4/route.c|  2 +-
 net/mac80211/rc80211_minstrel_ht.c  |  2 +-
 net/sched/sch_pie.c |  2 +-
 19 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c b/arch/powerpc/crypto/crc-vpmsum_test.c
index c1c1ef9457fb..273c527868db 100644
--- a/arch/powerpc/crypto/crc-vpmsum_test.c
+++ b/arch/powerpc/crypto/crc-vpmsum_test.c
@@ -82,7 +82,7 @@ static int __init crc_test_init(void)
 
if (len <= offset)
continue;
-   prandom_bytes(data, len);
+   get_random_bytes(data, len);
len -= offset;
 
crypto_shash_update(crct10dif_shash, data+offset, len);
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 621abd1b0e4d..ad9844c5b40c 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -539,7 +539,7 @@ static int blk_crypto_fallback_init(void)
if (blk_crypto_fallback_inited)
return 0;
 
-   prandom_bytes(blank_key, BLK_CRYPTO_MAX_KEY_SIZE);
+   get_random_bytes(blank_key, BLK_CRYPTO_MAX_KEY_SIZE);
 
err = bioset_init(&crypto_bio_split, 64, 0, 0);
if (err)
diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
index c9d218e53bcb..f74505f2baf0 100644
--- a/crypto/async_tx/raid6test.c
+++ b/crypto/async_tx/raid6test.c
@@ -37,7 +37,7 @@ static void makedata(int disks)
int i;
 
for (i = 0; i < disks; i++) {
-   prandom_bytes(page_address(data[i]), PAGE_SIZE);
+   get_random_bytes(page_address(data[i]), PAGE_SIZE);
dataptrs[i] = data[i];
dataoffs[i] = 0;
}
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 9fe2ae794316..ffe621695e47 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -312,7 +312,7 @@ static unsigned long dmatest_random(void)
 {
unsigned long buf;
 
-   prandom_bytes(&buf, sizeof(buf));
+   get_random_bytes(&buf, sizeof(buf));
return buf;
 }
 
diff --git a/drivers/mtd/nand/raw/nandsim.c b/drivers/mtd/nand/raw/nandsim.c
index 4bdaf4aa7007..c941a5a41ea6 100644
--- a/drivers/mtd/nand/raw/nandsim.c
+++ b/drivers/mtd/nand/raw/nandsim.c
@@ -1393,7 +1393,7 @@ static int ns_do_read_error(struct nandsim *ns, int num)
unsigned int page_no = ns->regs.row;
 
if (ns_read_error(page_no)) {
-   prandom_bytes(ns->buf.byte, num);
+   get_random_bytes(ns->buf.byte, num);
NS_WARN("simulating read error in page %u\n", page_no);
return 1;
}
diff --git a/drivers/mtd/tests/mtd_nandecctest.c b/drivers/mtd/tests/mtd_nandecctest.c
index 1c7201b0f372..440988562cfd 100644
--- a/drivers/mtd/tests/mtd_nandecctest.c
+++ b/drivers/mtd/tests/mtd_nandecctest.c
@@ -266,7 +266,7 @@ static int nand_ecc_test_run(const size_t size)
goto error;
}
 
-   prandom_bytes(correct_data, size);
+   get_random_bytes(correct_data, size);
ecc_sw_hamming_calculate(correct_data, size, correct_ecc, sm_order);
for (i = 0; i < ARRAY_SIZE(nand_ecc_test); i++) {
nand_ecc_test[i].prepare(error_data, error_ecc,
diff --git a/drivers/mtd/tests/speedtest.c b/drivers/mtd/tests/speedtest.c
index c9ec7086bfa1..075bce32caa5 100644
--- a/drivers/mtd/tests/speedtest.c
+++ b/drivers/mtd/tests/speedtest.c
@@ -223,7 +223,7 @@ static int __init mtd_speedtest_init(void)
if (!iobuf)
goto out;
 
-   prandom_bytes(iobuf, mtd->erasesize);
+   get_random_bytes(iobuf, mtd->erasesize);
 
bbt = kzalloc(ebcnt, GFP_KERNEL);
if (!bbt)
diff --git a/drivers/mtd/tests/stresstest.c 

[PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32().

Reviewed-by: Kees Cook 
Acked-by: Toke Høiland-Jørgensen  # for sch_cake
Acked-by: Chuck Lever  # for nfsd
Reviewed-by: Jan Kara  # for ext4
Signed-off-by: Jason A. Donenfeld 
---
 Documentation/networking/filter.rst|  2 +-
 arch/arm64/kernel/process.c|  2 +-
 arch/loongarch/kernel/process.c|  2 +-
 arch/mips/kernel/process.c |  2 +-
 arch/parisc/kernel/process.c   |  2 +-
 arch/parisc/kernel/sys_parisc.c|  4 ++--
 arch/powerpc/kernel/process.c  |  2 +-
 arch/s390/kernel/process.c |  2 +-
 arch/s390/mm/mmap.c|  2 +-
 arch/x86/kernel/cpu/amd.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c|  6 +++---
 drivers/gpu/drm/i915/selftests/i915_selftest.c |  2 +-
 drivers/gpu/drm/selftests/test-drm_buddy.c |  2 +-
 drivers/gpu/drm/selftests/test-drm_mm.c|  2 +-
 drivers/infiniband/hw/cxgb4/cm.c   |  4 ++--
 drivers/infiniband/hw/hfi1/tid_rdma.c  |  2 +-
 drivers/infiniband/hw/mlx4/mad.c   |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  2 +-
 drivers/md/raid5-cache.c   |  2 +-
 .../media/test-drivers/vivid/vivid-touch-cap.c |  4 ++--
 drivers/misc/habanalabs/gaudi2/gaudi2.c|  2 +-
 drivers/mtd/nand/raw/nandsim.c |  2 +-
 drivers/net/bonding/bond_main.c|  2 +-
 drivers/net/ethernet/broadcom/cnic.c   |  2 +-
 .../chelsio/inline_crypto/chtls/chtls_cm.c |  2 +-
 drivers/net/ethernet/rocker/rocker_main.c  |  6 +++---
 .../wireless/broadcom/brcm80211/brcmfmac/pno.c |  2 +-
 .../net/wireless/marvell/mwifiex/cfg80211.c|  4 ++--
 .../net/wireless/microchip/wilc1000/cfg80211.c |  2 +-
 .../net/wireless/quantenna/qtnfmac/cfg80211.c  |  2 +-
 drivers/net/wireless/ti/wlcore/main.c  |  2 +-
 drivers/nvme/common/auth.c |  2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |  4 ++--
 drivers/target/iscsi/cxgbit/cxgbit_cm.c|  2 +-
 drivers/thunderbolt/xdomain.c  |  2 +-
 drivers/video/fbdev/uvesafb.c  |  2 +-
 fs/exfat/inode.c   |  2 +-
 fs/ext4/ialloc.c   |  2 +-
 fs/ext4/ioctl.c|  4 ++--
 fs/ext4/mmp.c  |  2 +-
 fs/f2fs/namei.c|  2 +-
 fs/fat/inode.c |  2 +-
 fs/nfsd/nfs4state.c|  4 ++--
 fs/ntfs3/fslog.c   |  6 +++---
 fs/ubifs/journal.c |  2 +-
 fs/xfs/libxfs/xfs_ialloc.c |  2 +-
 fs/xfs/xfs_icache.c|  2 +-
 fs/xfs/xfs_log.c   |  2 +-
 include/net/netfilter/nf_queue.h   |  2 +-
 include/net/red.h  |  2 +-
 include/net/sock.h |  2 +-
 kernel/bpf/bloom_filter.c  |  2 +-
 kernel/bpf/core.c  |  2 +-
 kernel/bpf/hashtab.c   |  2 +-
 kernel/bpf/verifier.c  |  2 +-
 kernel/kcsan/selftest.c|  2 +-
 lib/random32.c |  2 +-
 lib/reed_solomon/test_rslib.c  |  6 +++---
 lib/test_fprobe.c  |  2 +-
 lib/test_kprobes.c |  2 +-
 lib/test_min_heap.c|  6 +++---
 lib/test_rhashtable.c  |  6 +++---
 mm/shmem.c |  2 +-
 mm/slab.c  |  2 +-
 net/802/garp.c |  2 +-
 net/802/mrp.c  |  2 +-
 net/core/pktgen.c  |  4 ++--
 net/ipv4/route.c   |  2 +-
 net/ipv4/tcp_cdg.c |  2 +-
 net/ipv4/udp.c |  2 +-
 net/ipv6/ip6_flowlabel.c   |  2 +-
 net/ipv6/output_core.c |  2 +-
 net/netfilter/ipvs/ip_vs_conn.c|  2 +-
 net/netfilter/xt_statistic.c   |  2 +-
 net/openvswitch/actions.c  |  2 +-
 net/rds/bind.c |  2 +-
 net/sched/sch_cake.c   |  2 +-
 net/sched/sch_netem.c  | 18 +-
 net/sunrpc/auth_gss/gss_krb5_wrap.c|  4 ++--
 

[PATCH v3 2/5] treewide: use get_random_{u8,u16}() when possible

2022-10-06 Thread Jason A. Donenfeld
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value.
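
Conceptually (an illustrative model only, not the actual random.c
implementation), the batched API behaves like this, which is where the
savings come from:

#include <stdint.h>
#include <string.h>

static uint8_t batch[64];      /* refilled from the CSPRNG when exhausted */
static size_t pos;

uint8_t get_random_u8_model(void)
{
        return batch[pos++];                   /* consumes 1 batched byte */
}

uint32_t get_random_u32_model(void)
{
        uint32_t v;

        memcpy(&v, &batch[pos], sizeof(v));    /* consumes 4 batched bytes */
        pos += sizeof(v);
        return v;
}

/* So "(u8)get_random_u32_model()" still advances pos by 4, discarding
 * three bytes that get_random_u8_model() would have kept. */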

Reviewed-by: Kees Cook 
Acked-by: Toke Høiland-Jørgensen  # for sch_cake
Signed-off-by: Jason A. Donenfeld 
---
 arch/arm/kernel/signal.c  | 2 +-
 arch/arm64/kernel/syscall.c   | 2 +-
 arch/s390/kernel/process.c| 2 +-
 arch/sparc/vdso/vma.c | 2 +-
 crypto/testmgr.c  | 8 
 drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 2 +-
 drivers/media/test-drivers/vivid/vivid-radio-rx.c | 4 ++--
 .../net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c   | 2 +-
 drivers/net/hamradio/baycom_epp.c | 2 +-
 drivers/net/hamradio/hdlcdrv.c| 2 +-
 drivers/net/hamradio/yam.c| 2 +-
 drivers/net/wireguard/selftest/allowedips.c   | 4 ++--
 drivers/net/wireless/st/cw1200/wsm.c  | 2 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c  | 6 +++---
 lib/cmdline_kunit.c   | 4 ++--
 lib/test_vmalloc.c| 2 +-
 net/dccp/ipv4.c   | 4 ++--
 net/ipv4/datagram.c   | 2 +-
 net/ipv4/ip_output.c  | 2 +-
 net/ipv4/tcp_ipv4.c   | 4 ++--
 net/mac80211/scan.c   | 2 +-
 net/netfilter/nf_nat_core.c   | 4 ++--
 net/sched/sch_cake.c  | 6 +++---
 net/sched/sch_sfb.c   | 2 +-
 net/sctp/socket.c | 2 +-
 25 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index ea128e32e8ca..e07f359254c3 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -655,7 +655,7 @@ struct page *get_signal_page(void)
 PAGE_SIZE / sizeof(u32));
 
/* Give the signal return code some randomness */
-   offset = 0x200 + (get_random_int() & 0x7fc);
+   offset = 0x200 + (get_random_u16() & 0x7fc);
signal_return_offset = offset;
 
/* Copy signal return handlers into the page */
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 733451fe7e41..d72e8f23422d 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -67,7 +67,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 *
 * The resulting 5 bits of entropy is seen in SP[8:4].
 */
-   choose_random_kstack_offset(get_random_int() & 0x1FF);
+   choose_random_kstack_offset(get_random_u16() & 0x1FF);
 }
 
 static inline bool has_syscall_work(unsigned long flags)
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index d5119e039d85..6ec020fdf532 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -230,7 +230,7 @@ unsigned long arch_align_stack(unsigned long sp)
 
 static inline unsigned long brk_rnd(void)
 {
-   return (get_random_int() & BRK_RND_MASK) << PAGE_SHIFT;
+   return (get_random_u16() & BRK_RND_MASK) << PAGE_SHIFT;
 }
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
diff --git a/arch/sparc/vdso/vma.c b/arch/sparc/vdso/vma.c
index cc19e09b0fa1..04ee726859ca 100644
--- a/arch/sparc/vdso/vma.c
+++ b/arch/sparc/vdso/vma.c
@@ -354,7 +354,7 @@ static unsigned long vdso_addr(unsigned long start, unsigned int len)
unsigned int offset;
 
/* This loses some more bits than a modulo, but is cheaper */
-   offset = get_random_int() & (PTRS_PER_PTE - 1);
+   offset = get_random_u16() & (PTRS_PER_PTE - 1);
return start + (offset << PAGE_SHIFT);
 }
 
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index be45217acde4..981c637fa2ed 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -927,7 +927,7 @@ static void generate_random_bytes(u8 *buf, size_t count)
b = 0xff;
break;
default:
-   b = (u8)prandom_u32();
+   b = get_random_u8();
break;
}
memset(buf, b, count);
@@ -935,8 +935,8 @@ static void generate_random_bytes(u8 *buf, size_t count)
break;
case 2:
/* Ascending or descending bytes, plus optional mutations */
-   increment = (u8)prandom_u32();
-   b = (u8)prandom_u32();
+   increment = get_random_u8();
+   b = get_random_u8();
for (i = 0; i < count; i++, b += 

[PATCH v3 1/5] treewide: use prandom_u32_max() when possible

2022-10-06 Thread Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions.
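
For reference, the helper is currently defined in include/linux/prandom.h
essentially as follows, which is why it avoids the division:

static inline u32 prandom_u32_max(u32 ep_ro)
{
        /* map a full-range u32 into [0, ep_ro) with one multiply + shift */
        return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
}

so the typical conversion in this patch has the shape (generic names,
for illustration):

-       idx = prandom_u32() % n;
+       idx = prandom_u32_max(n);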

Reviewed-by: Kees Cook 
Reviewed-by: KP Singh 
Reviewed-by: Christoph Böhmwalder  # for drbd
Reviewed-by: Jan Kara  # for ext2, ext4, and sbitmap
Signed-off-by: Jason A. Donenfeld 
---
 arch/arm/kernel/process.c |  2 +-
 arch/loongarch/kernel/vdso.c  |  2 +-
 arch/mips/kernel/vdso.c   |  2 +-
 arch/parisc/kernel/vdso.c |  2 +-
 arch/s390/kernel/vdso.c   |  2 +-
 arch/um/kernel/process.c  |  2 +-
 arch/x86/entry/vdso/vma.c |  2 +-
 arch/x86/kernel/module.c  |  2 +-
 arch/x86/kernel/process.c |  2 +-
 arch/x86/mm/pat/cpa-test.c|  4 +-
 crypto/testmgr.c  | 86 +--
 drivers/block/drbd/drbd_receiver.c|  4 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  2 +-
 drivers/infiniband/core/cma.c |  2 +-
 drivers/infiniband/hw/cxgb4/id_table.c|  4 +-
 drivers/infiniband/hw/hns/hns_roce_ah.c   |  5 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c|  3 +-
 drivers/md/bcache/request.c   |  2 +-
 .../test-drivers/vivid/vivid-touch-cap.c  |  2 +-
 drivers/mmc/core/core.c   |  4 +-
 drivers/mmc/host/dw_mmc.c |  2 +-
 drivers/mtd/nand/raw/nandsim.c|  4 +-
 drivers/mtd/tests/mtd_nandecctest.c   | 10 +--
 drivers/mtd/tests/stresstest.c| 17 +---
 drivers/mtd/ubi/debug.c   |  2 +-
 drivers/mtd/ubi/debug.h   |  6 +-
 drivers/net/ethernet/broadcom/cnic.c  |  3 +-
 .../chelsio/inline_crypto/chtls/chtls_io.c|  4 +-
 drivers/net/hamradio/baycom_epp.c |  2 +-
 drivers/net/hamradio/hdlcdrv.c|  2 +-
 drivers/net/hamradio/yam.c|  2 +-
 drivers/net/phy/at803x.c  |  2 +-
 .../broadcom/brcm80211/brcmfmac/p2p.c |  2 +-
 .../net/wireless/intel/iwlwifi/mvm/mac-ctxt.c |  2 +-
 drivers/scsi/fcoe/fcoe_ctlr.c |  4 +-
 drivers/scsi/qedi/qedi_main.c |  2 +-
 fs/ceph/inode.c   |  2 +-
 fs/ceph/mdsmap.c  |  2 +-
 fs/ext2/ialloc.c  |  3 +-
 fs/ext4/ialloc.c  |  5 +-
 fs/ext4/super.c   |  7 +-
 fs/f2fs/gc.c  |  2 +-
 fs/f2fs/segment.c |  8 +-
 fs/ubifs/debug.c  |  8 +-
 fs/ubifs/lpt_commit.c | 14 +--
 fs/ubifs/tnc_commit.c |  2 +-
 fs/xfs/libxfs/xfs_alloc.c |  2 +-
 fs/xfs/libxfs/xfs_ialloc.c|  2 +-
 fs/xfs/xfs_error.c|  2 +-
 include/linux/nodemask.h  |  2 +-
 kernel/bpf/core.c |  4 +-
 kernel/locking/test-ww_mutex.c|  4 +-
 kernel/time/clocksource.c |  2 +-
 lib/fault-inject.c|  2 +-
 lib/find_bit_benchmark.c  |  4 +-
 lib/kobject.c |  2 +-
 lib/reed_solomon/test_rslib.c |  6 +-
 lib/sbitmap.c |  4 +-
 lib/test-string_helpers.c |  2 +-
 lib/test_hexdump.c| 10 +--
 lib/test_kasan.c  |  6 +-
 lib/test_list_sort.c  |  2 +-
 lib/test_vmalloc.c| 17 +---
 mm/migrate.c  |  2 +-
 mm/slub.c |  2 +-
 net/ceph/mon_client.c |  2 +-
 net/ceph/osd_client.c |  2 +-
 net/core/neighbour.c  |  2 +-
 net/core/pktgen.c | 43 +-
 net/core/stream.c |  2 +-
 net/ipv4/igmp.c   |  6 +-
 net/ipv4/inet_connection_sock.c   |  2 +-
 net/ipv4/inet_hashtables.c|  2 +-
 net/ipv6/addrconf.c   |  8 +-
 net/ipv6/mcast.c  | 10 +--
 net/netfilter/ipvs/ip_vs_twos.c   |  4 +-
 net/packet/af_packet.c|  2 +-
 net/sched/act_gact.c  |  2 +-
 net/sched/act_sample.c|  2 +-
 net/sched/sch_netem.c |  4 +-
 net/sctp/socket.c |  2 +-
 net/sunrpc/cache.c|  2 +-
 net/sunrpc/xprtsock.c |  2 +-
 

[PATCH v3 0/5] treewide cleanup of random integer usage

2022-10-06 Thread Jason A. Donenfeld
Changes v2->v3:
- Handle get_random_int() conversions too, which were overlooked before,
  in the same way as the rest.

Hi folks,

This is a five part treewide cleanup of random integer handling. The
rules for random integers are:

- If you want a secure or an insecure random u64, use get_random_u64().
- If you want a secure or an insecure random u32, use get_random_u32().
  * The old function prandom_u32() has been deprecated for a while now
and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16().
- If you want a secure or an insecure random u8, use get_random_u8().
- If you want secure or insecure random bytes, use get_random_bytes().
  * The old function prandom_bytes() has been deprecated for a while now
and has long been a wrapper around get_random_bytes().
- If you want a non-uniform random u32, u16, or u8 bounded by a certain
  open interval maximum, use prandom_u32_max().
  * I say "non-uniform", because it doesn't do any rejection sampling or
divisions. Hence, it stays within the prandom_* namespace.
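
In code, the rules above collapse to (illustrative only):

        u64 r64 = get_random_u64();         /* any u64, secure or not */
        u32 r32 = get_random_u32();         /* was prandom_u32() / get_random_int() */
        u16 r16 = get_random_u16();
        u8  r8  = get_random_u8();
        get_random_bytes(buf, len);         /* was prandom_bytes(buf, len) */
        u32 idx = prandom_u32_max(n);       /* bounded [0, n); was prandom_u32() % n */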

These rules ought to be applied uniformly, so that we can clean up the
deprecated functions, and earn the benefits of using the modern
functions. In particular, in addition to the boring substitutions, this
patchset accomplishes a few nice effects:

- By using prandom_u32_max() with an upper-bound that the compiler can
  prove at compile-time is ≤65536 or ≤256, internally get_random_u16()
  or get_random_u8() is used, which wastes fewer batched random bytes,
  and hence has higher throughput.

- By using prandom_u32_max() instead of %, when the upper-bound is not a
  constant, division is still avoided, because prandom_u32_max() uses
  a faster multiplication-based trick instead.

- By using get_random_u16() or get_random_u8() in cases where the return
  value is intended to indeed be a u16 or a u8, we waste fewer batched
  random bytes, and hence have higher throughput.

So, based on those rules and benefits from following them, this patchset
breaks down into the following five steps, which were done mostly
manually:

1) Replace `prandom_u32() % max` and variants thereof with
   prandom_u32_max(max).

2) Replace `(type)get_random_u32()` and variants thereof with
   get_random_u16() or get_random_u8(). I took the pains to actually
   look and see what every lvalue type was across the entire tree.

3) Replace remaining deprecated uses of prandom_u32() and
   get_random_int() with get_random_u32(). 

4) Replace remaining deprecated uses of prandom_bytes() with
   get_random_bytes().

5) Remove the deprecated and now-unused prandom_u32() and
   prandom_bytes() inline wrapper functions.

I was thinking of taking this through my random.git tree (on which this
series is currently based) and submitting it near the end of the merge
window, or waiting for the very end of the 6.1 cycle when there will be
the fewest new patches brewing. If somebody with some treewide-cleanup
experience might share some wisdom about what the best timing usually
winds up being, I'm all ears.

Please take a look! At "379 insertions(+), 422 deletions(-)", this
should be a somewhat small patchset to review, despite it having the
scary "treewide" moniker on it.

Thanks,
Jason

Cc: Andreas Noever 
Cc: Andrew Morton 
Cc: Andy Shevchenko 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Christoph Böhmwalder 
Cc: Christoph Hellwig 
Cc: Christophe Leroy 
Cc: Daniel Borkmann 
Cc: Dave Airlie 
Cc: Dave Hansen 
Cc: David S. Miller 
Cc: Eric Dumazet 
Cc: Florian Westphal 
Cc: Greg Kroah-Hartman ,
Cc: H. Peter Anvin 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Herbert Xu 
Cc: Huacai Chen 
Cc: Hugh Dickins 
Cc: Jakub Kicinski 
Cc: James E.J. Bottomley 
Cc: Jan Kara 
Cc: Jason Gunthorpe 
Cc: Jens Axboe 
Cc: Johannes Berg 
Cc: Jonathan Corbet 
Cc: Jozsef Kadlecsik 
Cc: KP Singh 
Cc: Kees Cook 
Cc: Marco Elver 
Cc: Mauro Carvalho Chehab 
Cc: Michael Ellerman 
Cc: Pablo Neira Ayuso 
Cc: Paolo Abeni 
Cc: Peter Zijlstra 
Cc: Richard Weinberger 
Cc: Russell King 
Cc: Theodore Ts'o 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Thomas Graf 
Cc: Ulf Hansson 
Cc: Vignesh Raghavendra 
Cc: WANG Xuerui 
Cc: Will Deacon 
Cc: Yury Norov 
Cc: dri-de...@lists.freedesktop.org
Cc: kasan-...@googlegroups.com
Cc: kernel-janit...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-bl...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@lists.infradead.org
Cc: linux-n...@lists.infradead.org
Cc: linux-par...@vger.kernel.org
Cc: linux-r...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-wirel...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: loonga...@lists.linux.dev
Cc: 

Re: [f2fs-dev] [PATCH v1 0/5] treewide cleanup of random integer usage

2022-10-06 Thread Jason A. Donenfeld
A v2 that won't murder your mail setup is now available here:
https://lore.kernel.org/lkml/20221006132510.23374-1-ja...@zx2c4.com/

Please do not (attempt to) post more replies to v1, as it kicks up a
storm of angry MTAs.


Re: [f2fs-dev] [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jason A. Donenfeld
On Thu, Oct 6, 2022 at 7:01 AM Andy Shevchenko
 wrote:
>
> On Thu, Oct 06, 2022 at 06:33:15AM -0600, Jason A. Donenfeld wrote:
> > On Thu, Oct 06, 2022 at 10:43:31AM +0200, Jan Kara wrote:
>
> ...
>
> > > The code here is effectively doing the
> > >
> > > parent_group = prandom_u32_max(ngroups);
> > >
> > > Similarly here we can use prandom_u32_max(ngroups) like:
> > >
> > > if (qstr) {
> > > ...
> > > parent_group = hinfo.hash % ngroups;
> > > } else
> > > parent_group = prandom_u32_max(ngroups);
> >
> > Nice catch. I'll move these to patch #1.
>
> I believe coccinelle is able to handle this kind of code as well

I'd be extremely surprised. The details were kind of non-obvious. I
just spent a decent amount of time manually checking those blocks, to
make sure we didn't wind up with different behavior, given the
variable reuse.

Jason


Re: [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jason A. Donenfeld
On Thu, Oct 6, 2022 at 6:47 AM Jason Gunthorpe  wrote:
>
> On Wed, Oct 05, 2022 at 11:48:42PM +0200, Jason A. Donenfeld wrote:
>
> > index 14392c942f49..499a425a3379 100644
> > --- a/drivers/infiniband/hw/cxgb4/cm.c
> > +++ b/drivers/infiniband/hw/cxgb4/cm.c
> > @@ -734,7 +734,7 @@ static int send_connect(struct c4iw_ep *ep)
> >  &ep->com.remote_addr;
> >   int ret;
> >   enum chip_type adapter_type = ep->com.dev->rdev.lldi.adapter_type;
> > - u32 isn = (prandom_u32() & ~7UL) - 1;
> > + u32 isn = (get_random_u32() & ~7UL) - 1;
>
> Maybe this wants to be written as
>
> (prandom_max(U32_MAX >> 7) << 7) | 7
>
> ?

Holy smokes. Yea I guess maybe? It doesn't exactly gain anything or
make the code clearer though, and is a little bit more magical than
I'd like on a first pass.

>
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > index fd9d7f2c4d64..a605cf66b83e 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > @@ -465,7 +465,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id,
> >   goto err_qp;
> >   }
> >
> > - psn = prandom_u32() & 0xffffff;
> > + psn = get_random_u32() & 0xffffff;
>
>  prandom_max(0xffffff + 1)

That'd work, but again it's not more clear. Authors here are going for
a 24-bit number, and masking seems like a clear way to express that.

Jason


Re: [PATCH v1 1/5] treewide: use prandom_u32_max() when possible

2022-10-06 Thread Andy Shevchenko
On Thu, Oct 06, 2022 at 09:55:19AM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 06, 2022 at 06:45:25AM -0600, Jason A. Donenfeld wrote:
> > On Wed, Oct 05, 2022 at 09:16:50PM -0700, Kees Cook wrote:
> > > On Wed, Oct 05, 2022 at 11:48:40PM +0200, Jason A. Donenfeld wrote:
> > > > Rather than incurring a division or requesting too many random bytes for
> > > > the given range, use the prandom_u32_max() function, which only takes
> > > > the minimum required bytes from the RNG and avoids divisions.
> > > 
> > > Yes please!
> > > 
> > > Since this is a treewide patch, it's helpful for (me at least) doing
> > > reviews to detail the mechanism of the transformation.
> > 
> > This is hand done. There were also various wrong seds done. And then I'd
> > edit the .diff manually, and then reapply it, as an iterative process.
> > No internet on the airplane, and oddly no spatch already on my laptop (I
> > think I had some Gentoo ocaml issues at some point and removed it?).
> > 
> > > e.g. I imagine this could be done with something like Coccinelle and
> > 
> > Feel free to check the work here by using Coccinelle if you're into
> > that.
> 
> Generally these series are a lot easier to review if it is structured
> as a patches doing all the unusual stuff that had to be by hand
> followed by an unmodified Coccinelle/sed/etc handling the simple
> stuff.
> 
> Especially stuff that is reworking the logic beyond simple
> substitution should be one patch per subsystem not rolled into a giant
> one patch conversion.
> 
> This makes the whole workflow better because the hand-done stuff can
> have a chance to flow through subsystem trees.

+1 to all arguments for the splitting.

I looked a bit into the code I have the interest to, but I won't spam people
with not-so-important questions / comments / tags, etc.

-- 
With Best Regards,
Andy Shevchenko




Re: [f2fs-dev] [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Andy Shevchenko
On Thu, Oct 06, 2022 at 06:33:15AM -0600, Jason A. Donenfeld wrote:
> On Thu, Oct 06, 2022 at 10:43:31AM +0200, Jan Kara wrote:

...

> > The code here is effectively doing the
> > 
> > parent_group = prandom_u32_max(ngroups);
> > 
> > Similarly here we can use prandom_u32_max(ngroups) like:
> > 
> > if (qstr) {
> > ...
> > parent_group = hinfo.hash % ngroups;
> > } else
> > parent_group = prandom_u32_max(ngroups);
> 
> Nice catch. I'll move these to patch #1.

I believe coccinelle is able to handle this kind of code as well, so Kees'
proposal to use it seems more plausible since it's less error prone and more
flexible / powerful.
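
For the mechanical subset, a semantic patch could look something like
this (a sketch; cases with reused variables like the ext4 one above
would still need eyes):

@@
expression CEIL;
@@
- prandom_u32() % CEIL
+ prandom_u32_max(CEIL)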

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v1 1/5] treewide: use prandom_u32_max() when possible

2022-10-06 Thread Jason Gunthorpe
On Thu, Oct 06, 2022 at 06:45:25AM -0600, Jason A. Donenfeld wrote:
> Hi Kees,
> 
> On Wed, Oct 05, 2022 at 09:16:50PM -0700, Kees Cook wrote:
> > On Wed, Oct 05, 2022 at 11:48:40PM +0200, Jason A. Donenfeld wrote:
> > > Rather than incurring a division or requesting too many random bytes for
> > > the given range, use the prandom_u32_max() function, which only takes
> > > the minimum required bytes from the RNG and avoids divisions.
> > 
> > Yes please!
> > 
> > Since this is a treewide patch, it's helpful for (me at least) doing
> > reviews to detail the mechanism of the transformation.
> 
> This is hand done. There were also various wrong seds done. And then I'd
> edit the .diff manually, and then reapply it, as an iterative process.
> No internet on the airplane, and oddly no spatch already on my laptop (I
> think I had some Gentoo ocaml issues at some point and removed it?).
> 
> > e.g. I imagine this could be done with something like Coccinelle and
> 
> Feel free to check the work here by using Coccinelle if you're into
> that.

Generally these series are a lot easier to review if they are structured
as patches doing all the unusual stuff that had to be done by hand,
followed by an unmodified Coccinelle/sed/etc run handling the simple
stuff.

Especially stuff that reworks the logic beyond simple
substitution should be one patch per subsystem, not rolled into a giant
one-patch conversion.

This makes the whole workflow better because the hand-done stuff can
have a chance to flow through subsystem trees.

Thanks,
Jason


Re: [PATCH v1 0/5] treewide cleanup of random integer usage

2022-10-06 Thread Jason A. Donenfeld
On Wed, Oct 05, 2022 at 09:55:43PM -0700, Kees Cook wrote:
> It'd be nice to capture some (all?) of the above somewhere. Perhaps just
> a massive comment in the header?

I'll include (something like) this in some "how to use" documentation
I'm working on separately.

> > I've CC'd get_maintainers.pl, which is a pretty big list. Probably some
> > portion of those are going to bounce, too, and every time you reply to
> > this thread, you'll have to deal with a bunch of bounces coming
> > immediately after. And a recipient list this big will probably dock my
> > email domain's spam reputation, at least temporarily. Sigh. I think
> > that's just how it goes with treewide cleanups though. Again, let me
> > know if I'm doing it wrong.
> 
> I usually stick to just mailing lists and subsystem maintainers.

Lord have mercy I really wish I had done that. I supremely butchered the
sending of this, and then tried to save it by resubmitting directly to
vger with the same message ID but truncated CC, which mostly worked, but
the whole thing is a mess. I'll trim this to subsystem maintainers and
resubmit a v2 right away, rather than having people wade through the
mess.

To anyone who's reading this: no more replies to v1! It clogs the
tubes.

> If any of the subsystems ask you to break this up (I hope not), I've got

Oh god I surely hope not. Sounds like a massive waste of time and
paperwork.

Jason


Re: [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jason Gunthorpe
On Wed, Oct 05, 2022 at 11:48:42PM +0200, Jason A. Donenfeld wrote:

> index 14392c942f49..499a425a3379 100644
> --- a/drivers/infiniband/hw/cxgb4/cm.c
> +++ b/drivers/infiniband/hw/cxgb4/cm.c
> @@ -734,7 +734,7 @@ static int send_connect(struct c4iw_ep *ep)
>  &ep->com.remote_addr;
>   int ret;
>   enum chip_type adapter_type = ep->com.dev->rdev.lldi.adapter_type;
> - u32 isn = (prandom_u32() & ~7UL) - 1;
> + u32 isn = (get_random_u32() & ~7UL) - 1;

Maybe this wants to be written as

(prandom_max(U32_MAX >> 7) << 7) | 7

?

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index fd9d7f2c4d64..a605cf66b83e 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -465,7 +465,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id,
>   goto err_qp;
>   }
>  
> - psn = prandom_u32() & 0xffffff;
> + psn = get_random_u32() & 0xffffff;

 prandom_max(0xffffff + 1)

?

Jason


Re: [PATCH v1 1/5] treewide: use prandom_u32_max() when possible

2022-10-06 Thread Jason A. Donenfeld
Hi Kees,

On Wed, Oct 05, 2022 at 09:16:50PM -0700, Kees Cook wrote:
> On Wed, Oct 05, 2022 at 11:48:40PM +0200, Jason A. Donenfeld wrote:
> > Rather than incurring a division or requesting too many random bytes for
> > the given range, use the prandom_u32_max() function, which only takes
> > the minimum required bytes from the RNG and avoids divisions.
> 
> Yes please!
> 
> Since this is a treewide patch, it's helpful for (me at least) doing
> reviews to detail the mechanism of the transformation.

This is hand done. There were also various wrong seds done. And then I'd
edit the .diff manually, and then reapply it, as an iterative process.
No internet on the airplane, and oddly no spatch already on my laptop (I
think I had some Gentoo ocaml issues at some point and removed it?).

> e.g. I imagine this could be done with something like Coccinelle and

Feel free to check the work here by using Coccinelle if you're into
that.

> >  static inline int ubi_dbg_is_bitflip(const struct ubi_device *ubi)
> >  {
> > if (ubi->dbg.emulate_bitflips)
> > -   return !(prandom_u32() % 200);
> > +   return !(prandom_u32_max(200));
> > return 0;
> >  }
> >  
> 
> Because some of it looks automated (why the parens?)

I saw this before going out and thought I'd fixed it but I guess I sent
the wrong one.
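
The intended form, for reference, would just drop the redundant parens
(sketch):

	return !prandom_u32_max(200);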

Jason


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Nicholas Piggin
On Fri Oct 7, 2022 at 4:07 AM AEST, Christophe Leroy wrote:
>
>
> Le 23/09/2022 à 08:23, Nicholas Piggin a écrit :
> > On Fri Sep 23, 2022 at 3:46 PM AEST, Christophe Leroy wrote:
> >>
> >>
> >> Le 23/09/2022 à 05:30, Nicholas Piggin a écrit :
> >>> This adds basic POWER10_CPU option, which builds with -mcpu=power10.
> >>>
> >>> Signed-off-by: Nicholas Piggin 
> >>> ---
> >>> There's quite a lot of asm and linker changes slated for the next merge
> >>> window already so I may leave the pcrel patch for next time. I think we
> >>> can add the basic POWER10 build option though.
> >>>
> >>> Thanks,
> >>> Nick
> >>>
> >>>arch/powerpc/Makefile  | 7 ++-
> >>>arch/powerpc/platforms/Kconfig.cputype | 8 +++-
> >>>2 files changed, 13 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> >>> index 8a3d69b02672..ea88af26f8c6 100644
> >>> --- a/arch/powerpc/Makefile
> >>> +++ b/arch/powerpc/Makefile
> >>> @@ -192,9 +192,14 @@ ifdef CONFIG_476FPE_ERR46
> >>>   -T 
> >>> $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
> >>>endif
> >>>
> >>> -# No AltiVec or VSX instructions when building kernel
> >>> +# No prefix or pcrel
> >>> +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> >>
> >> We have lots of code to handle prefixed instructions in code_patching,
> >> and that code complicates stuff and has a performance impact.
> >> And it is only partially taken into account; areas like ftrace don't
> >> properly take care of prefixed instructions.
> >>
> >> Should we get rid of prefixed instruction support completely in the
> >> kernel, and come back to more simple code ?
> > 
> > I would rather complete prefixed support in the kernel and use pcrel
> > addressing. Actually even if we don't compile with pcrel or prefixed,
> > there are some instructions and we will probably get more that require
> > prefixed, possibly we might want to use them in the kernel. Some of it is
> > required to handle user mode instructions too. So I think removing
> > it is premature, but I guess it's up for debate.
>
> Well ok, in fact I only had code_patching in mind.
>
> Code patching is only for kernel text. Today code patching is used for 
> things like kprobe, ftrace, etc., which really do not seem to be 
> prepared for prefixed instructions.
>
> If you are adding -mno-prefixed, is it worth keeping that code, which 
> sometimes gives us headaches?
>
> Of course if there are plans to get real prefixed instructions in kernel 
> code anytime soon, let's live with it; in that case the support should 
> get completed. But otherwise I think it would be better to get rid of it 
> for now, and implement it completely when we need it, in years.

I have a series to enable it again, just not ready for upstream yet
but it's not all that far off.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-September/248521.html

> When I see the following, I'm having a hard time believing it would work 
> with prefixed instructions in the kernel text:
>
>   typedef u32 kprobe_opcode_t;
>
>   struct kprobe {
>   ...
>   /* Saved opcode (which has been replaced with breakpoint) */
>   kprobe_opcode_t opcode;
>
>
>   void arch_disarm_kprobe(struct kprobe *p)
>   {
>   WARN_ON_ONCE(patch_instruction(p->addr, ppc_inst(p->opcode)));
>   }

Yeah that needs work for sure.

Thanks,
Nick


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Nicholas Piggin
On Fri Oct 7, 2022 at 5:54 AM AEST, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> > This adds basic POWER10_CPU option, which builds with -mcpu=power10.
>
> > +# No prefix or pcrel
> > +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> > +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)
>
> Why do you disable all prefixed insns?  What goes wrong if you don't?

Potentially things like kprobes.

> Same question for pcrel.  I'm sure you want to optimise it better, but
> it's not clear to me how it fails now?

For pcrel addressing? Bootstrapping the C environment is one, the
module dynamic linker is another.

Some details in this series.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-September/248521.html

>
> Please say in the comment what is wrong, don't spread fear :-)
>
> > +# No AltiVec or VSX or MMA instructions when building kernel
> >  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
> >  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> > +KBUILD_CFLAGS += $(call cc-option,-mno-mma)
>
MMA code is never generated unless the code asks for it explicitly.
This is fundamental, not just an implementation side effect.

Well, now it double won't be generated :)

Thanks,
Nick


Re: [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function

2022-10-06 Thread Benjamin Gray
On Thu, 2022-10-06 at 09:19 +, Christophe Leroy wrote:
> 
> 
> Le 06/10/2022 à 05:36, Benjamin Gray a écrit :
> > On Wed, 2022-10-05 at 17:55 +, Christophe Leroy wrote:
> > > I'm on a business trip this week so I can't test it on hardware, but
> > > the generated code looks horrid and sub-optimal, with a stack frame
> > > and so many registers saved into it. That's mpc885_ads_defconfig
> > > built with GCC 12, without modules, without stackprotector, with 4k
> > > pages.
> > 
> > Yeah, that definitely wasn't supposed to happen. I was looking at
> > the
> > 32 and 64 bit machine code closely while actively writing it, but I
> > must have refactored it badly when cleaning up for submission. It
> > was
> > supposed to automatically be inlined, leaving it identical to the
> > original on 32-bit.
> > 
> > Given inlining is desirable here, and 64-bit inlines anyway, I
> > think I
> > should just mark __patch_memory with __always_inline.
> 
> FWIW, I get a far better result with :
> 
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index ba00c550d9d2..447b8de6e427 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -21,7 +21,7 @@ static int __patch_memory(void *patch_addr, unsigned long val, void *exec_addr,
> 	/* Prefixed instruction may cross cacheline if cacheline smaller than 64 bytes */
> 	BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64);
> 
> -	if (unlikely(is_dword))
> +	if (IS_ENABLED(CONFIG_PPC64) && unlikely(is_dword))
> 		__put_kernel_nofault(patch_addr, &val, u64, failed);
> 	else
> 		__put_kernel_nofault(patch_addr, &val, u32, failed);

That's very interesting, as that's what I had originally. I removed the
IS_ENABLED(CONFIG_PPC64) part of the if condition as part of submission
cleanup refactoring because it should be redundant after constant
propagation. The weird thing is that GCC constant-propagates either
way; it's just changing its mind about inlining.

I can add it back, but the optimisation here seems very fragile. It
feels more reliable just to specify it explicitly.
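
i.e., something like this (sketch):

	static __always_inline int __patch_memory(void *patch_addr, unsigned long val,
						  void *exec_addr, bool is_dword)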


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 08:50:31PM +, Christophe Leroy wrote:
> Le 06/10/2022 à 22:45, Segher Boessenkool a écrit :
> > I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
> > or something like that :-)  And, on 64 bit, which is what the question
> > was about!
> 
> Ah, does the size really matter here? I was thinking more in terms of 
> performance when I made the comment.

Other than some unused code there should not be much performance impact
at all from enabling modules when not needed, on 64 bit.  Security (in
depth) is a very different kettle of fish of course ;-)


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Christophe Leroy


Le 06/10/2022 à 22:45, Segher Boessenkool a écrit :
> On Thu, Oct 06, 2022 at 06:38:00PM +, Christophe Leroy wrote:
>> Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
>>> Long ago I built kernels that fit together with the boot firmware and a
>>> root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
>>> close to that at all these days :-)
>>
>> 4MB, not easy. But 8M is still achievable. Well, our smaller board has 32M;
>> we have thousands of them spread all over Europe and have to keep them up to
>> date ...
> 
> The smallest of these systems had 256MB RAM.  This 4MB is flash ROM :-)

I fit Uboot + DTB + Kernel + Initramfs with klibc and mtdutils in a 2MB 
flash ROM.

> 
>>> What is the overhead if you enable modules but do not use them, these
>>> days?
>>
>> On the 8xx it is mainly the instruction TLB miss handler:
> 
> I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
> or something like that :-)  And, on 64 bit, which is what the question
> was about!

Ah, does the size really matter here? I was thinking more in terms of 
performance when I made the comment.

Christophe

Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 06:38:00PM +, Christophe Leroy wrote:
> Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
> > Long ago I built kernels that fit together with the boot firmware and a
> > root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
> > close to that at all these days :-)
> 
> 4MB, not easy. But 8M is still achievable. Well, our smaller board has 32M; 
> we have thousands of them spread all over Europe and have to keep them up to 
> date ...

The smallest of these systems had 256MB RAM.  This 4MB is flash ROM :-)

> > What is the overhead if you enable modules but do not use them, these
> > days?
> 
> On the 8xx it is mainly the instruction TLB miss handler:

I meant just an indicative code size number...  100 bytes, 100kB, 100MB,
or something like that :-)  And, on 64 bit, which is what the question
was about!


Segher


Re: [PATCH v2 06/19] powerpc/cputable: Split cpu_specs[] out of cputable.h

2022-10-06 Thread Christophe Leroy


Le 20/09/2022 à 10:56, Nicholas Piggin a écrit :
> On Tue Sep 20, 2022 at 3:01 AM AEST, Christophe Leroy wrote:
> 
> This series is a nice cleanup. No comments yet but kernel/ is getting
> pretty crowded. Should we make some subdirectories for subarch things
> like mm has?
> 
> Can do that after your series. Probably requires another merge window
> to do it.
> 

By the way, I'm wondering how we decide whether some code goes in 
arch/powerpc/kernel/ or in arch/powerpc/lib/.

Christophe

Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
Hi!

On Thu, Oct 06, 2022 at 06:07:32PM +, Christophe Leroy wrote:
> Le 23/09/2022 à 08:23, Nicholas Piggin a écrit :
> > I would rather complete prefixed support in the kernel and use pcrel
> > addressing. Actually even if we don't compile with pcrel or prefixed,
> > there are some instructions and we will probably get more that require
> > prefixed, possibly we might want to use them in the kernel. Some of it is
> > required to handle user mode instructions too. So I think removing
> > it is premature, but I guess it's up for debate.
> 
> Well ok, in fact I only had code_patching in mind.
> 
> Code patching is only for kernel text. Today code patching is used for 
> things like kprobe, ftrace, etc., which really do not seem to be 
> prepared for prefixed instructions.
> 
> If you are adding -mno-prefixed, is it worth keeping that code, which 
> sometimes gives us headaches?

-mpcrel requires -mprefixed.  Using PC relative addressing will be a
significant performance benefit.

> Of course if there are plans to get real prefixed instructions in kernel 
> code anytime soon, let's live with it; in that case the support should 
> get completed. But otherwise I think it would be better to get rid of it 
> for now, and implement it completely when we need it, in years.

The future is unstoppable, certainly the near future is :-)

> When I see the following, I'm having a hard time believing it would work 
> with prefixed instructions in the kernel text:
> 
>   typedef u32 kprobe_opcode_t;
> 
>   struct kprobe {
>   ...
>   /* Saved opcode (which has been replaced with breakpoint) */
>   kprobe_opcode_t opcode;
> 
> 
>   void arch_disarm_kprobe(struct kprobe *p)
>   {
>   WARN_ON_ONCE(patch_instruction(p->addr, ppc_inst(p->opcode)));
>   }

Why would it not work?


Segher


[linux-next:master] BUILD REGRESSION 7da9fed0474b4cd46055dd92d55c42faf32c19ac

2022-10-06 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 7da9fed0474b4cd46055dd92d55c42faf32c19ac  Add linux-next specific 
files for 20221006

Error/Warning reports:

https://lore.kernel.org/linux-doc/202210070057.npbamyxb-...@intel.com
https://lore.kernel.org/llvm/202209220019.yr2vuxhg-...@intel.com
https://lore.kernel.org/llvm/202210062012.xvdajoot-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

ERROR: modpost: "devm_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_ioremap_resource" [drivers/dma/idma64.ko] undefined!
ERROR: modpost: "devm_ioremap_resource" [drivers/dma/qcom/hdma.ko] undefined!
ERROR: modpost: "devm_memremap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_memunmap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" 
[drivers/char/xillybus/xillybus_of.ko] undefined!
ERROR: modpost: "ioremap" [drivers/net/ethernet/8390/pcnet_cs.ko] undefined!
ERROR: modpost: "ioremap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: modpost: "iounmap" [drivers/net/ethernet/8390/pcnet_cs.ko] undefined!
ERROR: modpost: "iounmap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
Warning: MAINTAINERS references a file that doesn't exist: 
Documentation/devicetree/bindings/mtd/amlogic,meson-nand.txt
arch/arm64/kernel/alternative.c:199:6: warning: no previous prototype for 
'apply_alternatives_vdso' [-Wmissing-prototypes]
arch/arm64/kernel/alternative.c:295:14: warning: no previous prototype for 
'alt_cb_patch_nops' [-Wmissing-prototypes]
arch/loongarch/mm/init.c:166:24: warning: variable 'new' set but not used 
[-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/virtual/virtual_link_hwss.c:40:6: 
warning: no previous prototype for 'virtual_disable_link_output' 
[-Wmissing-prototypes]
drivers/net/ethernet/freescale/fman/fman_memac.c:1246 memac_initialization() 
error: uninitialized symbol 'fixed_link'.
drivers/nvme/target/loop.c:578 nvme_loop_create_ctrl() warn: 'opts->queue_size 
- 1' 4294967295 can't fit into 65535 'ctrl->ctrl.sqsize'
drivers/vfio/pci/mlx5/cmd.c:432 combine_ranges() error: uninitialized symbol 
'last'.
drivers/vfio/pci/mlx5/cmd.c:453 combine_ranges() error: potentially 
dereferencing uninitialized 'comb_end'.
drivers/vfio/pci/mlx5/cmd.c:453 combine_ranges() error: potentially 
dereferencing uninitialized 'comb_start'.
drivers/vfio/pci/vfio_pci_core.c:1035 vfio_pci_ioctl_get_region_info() warn: 
potential spectre issue 'vdev->region' [r]
drivers/vfio/pci/vfio_pci_core.c:958 vfio_pci_ioctl_get_region_info() warn: 
potential spectre issue 'pdev->resource' [w]
fs/ext4/super.c:1744:19: warning: 'deprecated_msg' defined but not used 
[-Wunused-const-variable=]
include/linux/compiler_types.h:357:45: error: call to 
'__compiletime_assert_417' declared with attribute error: FIELD_GET: mask is 
not constant
lib/test_vmalloc.c:154 random_size_alloc_test() error: uninitialized symbol 'n'.

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arc-allyesconfig
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arm-allyesconfig
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arm64-allyesconfig
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arm64-randconfig-r006-20221002
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   `-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|-- arm64-randconfig-r016-20221003
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- csky-buildonly-randconfig-r005-20221002
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- i386-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|   `-- fs-ext4-super

Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Segher Boessenkool
Hi!

On Fri, Sep 23, 2022 at 01:30:04PM +1000, Nicholas Piggin wrote:
> This adds basic POWER10_CPU option, which builds with -mcpu=power10.

> +# No prefix or pcrel
> +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
> +KBUILD_CFLAGS += $(call cc-option,-mno-pcrel)

Why do you disable all prefixed insns?  What goes wrong if you don't?

Same question for pcrel.  I'm sure you want to optimise it better, but
it's not clear to me how it fails now?

Please say in the comment what is wrong, don't spread fear :-)

> +# No AltiVec or VSX or MMA instructions when building kernel
>  KBUILD_CFLAGS += $(call cc-option,-mno-altivec)
>  KBUILD_CFLAGS += $(call cc-option,-mno-vsx)
> +KBUILD_CFLAGS += $(call cc-option,-mno-mma)

MMA code is never generated unless the code asks for it explicitly.
This is fundamental, not just an implementation side effect.


Segher


Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Christophe Leroy


Le 06/10/2022 à 20:22, Segher Boessenkool a écrit :
> On Thu, Oct 06, 2022 at 11:39:50AM +1100, Michael Ellerman wrote:
>> Christophe Leroy  writes:
>>> However, thinking out loud, I'm wondering: could we make things any
>>> simpler when CONFIG_MODULES is not selected, or is that too much of a
>>> corner case on PPC64?
>>
>> I'd say it's mostly a corner case.
>>
>> Obviously no distros ship with modules disabled.
>>
>> AFAIK even the stripped down kernels we use in CPU bringup have modules
>> enabled.
>>
>> So I think it's probably not worth worrying about, unless there's an
>> obvious and fairly simple optimisation.
> 
> Long ago I built kernels that fit together with the boot firmware and a
> root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
> close to that at all these days :-)

4MB, not easy. But 8M is still achievable. Well, our smaller board has 32M; 
we have thousands of them spread all over Europe and have to keep them up to 
date ...

> 
> What is the overhead if you enable modules but do not use them, these
> days?
> 

On the 8xx it is mainly the instruction TLB miss handler:

#ifdef CONFIG_MODULES
	mfcr	r11
	not.	r10, r10
#endif
	mfspr	r10, SPRN_M_TWB	/* Get level 1 table */
#ifdef CONFIG_MODULES
	blt+	3f
	rlwinm	r10, r10, 0, 20, 31
	oris	r10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
3:
	mtcr	r11
#endif


And also some patches which have an interesting impact, like commit 
cb3ac45214c0 ("powerpc/code-patching: Don't call 
is_vmalloc_or_module_addr() without CONFIG_MODULES")


On the other hand, if we want one day to replace nftables by BPF-jitted 
iptables, CONFIG_MODULES will be required. So what the trade-off will be, 
I don't know yet, because BPF is not yet cross-compile friendly.

Christophe

Re: [PATCH v3 5/6] powerpc/64: Add support for out-of-line static calls

2022-10-06 Thread Segher Boessenkool
On Thu, Oct 06, 2022 at 11:39:50AM +1100, Michael Ellerman wrote:
> Christophe Leroy  writes:
> > However, thinking out loud, I'm wondering: could we make things any 
> > simpler when CONFIG_MODULES is not selected, or is that too much of a 
> > corner case on PPC64?
> 
> I'd say it's mostly a corner case.
> 
> Obviously no distros ship with modules disabled. 
> 
> AFAIK even the stripped down kernels we use in CPU bringup have modules
> enabled.
> 
> So I think it's probably not worth worrying about, unless there's an
> obvious and fairly simple optimisation.

Long ago I built kernels that fit together with the boot firmware and a
root fs (busybox+dropbear essentially) in 4MB, but I doubt we can get
close to that at all these days :-)

What is the overhead if you enable modules but do not use them, these
days?


Segher


Re: [PATCH] powerpc/64s: POWER10 CPU Kconfig build option

2022-10-06 Thread Christophe Leroy


Le 23/09/2022 à 08:23, Nicholas Piggin a écrit :
> On Fri Sep 23, 2022 at 3:46 PM AEST, Christophe Leroy wrote:
>>
>>
>> Le 23/09/2022 à 05:30, Nicholas Piggin a écrit :
>>> This adds basic POWER10_CPU option, which builds with -mcpu=power10.
>>>
>>> Signed-off-by: Nicholas Piggin 
>>> ---
>>> There's quite a lot of asm and linker changes slated for the next merge
>>> window already so I may leave the pcrel patch for next time. I think we
>>> can add the basic POWER10 build option though.
>>>
>>> Thanks,
>>> Nick
>>>
>>>arch/powerpc/Makefile  | 7 ++-
>>>arch/powerpc/platforms/Kconfig.cputype | 8 +++-
>>>2 files changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
>>> index 8a3d69b02672..ea88af26f8c6 100644
>>> --- a/arch/powerpc/Makefile
>>> +++ b/arch/powerpc/Makefile
>>> @@ -192,9 +192,14 @@ ifdef CONFIG_476FPE_ERR46
>>> -T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
>>>endif
>>>
>>> -# No AltiVec or VSX instructions when building kernel
>>> +# No prefix or pcrel
>>> +KBUILD_CFLAGS += $(call cc-option,-mno-prefixed)
>>
>> We have lots of code to handle prefixed instructions in code_patching,
>> and that code complicates stuff and has a performance impact.
>> And it is only partially taken into account; areas like ftrace don't
>> properly take care of prefixed instructions.
>>
>> Should we get rid of prefixed instruction support completely in the
>> kernel, and come back to more simple code ?
> 
> I would rather complete prefixed support in the kernel and use pcrel
> addressing. Actually even if we don't compile with pcrel or prefixed,
> there are some instructions and we will probably get more that require
> prefixed, possibly we might want to use them in the kernel. Some of it is
> required to handle user mode instructions too. So I think removing
> it is premature, but I guess it's up for debate.

Well ok, in fact I only had code_patching in mind.

Code patching is only for kernel text. Today code patching is used for 
things like kprobe, ftrace, etc., which really do not seem to be 
prepared for prefixed instructions.

If you are adding -mno-prefixed, is it worth keeping that code, which 
sometimes gives us headaches?

Of course if there are plans to get real prefixed instructions in kernel 
code anytime soon, let's live with it; in that case the support should 
get completed. But otherwise I think it would be better to get rid of it 
for now, and implement it completely when we need it, in years.

When I see the following, I'm having a hard time believing it would work 
with prefixed instructions in the kernel text:

typedef u32 kprobe_opcode_t;

struct kprobe {
...
/* Saved opcode (which has been replaced with breakpoint) */
kprobe_opcode_t opcode;


void arch_disarm_kprobe(struct kprobe *p)
{
WARN_ON_ONCE(patch_instruction(p->addr, ppc_inst(p->opcode)));
}


Christophe

Re: [PATCH 11/17] powerpc/qspinlock: allow propagation of yield CPU down the queue

2022-10-06 Thread Laurent Dufour
On 28/07/2022 08:31:14, Nicholas Piggin wrote:
> Having all CPUs poll the lock word for the owner CPU that should be
> yielded to defeats most of the purpose of using MCS queueing for
> scalability. Yet it may be desirable for queued waiters to yield
> to a preempted owner.
> 
> s390 addresses this problem by having queued waiters sample the lock
> word to find the owner much less frequently. In this approach, the
> waiters never sample it directly, but the queue head propagates the
> owner CPU back to the next waiter if it ever finds the owner has
> been preempted. Queued waiters then subsequently propagate the owner
> CPU back to the next waiter, and so on.
> 
> Disable this option by default for now, i.e., no logical change.
> ---
>  arch/powerpc/lib/qspinlock.c | 85 +++-
>  1 file changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index 94f007f66942..28c85a2d5635 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -12,6 +12,7 @@
>  struct qnode {
>   struct qnode*next;
>   struct qspinlock *lock;
> + int yield_cpu;
>   u8  locked; /* 1 if lock acquired */
>  };
>  
> @@ -28,6 +29,7 @@ static int HEAD_SPINS __read_mostly = (1<<8);
>  static bool pv_yield_owner __read_mostly = true;
>  static bool pv_yield_allow_steal __read_mostly = false;
>  static bool pv_yield_prev __read_mostly = true;
> +static bool pv_yield_propagate_owner __read_mostly = true;
>  
>  static DEFINE_PER_CPU_ALIGNED(struct qnodes, qnodes);
>  
> @@ -257,13 +259,66 @@ static __always_inline void 
> yield_head_to_locked_owner(struct qspinlock *lock, u
>   __yield_to_locked_owner(lock, val, paravirt, clear_mustq);
>  }
>  
> +static __always_inline void propagate_yield_cpu(struct qnode *node, u32 val, 
> int *set_yield_cpu, bool paravirt)
> +{
> + struct qnode *next;
> + int owner;
> +
> + if (!paravirt)
> + return;
> + if (!pv_yield_propagate_owner)
> + return;
> +
> + owner = get_owner_cpu(val);
> + if (*set_yield_cpu == owner)
> + return;
> +
> + next = READ_ONCE(node->next);
> + if (!next)
> + return;
> +
> + if (vcpu_is_preempted(owner)) {
> + next->yield_cpu = owner;
> + *set_yield_cpu = owner;
> + } else if (*set_yield_cpu != -1) {
> + next->yield_cpu = owner;
> + *set_yield_cpu = owner;
> + }

This is a bit confusing; the else branch is the same as the true one.
This might be written like this:

if (vcpu_is_preempted(owner) || *set_yield_cpu != -1) {
next->yield_cpu = owner;
*set_yield_cpu = owner;
}

> +}
> +
>  static __always_inline void yield_to_prev(struct qspinlock *lock, struct 
> qnode *node, int prev_cpu, bool paravirt)
>  {
>   u32 yield_count;
> + int yield_cpu;
>  
>   if (!paravirt)
>   goto relax;
>  
> + if (!pv_yield_propagate_owner)
> + goto yield_prev;
> +
> + yield_cpu = READ_ONCE(node->yield_cpu);
> + if (yield_cpu == -1) {
> + /* Propagate back the -1 CPU */
> + if (node->next && node->next->yield_cpu != -1)
> + node->next->yield_cpu = yield_cpu;
> + goto yield_prev;
> + }
> +
> + yield_count = yield_count_of(yield_cpu);
> + if ((yield_count & 1) == 0)
> + goto yield_prev; /* owner vcpu is running */
> +
> + smp_rmb();
> +
> + if (yield_cpu == node->yield_cpu) {
> + if (node->next && node->next->yield_cpu != yield_cpu)
> + node->next->yield_cpu = yield_cpu;
> + yield_to_preempted(yield_cpu, yield_count);
> + return;
> + }
> +

In the case that the test is false, the lock owner has probably changed;
why are we yielding to the previous node instead of reading node->yield_cpu
again, checking it against the -1 value, etc.?
Yielding to the previous node is valid, but wouldn't it be better to yield
to the owner?
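
Something like this, say (a sketch; "again" would be a hypothetical label at
the point where node->yield_cpu is first read):

	if (yield_cpu != node->yield_cpu)
		goto again;	/* re-read node->yield_cpu, re-check for -1, etc. */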

> +yield_prev:
>   if (!pv_yield_prev)
>   goto relax;
>  
> @@ -337,6 +392,7 @@ static __always_inline void 
> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>   node = >nodes[idx];
>   node->next = NULL;
>   node->lock = lock;
> + node->yield_cpu = -1;
>   node->locked = 0;
>  
>   tail = encode_tail_cpu();
> @@ -358,13 +414,21 @@ static __always_inline void 
> queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>   while (!node->locked)
>   yield_to_prev(lock, node, prev_cpu, paravirt);
>  
> + /* Clear out stale propagated yield_cpu */
> + if (paravirt && pv_yield_propagate_owner && node->yield_cpu != 
> -1)
> + node->yield_cpu = -1;

Why do the tests instead of directly setting node->yield_cpu to -1?
Is the 

[PATCH 1/2] tools/perf/tests/shell: Update stat+csv_output.sh to include sanity check for topology

2022-10-06 Thread Athira Rajeev
Testcase stat+csv_output.sh fails on powerpc:

84: perf stat CSV output linter: FAILED!

The testcase "stat+csv_output.sh" verifies perf stat
CSV output. The test covers aggregation modes like
per-socket, per-core, per-die, -A (no_aggr mode) along
with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE
mode), expects 7 fields in the output having "CPU" as
first field. Same way, for per-socket, it expects the
first field in result to point to socket id. The testcases
compares the result with expected count.

The values for socket, die, core and cpu are fetched
from the topology directory:
/sys/devices/system/cpu/cpu*/topology.
For example, the socket value is fetched from the
"physical_package_id" file of the topology directory.
(cpu__get_topology_int() in util/cpumap.c)

If a platform fails to fetch the topology information,
the values will be set to -1. For example, in the case of
the pSeries platform of powerpc, the value for
"physical_package_id" is restricted and not exposed, so
-1 will be assigned.

The perf code has checks for a valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases
where the topology values are not exposed, the first field
of the displayed output will be empty. This causes the
testcase to fail, as it counts the number of fields in the
output.

In the case of -A (AGGR_NONE mode), the testcase expects 7
fields in the output; because of the -1 value obtained from
the topology files, only 6 fields are printed. Hence a
testcase failure is reported due to the mismatch in the
number of fields in the output.

This patch adds a sanity check for topology in the testcase.
The check skips the affected tests if a -1 value is found.

Fixes: 7473ee56dbc9 ("perf test: Add checking for perf stat CSV output.")
Reported-by: Disha Goel 
Signed-off-by: Athira Rajeev 
Suggested-by: James Clark 
Suggested-by: Ian Rogers 
---
 tools/perf/tests/shell/stat+csv_output.sh | 43 ---
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/stat+csv_output.sh 
b/tools/perf/tests/shell/stat+csv_output.sh
index eb5196f58190..b7f050aa6210 100755
--- a/tools/perf/tests/shell/stat+csv_output.sh
+++ b/tools/perf/tests/shell/stat+csv_output.sh
@@ -6,6 +6,8 @@
 
 set -e
 
+skip_test=0
+
 function commachecker()
 {
local -i cnt=0
@@ -156,14 +158,47 @@ check_per_socket()
echo "[Success]"
 }
 
+# The perf stat options for per-socket, per-core, per-die
+# and -A ( no_aggr mode ) use the info fetched from this
+# directory: "/sys/devices/system/cpu/cpu*/topology". For
+# example, the socket value is fetched from the
+# "physical_package_id" file in the topology directory.
+# Reference: cpu__get_topology_int in util/cpumap.c
+# If the platform doesn't expose topology information, values
+# will be set to -1. For example, in the case of the pSeries
+# platform of powerpc, the value for "physical_package_id" is
+# restricted and set to -1. The check here validates the
+# socket-id read from the topology file before proceeding further.
+
+FILE_LOC="/sys/devices/system/cpu/cpu*/topology/"
+FILE_NAME="physical_package_id"
+
+check_for_topology()
+{
+   if ! ParanoidAndNotRoot 0
+   then
+   socket_file=`ls $FILE_LOC/$FILE_NAME | head -n 1`
+   [ -z $socket_file ] && return 0
+   socket_id=`cat $socket_file`
+   [ $socket_id == -1 ] && skip_test=1
+   return 0
+   fi
+}
+
+check_for_topology
 check_no_args
 check_system_wide
-check_system_wide_no_aggr
 check_interval
 check_event
-check_per_core
 check_per_thread
-check_per_die
 check_per_node
-check_per_socket
+if [ $skip_test -ne 1 ]
+then
+   check_system_wide_no_aggr
+   check_per_core
+   check_per_die
+   check_per_socket
+else
+   echo "[Skip] Skipping tests for system_wide_no_aggr, per_core, per_die 
and per_socket since socket id exposed via topology is invalid"
+fi
 exit 0
-- 
2.31.1



[PATCH 2/2] tools/perf/tests/shell: Update stat+json_output.sh to include sanity check for topology

2022-10-06 Thread Athira Rajeev
Testcase stat+json_output.sh fails on powerpc:

86: perf stat JSON output linter : FAILED!

The testcase "stat+json_output.sh" verifies perf stat
JSON output. The test covers aggregation modes like
per-socket, per-core, per-die, -A (no_aggr mode) along
with few other tests. It counts expected fields for
various commands. For example say -A (i.e, AGGR_NONE
mode), expects 7 fields in the output having "CPU" as
first field. Same way, for per-socket, it expects the
first field in result to point to socket id. The testcases
compares the result with expected count.

The values for socket, die, core and cpu are fetched
from the topology directory:
/sys/devices/system/cpu/cpu*/topology.
For example, the socket value is fetched from the
"physical_package_id" file of the topology directory.
(cpu__get_topology_int() in util/cpumap.c)

If a platform fails to fetch the topology information,
the values will be set to -1. For example, in the case of
the pSeries platform of powerpc, the value for
"physical_package_id" is restricted and not exposed, so
-1 will be assigned.

The perf code has checks for a valid cpu id in "aggr_printout"
(stat-display.c), which displays the fields. So, in cases
where the topology values are not exposed, the first field
of the displayed output will be empty. This causes the
testcase to fail, as it counts the number of fields in the
output.

In the case of -A (AGGR_NONE mode), the testcase expects 7
fields in the output; because of the -1 value obtained from
the topology files, only 6 fields are printed. Hence a
testcase failure is reported due to the mismatch in the
number of fields in the output.

This patch adds a sanity check for topology in the testcase.
The check skips the affected tests if a -1 value is found.

Fixes: 0c343af2a2f8 ("perf test: JSON format checking")
Reported-by: Disha Goel 
Signed-off-by: Athira Rajeev 
Suggested-by: James Clark 
Suggested-by: Ian Rogers 
---
 tools/perf/tests/shell/stat+json_output.sh | 43 --
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/shell/stat+json_output.sh 
b/tools/perf/tests/shell/stat+json_output.sh
index ea8714a36051..2c4212c641ed 100755
--- a/tools/perf/tests/shell/stat+json_output.sh
+++ b/tools/perf/tests/shell/stat+json_output.sh
@@ -6,6 +6,8 @@
 
 set -e
 
+skip_test=0
+
 pythonchecker=$(dirname $0)/lib/perf_json_output_lint.py
 if [ "x$PYTHON" == "x" ]
 then
@@ -134,14 +136,47 @@ check_per_socket()
echo "[Success]"
 }
 
+# The perf stat options for per-socket, per-core, per-die
+# and -A ( no_aggr mode ) use the info fetched from this
+# directory: "/sys/devices/system/cpu/cpu*/topology". For
+# example, the socket value is fetched from the
+# "physical_package_id" file in the topology directory.
+# Reference: cpu__get_topology_int in util/cpumap.c
+# If the platform doesn't expose topology information, values
+# will be set to -1. For example, in the case of the pSeries
+# platform of powerpc, the value for "physical_package_id" is
+# restricted and set to -1. The check here validates the
+# socket-id read from the topology file before proceeding further.
+
+FILE_LOC="/sys/devices/system/cpu/cpu*/topology/"
+FILE_NAME="physical_package_id"
+
+check_for_topology()
+{
+   if ! ParanoidAndNotRoot 0
+   then
+   socket_file=`ls $FILE_LOC/$FILE_NAME | head -n 1`
+   [ -z $socket_file ] && return 0
+   socket_id=`cat $socket_file`
+   [ $socket_id == -1 ] && skip_test=1
+   return 0
+   fi
+}
+
+check_for_topology
 check_no_args
 check_system_wide
-check_system_wide_no_aggr
 check_interval
 check_event
-check_per_core
 check_per_thread
-check_per_die
 check_per_node
-check_per_socket
+if [ $skip_test -ne 1 ]
+then
+   check_system_wide_no_aggr
+   check_per_core
+   check_per_die
+   check_per_socket
+else
+   echo "[Skip] Skipping tests for system_wide_no_aggr, per_core, per_die 
and per_socket since socket id exposed via topology is invalid"
+fi
 exit 0
-- 
2.31.1



[PATCH] powerpc/64: poison __per_cpu_offset to catch use-before-init

2022-10-06 Thread Nicholas Piggin
If the boot CPU tries to access per-cpu data of other CPUs before
per cpu areas are set up, it will unexpectedly use offset 0.

Try to catch such accesses by poisoning the __per_cpu_offset array.

Signed-off-by: Nicholas Piggin 
---
With the early boot machine check handler added to 64s, this might be
worth another try. We did have a bug in mce_init() that was using
cpu_to_node() too early that would be caught by this.
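
For illustration, a use-before-init access now computes an obviously bad
address instead of silently aliasing CPU 0's data (a sketch, not the actual
kernel macro expansion):

	/* per_cpu(var, cpu) effectively computes: */
	unsigned long addr = (unsigned long)&var + __per_cpu_offset[cpu];
	/* with the poison, addr is ~0xfeee... and faults when dereferenced */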

Thanks,
Nick

 arch/powerpc/include/asm/percpu.h | 1 +
 arch/powerpc/kernel/paca.c| 2 +-
 arch/powerpc/kernel/setup_64.c| 3 ++-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/percpu.h 
b/arch/powerpc/include/asm/percpu.h
index 8e5b7d0b851c..6ca1a9fc5725 100644
--- a/arch/powerpc/include/asm/percpu.h
+++ b/arch/powerpc/include/asm/percpu.h
@@ -7,6 +7,7 @@
  * Same as asm-generic/percpu.h, except that we store the per cpu offset
  * in the paca. Based on the x86-64 implementation.
  */
+#define PER_CPU_OFFSET_POISON 0xfeeeeeeeeeeeeeeeULL
 
 #ifdef CONFIG_SMP
 
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index be8db402e963..c33de70596e7 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -198,7 +198,7 @@ void __init initialise_paca(struct paca_struct *new_paca, 
int cpu)
	new_paca->hw_cpu_id = 0xffff;
new_paca->kexec_state = KEXEC_STATE_NONE;
	new_paca->__current = &init_task;
-	new_paca->data_offset = 0xfeeeeeeeeeeeeeeeULL;
+   new_paca->data_offset = PER_CPU_OFFSET_POISON;
 #ifdef CONFIG_PPC_64S_HASH_MMU
new_paca->slb_shadow_ptr = NULL;
 #endif
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index a0dee7354fe6..039a4eac03c3 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -362,6 +362,7 @@ void __init early_setup(unsigned long dt_ptr)
 * So set up a temporary paca. It will be replaced below once we know
 * what CPU we are on.
 */
+   __per_cpu_offset[0] = 0;
initialise_paca(_paca, 0);
fixup_boot_paca(_paca);
WARN_ON(local_paca != 0);
@@ -828,7 +829,7 @@ static __init int pcpu_cpu_to_node(int cpu)
return early_cpu_to_node(cpu);
 }
 
-unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
+unsigned long __per_cpu_offset[NR_CPUS] __read_mostly = { [0 ... NR_CPUS-1] = PER_CPU_OFFSET_POISON };
 EXPORT_SYMBOL(__per_cpu_offset);
 
 void __init setup_per_cpu_areas(void)
-- 
2.37.2



Re: [PATCH] tools/perf: Fix aggr_printout to display cpu field irrespective of core value

2022-10-06 Thread Athira Rajeev



> On 06-Oct-2022, at 7:33 PM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Thu, Oct 06, 2022 at 06:16:14PM +0530, Athira Rajeev escreveu:
>> 
>> 
>>> On 05-Oct-2022, at 6:05 PM, Arnaldo Carvalho de Melo  
>>> wrote:
>>> 
>>> Em Wed, Oct 05, 2022 at 09:28:52AM -0300, Arnaldo Carvalho de Melo escreveu:
 Em Wed, Oct 05, 2022 at 10:23:39AM +0530, Athira Rajeev escreveu:
> diff --git a/tools/perf/util/stat-display.c 
> b/tools/perf/util/stat-display.c
> index b82844cb0ce7..cf28020798ec 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -168,7 +168,7 @@ static void aggr_printout(struct perf_stat_config 
> *config,
>   id.socket,
>   id.die,
>   id.core);
> - } else if (id.core > -1) {
> + } else if (id.cpu.cpu > -1) {
>   fprintf(config->output, "\"cpu\" : \"%d\", ",
>   id.cpu.cpu);
>   }
> @@ -179,7 +179,7 @@ static void aggr_printout(struct perf_stat_config 
> *config,
>   id.die,
>   config->csv_output ? 0 : -3,
>   id.core, config->csv_sep);
> - } else if (id.core > -1) {
> + } else if (id.cpu.cpu > -1) {
>   fprintf(config->output, "CPU%*d%s",
>   config->csv_output ? 0 : -7,
>   id.cpu.cpu, config->csv_sep);
> -- 
> If it is confusing, shall I send it as a separate patch along with 
> Tested-by from Ian ?
 
I'll have to do this by hand; I tried pointing b4 at this message and it
picked the old one, and also tried to save the message and apply it by hand,
but it's mangled.
>>> 
>>> This is what I have now, will force push later, please triple check :-)
>> 
>> 
>> Sorry for all the confusion, Arnaldo. Hereafter, I will resubmit the patch 
>> to avoid this.
>> In the patch below which you shared, the code change is correct, but the 
>> commit message is different.
>> So to avoid further confusion, I went ahead and posted a separate patch in 
>> the mailing list with:
>> 
>> subject: [PATCH] tools/perf: Fix cpu check to use id.cpu.cpu in aggr_printout
>> Patch link: 
>> https://lore.kernel.org/lkml/20221006114225.66303-1-atraj...@linux.vnet.ibm.com/T/#u
>> 
>> Please pick the separately send patch.
> 
> Ok, will do.

Thanks Arnaldo.
> 
> - Arnaldo



[PATCH] KVM: PPC: Book3S HV: Fix stack frame regs marker

2022-10-06 Thread Nicholas Piggin
The hard-coded marker is out of date now; fix it using the nice define.

Fixes: 17773afdcd15 ("powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER")
Reported-by: Joel Stanley 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c984021e62c8..37f50861dd98 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2728,7 +2728,7 @@ kvmppc_bad_host_intr:
std r5, _XER(r1)
std r6, SOFTE(r1)
LOAD_PACA_TOC()
-   LOAD_REG_IMMEDIATE(3, 0x7265677368657265)
+   LOAD_REG_IMMEDIATE(3, STACK_FRAME_REGS_MARKER)
std r3, STACK_FRAME_OVERHEAD-16(r1)
 
/*
-- 
2.37.2



[RFC PATCH] powerpc/64: Don't recurse irq replay

2022-10-06 Thread Nicholas Piggin
Use irq_enter/irq_exit around irq replay to prevent softirqs running
while interrupts are being replayed. Instead they run at the irq_exit()
call, where reentrancy is less problematic. A new PACA_IRQ_REPLAYING flag
is added to prevent interrupt handlers from hard-enabling EE while being
replayed.
---
I finally might have worked out a way to do this without surgery to
other parts of the kernel - use a nested irq_enter around replay,
which seems to do the right thing. Almost seems a bit too easy.
I think we probably want to do this even though it adds a bit of
complexity itself and yet more soft-masking churn. But it is
debatable. I will post again with more discussion some time later,
probably for the next merge window.

Thanks,
Nick

 arch/powerpc/include/asm/hw_irq.h |   6 +-
 arch/powerpc/kernel/irq_64.c  | 101 +++---
 2 files changed, 70 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 265d0eb7ed79..5ffe5c63dc82 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -36,15 +36,17 @@
 #define PACA_IRQ_DEC   0x08 /* Or FIT */
 #define PACA_IRQ_HMI   0x10
 #define PACA_IRQ_PMI   0x20
+#define PACA_IRQ_REPLAYING 0x40
 
 /*
  * Some soft-masked interrupts must be hard masked until they are replayed
  * (e.g., because the soft-masked handler does not clear the exception).
+ * Interrupt replay itself must remain hard masked too.
  */
 #ifdef CONFIG_PPC_BOOK3S
-#define PACA_IRQ_MUST_HARD_MASK	(PACA_IRQ_EE|PACA_IRQ_PMI)
+#define PACA_IRQ_MUST_HARD_MASK	(PACA_IRQ_EE|PACA_IRQ_PMI|PACA_IRQ_REPLAYING)
 #else
-#define PACA_IRQ_MUST_HARD_MASK	(PACA_IRQ_EE)
+#define PACA_IRQ_MUST_HARD_MASK	(PACA_IRQ_EE|PACA_IRQ_REPLAYING)
 #endif
 
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/irq_64.c b/arch/powerpc/kernel/irq_64.c
index eb2b380e52a0..9dc0ad3c533a 100644
--- a/arch/powerpc/kernel/irq_64.c
+++ b/arch/powerpc/kernel/irq_64.c
@@ -70,22 +70,19 @@ int distribute_irqs = 1;
 
 static inline void next_interrupt(struct pt_regs *regs)
 {
-   /*
-* Softirq processing can enable/disable irqs, which will leave
-* MSR[EE] enabled and the soft mask set to IRQS_DISABLED. Fix
-* this up.
-*/
-   if (!(local_paca->irq_happened & PACA_IRQ_HARD_DIS))
-   hard_irq_disable();
-   else
-   irq_soft_mask_set(IRQS_ALL_DISABLED);
+   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
+   WARN_ON(!(local_paca->irq_happened & PACA_IRQ_HARD_DIS));
+   WARN_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
+   }
 
/*
 * We are responding to the next interrupt, so interrupt-off
 * latencies should be reset here.
 */
+   lockdep_hardirq_exit();
trace_hardirqs_on();
trace_hardirqs_off();
+   lockdep_hardirq_enter();
 }
 
 static inline bool irq_happened_test_and_clear(u8 irq)
@@ -97,22 +94,11 @@ static inline bool irq_happened_test_and_clear(u8 irq)
return false;
 }
 
-void replay_soft_interrupts(void)
+static void __replay_soft_interrupts(void)
 {
struct pt_regs regs;
 
/*
-* Be careful here, calling these interrupt handlers can cause
-* softirqs to be raised, which they may run when calling irq_exit,
-* which will cause local_irq_enable() to be run, which can then
-* recurse into this function. Don't keep any state across
-* interrupt handler calls which may change underneath us.
-*
-* Softirqs can not be disabled over replay to stop this recursion
-* because interrupts taken in idle code may require RCU softirq
-* to run in the irq RCU tracking context. This is a hard problem
-* to fix without changes to the softirq or idle layer.
-*
 * We use local_paca rather than get_paca() to avoid all the
 * debug_smp_processor_id() business in this low level function.
 */
@@ -120,13 +106,20 @@ void replay_soft_interrupts(void)
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
WARN_ON_ONCE(mfmsr() & MSR_EE);
WARN_ON(!(local_paca->irq_happened & PACA_IRQ_HARD_DIS));
+   WARN_ON(local_paca->irq_happened & PACA_IRQ_REPLAYING);
}
 
+   /*
+* PACA_IRQ_REPLAYING prevents interrupt handlers from enabling
+* MSR[EE] to get PMIs, which can result in more IRQs becoming
+* pending.
+*/
+   local_paca->irq_happened |= PACA_IRQ_REPLAYING;
+
	ppc_save_regs(&regs);
regs.softe = IRQS_ENABLED;
regs.msr |= MSR_EE;
 
-again:
/*
 * Force the delivery of pending soft-disabled interrupts on PS3.
 * Any HV call will have this side effect.
@@ -175,13 +168,14 @@ void replay_soft_interrupts(void)
	next_interrupt(&regs);
}
 
-   /*
-* Softirq 

[RFC PATCH] KVM: PPC: Book3S PR: Fix context tracking

2022-10-06 Thread Nicholas Piggin
The main difficulty with supporting context tracking is exiting
from guest context before taking any host interrupts. That path
is all done in assembly, with a clever trampoline and a bounce to the
interrupt handler, then back to the exit handler with interrupts enabled.
This patch adds a call out to C in virtual mode to exit guest context
before any host interrupts are taken, and splits the context from
timing management because timing wants to be called after interrupts
are enabled.

This is able to boot a guest on a 64s host without crashing now.
32s doesn't support context tracking yet so that isn't tested, but
it should work, modulo possible incorrect instruction relocation
address when enabling IR.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/Kconfig |  2 --
 arch/powerpc/kvm/book3s_exports.c|  7 +++
 arch/powerpc/kvm/book3s_pr.c | 10 +-
 arch/powerpc/kvm/book3s_rmhandlers.S |  3 ++-
 arch/powerpc/kvm/book3s_segment.S| 28 +++-
 arch/powerpc/kvm/powerpc.c   |  3 ++-
 6 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index a9f57dad6d91..5d580c0f6500 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -51,7 +51,6 @@ config KVM_BOOK3S_HV_POSSIBLE
 config KVM_BOOK3S_32
tristate "KVM support for PowerPC book3s_32 processors"
depends on PPC_BOOK3S_32 && !SMP && !PTE_64BIT
-   depends on !CONTEXT_TRACKING_USER
select KVM
select KVM_BOOK3S_32_HANDLER
select KVM_BOOK3S_PR_POSSIBLE
@@ -106,7 +105,6 @@ config KVM_BOOK3S_64_HV
 config KVM_BOOK3S_64_PR
tristate "KVM support without using hypervisor mode in host"
depends on KVM_BOOK3S_64
-   depends on !CONTEXT_TRACKING_USER
select KVM_BOOK3S_PR_POSSIBLE
help
  Support running guest kernels in virtual machines on processors
diff --git a/arch/powerpc/kvm/book3s_exports.c 
b/arch/powerpc/kvm/book3s_exports.c
index f08565885ddf..cd40a46215cb 100644
--- a/arch/powerpc/kvm/book3s_exports.c
+++ b/arch/powerpc/kvm/book3s_exports.c
@@ -6,6 +6,7 @@
  * Authors: Alexander Graf 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -15,5 +16,11 @@ EXPORT_SYMBOL_GPL(kvmppc_hv_entry_trampoline);
 #endif
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 EXPORT_SYMBOL_GPL(kvmppc_entry_trampoline);
+
+void kvmppc_guest_state_exit_irqoff(void)
+{
+   guest_state_exit_irqoff();
+}
+EXPORT_SYMBOL_GPL(kvmppc_guest_state_exit_irqoff);
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d6abed6e51e6..60d7a11a7cbc 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1144,6 +1144,12 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, 
unsigned int exit_nr)
int r = RESUME_HOST;
int s;
 
+#ifdef CONFIG_CONTEXT_TRACKING_USER
+   local_irq_disable();
+   guest_timing_exit_irqoff();
+   local_irq_enable();
+#endif
+
vcpu->stat.sum_exits++;
 
run->exit_reason = KVM_EXIT_UNKNOWN;
@@ -1152,7 +1158,6 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned 
int exit_nr)
/* We get here with MSR.EE=1 */
 
trace_kvm_exit(exit_nr, vcpu);
-   guest_exit();
 
switch (exit_nr) {
case BOOK3S_INTERRUPT_INST_STORAGE:
@@ -1448,6 +1453,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned 
int exit_nr)
else {
/* interrupts now hard-disabled */
kvmppc_fix_ee_before_entry();
+   guest_state_enter_irqoff();
}
 
kvmppc_handle_lost_ext(vcpu);
@@ -1844,6 +1850,8 @@ static int kvmppc_vcpu_run_pr(struct kvm_vcpu *vcpu)
 
kvmppc_fix_ee_before_entry();
 
+   guest_state_enter_irqoff();
+
ret = __kvmppc_vcpu_run(vcpu);
 
kvmppc_clear_debug(vcpu);
diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S 
b/arch/powerpc/kvm/book3s_rmhandlers.S
index 03886ca24498..d3558a7509e7 100644
--- a/arch/powerpc/kvm/book3s_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_rmhandlers.S
@@ -152,7 +152,8 @@ _GLOBAL_TOC(kvmppc_entry_trampoline)
andcr6, r5, r6  /* Clear DR and IR in MSR value */
/*
 * Set EE in HOST_MSR so that it's enabled when we get into our
-* C exit handler function.
+* C exit handler function. If we are tracking context then we must
+* not take host interrupts before switching out of guest context.
 */
ori r5, r5, MSR_EE
mtsrr0  r7
diff --git a/arch/powerpc/kvm/book3s_segment.S 
b/arch/powerpc/kvm/book3s_segment.S
index 202046a83fc1..04f479d07ca8 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -363,12 +363,38 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 *
 * R1   = host R1
 * R2   = host R2
-* R10  = raw exit handler id
+* 

[PATCH 2/2] powerpc/32: select HAVE_VIRT_CPU_ACCOUNTING_GEN

2022-10-06 Thread Nicholas Piggin
cputime_t is no longer a type, so VIRT_CPU_ACCOUNTING_GEN does not
have any effect on the type for 32-bit architectures, so there is
no reason it can't be supported.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f667279ec74c..a6e1ba506b72 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -252,6 +252,7 @@ config PPC
select HAVE_STATIC_CALL if PPC32
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
+   select HAVE_VIRT_CPU_ACCOUNTING_GEN
select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
select IOMMU_HELPER if PPC64
select IRQ_DOMAIN
-- 
2.37.2



[PATCH 1/2] powerpc/32: implement HAVE_CONTEXT_TRACKING_USER support

2022-10-06 Thread Nicholas Piggin
Context tracking involves tracking user, kernel, and guest switches. This
enables the existing context tracking code for interrupt entry on 32-bit.
Interrupt exit already has context tracking calls, and KVM cannot be
selected if CONTEXT_TRACKING_USER=y.
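
After this change the entry path has roughly the following shape when an
interrupt is taken from user mode, on both 32-bit and 64-bit (a simplified
sketch of interrupt_enter_prepare(), not the literal code):

	if (user_mode(regs)) {
		kuap_lock();
		CT_WARN_ON(ct_state() != CONTEXT_USER);
		user_exit_irqoff();	/* tell context tracking: user -> kernel */
		account_cpu_user_entry();
	}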

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/interrupt.h | 35 +++-
 2 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 81c9f895d690..f667279ec74c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -204,7 +204,7 @@ config PPC
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ASM_MODVERSIONS
-   select HAVE_CONTEXT_TRACKING_USER   if PPC64
+   select HAVE_CONTEXT_TRACKING_USER
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
select HAVE_DEBUG_STACKOVERFLOW
diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 4745bb9998bd..f88a145fb61a 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -74,17 +74,18 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
 /*
  * WARN/BUG is handled with a program interrupt so minimise checks here to
  * avoid recursion and maximise the chance of getting the first oops handled.
  */
 #define INT_SOFT_MASK_BUG_ON(regs, cond)   \
 do {   \
-   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG) &&   \
-   (user_mode(regs) || (TRAP(regs) != INTERRUPT_PROGRAM))) \
+   if ((user_mode(regs) || (TRAP(regs) != INTERRUPT_PROGRAM))) \
BUG_ON(cond);   \
 } while (0)
+#else
+#define INT_SOFT_MASK_BUG_ON(regs, cond)
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
@@ -151,28 +152,8 @@ static inline void booke_restore_dbcr0(void)
 
 static inline void interrupt_enter_prepare(struct pt_regs *regs)
 {
-#ifdef CONFIG_PPC32
-   if (!arch_irq_disabled_regs(regs))
-   trace_hardirqs_off();
-
-   if (user_mode(regs))
-   kuap_lock();
-   else
-   kuap_save_and_lock(regs);
-
-   if (user_mode(regs))
-   account_cpu_user_entry();
-#endif
-
 #ifdef CONFIG_PPC64
-   bool trace_enable = false;
-
-   if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS)) {
-   if (irq_soft_mask_set_return(IRQS_ALL_DISABLED) == IRQS_ENABLED)
-   trace_enable = true;
-   } else {
-   irq_soft_mask_set(IRQS_ALL_DISABLED);
-   }
+   irq_soft_mask_set(IRQS_ALL_DISABLED);
 
/*
 * If the interrupt was taken with HARD_DIS clear, then enable MSR[EE].
@@ -188,9 +169,10 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs)
} else {
__hard_RI_enable();
}
+   /* Enable MSR[RI] early, to support kernel SLB and hash faults */
+#endif
 
-   /* Do this when RI=1 because it can cause SLB faults */
-   if (trace_enable)
+   if (!arch_irq_disabled_regs(regs))
trace_hardirqs_off();
 
if (user_mode(regs)) {
@@ -215,7 +197,6 @@ static inline void interrupt_enter_prepare(struct pt_regs 
*regs)
}
INT_SOFT_MASK_BUG_ON(regs, !arch_irq_disabled_regs(regs) &&
   !(regs->msr & MSR_EE));
-#endif
 
booke_restore_dbcr0();
 }
-- 
2.37.2



[PATCH 0/2] powerpc/32: nohz full support

2022-10-06 Thread Nicholas Piggin
On top of the previous fix to prevent KVM being selected with
context tracking on 32-bit, this should be good to go.

Since RFC:
- Improved code sharing in interrupt.h
- Minimal patch for VIRT_CPU_ACCOUNTING_GEN support

Thanks,
Nick

Nicholas Piggin (2):
  powerpc/32: implement HAVE_CONTEXT_TRACKING_USER support
  powerpc/32: select HAVE_VIRT_CPU_ACCOUNTING_GEN

 arch/powerpc/Kconfig |  3 ++-
 arch/powerpc/include/asm/interrupt.h | 35 +++-
 2 files changed, 10 insertions(+), 28 deletions(-)

-- 
2.37.2



[PATCH 4/4] powerpc/64: Fix perf profiling asynchronous interrupt handlers

2022-10-06 Thread Nicholas Piggin
Interrupt entry sets the soft mask to IRQS_ALL_DISABLED to match the
hard irq disabled state. So when should_hard_irq_enable() returns true
because we want PMI interrupts in irq handlers, MSR[EE] is enabled but
any PMI just gets masked. Fix this by clearing IRQS_PMI_DISABLED before
enabling MSR[EE].
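
A minimal sketch of the intended ordering (paraphrasing the hunk below,
not literal kernel code):

	/* async interrupt entry: soft mask = IRQS_ALL_DISABLED, MSR[EE] = 0 */
	irq_soft_mask_andc_return(IRQS_PMI_DISABLED);	/* unmask PMIs first */
	get_paca()->irq_happened &= ~PACA_IRQ_HARD_DIS;
	__hard_irq_enable();				/* only now set MSR[EE] */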

This also tidies some of the warnings; there is no need to duplicate
them in both should_hard_irq_enable() and do_hard_irq_enable().

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h | 41 ++-
 arch/powerpc/kernel/dbell.c   |  2 +-
 arch/powerpc/kernel/irq.c |  2 +-
 arch/powerpc/kernel/time.c|  2 +-
 4 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 77fa88c2aed0..265d0eb7ed79 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -173,6 +173,15 @@ static inline notrace unsigned long 
irq_soft_mask_or_return(unsigned long mask)
return flags;
 }
 
+static inline notrace unsigned long irq_soft_mask_andc_return(unsigned long 
mask)
+{
+   unsigned long flags = irq_soft_mask_return();
+
+   irq_soft_mask_set(flags & ~mask);
+
+   return flags;
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
return irq_soft_mask_return();
@@ -331,10 +340,11 @@ bool power_pmu_wants_prompt_pmi(void);
  * is a different soft-masked interrupt pending that requires hard
  * masking.
  */
-static inline bool should_hard_irq_enable(void)
+static inline bool should_hard_irq_enable(struct pt_regs *regs)
 {
if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
-   WARN_ON(irq_soft_mask_return() == IRQS_ENABLED);
+   WARN_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
+   WARN_ON(!(get_paca()->irq_happened & PACA_IRQ_HARD_DIS));
WARN_ON(mfmsr() & MSR_EE);
}
 
@@ -347,8 +357,17 @@ static inline bool should_hard_irq_enable(void)
 *
 * TODO: Add test for 64e
 */
-   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !power_pmu_wants_prompt_pmi())
-   return false;
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64)) {
+   if (!power_pmu_wants_prompt_pmi())
+   return false;
+   /*
+* If PMIs are disabled then IRQs should be disabled as well,
+* so we shouldn't see this condition, check for it just in
+* case because we are about to enable PMIs.
+*/
+   if (WARN_ON_ONCE(regs->softe & IRQS_PMI_DISABLED))
+   return false;
+   }
 
if (get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK)
return false;
@@ -358,18 +377,16 @@ static inline bool should_hard_irq_enable(void)
 
 /*
  * Do the hard enabling, only call this if should_hard_irq_enable is true.
+ * This allows PMI interrupts to profile irq handlers.
  */
 static inline void do_hard_irq_enable(void)
 {
-   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
-   WARN_ON(irq_soft_mask_return() == IRQS_ENABLED);
-   WARN_ON(get_paca()->irq_happened & PACA_IRQ_MUST_HARD_MASK);
-   WARN_ON(mfmsr() & MSR_EE);
-   }
/*
-* This allows PMI interrupts (and watchdog soft-NMIs) through.
-* There is no other reason to enable this way.
+* Asynch interrupts come in with IRQS_ALL_DISABLED,
+* PACA_IRQ_HARD_DIS, and MSR[EE]=0.
 */
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   irq_soft_mask_andc_return(IRQS_PMI_DISABLED);
get_paca()->irq_happened &= ~PACA_IRQ_HARD_DIS;
__hard_irq_enable();
 }
@@ -452,7 +469,7 @@ static inline bool arch_irq_disabled_regs(struct pt_regs 
*regs)
return !(regs->msr & MSR_EE);
 }
 
-static __always_inline bool should_hard_irq_enable(void)
+static __always_inline bool should_hard_irq_enable(struct pt_regs *regs)
 {
return false;
 }
diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
index f55c6fb34a3a..5712dd846263 100644
--- a/arch/powerpc/kernel/dbell.c
+++ b/arch/powerpc/kernel/dbell.c
@@ -27,7 +27,7 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(doorbell_exception)
 
ppc_msgsync();
 
-   if (should_hard_irq_enable())
+   if (should_hard_irq_enable(regs))
do_hard_irq_enable();
 
kvmppc_clear_host_ipi(smp_processor_id());
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 0f17268c1f0b..58e8774793b8 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -238,7 +238,7 @@ static void __do_irq(struct pt_regs *regs, unsigned long 
oldsp)
irq = static_call(ppc_get_irq)();
 
/* We can hard enable interrupts now to allow perf interrupts */
-   if (should_hard_irq_enable())
+   if (should_hard_irq_enable(regs))
do_hard_irq_enable();
 
/* And 

[PATCH 3/4] powerpc/64e/interrupt: Prevent NMI PMI causing a dangerous warning

2022-10-06 Thread Nicholas Piggin
As explained in the fix for 64s, NMI PMIs should not return using
the normal interrupt_return function. If such a PMI hits in code
returning to user with the context switched to user mode, this warning
can fire. This was enough to cause crashes when reproducing on 64s,
because another perf interrupt would hit while reporting the bug, and
that would cause another bug, and so on.

Work around this for now just by disabling that warning on 64e, which
improves stability. Make a note of what the cleaner fix would be.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64e.S |  7 +++
 arch/powerpc/kernel/interrupt.c  | 13 ++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 930e36099015..d8bf8b94401b 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -813,6 +813,13 @@ kernel_dbg_exc:
EXCEPTION_COMMON(0x260)
CHECK_NAPPING()
addir3,r1,STACK_FRAME_OVERHEAD
+   /*
+* XXX: Returning from performance_monitor_exception taken as a
+* soft-NMI (Linux irqs disabled) may be risky to use interrupt_return
+* and could cause bugs in return or elsewhere. That case should just
+* restore registers and return. There is a workaround for this for one
+* known problem in interrupt_exit_kernel_prepare().
+*/
bl  performance_monitor_exception
b   interrupt_return
 
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index f9db0a172401..299683d1f8e5 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -374,10 +374,17 @@ notrace unsigned long 
interrupt_exit_kernel_prepare(struct pt_regs *regs)
if (regs_is_unrecoverable(regs))
unrecoverable_exception(regs);
/*
-* CT_WARN_ON comes here via program_check_exception,
-* so avoid recursion.
+* CT_WARN_ON comes here via program_check_exception, so avoid
+* recursion.
+*
+* Skip the assertion if 64e to work around a problem caused by NMI
+* PMIs incorrectly taking this interrupt return path, it's possible
+* for this to hit after interrupt exit to user switches context to
+* user. See also the performance monitor handler in
+* exceptions-64e.S
 */
-   if (TRAP(regs) != INTERRUPT_PROGRAM)
+   if (TRAP(regs) != INTERRUPT_PROGRAM &&
+   !(IS_ENABLED(CONFIG_PPC_BOOK3E_64)))
CT_WARN_ON(ct_state() == CONTEXT_USER);
 
kuap = kuap_get_and_assert_locked();
-- 
2.37.2



[PATCH 2/4] powerpc/64s/interrupt: Perf NMI should not take normal exit path

2022-10-06 Thread Nicholas Piggin
NMI interrupts should exit with EXCEPTION_RESTORE_REGS not with
interrupt_return_srr, which is what the perf NMI handler currently does.
This breaks if a PMI hits after interrupt_exit_user_prepare_main() has
switched the context tracking to user mode, then the CT_WARN_ON() in
interrupt_exit_kernel_prepare() fires because it returns to kernel with
context set to user.

This could possibly be solved by soft-disabling PMIs in the exit path,
but that reduces our ability to profile that code. The warning could be
removed, but it's potentially useful.

All other NMIs and soft-NMIs return using EXCEPTION_RESTORE_REGS, so
this makes perf interrupts consistent with that and seems like the best
fix.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 14 +-
 arch/powerpc/kernel/traps.c  |  2 --
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 5381a43e50fe..651c36b056bd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2357,9 +2357,21 @@ EXC_VIRT_END(performance_monitor, 0x4f00, 0x20)
 EXC_COMMON_BEGIN(performance_monitor_common)
GEN_COMMON performance_monitor
addir3,r1,STACK_FRAME_OVERHEAD
-   bl  performance_monitor_exception
+   lbz r4,PACAIRQSOFTMASK(r13)
+   cmpdi   r4,IRQS_ENABLED
+   bne 1f
+   bl  performance_monitor_exception_async
b   interrupt_return_srr
+1:
+   bl  performance_monitor_exception_nmi
+   /* Clear MSR_RI before setting SRR0 and SRR1. */
+   li  r9,0
+   mtmsrd  r9,1
 
+   kuap_kernel_restore r9, r10
+
+   EXCEPTION_RESTORE_REGS hsrr=0
+   RFI_TO_KERNEL
 
 /**
  * Interrupt 0xf20 - Vector Unavailable Interrupt.
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9bdd79aa51cf..6138ee22d06c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1899,7 +1899,6 @@ DEFINE_INTERRUPT_HANDLER(vsx_unavailable_tm)
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
 #ifdef CONFIG_PPC64
-DECLARE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi);
 DEFINE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi)
 {
__this_cpu_inc(irq_stat.pmu_irqs);
@@ -1910,7 +1909,6 @@ 
DEFINE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi)
 }
 #endif
 
-DECLARE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async);
 DEFINE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async)
 {
__this_cpu_inc(irq_stat.pmu_irqs);
-- 
2.37.2



[PATCH 1/4] KVM: PPC: BookS PR-KVM and BookE do not support context tracking

2022-10-06 Thread Nicholas Piggin
The context tracking code in the PR-KVM and BookE implementations is not
complete, and can cause host crashes if context tracking is enabled.

Make these implementations depend on !CONTEXT_TRACKING_USER.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/Kconfig | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 61cdd782d3c5..a9f57dad6d91 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -51,6 +51,7 @@ config KVM_BOOK3S_HV_POSSIBLE
 config KVM_BOOK3S_32
tristate "KVM support for PowerPC book3s_32 processors"
depends on PPC_BOOK3S_32 && !SMP && !PTE_64BIT
+   depends on !CONTEXT_TRACKING_USER
select KVM
select KVM_BOOK3S_32_HANDLER
select KVM_BOOK3S_PR_POSSIBLE
@@ -105,6 +106,7 @@ config KVM_BOOK3S_64_HV
 config KVM_BOOK3S_64_PR
tristate "KVM support without using hypervisor mode in host"
depends on KVM_BOOK3S_64
+   depends on !CONTEXT_TRACKING_USER
select KVM_BOOK3S_PR_POSSIBLE
help
  Support running guest kernels in virtual machines on processors
@@ -190,6 +192,7 @@ config KVM_EXIT_TIMING
 config KVM_E500V2
bool "KVM support for PowerPC E500v2 processors"
depends on PPC_E500 && !PPC_E500MC
+   depends on !CONTEXT_TRACKING_USER
select KVM
select KVM_MMIO
select MMU_NOTIFIER
@@ -205,6 +208,7 @@ config KVM_E500V2
 config KVM_E500MC
bool "KVM support for PowerPC E500MC/E5500/E6500 processors"
depends on PPC_E500MC
+   depends on !CONTEXT_TRACKING_USER
select KVM
select KVM_MMIO
select KVM_BOOKE_HV
-- 
2.37.2



[PATCH 0/4] powerpc: misc interrupt and context tracking fixes

2022-10-06 Thread Nicholas Piggin
These are several fixes for regressions and bugs that can crash
the host.

Thanks,
Nick

Nicholas Piggin (4):
  KVM: PPC: BookS PR-KVM and BookE do not support context tracking
  powerpc/64s/interrupt: Perf NMI should not take normal exit path
  powerpc/64e/interrupt: Prevent NMI PMI causing a dangerous warning
  powerpc/64: Fix perf profiling asynchronous interrupt handlers

 arch/powerpc/include/asm/hw_irq.h| 41 
 arch/powerpc/kernel/dbell.c  |  2 +-
 arch/powerpc/kernel/exceptions-64e.S |  7 +
 arch/powerpc/kernel/exceptions-64s.S | 14 +-
 arch/powerpc/kernel/interrupt.c  | 13 +++--
 arch/powerpc/kernel/irq.c|  2 +-
 arch/powerpc/kernel/time.c   |  2 +-
 arch/powerpc/kernel/traps.c  |  2 --
 arch/powerpc/kvm/Kconfig |  4 +++
 9 files changed, 66 insertions(+), 21 deletions(-)

-- 
2.37.2



Re: [PATCH] tools/perf: Fix aggr_printout to display cpu field irrespective of core value

2022-10-06 Thread Arnaldo Carvalho de Melo
Em Thu, Oct 06, 2022 at 06:16:14PM +0530, Athira Rajeev escreveu:
> 
> 
> > On 05-Oct-2022, at 6:05 PM, Arnaldo Carvalho de Melo  
> > wrote:
> > 
> > Em Wed, Oct 05, 2022 at 09:28:52AM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Wed, Oct 05, 2022 at 10:23:39AM +0530, Athira Rajeev escreveu:
> >>> diff --git a/tools/perf/util/stat-display.c 
> >>> b/tools/perf/util/stat-display.c
> >>> index b82844cb0ce7..cf28020798ec 100644
> >>> --- a/tools/perf/util/stat-display.c
> >>> +++ b/tools/perf/util/stat-display.c
> >>> @@ -168,7 +168,7 @@ static void aggr_printout(struct perf_stat_config 
> >>> *config,
> >>>   id.socket,
> >>>   id.die,
> >>>   id.core);
> >>> - } else if (id.core > -1) {
> >>> + } else if (id.cpu.cpu > -1) {
> >>>   fprintf(config->output, "\"cpu\" : \"%d\", ",
> >>>   id.cpu.cpu);
> >>>   }
> >>> @@ -179,7 +179,7 @@ static void aggr_printout(struct perf_stat_config 
> >>> *config,
> >>>   id.die,
> >>>   config->csv_output ? 0 : -3,
> >>>   id.core, config->csv_sep);
> >>> - } else if (id.core > -1) {
> >>> + } else if (id.cpu.cpu > -1) {
> >>>   fprintf(config->output, "CPU%*d%s",
> >>>   config->csv_output ? 0 : -7,
> >>>   id.cpu.cpu, config->csv_sep);
> >>> -- 
> >>> If it is confusing, shall I send it as a separate patch along with 
> >>> Tested-by from Ian ?
> >> 
> >> I'll have to do this by hand, tried pointing b4 to this message and it
> >> picked the old one, also tried to save the message and apply by hand,
> >> its mangled.
> > 
> > This is what I have now, will force push later, please triple check :-)
> 
> 
> Sorry for all the confusion Arnaldo. Hereafter, I will resubmit the patch to 
> avoid this.
> In the below patch which you shared, the code change is correct, but the
> commit message is different.
> So to avoid further confusion, I went ahead and posted a separate patch in 
> the mailing list with:
> 
> subject: [PATCH] tools/perf: Fix cpu check to use id.cpu.cpu in aggr_printout
> Patch link: 
> https://lore.kernel.org/lkml/20221006114225.66303-1-atraj...@linux.vnet.ibm.com/T/#u
> 
> Please pick the separately sent patch.

Ok, will do.

- Arnaldo


[PATCH] powerpc: Don't add __powerpc_ prefix to syscall entry points

2022-10-06 Thread Michael Ellerman
When using syscall wrappers the __SYSCALL_DEFINEx() and related macros
add a "__powerpc_" prefix to all syscall entry points.

So for example sys_mmap becomes __powerpc_sys_mmap.

This risks breaking workflows and tools that expect the old naming
scheme. At a minimum setting a breakpoint on eg. sys_mmap with gdb no
longer works.
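
For example (an illustrative gdb session, not taken from the patch):

  (gdb) break sys_mmap
  Function "sys_mmap" not defined.
  (gdb) break __powerpc_sys_mmap
  Breakpoint 1 at 0x...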

There seems to be no compelling reason to add the "__powerpc_" prefix,
other than that it follows what some other arches do (x86, arm64, s390).

But unlike other arches powerpc doesn't always enable syscall wrappers,
so the syscall entry points can change name depending on CONFIG options.

For those reasons drop the "__powerpc_" prefix, reverting to the
existing naming.

Doing so reveals two prototypes in signal.h that have the incorrect type
when syscall wrappers are enabled. There are already prototypes for both
functions in syscalls.h, so drop the ones from signal.h.

Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/syscall_wrapper.h | 18 --
 arch/powerpc/include/asm/syscalls.h|  2 +-
 arch/powerpc/kernel/signal.h   |  3 ---
 arch/powerpc/kernel/systbl.c   |  3 +--
 4 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/syscall_wrapper.h 
b/arch/powerpc/include/asm/syscall_wrapper.h
index 75b41b173f7a..67486c67e8a2 100644
--- a/arch/powerpc/include/asm/syscall_wrapper.h
+++ b/arch/powerpc/include/asm/syscall_wrapper.h
@@ -16,11 +16,11 @@ struct pt_regs;
  ,,regs->gpr[6],,regs->gpr[7],,regs->gpr[8])
 
#define __SYSCALL_DEFINEx(x, name, ...)				\
-   long __powerpc_sys##name(const struct pt_regs *regs);	\
-   ALLOW_ERROR_INJECTION(__powerpc_sys##name, ERRNO);		\
+   long sys##name(const struct pt_regs *regs);			\
+   ALLOW_ERROR_INJECTION(sys##name, ERRNO);			\
	static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
-   long __powerpc_sys##name(const struct pt_regs *regs)	\
+   long sys##name(const struct pt_regs *regs)			\
	{							\
		return __se_sys##name(SC_POWERPC_REGS_TO_ARGS(x,__VA_ARGS__)); \
	}							\
@@ -35,17 +35,15 @@ struct pt_regs;
 
#define SYSCALL_DEFINE0(sname)					\
	SYSCALL_METADATA(_##sname, 0);				\
-   long __powerpc_sys_##sname(const struct pt_regs *__unused);	\
-   ALLOW_ERROR_INJECTION(__powerpc_sys_##sname, ERRNO);	\
-   long __powerpc_sys_##sname(const struct pt_regs *__unused)
+   long sys_##sname(const struct pt_regs *__unused);		\
+   ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);			\
+   long sys_##sname(const struct pt_regs *__unused)

#define COND_SYSCALL(name)					\
-   long __powerpc_sys_##name(const struct pt_regs *regs);	\
-   long __weak __powerpc_sys_##name(const struct pt_regs *regs) \
+   long sys_##name(const struct pt_regs *regs);		\
+   long __weak sys_##name(const struct pt_regs *regs)		\
	{							\
		return sys_ni_syscall();			\
	}
 
-#define SYS_NI(name) SYSCALL_ALIAS(__powerpc_sys_##name, sys_ni_posix_timers);
-
 #endif // __ASM_POWERPC_SYSCALL_WRAPPER_H
diff --git a/arch/powerpc/include/asm/syscalls.h 
b/arch/powerpc/include/asm/syscalls.h
index 49bbc3e0733d..9840d572da55 100644
--- a/arch/powerpc/include/asm/syscalls.h
+++ b/arch/powerpc/include/asm/syscalls.h
@@ -124,7 +124,7 @@ long sys_ppc_fadvise64_64(int fd, int advice,
 
 #define __SYSCALL_WITH_COMPAT(nr, native, compat)  __SYSCALL(nr, native)
 #define __SYSCALL(nr, entry) \
-   long __powerpc_##entry(const struct pt_regs *regs);
+   long entry(const struct pt_regs *regs);
 
 #ifdef CONFIG_PPC64
 #include 
diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index 618aeccdf691..a429c57ed433 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -196,9 +196,6 @@ extern int handle_rt_signal64(struct ksignal *ksig, 
sigset_t *set,
 
 #else /* CONFIG_PPC64 */
 
-extern long sys_rt_sigreturn(void);
-extern long sys_sigreturn(void);
-
 static inline int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
 struct task_struct *tsk)
 {
diff --git 

Re: [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Andy Shevchenko
On Thu, Oct 06, 2022 at 07:05:48AM -0600, Jason A. Donenfeld wrote:
> On Thu, Oct 6, 2022 at 6:47 AM Jason Gunthorpe  wrote:
> > On Wed, Oct 05, 2022 at 11:48:42PM +0200, Jason A. Donenfeld wrote:

...

> > > - u32 isn = (prandom_u32() & ~7UL) - 1;
> > > + u32 isn = (get_random_u32() & ~7UL) - 1;
> >
> > Maybe this wants to be written as
> >
> > (prandom_max(U32_MAX >> 7) << 7) | 7

> > ?
> 
> Holy smokes. Yea I guess maybe? It doesn't exactly gain anything or
> make the code clearer though, and is a little bit more magical than
> I'd like on a first pass.

Shouldn't the first two 7s be 3s?

...

> > > - psn = prandom_u32() & 0xffffff;
> > > + psn = get_random_u32() & 0xffffff;
> >
> >  prandom_max(0xffffff + 1)
> 
> That'd work, but again it's not more clear. Authors here are going for
> a 24-bit number, and masking seems like a clear way to express that.

We have some 24-bit APIs (and 48-bit ones) already in the kernel, so why
not have get_random_u24()?
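
A hypothetical helper in the spirit of the suggestion (get_random_u24()
does not exist in the kernel, sketch only):

	static inline u32 get_random_u24(void)
	{
		return get_random_u32() & 0xffffff;	/* uniform over [0, 2^24) */
	}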


-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jason Gunthorpe
On Thu, Oct 06, 2022 at 07:05:48AM -0600, Jason A. Donenfeld wrote:

> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c 
> > > b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > index fd9d7f2c4d64..a605cf66b83e 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > @@ -465,7 +465,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id 
> > > *cm_id,
> > >   goto err_qp;
> > >   }
> > >
> > > - psn = prandom_u32() & 0xffffff;
> > > + psn = get_random_u32() & 0xffffff;
> >
> >  prandom_max(0xffffff + 1)
> 
> That'd work, but again it's not more clear. Authors here are going for
> a 24-bit number, and masking seems like a clear way to express that.

vs just asking directly for a 24 bit number?

Jason


Re: [PATCH] tools/perf: Fix aggr_printout to display cpu field irrespective of core value

2022-10-06 Thread Athira Rajeev



> On 05-Oct-2022, at 6:05 PM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Wed, Oct 05, 2022 at 09:28:52AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Oct 05, 2022 at 10:23:39AM +0530, Athira Rajeev escreveu:
>>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
>>> index b82844cb0ce7..cf28020798ec 100644
>>> --- a/tools/perf/util/stat-display.c
>>> +++ b/tools/perf/util/stat-display.c
>>> @@ -168,7 +168,7 @@ static void aggr_printout(struct perf_stat_config 
>>> *config,
>>> id.socket,
>>> id.die,
>>> id.core);
>>> -   } else if (id.core > -1) {
>>> +   } else if (id.cpu.cpu > -1) {
>>> fprintf(config->output, "\"cpu\" : \"%d\", ",
>>> id.cpu.cpu);
>>> }
>>> @@ -179,7 +179,7 @@ static void aggr_printout(struct perf_stat_config 
>>> *config,
>>> id.die,
>>> config->csv_output ? 0 : -3,
>>> id.core, config->csv_sep);
>>> -   } else if (id.core > -1) {
>>> +   } else if (id.cpu.cpu > -1) {
>>> fprintf(config->output, "CPU%*d%s",
>>> config->csv_output ? 0 : -7,
>>> id.cpu.cpu, config->csv_sep);
>>> -- 
>>> If it is confusing, shall I send it as a separate patch along with 
>>> Tested-by from Ian ?
>> 
>> I'll have to do this by hand, tried pointing b4 to this message and it
>> picked the old one, also tried to save the message and apply by hand,
>> its mangled.
> 
> This is what I have now, will force push later, please triple check :-)


Sorry for all the confusion Arnaldo. Hereafter, I will resubmit the patch to 
avoid this.
In the below patch which you shared, the code change is correct, but the
commit message is different.
So to avoid further confusion, I went ahead and posted a separate patch in the 
mailing list with:

subject: [PATCH] tools/perf: Fix cpu check to use id.cpu.cpu in aggr_printout
Patch link: 
https://lore.kernel.org/lkml/20221006114225.66303-1-atraj...@linux.vnet.ibm.com/T/#u

Please pick the separately sent patch.

Thanks
Athira

> 
> - Arnaldo
> 
> From b7dd96f9211e4ddbd6fa080da8dec2eac98d3f2a Mon Sep 17 00:00:00 2001
> From: Athira Rajeev 
> Date: Tue, 13 Sep 2022 17:27:17 +0530
> Subject: [PATCH 1/1] perf stat: Fix aggr_printout to display CPU field
> irrespective of core value
> 
> perf stat includes option to specify aggr_mode to display per-socket,
> per-core, per-die, per-node counter details.  Also there is option -A (
> AGGR_NONE, -no-aggr ), where the counter values are displayed for each
> CPU along with "CPU" value in one field of the output.
> 
> Each of the aggregate mode uses the information fetched from
> "/sys/devices/system/cpu/cpuX/topology" like core_id,
> physical_package_id. Utility functions in "cpumap.c" fetches this
> information and populates the socket id, core id, CPU etc.  If the
> platform does not expose the topology information, these values will be
> set to -1. Example, in case of powerpc, details like physical_package_id
> is restricted to be exposed in pSeries platform. So id.socket, id.core,
> id.cpu all will be set as -1.
> 
> In case of displaying socket or die value, there is no check done in the
> "aggr_printout" function to see if it points to valid socket id or die.
> But for displaying "cpu" value, there is a check for "if (id.core >
> -1)". In case of powerpc pSeries where detail like physical_package_id
> is restricted to be exposed, id.core will be set to -1. Hence the column
> or field itself for CPU won't be displayed in the output.
> 
> Result for per-socket:
> 
>  perf stat -e branches --per-socket -a true
> 
>   Performance counter stats for 'system wide':
> 
>  S-1  32416,851  branches
> 
> Here S has -1 in above result. But with -A option which also expects CPU
> in one column in the result, below is observed.
> 
>  perf stat -e instructions -A -a true
> 
>   Performance counter stats for 'system wide':
> 
>47,146  instructions
>45,226  instructions
>43,354  instructions
>45,184  instructions
> 
> If the CPU id value is pointing to -1 also, it makes sense to display
> the column in the output to replicate the behaviour or to be in
> precedence with other aggr options(like per-socket, per-core). Remove
> the check "id.core" so that CPU field gets displayed in the output.
> 
> After the fix:
> 
>  perf stat -e instructions -A -a true
> 
>   Performance counter stats for 'system wide':
> 
>  CPU-1  64,034  instructions
>  CPU-1  68,941  instructions
>  CPU-1  59,418  instructions
>  CPU-1  70,478 

Re: [f2fs-dev] [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jason A. Donenfeld
On Thu, Oct 06, 2022 at 10:43:31AM +0200, Jan Kara wrote:
> The code here is effectively doing the
> 
>   parent_group = prandom_u32_max(ngroups);
> 
> Similarly here we can use prandom_u32_max(ngroups) like:
> 
>   if (qstr) {
>   ...
>   parent_group = hinfo.hash % ngroups;
>   } else
>   parent_group = prandom_u32_max(ngroups);

Nice catch. I'll move these to patch #1.


> > diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
> > index 9af68a7ecdcf..588cb09c5291 100644
> > --- a/fs/ext4/mmp.c
> > +++ b/fs/ext4/mmp.c
> > @@ -265,7 +265,7 @@ static unsigned int mmp_new_seq(void)
> > u32 new_seq;
> >  
> > do {
> > -   new_seq = prandom_u32();
> > +   new_seq = get_random_u32();
> > } while (new_seq > EXT4_MMP_SEQ_MAX);
> 
> OK, here we again effectively implement prandom_u32_max(EXT4_MMP_SEQ_MAX + 1).
> Just presumably we didn't want to use modulo here because EXT4_MMP_SEQ_MAX
> is rather big and so the resulting 'new_seq' would be seriously
> non-uniform.

I'm not handling this during this patch set, but if in the course of
review we find enough places that want actually uniformly bounded
integers, I'll implement efficient rejection sampling to clean up these
cases, with something faster and general, and add a new function for it.
So far this is the first case to come up, but we'll probably eventually
find others. So I'll make note of this.
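
For reference, the rejection sampling being noted here looks roughly like
this (a sketch of the classic algorithm, not the eventual kernel code):

	static u32 random_bounded(u32 bound)	/* uniform in [0, bound), bound > 0 */
	{
		u32 threshold = -bound % bound;	/* == 2^32 mod bound */
		u32 r;

		do {
			r = get_random_u32();
		} while (r < threshold);	/* reject the biased low range */

		return r % bound;
	}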

Jason


Re: [PATCH v1 2/5] treewide: use get_random_{u8,u16}() when possible

2022-10-06 Thread Jason A. Donenfeld
On Wed, Oct 05, 2022 at 09:38:02PM -0700, Kees Cook wrote:
> > diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
> > index 56ffaa8dd3f6..0131ed2cd1bd 100644
> > --- a/lib/test_vmalloc.c
> > +++ b/lib/test_vmalloc.c
> > @@ -80,7 +80,7 @@ static int random_size_align_alloc_test(void)
> > int i;
> >  
> > for (i = 0; i < test_loop_count; i++) {
> > -   rnd = prandom_u32();
> > +   rnd = get_random_u8();
> >  
> > /*
> >  * Maximum 1024 pages, if PAGE_SIZE is 4096.
> 
> This wasn't obvious either, but it looks like it's because it never
> consumes more than u8?

Right. The only uses of that are %23 and %10 later on down.
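
For reference, the consumers look roughly like this (paraphrased from
lib/test_vmalloc.c):

	u8 rnd = get_random_u8();
	unsigned long size, align;

	/* maximum 1024 pages, if PAGE_SIZE is 4096 */
	align = 1 << (rnd % 23);
	/* maximum 10 pages */
	size = ((rnd % 10) + 1) * PAGE_SIZE;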

Jason


Re: [PATCH v2 1/2] device property: Introduce fwnode_device_is_compatible() helper

2022-10-06 Thread Andy Shevchenko
On Wed, Oct 05, 2022 at 09:05:54PM +, Sakari Ailus wrote:
> On Wed, Oct 05, 2022 at 06:29:46PM +0300, Andy Shevchenko wrote:

...

> fwnode_property_match_string() returns zero on success, therefore >= 0 is
> not needed. I'd just use !fwnode_property_match_string(...).

No, it's a bug in the documentation, thanks for raising attention to it.
I forgot to send a fix for it earlier.

> For both patches:
> 
> Reviewed-by: Sakari Ailus 

Thanks, but as stated above the condition in my patch is correct.
It seems that due to the documentation bug we have some "buggy"-looking
code, luckily not too many places to fix.

That said, I'm not going to resend this until PPC (Freescale) maintainers
ask for it. Yang, what's your vision on this series?

-- 
With Best Regards,
Andy Shevchenko




[PATCH] tools/perf: Fix cpu check to use id.cpu.cpu in aggr_printout

2022-10-06 Thread Athira Rajeev
perf stat has options to aggregate the counts in different
modes like per socket, per core etc. The function "aggr_printout"
in util/stat-display.c which is used to print the aggregates,
has a check for cpu in case of AGGR_NONE. This check was
originally using condition : "if (id.cpu.cpu > -1)". But
this got changed after commit df936cadfb58 ("perf stat: Add
JSON output option"), which added option to output json format
for different aggregation modes. After this commit, the
check in "aggr_printout" is using "if (id.core > -1)".

The old code was using "id.cpu.cpu > -1" while the new code
is using "id.core > -1". But since the value printed is
id.cpu.cpu, fix this check to use cpu and not core.

Suggested-by: James Clark 
Suggested-by: Ian Rogers 
Signed-off-by: Athira Rajeev 
Tested-by: Ian Rogers 
---
Note: Details for discussion on this logic:
https://www.spinics.net/lists/linux-perf-users/msg22473.html

 tools/perf/util/stat-display.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index b82844cb0ce7..cf28020798ec 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -168,7 +168,7 @@ static void aggr_printout(struct perf_stat_config *config,
id.socket,
id.die,
id.core);
-   } else if (id.core > -1) {
+   } else if (id.cpu.cpu > -1) {
fprintf(config->output, "\"cpu\" : \"%d\", ",
id.cpu.cpu);
}
@@ -179,7 +179,7 @@ static void aggr_printout(struct perf_stat_config *config,
id.die,
config->csv_output ? 0 : -3,
id.core, config->csv_sep);
-   } else if (id.core > -1) {
+   } else if (id.cpu.cpu > -1) {
fprintf(config->output, "CPU%*d%s",
config->csv_output ? 0 : -7,
id.cpu.cpu, config->csv_sep);
-- 
2.31.1



Re: [PATCH v1 3/5] treewide: use get_random_u32() when possible

2022-10-06 Thread Jan Kara
On Wed 05-10-22 23:48:42, Jason A. Donenfeld wrote:
> The prandom_u32() function has been a deprecated inline wrapper around
> get_random_u32() for several releases now, and compiles down to the
> exact same code. Replace the deprecated wrapper with a direct call to
> the real function.
> 
> Signed-off-by: Jason A. Donenfeld 

...

> diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
> index 998dd2ac8008..e439a872c398 100644
> --- a/fs/ext2/ialloc.c
> +++ b/fs/ext2/ialloc.c
> @@ -277,7 +277,7 @@ static int find_group_orlov(struct super_block *sb, 
> struct inode *parent)
>   int best_ndir = inodes_per_group;
>   int best_group = -1;
>  
> - group = prandom_u32();
> + group = get_random_u32();
>   parent_group = (unsigned)group % ngroups;
>   for (i = 0; i < ngroups; i++) {
>   group = (parent_group + i) % ngroups;

The code here is effectively doing the

parent_group = prandom_u32_max(ngroups);

> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index f73e5eb43eae..954ec9736a8d 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -465,7 +465,7 @@ static int find_group_orlov(struct super_block *sb, 
> struct inode *parent,
>   ext4fs_dirhash(parent, qstr->name, qstr->len, &hinfo);
>   grp = hinfo.hash;
>   } else
> - grp = prandom_u32();
> + grp = get_random_u32();

Similarly here we can use prandom_u32_max(ngroups) like:

if (qstr) {
...
parent_group = hinfo.hash % ngroups;
} else
parent_group = prandom_u32_max(ngroups);

> diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
> index 9af68a7ecdcf..588cb09c5291 100644
> --- a/fs/ext4/mmp.c
> +++ b/fs/ext4/mmp.c
> @@ -265,7 +265,7 @@ static unsigned int mmp_new_seq(void)
>   u32 new_seq;
>  
>   do {
> - new_seq = prandom_u32();
> + new_seq = get_random_u32();
>   } while (new_seq > EXT4_MMP_SEQ_MAX);

OK, here we again effectively implement prandom_u32_max(EXT4_MMP_SEQ_MAX + 1).
Just presumably we didn't want to use modulo here because EXT4_MMP_SEQ_MAX
is rather big and so the resulting 'new_seq' would be seriously
non-uniform.

Honza
-- 
Jan Kara 
SUSE Labs, CR


[PATCH] powerpc: remove the last remnants of cputime_t

2022-10-06 Thread Nicholas Piggin
cputime_t was a core kernel type, removed by commits
ed5c8c854f2b..b672592f0221. As explained in commit b672592f0221
("sched/cputime: Remove generic asm headers"), the final cleanup
is for the arch to provide cputime_to_nsec[s](). Commit ade7667a981b
("powerpc: Add cputime_to_nsecs()") did that, but justdidn't remove
the then-unused cputime_to_usecs(), cputime_t type, and associated
remnants.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/cputime.h | 17 +
 arch/powerpc/kernel/time.c | 23 ++-
 2 files changed, 3 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/cputime.h 
b/arch/powerpc/include/asm/cputime.h
index 431ae2343022..4961fb38e438 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -21,23 +21,8 @@
 #include 
 #include 
 
-typedef u64 __nocast cputime_t;
-typedef u64 __nocast cputime64_t;
-
-#define cmpxchg_cputime(ptr, old, new) cmpxchg(ptr, old, new)
-
 #ifdef __KERNEL__
-/*
- * Convert cputime <-> microseconds
- */
-extern u64 __cputime_usec_factor;
-
-static inline unsigned long cputime_to_usecs(const cputime_t ct)
-{
-   return mulhdu((__force u64) ct, __cputime_usec_factor);
-}
-
-#define cputime_to_nsecs(cputime) tb_to_ns((__force u64)cputime)
+#define cputime_to_nsecs(cputime) tb_to_ns(cputime)
 
 /*
  * PPC64 uses PACA which is task independent for storing accounting data while
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index a2ab397065c6..d68de3618741 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -130,7 +130,7 @@ unsigned long tb_ticks_per_jiffy;
 unsigned long tb_ticks_per_usec = 100; /* sane default */
 EXPORT_SYMBOL(tb_ticks_per_usec);
 unsigned long tb_ticks_per_sec;
-EXPORT_SYMBOL(tb_ticks_per_sec);   /* for cputime_t conversions */
+EXPORT_SYMBOL(tb_ticks_per_sec);   /* for cputime conversions */
 
 DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL_GPL(rtc_lock);
@@ -150,21 +150,6 @@ EXPORT_SYMBOL_GPL(ppc_tb_freq);
 bool tb_invalid;
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-/*
- * Factor for converting from cputime_t (timebase ticks) to
- * microseconds. This is stored as 0.64 fixed-point binary fraction.
- */
-u64 __cputime_usec_factor;
-EXPORT_SYMBOL(__cputime_usec_factor);
-
-static void calc_cputime_factors(void)
-{
-   struct div_result res;
-
-   div128_by_32(1000000, 0, tb_ticks_per_sec, &res);
-   __cputime_usec_factor = res.result_low;
-}
-
 /*
  * Read the SPURR on systems that have it, otherwise the PURR,
  * or if that doesn't exist return the timebase value passed in.
@@ -369,10 +354,7 @@ void vtime_flush(struct task_struct *tsk)
acct->hardirq_time = 0;
acct->softirq_time = 0;
 }
-
-#else /* ! CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
-#define calc_cputime_factors()
-#endif
+#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 void __delay(unsigned long loops)
 {
@@ -914,7 +896,6 @@ void __init time_init(void)
tb_ticks_per_jiffy = ppc_tb_freq / HZ;
tb_ticks_per_sec = ppc_tb_freq;
tb_ticks_per_usec = ppc_tb_freq / 1000000;
-   calc_cputime_factors();
 
/*
 * Compute scale factor for sched_clock.
-- 
2.37.2



Re: [RFC PATCH 2/3] powerpc: remove the last remnants of cputime_t

2022-10-06 Thread Michael Ellerman
Nicholas Piggin  writes:
> cputime_t is no longer, converted to u64.

Would be good to have some explanation of why we had it and why we don't
need it anymore.

cheers

> diff --git a/arch/powerpc/include/asm/cputime.h 
> b/arch/powerpc/include/asm/cputime.h
> index 431ae2343022..4961fb38e438 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -21,23 +21,8 @@
>  #include 
>  #include 
>  
> -typedef u64 __nocast cputime_t;
> -typedef u64 __nocast cputime64_t;
> -
> -#define cmpxchg_cputime(ptr, old, new) cmpxchg(ptr, old, new)
> -
>  #ifdef __KERNEL__
> -/*
> - * Convert cputime <-> microseconds
> - */
> -extern u64 __cputime_usec_factor;
> -
> -static inline unsigned long cputime_to_usecs(const cputime_t ct)
> -{
> - return mulhdu((__force u64) ct, __cputime_usec_factor);
> -}
> -
> -#define cputime_to_nsecs(cputime) tb_to_ns((__force u64)cputime)
> +#define cputime_to_nsecs(cputime) tb_to_ns(cputime)
>  
>  /*
>   * PPC64 uses PACA which is task independent for storing accounting data 
> while
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index a2ab397065c6..d68de3618741 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -130,7 +130,7 @@ unsigned long tb_ticks_per_jiffy;
>  unsigned long tb_ticks_per_usec = 100; /* sane default */
>  EXPORT_SYMBOL(tb_ticks_per_usec);
>  unsigned long tb_ticks_per_sec;
> -EXPORT_SYMBOL(tb_ticks_per_sec); /* for cputime_t conversions */
> +EXPORT_SYMBOL(tb_ticks_per_sec); /* for cputime conversions */
>  
>  DEFINE_SPINLOCK(rtc_lock);
>  EXPORT_SYMBOL_GPL(rtc_lock);
> @@ -150,21 +150,6 @@ EXPORT_SYMBOL_GPL(ppc_tb_freq);
>  bool tb_invalid;
>  
>  #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> -/*
> - * Factor for converting from cputime_t (timebase ticks) to
> - * microseconds. This is stored as 0.64 fixed-point binary fraction.
> - */
> -u64 __cputime_usec_factor;
> -EXPORT_SYMBOL(__cputime_usec_factor);
> -
> -static void calc_cputime_factors(void)
> -{
> - struct div_result res;
> -
> - div128_by_32(1000000, 0, tb_ticks_per_sec, &res);
> - __cputime_usec_factor = res.result_low;
> -}
> -
>  /*
>   * Read the SPURR on systems that have it, otherwise the PURR,
>   * or if that doesn't exist return the timebase value passed in.
> @@ -369,10 +354,7 @@ void vtime_flush(struct task_struct *tsk)
>   acct->hardirq_time = 0;
>   acct->softirq_time = 0;
>  }
> -
> -#else /* ! CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> -#define calc_cputime_factors()
> -#endif
> +#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>  
>  void __delay(unsigned long loops)
>  {
> @@ -914,7 +896,6 @@ void __init time_init(void)
>   tb_ticks_per_jiffy = ppc_tb_freq / HZ;
>   tb_ticks_per_sec = ppc_tb_freq;
>   tb_ticks_per_usec = ppc_tb_freq / 1000000;
> - calc_cputime_factors();
>  
>   /*
>* Compute scale factor for sched_clock.
> -- 
> 2.37.2


Re: [PATCH v3 1/6] powerpc/code-patching: Implement generic text patching function

2022-10-06 Thread Christophe Leroy


Le 06/10/2022 à 05:36, Benjamin Gray a écrit :
> On Wed, 2022-10-05 at 17:55 +, Christophe Leroy wrote:
>> I'm on a business trip this week so I can't test it on hardware, but
>> the generated code looks horrid and sub-optimal, with a stack frame
>> and so many registers saved into it. That's mpc885_ads_defconfig
>> built with GCC 12, without modules, without stackprotector, with 4k
>> pages.
> 
> Yeah, that definitely wasn't supposed to happen. I was looking at the
> 32 and 64 bit machine code closely while actively writing it, but I
> must have refactored it badly when cleaning up for submission. It was
> supposed to automatically be inlined, leaving it identical to the
> original on 32-bit.
> 
> Given inlining is desirable here, and 64-bit inlines anyway, I think I
> should just mark __patch_memory with __always_inline.

FWIW, I get a far better result with :

diff --git a/arch/powerpc/lib/code-patching.c 
b/arch/powerpc/lib/code-patching.c
index ba00c550d9d2..447b8de6e427 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -21,7 +21,7 @@ static int __patch_memory(void *patch_addr, unsigned 
long val, void *exec_addr,
/* Prefixed instruction may cross cacheline if cacheline smaller than 
64 bytes */
BUILD_BUG_ON(IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64);

-   if (unlikely(is_dword))
+   if (IS_ENABLED(CONFIG_PPC64) && unlikely(is_dword))
__put_kernel_nofault(patch_addr, , u64, failed);
else
__put_kernel_nofault(patch_addr, , u32, failed);

Re: [PATCH v3 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub

2022-10-06 Thread Andrew Donnellan
On Wed, 2022-10-05 at 16:32 +1100, Benjamin Gray wrote:
> Inserts a direct branch to the stub target when possible, replacing
> the mtctr/bctr sequence.
> 
> The load into r12 could potentially be skipped too, but that change
> would need to refactor the arguments to indicate that the address
> does not have a separate local entry point.
> 
> This helps the static call implementation, where modules calling
> their
> own trampolines are called through this stub and the trampoline is
> easily within range of a direct branch.
> 
> Signed-off-by: Benjamin Gray 

I'm not well versed in this code but nothing stands out as problematic
and it makes sense.

Reviewed-by: Andrew Donnellan 

> ---
>  arch/powerpc/kernel/module_64.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/module_64.c
> b/arch/powerpc/kernel/module_64.c
> index 4d816f7785b4..13ce7a4d8a8d 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = {
> PPC_RAW_BCTR(),
>  };
>  
> +#ifdef CONFIG_PPC64_ELF_ABI_V1
> +#define PPC64_STUB_MTCTR_OFFSET 5
> +#else
> +#define PPC64_STUB_MTCTR_OFFSET 4
> +#endif
> +
>  /* Count how many different 24-bit relocations (different symbol,
>     different addend) */
>  static unsigned int count_relocs(const Elf64_Rela *rela, unsigned
> int num)
> @@ -426,6 +432,7 @@ static inline int create_stub(const Elf64_Shdr
> *sechdrs,
>   struct module *me,
>   const char *name)
>  {
> +   int err;
> long reladdr;
> func_desc_t desc;
> int i;
> @@ -439,6 +446,11 @@ static inline int create_stub(const Elf64_Shdr
> *sechdrs,
> return 0;
> }
>  
> +   /* Replace indirect branch sequence with direct branch where
> possible */
> +   err = patch_branch(>jump[PPC64_STUB_MTCTR_OFFSET],
> addr, 0);
> +   if (err && err != -ERANGE)
> +   return 0;
> +
> /* Stub uses address relative to r2. */
> reladdr = (unsigned long)entry - my_r2(sechdrs, me);
> if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {

-- 
Andrew DonnellanOzLabs, ADL Canberra
a...@linux.ibm.com   IBM Australia Limited



Re: [RFC PATCH 1/3] powerpc/32: Implement HAVE_CONTEXT_TRACKING_USER support

2022-10-06 Thread Nicholas Piggin
On Tue Oct 4, 2022 at 9:32 PM AEST, Christophe Leroy wrote:
>
>
> Le 04/10/2022 à 08:33, Nicholas Piggin a écrit :
> > Context tracking involves tracking user, kernel, guest switches. This
> > enables existing context tracking code for interrupt entry on 32-bit.
> > KVM and interrupt exit already has context tracking calls.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >   arch/powerpc/Kconfig |  2 +-
> >   arch/powerpc/include/asm/interrupt.h | 21 ++---
> >   2 files changed, 7 insertions(+), 16 deletions(-)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 81c9f895d690..f667279ec74c 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -204,7 +204,7 @@ config PPC
> > select HAVE_ARCH_SECCOMP_FILTER
> > select HAVE_ARCH_TRACEHOOK
> > select HAVE_ASM_MODVERSIONS
> > -   select HAVE_CONTEXT_TRACKING_USER   if PPC64
> > +   select HAVE_CONTEXT_TRACKING_USER
> > select HAVE_C_RECORDMCOUNT
> > select HAVE_DEBUG_KMEMLEAK
> > select HAVE_DEBUG_STACKOVERFLOW
> > diff --git a/arch/powerpc/include/asm/interrupt.h 
> > b/arch/powerpc/include/asm/interrupt.h
> > index 4745bb9998bd..8860a246d51a 100644
> > --- a/arch/powerpc/include/asm/interrupt.h
> > +++ b/arch/powerpc/include/asm/interrupt.h
> > @@ -85,6 +85,8 @@ do {  
> > \
> > (user_mode(regs) || (TRAP(regs) != INTERRUPT_PROGRAM))) \
> > BUG_ON(cond);   \
> >   } while (0)
> > +#else
> > +#define INT_SOFT_MASK_BUG_ON(regs, cond)
>
> Here you can just drop the ifdef CONFIG_PPC64 I guess instead of adding 
> an additional empty macro.

I couldn't because some of the statements use some PPC64-only variables.
It ends up looking a bit better this way.

> >   #endif
> >   
> >   #ifdef CONFIG_PPC_BOOK3S_64
> > @@ -152,19 +154,8 @@ static inline void booke_restore_dbcr0(void)
> >   static inline void interrupt_enter_prepare(struct pt_regs *regs)
> >   {
> >   #ifdef CONFIG_PPC32
> > -   if (!arch_irq_disabled_regs(regs))
> > -   trace_hardirqs_off();
> > -
> > -   if (user_mode(regs))
> > -   kuap_lock();
> > -   else
> > -   kuap_save_and_lock(regs);
> > -
> > -   if (user_mode(regs))
> > -   account_cpu_user_entry();
> > -#endif
> > -
> > -#ifdef CONFIG_PPC64
> > +   bool trace_enable = !arch_irq_disabled_regs(regs);
>
> nit: You could be put this as an #else to the existing #ifdef CONFIG_PPC64

Yep, I was able to clean it up even a bit better actually.

Thanks,
Nick


Re: [RFC PATCH 0/3] powerpc/32: nohz full support

2022-10-06 Thread Nicholas Piggin
On Tue Oct 4, 2022 at 9:58 PM AEST, Christophe Leroy wrote:
>
>
> Le 04/10/2022 à 08:33, Nicholas Piggin a écrit :
> > Doesn't seem to be much more involved in adding context tracking and
> > generic virt cpu accounting support for 32-bit, which is all that's
> > left to support NO_HZ_FULL.
> > 
> > I tested this with e5500 SMP kernel with isolated and nohz CPU, and
> > it seems to be doing the right thing -- periodic tick is stopped on
> > the nohz CPUs when they are running in userspace.
> > 
> > Context tracking warnings should catch quite quickly if we got
> > something wrong there (with the force context tracking option). I
> > don't have a 32-bit KVM environment to test so that might have some
> > issues but it should be quite easy to fix if it can be tested.
> > 
> > I assume the virt cpu accounting gen option removal is okay, but not
> > exactly sure what to look for in terms of possible problems, so we'll
> > see what comments that gets back.
>
> I'm having hard time finding the link between patch 1 and patch 2/3.

Oh, it's just that NO_HZ_FULL needs both context tracking and
VIRT_CPU_ACCOUNTING_GEN. The patches aren't logically related
at all (and enabling the latter requires no more than removing
the 'if PPC64' clause in our arch support, so it doesn't depend
on making these arch changes either). I'll submit that as the
patch if 2 and 3 have not been picked up before then.
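
For reference, the dependencies being satisfied look like this
(paraphrased from kernel/time/Kconfig):

	config NO_HZ_FULL
		bool "Full dynticks system (tickless)"
		...
		depends on HAVE_CONTEXT_TRACKING_USER
		depends on HAVE_VIRT_CPU_ACCOUNTING_GEN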

Thanks,
Nick


Re: [PATCH] tty: evh_bytechan: Replace NO_IRQ by 0

2022-10-06 Thread Laurentiu Tudor




On 10/6/2022 8:20 AM, Christophe Leroy wrote:

NO_IRQ is used to check the return of irq_of_parse_and_map().

On some architectures NO_IRQ is 0, on others it is -1.

irq_of_parse_and_map() returns 0 on error, independent of NO_IRQ.

So use 0 instead of using NO_IRQ.

Signed-off-by: Christophe Leroy 


Acked-by: Laurentiu Tudor 


---
  drivers/tty/ehv_bytechan.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/ehv_bytechan.c b/drivers/tty/ehv_bytechan.c
index 19d32cb6af84..8595483f4697 100644
--- a/drivers/tty/ehv_bytechan.c
+++ b/drivers/tty/ehv_bytechan.c
@@ -118,7 +118,7 @@ static int find_console_handle(void)
return 0;
  
  	stdout_irq = irq_of_parse_and_map(np, 0);

-   if (stdout_irq == NO_IRQ) {
+   if (!stdout_irq) {
pr_err("ehv-bc: no 'interrupts' property in %pOF node\n", np);
return 0;
}
@@ -696,7 +696,7 @@ static int ehv_bc_tty_probe(struct platform_device *pdev)
  
  	bc->rx_irq = irq_of_parse_and_map(np, 0);

bc->tx_irq = irq_of_parse_and_map(np, 1);
-   if ((bc->rx_irq == NO_IRQ) || (bc->tx_irq == NO_IRQ)) {
+   if (!bc->rx_irq || !bc->tx_irq) {
dev_err(>dev, "no 'interrupts' property in %pOFn node\n",
np);
ret = -ENODEV;


Re: [PATCH v1 0/5] treewide cleanup of random integer usage

2022-10-06 Thread Kees Cook
On Wed, Oct 05, 2022 at 09:55:43PM -0700, Kees Cook wrote:
> If any of the subsystems ask you to break this up (I hope not), I've got
> this[1], which does a reasonable job of splitting a commit up into
> separate commits for each matching subsystem.

[1] https://github.com/kees/kernel-tools/blob/trunk/split-on-maintainer

-- 
Kees Cook


Re: [PATCH v1 0/5] treewide cleanup of random integer usage

2022-10-06 Thread Kees Cook
On Wed, Oct 05, 2022 at 11:48:39PM +0200, Jason A. Donenfeld wrote:
> Hi folks,
> 
> This is a five part treewide cleanup of random integer handling. The
> rules for random integers are:
> 
> - If you want a secure or an insecure random u64, use get_random_u64().
> - If you want a secure or an insecure random u32, use get_random_u32().
>   * The old function prandom_u32() has been deprecated for a while now
> and is just a wrapper around get_random_u32().
> - If you want a secure or an insecure random u16, use get_random_u16().
> - If you want a secure or an insecure random u8, use get_random_u8().
> - If you want secure or insecure random bytes, use get_random_bytes().
>   * The old function prandom_bytes() has been deprecated for a while now
> and has long been a wrapper around get_random_bytes().
> - If you want a non-uniform random u32, u16, or u8 bounded by a certain
>   open interval maximum, use prandom_u32_max().
>   * I say "non-uniform", because it doesn't do any rejection sampling or
> divisions. Hence, it stays within the prandom_* namespace.
> 
> These rules ought to be applied uniformly, so that we can clean up the
> deprecated functions, and earn the benefits of using the modern
> functions. In particular, in addition to the boring substitutions, this
> patchset accomplishes a few nice effects:
> 
> - By using prandom_u32_max() with an upper-bound that the compiler can
>   prove at compile-time is ≤65536 or ≤256, internally get_random_u16()
>   or get_random_u8() is used, which wastes fewer batched random bytes,
>   and hence has higher throughput.
> 
> - By using prandom_u32_max() instead of %, when the upper-bound is not a
>   constant, division is still avoided, because prandom_u32_max() uses
>   a faster multiplication-based trick instead.
> 
> - By using get_random_u16() or get_random_u8() in cases where the return
>   value is intended to indeed be a u16 or a u8, we waste fewer batched
>   random bytes, and hence have higher throughput.
> 
> So, based on those rules and benefits from following them, this patchset
> breaks down into the following five steps:
> 
> 1) Replace `prandom_u32() % max` and variants thereof with
>prandom_u32_max(max).
> 
> 2) Replace `(type)get_random_u32()` and variants thereof with
>get_random_u16() or get_random_u8(). I took the pains to actually
>look and see what every lvalue type was across the entire tree.
> 
> 3) Replace remaining deprecated uses of prandom_u32() with
>get_random_u32(). 
> 
> 4) Replace remaining deprecated uses of prandom_bytes() with
>get_random_bytes().
> 
> 5) Remove the deprecated and now-unused prandom_u32() and
>prandom_bytes() inline wrapper functions.
> 
> I was thinking of taking this through my random.git tree (on which this
> series is currently based) and submitting it near the end of the merge
> window, or waiting for the very end of the 6.1 cycle when there will be
> the fewest new patches brewing. If somebody with some treewide-cleanup
> experience might share some wisdom about what the best timing usually
> winds up being, I'm all ears.

It'd be nice to capture some (all?) of the above somewhere. Perhaps just
a massive comment in the header?
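
Something like this, perhaps (wording illustrative, not the actual
include/linux/random.h contents):

	/*
	 * get_random_u8/u16/u32/u64() - secure and insecure random integers
	 * get_random_bytes()          - arbitrary-length random bytes
	 * prandom_u32_max(ceil)       - fast, slightly non-uniform value in
	 *                               [0, ceil), using a multiply rather
	 *                               than a division:
	 */
	static inline u32 prandom_u32_max(u32 ceil)
	{
		return (u32)(((u64)get_random_u32() * ceil) >> 32);
	}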

> I've CC'd get_maintainers.pl, which is a pretty big list. Probably some
> portion of those are going to bounce, too, and everytime you reply to
> this thread, you'll have to deal with a bunch of bounces coming
> immediately after. And a recipient list this big will probably dock my
> email domain's spam reputation, at least temporarily. Sigh. I think
> that's just how it goes with treewide cleanups though. Again, let me
> know if I'm doing it wrong.

I usually stick to just mailing lists and subsystem maintainers.

If any of the subsystems ask you to break this up (I hope not), I've got
this[1], which does a reasonable job of splitting a commit up into
separate commits for each matching subsystem.

Showing that a treewide change can be reproduced mechanically helps with
keeping it together as one big treewide patch, too, I've found. :)

Thank you for the cleanup! The "u8 rnd = get_random_u32()" in the tree
has bothered me for a long time.

-Kees

-- 
Kees Cook