Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-19 Thread Michal Suchánek
Hello,

On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
> Gaurav Batra  writes:
> > At the time of LPAR reboot, partition firmware provides Open Firmware
> > property ibm,dma-window for the PE. This property is provided on the PCI
> > bus the PE is attached to.
> 
> AFAICS you're actually describing a bug that happens during boot *up*?
> 
> Describing it as "reboot" makes me think you're talking about the
> shutdown path. I think that will confuse people, me at least :)

There is probably an assumption that the LPAR must have been running
previously for the errors to happen in the first place, but given that the
error state persists for a day, it may be a very long 'reboot'.

Thanks

Michal
> 
> cheers
> 
> > There are exceptions where the partition firmware might not provide this
> > property for the PE at the time of LPAR reboot. One such scenario is
> > where the firmware has frozen the PE due to some error condition. The
> > PE stays frozen for 24 hours or until the whole system is reinitialized.
> >
> > Within this time frame, if the LPAR is rebooted, the frozen PE will be
> > presented to the LPAR but ibm,dma-window property could be missing.
> >
> > Today, under these circumstances, the LPAR oopses with NULL pointer
> > dereference, when configuring the PCI bus the PE is attached to.
> >
> > BUG: Kernel NULL pointer dereference on read at 0x00c8
> > Faulting instruction address: 0xc01024c0
> > Oops: Kernel access of bad area, sig: 7 [#1]
> > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > Modules linked in:
> > Supported: Yes
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
> > Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 
> > of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
> > NIP:  c01024c0 LR: c01024b0 CTR: c0102450
> > REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
> > MSR:  82009033   CR: 28000822  XER: 
> > 
> > CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
> > ...
> > NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
> > LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
> > Call Trace:
> > pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
> > pcibios_setup_bus_self+0x1c0/0x370
> > __of_scan_bus+0x2f8/0x330
> > pcibios_scan_phb+0x280/0x3d0
> > pcibios_init+0x88/0x12c
> > do_one_initcall+0x60/0x320
> > kernel_init_freeable+0x344/0x3e4
> > kernel_init+0x34/0x1d0
> > ret_from_kernel_user_thread+0x14/0x1c
> >
> > Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of 
> > ibm,dma-window")
> > Signed-off-by: Gaurav Batra 
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index e8c4129697b1..e808d5b1fa49 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus 
> > *bus)
> >  * parent bus. During reboot, there will be ibm,dma-window property to
> >  * define DMA window. For kdump, there will at least be default window 
> > or DDW
> >  * or both.
> > +* There is an exception to the above. In case the PE goes into frozen
> > +* state, firmware may not provide ibm,dma-window property at the time
> > +* of LPAR reboot.
> >  */
> >  
> > +   if (!pdn) {
> > +   pr_debug("  no ibm,dma-window property !\n");
> > +   return;
> > +   }
> > +
> > ppci = PCI_DN(pdn);
> >  
> > pr_debug("  parent is %pOF, iommu_table: 0x%p\n",
> >
> > base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> > -- 
> > 2.39.3 (Apple Git-146)
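
For readers following along, the guard added by the patch can be modelled in
userspace. This is only a minimal sketch under stated assumptions: struct node,
find_dma_window() and bus_setup() are hypothetical stand-ins invented for
illustration, not the kernel's device-tree or IOMMU API. It shows why the
lookup result has to be checked before it is dereferenced.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device-tree node (not struct device_node). */
struct node {
    const char *name;
    const struct node *parent;
    int has_dma_window;      /* nonzero if the node carries a DMA window property */
};

/* Walk from dn towards the root; return the first node with a window, or NULL. */
static const struct node *find_dma_window(const struct node *dn)
{
    for (; dn != NULL; dn = dn->parent)
        if (dn->has_dma_window)
            return dn;
    return NULL;
}

/*
 * Returns 0 and stores the name of the node that supplied the window,
 * or returns -1 and stores NULL when no window exists (the frozen-PE case).
 */
static int bus_setup(const struct node *dn, const char **used)
{
    const struct node *pdn = find_dma_window(dn);

    if (pdn == NULL) {       /* nothing to configure: bail out early */
        *used = NULL;
        return -1;
    }

    *used = pdn->name;       /* safe: pdn is known to be non-NULL here */
    return 0;
}
```

Without the early return, the access to pdn->name in this model would be the
analogue of the NULL dereference reported in the oops.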


Re: [PATCH v4 07/10] cpu/SMT: Allow enabling partial SMT states via sysfs

2024-04-08 Thread Michal Suchánek
Hello,

On Wed, Jul 05, 2023 at 04:51:40PM +0200, Laurent Dufour wrote:
> From: Michael Ellerman 
> 
> Add support to the /sys/devices/system/cpu/smt/control interface for
> enabling a specified number of SMT threads per core, including partial
> SMT states where not all threads are brought online.
> 
> The current interface accepts "on" and "off", to enable either 1 or all
> SMT threads per core.
> 
> This commit allows writing an integer, between 1 and the number of SMT
> threads supported by the machine. Writing 1 is a synonym for "off", 2 or
> more enables SMT with the specified number of threads.
> 
> When reading the file, if all threads are online "on" is returned, to
> avoid changing behaviour for existing users. If some other number of
> threads is online then the integer value is returned.
> 
> Architectures like x86 only supporting 1 thread or all threads, should not
> define CONFIG_SMT_NUM_THREADS_DYNAMIC. Architecture supporting partial SMT
> states, like PowerPC, should define it.

This causes a regression:
https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ

The userspace code only changes the SMT mode of online CPUs (at least one
thread is online) and does not touch offline CPUs.

With this new interface, offlined CPUs are onlined when the SMT value is
set.

Would you look into special-casing the offline CPUs?

Thanks

Michal

> 
> Signed-off-by: Michael Ellerman 
> [ldufour: slightly reword the commit's description]
> [ldufour: remove switch() in __store_smt_control()]
> Reported-by: kernel test robot 
> Closes: 
> https://lore.kernel.org/oe-kbuild-all/202306282340.ihqm0fla-...@intel.com/
> [ldufour: fix build issue in control_show()]
> Signed-off-by: Laurent Dufour 
> ---
>  .../ABI/testing/sysfs-devices-system-cpu  |  1 +
>  kernel/cpu.c  | 60 ++-
>  2 files changed, 45 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
> b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index ecd585ca2d50..6dba65fb1956 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -555,6 +555,7 @@ Description:  Control Symmetric Multi Threading (SMT)
> 
> =
>"on" SMT is enabled
>"off"SMT is disabled
> +  "<N>"  SMT is enabled with N threads per 
> core.
>"forceoff"   SMT is force disabled. Cannot be 
> changed.
>"notsupported"   SMT is not supported by the CPU
>"notimplemented" SMT runtime toggling is not
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 9a8d0685e055..7e8f1b044772 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2876,11 +2876,19 @@ static const struct attribute_group 
> cpuhp_cpu_root_attr_group = {
>  
>  #ifdef CONFIG_HOTPLUG_SMT
>  
> +static bool cpu_smt_num_threads_valid(unsigned int threads)
> +{
> + if (IS_ENABLED(CONFIG_SMT_NUM_THREADS_DYNAMIC))
> + return threads >= 1 && threads <= cpu_smt_max_threads;
> + return threads == 1 || threads == cpu_smt_max_threads;
> +}
> +
>  static ssize_t
>  __store_smt_control(struct device *dev, struct device_attribute *attr,
>   const char *buf, size_t count)
>  {
> - int ctrlval, ret;
> + int ctrlval, ret, num_threads, orig_threads;
> + bool force_off;
>  
>   if (cpu_smt_control == CPU_SMT_FORCE_DISABLED)
>   return -EPERM;
> @@ -2888,30 +2896,39 @@ __store_smt_control(struct device *dev, struct 
> device_attribute *attr,
>   if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
>   return -ENODEV;
>  
> - if (sysfs_streq(buf, "on"))
> + if (sysfs_streq(buf, "on")) {
>   ctrlval = CPU_SMT_ENABLED;
> - else if (sysfs_streq(buf, "off"))
> + num_threads = cpu_smt_max_threads;
> + } else if (sysfs_streq(buf, "off")) {
>   ctrlval = CPU_SMT_DISABLED;
> - else if (sysfs_streq(buf, "forceoff"))
> + num_threads = 1;
> + } else if (sysfs_streq(buf, "forceoff")) {
>   ctrlval = CPU_SMT_FORCE_DISABLED;
> - else
> + num_threads = 1;
> + } else if (kstrtoint(buf, 10, &num_threads) == 0) {
> + if (num_threads == 1)
> + ctrlval = CPU_SMT_DISABLED;
> + else if (cpu_smt_num_threads_valid(num_threads))
> + ctrlval = CPU_SMT_ENABLED;
> + else
> + return -EINVAL;
> + } else {
>   return -EINVAL;
> + }
>  
>   ret = lock_device_hotplug_sysfs();
>   if (ret)
>   return ret;
>  
> - if (ctrlval != cpu_smt_control) {
> - switch (ctrlval) {
> - case CPU_SMT_ENABLED:
> -
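
The parsing rules described in the commit message can be sketched as a small
userspace model. This is an illustration only, with assumptions: the names
parse_smt_control(), struct smt_request and the enum are hypothetical, strcmp()
replaces sysfs_streq() (so a trailing newline is not tolerated here), and
strtol() replaces kstrtoint(). The "dynamic" flag plays the role of
CONFIG_SMT_NUM_THREADS_DYNAMIC.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum smt_ctrl { SMT_ENABLED, SMT_DISABLED, SMT_FORCE_DISABLED };

struct smt_request {
    enum smt_ctrl ctrl;
    int threads;
};

/* Partial SMT states are only valid when the architecture opts in. */
static bool smt_num_threads_valid(int threads, int max_threads, bool dynamic)
{
    if (dynamic)
        return threads >= 1 && threads <= max_threads;
    return threads == 1 || threads == max_threads;
}

static int set_req(struct smt_request *req, enum smt_ctrl ctrl, int threads)
{
    req->ctrl = ctrl;
    req->threads = threads;
    return 0;
}

/* Returns 0 and fills *req on success, -EINVAL on unparsable or invalid input. */
static int parse_smt_control(const char *buf, int max_threads, bool dynamic,
                             struct smt_request *req)
{
    char *end;
    long n;

    if (strcmp(buf, "on") == 0)
        return set_req(req, SMT_ENABLED, max_threads);
    if (strcmp(buf, "off") == 0)
        return set_req(req, SMT_DISABLED, 1);
    if (strcmp(buf, "forceoff") == 0)
        return set_req(req, SMT_FORCE_DISABLED, 1);

    errno = 0;
    n = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -EINVAL;              /* not a plain decimal integer */
    if (n < 1 || n > max_threads)
        return -EINVAL;              /* out of range in either mode */

    if (n == 1)                      /* 1 is a synonym for "off" */
        return set_req(req, SMT_DISABLED, 1);
    if (smt_num_threads_valid((int)n, max_threads, dynamic))
        return set_req(req, SMT_ENABLED, (int)n);
    return -EINVAL;
}
```

With max_threads = 8, writing "4" is accepted only when dynamic is true; with
dynamic false only "1" and "8" are accepted as integers.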

Re: Cannot load wireguard module

2024-03-20 Thread Michal Suchánek
On Wed, Mar 20, 2024 at 11:41:32PM +1100, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Mon, Mar 18, 2024 at 06:08:55PM +0100, Michal Suchánek wrote:
> >> On Mon, Mar 18, 2024 at 10:50:49PM +1100, Michael Ellerman wrote:
> >> > Michael Ellerman  writes:
> >> > > Michal Suchánek  writes:
> >> > >> Hello,
> >> > >>
> >> > >> I cannot load the wireguard module.
> >> > >>
> >> > >> Loading the module provides no diagnostic other than 'No such device'.
> >> > >>
> >> > >> Please provide meaningful diagnostics for loading software-only driver,
> >> > >> clearly there is no particular device needed.
> >> > >
> >> > > Presumably it's just bubbling up an -ENODEV from somewhere.
> >> > >
> >> > > Can you get a trace of it?
> >> > >
> >> > > Something like:
> >> > >
> >> > >   # trace-cmd record -p function_graph -F modprobe wireguard
> >
> > Attached.
> 
> Sorry :/, you need to also trace children of modprobe, with -c.
> 
> But, I was able to reproduce the same issue here.
> 
> On a P9, a kernel with CONFIG_CRYPTO_CHACHA20_P10=n everything works:
> 
>   $ modprobe -v wireguard
>   insmod /lib/modules/6.8.0/kernel/net/ipv4/udp_tunnel.ko
>   insmod /lib/modules/6.8.0/kernel/net/ipv6/ip6_udp_tunnel.ko
>   insmod /lib/modules/6.8.0/kernel/lib/crypto/libchacha.ko
>   insmod /lib/modules/6.8.0/kernel/lib/crypto/libchacha20poly1305.ko
>   insmod /lib/modules/6.8.0/kernel/drivers/net/wireguard/wireguard.ko
>   [   19.180564][  T692] wireguard: allowedips self-tests: pass
>   [   19.185080][  T692] wireguard: nonce counter self-tests: pass
>   [   19.310438][  T692] wireguard: ratelimiter self-tests: pass
>   [   19.310639][  T692] wireguard: WireGuard 1.0.0 loaded. See 
> www.wireguard.com for information.
>   [   19.310746][  T692] wireguard: Copyright (C) 2015-2019 Jason A. 
> Donenfeld . All Rights Reserved.
> 
> 
> If I build CONFIG_CRYPTO_CHACHA20_P10 as a module then it breaks:
> 
>   $ modprobe -v wireguard
>   insmod /lib/modules/6.8.0/kernel/net/ipv4/udp_tunnel.ko
>   insmod /lib/modules/6.8.0/kernel/net/ipv6/ip6_udp_tunnel.ko
>   insmod /lib/modules/6.8.0/kernel/lib/crypto/libchacha.ko
>   insmod /lib/modules/6.8.0/kernel/arch/powerpc/crypto/chacha-p10-crypto.ko
>   modprobe: ERROR: could not insert 'wireguard': No such device
> 
> 
> The ENODEV is coming from module_cpu_feature_match(), which blocks the
> driver from loading on non-p10.
> 
> Looking at other arches (arm64 at least) it seems like the driver should
> instead be loading but disabling the p10 path. Which then allows
> chacha_crypt_arch() to exist, and it has a fallback to use
> chacha_crypt_generic().
> 
> I don't see how module_cpu_feature_match() can co-exist with the driver
> also providing a fallback. Hopefully someone who knows crypto better
> than me can explain it.

Maybe it doesn't. ppc64le is the only platform that needs the fallback;
on other platforms that have a hardware-specific chacha implementation, it
seems to rely on a fairly common CPU feature, so the fallback is rarely if
ever needed in practice.

Thanks

Michal

> 
> This diff fixes it for me:
> 
> diff --git a/arch/powerpc/crypto/chacha-p10-glue.c 
> b/arch/powerpc/crypto/chacha-p10-glue.c
> index 74fb86b0d209..9d2c30b0904c 100644
> --- a/arch/powerpc/crypto/chacha-p10-glue.c
> +++ b/arch/powerpc/crypto/chacha-p10-glue.c
> @@ -197,6 +197,9 @@ static struct skcipher_alg algs[] = {
>  
>  static int __init chacha_p10_init(void)
>  {
> + if (!cpu_has_feature(PPC_FEATURE2_ARCH_3_1))
> + return 0;
> +
>   static_branch_enable(&have_p10);
>  
>   return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
> @@ -207,7 +210,7 @@ static void __exit chacha_p10_exit(void)
>   crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
>  }
>  
> -module_cpu_feature_match(PPC_MODULE_FEATURE_P10, chacha_p10_init);
> +module_init(chacha_p10_init);
>  module_exit(chacha_p10_exit);
>  
>  MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (P10 accelerated)");
> 
> 
> Giving me:
> 
>   $ modprobe -v wireguard
>   insmod /lib/modules/6.8.0-dirty/kernel/net/ipv4/udp_tunnel.ko
>   insmod /lib/modules/6.8.0-dirty/kernel/net/ipv6/ip6_udp_tunnel.ko
>   insmod /lib/modules/6.8.0-dirty/kernel/lib/crypto/libchacha.ko
>   insmod 
> /lib/modules/6.8.0-dirty/kernel/arch/powerpc/crypto/chacha-p10-crypto.ko
>   insmod /lib/modules/6.8.0-dirty/kernel/lib/crypto/libchacha20poly1305.ko
>   insmod /lib/modules/6.8.0-dirty/kernel/drivers/net/wireguard/wireguard.ko
>   [   19.657941][  T718] wireguard: allowedips self-tests: pass
>   [   19.662501][  T718] wireguard: nonce counter self-tests: pass
>   [   19.782933][  T718] wireguard: ratelimiter self-tests: pass
>   [   19.783114][  T718] wireguard: WireGuard 1.0.0 loaded. See 
> www.wireguard.com for information.
>   [   19.783223][  T718] wireguard: Copyright (C) 2015-2019 Jason A. 
> Donenfeld . All Rights Reserved.
>   
> 
> cheers
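
The difference between the two init strategies discussed above can be modelled
in userspace. This is a sketch under assumptions, not the driver's code: the
names init_strict(), init_tolerant(), crypt_arch() and the two globals are
hypothetical, and a plain bool stands in for the kernel's static key.

```c
#include <errno.h>
#include <stdbool.h>

static bool have_accel;       /* models the static key; stays false until enabled */
static int registered_algs;   /* counts registered accelerated algorithms */

/* Strict variant: refuse to load at all on a CPU without the feature. */
static int init_strict(bool cpu_has_feature)
{
    if (!cpu_has_feature)
        return -ENODEV;       /* this error bubbles up to whoever depends on us */
    have_accel = true;
    registered_algs = 1;
    return 0;
}

/* Tolerant variant: load anyway, but leave the accelerated path disabled. */
static int init_tolerant(bool cpu_has_feature)
{
    if (!cpu_has_feature)
        return 0;             /* module stays loaded, nothing is registered */
    have_accel = true;
    registered_algs = 1;
    return 0;
}

enum impl { IMPL_GENERIC, IMPL_ACCEL };

/* Library entry point: use the accelerated path only when it was enabled. */
static enum impl crypt_arch(void)
{
    return have_accel ? IMPL_ACCEL : IMPL_GENERIC;
}
```

In this model the strict variant makes the whole dependency chain fail with
-ENODEV, while the tolerant variant keeps crypt_arch() available and lets it
fall back to the generic implementation.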


Re: Cannot load wireguard module

2024-03-18 Thread Michal Suchánek
On Mon, Mar 18, 2024 at 10:50:49PM +1100, Michael Ellerman wrote:
> Michael Ellerman  writes:
> > Michal Suchánek  writes:
> >> Hello,
> >>
> >> I cannot load the wireguard module.
> >>
> >> Loading the module provides no diagnostic other than 'No such device'.
> >>
> >> Please provide meaningful diagnostics for loading software-only driver,
> >> clearly there is no particular device needed.
> >
> > Presumably it's just bubbling up an -ENODEV from somewhere.
> >
> > Can you get a trace of it?
> >
> > Something like:
> >
> >   # trace-cmd record -p function_graph -F modprobe wireguard
> >
> > That should probably show where it's bailing out.
> >
> >> jostaberry-1:~ # uname -a
> >> Linux jostaberry-1 6.8.0-lp155.8.g7e0e887-default #1 SMP Wed Mar 13 
> >> 09:02:21 UTC 2024 (7e0e887) ppc64le ppc64le ppc64le GNU/Linux
> >> jostaberry-1:~ # modprobe wireguard
> >> modprobe: ERROR: could not insert 'wireguard': No such device
> >> jostaberry-1:~ # modprobe -v wireguard
> >> insmod 
> >> /lib/modules/6.8.0-lp155.8.g7e0e887-default/kernel/arch/powerpc/crypto/chacha-p10-crypto.ko.zst
> >>  
> >> modprobe: ERROR: could not insert 'wireguard': No such device
> >  
> > What machine is this? A Power10?
> 
> I am able to load the module successfully on a P10 running v6.8.0.

Of course, it's not a Power10. Otherwise the Power10-specific chacha
implementation would load.

Thanks

Michal


Cannot load wireguard module

2024-03-15 Thread Michal Suchánek
Hello,

I cannot load the wireguard module.

Loading the module provides no diagnostic other than 'No such device'.

Please provide meaningful diagnostics for loading software-only driver,
clearly there is no particular device needed.

Thanks

Michal

jostaberry-1:~ # uname -a
Linux jostaberry-1 6.8.0-lp155.8.g7e0e887-default #1 SMP Wed Mar 13 09:02:21 
UTC 2024 (7e0e887) ppc64le ppc64le ppc64le GNU/Linux
jostaberry-1:~ # modprobe wireguard
modprobe: ERROR: could not insert 'wireguard': No such device
jostaberry-1:~ # modprobe -v wireguard
insmod 
/lib/modules/6.8.0-lp155.8.g7e0e887-default/kernel/arch/powerpc/crypto/chacha-p10-crypto.ko.zst
 
modprobe: ERROR: could not insert 'wireguard': No such device
jostaberry-1:~ # modprobe chacha-generic
jostaberry-1:~ # modprobe -v wireguard
insmod 
/lib/modules/6.8.0-lp155.8.g7e0e887-default/kernel/arch/powerpc/crypto/chacha-p10-crypto.ko.zst
 
modprobe: ERROR: could not insert 'wireguard': No such device
jostaberry-1:~ # 



Re: [PATCH] selftests: powerpc: Add header symlinks for building papr character device tests

2024-02-15 Thread Michal Suchánek
On Thu, Feb 15, 2024 at 01:39:27PM -0600, Nathan Lynch wrote:
> Michal Suchánek  writes:
> > On Thu, Feb 15, 2024 at 01:13:34PM -0600, Nathan Lynch wrote:
> >> Michal Suchanek  writes:
> >> >
> >> > Without the headers the tests don't build.
> >> >
> >> > Fixes: 9118c5d32bdd ("powerpc/selftests: Add test for papr-vpd")
> >> > Fixes: 76b2ec3faeaa ("powerpc/selftests: Add test for papr-sysparm")
> >> > Signed-off-by: Michal Suchanek 
> >> > ---
> >> >  tools/testing/selftests/powerpc/include/asm/papr-miscdev.h | 1 +
> >> >  tools/testing/selftests/powerpc/include/asm/papr-sysparm.h | 1 +
> >> >  tools/testing/selftests/powerpc/include/asm/papr-vpd.h | 1 +
> >> >  3 files changed, 3 insertions(+)
> >> >  create mode 12 
> >> > tools/testing/selftests/powerpc/include/asm/papr-miscdev.h
> >> >  create mode 12 
> >> > tools/testing/selftests/powerpc/include/asm/papr-sysparm.h
> >> >  create mode 12
> >> > tools/testing/selftests/powerpc/include/asm/papr-vpd.h
> >> 
> >> I really hope making symlinks into the kernel source isn't necessary. I
> >> haven't experienced build failures with these tests. How are you
> >> building them?
> >> 
> >> I usually do something like (on a x86 build host):
> >> 
> >> $ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux- ppc64le_defconfig
> >> $ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux- headers
> >> $ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux- -C 
> >> tools/testing/selftests/powerpc/
> >> 
> >> without issue.
> >
> > I am not configuring the kernel, only building the tests, and certainly
> > not installing headers on the system.
> 
> OK, but again: how do you provoke the build errors, exactly? Don't make
> us guess please.

cd tools/testing/selftests/powerpc/

make -k

> > Apparently this is what people aim to do, and report bugs when it does
> > not work: build the kselftests as self-contained testsuite that relies
> > only on standard libc, and whatever is brought in the sources.
> >
> > That said, the target to install headers is headers_install, not
> > headers. The headers target is not documented, it's probably meant to be
> > internal to the build system. Yet it is not enforced that it is built
> > before building the selftests.
> 
>  the headers target is used in Documentation/dev-tools/kselftest.rst:
> 
> """
> To build the tests::
> 
>   $ make headers
>   $ make -C tools/testing/selftests
> """

Indeed so it's not supposed to work otherwise. It would be nice if it
did but might be difficult to achieve with plain makefiles.

'headers' is not in 'make help' output but whatever.

Thanks

Michal


Re: [PATCH] selftests: powerpc: Add header symlinks for building papr character device tests

2024-02-15 Thread Michal Suchánek
On Thu, Feb 15, 2024 at 01:13:34PM -0600, Nathan Lynch wrote:
> Michal Suchanek  writes:
> >
> > Without the headers the tests don't build.
> >
> > Fixes: 9118c5d32bdd ("powerpc/selftests: Add test for papr-vpd")
> > Fixes: 76b2ec3faeaa ("powerpc/selftests: Add test for papr-sysparm")
> > Signed-off-by: Michal Suchanek 
> > ---
> >  tools/testing/selftests/powerpc/include/asm/papr-miscdev.h | 1 +
> >  tools/testing/selftests/powerpc/include/asm/papr-sysparm.h | 1 +
> >  tools/testing/selftests/powerpc/include/asm/papr-vpd.h | 1 +
> >  3 files changed, 3 insertions(+)
> >  create mode 12 
> > tools/testing/selftests/powerpc/include/asm/papr-miscdev.h
> >  create mode 12 
> > tools/testing/selftests/powerpc/include/asm/papr-sysparm.h
> >  create mode 12
> > tools/testing/selftests/powerpc/include/asm/papr-vpd.h
> 
> I really hope making symlinks into the kernel source isn't necessary. I
> haven't experienced build failures with these tests. How are you
> building them?
> 
> I usually do something like (on a x86 build host):
> 
> $ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux- ppc64le_defconfig
> $ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux- headers
> $ make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux- -C 
> tools/testing/selftests/powerpc/
> 
> without issue.

I am not configuring the kernel, only building the tests, and certainly
not installing headers on the system.

Apparently this is what people aim to do, and report bugs when it does
not work: build the kselftests as self-contained testsuite that relies
only on standard libc, and whatever is brought in the sources.

That said, the target to install headers is headers_install, not
headers. The headers target is not documented, it's probably meant to be
internal to the build system. Yet it is not enforced that it is built
before building the selftests.

Thanks

Michal


Re: [RFC] UBUNTU: [Config] y2038: Disable COMPAT and COMPAT_32BIT_TIME on ppc64le

2023-11-24 Thread Michal Suchánek
On Fri, Nov 24, 2023 at 03:59:04PM +1100, Michael Ellerman wrote:
> Dimitri John Ledkov  writes:
> > BugLink: https://bugs.launchpad.net/bugs/2038587
> >
> > ppc64le is exclusively little endian and 64-bit, thus there is no need
> > for COMPAT_32BIT_TIME, nor COMPAT.
> 
> To be pedantic, the ppc64le kernel does support running 32-bit little
> endian userspace in compat mode (CONFIG_COMPAT=y). It's a distro choice
> as to whether you support COMPAT. Notably there are two other major
> distros that don't support COMPAT for ppc64le, and the set of 32-bit LE
> software is effectively empty.

I have seen software that does not work when compiled 64bit so it would
build 32bit binary even on ppc64le and abuse the compat layer to run.

It is quite rare, though.

Thanks

Michal


Re: [PATCH v4 09/13] powerpc/pseries: Add papr-vpd character driver for VPD retrieval

2023-11-21 Thread Michal Suchánek
ppens
> +  * continuously. But we'll stop trying on a fatal signal.
> +  */
> + do {
> + blob = papr_vpd_run_sequence(loc_code);
> + if (!IS_ERR(blob)) /* Success. */
> + break;
> + if (PTR_ERR(blob) != -EAGAIN) /* Hard error. */
> + break;
> + pr_info_ratelimited("VPD changed during retrieval, retrying\n");
> + cond_resched();
> + } while (!fatal_signal_pending(current));

fatal_signal_pending() is defined in linux/sched/signal.h, which is not included.

> +
> + return blob;
> +}
> +
> +static ssize_t papr_vpd_handle_read(struct file *file, char __user *buf, 
> size_t size, loff_t *off)
> +{
> + const struct vpd_blob *blob = file->private_data;
> +
> + /* bug: we should not instantiate a handle without any data attached. */
> + if (!vpd_blob_has_data(blob)) {
> + pr_err_once("handle without data\n");
> + return -EIO;
> + }
> +
> + return simple_read_from_buffer(buf, size, off, blob->data, blob->len);
> +}
> +
> +static int papr_vpd_handle_release(struct inode *inode, struct file *file)
> +{
> + const struct vpd_blob *blob = file->private_data;
> +
> + vpd_blob_free(blob);
> +
> + return 0;
> +}
> +
> +static loff_t papr_vpd_handle_seek(struct file *file, loff_t off, int whence)
> +{
> + const struct vpd_blob *blob = file->private_data;
> +
> + return fixed_size_llseek(file, off, whence, blob->len);
> +}
> +
> +
> +static const struct file_operations papr_vpd_handle_ops = {
> + .read = papr_vpd_handle_read,
> + .llseek = papr_vpd_handle_seek,
> + .release = papr_vpd_handle_release,
> +};
> +
> +/**
> + * papr_vpd_create_handle() - Create a fd-based handle for reading VPD.
> + * @ulc: Location code in user memory; defines the scope of the VPD to
> + *   retrieve.
> + *
> + * Handler for PAPR_VPD_IOC_CREATE_HANDLE ioctl command. Validates
> + * @ulc and instantiates an immutable VPD "blob" for it. The blob is
> + * attached to a file descriptor for reading by user space. The memory
> + * backing the blob is freed when the file is released.
> + *
> + * The entire requested VPD is retrieved by this call and all
> + * necessary RTAS interactions are performed before returning the fd
> + * to user space. This keeps the read handler simple and ensures that
> + * the kernel can prevent interleaving of ibm,get-vpd call sequences.
> + *
> + * Return: The installed fd number if successful, -ve errno otherwise.
> + */
> +static long papr_vpd_create_handle(struct papr_location_code __user *ulc)
> +{
> + struct papr_location_code klc;
> + const struct vpd_blob *blob;
> + struct file *file;
> + long err;
> + int fd;
> +
> + if (copy_from_user(&klc, ulc, sizeof(klc)))
> + return -EFAULT;

copy_from_user() is defined in linux/uaccess.h, which is not included.

Same for the sysparm driver.

Tested-by: Michal Suchánek 

> +
> + if (!string_is_terminated(klc.str, ARRAY_SIZE(klc.str)))
> + return -EINVAL;
> +
> + blob = papr_vpd_retrieve(&klc);
> + if (IS_ERR(blob))
> + return PTR_ERR(blob);
> +
> + fd = get_unused_fd_flags(O_RDONLY | O_CLOEXEC);
> + if (fd < 0) {
> + err = fd;
> + goto free_blob;
> + }
> +
> + file = anon_inode_getfile("[papr-vpd]", &papr_vpd_handle_ops,
> +   (void *)blob, O_RDONLY);
> + if (IS_ERR(file)) {
> + err = PTR_ERR(file);
> + goto put_fd;
> + }
> +
> + file->f_mode |= FMODE_LSEEK | FMODE_PREAD;
> + fd_install(fd, file);
> + return fd;
> +put_fd:
> + put_unused_fd(fd);
> +free_blob:
> + vpd_blob_free(blob);
> + return err;
> +}
> +
> +/*
> + * Top-level ioctl handler for /dev/papr-vpd.
> + */
> +static long papr_vpd_dev_ioctl(struct file *filp, unsigned int ioctl, 
> unsigned long arg)
> +{
> + void __user *argp = (__force void __user *)arg;
> + long ret;
> +
> + switch (ioctl) {
> + case PAPR_VPD_IOC_CREATE_HANDLE:
> + ret = papr_vpd_create_handle(argp);
> + break;
> + default:
> + ret = -ENOIOCTLCMD;
> + break;
> + }
> + return ret;
> +}
> +
> +static const struct file_operations papr_vpd_ops = {
> + .unlocked_ioctl = papr_vpd_dev_ioctl,
> +};
> +
> +static struct miscdevice papr_vpd_dev = {
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "papr-vpd",
> + .fops = &papr_vpd_ops,
> +};
> +
> +static __init int papr_vpd_init(void)
> +{
> + if (!rtas_function_implemented(RTAS_FN_IBM_GET_VPD))
> + return -ENODEV;
> +
> + return misc_register(&papr_vpd_dev);
> +}
> +machine_device_initcall(pseries, papr_vpd_init);
> 
> -- 
> 2.41.0
> 
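
The retry loop in the quoted retrieval function follows a common pattern: keep
retrying while the operation reports -EAGAIN, stop on success or on any other
error, and give up when asked to stop. A minimal userspace sketch, assuming
plain int return codes instead of ERR_PTR-encoded blobs; every name here
(retrieve_with_retry(), struct flaky, the callbacks) is hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callback types: run one attempt / ask whether to give up. */
typedef int (*sequence_fn)(void *ctx);    /* 0 on success, negative errno otherwise */
typedef bool (*stop_fn)(void *ctx);

static int retrieve_with_retry(sequence_fn run, stop_fn should_stop, void *ctx)
{
    int ret;

    do {
        ret = run(ctx);
        if (ret == 0)            /* success */
            break;
        if (ret != -EAGAIN)      /* hard error: retrying will not help */
            break;
        /* -EAGAIN: the data changed underneath us, try again */
    } while (!should_stop(ctx));

    return ret;
}

/* Demo sequence: fails with -EAGAIN a fixed number of times, then succeeds. */
struct flaky {
    int remaining_failures;
    int calls;
};

static int flaky_run(void *ctx)
{
    struct flaky *f = (struct flaky *)ctx;

    f->calls++;
    if (f->remaining_failures > 0) {
        f->remaining_failures--;
        return -EAGAIN;
    }
    return 0;
}

static bool never_stop(void *ctx)  { (void)ctx; return false; }
static bool always_stop(void *ctx) { (void)ctx; return true; }
```

The stop callback takes the place of the fatal-signal check in the kernel
loop; if it fires after a transient failure, the caller sees -EAGAIN.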


Re: [PATCH v3 00/10] powerpc/pseries: New character devices for system parameters and VPD

2023-11-13 Thread Michal Suchánek
Hello,

On Thu, Oct 26, 2023 at 06:56:36PM -0500, Nathan Lynch wrote:
> Nathan Lynch via B4 Relay 
> writes:
> > I have made changes to librtas to prefer the new interfaces and
> > verified that existing clients work correctly with the new code.
> 
> Unfortunately I made a mistake in testing this time and introduced a
> boot-time oops:
> 
> BUG: Kernel NULL pointer dereference on read at 0x0018
> Faulting instruction address: 0xc004223c
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Tainted: GW  6.6.0-rc2+ #129
> NIP:  c004223c LR: c0042238 CTR: 
> REGS: c2c579d0 TRAP: 0300   Tainted: GW   (6.6.0-rc2+)
> MSR:  80001033   CR: 28000284  XER: 
> CFAR: c0042008 DAR: 0018 DSISR: 0008 IRQMASK: 3 
> GPR00: c0042238 c2c57c70 c1f5eb00  
> GPR04: c294cd08 0002 c2c579b4  
> GPR08:  0002 c2c0da80  
> GPR12:  c5e4  02097728 
> GPR16:  0001 02097b80 020975b8 
> GPR20: 020976f0 020974e8 030feb00 030feb00 
> GPR24: 2008  0001 c28f3d70 
> GPR28: 02d31020 c2cac268 c2d31020  
> NIP [c004223c] do_enter_rtas+0xcc/0x460
> LR [c0042238] do_enter_rtas+0xc8/0x460
> Call Trace:
> [c2c57c70] [c0042238] do_enter_rtas+0xc8/0x460 (unreliable)
> [c2c57cc0] [c0042e34] rtas_call+0x434/0x490
> [c2c57d20] [c00fd584] papr_sysparm_get+0xe4/0x230
> [c2c57db0] [c20267d0] pSeries_probe+0x2f0/0x5fc
> [c2c57e80] [c200a318] setup_arch+0x11c/0x524
> [c2c57f10] [c200418c] start_kernel+0xcc/0xc1c
> [c2c57fe0] [c000e788] start_here_common+0x1c/0x20
> 
> This was introduced by patch #4 "powerpc/rtas: Warn if per-function lock
> isn't held": __do_enter_rtas() is now attempting token -> descriptor
> lookups unconditionally, before the xarray for that has been initialized.
> 
> With that change reverted, the series tests OK.

What's the status here?

Can this move on with the 4th patch skipped, or is a new revision
expected?

Thanks

Michal


Re: [PATCH v7 3/3 RESEND] powerpc/pseries: PLPKS SED Opal keystore support

2023-09-14 Thread Michal Suchánek
Hello,

On Thu, Sep 14, 2023 at 02:13:32PM +1000, Michael Ellerman wrote:
> Nathan Chancellor  writes:
> > Hi Greg,
> >
> > On Fri, Sep 08, 2023 at 10:30:56AM -0500, gjo...@linux.vnet.ibm.com wrote:
> >> From: Greg Joyce 
> >>
> >> Define operations for SED Opal to read/write keys
> >> from POWER LPAR Platform KeyStore(PLPKS). This allows
> >> non-volatile storage of SED Opal keys.
> >>
> >> Signed-off-by: Greg Joyce 
> >> Reviewed-by: Jonathan Derrick 
> >> Reviewed-by: Hannes Reinecke 
> >
> > After this change in -next as commit 9f2c7411ada9 ("powerpc/pseries:
> > PLPKS SED Opal keystore support"), I see the following crash when
> > booting some distribution configurations, such as OpenSUSE's [1] (the
> > rootfs is available at [2] if necessary):
> 
> Thanks for testing Nathan.
> 
> The code needs to check plpks_is_available() somewhere, before calling
> the plpks routines.

would this fixup do it?

I don't really see any other place to plug the check with the current
code structure.

Thanks

Michal

diff --git a/arch/powerpc/platforms/pseries/plpks_sed_ops.c 
b/arch/powerpc/platforms/pseries/plpks_sed_ops.c
index c1d08075e850..f8038d998eae 100644
--- a/arch/powerpc/platforms/pseries/plpks_sed_ops.c
+++ b/arch/powerpc/platforms/pseries/plpks_sed_ops.c
@@ -64,6 +64,9 @@ int sed_read_key(char *keyname, char *key, u_int *keylen)
int ret;
u_int len;
 
+   if (!plpks_is_available())
+   return -ENODEV;
+
plpks_init_var(&var, keyname);
var.data = (u8 *)&data;
var.datalen = sizeof(data);
@@ -89,6 +92,9 @@ int sed_write_key(char *keyname, char *key, u_int keylen)
struct plpks_sed_object_data data;
struct plpks_var_name vname;
 
+   if (!plpks_is_available())
+   return -ENODEV;
+
plpks_init_var(&var, keyname);
 
var.datalen = sizeof(struct plpks_sed_object_data);
-- 
2.41.0
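
The effect of the fixup can be illustrated with a small userspace model. This
is a sketch only: keystore_available stands in for plpks_is_available(), and
demo_write_key()/demo_read_key() are hypothetical helpers backed by a static
buffer, not the PLPKS interface. Both entry points refuse to touch the store
until it is known to be available.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool keystore_available;   /* hypothetical stand-in for the availability check */
static char stored_key[32];
static size_t stored_len;

static int demo_write_key(const char *key, size_t keylen)
{
    if (!keystore_available)
        return -ENODEV;           /* bail out before touching the backend */
    if (keylen > sizeof(stored_key))
        return -EINVAL;
    memcpy(stored_key, key, keylen);
    stored_len = keylen;
    return 0;
}

/* On entry *keylen is the buffer size; on success it is the key length. */
static int demo_read_key(char *key, size_t *keylen)
{
    if (!keystore_available)
        return -ENODEV;
    if (*keylen < stored_len)
        return -EINVAL;
    memcpy(key, stored_key, stored_len);
    *keylen = stored_len;
    return 0;
}
```

Callers then see a clean -ENODEV on systems without the keystore instead of a
crash inside the backend.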



Re: [PATCH] integrity: powerpc: Do not select CA_MACHINE_KEYRING

2023-09-12 Thread Michal Suchánek
On Mon, Sep 11, 2023 at 11:39:38PM -0400, Nayna wrote:
> 
> On 9/7/23 13:32, Michal Suchánek wrote:
> > Adding more CC's from the original patch, looks like get_maintainers is
> > not that great for this file.
> > 
> > On Thu, Sep 07, 2023 at 06:52:19PM +0200, Michal Suchanek wrote:
> > > No other platform needs CA_MACHINE_KEYRING, either.
> > > 
> > > This is policy that should be decided by the administrator, not Kconfig
> > > dependencies.
> 
> We certainly agree that flexibility is important. However, in this case,
> this also implies that we are expecting system admins to be security
> experts. As per our understanding, CA based infrastructure(PKI) is the
> standard to be followed and not the policy decision. And we can only speak
> for Power.
> 
> INTEGRITY_CA_MACHINE_KEYRING ensures that we always have CA signed leaf
> certs.

And that's the problem.

From a distribution point of view there are two types of leaf certs:

 - leaf certs signed by the distribution CA which need not be imported
   because the distribution CA cert is enrolled one way or another
 - user generated ad-hoc certificates that are not signed in any way,
   and enrolled by the user

The latter are vouched for by the user by enrolling the certificate, and
confirming that they really want to trust this certificate. Enrolling
user certificates is vital for usability of secure boot. Adding the extra
step of creating a CA certificate stored on the same system only
complicates things with no added benefit.

> INTEGRITY_CA_MACHINE_KEYRING_MAX ensures that CA is only allowed to do key
> signing and not code signing.
> 
> Having CA signed certs also permits easy revocation of all leaf certs.

Revocation can also be done by removing the certificate from the keyring.

If the user can add it they should also be able to remove it.

> Loading certificates is completely new for Power Systems. We would like to
> make it as clean as possible from the start. We want to enforce CA signed
> leaf certificates(INTEGRITY_CA_MACHINE_KEYRING). As per
> keyUsage(INTEGRITY_CA_MACHINE_KEYRING_MAX), if we want more flexibility,
> probably a boot time override can be considered.

If boot time override can exist it can as well be made permanent with a
Kconfig option.

I think that a boot time override is even more problematic for security
than a Kconfig option - the kernel arguments are rarely signed.

Thanks

Michal

> 
> Thanks & Regards,
> 
>     - Nayna
> 
> 
> > > 
> > > cc: joeyli 
> > > Signed-off-by: Michal Suchanek 
> > > ---
> > >   security/integrity/Kconfig | 2 --
> > >   1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
> > > index 232191ee09e3..b6e074ac0227 100644
> > > --- a/security/integrity/Kconfig
> > > +++ b/security/integrity/Kconfig
> > > @@ -68,8 +68,6 @@ config INTEGRITY_MACHINE_KEYRING
> > >   depends on INTEGRITY_ASYMMETRIC_KEYS
> > >   depends on SYSTEM_BLACKLIST_KEYRING
> > >   depends on LOAD_UEFI_KEYS || LOAD_PPC_KEYS
> > > - select INTEGRITY_CA_MACHINE_KEYRING if LOAD_PPC_KEYS
> > > - select INTEGRITY_CA_MACHINE_KEYRING_MAX if LOAD_PPC_KEYS
> > >   help
> > >If set, provide a keyring to which Machine Owner Keys (MOK) may
> > >be added. This keyring shall contain just MOK keys.  Unlike keys
> > > -- 
> > > 2.41.0
> > > 


Re: [PATCH] integrity: powerpc: Do not select CA_MACHINE_KEYRING

2023-09-07 Thread Michal Suchánek
Adding more CCs from the original patch; it looks like get_maintainer.pl is
not that great for this file.

On Thu, Sep 07, 2023 at 06:52:19PM +0200, Michal Suchanek wrote:
> No other platform needs CA_MACHINE_KEYRING, either.
> 
> This is policy that should be decided by the administrator, not Kconfig
> dependencies.
> 
> cc: joeyli 
> Signed-off-by: Michal Suchanek 
> ---
>  security/integrity/Kconfig | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/security/integrity/Kconfig b/security/integrity/Kconfig
> index 232191ee09e3..b6e074ac0227 100644
> --- a/security/integrity/Kconfig
> +++ b/security/integrity/Kconfig
> @@ -68,8 +68,6 @@ config INTEGRITY_MACHINE_KEYRING
>   depends on INTEGRITY_ASYMMETRIC_KEYS
>   depends on SYSTEM_BLACKLIST_KEYRING
>   depends on LOAD_UEFI_KEYS || LOAD_PPC_KEYS
> - select INTEGRITY_CA_MACHINE_KEYRING if LOAD_PPC_KEYS
> - select INTEGRITY_CA_MACHINE_KEYRING_MAX if LOAD_PPC_KEYS
>   help
>If set, provide a keyring to which Machine Owner Keys (MOK) may
>be added. This keyring shall contain just MOK keys.  Unlike keys
> -- 
> 2.41.0
> 


Re: [PATCH RFC] powerpc/rtas: Make it possible to disable sys_rtas

2023-09-07 Thread Michal Suchánek
On Thu, Sep 07, 2023 at 11:52:44AM -0500, Nathan Lynch wrote:
> Michal Suchánek  writes:
> > On Wed, Sep 06, 2023 at 02:34:59PM -0500, Nathan Lynch wrote:
> >> Michal Suchanek  writes:
> >> 
> >> > Additional patch suggestion to go with the rtas devices:
> >> >
> >> > ---
> >> >
> >> > With most important rtas functions available through different
> >> > interfaces the sys_rtas interface can be disabled completely.
> >> >
> >> > Do not remove it for now to make it possible to run older versions of
> >> > userspace tools that don't support other interfaces.
> >> 
> >> Thanks. I hope making sys_rtas on/off-configurable will make sense
> >> eventually, and I expect this series to get us closer to that. But to me
> >> it seems too early and too coarse. A kernel built with RTAS_SYSCALL=n is
> >> not something I'd want to support or run in production soon. It would
> >> break too many known use cases, and likely some unknown ones as well.
> >
> > There are about 3 known use cases that absolutely need access by other
> > means than sys_rtas to work with lockdown, and about 3 others that would
> > work either way.
> 
> How do you figure that? I count 11 librtas APIs that use work areas (and
> therefore /dev/mem) that are definitely broken under lockdown. Maybe a
> couple of them are unused.

I am referring to this list of known uses:

https://github.com/ibm-power-utilities/librtas/issues/29

rtas_get_vpd is used by lsvpd (through libvpd and librtas)
rtas_platform_dump and rtas_get_indices are used by ppc64-diag
rtas_cfg_connector is used by powerpc-utils and is planned to be
replaced by an in-kernel solution

That covers the complex cases.

rtas_set_sysparm is used by ppc64-diag and powerpc-utils
rtas_set_dynamic_indicator, rtas_get_dynamic_sensor are used by
ppc64-diag (related to rtas_get_indices)
rtas_errinjct + open/close are used by powerpc-utils

Those are the simple cases.

Additional discussion here https://github.com/linuxppc/issues/issues/252

> > That's not so staggering that it could not be implemented in the kernel
> > from the start.
> > How long it will take for the known userspace users to catch up is
> > another question, but again it's something that can be addressed.
> >
> > Making it possible to turn off sys_rtas will make it easier to uncover
> > the not yet known cases.
> 
> This is also true of making the configuration more granular than on or
> off. You would be free to disallow all calls if desired.
> 
> > What people want to support depends a lot on what is converted, and also
> > the situation of the distribution in question. Fast-rolling
> > distributions may want only the new interface quite soon, and so may
> > distributions that are starting development of a new release.
> >
> > All this makes sense only if there is a plan to discontinue sys_rtas
> > entirely. For the simple calls that don't need data buffers it's still
> > usable.
> 
> I don't understand. How would it remain usable for the simple calls if
> it can be completely disabled?

Either the goal is to completely remove sys_rtas, and then an option for
disabling it is helpful for uncovering unexpected problems. Or there
isn't a goal of sys_rtas removal, and then there is no point in having
an option to disable it, and it can be used for calls that don't need
buffers indefinitely.

Thanks

Michal


Re: [PATCH RFC] powerpc/rtas: Make it possible to disable sys_rtas

2023-09-07 Thread Michal Suchánek
On Wed, Sep 06, 2023 at 02:34:59PM -0500, Nathan Lynch wrote:
> Michal Suchanek  writes:
> 
> > Additional patch suggestion to go with the rtas devices:
> >
> > ---
> >
> > With most important rtas functions available through different
> > interfaces the sys_rtas interface can be disabled completely.
> >
> > Do not remove it for now to make it possible to run older versions of
> > userspace tools that don't support other interfaces.
> 
> Thanks. I hope making sys_rtas on/off-configurable will make sense
> eventually, and I expect this series to get us closer to that. But to me
> it seems too early and too coarse. A kernel built with RTAS_SYSCALL=n is
> not something I'd want to support or run in production soon. It would
> break too many known use cases, and likely some unknown ones as well.

There are about 3 known use cases that absolutely need access by other
means than sys_rtas to work with lockdown, and about 3 others that would
work either way.

That's not so staggering that it could not be implemented in the kernel
from the start.
How long it will take for the known userspace users to catch up is
another question, but again it's something that can be addressed.

Making it possible to turn off sys_rtas will make it easier to uncover
the not yet known cases.

What people want to support depends a lot on what is converted, and also
the situation of the distribution in question. Fast-rolling
distributions may want only the new interface quite soon, and so may
distributions that are starting development of a new release.

All this makes sense only if there is a plan to discontinue sys_rtas
entirely. For the simple calls that don't need data buffers it's still
usable.

> It could be more useful in the near term to construct a configurable
> list of RTAS functions that sys_rtas is allowed to expose.

If we really need this level of detail I guess it is too early.
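
For illustration only, the "configurable list of RTAS functions that
sys_rtas is allowed to expose" could be as small as a name allowlist.
The sketch below is plain userspace C, not kernel code. The array
contents and the function name sys_rtas_function_allowed() are
hypothetical and do not come from any posted patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical allowlist. The entries are only examples of RTAS
 * function names; a real list would be a policy decision.
 */
const char *const sys_rtas_allowed[] = {
	"ibm,get-vpd",
	"ibm,get-system-parameter",
	"ibm,set-system-parameter",
};

/* Return true if the named RTAS function is on the allowlist. */
bool sys_rtas_function_allowed(const char *name)
{
	size_t i;
	size_t n = sizeof(sys_rtas_allowed) / sizeof(sys_rtas_allowed[0]);

	if (!name)
		return false;
	for (i = 0; i < n; i++) {
		if (strcmp(name, sys_rtas_allowed[i]) == 0)
			return true;
	}
	return false;
}
```

An empty list would then behave like a disabled sys_rtas, which is the
point Nathan makes about granularity.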

Thanks

Michal


Re: [PATCH RFC 0/2] powerpc/pseries: new character devices for RTAS functions

2023-09-06 Thread Michal Suchánek
Hello,

On Tue, Aug 22, 2023 at 04:33:38PM -0500, Nathan Lynch via B4 Relay wrote:
> This is a proposal for adding chardev-based access to a select subset
> of RTAS functions on the pseries platform.
> 
> The problem: important platform features are enabled on Linux VMs
> through the powerpc-specific rtas() syscall in combination with
> writeable mappings of /dev/mem. In typical usage, this is encapsulated
> behind APIs provided by the librtas library. This paradigm is
> incompatible with lockdown, which prohibits /dev/mem access.
> 
> The solution I'm working on is to add a small pseries-specific
> "driver" for each functional area, exposing the relevant features to
> user space in ways that are compatible with lockdown. In most of these
> areas, I believe it's possible to change librtas to prefer the new
> chardev interfaces without disrupting existing users.

thanks for the driver.

> 
> I've broken down the affected functions into the following areas and
> priorities:
> 
> High priority:
> * VPD retrieval.
> * System parameters: retrieval and update.
> 
> Medium priority:
> * Platform dump retrieval.
> * Light path diagnostics (get/set-dynamic-indicator,
>   get-dynamic-sensor-state, get-indices).
> 
> Low priority (may never happen):
> * Error injection: would have to be carefully restricted.
> * Physical attestation: no known users.
> * LPAR perftools: no known users.
> 
> Out of scope:
> * DLPAR (configure-connector et al): involves device tree updates
>   which must be handled entirely in-kernel for lockdown. This is the
>   object of a separate effort.
> 
> See https://github.com/ibm-power-utilities/librtas/issues/29 for more
> details.
> 
> In this RFC, I've included a single driver for VPD retrieval. Clients
> use ioctl() to obtain a file descriptor-based handle for the VPD they
> want. I think this could be a good model for the other areas too, but
> I'd like to get opinions on it.

The call has parameters so it cannot be reasonably done with sysfs or
similar.

The parameter is a string, which is unwieldy with ioctl, and netlink has
helpers for getting strings into and out of messages without garbage
pointers and crashes. However, netlink does not have permissions, and
setting permissions for the different platform features available
through rtas is desirable.

Then this is as good as it gets with the kernel facilities Linux
provides.
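
To make the string handling concrete: with the fixed-size struct from
the RFC the caller has to bound and nul-terminate the location code
before the ioctl. A minimal sketch follows, assuming the 80-byte layout
from the posted header. The helper papr_location_code_set() is
hypothetical and exists only for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same layout as in the RFC's uapi header: 79 characters plus nul. */
struct papr_location_code {
	char str[80];
};

/*
 * Hypothetical helper: copy loc_code into plc with bounds checking.
 * Returns 0 on success, -EINVAL on NULL arguments, -ENAMETOOLONG if
 * the string does not fit together with its terminating nul.
 */
int papr_location_code_set(struct papr_location_code *plc,
			   const char *loc_code)
{
	size_t len;

	if (!plc || !loc_code)
		return -EINVAL;
	len = strlen(loc_code);
	if (len >= sizeof(plc->str))
		return -ENAMETOOLONG;
	/* Zero the whole struct so no stale bytes follow the string. */
	memset(plc, 0, sizeof(*plc));
	memcpy(plc->str, loc_code, len);
	return 0;
}
```

An empty string selects all VPD in the RFC's programming model, so ""
is a valid input here.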

Thanks

Michal


Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-06 Thread Michal Suchánek
On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch 
> 
> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> components using the ibm,get-vpd RTAS function.
> 
> We can expose this to user space with a /dev/papr-vpd character
> device, where the programming model is:
> 
>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
>   int devfd = open("/dev/papr-vpd", O_WRONLY);
>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
>   size_t size = lseek(vpdfd, 0, SEEK_END);
>   char *buf = malloc(size);
>   pread(devfd, buf, size, 0);
> 
> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> the file contains the result of a complete ibm,get-vpd sequence. The
> file contents are immutable from the POV of user space. To get a new
> view of VPD, clients must create a new handle.
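
The lseek()/pread() half of the programming model quoted above can be
sketched with a bit more error handling. read_whole_fd() is a
hypothetical helper, not part of librtas or the RFC. It works on any
seekable fd, so the fd returned by PAPR_VPD_CREATE_HANDLE would be
passed in (the quoted snippet passes devfd to pread(), where vpdfd is
presumably intended).

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the entire contents of a seekable fd into a malloc'd buffer.
 * On success returns the buffer (caller frees) and stores the number
 * of bytes read in *len. Returns NULL on error.
 */
char *read_whole_fd(int fd, size_t *len)
{
	off_t size = lseek(fd, 0, SEEK_END); /* discover the length */
	size_t done = 0;
	char *buf;

	if (size < 0 || !len)
		return NULL;
	buf = malloc(size ? (size_t)size : 1);
	if (!buf)
		return NULL;
	while (done < (size_t)size) {
		ssize_t n = pread(fd, buf + done, (size_t)size - done,
				  (off_t)done);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted, retry */
			free(buf);
			return NULL;
		}
		if (n == 0)
			break; /* unexpected EOF, return what was read */
		done += (size_t)n;
	}
	*len = done;
	return buf;
}
```

Because the handle's contents are immutable, reading by offset with
pread() needs no further coordination once the size is known.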
> 
> This design choice insulates user space from most of the complexities
> that ibm,get-vpd brings:
> 
> * ibm,get-vpd must be called more than once to obtain complete
>   results.
> * Only one ibm,get-vpd call sequence should be in progress at a time;
>   concurrent sequences will disrupt each other. Callers must have a
>   protocol for serializing their use of the function.
> * A call sequence in progress may receive a "VPD changed, try again"
>   status, requiring the client to start over. (The driver does not yet
>   handle this, but it should be easy to add.)
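
The multi-call sequence and the "VPD changed, try again" restart
described in the list above can be sketched as a loop. Everything here
is illustrative. The callback type vpd_call_fn stands in for a single
ibm,get-vpd call, status 1 meaning "more data" and a sequence number
starting at 1 are assumptions, and fake_vpd_call() is a stand-in for
firmware so the loop can be exercised. Serializing concurrent sequences
is left out.

```c
#include <stddef.h>

/* -4 is the status named in this thread; 0 and 1 are assumptions. */
enum { VPD_DONE = 0, VPD_MORE = 1, VPD_CHANGED = -4 };

/* Hypothetical stand-in for one ibm,get-vpd call. */
typedef int (*vpd_call_fn)(void *ctx, unsigned int *seq, char *out,
			   size_t cap, size_t *got);

/* Returns the total bytes fetched, or -1 on error or too many restarts. */
long vpd_fetch_all(vpd_call_fn call, void *ctx, char *buf, size_t buflen,
		   int max_restarts)
{
	int restarts = 0;

	for (;;) {
		unsigned int seq = 1; /* assumed start-of-sequence value */
		size_t total = 0;
		int status;

		do {
			size_t got = 0;

			status = call(ctx, &seq, buf + total,
				      buflen - total, &got);
			if (status != VPD_DONE && status != VPD_MORE)
				break;
			total += got;
			if (status == VPD_MORE && total >= buflen)
				return -1; /* buffer too small */
		} while (status == VPD_MORE);

		if (status == VPD_DONE)
			return (long)total;
		if (status != VPD_CHANGED || ++restarts > max_restarts)
			return -1;
		/* VPD changed mid-sequence: drop partial data, start over. */
	}
}

/* Fake firmware: reports a change on the 2nd call, finishes on the 4th. */
struct fake_fw { int calls; };

int fake_vpd_call(void *ctx, unsigned int *seq, char *out, size_t cap,
		  size_t *got)
{
	struct fake_fw *fw = ctx;

	fw->calls++;
	if (fw->calls == 2)
		return VPD_CHANGED;
	if (cap < 2)
		return -1;
	out[0] = 'A';
	out[1] = 'B';
	*got = 2;
	(*seq)++;
	return fw->calls == 4 ? VPD_DONE : VPD_MORE;
}
```

Keeping this loop in one place is what lets the file descriptor hand
user space a complete, consistent snapshot.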
> 
> The memory required for the VPD buffers seems acceptable, around 20KB
> for all VPD on one of my systems. And the value of the
> /rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
> consistently 300KB across various systems I've checked.
> 
> I've implemented support for this new ABI in the rtas_get_vpd()
> function in librtas, which the vpdupdate command currently uses to
> populate its VPD database. I've verified that an unmodified vpdupdate
> binary generates an identical database when using a librtas.so that
> prefers the new ABI.
> 
> Likely remaining work:
> 
> * Handle RTAS call status -4 (VPD changed) during ibm,get-vpd call
>   sequence.
> * Prevent ibm,get-vpd calls via rtas(2) from disrupting ibm,get-vpd
>   call sequences in this driver.
> * (Maybe) implement a poll method for delivering notifications of
>   potential changes to VPD, e.g. after a partition migration.
> 
> Questions, points for discussion:
> 
> * Am I allocating the ioctl numbers correctly?
> * The only way to discover the size of a VPD buffer is
>   lseek(SEEK_END). fstat() doesn't work for anonymous fds like
>   this. Is this OK, or should the buffer length be discoverable some
>   other way?
> 
> Signed-off-by: Nathan Lynch 
> ---
>  Documentation/userspace-api/ioctl/ioctl-number.rst |   2 +
>  arch/powerpc/include/uapi/asm/papr-vpd.h   |  29 ++
>  arch/powerpc/platforms/pseries/Makefile|   1 +
>  arch/powerpc/platforms/pseries/papr-vpd.c  | 353 +
>  4 files changed, 385 insertions(+)
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 4ea5b837399a..a950545bf7cd 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -349,6 +349,8 @@ Code  Seq#    Include File    Comments
>  0xB1  00-1F  PPPoX
>
> +0xB2  00     arch/powerpc/include/uapi/asm/papr-vpd.h  powerpc/pseries VPD API
> +
>  0xB3  00     linux/mmc/ioctl.h
>  0xB4  00-0F  linux/gpio.h
>
>  0xB5  00-0F  uapi/linux/rpmsg.h
>
> diff --git a/arch/powerpc/include/uapi/asm/papr-vpd.h b/arch/powerpc/include/uapi/asm/papr-vpd.h
> new file mode 100644
> index ..aa33217ad5de
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr-vpd.h
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> +#ifndef _UAPI_PAPR_VPD_H_
> +#define _UAPI_PAPR_VPD_H_
> +
> +#include 
> +#include 
> +
> +struct papr_location_code {
> + /*
> +  * PAPR+ 12.3.2.4 Converged Location Code Rules - Length
> +  * Restrictions. 79 characters plus nul.
> +  */
> + char str[80];
> +};
> +
> +#define PAPR_VPD_IOCTL_BASE 0xb2
> +
> +#define PAPR_VPD_IO(nr) _IO(PAPR_VPD_IOCTL_BASE, nr)
> +#define PAPR_VPD_IOR(nr, type)  _IOR(PAPR_VPD_IOCTL_BASE, nr, type)
> +#define PAPR_VPD_IOW(nr, type)  _IOW(PAPR_VPD_IOCTL_BASE, nr, type)
> 

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-05 Thread Michal Suchánek
On Tue, Sep 05, 2023 at 12:42:11PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Thu, Aug 31, 2023 at 12:59:25PM -0500, Nathan Lynch wrote:
> ...
> >> You (Michal) seem to favor a kernel-user ABI where user space is allowed
> >> to invoke arbitrary RTAS functions by name. But we already have that in
> >> the form of the rtas() syscall. (User space looks up function tokens by
> >> name in the DT.) The point of the series is that we need to move away
> >> from that. It's too low-level and user space has to use /dev/mem when
> >> invoking any of the less-simple RTAS functions.
> >
> > We don't have that, directly accessing /dev/mem does not really work.
> > And that's what needs fixing in my view.
> >
> > The rtas calls are all mechanically the same, the function implemented
> > here should be able to call any of them if there was a way to specify
> > the call.
> >
> > Given that there is desire to have access to multiple calls I don't
> > think it makes sense to allocate a separate device with different name
> > for each.
> 
> I think it does make sense.
> 
> We explicitly don't want a general "call any RTAS function" API.
> 
> We want tightly scoped APIs that do one thing, or a family of related
> things, but not anything & everything.
> 
> Having different devices for each of those APIs means permissions can be
> granted separately on those devices. So a user/group can be given access
> to the "papr-vpd" device, but not some other unrelated device that also
> happens to expose an RTAS service (eg. error injection).

Yes, it does work as a kludge for setting permissions for individual
calls.

Thanks

Michal


Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-04 Thread Michal Suchánek
On Thu, Aug 31, 2023 at 03:34:37PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > Hello,
> >
> > thanks for working on this.
> >
> > On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay wrote:
> >> From: Nathan Lynch 
> >> 
> >> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> >> components using the ibm,get-vpd RTAS function.
> >> 
> >> We can expose this to user space with a /dev/papr-vpd character
> >> device, where the programming model is:
> >> 
> >>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
> >>   int devfd = open("/dev/papr-vpd", O_WRONLY);
> >>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
> >>   size_t size = lseek(vpdfd, 0, SEEK_END);
> >>   char *buf = malloc(size);
> >>   pread(devfd, buf, size, 0);
> >> 
> >> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> >> the file contains the result of a complete ibm,get-vpd sequence. The
> >
> > Could this be somewhat less obfuscated?
> >
> > What the caller wants is the result of "ibm,get-vpd", which is a
> > well-known string identifier of the rtas call.
> 
> Not really. What the caller wants is *the VPD*. Currently that's done
> by calling the RTAS "ibm,get-vpd" function, but that could change in
> future. There's RTAS calls that have been replaced with a "version 2" in
> the past, that could happen here too. Or the RTAS call could be replaced
> by a hypercall (though unlikely).
> 
> But hopefully if the underlying mechanism changed the kernel would be
> able to hide that detail behind this new API, and users would not need
> to change at all.

With the device named rtas-vpd it's clearly tied to rtas.

If 'version 2' of the call happens it's more likely than not going to
have a new data format because the limit of the current format was reached.
Then emulating that old call with the new one would be counterproductive or
impossible.

Even if the same data is available through a different call there is no
problem. If the user really used the well-known "ibm,get-vpd" identifier
documented in the specification then the kernel can translate it
internally to whatever new method for obtaining the data exists. The
current revisions of the specification are not going to go away, and the
identifier is still well-known and documented, even if newer revisions
of the platform use a different way to provide the data to the kernel.

Sure, with the current syscall interface it would not work because the
user translates this well-known identifier into a function token, and
passes that to the kernel. With that if the "ibm,get-vpd" is gone the
kernel cannot provide the data anymore.

That was done to make it possible to call functions that were not yet
known when the kernel was written. This is no longer allowed, and the
kernel has functionality for translating function names to tokens for
the functions it does know about. Then it can do the translation for
userspace as well, and when the call is implemented differently in the
future abstract that detail away.

> > Yet this identifier is never passed in. Instead we have this new
> > PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific to
> > this call only as is the /dev/papr-vpd device name, another new
> > identifier.
> >
> > Maybe the interface could provide a way to specify the service name?
> >
> >> file contents are immutable from the POV of user space. To get a new
> >> view of VPD, clients must create a new handle.
> >
> > Which is basically the same as creating a file descriptor with open().
> 
> Sort of. But much cleaner because you don't need to create a file in the
> filesystem and tell userspace how to find it.

Instead, you create a device in the filesystem, and assign an IOCTL, and
need to tell the userspace how to find both.

> 
> This pattern of creating file descriptors from existing file descriptors
> to model a hierarchy of objects is well established in e.g. the KVM and
> DRM APIs.

Yet there is no object hierarchy to speak of here. There is one device
with one ioctl on it. The device name is tied to this specific call so a
different call will need both a new device and new IOCTL.

> 
> > Maybe creating a directory in sysfs or procfs with filenames
> > corresponding to rtas services would do the same job without extra
> > obfuscation?
> 
> It's not obfuscation, it's abstraction. The kernel talks to firmware to
> do things, and provides an API to user space. Not all the details of how
> the firmware works are relevant to user space, including the exact

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-04 Thread Michal Suchánek
Hello,

On Thu, Aug 31, 2023 at 12:59:25PM -0500, Nathan Lynch wrote:
> Michal Suchánek  writes:
> > On Thu, Aug 31, 2023 at 09:37:12PM +1000, Michael Ellerman wrote:
> >> Michal Suchánek  writes:
> >> > On Thu, Aug 31, 2023 at 03:34:37PM +1000, Michael Ellerman wrote:
> >> >> Michal Suchánek  writes:
> >> >> > On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay 
> >> >> > wrote:
> >> >> >> From: Nathan Lynch 
> >> >> >> 
> >> >> >> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> >> >> >> components using the ibm,get-vpd RTAS function.
> >> >> >> 
> >> >> >> We can expose this to user space with a /dev/papr-vpd character
> >> >> >> device, where the programming model is:
> >> >> >> 
> >> >> >>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD 
> >> >> >> */
> >> >> >>   int devfd = open("/dev/papr-vpd", O_WRONLY);
> >> >> >>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
> >> >> >>   size_t size = lseek(vpdfd, 0, SEEK_END);
> >> >> >>   char *buf = malloc(size);
> >> >> >>   pread(devfd, buf, size, 0);
> >> >> >> 
> >> >> >> When a file descriptor is obtained from 
> >> >> >> ioctl(PAPR_VPD_CREATE_HANDLE),
> >> >> >> the file contains the result of a complete ibm,get-vpd sequence. The
> >> >> >
> >> >> > Could this be somewhat less obfuscated?
> >> >> >
> >> >> > What the caller wants is the result of "ibm,get-vpd", which is a
> >> >> > well-known string identifier of the rtas call.
> >> >> 
> >> >> Not really. What the caller wants is *the VPD*. Currently that's done
> >> >> by calling the RTAS "ibm,get-vpd" function, but that could change in
> >> >> future. There's RTAS calls that have been replaced with a "version 2" in
> >> >> the past, that could happen here too. Or the RTAS call could be replaced
> >> >> by a hypercall (though unlikely).
> >> >> 
> >> >> But hopefully if the underlying mechanism changed the kernel would be
> >> >> able to hide that detail behind this new API, and users would not need
> >> >> to change at all.
> >> >> 
> >> >> > Yet this identifier is never passed in. Instead we have this new
> >> >> > PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific 
> >> >> > to
> >> >> > this call only as is the /dev/papr-vpd device name, another new
> >> >> > identifier.
> >> >> >
> >> >> > Maybe the interface could provide a way to specify the service name?
> >> >> >
> >> >> >> file contents are immutable from the POV of user space. To get a new
> >> >> >> view of VPD, clients must create a new handle.
> >> >> >
> >> >> > Which is basically the same as creating a file descriptor with open().
> >> >> 
> >> >> Sort of. But much cleaner because you don't need to create a file in the
> >> >> filesystem and tell userspace how to find it.
> >> >
> >> > You very much do. There is the /dev/papr-vpd and PAPR_VPD_CREATE_HANDLE
> >> > which userspace has to know about, the PAPR_VPD_CREATE_HANDLE is not
> >> > even possible to find at all.
> >> 
> >> Well yeah you need the device itself :)
> >
> > And as named it's specific to this call, and new devices will be needed
> > for any additional rtas calls implemented.
> >
> >> 
> >> And yes the ioctl is defined in a header, not in the filesystem, but
> >> that's entirely normal for an ioctl based API.
> >
> > Of course, because the ioctl API has no safe way of passing a string
> > identifier for the function. Then it needs to create these obscure IDs.
> >
> > Other APIs that don't have this problem exist.
> 
> Looking at the cover letter for the series, I wonder if my framing and
> word choice is confusing? Instead of "new character devices for RTAS
> functions", what I would really like to convey is "new character devices
> for platform features that are currently onl

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-08-31 Thread Michal Suchánek
On Thu, Aug 31, 2023 at 09:37:12PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Thu, Aug 31, 2023 at 03:34:37PM +1000, Michael Ellerman wrote:
> >> Michal Suchánek  writes:
> >> > On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay 
> >> > wrote:
> >> >> From: Nathan Lynch 
> >> >> 
> >> >> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> >> >> components using the ibm,get-vpd RTAS function.
> >> >> 
> >> >> We can expose this to user space with a /dev/papr-vpd character
> >> >> device, where the programming model is:
> >> >> 
> >> >>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
> >> >>   int devfd = open("/dev/papr-vpd", O_WRONLY);
> >> >>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
> >> >>   size_t size = lseek(vpdfd, 0, SEEK_END);
> >> >>   char *buf = malloc(size);
> >> >>   pread(devfd, buf, size, 0);
> >> >> 
> >> >> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> >> >> the file contains the result of a complete ibm,get-vpd sequence. The
> >> >
> >> > Could this be somewhat less obfuscated?
> >> >
> >> > What the caller wants is the result of "ibm,get-vpd", which is a
> >> > well-known string identifier of the rtas call.
> >> 
> >> Not really. What the caller wants is *the VPD*. Currently that's done
> >> by calling the RTAS "ibm,get-vpd" function, but that could change in
> >> future. There's RTAS calls that have been replaced with a "version 2" in
> >> the past, that could happen here too. Or the RTAS call could be replaced
> >> by a hypercall (though unlikely).
> >> 
> >> But hopefully if the underlying mechanism changed the kernel would be
> >> able to hide that detail behind this new API, and users would not need
> >> to change at all.
> >> 
> >> > Yet this identifier is never passed in. Instead we have this new
> >> > PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific to
> >> > this call only as is the /dev/papr-vpd device name, another new
> >> > identifier.
> >> >
> >> > Maybe the interface could provide a way to specify the service name?
> >> >
> >> >> file contents are immutable from the POV of user space. To get a new
> >> >> view of VPD, clients must create a new handle.
> >> >
> >> > Which is basically the same as creating a file descriptor with open().
> >> 
> >> Sort of. But much cleaner because you don't need to create a file in the
> >> filesystem and tell userspace how to find it.
> >
> > You very much do. There is the /dev/papr-vpd and PAPR_VPD_CREATE_HANDLE
> > which userspace has to know about, the PAPR_VPD_CREATE_HANDLE is not
> > even possible to find at all.
> 
> Well yeah you need the device itself :)

And as named it's specific to this call, and new devices will be needed
for any additional rtas calls implemented.

> 
> And yes the ioctl is defined in a header, not in the filesystem, but
> that's entirely normal for an ioctl based API.

Of course, because the ioctl API has no safe way of passing a string
identifier for the function. Then it needs to create these obscure IDs.

Other APIs that don't have this problem exist.

Thanks

Michal


Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-08-31 Thread Michal Suchánek
On Thu, Aug 31, 2023 at 03:34:37PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > Hello,
> >
> > thanks for working on this.
> >
> > On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay wrote:
> >> From: Nathan Lynch 
> >> 
> >> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> >> components using the ibm,get-vpd RTAS function.
> >> 
> >> We can expose this to user space with a /dev/papr-vpd character
> >> device, where the programming model is:
> >> 
> >>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
> >>   int devfd = open("/dev/papr-vpd", O_WRONLY);
> >>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
> >>   size_t size = lseek(vpdfd, 0, SEEK_END);
> >>   char *buf = malloc(size);
> >>   pread(devfd, buf, size, 0);
> >> 
> >> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> >> the file contains the result of a complete ibm,get-vpd sequence. The
> >
> > Could this be somewhat less obfuscated?
> >
> > What the caller wants is the result of "ibm,get-vpd", which is a
> > well-known string identifier of the rtas call.
> 
> Not really. What the caller wants is *the VPD*. Currently that's done
> by calling the RTAS "ibm,get-vpd" function, but that could change in
> future. There's RTAS calls that have been replaced with a "version 2" in
> the past, that could happen here too. Or the RTAS call could be replaced
> by a hypercall (though unlikely).
> 
> But hopefully if the underlying mechanism changed the kernel would be
> able to hide that detail behind this new API, and users would not need
> to change at all.

Still the kernel could use the name that is well-known today even if it
uses a different implementation internally in the future.

> 
> > Yet this identifier is never passed in. Instead we have this new
> > PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific to
> > this call only as is the /dev/papr-vpd device name, another new
> > identifier.
> >
> > Maybe the interface could provide a way to specify the service name?
> >
> >> file contents are immutable from the POV of user space. To get a new
> >> view of VPD, clients must create a new handle.
> >
> > Which is basically the same as creating a file descriptor with open().
> 
> Sort of. But much cleaner because you don't need to create a file in the
> filesystem and tell userspace how to find it.
> 
> This pattern of creating file descriptors from existing file descriptors
> to model a hierarchy of objects is well established in e.g. the KVM and
> DRM APIs.

> 
> >> The memory required for the VPD buffers seems acceptable, around 20KB
> >> for all VPD on one of my systems. And the value of the
> >> /rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
> >> consistently 300KB across various systems I've checked.
> >> 
> >> I've implemented support for this new ABI in the rtas_get_vpd()
> >> function in librtas, which the vpdupdate command currently uses to
> >> populate its VPD database. I've verified that an unmodified vpdupdate
> >> binary generates an identical database when using a librtas.so that
> >> prefers the new ABI.
> >> 
> >> Likely remaining work:
> >> 
> >> * Handle RTAS call status -4 (VPD changed) during ibm,get-vpd call
> >>   sequence.
> >> * Prevent ibm,get-vpd calls via rtas(2) from disrupting ibm,get-vpd
> >>   call sequences in this driver.
> >> * (Maybe) implement a poll method for delivering notifications of
> >>   potential changes to VPD, e.g. after a partition migration.
> >
> > That sounds like something for netlink. If that is desired maybe it
> > should be used in the first place?
> 
> I don't see why that is related to netlink. It's entirely normal for
> file descriptor based APIs to implement poll.
> 
> netlink adds a lot of complexity for zero gain IMO.

It kind of solves the problem with shoehorning something that's not
really a file into file descriptors. You don't have to when not using
them. It also solves how to access multiple services without creating a
large number of files and large number of obscure constants.

Thanks

Michal


Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-08-31 Thread Michal Suchánek
On Thu, Aug 31, 2023 at 03:34:37PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > Hello,
> >
> > thanks for working on this.
> >
> > On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay wrote:
> >> From: Nathan Lynch 
> >> 
> >> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> >> components using the ibm,get-vpd RTAS function.
> >> 
> >> We can expose this to user space with a /dev/papr-vpd character
> >> device, where the programming model is:
> >> 
> >>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
> >>   int devfd = open("/dev/papr-vpd", O_WRONLY);
> >>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
> >>   size_t size = lseek(vpdfd, 0, SEEK_END);
> >>   char *buf = malloc(size);
> >>   pread(vpdfd, buf, size, 0);
> >> 
> >> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> >> the file contains the result of a complete ibm,get-vpd sequence. The
> >
> > Could this be somewhat less obfuscated?
> >
> > What the caller wants is the result of "ibm,get-vpd", which is a
> > well-known string identifier of the rtas call.
> 
> Not really. What the caller wants is *the VPD*. Currently that's done
> by calling the RTAS "ibm,get-vpd" function, but that could change in
> future. There's RTAS calls that have been replaced with a "version 2" in
> the past, that could happen here too. Or the RTAS call could be replaced
> by a hypercall (though unlikely).
> 
> But hopefully if the underlying mechanism changed the kernel would be
> able to hide that detail behind this new API, and users would not need
> to change at all.
> 
> > Yet this identifier is never passed in. Instead we have this new
> > PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific to
> > this call only as is the /dev/papr-vpd device name, another new
> > identifier.
> >
> > Maybe the interface could provide a way to specify the service name?
> >
> >> file contents are immutable from the POV of user space. To get a new
> >> view of VPD, clients must create a new handle.
> >
> > Which is basically the same as creating a file descriptor with open().
> 
> Sort of. But much cleaner because you don't need to create a file in the
> filesystem and tell userspace how to find it.

You very much do. Userspace has to know about both /dev/papr-vpd and
PAPR_VPD_CREATE_HANDLE, and PAPR_VPD_CREATE_HANDLE cannot even be
discovered at all.

Thanks

Michal


Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-08-30 Thread Michal Suchánek
Hello,

thanks for working on this.

On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch 
> 
> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> components using the ibm,get-vpd RTAS function.
> 
> We can expose this to user space with a /dev/papr-vpd character
> device, where the programming model is:
> 
>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
>   int devfd = open("/dev/papr-vpd", O_WRONLY);
>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
>   size_t size = lseek(vpdfd, 0, SEEK_END);
>   char *buf = malloc(size);
>   pread(vpdfd, buf, size, 0);
> 
> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> the file contains the result of a complete ibm,get-vpd sequence. The

Could this be somewhat less obfuscated?

What the caller wants is the result of "ibm,get-vpd", which is a
well-known string identifier of the rtas call.

Yet this identifier is never passed in. Instead we have this new
PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific to
this call only as is the /dev/papr-vpd device name, another new
identifier.

Maybe the interface could provide a way to specify the service name?

> file contents are immutable from the POV of user space. To get a new
> view of VPD, clients must create a new handle.

Which is basically the same as creating a file descriptor with open().

Maybe creating a directory in sysfs or procfs with filenames
corresponding to rtas services would do the same job without extra
obfuscation?

> This design choice insulates user space from most of the complexities
> that ibm,get-vpd brings:
> 
> * ibm,get-vpd must be called more than once to obtain complete
>   results.
> * Only one ibm,get-vpd call sequence should be in progress at a time;
>   concurrent sequences will disrupt each other. Callers must have a
>   protocol for serializing their use of the function.
> * A call sequence in progress may receive a "VPD changed, try again"
>   status, requiring the client to start over. (The driver does not yet
>   handle this, but it should be easy to add.)

That certainly reduces the complexity of the user interface making it
much saner.

> The memory required for the VPD buffers seems acceptable, around 20KB
> for all VPD on one of my systems. And the value of the
> /rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
> consistently 300KB across various systems I've checked.
> 
> I've implemented support for this new ABI in the rtas_get_vpd()
> function in librtas, which the vpdupdate command currently uses to
> populate its VPD database. I've verified that an unmodified vpdupdate
> binary generates an identical database when using a librtas.so that
> prefers the new ABI.
> 
> Likely remaining work:
> 
> * Handle RTAS call status -4 (VPD changed) during ibm,get-vpd call
>   sequence.
> * Prevent ibm,get-vpd calls via rtas(2) from disrupting ibm,get-vpd
>   call sequences in this driver.
> * (Maybe) implement a poll method for delivering notifications of
>   potential changes to VPD, e.g. after a partition migration.

That sounds like something for netlink. If that is desired maybe it
should be used in the first place?

> Questions, points for discussion:
> 
> * Am I allocating the ioctl numbers correctly?
> * The only way to discover the size of a VPD buffer is
>   lseek(SEEK_END). fstat() doesn't work for anonymous fds like
>   this. Is this OK, or should the buffer length be discoverable some
>   other way?

So long as users have /rtas/ibm,vpd-size as the top bound of the data
they can receive I don't think it's critical to know the current VPD
size.
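
As a rough illustration of the lseek(SEEK_END) plus pread() pattern from
the quoted programming model, here is a minimal user-space sketch. It
works on any seekable fd. read_whole_fd() is a hypothetical helper name,
not part of the proposed ABI or of librtas, and obtaining the VPD fd via
the ioctl is deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the full contents of a seekable fd into a
 * malloc'd buffer. Returns the buffer (caller frees) and stores the
 * number of bytes read in *len, or returns NULL on error.
 */
char *read_whole_fd(int fd, size_t *len)
{
	assert(len != NULL);

	/* The only size discovery used here is lseek(SEEK_END). */
	off_t end = lseek(fd, 0, SEEK_END);
	if (end < 0)
		return NULL;

	size_t size = (size_t)end;
	char *buf = malloc(size ? size : 1);
	if (!buf)
		return NULL;

	/* pread() takes an explicit offset, so the seek above does not matter. */
	size_t done = 0;
	while (done < size) {
		ssize_t n = pread(fd, buf + done, size - done, (off_t)done);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			free(buf);
			return NULL;
		}
		if (n == 0)
			break;	/* file is shorter than lseek() reported */
		done += (size_t)n;
	}

	*len = done;
	return buf;
}
```

With the proposed driver the fd passed in would presumably be the one
returned by ioctl(PAPR_VPD_CREATE_HANDLE). A caller that prefers a fixed
buffer could size it from /rtas/ibm,vpd-size instead.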

Thanks

Michal


Re: [PATCH v6 02/14] x86/kexec: refactor for kernel/Kconfig.kexec

2023-08-22 Thread Michal Suchánek
Hello,

On Thu, Jul 13, 2023 at 07:13:57PM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2023/7/13 0:15, Eric DeVolder wrote:
> > The kexec and crash kernel options are provided in the common
> > kernel/Kconfig.kexec. Utilize the common options and provide
> > the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
> > equivalent set of KEXEC and CRASH options.
> > 
> > Signed-off-by: Eric DeVolder 
> > ---
> >  arch/x86/Kconfig | 92 ++--
> >  1 file changed, 19 insertions(+), 73 deletions(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 7422db409770..9767a343f7c2 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2040,88 +2040,34 @@ config EFI_RUNTIME_MAP
> >  
> >  source "kernel/Kconfig.hz"
> >  
> > -config KEXEC
> > -   bool "kexec system call"
> > -   select KEXEC_CORE
> > -   help
> > - kexec is a system call that implements the ability to shutdown your
> > - current kernel, and to start another kernel.  It is like a reboot
> > - but it is independent of the system firmware.   And like a reboot
> > - you can start any kernel with it, not just Linux.
> > -
> > - The name comes from the similarity to the exec system call.
> > -
> > - It is an ongoing process to be certain the hardware in a machine
> > - is properly shutdown, so do not be surprised if this code does not
> > - initially work for you.  As of this writing the exact hardware
> > - interface is strongly in flux, so no good recommendation can be
> > - made.
> > -
> > -config KEXEC_FILE
> > -   bool "kexec file based system call"
> > -   select KEXEC_CORE
> > -   select HAVE_IMA_KEXEC if IMA
> > -   depends on X86_64
> > -   depends on CRYPTO=y
> > -   depends on CRYPTO_SHA256=y
> > -   help
> > - This is new version of kexec system call. This system call is
> > - file based and takes file descriptors as system call argument
> > - for kernel and initramfs as opposed to list of segments as
> > - accepted by previous system call.
> > +config ARCH_SUPPORTS_KEXEC
> > +   def_bool y
> 
> In v5, Joel Fernandes seems to suggest you change it to the following form:

It's unfortunate that the suggestion did not make it to the mailing list.

> In arch/Kconfig:
> +config ARCH_SUPPORTS_KEXEC
> + bool
> 
> In arch/x86/Kconfig:
> config X86
>   ... ...
> + select ARCH_SUPPORTS_KEXEC
> 
> In arch/arm64/Kconfig:
> config ARM64
>   ... ...
> + select ARCH_SUPPORTS_KEXEC if PM_SLEEP_SMP

Which might work for this case

> 
> etc..
> 
> You can refer to ARCH_HAS_DEBUG_VIRTUAL.
> 
> >  
> > -config ARCH_HAS_KEXEC_PURGATORY
> > -   def_bool KEXEC_FILE
> > +config ARCH_SUPPORTS_KEXEC_FILE
> > +   def_bool X86_64 && CRYPTO && CRYPTO_SHA256
> >  
> > -config KEXEC_SIG
> > -   bool "Verify kernel signature during kexec_file_load() syscall"
> > +config ARCH_SELECTS_KEXEC_FILE
> > +   def_bool y
> > depends on KEXEC_FILE
> > -   help
> > +   select HAVE_IMA_KEXEC if IMA

but not this case, at least not this trivially.

Then for consistency it looks better to keep it as is.

Thanks

Michal

> >  
> > - This option makes the kexec_file_load() syscall check for a valid
> > - signature of the kernel image.  The image can still be loaded without
> > - a valid signature unless you also enable KEXEC_SIG_FORCE, though if
> > - there's a signature that we can check, then it must be valid.
> > +config ARCH_HAS_KEXEC_PURGATORY
> > +   def_bool KEXEC_FILE
> >  
> > - In addition to this option, you need to enable signature
> > - verification for the corresponding kernel image type being
> > - loaded in order for this to work.
> > +config ARCH_SUPPORTS_KEXEC_SIG
> > +   def_bool y
> >  
> > -config KEXEC_SIG_FORCE
> > -   bool "Require a valid signature in kexec_file_load() syscall"
> > -   depends on KEXEC_SIG
> > -   help
> > - This option makes kernel signature verification mandatory for
> > - the kexec_file_load() syscall.
> > +config ARCH_SUPPORTS_KEXEC_SIG_FORCE
> > +   def_bool y
> >  
> > -config KEXEC_BZIMAGE_VERIFY_SIG
> > -   bool "Enable bzImage signature verification support"
> > -   depends on KEXEC_SIG
> > -   depends on SIGNED_PE_FILE_VERIFICATION
> > -   select SYSTEM_TRUSTED_KEYRING
> > -   help
> > - Enable bzImage signature verification support.
> > +config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG
> > +   def_bool y
> >  
> > -config CRASH_DUMP
> > -   bool "kernel crash dumps"
> > -   depends on X86_64 || (X86_32 && HIGHMEM)
> > -   help
> > - Generate crash dump after being started by kexec.
> > - This should be normally only set in special crash dump kernels
> > - which are loaded in the main kernel with kexec-tools into
> > - a specially reserved region and then later executed after
> > - a crash by kdump/kexec. The crash dump kernel must be compiled
> > - to a memory address not used by the main kernel or BIOS using
> > - 

Re: [PATCH 1/2] pseries/smp: export the smt level in the SYS FS.

2023-04-14 Thread Michal Suchánek
Hello,

On Fri, Apr 14, 2023 at 10:11:24PM +1000, Michael Ellerman wrote:
> Laurent Dufour  writes:
> > On 13/04/2023 15:37:59, Michael Ellerman wrote:
> >> Laurent Dufour  writes:
> >>> There is no SMT level recorded in the kernel nor in user space.
> >>> Indeed there is no real constraint about that: mixed SMT levels are
> >>> allowed and the system works fine this way.
> >>>
> >>> However, when new CPUs are added, the kernel onlines all their threads,
> >>> which leads to mixed SMT levels and confuses end users a bit.
> >>>
> >>> To prevent this, export an SMT level from the kernel so that user space
> >>> applications like the energy daemon can read it to adjust their settings.
> >>> Writing an SMT value into the new sysfs entry only records the value; no
> >>> other action is taken. User space applications like ppc64_cpu should
> >>> update the sysfs entry when changing the SMT level to keep the system
> >>> consistent.
> >>>
> >>> Suggested-by: Srikar Dronamraju 
> >>> Signed-off-by: Laurent Dufour 
> >>> ---
> >>>  arch/powerpc/platforms/pseries/pseries.h |  3 ++
> >>>  arch/powerpc/platforms/pseries/smp.c | 39 
> >>>  2 files changed, 42 insertions(+)
> >>
> >> There is a generic sysfs interface for smt in /sys/devices/system/cpu/smt
> >>
> >> I think we should be enabling that on powerpc and then adapting it to
> >> our needs, rather than adding a pseries specific file.
> >
> > Thanks Michael, I was not aware of this sysfs interface.
> >
> >> Currently the generic code is only aware of SMT on/off, so it would need
> >> to be taught about SMT4 and 8 at least.
> >
> > Do you think we should limit our support to SMT4 and SMT8 only?
> 
> Possibly? Currently the SMT state is represented by an enum:
> 
> enum cpuhp_smt_control {
>   CPU_SMT_ENABLED,
>   CPU_SMT_DISABLED,
>   CPU_SMT_FORCE_DISABLED,
>   CPU_SMT_NOT_SUPPORTED,
>   CPU_SMT_NOT_IMPLEMENTED,
> };
> 
> Adding two states for SMT4 and SMT8 seems like it might be acceptable.
> 
> On the other hand if we want to support arbitrary SMT values from 3 to
> 8 then it might be better to store that value separately from the state
> enum.
> 
> TBH I'm not sure whether we want to support values other than 1/2/4/8
> via this interface.
> 
> A user who wants some odd numbered SMT value can always configure that
> manually using the existing tools.
> 
> But maybe it's less confusing if this interface supports all values?
> Even if they're unlikely to get much usage.

It looks like ppc64_cpu simply enables the first n threads of the CPU,
where n is the smt value, without any interleaving, hoping that the
architecture does the right thing. Under this implementation smt=3 is
well-defined.

For the dual cluster P9 CPUs that have two clusters of four this might
work out well for some workloads, and others might want that
interleaving. With that the odd smt values are not well-defined
anymore.
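
To make the difference concrete, here is a small sketch of the two
policies. The "first n" rule is what ppc64_cpu appears to do. The
interleaved rule, and the assumption that threads 0-3 form one cluster
and threads 4-7 the other, are purely hypothetical and only there to
show why odd values stop being obvious.

```c
#include <assert.h>
#include <stdbool.h>

#define THREADS_PER_CORE    8	/* SMT8 core */
#define CLUSTERS            2	/* assumption: two clusters of four */
#define THREADS_PER_CLUSTER (THREADS_PER_CORE / CLUSTERS)

/* "First n" policy: threads 0..smt-1 of the core are online. */
bool online_first_n(int thread, int smt)
{
	assert(thread >= 0 && thread < THREADS_PER_CORE);
	assert(smt >= 1 && smt <= THREADS_PER_CORE);
	return thread < smt;
}

/*
 * Hypothetical interleaved policy: threads are enabled round-robin
 * across clusters, so the enable order is 0, 4, 1, 5, 2, 6, 3, 7.
 */
bool online_interleaved(int thread, int smt)
{
	assert(thread >= 0 && thread < THREADS_PER_CORE);
	assert(smt >= 1 && smt <= THREADS_PER_CORE);

	int cluster = thread / THREADS_PER_CLUSTER;
	int slot = thread % THREADS_PER_CLUSTER;
	int order = slot * CLUSTERS + cluster;	/* position in enable order */

	return order < smt;
}

/* Count the online threads of one cluster under the given policy. */
int online_in_cluster(bool (*policy)(int, int), int cluster, int smt)
{
	int first = cluster * THREADS_PER_CLUSTER;
	int count = 0;

	for (int t = first; t < first + THREADS_PER_CLUSTER; t++)
		if (policy(t, smt))
			count++;
	return count;
}
```

With smt=3 the first policy puts all three threads in one cluster,
while the interleaved one splits them two and one, and which cluster
gets the extra thread is an arbitrary choice.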

Nonetheless, if the kernel does support some smt=n parameter, whatever
the semantics, this should also be supported by the runtime knob.

If it's too difficult to get right there is always the option to not
enable any thread by default, and let userspace implement
arbitrarily complex schemes :)

Thanks

Michal


Re: [PATCH] Revert "powerpc/rtas: Implement reentrant rtas call"

2023-04-14 Thread Michal Suchánek
Hello,

On Fri, Sep 16, 2022 at 04:56:18PM -0500, Nathan Lynch wrote:
> "Nicholas Piggin"  writes:
> > On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote:
> >> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote:
> >> > Leonardo Brás  writes:
> >> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote:

> >> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas:
> >> > > > Implement reentrant rtas call") change is incorrect. The "reentrant"
> >> > > > property described in the spec applies only to the individual RTAS
> >> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple 
> >> > > > CPUs
> >> > > > simultaneously, but it must adhere to the more general requirement to
> >> > > > serialize with other RTAS functions.
> >> > > > 
> >> > > 
> >> > > I see. Thanks for explaining that part!
> >> > > I agree: reentrant calls that way don't look as useful on Linux than I
> >> > > previously thought.
> >> > > 
> >> > > OTOH, I think that instead of reverting the change, we could make use 
> >> > > of the
> >> > > correct information and fix the current implementation. (This could 
> >> > > help when we
> >> > > do the same rtas call in multiple cpus)
> >> > 
> >> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do
> >> > that. I'm not seeing the need.
> >> > 
> >> > > I have an idea of a patch to fix this. 
> >> > > Do you think it would be ok if I sent that, to prospect being an 
> >> > > alternative to
> >> > > this reversion?
> >> > 
> >> > It is my preference, and I believe it is more common, to revert to the
> >> > well-understood prior state, imperfect as it may be. The revert can be
> >> > backported to -stable and distros while development and review of
> >> > another approach proceeds.
> >>
> >> Ok then, as long as you are aware of the kdump bug, I'm good.
> >>
> >> FWIW:
> >> Reviewed-by: Leonardo Bras 
> >
> > A shame. I guess a reader/writer lock would not be much help because
> > the crash is probably more likely to hit longer running rtas calls?
> >
> > Alternative is just cheat and do this...?
> >
> > Thanks,
> > Nick
> >
> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> > index 693133972294..89728714a06e 100644
> > --- a/arch/powerpc/kernel/rtas.c
> > +++ b/arch/powerpc/kernel/rtas.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void)
> >  {
> > unsigned long flags;
> >  
> > > +   if (atomic_read(&panic_cpu) == raw_smp_processor_id()) {
> > +   /*
> > +* Crash in progress on this CPU. Other CPUs should be
> > +* stopped by now, so skip the lock in case it was being
> > +* held, and is now needed for crashing e.g., kexec
> > +* (machine_kexec_mask_interrupts) requires rtas calls.
> > +*
> > +* It's possible this could have caused rtas state
> > breakage
> > +* but the alternative is deadlock.
> > +*/
> > +   return 0;
> > +   }
> > +
> > local_irq_save(flags);
> > preempt_disable();
> > > arch_spin_lock(&rtas.lock);
> > @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void)
> >  
> >  static void unlock_rtas(unsigned long flags)
> >  {
> > > +   if (atomic_read(&panic_cpu) == raw_smp_processor_id())
> > +   return;
> > +
> > > arch_spin_unlock(&rtas.lock);
> > local_irq_restore(flags);
> > preempt_enable();
> 
> Looks correct.
> 
> I wonder - would it be worth making the panic path use a separate
> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in
> RTAS at panic time, then leaving rtas.args untouched might make the
> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we
> incur on the panic path more likely to succeed.

Was some fix for the case of crashing in rtas merged?

Looks like there is none unless I missed something.

The parameter area allocator might help with the latter
but the former does not seem addressed.

Thanks

Michal


Re: [PATCH 1/2] pseries/smp: export the smt level in the SYS FS.

2023-03-31 Thread Michal Suchánek
Hello,

On Fri, Mar 31, 2023 at 05:39:04PM +0200, Laurent Dufour wrote:
> There is no SMT level recorded in the kernel nor in user space.
> Indeed there is no real constraint about that: mixed SMT levels are
> allowed and the system works fine this way.
> 
> However, when new CPUs are added, the kernel onlines all their threads,
> which leads to mixed SMT levels and confuses end users a bit.
> 
> To prevent this, export an SMT level from the kernel so that user space
> applications like the energy daemon can read it to adjust their settings.
> Writing an SMT value into the new sysfs entry only records the value; no
> other action is taken. User space applications like ppc64_cpu should
> update the sysfs entry when changing the SMT level to keep the system consistent.
> 
> Suggested-by: Srikar Dronamraju 
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/platforms/pseries/pseries.h |  3 ++
>  arch/powerpc/platforms/pseries/smp.c | 39 
>  2 files changed, 42 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/pseries.h 
> b/arch/powerpc/platforms/pseries/pseries.h
> index f8bce40ebd0c..af0a145af98f 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -23,7 +23,9 @@ extern int pSeries_machine_check_exception(struct pt_regs 
> *regs);
>  extern long pseries_machine_check_realmode(struct pt_regs *regs);
>  void pSeries_machine_check_log_err(void);
>  
> +
>  #ifdef CONFIG_SMP
> +extern int pseries_smt;
>  extern void smp_init_pseries(void);
>  
>  /* Get state of physical CPU from query_cpu_stopped */
> @@ -34,6 +36,7 @@ int smp_query_cpu_stopped(unsigned int pcpu);
>  #define QCSS_HARDWARE_ERROR -1
>  #define QCSS_HARDWARE_BUSY -2
>  #else
> +#define pseries_smt 1

Is this really needed for anything?

The code using pseries_smt would not compile with a define, and would be
only compiled with SMP enabled anyway so we should not need this.

Thanks

Michal

>  static inline void smp_init_pseries(void) { }
>  #endif
>  
> diff --git a/arch/powerpc/platforms/pseries/smp.c 
> b/arch/powerpc/platforms/pseries/smp.c
> index c597711ef20a..6c382922f8f3 100644
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -45,6 +46,8 @@
>  
>  #include "pseries.h"
>  
> +int pseries_smt;
> +
>  /*
>   * The Primary thread of each non-boot processor was started from the OF 
> client
>   * interface by prom_hold_cpus and is spinning on secondary_hold_spinloop.
> @@ -280,3 +283,39 @@ void __init smp_init_pseries(void)
>  
>   pr_debug(" <- smp_init_pSeries()\n");
>  }
> +
> +static ssize_t pseries_smt_store(struct class *class,
> +  struct class_attribute *attr,
> +  const char *buf, size_t count)
> +{
> + int smt;
> +
> + if (kstrtou32(buf, 0, &smt) || !smt || smt > (u32) threads_per_core) {
> + pr_err("Invalid pseries_smt specified.\n");
> + return -EINVAL;
> + }
> +
> + pseries_smt = smt;
> +
> + return count;
> +}
> +
> +static ssize_t pseries_smt_show(struct class *class, struct class_attribute 
> *attr,
> +   char *buf)
> +{
> + return sysfs_emit(buf, "%d\n", pseries_smt);
> +}
> +
> +static CLASS_ATTR_RW(pseries_smt);
> +
> +static int __init pseries_smt_init(void)
> +{
> + int rc;
> +
> + pseries_smt = smt_enabled_at_boot;
> + rc = sysfs_create_file(kernel_kobj, &class_attr_pseries_smt.attr);
> + if (rc)
> + pr_err("Can't create pseries_smt sysfs/kernel entry.\n");
> + return rc;
> +}
> +machine_device_initcall(pseries, pseries_smt_init);
> -- 
> 2.40.0
> 


Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

2023-03-30 Thread Michal Suchánek
On Thu, Mar 30, 2023 at 05:51:57PM +0200, Laurent Dufour wrote:
> On 13/02/2023 16:40:50, Nathan Lynch wrote:
> > Michal Suchánek  writes:
> >> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
> >>> Laurent Dufour  writes:
> >>>> When a new CPU is added, the kernel is activating all its threads. This
> >>>> leads to weird, but functional, result when adding CPU on a SMT 4 system
> >>>> for instance.
> >>>>
> >>>> Here the newly added CPU 1 has 8 threads while the other one has 4 
> >>>> threads
> >>>> active (system has been booted with the 'smt-enabled=4' kernel option):
> >>>>
> >>>> ltcden3-lp12:~ # ppc64_cpu --info
> >>>> Core   0:0*1*2*3*4 5 6 7
> >>>> Core   1:8*9*   10*   11*   12*   13*   14*   15*
> >>>>
> >>>> There is no SMT value in the kernel. It is possible to run unbalanced 
> >>>> LPAR
> >>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest.

> Indeed, that's not so easy. There are multiple ways for the SMT level to be
> impacted:
>  - smt-enabled kernel option
>  - smtstate systemctl service (if activated), saving SMT level at shutdown
> time to restore it at boot time
>  - pseries-energyd daemon (if activated) could turn off threads
>  - ppc64_cpu --smt=x user command
>  - sysfs direct writing to turn off/on specific threads.
> 
> There is no SMT level saved, on "disk" or in the kernel, and any of these
> options can interact in parallel. So from the user space point of view, the
> best we could do is looking for the SMT current values, there could be
> multiple values in the case of a mixed SMT state, pick one value and apply it.
> 
> Extending the drmgr's hook is still valid, and I sent a patch series on the
> powerpc-utils mailing list to achieve that. However, changing the SMT level
> in that hook means that the newly added CPU will first be turned on and there is
> a window where these threads could be seen active. Not a big deal but not
> turning on these extra threads looks better to me.

Which means

1) add an option to not online hotplugged CPUs by default

2) when a tool that wants to manage CPU onlining is active it can set
the option so that no threads are onlined automatically, and online the
desired threads

3) when no such tool is active the default should be to online all
threads to preserve compatibility with existing behavior

> That being said, I can't see any benefit of a user space implementation
> compared to the option I'm proposing in that patch.

The userspace implementation can implement arbitrarily complex policy,
that's not something that belongs in the kernel.
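
As one example of such a policy, a tool could look at the thread counts
of the cores that are already online and apply the most common one to a
newly added core. The sketch below only does that bookkeeping.
pick_smt_level() is a hypothetical name, and both the tie-break (the
higher level wins) and the fallback to all threads when no core is known
are arbitrary choices for illustration, not something proposed in this
thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS_PER_CORE 8

/*
 * Hypothetical policy helper: active[i] is the number of online threads
 * of existing core i. Returns the most common value; on a tie the higher
 * level wins. With no cores at all, fall back to onlining every thread,
 * which matches the existing kernel behavior.
 */
int pick_smt_level(const int *active, size_t ncores)
{
	size_t hist[MAX_THREADS_PER_CORE + 1] = { 0 };
	size_t best_count = 0;
	int best = MAX_THREADS_PER_CORE;

	assert(active != NULL || ncores == 0);

	/* Histogram of active-thread counts across cores. */
	for (size_t i = 0; i < ncores; i++) {
		assert(active[i] >= 1 && active[i] <= MAX_THREADS_PER_CORE);
		hist[active[i]]++;
	}

	/* Scanning upward with >= makes the higher level win ties. */
	for (int level = 1; level <= MAX_THREADS_PER_CORE; level++) {
		if (hist[level] > 0 && hist[level] >= best_count) {
			best = level;
			best_count = hist[level];
		}
	}

	return best;
}
```

A mixed system with cores at 2, 4 and 5 threads has no clear answer,
which is the kind of decision that is easier to change in a user space
tool than in the kernel.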

Thanks

Michal


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-03-13 Thread Michal Suchánek
On Mon, Mar 13, 2023 at 10:53:34PM +0100, Andrea Righi wrote:
> On Mon, Mar 13, 2023 at 10:48:53PM +0100, Michal Suchánek wrote:
> > Hello,
> > 
> > On Mon, Mar 13, 2023 at 09:32:16PM +0100, Andrea Righi wrote:
> > > On Wed, Jan 11, 2023 at 04:11:51PM +, Gary Guo wrote:
> > > > Currently modversion uses a fixed size array of size (64 - sizeof(long))
> > > > to store symbol names, thus placing a hard limit on length of symbols.
> > > > Rust symbols (which encodes crate and module names) can be quite a bit
> > > > longer. The length limit in kallsyms is increased to 512 for this 
> > > > reason.
> > > > 
> > > > It's a waste of space to simply expand the fixed array size to 512 in
> > > > modversion info entries. I therefore make it variably sized, with offset
> > > > to the next entry indicated by the initial "next" field.
> > > > 
> > > > In addition to supporting longer-than-56/60 byte symbols, this patch 
> > > > also
> > > > reduce the size for short symbols by getting rid of excessive 0 
> > > > paddings.
> > > > There are still some zero paddings to ensure "next" and "crc" fields are
> > > > properly aligned.
> > > > 
> > > > This patch does have a tiny drawback that it makes ".mod.c" files 
> > > > generated
> > > > a bit less easy to read, as code like
> > > > 
> > > > "\x08\x00\x00\x00\x78\x56\x34\x12"
> > > > "symbol\0\0"
> > > > 
> > > > is generated as opposed to
> > > > 
> > > > { 0x12345678, "symbol" },
> > > > 
> > > > because the structure is now variable-length. But hopefully nobody reads
> > > > the generated file :)
> > > > 
> > > > Link: b8a94bfb3395 ("kallsyms: increase maximum kernel symbol length to 
> > > > 512")
> > > > Link: https://github.com/Rust-for-Linux/linux/pull/379
> > > > 
> > > > Signed-off-by: Gary Guo 
> > > 
> > > Is there any newer version of this patch?
> > > 
> > > I'm doing some tests with it, but I'm getting boot failures on ppc64
> > > with this applied (at boot kernel is spitting out lots of oops'es and
> > > unfortunately it's really hard to copy paste or just read them from the
> > > console).
> > 
> > Are you using the ELF ABI v1 or v2?
> > 
> > v1 may have some additional issues when it comes to these symbol tables.
> > 
> > Thanks
> > 
> > Michal
> 
> I have CONFIG_PPC64_ELF_ABI_V2=y in my .config, so I guess I'm using v2.
> 
> BTW, the issue seems to be in dedotify_versions(), as a silly test I
> tried to comment out this function completely to be a no-op and now my
> system boots fine (but I guess I'm probably breaking something else).

Probably not. You should not have the extra leading dot on ABI v2. So if
dedotify does something that means something generates and then expects
back symbols with a leading dot, and this workaround for ABI v1 breaks
that. Or maybe it is called when it shouldn't.
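
For reference, the per-name operation under discussion amounts to
something like the sketch below. This is a user-space approximation with
a hypothetical helper name (dedotify_name). It assumes a single leading
dot is stripped in place, and it says nothing about how the version
table itself is walked, which is the part the variable-length layout
changes.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: strip one leading '.' from a NUL-terminated
 * symbol name in place. Returns 1 if the name was changed, 0 otherwise.
 */
int dedotify_name(char *name)
{
	assert(name != NULL);

	if (name[0] != '.')
		return 0;

	/* strlen(name) bytes starting at name + 1 include the trailing NUL. */
	memmove(name, name + 1, strlen(name));
	return 1;
}
```

On ABI v2 no name should start with a dot, so a helper like this would
be expected to return 0 for every entry.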

Thanks

Michal


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-03-13 Thread Michal Suchánek
Hello,

On Mon, Mar 13, 2023 at 09:32:16PM +0100, Andrea Righi wrote:
> On Wed, Jan 11, 2023 at 04:11:51PM +, Gary Guo wrote:
> > Currently modversion uses a fixed size array of size (64 - sizeof(long))
> > to store symbol names, thus placing a hard limit on length of symbols.
> > Rust symbols (which encodes crate and module names) can be quite a bit
> > longer. The length limit in kallsyms is increased to 512 for this reason.
> > 
> > It's a waste of space to simply expand the fixed array size to 512 in
> > modversion info entries. I therefore make it variably sized, with offset
> > to the next entry indicated by the initial "next" field.
> > 
> > In addition to supporting longer-than-56/60 byte symbols, this patch also
> > reduce the size for short symbols by getting rid of excessive 0 paddings.
> > There are still some zero paddings to ensure "next" and "crc" fields are
> > properly aligned.
> > 
> > This patch does have a tiny drawback that it makes ".mod.c" files generated
> > a bit less easy to read, as code like
> > 
> > "\x08\x00\x00\x00\x78\x56\x34\x12"
> > "symbol\0\0"
> > 
> > is generated as opposed to
> > 
> > { 0x12345678, "symbol" },
> > 
> > because the structure is now variable-length. But hopefully nobody reads
> > the generated file :)
> > 
> > Link: b8a94bfb3395 ("kallsyms: increase maximum kernel symbol length to 
> > 512")
> > Link: https://github.com/Rust-for-Linux/linux/pull/379
> > 
> > Signed-off-by: Gary Guo 
> 
> Is there any newer version of this patch?
> 
> I'm doing some tests with it, but I'm getting boot failures on ppc64
> with this applied (at boot kernel is spitting out lots of oops'es and
> unfortunately it's really hard to copy paste or just read them from the
> console).

Are you using the ELF ABI v1 or v2?

v1 may have some additional issues when it comes to these symbol tables.

Thanks

Michal


Re: [PATCH v4 1/2] powerpc/mm: Support execute-only memory on the Radix MMU

2023-03-08 Thread Michal Suchánek
Hello,

On Wed, Aug 31, 2022 at 11:13:59PM +1000, Michael Ellerman wrote:
> On Wed, 17 Aug 2022 15:06:39 +1000, Russell Currey wrote:
> > Add support for execute-only memory (XOM) for the Radix MMU by using an
> > execute-only mapping, as opposed to the RX mapping used by powerpc's
> > other MMUs.
> > 
> > The Hash MMU already supports XOM through the execute-only pkey,
> > which is a separate mechanism shared with x86.  A PROT_EXEC-only mapping
> > will map to RX, and then the pkey will be applied on top of it.
> > 
> > [...]
> 
> Applied to powerpc/next.
> 
> [1/2] powerpc/mm: Support execute-only memory on the Radix MMU
>   
> https://git.kernel.org/powerpc/c/395cac7752b905318ae454a8b859d4c190485510

This breaks libaio tests (on POWER9 hash PowerVM):
https://pagure.io/libaio/blob/master/f/harness/cases/5.t#_43

cases/5.p
expect   512: (w), res =   512 [Success]
expect   512: (r), res =   512 [Success]
expect   512: (r), res =   512 [Success]
expect   512: (w), res =   512 [Success]
expect   512: (w), res =   512 [Success]
expect   -14: (r), res =   -14 [Bad address]
expect   512: (r), res =   512 [Success]
expect   512: (w), res =   512 [Success]
test cases/5.t completed PASSED.

cases/5.p
expect   512: (w), res =   512 [Success]
expect   512: (r), res =   512 [Success]
expect   512: (r), res =   512 [Success]
expect   512: (w), res =   512 [Success]
expect   512: (w), res =   512 [Success]
expect   -14: (r), res =   -14 [Bad address]
expect   512: (r), res =   512 [Success]
expect   -14: (w), res =   512 [Success] -- FAILED
test cases/5.t completed FAILED.

Can you check whether that test assumption is OK?

Thanks

Michal


Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

2023-02-13 Thread Michal Suchánek
Hello,

On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
> Laurent Dufour  writes:
> > When a new CPU is added, the kernel is activating all its threads. This
> > leads to weird, but functional, result when adding CPU on a SMT 4 system
> > for instance.
> >
> > Here the newly added CPU 1 has 8 threads while the other one has 4 threads
> > active (system has been booted with the 'smt-enabled=4' kernel option):
> >
> > ltcden3-lp12:~ # ppc64_cpu --info
> > Core   0:    0*    1*    2*    3*    4     5     6     7
> > Core   1:8*9*   10*   11*   12*   13*   14*   15*
> >
> > There is no SMT value in the kernel. It is possible to run unbalanced LPAR
> > with 2 threads for a CPU, 4 for another one, and 5 on the latest.
> >
> > To work around this possibility, and assuming that the LPAR run with the
> > same number of threads for each CPU, which is the common case,
> 
> I am skeptical at best of baking that assumption into this code. Mixed
> SMT modes within a partition doesn't strike me as an unreasonable
> possibility for some use cases. And if that's wrong, then we should just
> add a global smt value instead of using heuristics.
> 
> > the number
> > of active threads of the CPU doing the hot-plug operation is computed. Only
> > that number of threads will be activated for the newly added CPU.
> >
> > This way on a LPAR running in SMT=4, newly added CPU will be running 4
> > threads, which is what a end user would expect.
> 
> I could see why most users would prefer this new behavior. But surely
> some users have come to expect the existing behavior, which has been in
> place for years, and developed workarounds that might be broken by this
> change?
> 
> I would suggest that to handle this well, we need to give user space
> more ability to tell the kernel what actions to take on added cores, on
> an opt-in basis.
> 
> This could take the form of extending the DLPAR sysfs command set:
> 
> Option 1 - Add a flag that tells the kernel not to online any threads at
> all; user space will online the desired threads later.
> 
> Option 2 - Add an option that tells the kernel which SMT mode to apply.

powerpc-utils grew some drmgr hooks recently so maybe the policy can be
moved to userspace?

Thanks

Michal


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-20 Thread Michal Suchánek
Hello,

On Fri, Jan 20, 2023 at 11:23:39AM -0600, Rob Herring wrote:
> On Thu, Jan 19, 2023 at 3:53 AM Michal Suchanek  wrote:
> >
> > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > breaks build because of wrong argument to snprintf. That certainly
> > avoids the runtime error but is not the intended outcome.
> >
> > Also use standard device name format of-display.N for all created
> > devices.
> >
> > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2: Update the device name format
> > ---
> >  drivers/of/platform.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index f2a5d679a324..8c1b1de22036 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -525,7 +525,9 @@ static int __init 
> > of_platform_default_populate_init(void)
> > if (IS_ENABLED(CONFIG_PPC)) {
> > struct device_node *boot_display = NULL;
> > struct platform_device *dev;
> > -   int display_number = 1;
> > +   int display_number = 0;
> > +   char buf[14];
> > +   char *of_display_format = "of-display.%d";
> 
> static const as suggested and can we just move on please...
Only const; static could be dodgy.

> > int ret;
> >
> > /* Check if we have a MacOS display without a node spec */
> > @@ -556,7 +558,10 @@ static int __init 
> > of_platform_default_populate_init(void)
> > if (!of_get_property(node, "linux,opened", NULL) ||
> > !of_get_property(node, "linux,boot-display", 
> > NULL))
> > continue;
> > -   dev = of_platform_device_create(node, "of-display", 
> > NULL);
> > +   ret = snprintf(buf, sizeof(buf), of_display_format, 
> > display_number++);
> 
> The boot display is always "of-display.0". Just use the fixed string
> here. Then we can get rid of the whole debate around static const.

I prefer to use the same format string when the names should be
consistent. Also it would resurrect the starting from 1 debate.

But if you really want to have two strings I do not care all that much.

> 
> > +   if (ret >= sizeof(buf))
> > +   continue;
> 
> This only happens if display_number becomes too big. Why continue on?
> The next iteration will fail too.

Yes, there is no need to continue with the loop.
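
The point about stopping can be sketched in plain userspace C. This is
only an illustration under stated assumptions: count_named_displays() is
a hypothetical stand-in for the display loop, not the code in
drivers/of/platform.c, and the node count is just a parameter. Because
display_number only grows, the first name that overflows the 14-byte
buffer means every later one would overflow too, so the loop breaks
instead of continuing.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the display loop: try to name `nodes`
 * devices and return how many names fit in the 14-byte buffer.
 */
static int count_named_displays(int nodes)
{
	char buf[14];
	int display_number = 0;
	int created = 0;
	int ret;

	assert(nodes >= 0);
	for (int i = 0; i < nodes; i++) {
		ret = snprintf(buf, sizeof(buf), "of-display.%d", display_number++);
		/* Names only get longer, so stop at the first one that overflows. */
		if (ret < 0 || (size_t)ret >= sizeof(buf))
			break;
		created++;	/* the real code would create the device here */
	}
	return created;
}
```

With this buffer size the prefix takes 11 characters, which leaves room
for two digits plus the terminating NUL, so the names of-display.0
through of-display.99 fit and of-display.100 does not.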

Thanks

Michal

> 
> > +   dev = of_platform_device_create(node, buf, NULL);
> > if (WARN_ON(!dev))
> > return -ENOMEM;
> > boot_display = node;
> > @@ -564,10 +569,9 @@ static int __init 
> > of_platform_default_populate_init(void)
> > }
> >
> > for_each_node_by_type(node, "display") {
> > -   char *buf[14];
> > if (!of_get_property(node, "linux,opened", NULL) || 
> > node == boot_display)
> > continue;
> > -   ret = snprintf(buf, "of-display-%d", 
> > display_number++);
> > +   ret = snprintf(buf, sizeof(buf), of_display_format, 
> > display_number++);
> > if (ret >= sizeof(buf))
> > continue;
> 
> Here too in the original change.
> 
> > of_platform_device_create(node, buf, NULL);
> > --
> > 2.35.3
> >


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-20 Thread Michal Suchánek
On Thu, Jan 19, 2023 at 11:34:46AM +0100, Michal Suchánek wrote:
> Hello,
> 
> On Thu, Jan 19, 2023 at 10:24:07AM +0000, Christophe Leroy wrote:
> > 
> > 
> > Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > breaks build because of wrong argument to snprintf. That certainly
> > > avoids the runtime error but is not the intended outcome.
> > > 
> > > Also use standard device name format of-display.N for all created
> > > devices.
> > > 
> > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > Signed-off-by: Michal Suchanek 
> > > ---
> > > v2: Update the device name format
> > > ---
> > >   drivers/of/platform.c | 12 
> > >   1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index f2a5d679a324..8c1b1de22036 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -525,7 +525,9 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   if (IS_ENABLED(CONFIG_PPC)) {
> > >   struct device_node *boot_display = NULL;
> > >   struct platform_device *dev;
> > > - int display_number = 1;
> > > + int display_number = 0;
> > > + char buf[14];
> > 
> > Can you declare that in the for block where it is used instead ?
> 
> No, there are two for blocks.
> 
> > 
> > > + char *of_display_format = "of-display.%d";
> > 
> > Should be const ?
> 
> Yes, could be.
> 
> > 
> > >   int ret;
> > >   
> > >   /* Check if we have a MacOS display without a node spec 
> > > */
> > > @@ -556,7 +558,10 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   if (!of_get_property(node, "linux,opened", 
> > > NULL) ||
> > >   !of_get_property(node, 
> > > "linux,boot-display", NULL))
> > >   continue;
> > > - dev = of_platform_device_create(node, "of-display", 
> > > NULL);
> > > + ret = snprintf(buf, sizeof(buf), of_display_format, 
> > > display_number++);
> > > + if (ret >= sizeof(buf))
> > > + continue;
> > 
> > 
> > Can you make buf big enough to avoid that ?
> 
> It would be a bit fragile that way.
> 
> The buffer would have to theoretically accommodate
> "of-display.-9223372036854775808", and any change to the format requires
> recalculating the length, by hand.
> 
> Of course, the memory would run out way before allocating that many
> devices so it's kind of pointless to try and accommodate all possible
> device numbers.
> 
> > 
> > And by the way could it be called something else than 'buf' ?
> > 
> > See example here:
> > https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690
> 
> Yes, that is quite possible. Nonetheless, generic variable names such
> as 'ret' also work.

And in fact, judicious use of short generic variable names is more
readable than naming all variables foobar_*, as far as I am concerned.
Of course, YMMV.

Thanks

Michal


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-20 Thread Michal Suchánek
On Fri, Jan 20, 2023 at 12:39:23PM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 20.01.23 um 12:27 schrieb Michal Suchánek:
> > Hello,
> > 
> > On Thu, Jan 19, 2023 at 04:20:57PM +0100, Thomas Zimmermann wrote:
> > > Hi
> > > 
> > > Am 19.01.23 um 14:23 schrieb Michal Suchánek:
> > > > On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote:
> > > > > Hi
> > > > > 
> > > > > Am 19.01.23 um 11:24 schrieb Christophe Leroy:
> > > > > > 
> > > > > > 
> > > > > > Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> > > > > > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > > > > > breaks build because of wrong argument to snprintf. That certainly
> > > > > > > avoids the runtime error but is not the intended outcome.
> > > > > > > 
> > > > > > > Also use standard device name format of-display.N for all created
> > > > > > > devices.
> > > > > > > 
> > > > > > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > > > > > Signed-off-by: Michal Suchanek 
> > > > > > > ---
> > > > > > > v2: Update the device name format
> > > > > > > ---
> > > > > > >  drivers/of/platform.c | 12 
> > > > > > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > > > > > index f2a5d679a324..8c1b1de22036 100644
> > > > > > > --- a/drivers/of/platform.c
> > > > > > > +++ b/drivers/of/platform.c
> > > > > > > @@ -525,7 +525,9 @@ static int __init 
> > > > > > > of_platform_default_populate_init(void)
> > > > > > >   if (IS_ENABLED(CONFIG_PPC)) {
> > > > > > >   struct device_node *boot_display = NULL;
> > > > > > >   struct platform_device *dev;
> > > > > > > - int display_number = 1;
> > > > > > > + int display_number = 0;
> > > > > > > + char buf[14];
> > > > > > 
> > > > > > Can you declare that in the for block where it is used instead ?
> > > > > > 
> > > > > > > + char *of_display_format = "of-display.%d";
> > > > > > 
> > > > > > Should be const ?
> > > > > 
> > > > > That should be static const of_display_format[] = then
> > > > 
> > > > Why? It sounds completely fine to have a const pointer to a string
> > > > constant.
> > > 
> > > Generally speaking:
> > > 
> > > 'static' because your const pointer is then not a local variable, so it
> > > takes pressure off the stack. For global variables, you don't want them to
> > > show up in any linker symbol tables.
> > 
> > This sounds a lot like an exemplar case of premature optimization.
> > A simplistic compiler might do exactly what you say, and allocate a slot
> > for the variable on the stack the moment the function is entered.
> > 
> > However, in real compilers there is no stack pressure from having a
> > local variable:
> >   - the compiler can put the variable into a register
> >   - it can completely omit the variable before and after it's actually
> > used which is that specific function call
> > 
> > > The string "of-display.%d" is stored as an array in the ELF data section.
> > > And your char pointer is a reference to that array. For static pointers,
> > > these indirections take CPU cycles to update when the loader has to 
> > > relocate
> > 
> > Provided that the char pointer ever exists in the compiled code. Its
> > address is not taken so it does not need to.
> > 
> > > sections. If you declare of_display_format[] directly as array, you avoid
> > > the reference and work directly with the array.
> > > 
> > > Of course, this is a kernel module and the string is self-contained within
> > > the function. So the compiler can probably detect that and optimize the 
> > > code
> > > to be like the 'static const []' version. It's still good to follow best
> > &

Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-20 Thread Michal Suchánek
Hello,

On Thu, Jan 19, 2023 at 04:20:57PM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 19.01.23 um 14:23 schrieb Michal Suchánek:
> > On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote:
> > > Hi
> > > 
> > > Am 19.01.23 um 11:24 schrieb Christophe Leroy:
> > > > 
> > > > 
> > > > Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> > > > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > > > breaks build because of wrong argument to snprintf. That certainly
> > > > > avoids the runtime error but is not the intended outcome.
> > > > > 
> > > > > Also use standard device name format of-display.N for all created
> > > > > devices.
> > > > > 
> > > > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > > > Signed-off-by: Michal Suchanek 
> > > > > ---
> > > > > v2: Update the device name format
> > > > > ---
> > > > > drivers/of/platform.c | 12 
> > > > > 1 file changed, 8 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > > > index f2a5d679a324..8c1b1de22036 100644
> > > > > --- a/drivers/of/platform.c
> > > > > +++ b/drivers/of/platform.c
> > > > > @@ -525,7 +525,9 @@ static int __init 
> > > > > of_platform_default_populate_init(void)
> > > > >   if (IS_ENABLED(CONFIG_PPC)) {
> > > > >   struct device_node *boot_display = NULL;
> > > > >   struct platform_device *dev;
> > > > > - int display_number = 1;
> > > > > + int display_number = 0;
> > > > > + char buf[14];
> > > > 
> > > > Can you declare that in the for block where it is used instead ?
> > > > 
> > > > > + char *of_display_format = "of-display.%d";
> > > > 
> > > > Should be const ?
> > > 
> > > That should be static const of_display_format[] = then
> > 
> > Why? It sounds completely fine to have a const pointer to a string
> > constant.
> 
> Generally speaking:
> 
> 'static' because your const pointer is then not a local variable, so it
> takes pressure off the stack. For global variables, you don't want them to
> show up in any linker symbol tables.

This sounds a lot like an exemplar case of premature optimization.
A simplistic compiler might do exactly what you say, and allocate a slot
for the variable on the stack the moment the function is entered.

However, in real compilers there is no stack pressure from having a
local variable:
 - the compiler can put the variable into a register
 - it can omit the variable entirely outside the point where it is
   actually used, which is that one specific function call

> The string "of-display.%d" is stored as an array in the ELF data section.
> And your char pointer is a reference to that array. For static pointers,
> these indirections take CPU cycles to update when the loader has to relocate

Provided that the char pointer ever exists in the compiled code. Its
address is not taken so it does not need to.

> sections. If you declare of_display_format[] directly as array, you avoid
> the reference and work directly with the array.
> 
> Of course, this is a kernel module and the string is self-contained within
> the function. So the compiler can probably detect that and optimize the code
> to be like the 'static const []' version. It's still good to follow best
> practices, as someone might copy from this function.

If it could not detect it, there would be a lot of trouble all around.
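
The two declarations under discussion can be put side by side in a
small userspace sketch. The function names below are hypothetical and
exist only for comparison; nothing here is taken from
drivers/of/platform.c beyond the format string. Both variants produce
the same name. Whether they also compile to the same machine code
depends on the compiler and its flags, so that part is best checked by
comparing the generated assembly rather than assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Variant from the patch: a local pointer to a string literal. */
static int name_via_pointer(char *buf, size_t size, int n)
{
	const char *fmt = "of-display.%d";

	return snprintf(buf, size, fmt, n);
}

/* Variant suggested in review: the array itself has static storage. */
static int name_via_array(char *buf, size_t size, int n)
{
	static const char fmt[] = "of-display.%d";

	return snprintf(buf, size, fmt, n);
}

/* Self-check: both variants must produce the same name. */
static void check_same_name(int n)
{
	char a[32], b[32];

	assert(name_via_pointer(a, sizeof(a), n) == name_via_array(b, sizeof(b), n));
	assert(strcmp(a, b) == 0);
}
```

In the pointer variant the address of fmt is never taken, which is the
property that lets an optimizing compiler drop the pointer object
altogether.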

Thanks

Michal


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-01-19 Thread Michal Suchánek
On Thu, Jan 19, 2023 at 03:09:36PM +0000, Gary Guo wrote:
> On Tue, 17 Jan 2023 11:22:45 -0800
> Lucas De Marchi  wrote:
> 
> > On Tue, Jan 17, 2023 at 06:51:44PM +0100, Michal Suchánek wrote:
> > >Hello,
> > >
> > >On Fri, Jan 13, 2023 at 06:18:41PM +0000, Gary Guo wrote:
> > >> On Thu, 12 Jan 2023 14:40:59 -0700
> > >> Lucas De Marchi  wrote:
> > >>  
> > >> > On Wed, Jan 11, 2023 at 04:11:51PM +0000, Gary Guo wrote:
> > >> > >
> > >> > > struct modversion_info {
> > >> > >- unsigned long crc;
> > >> > >- char name[MODULE_NAME_LEN];
> > >> > >+ /* Offset of the next modversion entry in relation to this one. 
> > >> > >*/
> > >> > >+ u32 next;
> > >> > >+ u32 crc;
> > >> > >+ char name[0];  
> > >> >
> > >> > although not really exported as uapi, this will break userspace as 
> > >> > this is
> > >> > used in the  elf file generated for the modules. I think
> > >> > this change must be made in a backward compatible way and kmod updated
> > >> > to deal with the variable name length:
> > >> >
> > >> > kmod $ git grep "\[64"
> > >> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint32_t)];
> > >> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint64_t)];
> > >> >
> > >> > in kmod we have both 32 and 64 because a 64-bit kmod can read both 32
> > >> > and 64 bit module, and vice versa.
> > >> >  
> > >>
> > >> Hi Lucas,
> > >>
> > >> Thanks for the information.
> > >>
> > >> The change can't be "truly" backward compatible, in a sense that
> > >> regardless of the new format we choose, kmod would not be able to decode
> > >> symbols longer than "64 - sizeof(long)" bytes. So the list it retrieves
> > >> is going to be incomplete, isn't it?
> > >>
> > >> What kind of backward compatibility should be expected? It could be:
> > >> * short symbols can still be found by old versions of kmod, but not
> > >>   long symbols;  
> > >
> > >That sounds good. Not everyone is using rust, and with this option
> > >people who do will need to upgrade tooling, and people who don't care
> > >don't need to do anything.  
> > 
> > that could be it indeed. My main worry here is:
> > 
> > "After the support is added in kmod, kmod needs to be able to output the
> > correct information regardless if the module is from before/after the
> > change in the kernel and also without relying on kernel version."
> > Just changing the struct modversion_info doesn't make that possible.
> > 
> > Maybe adding the long symbols in another section?
> 
> Yeah, that's what I imagined how it could be implemented when I said
> "short symbols can still be found by old versions of kmod, but not long
> symbols".
> 
> > Or ble just increase to 512 and add the size to a
> > "__versions_hdr" section. If we then output a max size per module,
> > this would offset a little bit the additional size gained for the
> > modules using rust.
> 
> That format isn't really elegant IMO. And symbol length can vary a lot,
> having all symbols dictated by the longest symbol doesn't sound a good
> approach.
> 
> > And the additional 0's should compress well
> > so I'm not sure the additional size is that much relevant here.
> 
> I am not sure why compression is mentioned here. I don't think section
> in .ko files are compressed.

There is the option to compress the whole .ko files, and it's commonly
used.

Thanks

Michal


Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Michal Suchánek
On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 19.01.23 um 11:24 schrieb Christophe Leroy:
> > 
> > 
> > Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > breaks build because of wrong argument to snprintf. That certainly
> > > avoids the runtime error but is not the intended outcome.
> > > 
> > > Also use standard device name format of-display.N for all created
> > > devices.
> > > 
> > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > > Signed-off-by: Michal Suchanek 
> > > ---
> > > v2: Update the device name format
> > > ---
> > >drivers/of/platform.c | 12 
> > >1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > index f2a5d679a324..8c1b1de22036 100644
> > > --- a/drivers/of/platform.c
> > > +++ b/drivers/of/platform.c
> > > @@ -525,7 +525,9 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   if (IS_ENABLED(CONFIG_PPC)) {
> > >   struct device_node *boot_display = NULL;
> > >   struct platform_device *dev;
> > > - int display_number = 1;
> > > + int display_number = 0;
> > > + char buf[14];
> > 
> > Can you declare that in the for block where it is used instead ?
> > 
> > > + char *of_display_format = "of-display.%d";
> > 
> > Should be const ?
> 
> That should be static const of_display_format[] = then

Why? It sounds completely fine to have a const pointer to a string
constant.

Thanks

Michal

> 
> > 
> > >   int ret;
> > >   /* Check if we have a MacOS display without a node spec 
> > > */
> > > @@ -556,7 +558,10 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   if (!of_get_property(node, "linux,opened", 
> > > NULL) ||
> > >   !of_get_property(node, 
> > > "linux,boot-display", NULL))
> > >   continue;
> > > - dev = of_platform_device_create(node, "of-display", 
> > > NULL);
> > > + ret = snprintf(buf, sizeof(buf), of_display_format, 
> > > display_number++);
> > > + if (ret >= sizeof(buf))
> > > + continue;
> > 
> > 
> > Can you make buf big enough to avoid that ?
> > 
> > And by the way could it be called something else than 'buf' ?
> > 
> > See example here:
> > https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690
> > 
> > 
> > > + dev = of_platform_device_create(node, buf, NULL);
> > >   if (WARN_ON(!dev))
> > >   return -ENOMEM;
> > >   boot_display = node;
> > > @@ -564,10 +569,9 @@ static int __init 
> > > of_platform_default_populate_init(void)
> > >   }
> > >   for_each_node_by_type(node, "display") {
> > > - char *buf[14];
> > >   if (!of_get_property(node, "linux,opened", 
> > > NULL) || node == boot_display)
> > >   continue;
> > > - ret = snprintf(buf, "of-display-%d", display_number++);
> > > + ret = snprintf(buf, sizeof(buf), of_display_format, 
> > > display_number++);
> > >   if (ret >= sizeof(buf))
> > >   continue;
> > >   of_platform_device_create(node, buf, NULL);
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev





Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code

2023-01-19 Thread Michal Suchánek
Hello,

On Thu, Jan 19, 2023 at 10:24:07AM +0000, Christophe Leroy wrote:
> 
> 
> Le 19/01/2023 à 10:53, Michal Suchanek a écrit :
> > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > breaks build because of wrong argument to snprintf. That certainly
> > avoids the runtime error but is not the intended outcome.
> > 
> > Also use standard device name format of-display.N for all created
> > devices.
> > 
> > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique")
> > Signed-off-by: Michal Suchanek 
> > ---
> > v2: Update the device name format
> > ---
> >   drivers/of/platform.c | 12 
> >   1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index f2a5d679a324..8c1b1de22036 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -525,7 +525,9 @@ static int __init 
> > of_platform_default_populate_init(void)
> > if (IS_ENABLED(CONFIG_PPC)) {
> > struct device_node *boot_display = NULL;
> > struct platform_device *dev;
> > -   int display_number = 1;
> > +   int display_number = 0;
> > +   char buf[14];
> 
> Can you declare that in the for block where it is used instead ?

No, there are two for blocks.

> 
> > +   char *of_display_format = "of-display.%d";
> 
> Should be const ?

Yes, could be.

> 
> > int ret;
> >   
> > /* Check if we have a MacOS display without a node spec */
> > @@ -556,7 +558,10 @@ static int __init 
> > of_platform_default_populate_init(void)
> > if (!of_get_property(node, "linux,opened", NULL) ||
> > !of_get_property(node, "linux,boot-display", NULL))
> > continue;
> > -   dev = of_platform_device_create(node, "of-display", 
> > NULL);
> > +   ret = snprintf(buf, sizeof(buf), of_display_format, 
> > display_number++);
> > +   if (ret >= sizeof(buf))
> > +   continue;
> 
> 
> Can you make buf big enough to avoid that ?

It would be a bit fragile that way.

The buffer would have to theoretically accommodate
"of-display.-9223372036854775808", and any change to the format requires
recalculating the length, by hand.

Of course, the memory would run out way before allocating that many
devices so it's kind of pointless to try and accommodate all possible
device numbers.
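
The truncation check can be sketched in userspace C as follows.
make_display_name() is a hypothetical helper written for this
illustration, not a kernel function. It relies on the documented
behaviour of snprintf(): the return value is the length the full string
would have had, so a result greater than or equal to the buffer size
means the name was cut short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "of-display.N" into buf. Returns 0 on success and -1 if the
 * name did not fit. Hypothetical helper, for illustration only.
 */
static int make_display_name(char *buf, size_t size, int number)
{
	int ret;

	assert(buf != NULL && size > 0);
	ret = snprintf(buf, size, "of-display.%d", number);
	if (ret < 0 || (size_t)ret >= size)
		return -1;	/* truncated: do not use buf as a name */
	return 0;
}
```

With the 14-byte buffer from the patch this accepts device numbers 0 to
99. Assuming a 32-bit int, the longest possible output of this format
is "of-display.-2147483648", which needs 23 bytes including the
terminating NUL; checking the return value avoids having to keep such a
hand-computed size in sync with the format string.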

> 
> And by the way could it be called something else than 'buf' ?
> 
> See example here:
> https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690

Yes, that is quite possible. Nonetheless, generic variable names such
as 'ret' also work.

Thanks

Michal


Re: [PATCH] of: Make of framebuffer devices unique

2023-01-19 Thread Michal Suchánek
On Thu, Jan 19, 2023 at 09:00:44AM +0100, Thomas Zimmermann wrote:
> Hi Michal,
> 
> thanks for fixing this issue. But the review time was way too short. Please
> see my comments below.
> 
> Am 18.01.23 um 22:46 schrieb Michal Suchánek:
> > On Wed, Jan 18, 2023 at 09:13:05PM +0100, Erhard F. wrote:
> > > On Tue, 17 Jan 2023 17:58:04 +0100
> > > Michal Suchanek  wrote:
> > > 
> > > > Since Linux 5.19 this error is observed:
> > > > 
> > > > sysfs: cannot create duplicate filename '/devices/platform/of-display'
> > > > 
> > > > This is because multiple devices with the same name 'of-display' are
> > > > created on the same bus.
> > > > 
> > > > Update the code to create numbered device names for the non-boot
> > > > display.
> > > > 
> > > > cc: linuxppc-dev@lists.ozlabs.org
> > > > References: https://bugzilla.kernel.org/show_bug.cgi?id=216095
> > > > Fixes: 52b1b46c39ae ("of: Create platform devices for OF framebuffers")
> > > > Reported-by: Erhard F. 
> > > > Suggested-by: Thomas Zimmermann 
> > > > Signed-off-by: Michal Suchanek 
> > > > ---
> > > >   drivers/of/platform.c | 8 +++-
> > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > > > index 81c8c227ab6b..f2a5d679a324 100644
> > > > --- a/drivers/of/platform.c
> > > > +++ b/drivers/of/platform.c
> > > > @@ -525,6 +525,7 @@ static int __init 
> > > > of_platform_default_populate_init(void)
> > > > if (IS_ENABLED(CONFIG_PPC)) {
> > > > struct device_node *boot_display = NULL;
> > > > struct platform_device *dev;
> > > > +   int display_number = 1;
> > > > int ret;
> > > > /* Check if we have a MacOS display without a node spec 
> > > > */
> > > > @@ -561,10 +562,15 @@ static int __init 
> > > > of_platform_default_populate_init(void)
> > > > boot_display = node;
> > > > break;
> > > > }
> > > > +
> > > > for_each_node_by_type(node, "display") {
> > > > +   char *buf[14];
> > > > if (!of_get_property(node, "linux,opened", 
> > > > NULL) || node == boot_display)
> > > > continue;
> > > > -   of_platform_device_create(node, "of-display", 
> > > > NULL);
> > > > +   ret = snprintf(buf, "of-display-%d", 
> > > > display_number++);
> 
> Platform devices use a single dot (.) as separator before the index.
> Counting starts at zero. See /sys/bus/platform/ for examples. Can we please
> stick with that scheme? Generated names would then be of-display.0,
> of-display.1, etc.

Yes, there was surprisingly no bikeshedding.

Do we also want to change the name of the device that did manage to
instantiate before?

This scheme changes the name only for the devices that failed to
instantiate in the past, hence "of-display" and "of-display-%d",
starting from 1.

Sure, replacing '-' with '.' is easy enough, and using the same format
for both as well.

Thanks

Michal

> 
> Best regards
> Thomas
> 
> 
> 
> > > > +   if (ret >= sizeof(buf))
> > > > +   continue;
> > > > +   of_platform_device_create(node, buf, NULL);
> > > > }
> > > > } else {
> > > > -- 
> > > > 2.35.3
> > > > 
> > > 
> > > Thank you for the patch Michal!
> > > 
> > > It applies on 6.2-rc4 but I get this build error with my config:
> > 
> > Indeed, it's doubly bad.
> > 
> > Where is the kernel test robot when you need it?
> > 
> > It should not be that easy to miss this file but clearly it can happen.
> > 
> > I will send a fixup.
> > 
> > Sorry about the mess.
> > 
> > Thanks
> > 
> > Michal
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev





Re: [PATCH] of: Make of framebuffer devices unique

2023-01-18 Thread Michal Suchánek
On Wed, Jan 18, 2023 at 09:13:05PM +0100, Erhard F. wrote:
> On Tue, 17 Jan 2023 17:58:04 +0100
> Michal Suchanek  wrote:
> 
> > Since Linux 5.19 this error is observed:
> > 
> > sysfs: cannot create duplicate filename '/devices/platform/of-display'
> > 
> > This is because multiple devices with the same name 'of-display' are
> > created on the same bus.
> > 
> > Update the code to create numbered device names for the non-boot
> > display.
> > 
> > cc: linuxppc-dev@lists.ozlabs.org
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=216095
> > Fixes: 52b1b46c39ae ("of: Create platform devices for OF framebuffers")
> > Reported-by: Erhard F. 
> > Suggested-by: Thomas Zimmermann 
> > Signed-off-by: Michal Suchanek 
> > ---
> >  drivers/of/platform.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 81c8c227ab6b..f2a5d679a324 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -525,6 +525,7 @@ static int __init 
> > of_platform_default_populate_init(void)
> > if (IS_ENABLED(CONFIG_PPC)) {
> > struct device_node *boot_display = NULL;
> > struct platform_device *dev;
> > +   int display_number = 1;
> > int ret;
> >  
> > /* Check if we have a MacOS display without a node spec */
> > @@ -561,10 +562,15 @@ static int __init 
> > of_platform_default_populate_init(void)
> > boot_display = node;
> > break;
> > }
> > +
> > for_each_node_by_type(node, "display") {
> > +   char *buf[14];
> > if (!of_get_property(node, "linux,opened", NULL) || 
> > node == boot_display)
> > continue;
> > -   of_platform_device_create(node, "of-display", NULL);
> > +   ret = snprintf(buf, "of-display-%d", display_number++);
> > +   if (ret >= sizeof(buf))
> > +   continue;
> > +   of_platform_device_create(node, buf, NULL);
> > }
> >  
> > } else {
> > -- 
> > 2.35.3
> > 
> 
> Thank you for the patch Michal!
> 
> It applies on 6.2-rc4 but I get this build error with my config:

Indeed, it's doubly bad.

Where is the kernel test robot when you need it?

It should not be that easy to miss this file but clearly it can happen.

I will send a fixup.

Sorry about the mess.

Thanks

Michal


Re: [PATCH] modpost: support arbitrary symbol length in modversion

2023-01-17 Thread Michal Suchánek
Hello,

On Fri, Jan 13, 2023 at 06:18:41PM +0000, Gary Guo wrote:
> On Thu, 12 Jan 2023 14:40:59 -0700
> Lucas De Marchi  wrote:
> 
> > On Wed, Jan 11, 2023 at 04:11:51PM +0000, Gary Guo wrote:
> > >
> > > struct modversion_info {
> > >-  unsigned long crc;
> > >-  char name[MODULE_NAME_LEN];
> > >+  /* Offset of the next modversion entry in relation to this one. */
> > >+  u32 next;
> > >+  u32 crc;
> > >+  char name[0];  
> > 
> > although not really exported as uapi, this will break userspace as this is
> > used in the  elf file generated for the modules. I think
> > this change must be made in a backward compatible way and kmod updated
> > to deal with the variable name length:
> > 
> > kmod $ git grep "\[64"
> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint32_t)];
> > libkmod/libkmod-elf.c:  char name[64 - sizeof(uint64_t)];
> > 
> > in kmod we have both 32 and 64 because a 64-bit kmod can read both 32
> > and 64 bit module, and vice versa.
> > 
> 
> Hi Lucas,
> 
> Thanks for the information.
> 
> The change can't be "truly" backward compatible, in a sense that
> regardless of the new format we choose, kmod would not be able to decode
> symbols longer than "64 - sizeof(long)" bytes. So the list it retrieves
> is going to be incomplete, isn't it?
> 
> What kind of backward compatibility should be expected? It could be:
> * short symbols can still be found by old versions of kmod, but not
>   long symbols;

That sounds good. Not everyone is using rust, and with this option
people who do will need to upgrade tooling, and people who don't care
don't need to do anything.

Thanks

Michal
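
To illustrate what updated tooling would have to do with the proposed layout,
here is a rough userspace sketch of a lookup over variable-length entries. The
struct fields are taken from the quoted patch; everything else is an
assumption made for the example: that next is a byte offset from the start of
the current entry, that entries are packed back to back until the end of the
section, and the function name find_crc() itself.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Layout from the quoted patch; u32 spelled as uint32_t for userspace. */
struct modversion_info {
	uint32_t next;	/* offset of the next entry relative to this one */
	uint32_t crc;
	char name[];	/* NUL-terminated, arbitrary length */
};

#define MODVERSION_HDR offsetof(struct modversion_info, name)

/*
 * Look up the CRC for `symbol` in a section of `size` bytes.
 * ASSUMPTION: `next` is a byte offset from the start of the current entry
 * and entries are packed until `size`; the thread does not confirm this.
 * Returns 1 and stores the CRC on success, 0 if not found or malformed.
 */
static int find_crc(const unsigned char *sec, size_t size,
		    const char *symbol, uint32_t *crc)
{
	size_t off = 0;

	while (size - off > MODVERSION_HDR) {
		uint32_t next, entry_crc;
		const char *name;

		/* memcpy avoids unaligned loads from the raw section. */
		memcpy(&next, sec + off, sizeof(next));
		memcpy(&entry_crc, sec + off + sizeof(next), sizeof(entry_crc));

		/* Reject entries that are empty or run past the section. */
		if (next <= MODVERSION_HDR || next > size - off)
			return 0;

		name = (const char *)sec + off + MODVERSION_HDR;
		/* Only compare names that are terminated inside the entry. */
		if (memchr(name, '\0', next - MODVERSION_HDR) &&
		    strcmp(name, symbol) == 0) {
			*crc = entry_crc;
			return 1;
		}
		off += next;
	}
	return 0;
}
```

An old reader that hard-codes 64-byte records cannot walk such a list, which
is why the compatibility option discussed above keeps short symbols findable
by old tools and requires new tooling only for long ones.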


Re: [PATCH] powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1

2022-12-07 Thread Michal Suchánek
Hello,

On Wed, Dec 07, 2022 at 10:18:13AM -0500, Mathieu Desnoyers wrote:
> On 2022-12-06 21:09, Michael Ellerman wrote:
> > Mathieu Desnoyers  writes:
> > > On 2022-12-05 17:50, Michael Ellerman wrote:

> > 
> > Relatedly we have a patch in next to optionally use ABIv2 for 64-bit big
> > endian builds.
> 
> Interesting. Does it require a matching user-space ? (built with PPC64 ABIv2

No, the kernel and userspace ABI is separate.

> ?) Does it handle legacy PPC32 executables ?

Theoretically it should. No idea if anybody has tested it.

Thanks

Michal


Re: [PATCH -next] fbdev: offb: allow build when DRM_OFDRM=m

2022-11-23 Thread Michal Suchánek
On Wed, Nov 23, 2022 at 09:02:54AM +0100, Thomas Zimmermann wrote:
> 
> Am 23.11.22 um 04:16 schrieb Randy Dunlap:
> > Fix build when CONFIG_FB_OF=y and CONFIG_DRM_OFDRM=m.
> > When the latter symbol is =m, kconfig downgrades (limits) the 'select's
> > under FB_OF to modular (=m). This causes undefined symbol references:
> > 
> > powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x58): 
> > undefined reference to `cfb_fillrect'
> > powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x60): 
> > undefined reference to `cfb_copyarea'
> > powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x68): 
> > undefined reference to `cfb_imageblit'
> > 
> > Fix this by allowing FB_OF any time that DRM_OFDRM != y so that the
> > selected FB_CFB_* symbols will become =y instead of =m.
> > 
> > In tristate logic (for DRM_OFDRM), this changes the dependency from
> >  !DRM_OFDRM == 2 - 1 == 1 => modular only (or disabled)
> > to (boolean)
> >  DRM_OFDRM != y == y, allowing the 'select's to cause the
> > FB_CFB_* symbols to =y instead of =m.
> > 
> > Fixes: c8a17756c425 ("drm/ofdrm: Add ofdrm for Open Firmware framebuffers")
> > Signed-off-by: Randy Dunlap 
> > Suggested-by: Masahiro Yamada 
> > Cc: Thomas Zimmermann 
> > Cc: Michal Suchánek 
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: Daniel Vetter 
> > Cc: Helge Deller 
> > Cc: linux-fb...@vger.kernel.org
> > Cc: dri-de...@lists.freedesktop.org
> 
> Acked-by: Thomas Zimmermann 

Tested-by: Michal Suchánek 

> 
> > ---
> >   drivers/video/fbdev/Kconfig |2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff -- a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
> > --- a/drivers/video/fbdev/Kconfig
> > +++ b/drivers/video/fbdev/Kconfig
> > @@ -455,7 +455,7 @@ config FB_ATARI
> >   config FB_OF
> > bool "Open Firmware frame buffer device support"
> > depends on (FB = y) && PPC && (!PPC_PSERIES || PCI)
> > -   depends on !DRM_OFDRM
> > +   depends on DRM_OFDRM != y
> > select APERTURE_HELPERS
> > select FB_CFB_FILLRECT
> > select FB_CFB_COPYAREA
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev
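
The tristate arithmetic in the commit message can be checked with a small
model. This is only a sketch of the Kconfig expression rules (n=0, m=1, y=2,
negation as 2 - x, "&&" as minimum, comparisons yielding y or n); the function
names are made up for the example, and other_deps stands in for the remaining
FB_OF dependencies.

```c
/* Kconfig tristate values: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* !x is computed as 2 - x, so !m is still m. */
static enum tristate tri_not(enum tristate x)
{
	return (enum tristate)(2 - x);
}

/* a && b is the minimum of the two values. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* A comparison such as "x != y" is boolean: it yields y or n, never m. */
static enum tristate tri_ne(enum tristate a, enum tristate b)
{
	return a != b ? TRI_Y : TRI_N;
}

/* Value of the FB_OF dependencies with "depends on !DRM_OFDRM". */
static enum tristate fb_of_deps_old(enum tristate other_deps,
				    enum tristate drm_ofdrm)
{
	return tri_and(other_deps, tri_not(drm_ofdrm));
}

/* Value of the FB_OF dependencies with "depends on DRM_OFDRM != y". */
static enum tristate fb_of_deps_new(enum tristate other_deps,
				    enum tristate drm_ofdrm)
{
	return tri_and(other_deps, tri_ne(drm_ofdrm, TRI_Y));
}
```

With DRM_OFDRM=m the old expression evaluates to m, which is what limits the
selected FB_CFB_* symbols to =m, while the new expression evaluates to y. With
DRM_OFDRM=y the new expression is n, so FB_OF stays unavailable in that case.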





Re: build failure in linux-next: offb missing fb helpers

2022-11-19 Thread Michal Suchánek
Hello,

On Sat, Nov 19, 2022 at 10:27:04PM +0900, Masahiro Yamada wrote:
> On Sat, Nov 19, 2022 at 3:20 PM Randy Dunlap  wrote:
> >
> > Hi--
> >
> > [adding Masahiro-san]
> >
> >
> > On 11/18/22 07:03, Michal Suchánek wrote:
> > > Hello,
> > >
> > > I am seeing these errors:
> > >
> > > [ 3825s]   AR  built-in.a
> > > [ 3827s]   AR  vmlinux.a
> > > [ 3835s]   LD  vmlinux.o
> > > [ 3835s]   OBJCOPY modules.builtin.modinfo
> > > [ 3835s]   GEN modules.builtin
> > > [ 3835s]   GEN .vmlinux.objs
> > > [ 3848s]   MODPOST Module.symvers
> > > [ 3848s]   CC  .vmlinux.export.o
> > > [ 3849s]   UPD include/generated/utsversion.h
> > > [ 3849s]   CC  init/version-timestamp.o
> > > [ 3849s]   LD  .tmp_vmlinux.btf
> > > [ 3864s] ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x58): undefined
> > > reference to `cfb_fillrect'
> > > [ 3864s] ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x60): undefined
> > > reference to `cfb_copyarea'
> > > [ 3864s] ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x68): undefined
> > > reference to `cfb_imageblit'
> > >
> > > cfb_fillrect is provided by drivers/video/fbdev/core/cfbfillrect.c
> > >
> > > It is compiled when CONFIG_FB_CFB_FILLRECT
> > > drivers/video/fbdev/core/Makefile:obj-$(CONFIG_FB_CFB_FILLRECT)  += 
> > > cfbfillrect.o
> > >
> > > drivers/video/fbdev/Makefile:obj-$(CONFIG_FB_OF)   += offb.o
> > > is compiled when CONFIG_FB_OF
> > >
> > > It selects CONFIG_FB_CFB_FILLRECT
> > > config FB_OF
> > > bool "Open Firmware frame buffer device support"
> > > depends on (FB = y) && PPC && (!PPC_PSERIES || PCI)
> > > select APERTURE_HELPERS
> > > select FB_CFB_FILLRECT
> > > select FB_CFB_COPYAREA
> > > select FB_CFB_IMAGEBLIT
> > > select FB_MACMODES
> > >
> > > The config has FB_OF built-in and FB_CFB_FILLRECT modular
> > > config/ppc64le/vanilla:CONFIG_FB_CFB_FILLRECT=m
> > > config/ppc64le/vanilla:CONFIG_FB_CFB_COPYAREA=m
> > > config/ppc64le/vanilla:CONFIG_FB_CFB_IMAGEBLIT=m
> > > config/ppc64le/vanilla:CONFIG_FB_OF=y
> > >
> > > > It only depends on FB which must be built-in for FB_OF
> > > config FB_CFB_FILLRECT
> > > tristate
> > > depends on FB
> > >
> > > Is select in kconfig broken?
> > >
> > > > Attaching the config in question.
> >
> > The symbol info from xconfig says:
> >
> > Symbol: FB_CFB_FILLRECT [=m]
> > Type : tristate
> > Defined at drivers/video/fbdev/Kconfig:69
> > Depends on: HAS_IOMEM [=y] && FB [=y]
> > Selected by [m]:
> > [deleted]
> > - FB_OF [=y] && HAS_IOMEM [=y] && FB [=y]=y && PPC [=y] && (!PPC_PSERIES 
> > [=y] || PCI [=y]) && !DRM_OFDRM [=m]
> >
> > I don't see why the 'select' from (bool) FB_OF would leave FB_CFB_FILLRECT 
> > (and the others)
> > as =m instead of =y.
> >
> > Hopefully Masahiro can shed some light on this.
> >
> > --
> > ~Randy
> 
> 
> The reason is shown in your paste of help message:
> 
> "&& !DRM_OFDRM [=m]" downgrades it to "selected by m"
> 
> To aid this particular case, the following will select
> FB_CFB_FILLRECT=y.
> 
> 
> 
> 
> diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
> index 66f36b69e8f3..2bcf8627819f 100644
> --- a/drivers/video/fbdev/Kconfig
> +++ b/drivers/video/fbdev/Kconfig
> @@ -458,7 +458,7 @@ config FB_ATARI
>  config FB_OF
> bool "Open Firmware frame buffer device support"
> depends on (FB = y) && PPC && (!PPC_PSERIES || PCI)
> -   depends on !DRM_OFDRM
> +   depends on DRM_OFDRM != y
> select APERTURE_HELPERS
> select FB_CFB_FILLRECT
> select FB_CFB_COPYAREA

Thanks for clarification.

This change fixes the config for me.

Michal


> Or, perhaps "depends on DRM_OFDRM = n"
> I do not know the intention of this dependency.
> 
> Recommendation is to use "depends on" instead of "select" though.
> 
> 
> 
> BTW, this is similar to what you asked before.
> 
> https://lore.kernel.org/linux-kbuild/e1a6228d-1341-6264-d97a-e2bd52a65...@infradead.org/
> 
> 
> I tried to fix it in the past, but the issue was not as shallow as I
> had expected.
> I did not get around to revisiting this topic.
> 
> https://patchwork.kernel.org/project/linux-kbuild/patch/1543216969-2227-1-git-send-email-yamada.masah...@socionext.com/
> 


Re: [PATCH v5 1/5] drm/ofdrm: Add ofdrm for Open Firmware framebuffers

2022-11-19 Thread Michal Suchánek
Hello,

On Tue, Oct 11, 2022 at 05:07:08PM +0200, Thomas Zimmermann wrote:
> Open Firmware provides basic display output via the 'display' node.
> DT platform code already provides a device that represents the node's
> framebuffer. Add a DRM driver for the device. The display mode and
> color format is pre-initialized by the system's firmware. Runtime
> modesetting via DRM is not possible. The display is useful during
> early boot stages or as error fallback.
> 
> Similar functionality is already provided by fbdev's offb driver,
> which is insufficient for modern userspace. The old driver includes
> support for BootX device tree, which can be found on old 32-bit
> PowerPC Macintosh systems. If these are still in use, the
> functionality can be added to ofdrm or implemented in a new
> driver. As with simpledrm, the fbdev driver cannot be selected if
> ofdrm is already enabled.
> 
> Two notable points about the driver:
> 
>  * Reading the framebuffer aperture from the device tree is not
> reliable on all systems. Ofdrm takes the heuristics and a comment
> from offb to pick the correct range.
> 
>  * No resource management may be tied to the underlying PCI device.
> Otherwise the handover to the native driver will fail with a resource
> conflict. PCI management is therefore done as part of the platform
> device's cleanup.
> 
> The driver has been tested on qemu's ppc64le emulation. The device
> hand-over has been tested with bochs.
> 
> v5:
>   * use drm_atomic_helper_check_crtc_primary_plane()
> v4:
>   * set preferred depth to the correct value
>   * set bpp value for console emulation
>   * output scanout-buffer parameters with drm_dbg()
> v3:
>   * reintegrate FWFB helpers into ofdrm
>   * use damage iterator
>   * sync GEM BOs with drm_gem_fb_{begin,end}_cpu_access()
>   * fix various atomic_check helpers
>   * remove CRTC atomic_{enable,disable} (Javier)
>   * compute stride with drm_format_info_min_pitch() (Daniel)
> v2:
>   * removed simple-pipe helpers
>   * built driver on top of FWFB helpers
>   * merged all init code into single function
>   * make PCI support optional (Michal)
>   * support COMPILE_TEST (Javier)
> 
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Javier Martinez Canillas 
> 
> convert
> ---
>  MAINTAINERS   |   1 +
>  drivers/gpu/drm/tiny/Kconfig  |  13 +
>  drivers/gpu/drm/tiny/Makefile |   1 +
>  drivers/gpu/drm/tiny/ofdrm.c  | 763 ++
>  drivers/video/fbdev/Kconfig   |   1 +
>  5 files changed, 779 insertions(+)
>  create mode 100644 drivers/gpu/drm/tiny/ofdrm.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f07a8bf8744f..7c3bb04bd08e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6656,6 +6656,7 @@ L:  dri-de...@lists.freedesktop.org
>  S:   Maintained
>  T:   git git://anongit.freedesktop.org/drm/drm-misc
>  F:   drivers/gpu/drm/drm_aperture.c
> +F:   drivers/gpu/drm/tiny/ofdrm.c
>  F:   drivers/gpu/drm/tiny/simpledrm.c
>  F:   drivers/video/aperture.c
>  F:   include/drm/drm_aperture.h
> diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
> index 565957264875..a300b03a3c7a 100644
> --- a/drivers/gpu/drm/tiny/Kconfig
> +++ b/drivers/gpu/drm/tiny/Kconfig
> @@ -51,6 +51,19 @@ config DRM_GM12U320
>This is a KMS driver for projectors which use the GM12U320 chipset
>for video transfer over USB2/3, such as the Acer C120 mini projector.
>  
> +config DRM_OFDRM
> + tristate "Open Firmware display driver"
> + depends on DRM && OF && (PPC || COMPILE_TEST)
> + select APERTURE_HELPERS
> + select DRM_GEM_SHMEM_HELPER
> + select DRM_KMS_HELPER
> + help
> +   DRM driver for Open Firmware framebuffers.
> +
> +   This driver assumes that the display hardware has been initialized
> +   by the Open Firmware before the kernel boots. Scanout buffer, size,
> +   and display format must be provided via device tree.
> +
>  config DRM_PANEL_MIPI_DBI
>   tristate "DRM support for MIPI DBI compatible panels"
>   depends on DRM && SPI
> diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile
> index 1d9d6227e7ab..76dde89a044b 100644
> --- a/drivers/gpu/drm/tiny/Makefile
> +++ b/drivers/gpu/drm/tiny/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_DRM_ARCPGU)  += arcpgu.o
>  obj-$(CONFIG_DRM_BOCHS)  += bochs.o
>  obj-$(CONFIG_DRM_CIRRUS_QEMU)+= cirrus.o
>  obj-$(CONFIG_DRM_GM12U320)   += gm12u320.o
> +obj-$(CONFIG_DRM_OFDRM)  += ofdrm.o
>  obj-$(CONFIG_DRM_PANEL_MIPI_DBI) += panel-mipi-dbi.o
>  obj-$(CONFIG_DRM_SIMPLEDRM)  += simpledrm.o
>  obj-$(CONFIG_TINYDRM_HX8357D)+= hx8357d.o
> diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
> new file mode 100644
> index 000000000000..96a46078ade8
> --- /dev/null
> +++ b/drivers/gpu/drm/tiny/ofdrm.c
> @@ -0,0 +1,763 @@
> 

Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-10-12 Thread Michal Suchánek
On Wed, Oct 12, 2022 at 05:59:45PM +0300, Ville Syrjälä wrote:
> On Wed, Oct 12, 2022 at 04:31:14PM +0200, Thomas Zimmermann wrote:
> > Hi
> > 
> > Am 12.10.22 um 15:12 schrieb Arnd Bergmann:
> > > On Wed, Oct 12, 2022, at 2:00 PM, Thomas Zimmermann wrote:
> > >>
> > >> Could well be. But ofdrm intends to replace offb and this test has
> > >> worked well in offb for almost 15 yrs. If there are bug reports, I'm
> > >> happy to take patches, but until then I see no reason to change it.
> > > 
> > > I wouldn't change the code in offb unless a user reports a bug,
> > > but I don't see a point in adding the same mistake to ofdrm if we
> > > know it can't work on real hardware.
> > 
> > As I said, this has worked with offb and apparently on real hardware. 
> > For all I know, ATI hardware (before it became AMD) was used in PPC 
> > Macintoshs and assumed big-endian access on those machines.
> 
> At least mach64 class hardware has two frame buffer apertures, and
> byte swapping can be configured separately for each. But that means
> you only get correct byte swapping for at most two bpps at the same
> time (and that only if you know which aperture to access each time).
> IIRC Rage 128 already has the surface register stuff where you
> could byte swap a limited set of ranges independently. And old
> mga hardware has just one byte swap setting for the whole frame
> buffer aperture, so only one bpp at a time.
> 
> That kind of horrible limitations of the byte swappers is the
> main reason why I wanted to make drm fourcc endianness explicit.
> Simply assuming host endianness would end in tears on big endian
> as soon as you need to access stuff with two bpps at the same time.
> Much better to just switch off those useless byte swappers and
> swap by hand when necessary.

If you have a hardware-specific driver, sure.

This is a firmware-provided framebuffer, though. You get one framebuffer
address, and one endian - whatever the firmware set up and described in
the DT.

Thanks

Michal


Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-10-12 Thread Michal Suchánek
Hello,

On Wed, Oct 12, 2022 at 03:12:35PM +0200, Arnd Bergmann wrote:
> On Wed, Oct 12, 2022, at 2:00 PM, Thomas Zimmermann wrote:
> >
> > Could well be. But ofdrm intends to replace offb and this test has 
> > worked well in offb for almost 15 yrs. If there are bug reports, I'm 
> > happy to take patches, but until then I see no reason to change it.
> 
> I wouldn't change the code in offb unless a user reports a bug,
> but I don't see a point in adding the same mistake to ofdrm if we
> know it can't work on real hardware.
> 
> I tried to find out where this is configured in qemu, but it seems
> to depend on the framebuffer backend there: most are always little-endian,
> ati/bochs/vga-pci/virtio-vga are configurable from the guest through
> some register setting, but vga.c picks a default from the
> 'TARGET_WORDS_BIGENDIAN' macro, which I think is set differently
> between qemu-system-ppc64le and qemu-system-ppc64.
> 
> If you are using the framebuffer code from vga.c, I would guess that
> you can run a big-endian kernel with qemu-system-ppc64,
> or a little-endian kernel with qemu-system-ppc64le and get the
> correct colors, while running a little-endian kernel with
> qemu-system-ppc64 and vga.c, or using a different framebuffer
> emulation on a big-endian kernel would give you the wrong colors.

Thanks for digging this up.

That makes one thing clear: qemu does not emulate this framebuffer
property correctly, and cannot be relied on for verification.

If you can provide test results from real hardware that show the current
logic as flawed it should be changed.

In absence of such test results I think the most reasonable thing is to
keep the logic that nobody complained about for 10+ years.

Thanks

Michal


Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-10-12 Thread Michal Suchánek
On Wed, Oct 12, 2022 at 10:38:29AM +0200, Arnd Bergmann wrote:
> On Wed, Oct 12, 2022, at 10:27 AM, Thomas Zimmermann wrote:
> > Am 12.10.22 um 09:44 schrieb Arnd Bergmann:
> >> On Wed, Oct 12, 2022, at 9:40 AM, Thomas Zimmermann wrote:
> >>> Am 12.10.22 um 09:17 schrieb Arnd Bergmann:
> >>>> On Wed, Oct 12, 2022, at 8:46 AM, Thomas Zimmermann wrote:
> >>>
> >>>> Does qemu mark the device as having a particular endianness then, or
> >>>> does it switch the layout of the framebuffer to match what the CPU
> >>>> does?
> >>>
> >>> The latter. On neither architecture does qemu expose this flag. The
> >>> default endianness corresponds to the host.
> >> 
> >> "host" as in the machine that qemu runs on, or the machine that is
> >> being emulated? I suppose it would be broken either way, but in the
> >> latter case, we could get away with detecting that the machine is
> >> running under qemu.
> >
> > Sorry, my mistake. I meant "guest": the endianness of the framebuffer 
> > corresponds to the endianness of the emulated machine.  Given that many 
> > graphics cards support LE and BE modes, I assume that this behavior 
> > mimics real-hardware systems.
> 
> Not really: While the hardware may be able to switch between
> the modes, something has to actively set some hardware registers up
> that way, but the offb/ofdrm driver has no interface for interacting
> with that register, and the bootloader or firmware code that knows
> about the register has no information about what kernel it will
> eventually run. This is a bit architecture dependent, as e.g. on
> MIPS, a bi-endian hardware platform has to run a bootloader with the
> same endianness as the kernel, but on arm and powerpc the bootloader
> is usually fixed and the kernel switches to its configured endianness
> in the first few instructions after it gets entered.

But then the firmware knows that the kernel can switch endian and the
endian information should be provided. And maybe that should be emulated
better by qemu. Unfortunately, modern Power servers rarely come with a
graphics card so it's hard to test on real hardware.

Thanks

Michal


Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-10-12 Thread Michal Suchánek
On Wed, Oct 12, 2022 at 08:29:39AM +0200, Arnd Bergmann wrote:
> On Tue, Oct 11, 2022, at 11:38 PM, Michal Suchánek wrote:
> > On Tue, Oct 11, 2022 at 10:06:59PM +0200, Arnd Bergmann wrote:
> >> On Tue, Oct 11, 2022, at 1:30 PM, Thomas Zimmermann wrote:
> >> > Am 11.10.22 um 09:46 schrieb Javier Martinez Canillas:
> >> >>> +static bool display_get_big_endian_of(struct drm_device *dev, struct 
> >> >>> device_node *of_node)
> >> >>> +{
> >> >>> +  bool big_endian;
> >> >>> +
> >> >>> +#ifdef __BIG_ENDIAN
> >> >>> +  big_endian = true;
> >> >>> +  if (of_get_property(of_node, "little-endian", NULL))
> >> >>> +  big_endian = false;
> >> >>> +#else
> >> >>> +  big_endian = false;
> >> >>> +  if (of_get_property(of_node, "big-endian", NULL))
> >> >>> +  big_endian = true;
> >> >>> +#endif
> >> >>> +
> >> >>> +  return big_endian;
> >> >>> +}
> >> >>> +
> >> >> 
> >> >> Ah, I see. The heuristic then is whether the build is BE or LE or if 
> >> >> the Device
> >> >> Tree has an explicit node defining the endianness. The patch looks good 
> >> >> to me:
> >> >
> >> > Yes. I took this test from offb.
> >> 
> >> Has the driver been tested with little-endian kernels though? While
> >> ppc32 kernels are always BE, you can build kernels as either big-endian
> >> or little-endian for most (modern) powerpc64 and arm/arm64 hardware,
> >> and I don't see why that should change the defaults of the driver
> >> when describing the same framebuffer hardware.
> >
> > The original code was added with
> > commit 7f29b87a7779 ("powerpc: offb: add support for foreign endianness")
> >
> > The hardware is either big-endian or runtime-switchable-endian.
> 
> Are you referring to CPU hardware or framebuffer hardware here?
CPU hardware
> 
> > It makes
> > sense to assume big-endian when running big-endian and the DT does not
> > specify endian which is likely on a historical system.
> 
> Agreed, assuming big-endian here clearly makes sense.
> 
> > It also makes sense to assume that on system with
> > runtime-switchable-endian the DT specifies the framebuffer endian.
> >
> > If systems that only do little-endian exist or emerge later then it also
> > makes sense to assume that the framebuffer matches the host if not
> > specified.
> >
> > I don't really see a problem here.
> >
> > BTW is this used on arm and on what platform?
> 
> I'm not aware of any users on Arm, most likely they all use
> simplefb/simpledrm or a gpu specific binding. There might be
> users on sparc, but they would obviously be big-endian
> as well.
> 
> > I do not see any bindings in dts.
> 
> Right, that is the real problem I see as well. I found the original
> CHRP binding document at
> https://www.devicetree.org/open-firmware/bindings/devices/html/lfb-1_0d.html
> 
> Unfortunately, this only specifies an 8-bit-per-pixel mode, and the
> multi-byte pixel support that was added in linux-2.1.125 was
> probably powermac specific without a public specification.
> 
> I think ideally we should add a binding document that describes what
> the driver actually expects, but in this case I would just drop the
> #ifdef check and always assume the framebuffer is big-endian unless
> the "little-endian" property is set, in order to have a sensible
> definition that does not depend on what OS (i.e. Linux
> CONFIG_CPU_BIG_ENDIAN) you are running.
> 
>Arnd


Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-10-11 Thread Michal Suchánek
On Tue, Oct 11, 2022 at 10:06:59PM +0200, Arnd Bergmann wrote:
> On Tue, Oct 11, 2022, at 1:30 PM, Thomas Zimmermann wrote:
> > Am 11.10.22 um 09:46 schrieb Javier Martinez Canillas:
> >>> +static bool display_get_big_endian_of(struct drm_device *dev, struct 
> >>> device_node *of_node)
> >>> +{
> >>> + bool big_endian;
> >>> +
> >>> +#ifdef __BIG_ENDIAN
> >>> + big_endian = true;
> >>> + if (of_get_property(of_node, "little-endian", NULL))
> >>> + big_endian = false;
> >>> +#else
> >>> + big_endian = false;
> >>> + if (of_get_property(of_node, "big-endian", NULL))
> >>> + big_endian = true;
> >>> +#endif
> >>> +
> >>> + return big_endian;
> >>> +}
> >>> +
> >> 
> >> Ah, I see. The heuristic then is whether the build is BE or LE or if the 
> >> Device
> >> Tree has an explicit node defining the endianness. The patch looks good to 
> >> me:
> >
> > Yes. I took this test from offb.
> 
> Has the driver been tested with little-endian kernels though? While
> ppc32 kernels are always BE, you can build kernels as either big-endian
> or little-endian for most (modern) powerpc64 and arm/arm64 hardware,
> and I don't see why that should change the defaults of the driver
> when describing the same framebuffer hardware.

The original code was added with
commit 7f29b87a7779 ("powerpc: offb: add support for foreign endianness")

The hardware is either big-endian or runtime-switchable-endian. It makes
sense to assume big-endian when running big-endian and the DT does not
specify endian which is likely on a historical system.

It also makes sense to assume that on system with
runtime-switchable-endian the DT specifies the framebuffer endian.

If systems that only do little-endian exist or emerge later then it also
makes sense to assume that the framebuffer matches the host if not
specified.

I don't really see a problem here.

BTW is this used on arm and on what platform?

I do not see any bindings in dts.

Thanks

Michal
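
To make the two positions in this thread easier to compare, here is a small
sketch that restates them as pure functions. The boolean parameters are
stand-ins, assumed for the example, for the __BIG_ENDIAN build check and the
two of_get_property() lookups in the quoted display_get_big_endian_of(); the
function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Heuristic from the quoted display_get_big_endian_of(): start from the
 * kernel's own byte order and let an explicit DT property override it.
 */
static bool fb_big_endian_current(bool kernel_is_be, bool has_le_prop,
				  bool has_be_prop)
{
	if (kernel_is_be)
		return !has_le_prop;	/* BE unless "little-endian" is set */
	return has_be_prop;		/* LE unless "big-endian" is set */
}

/*
 * Alternative suggested in the thread: ignore the kernel's byte order and
 * assume big-endian unless the node carries "little-endian".
 */
static bool fb_big_endian_fixed_default(bool has_le_prop)
{
	return !has_le_prop;
}
```

For a node without either property the two variants agree on a big-endian
kernel and disagree on a little-endian kernel, which is the case the
discussion is about.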


Re: [PATCH] powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALL

2022-09-30 Thread Michal Suchánek
Hello,

On Thu, Sep 29, 2022 at 05:16:40PM -0500, Nathan Lynch wrote:
> Haren Myneni  writes:
> > Generally the hypervisor decides to allocate a window on different
> > VAS instances. But if the user space wishes to allocate on the
> > current VAS instance where the process is executing, the kernel has
> > to pass associativity domain IDs to allocate VAS window HCALL. To
> > determine the associativity domain IDs for the current CPU, passing
> > smp_processor_id() to node associativity HCALL which may return
> > H_P2 (-55) error during DLPAR CPU event.
> >
> > This patch fixes this issue by passing hard_smp_processor_id() with
> > VPHN_FLAG_VCPU flag (PAPR 14.11.6.1 H_HOME_NODE_ASSOCIATIVITY).
> >
> > Signed-off-by: Haren Myneni 
> > ---
> >  arch/powerpc/platforms/pseries/vas.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/vas.c 
> > b/arch/powerpc/platforms/pseries/vas.c
> > index fe33bdb620d5..533026fd1f40 100644
> > --- a/arch/powerpc/platforms/pseries/vas.c
> > +++ b/arch/powerpc/platforms/pseries/vas.c
> > @@ -348,7 +348,7 @@ static struct vas_window *vas_allocate_window(int 
> > vas_id, u64 flags,
> >  * So no unpacking needs to be done.
> >  */
> > rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, domain,
> > - VPHN_FLAG_VCPU, smp_processor_id());
> > + VPHN_FLAG_VCPU, hard_smp_processor_id());
> > if (rc != H_SUCCESS) {
> > pr_err("H_HOME_NODE_ASSOCIATIVITY error: %d\n", rc);
> > goto out;
> 
> Yes, it is always wrong to pass Linux CPU numbers to the hypervisor,
> which has its own numbering for hardware threads. It usually coincides
> with Linux's numbering in practice, which tends to hide bugs like this.
> 
> Reviewed-by: Nathan Lynch 

This is the code that introduces the problem, right?

Fixes: b22f2d88e435 ("powerpc/pseries/vas: Integrate API with open/close 
windows")

Thanks

Michal


Re: [PATCH v4 5/5] drm/ofdrm: Support big-endian scanout buffers

2022-09-28 Thread Michal Suchánek
Hello,

On Wed, Sep 28, 2022 at 12:50:10PM +0200, Thomas Zimmermann wrote:
> All DRM formats assume little-endian byte order. On big-endian systems,
> it is likely that the scanout buffer is in big endian as well. Update
> the format accordingly and add endianess conversion to the format-helper
> library. Also opt-in to allocated buffers in host format by default.

This sounds backwards to me.

Skimming through the code it sounds like the buffer is in fact in the
same format all the time but when the CPU is switched to BE it sees the
data loaded from it differently.

Or am I missing something?

Thanks

Michal

> 
> Suggested-by: Geert Uytterhoeven 
> Signed-off-by: Thomas Zimmermann 
> ---
>  drivers/gpu/drm/drm_format_helper.c | 10 ++
>  drivers/gpu/drm/tiny/ofdrm.c| 55 +++--
>  2 files changed, 63 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_format_helper.c 
> b/drivers/gpu/drm/drm_format_helper.c
> index 4afc4ac27342..fca7936db083 100644
> --- a/drivers/gpu/drm/drm_format_helper.c
> +++ b/drivers/gpu/drm/drm_format_helper.c
> @@ -659,6 +659,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned 
> int *dst_pitch, uint32_t d
>   drm_fb_xrgb8888_to_rgb565(dst, dst_pitch, src, fb, 
> clip, false);
>   return 0;
>   }
> + } else if (dst_format == (DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN)) {
> + if (fb_format == DRM_FORMAT_RGB565) {
> + drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
> + return 0;
> + }
>   } else if (dst_format == DRM_FORMAT_RGB888) {
>   if (fb_format == DRM_FORMAT_XRGB8888) {
>   drm_fb_xrgb8888_to_rgb888(dst, dst_pitch, src, fb, 
> clip);
> @@ -677,6 +682,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned 
> int *dst_pitch, uint32_t d
>   drm_fb_xrgb8888_to_xrgb2101010(dst, dst_pitch, src, fb, 
> clip);
>   return 0;
>   }
> + } else if (dst_format == DRM_FORMAT_BGRX8888) {
> + if (fb_format == DRM_FORMAT_XRGB8888) {
> + drm_fb_swab(dst, dst_pitch, src, fb, clip, false);
> + return 0;
> + }
>   }
>  
>   drm_warn_once(fb->dev, "No conversion helper from %p4cc to %p4cc 
> found.\n",
> diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
> index 0bf5eebf6678..6e100a7f5db7 100644
> --- a/drivers/gpu/drm/tiny/ofdrm.c
> +++ b/drivers/gpu/drm/tiny/ofdrm.c
> @@ -94,7 +94,7 @@ static int display_get_validated_int0(struct drm_device 
> *dev, const char *name,
>  }
>  
>  static const struct drm_format_info *display_get_validated_format(struct 
> drm_device *dev,
> -   u32 depth)
> +   u32 depth, 
> bool big_endian)
>  {
>   const struct drm_format_info *info;
>   u32 format;
> @@ -115,6 +115,29 @@ static const struct drm_format_info 
> *display_get_validated_format(struct drm_dev
>   return ERR_PTR(-EINVAL);
>   }
>  
> + /*
> +  * DRM formats assume little-endian byte order. Update the format
> +  * if the scanout buffer uses big-endian ordering.
> +  */
> + if (big_endian) {
> + switch (format) {
> + case DRM_FORMAT_XRGB8888:
> + format = DRM_FORMAT_BGRX8888;
> + break;
> + case DRM_FORMAT_ARGB8888:
> + format = DRM_FORMAT_BGRA8888;
> + break;
> + case DRM_FORMAT_RGB565:
> + format = DRM_FORMAT_RGB565 | DRM_FORMAT_BIG_ENDIAN;
> + break;
> + case DRM_FORMAT_XRGB1555:
> + format = DRM_FORMAT_XRGB1555 | DRM_FORMAT_BIG_ENDIAN;
> + break;
> + default:
> + break;
> + }
> + }
> +
>   info = drm_format_info(format);
>   if (!info) {
>   drm_err(dev, "cannot find framebuffer format for depth %u\n", 
> depth);
> @@ -134,6 +157,23 @@ static int display_read_u32_of(struct drm_device *dev, 
> struct device_node *of_no
>   return ret;
>  }
>  
> +static bool display_get_big_endian_of(struct drm_device *dev, struct 
> device_node *of_node)
> +{
> + bool big_endian;
> +
> +#ifdef __BIG_ENDIAN
> + big_endian = true;
> + if (of_get_property(of_node, "little-endian", NULL))
> + big_endian = false;
> +#else
> + big_endian = false;
> + if (of_get_property(of_node, "big-endian", NULL))
> + big_endian = true;
> +#endif
> +
> + return big_endian;
> +}
> +
>  static int display_get_width_of(struct drm_device *dev, struct device_node 
> *of_node)
>  {
>   u32 width;
> @@ -613,6 +653,7 @@ static void ofdrm_device_set_gamma_linear(struct 
> 
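
As a way to check the format mapping in the quoted patch, the sketch below
shows why a 32-bit x:R:G:B pixel stored in big-endian byte order has the same
bytes in memory as the byte-swapped value stored in little-endian order, which
is how the B:G:R:X format is defined. The helpers are plain userspace
stand-ins written for this example and are not the kernel's swab32() or
drm_fb_swab().

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit pixel value. */
static uint32_t swab32_sketch(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Store a 32-bit value in little-endian byte order, host independent. */
static void store_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)v;
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/* Store a 32-bit value in big-endian byte order, host independent. */
static void store_be32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}
```

For example, the x:R:G:B value 0x00112233 written with store_be32() and the
swapped value 0x33221100 written with store_le32() both produce the byte
sequence 00 11 22 33.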

Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-27 Thread Michal Suchánek
On Tue, Sep 27, 2022 at 11:39:52AM +0900, AKASHI Takahiro wrote:
> On Mon, Sep 26, 2022 at 09:40:25AM +0200, Michal Suchánek wrote:
> > On Mon, Sep 26, 2022 at 08:47:32AM +0200, Greg Kroah-Hartman wrote:
> > > On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote:
> > > > On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote:
> > > > > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote:
> > > > > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> > > > > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > this is a backport of commit 0d519cadf751
> > > > > > > > ("arm64: kexec_file: use more system keyrings to verify kernel 
> > > > > > > > image signature")
> > > > > > > > to the stable 5.15 tree including the preparatory patches.
> > > > > > > 
> > > > > > > This feels to me like a new feature for arm64, one that has never 
> > > > > > > worked
> > > > > > > before and you are just making it feature-parity with x86, right?
> > > > > > > 
> > > > > > > Or is this a regression fix somewhere?  Why is this needed in 
> > > > > > > 5.15.y and
> > > > > > > why can't people who need this new feature just use a newer kernel
> > > > > > > version (5.19?)
> > > > > > 
> > > > > > It's half-broken implementation of the kexec kernel verification. 
> > > > > > At the time
> > > > > > it was implemented for arm64 we had the platform and secondary 
> > > > > > keyrings
> > > > > > and x86 was using them but on arm64 the initial implementation 
> > > > > > ignores
> > > > > > them.
> > > > > 
> > > > > Ok, so it's something that never worked.  Adding support to get it to
> > > > > work doesn't really fall into the stable kernel rules, right?
> > > > 
> > > > Not sure. It was defective, not using the facilities available at the
> > > > time correctly. Which translates to kernels that can be kexec'd on x86
> > > > failing to kexec on arm64 without any explanation (signed with same key,
> > > > built for the appropriate arch).
> > > 
> > > Feature parity across architectures is not a "regression", but rather a
> > > "this feature is not implemented for this architecture yet" type of
> > > thing.
> > 
> > That depends on the view - before kexec verification you could boot any
> > kernel, now you can boot some kernels signed with a valid key, but not
> > others - the initial implementation is buggy, probably because it
> > is based on an old version of the x86 code.
> 
> Buggy?
> The feature of supporting platform ring had been slipped in just before
> I submitted the latest patch series which was eventually merged.
> (I should have noticed it though.)

It's difficult to notice another in-flight patch that does not conflict
with yours, and is for a different architecture. That's why we have
followup patches and Fixes tags.

However, support for the secondary keyring was added in 4.19 by commit
ea93102f3224 ("Fix kexec forbidding kernels signed with keys in the
secondary keyring to boot"), and the arm64 code did not make use of it
either.

> Looking at changes in the commit 278311e417be ("kexec, KEYS: Make use of 
> platform
> keyring for signature verify"), it seems to be obvious that it is a new 
> feature
> because it introduced a new Kconfig option, CONFIG_INTEGRITY_PLATFORM_KEYRING,
> which allows for enabling/disabling platform ring support.

Yes, and that feature has existed since 5.1, and we are talking about 5.15
here. Not making use of a keyring that is supported by the kernel
results in the inability to kexec kernels that are signed by a valid key,
which is arguably a bug.
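
To make the argument concrete, here is a minimal userspace sketch of the
fallback order under discussion: try the built-in keys first, then the
secondary keyring, then the platform keyring, and move on only when the key
is not found. It is a model under stated assumptions, not the kernel code:
struct keyring, verify_with_keyring() and verify_image_sig() are hypothetical
names, and a "keyring" here is just a list of signer IDs.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a keyring is a NULL-terminated list of signer IDs. */
struct keyring {
	const char *name;
	const char *const *keys;
};

/* Return 0 if signer_id is present in the keyring, -ENOKEY otherwise. */
static int verify_with_keyring(const struct keyring *kr, const char *signer_id)
{
	if (!kr || !kr->keys)
		return -ENOKEY;
	for (size_t i = 0; kr->keys[i]; i++)
		if (strcmp(kr->keys[i], signer_id) == 0)
			return 0;
	return -ENOKEY;
}

/*
 * Try each keyring in order. Only a "key not found" result moves on to
 * the next keyring; success or any other error ends the search.
 */
static int verify_image_sig(const struct keyring *const *rings, size_t n,
			    const char *signer_id)
{
	int ret = -ENOKEY;

	for (size_t i = 0; i < n && ret == -ENOKEY; i++)
		ret = verify_with_keyring(rings[i], signer_id);
	return ret;
}
```

In this model, an implementation that passes only the first keyring rejects an
image signed with a key held in the platform keyring, while one that passes
all three accepts it. That is the difference in behaviour between the two
architectures that the thread is arguing about.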

Thanks

Michal


Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-26 Thread Michal Suchánek
On Mon, Sep 26, 2022 at 08:47:32AM +0200, Greg Kroah-Hartman wrote:
> On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote:
> > On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote:
> > > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote:
> > > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > this is backport of commit 0d519cadf751
> > > > > > ("arm64: kexec_file: use more system keyrings to verify kernel 
> > > > > > image signature")
> > > > > > to stable 5.15 tree including the preparatory patches.
> > > > > 
> > > > > This feels to me like a new feature for arm64, one that has never 
> > > > > worked
> > > > > before and you are just making it feature-parity with x86, right?
> > > > > 
> > > > > Or is this a regression fix somewhere?  Why is this needed in 5.15.y 
> > > > > and
> > > > > why can't people who need this new feature just use a newer kernel
> > > > > version (5.19?)
> > > > 
> > > > It's half-broken implementation of the kexec kernel verification. At 
> > > > the time
> > > > it was implemented for arm64 we had the platform and secondary keyrings
> > > > and x86 was using them but on arm64 the initial implementation ignores
> > > > them.
> > > 
> > > Ok, so it's something that never worked.  Adding support to get it to
> > > work doesn't really fall into the stable kernel rules, right?
> > 
> > Not sure. It was defective, not using the facilities available at the
> > time correctly. Which translates to kernels that can be kexec'd on x86
> > failing to kexec on arm64 without any explanation (signed with same key,
> > built for the appropriate arch).
> 
> Feature parity across architectures is not a "regression", but rather a
> "this feature is not implemented for this architecture yet" type of
> thing.

That depends on the view - before kexec verification you could boot any
kernel, now you can boot some kernels signed with a valid key, but not
others - the initial implementation is buggy, probably because it
is based on an old version of the x86 code.

> 
> > > Again, what's wrong with 5.19 for anyone who wants this?  Who does want
> > > this?
> > 
> > Not sure, really.
> > 
> > The final patch was repeatedly backported to stable and failed to build
> > because the prerequisites were missing.
> 
> That's because it was tagged, but now that you show the full set of
> requirements, it's pretty obvious to me that this is not relevant for
> going this far back.

That also works.

Thanks

Michal


Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-24 Thread Michal Suchánek
On Fri, Sep 23, 2022 at 09:16:50PM +0200, Michal Suchánek wrote:
> Hello,
> 
> On Fri, Sep 23, 2022 at 03:03:36PM -0400, Mimi Zohar wrote:
> > On Fri, 2022-09-23 at 19:10 +0200, Michal Suchanek wrote:
> > > Hello,
> > > 
> > > this is backport of commit 0d519cadf751
> > > ("arm64: kexec_file: use more system keyrings to verify kernel image 
> > > signature")
> > > to stable 5.15 tree including the preparatory patches.
> > > 
> > > Some patches needed minor adjustment for context.
> > 
> > In general when backporting this patch set, there should be a
> > dependency on backporting these commits as well.  In this instance for
> > linux-5.15.y, they've already been backported.
> > 
> > 543ce63b664e ("lockdown: Fix kexec lockdown bypass with ima policy")

AFAICT this is present everywhere relevant, likely because it's considered a CVE
fix.

> > af16df54b89d ("ima: force signature verification when CONFIG_KEXEC_SIG is 
> > configured")

This is missing in 5.4, and 5.4 is missing this prerequisite:
fd7af71be542 ("kexec: do not verify the signature without the lockdown or 
mandatory signature")

> 
> Thanks for bringing these up. It might be in general useful to backport
> these fixes as well.
> 
> However, this patchset does one very specific thing: it lifts the x86
> kexec_file signature verification to arch-independent and uses it on
> arm64 to unify all features (and any existing warts) between EFI
> architectures.
> 
> So unless I am missing something the fixes you pointed out are
> completely independent of this.
> 
> Thanks
> 
> Michal


Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-24 Thread Michal Suchánek
On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote:
> On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote:
> > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > > > Hello,
> > > > 
> > > > this is backport of commit 0d519cadf751
> > > > ("arm64: kexec_file: use more system keyrings to verify kernel image 
> > > > signature")
> > > > to stable 5.15 tree including the preparatory patches.
> > > 
> > > This feels to me like a new feature for arm64, one that has never worked
> > > before and you are just making it feature-parity with x86, right?
> > > 
> > > Or is this a regression fix somewhere?  Why is this needed in 5.15.y and
> > > why can't people who need this new feature just use a newer kernel
> > > version (5.19?)
> > 
> > It's half-broken implementation of the kexec kernel verification. At the 
> > time
> > it was implemented for arm64 we had the platform and secondary keyrings
> > and x86 was using them but on arm64 the initial implementation ignores
> > them.
> 
> Ok, so it's something that never worked.  Adding support to get it to
> work doesn't really fall into the stable kernel rules, right?

Not sure. It was defective, not using the facilities available at the
time correctly. Which translates to kernels that can be kexec'd on x86
failing to kexec on arm64 without any explanation (signed with same key,
built for the appropriate arch).

> Again, what's wrong with 5.19 for anyone who wants this?  Who does want
> this?

Not sure, really.

The final patch was repeatedly backported to stable and failed to build
because the prerequisites were missing.

So this is a backport that includes the prerequisites for it to build.

If nobody wanted this why is it repeatedly backported generating the
failure messages?

Thanks

Michal


Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-24 Thread Michal Suchánek
On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > Hello,
> > 
> > this is backport of commit 0d519cadf751
> > ("arm64: kexec_file: use more system keyrings to verify kernel image 
> > signature")
> > to stable 5.15 tree including the preparatory patches.
> 
> This feels to me like a new feature for arm64, one that has never worked
> before and you are just making it feature-parity with x86, right?
> 
> Or is this a regression fix somewhere?  Why is this needed in 5.15.y and
> why can't people who need this new feature just use a newer kernel
> version (5.19?)

It's a half-broken implementation of the kexec kernel verification. At the time
it was implemented for arm64 we had the platform and secondary keyrings
and x86 was using them but on arm64 the initial implementation ignores
them.

Thanks

Michal


Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-23 Thread Michal Suchánek
Hello,

On Fri, Sep 23, 2022 at 03:03:36PM -0400, Mimi Zohar wrote:
> On Fri, 2022-09-23 at 19:10 +0200, Michal Suchanek wrote:
> > Hello,
> > 
> > this is backport of commit 0d519cadf751
> > ("arm64: kexec_file: use more system keyrings to verify kernel image 
> > signature")
> > to stable 5.15 tree including the preparatory patches.
> > 
> > Some patches needed minor adjustment for context.
> 
> In general when backporting this patch set, there should be a
> dependency on backporting these commits as well.  In this instance for
> linux-5.15.y, they've already been backported.
> 
> 543ce63b664e ("lockdown: Fix kexec lockdown bypass with ima policy")
> af16df54b89d ("ima: force signature verification when CONFIG_KEXEC_SIG is 
> configured")

Thanks for bringing these up. It might be in general useful to backport
these fixes as well.

However, this patchset does one very specific thing: it lifts the x86
kexec_file signature verification to arch-independent code and uses it on
arm64 to unify all features (and any existing warts) between EFI
architectures.

So unless I am missing something the fixes you pointed out are
completely independent of this.

Thanks

Michal


Re: [PATCH] powerpc/pseries: add lparctl driver for platform-specific functions

2022-09-14 Thread Michal Suchánek
On Tue, Sep 13, 2022 at 12:02:42PM -0500, Nathan Lynch wrote:
> Michal Suchánek  writes:
> > On Tue, Sep 13, 2022 at 10:59:56AM -0500, Nathan Lynch wrote:
> >> Michal Suchánek  writes:
> >> 
> >> > On Fri, Aug 12, 2022 at 02:14:21PM -0500, Nathan Lynch wrote:
> >> >> Laurent Dufour  writes:
> >> >> > Le 30/07/2022 à 02:04, Nathan Lynch a écrit :
> >> >> >> +static long lparctl_get_sysparm(struct lparctl_get_system_parameter 
> >> >> >> __user *argp)
> >> >> >> +{
> >> >> >> +struct lparctl_get_system_parameter *gsp;
> >> >> >> +long ret;
> >> >> >> +int fwrc;
> >> >> >> +
> >> >> >> +/*
> >> >> >> + * Special case to allow user space to probe the command.
> >> >> >> + */
> >> >> >> +if (argp == NULL)
> >> >> >> +return 0;
> >> >> >> +
> >> >> >> +gsp = memdup_user(argp, sizeof(*gsp));
> >> >> >> +if (IS_ERR(gsp)) {
> >> >> >> +ret = PTR_ERR(gsp);
> >> >> >> +goto err_return;
> >> >> >> +}
> >> >> >> +
> >> >> >> +ret = -EINVAL;
> >> >> >> +if (gsp->rtas_status != 0)
> >> >> >> +goto err_free;
> >> >> >> +
> >> >> >> +do {
> >> >> >> +static_assert(sizeof(gsp->data) <= 
> >> >> >> sizeof(rtas_data_buf));
> >> >> >> +
> >> >> >> +spin_lock(&rtas_data_buf_lock);
> >> >> >> +memset(rtas_data_buf, 0, sizeof(rtas_data_buf));
> >> >> >> +memcpy(rtas_data_buf, gsp->data, sizeof(gsp->data));
> >> >> >> +fwrc = 
> >> >> >> rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> >> >> >> + NULL, gsp->token, __pa(rtas_data_buf),
> >> >> >> + sizeof(gsp->data));
> >> >> >> +if (fwrc == 0)
> >> >> >> +memcpy(gsp->data, rtas_data_buf, 
> >> >> >> sizeof(gsp->data));
> >> >> >
> >> >> > May be the amount of data copied out to the user space could be
> >> >> > gsp->length. This would prevent copying 4K bytes all the time.
> >> >> >
> >> >> > In a more general way, the size of the RTAS buffer is quite big, and 
> >> >> > I'm
> >> >> > wondering if all the data need to be copied back and forth to the 
> >> >> > kernel.
> >> >> >
> >> >> > Unless there are a high frequency of calls this doesn't make sense, 
> >> >> > and
> >> >> > keeping the code simple might be the best way. Otherwise limiting the 
> >> >> > bytes
> >> >> > copied could help a bit.
> >> >> 
> >> >> This is not intended to be a high-bandwidth interface and I don't think
> >> >> there's much of a performance concern here, so I'd rather just keep the
> >> >> copy sizes involved constant.
> >> >
> >> > But that's absolutely horrible!
> >> 
> >> ?
> >> 
> >> > The user wants the VPD data, all of it. And you only give one page with
> >> > this interface.
> >> 
> >> The code here is for system parameters, which have a known maximum size,
> >> unlike VPD. There's no code for VPD retrieval in this patch.
> >
> > But we do need to support the calls that return multiple pages of data.
> >
> > If the new driver supports only the simple calls it's a failure.
> 
> Michal, will you please moderate your tone? I think you can communicate
> your concerns without calling my work "absolutely horrible" or a
> "failure". Thanks.

Sorry, that was poor wording.

> Anyway, of course I intend to support the more complex calls, but
> supporting the simple calls actually unbreaks a lot of stuff.

The thing is that supporting calls that return more than one page of
data is absolutely required, and this interface built around fixed size
data transfer can't do it.

So it sounds like a ticked for redoin

Re: [PATCH] powerpc/pseries: add lparctl driver for platform-specific functions

2022-09-13 Thread Michal Suchánek
On Tue, Sep 13, 2022 at 10:59:56AM -0500, Nathan Lynch wrote:
> Michal Suchánek  writes:
> 
> > On Fri, Aug 12, 2022 at 02:14:21PM -0500, Nathan Lynch wrote:
> >> Laurent Dufour  writes:
> >> > Le 30/07/2022 à 02:04, Nathan Lynch a écrit :
> >> >> +static long lparctl_get_sysparm(struct lparctl_get_system_parameter 
> >> >> __user *argp)
> >> >> +{
> >> >> +   struct lparctl_get_system_parameter *gsp;
> >> >> +   long ret;
> >> >> +   int fwrc;
> >> >> +
> >> >> +   /*
> >> >> +* Special case to allow user space to probe the command.
> >> >> +*/
> >> >> +   if (argp == NULL)
> >> >> +   return 0;
> >> >> +
> >> >> +   gsp = memdup_user(argp, sizeof(*gsp));
> >> >> +   if (IS_ERR(gsp)) {
> >> >> +   ret = PTR_ERR(gsp);
> >> >> +   goto err_return;
> >> >> +   }
> >> >> +
> >> >> +   ret = -EINVAL;
> >> >> +   if (gsp->rtas_status != 0)
> >> >> +   goto err_free;
> >> >> +
> >> >> +   do {
> >> >> +   static_assert(sizeof(gsp->data) <= 
> >> >> sizeof(rtas_data_buf));
> >> >> +
> >> >> +   spin_lock(&rtas_data_buf_lock);
> >> >> +   memset(rtas_data_buf, 0, sizeof(rtas_data_buf));
> >> >> +   memcpy(rtas_data_buf, gsp->data, sizeof(gsp->data));
> >> >> +   fwrc = 
> >> >> rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> >> >> +NULL, gsp->token, __pa(rtas_data_buf),
> >> >> +sizeof(gsp->data));
> >> >> +   if (fwrc == 0)
> >> >> +   memcpy(gsp->data, rtas_data_buf, 
> >> >> sizeof(gsp->data));
> >> >
> >> > May be the amount of data copied out to the user space could be
> >> > gsp->length. This would prevent copying 4K bytes all the time.
> >> >
> >> > In a more general way, the size of the RTAS buffer is quite big, and I'm
> >> > wondering if all the data need to be copied back and forth to the kernel.
> >> >
> >> > Unless there are a high frequency of calls this doesn't make sense, and
> >> > keeping the code simple might be the best way. Otherwise limiting the 
> >> > bytes
> >> > copied could help a bit.
> >> 
> >> This is not intended to be a high-bandwidth interface and I don't think
> >> there's much of a performance concern here, so I'd rather just keep the
> >> copy sizes involved constant.
> >
> > But that's absolutely horrible!
> 
> ?
> 
> > The user wants the VPD data, all of it. And you only give one page with
> > this interface.
> 
> The code here is for system parameters, which have a known maximum size,
> unlike VPD. There's no code for VPD retrieval in this patch.

But we do need to support the calls that return multiple pages of data.

If the new driver supports only the simple calls it's a failure.

> 
> But I'm happy to constructively discuss how a VPD ioctl interface should
> work.
> 
> > Worse, the call is not reentrant so you need to lock against other users
> > calling the call while the current caller is retrieving the individual
> > pages.
> >
> > You could do that per process, but then processes with userspace
> > threading would want the data as well so you would have to save the
> > arguments of the last call, and compare to arguments of any subsequent
> > call to determine if you can let it pass or block.
> >
> > And when you do all that there will be a process that retrieves a couple
> > of pages and goes out for lunch or loses interest completely, blocking
> > out everyone from accessing the interface at all.
> 
> Right, the ibm,get-vpd RTAS function is tricky to expose to user space.
> 
> It needs to be called repeatedly until all data has been returned, 4KB
> at a time.
> 
> Only one ibm,get-vpd sequence can be in progress at any time. If an
> ibm,get-vpd sequence is begun while another sequence is already
> outstanding, the first one is invalidated -- I would guess -1 or some
> other error is returned on its next call.
> 
> So a new system-call level interface for VPD retrieval probably should
> not expose the repeating sequence-based nature of the RTAS function to
> user space, to prevent concurrent clients from interfering with each
> other. That implies that the kernel should buffer the VPD results
> internally; at least that's the only idea I've had so far. Open to
> other suggestions.

It can save the data to a user-supplied buffer until all data is
transferred or the buffer space runs out.
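
As a rough illustration of that suggestion, the sketch below runs the whole
chunked sequence in a single call and appends each chunk to a caller-supplied
buffer, stopping when the data ends or the buffer is too small. It is a
userspace model, not driver code: fetch_chunk_fn, fetch_all(), struct
mem_source and mem_fetch() are hypothetical names, and mem_fetch() merely
stands in for the firmware call by slicing an in-memory blob into 4KB pieces.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096 /* the thread says data comes back 4KB at a time */

/*
 * Hypothetical stand-in for one firmware call in a sequence. Fills chunk
 * with up to CHUNK_SIZE bytes, stores the count in *len, and returns 1 if
 * more data follows, 0 if this was the last chunk, negative on error.
 */
typedef int (*fetch_chunk_fn)(void *ctx, unsigned int seq, char *chunk,
			      size_t *len);

/*
 * Run the whole sequence, appending each chunk to buf. Returns the number
 * of bytes stored, -ENOSPC if buf cannot hold the next chunk, or the
 * callback's negative error code.
 */
static long fetch_all(fetch_chunk_fn fetch, void *ctx, char *buf, size_t buflen)
{
	char chunk[CHUNK_SIZE];
	size_t used = 0;
	unsigned int seq = 1;
	int more;

	do {
		size_t len = 0;

		more = fetch(ctx, seq++, chunk, &len);
		if (more < 0)
			return more;
		if (len > CHUNK_SIZE)
			return -EIO;
		if (len > buflen - used)
			return -ENOSPC;
		memcpy(buf + used, chunk, len);
		used += len;
	} while (more);

	return (long)used;
}

/* Illustrative data source: an in-memory blob served in 4KB pieces. */
struct mem_source {
	const char *data;
	size_t total;
};

static int mem_fetch(void *ctx, unsigned int seq, char *chunk, size_t *len)
{
	const struct mem_source *src = ctx;
	size_t off = (size_t)(seq - 1) * CHUNK_SIZE;

	if (off > src->total)
		return -EINVAL;
	*len = src->total - off < CHUNK_SIZE ? src->total - off : CHUNK_SIZE;
	memcpy(chunk, src->data + off, *len);
	return off + *len < src->total;
}
```

Because the sequence starts and finishes inside one call, a caller cannot stop
halfway and block everyone else, which is the failure mode described above.
Returning -ENOSPC rather than a partial result is a choice made for this
sketch; returning the partial data together with a "more available" indication
would be an equally plausible design.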

Thanks

Michal


Re: [PATCH] powerpc/pseries: add lparctl driver for platform-specific functions

2022-09-13 Thread Michal Suchánek
On Fri, Aug 12, 2022 at 02:14:21PM -0500, Nathan Lynch wrote:
> Laurent Dufour  writes:
> > Le 30/07/2022 à 02:04, Nathan Lynch a écrit :
> >> +static long lparctl_get_sysparm(struct lparctl_get_system_parameter 
> >> __user *argp)
> >> +{
> >> +  struct lparctl_get_system_parameter *gsp;
> >> +  long ret;
> >> +  int fwrc;
> >> +
> >> +  /*
> >> +   * Special case to allow user space to probe the command.
> >> +   */
> >> +  if (argp == NULL)
> >> +  return 0;
> >> +
> >> +  gsp = memdup_user(argp, sizeof(*gsp));
> >> +  if (IS_ERR(gsp)) {
> >> +  ret = PTR_ERR(gsp);
> >> +  goto err_return;
> >> +  }
> >> +
> >> +  ret = -EINVAL;
> >> +  if (gsp->rtas_status != 0)
> >> +  goto err_free;
> >> +
> >> +  do {
> >> +  static_assert(sizeof(gsp->data) <= sizeof(rtas_data_buf));
> >> +
> >> +  spin_lock(&rtas_data_buf_lock);
> >> +  memset(rtas_data_buf, 0, sizeof(rtas_data_buf));
> >> +  memcpy(rtas_data_buf, gsp->data, sizeof(gsp->data));
> >> +  fwrc = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> >> +   NULL, gsp->token, __pa(rtas_data_buf),
> >> +   sizeof(gsp->data));
> >> +  if (fwrc == 0)
> >> +  memcpy(gsp->data, rtas_data_buf, sizeof(gsp->data));
> >
> > May be the amount of data copied out to the user space could be
> > gsp->length. This would prevent copying 4K bytes all the time.
> >
> > In a more general way, the size of the RTAS buffer is quite big, and I'm
> > wondering if all the data need to be copied back and forth to the kernel.
> >
> > Unless there are a high frequency of calls this doesn't make sense, and
> > keeping the code simple might be the best way. Otherwise limiting the bytes
> > copied could help a bit.
> 
> This is not intended to be a high-bandwidth interface and I don't think
> there's much of a performance concern here, so I'd rather just keep the
> copy sizes involved constant.

But that's absolutely horrible!

The user wants the VPD data, all of it. And you only give one page with
this interface.

Worse, the call is not reentrant so you need to lock against other users
calling the call while the current caller is retrieving the individual
pages.

You could do that per process, but then processes with userspace
threading would want the data as well so you would have to save the
arguments of the last call, and compare to arguments of any subsequent
call to determine if you can let it pass or block.

And when you do all that there will be a process that retrieves a couple
of pages and goes out for lunch or loses interest completely, blocking
out everyone from accessing the interface at all.

Thanks

Michal


Re: [PATCH v3a 1/2] lib: generic accessor functions for arch keystore

2022-08-08 Thread Michal Suchánek
On Mon, Aug 08, 2022 at 04:31:06PM +, Christophe Leroy wrote:
> 
> 
> Le 08/08/2022 à 17:43, gjo...@linux.vnet.ibm.com a écrit :
> > From: Greg Joyce 
> > 
> > Generic kernel subsystems may rely on platform specific persistent
> > KeyStore to store objects containing sensitive key material. In such case,
> > they need to access architecture specific functions to perform read/write
> > operations on these variables.
> > 
> > Define the generic variable read/write prototypes to be implemented by
> > architecture specific versions. The default(weak) implementations of
> > these prototypes return -EOPNOTSUPP unless overridden by architecture
> > versions.
> > 
> > Signed-off-by: Greg Joyce 
> > ---
> >   include/linux/arch_vars.h | 23 +++
> >   lib/Makefile  |  2 +-
> >   lib/arch_vars.c   | 25 +
> >   3 files changed, 49 insertions(+), 1 deletion(-)
> >   create mode 100644 include/linux/arch_vars.h
> >   create mode 100644 lib/arch_vars.c
> > 
> > diff --git a/include/linux/arch_vars.h b/include/linux/arch_vars.h
> > new file mode 100644
> > index ..9c280ff9432e
> > --- /dev/null
> > +++ b/include/linux/arch_vars.h
> > @@ -0,0 +1,23 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > > + * Platform variable operations.
> 
> Is it platform specific or architecture specific ?
> 
> > + *
> > + * Copyright (C) 2022 IBM Corporation
> > + *
> > + * These are the accessor functions (read/write) for architecture specific
> > + * variables. Specific architectures can provide overrides.
> 
> "variables" is a very generic word which I think doesn't match what you 
> want to do.
> 
> For me "variables" are local variables and global variables in a C file. 
> Here it seems to be something completely different hence the name is 
> really meaningless and misleading.
> 
> > + *
> > + */
> > +
> > +#include 
> > +
> > +enum arch_variable_type {
> 
> arch_variable_type ? What's that ? variable types are char, short, long, 
> long long, etc ...
> 
> > +   ARCH_VAR_OPAL_KEY  = 0, /* SED Opal Authentication Key */
> > +   ARCH_VAR_OTHER = 1, /* Other type of variable */
> > +   ARCH_VAR_MAX   = 1, /* Maximum type value */
> > +};
> 
> Why the hell do you need an enum for two values only ?
> 
> > +
> > +int arch_read_variable(enum arch_variable_type type, char *varname,
> > +  void *varbuf, u_int *varlen);
> > +int arch_write_variable(enum arch_variable_type type, char *varname,
> > +   void *varbuf, u_int varlen);
> > diff --git a/lib/Makefile b/lib/Makefile
> > index f99bf61f8bbc..b90c4cb0dbbb 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
> >  bsearch.o find_bit.o llist.o memweight.o kfifo.o \
> >  percpu-refcount.o rhashtable.o \
> >  once.o refcount.o usercopy.o errseq.o bucket_locks.o \
> > -generic-radix-tree.o
> > +generic-radix-tree.o arch_vars.o
> >   obj-$(CONFIG_STRING_SELFTEST) += test_string.o
> >   obj-y += string_helpers.o
> >   obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
> > diff --git a/lib/arch_vars.c b/lib/arch_vars.c
> > new file mode 100644
> > index ..e6f16d7d09c1
> > --- /dev/null
> > +++ b/lib/arch_vars.c
> 
> The name is meaningless, too generic.
> 
> 
> > @@ -0,0 +1,25 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Platform variable operations.
> 
> platform versus architecture ?
> 
> > + *
> > + * Copyright (C) 2022 IBM Corporation
> > + *
> > + * These are the accessor functions (read/write) for architecture specific
> > + * variables. Specific architectures can provide overrides.
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +int __weak arch_read_variable(enum arch_variable_type type, char *varname,
> > + void *varbuf, u_int *varlen)
> 
> Sorry, to read a variable, I use READ_ONCE or I read it directly.

This is supposed to be used for things like the EFI variables and the
already existing powernv secure variables.

Nonetheless, without adding the plumbing for the existing
implementations it is not clear what it's doing, and the interface is
arguably meaningless.

Hence I would suggest either providing the plumbing necessary for
existing (secure) variable implementations to make use of the interface,
or using private implementations like all the existing platforms do
without exposing the values in any generic way, and leaving that to
somebody who is comfortable with designing a working general interface
for this.
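
For readers unfamiliar with the pattern the quoted patch relies on, the
following is a minimal userspace sketch of a weak default that returns
-EOPNOTSUPP until some other object file supplies a strong definition. It
assumes GCC or Clang (for __attribute__((weak))), uses const char * and
unsigned int in place of the patch's char * and u_int, and have_platform_key()
is a hypothetical caller added only for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Mirrors the enum in the quoted patch, minus ARCH_VAR_MAX. */
enum arch_variable_type {
	ARCH_VAR_OPAL_KEY = 0,
	ARCH_VAR_OTHER = 1,
};

/*
 * Weak default: the linker uses this only when no other object file
 * provides a strong definition of the same symbol.
 */
__attribute__((weak))
int arch_read_variable(enum arch_variable_type type, const char *varname,
		       void *varbuf, unsigned int *varlen)
{
	(void)type;
	(void)varname;
	(void)varbuf;
	(void)varlen;
	return -EOPNOTSUPP;
}

/*
 * Hypothetical generic caller: with only the weak default linked in,
 * every lookup fails, so the caller learns nothing about the platform.
 */
static int have_platform_key(const char *name)
{
	unsigned char buf[64];
	unsigned int len = sizeof(buf);

	return arch_read_variable(ARCH_VAR_OPAL_KEY, name, buf, &len) == 0;
}
```

The sketch shows why the interface says little on its own: until a platform
such as EFI or powernv provides a strong arch_read_variable(), every generic
caller just sees -EOPNOTSUPP.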

Thanks

Michal


Re: [PATCH v3 1/2] lib: generic accessor functions for arch keystore

2022-08-01 Thread Michal Suchánek
On Mon, Aug 01, 2022 at 03:45:45PM -0400, Nayna wrote:
> 
> On 8/1/22 09:40, Michal Suchánek wrote:
> > Hello,
> > 
> > On Mon, Aug 01, 2022 at 07:34:25AM -0500, gjo...@linux.vnet.ibm.com wrote:
> > > From: Greg Joyce 
> > > 
> > > Generic kernel subsystems may rely on platform specific persistent
> > > KeyStore to store objects containing sensitive key material. In such case,
> > > they need to access architecture specific functions to perform read/write
> > > operations on these variables.
> > > 
> > > Define the generic variable read/write prototypes to be implemented by
> > > architecture specific versions. The default(weak) implementations of
> > > these prototypes return -EOPNOTSUPP unless overridden by architecture
> > > versions.
> > > 
> > > Signed-off-by: Greg Joyce 
> > > ---
> > >   include/linux/arch_vars.h | 23 +++
> > >   lib/Makefile  |  2 +-
> > >   lib/arch_vars.c   | 25 +
> > >   3 files changed, 49 insertions(+), 1 deletion(-)
> > >   create mode 100644 include/linux/arch_vars.h
> > >   create mode 100644 lib/arch_vars.c
> > > 
> > > diff --git a/include/linux/arch_vars.h b/include/linux/arch_vars.h
> > > new file mode 100644
> > > index ..9c280ff9432e
> > > --- /dev/null
> > > +++ b/include/linux/arch_vars.h
> > > @@ -0,0 +1,23 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > > + * Platform variable operations.
> > > + *
> > > + * Copyright (C) 2022 IBM Corporation
> > > + *
> > > + * These are the accessor functions (read/write) for architecture 
> > > specific
> > > + * variables. Specific architectures can provide overrides.
> > > + *
> > > + */
> > > +
> > > +#include 
> > > +
> > > +enum arch_variable_type {
> > > + ARCH_VAR_OPAL_KEY  = 0, /* SED Opal Authentication Key */
> > > + ARCH_VAR_OTHER = 1, /* Other type of variable */
> > > + ARCH_VAR_MAX   = 1, /* Maximum type value */
> > > +};
> > > +
> > > +int arch_read_variable(enum arch_variable_type type, char *varname,
> > > +void *varbuf, u_int *varlen);
> > > +int arch_write_variable(enum arch_variable_type type, char *varname,
> > > + void *varbuf, u_int varlen);
> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index f99bf61f8bbc..b90c4cb0dbbb 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > @@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o 
> > > \
> > >bsearch.o find_bit.o llist.o memweight.o kfifo.o \
> > >percpu-refcount.o rhashtable.o \
> > >once.o refcount.o usercopy.o errseq.o bucket_locks.o \
> > > -  generic-radix-tree.o
> > > +  generic-radix-tree.o arch_vars.o
> > >   obj-$(CONFIG_STRING_SELFTEST) += test_string.o
> > >   obj-y += string_helpers.o
> > >   obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
> > > diff --git a/lib/arch_vars.c b/lib/arch_vars.c
> > > new file mode 100644
> > > index ..e6f16d7d09c1
> > > --- /dev/null
> > > +++ b/lib/arch_vars.c
> > > @@ -0,0 +1,25 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Platform variable operations.
> > > + *
> > > + * Copyright (C) 2022 IBM Corporation
> > > + *
> > > + * These are the accessor functions (read/write) for architecture 
> > > specific
> > > + * variables. Specific architectures can provide overrides.
> > > + *
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +
> > > +int __weak arch_read_variable(enum arch_variable_type type, char 
> > > *varname,
> > > +   void *varbuf, u_int *varlen)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +
> > > +int __weak arch_write_variable(enum arch_variable_type type, char 
> > > *varname,
> > > +void *varbuf, u_int varlen)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > -- 
> > Doesn't EFI already have some variables?
> > 
> > And even powernv?
> > 
> > Shouldn't this generalize the already existing variables?
> > 
> > Or mo

Re: [PATCH v3 1/2] lib: generic accessor functions for arch keystore

2022-08-01 Thread Michal Suchánek
Hello,

On Mon, Aug 01, 2022 at 07:34:25AM -0500, gjo...@linux.vnet.ibm.com wrote:
> From: Greg Joyce 
> 
> Generic kernel subsystems may rely on platform specific persistent
> KeyStore to store objects containing sensitive key material. In such case,
> they need to access architecture specific functions to perform read/write
> operations on these variables.
> 
> Define the generic variable read/write prototypes to be implemented by
> architecture specific versions. The default(weak) implementations of
> these prototypes return -EOPNOTSUPP unless overridden by architecture
> versions.
> 
> Signed-off-by: Greg Joyce 
> ---
>  include/linux/arch_vars.h | 23 +++
>  lib/Makefile  |  2 +-
>  lib/arch_vars.c   | 25 +
>  3 files changed, 49 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/arch_vars.h
>  create mode 100644 lib/arch_vars.c
> 
> diff --git a/include/linux/arch_vars.h b/include/linux/arch_vars.h
> new file mode 100644
> index ..9c280ff9432e
> --- /dev/null
> +++ b/include/linux/arch_vars.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Platform variable operations.
> + *
> + * Copyright (C) 2022 IBM Corporation
> + *
> + * These are the accessor functions (read/write) for architecture specific
> + * variables. Specific architectures can provide overrides.
> + *
> + */
> +
> +#include 
> +
> +enum arch_variable_type {
> + ARCH_VAR_OPAL_KEY  = 0, /* SED Opal Authentication Key */
> + ARCH_VAR_OTHER = 1, /* Other type of variable */
> + ARCH_VAR_MAX   = 1, /* Maximum type value */
> +};
> +
> +int arch_read_variable(enum arch_variable_type type, char *varname,
> +void *varbuf, u_int *varlen);
> +int arch_write_variable(enum arch_variable_type type, char *varname,
> + void *varbuf, u_int varlen);
> diff --git a/lib/Makefile b/lib/Makefile
> index f99bf61f8bbc..b90c4cb0dbbb 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
>bsearch.o find_bit.o llist.o memweight.o kfifo.o \
>percpu-refcount.o rhashtable.o \
>once.o refcount.o usercopy.o errseq.o bucket_locks.o \
> -  generic-radix-tree.o
> +  generic-radix-tree.o arch_vars.o
>  obj-$(CONFIG_STRING_SELFTEST) += test_string.o
>  obj-y += string_helpers.o
>  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
> diff --git a/lib/arch_vars.c b/lib/arch_vars.c
> new file mode 100644
> index ..e6f16d7d09c1
> --- /dev/null
> +++ b/lib/arch_vars.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Platform variable operations.
> + *
> + * Copyright (C) 2022 IBM Corporation
> + *
> + * These are the accessor functions (read/write) for architecture specific
> + * variables. Specific architectures can provide overrides.
> + *
> + */
> +
> +#include 
> +#include 
> +
> +int __weak arch_read_variable(enum arch_variable_type type, char *varname,
> +   void *varbuf, u_int *varlen)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +int __weak arch_write_variable(enum arch_variable_type type, char *varname,
> +void *varbuf, u_int varlen)
> +{
> + return -EOPNOTSUPP;
> +}
> -- 

Doesn't EFI already have some variables?

And even powernv?

Shouldn't this generalize the already existing variables?

Or move to powerpc and at least generalize the powerpc ones?
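
For reference, the weak-default mechanism the quoted patch relies on can be sketched in userspace C. This is only an illustration, assuming a GCC or Clang toolchain: `weak_default` is a made-up stand-in for the kernel's `__weak`, `unsigned int` replaces `u_int`, and no architecture override is linked in, so the default is what runs.

```c
#include <errno.h>
#include <stddef.h>

/* Made-up stand-in for the kernel's __weak macro (assumes GCC or Clang). */
#define weak_default __attribute__((weak))

enum arch_variable_type {
	ARCH_VAR_OPAL_KEY = 0,	/* SED Opal Authentication Key */
	ARCH_VAR_OTHER    = 1,	/* Other type of variable */
};

/*
 * Default implementation. A non-weak definition with the same name in
 * another object file would replace this one at link time.
 */
int weak_default arch_read_variable(enum arch_variable_type type, char *varname,
				    void *varbuf, unsigned int *varlen)
{
	(void)type;
	(void)varname;
	(void)varbuf;
	(void)varlen;
	return -EOPNOTSUPP;
}
```

A caller cannot tell from the prototype alone whether it gets the default or an override, which is part of why the question of reusing the existing EFI and powernv variable interfaces matters.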

Thanks

Michal


Re: [PATCH v2 00/10] drm: Add driver for PowerPC OF displays

2022-07-28 Thread Michal Suchánek
Hello,

On Thu, Jul 28, 2022 at 09:13:59PM +1000, Michael Ellerman wrote:
> Thomas Zimmermann  writes:
> > (was: drm: Add driverof PowerPC OF displays)
> >
> > PowerPC's Open Firmware offers a simple display buffer for graphics
> > output. Add ofdrm, a DRM driver for the device. As with the existing
> > simpledrm driver, the graphics hardware is pre-initialized by the
> > firmware. The driver only provides blitting, no actual DRM modesetting
> > is possible.
> 
> Hi Thomas,
> 
> I tried to test this on a 32-bit ppc Mac Mini but didn't have much luck.
> But I'm probably doing something wrong because I'm a graphics noob.
> 
> The machine normally uses CONFIG_DRM_RADEON, so I turned that off, and
> turned DRM_OFDRM on.
> 
> When I boot I get boot messages but only one screen worth, the messages
> don't scroll at all, which is unusual. But I'm not sure if that's
> related to ofdrm or something else.

A somewhat interesting datapoint might be how this works with offb.

> The machine does come up, I can login via SSH. Is there some way to
> start X to exercise the driver from an SSH login?

The startx script provided by the distribution usually works.

It's basically a very convoluted way to do something like

X :0&
DISPLAY=:0 xterm&

Thanks

Michal


Re: [PATCH v2 09/10] drm/ofdrm: Add per-model device function

2022-07-26 Thread Michal Suchánek
Hello,

On Tue, Jul 26, 2022 at 03:38:37PM +0200, Javier Martinez Canillas wrote:
> On 7/20/22 16:27, Thomas Zimmermann wrote:
> > Add a per-model device-function structure in preparation of adding
> > color-management support. Detection of the individual models has been
> > taken from fbdev's offb.
> > 
> > Signed-off-by: Thomas Zimmermann 
> > ---
> 
> Reviewed-by: Javier Martinez Canillas 
> 
> [...]
> 
> > +static bool is_avivo(__be32 vendor, __be32 device)
> > +{
> > +   /* This will match most R5xx */
> > +   return (vendor == 0x1002) &&
> > +  ((device >= 0x7100 && device < 0x7800) || (device >= 0x9400));
> > +}
> 
> Maybe add some constant macros to not have these magic numbers ?

This is based on the existing fbdev implementation's magic numbers:

drivers/video/fbdev/offb.c: ((*did >= 0x7100 && *did < 0x7800) ||

Of course, it would be great if somebody knowledgeable could clarify
those.
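
A sketch of what named constants could look like, purely as an illustration: the range names are hypothetical (nobody in this thread has confirmed what the ranges mean), only the values come from the quoted code, and plain host-endian `uint32_t` is used instead of `__be32`.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0x1002 is the PCI vendor ID of ATI (PCI_VENDOR_ID_ATI in the kernel). */
#define VENDOR_ID_ATI		0x1002

/* Hypothetical names; the values are the ones quoted from offb. */
#define AVIVO_DID_RANGE1_FIRST	0x7100	/* inclusive */
#define AVIVO_DID_RANGE1_END	0x7800	/* exclusive */
#define AVIVO_DID_RANGE2_FIRST	0x9400	/* inclusive, open-ended */

/* Same test as the quoted is_avivo(), with the numbers given names. */
static bool is_avivo(uint32_t vendor, uint32_t device)
{
	if (vendor != VENDOR_ID_ATI)
		return false;

	return (device >= AVIVO_DID_RANGE1_FIRST &&
		device < AVIVO_DID_RANGE1_END) ||
	       device >= AVIVO_DID_RANGE2_FIRST;
}
```

Naming the ranges does not explain them, so this only helps readability if somebody can say what the boundaries correspond to.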

Thanks

Michal


Re: [PATCH] powerpc: Remove the static variable initialisations to 0

2022-07-23 Thread Michal Suchánek
Hello,

On Sat, Jul 23, 2022 at 05:24:36PM +0800, Jason Wang wrote:
> Initialise global and static variable to 0 is always unnecessary.
> Remove the unnecessary initialisations.

Isn't this change also unnecessary?

Initializing to 0 does not affect correctness, or even any kind of
semantics in any way.

The current code is slightly easier to understand.

And changing the code introduces history noise for no gain.
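
To illustrate the point about semantics with a minimal standalone sketch (the variable and function names are made up for the example): C zero-initialises objects with static storage duration, so both forms below start out identical.

```c
#include <stdbool.h>

static int explicit_zero = 0;	/* initialised explicitly */
static int implicit_zero;	/* zero-initialised by the language */

/* Returns true when both objects start out with the same value, zero. */
static bool both_start_at_zero(void)
{
	return explicit_zero == 0 && implicit_zero == 0;
}
```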

Thanks

Michal

> 
> Signed-off-by: Jason Wang 
> ---
>  arch/powerpc/kexec/core_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index c2bea9db1c1e..2407214e3f41 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -135,7 +135,7 @@ notrace void kexec_copy_flush(struct kimage *image)
>  
>  #ifdef CONFIG_SMP
>  
> -static int kexec_all_irq_disabled = 0;
> +static int kexec_all_irq_disabled;
>  
>  static void kexec_smp_down(void *arg)
>  {
> -- 
> 2.35.1
> 


Re: [PATCH 2/2] drm/tiny: Add ofdrm for Open Firmware framebuffers

2022-05-19 Thread Michal Suchánek
On Wed, May 18, 2022 at 10:11:03PM +0100, Mark Cave-Ayland wrote:
> On 18/05/2022 19:30, Thomas Zimmermann wrote:
> 
> > Open Firmware provides basic display output via the 'display' node.
> > DT platform code already provides a device that represents the node's
> > framebuffer. Add a DRM driver for the device. The display mode and
> > color format is pre-initialized by the system's firmware. Runtime
> > modesetting via DRM is not possible. The display is useful during
> > early boot stages or as error fallback.
> > 
> > Similar functionality is already provided by fbdev's offb driver,
> > which is insufficient for modern userspace. The old driver includes
> > support for BootX device tree, which can be found on old 32-bit
> > PowerPC Macintosh systems. If these are still in use, the
> > functionality can be added to ofdrm or implemented in a new
> > driver. As with simepldrm, the fbdev driver cannot be selected is
> > ofdrm is already enabled.
> > 
> > Two noteable points about the driver:
> > 
> >   * Reading the framebuffer aperture from the device tree is not
> > reliable on all systems. Ofdrm takes the heuristics and a comment
> > from offb to pick the correct range.
> > 
> >   * No resource management may be tied to the underlying PCI device.
> > Otherwise the handover to the native driver will fail with a resource
> > conflict. PCI management is therefore done as part of the platform
> > device's cleanup.
> > 
> > The driver has been tested on qemu's ppc64le emulation. The device
> > hand-over has been tested with bochs.
> 
> Thanks for working on this! Have you tried it on qemu-system-sparc and
> qemu-system-sparc64 at all? At least under QEMU I'd expect it to work for
> these platforms too, unless there is a particular dependency on PCI. A

There is an implicit dependency on PCI, and it won't work because it
depends on PPC:

depends on DRM && MMU && PPC

this is what the offb has, too.

I am wondering what is the driver for OF based framebuffer on sparc and
arm but offb clearly isn't with its dependency on PPC.

Thanks

Michal

> couple of comments inline below:
> 
> > Signed-off-by: Thomas Zimmermann 
> > ---
> >   MAINTAINERS   |   1 +
> >   drivers/gpu/drm/tiny/Kconfig  |  12 +
> >   drivers/gpu/drm/tiny/Makefile |   1 +
> >   drivers/gpu/drm/tiny/ofdrm.c  | 748 ++
> >   drivers/video/fbdev/Kconfig   |   1 +
> >   5 files changed, 763 insertions(+)
> >   create mode 100644 drivers/gpu/drm/tiny/ofdrm.c
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 43d833273ae9..090cbe1aa5e3 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6395,6 +6395,7 @@ L:dri-de...@lists.freedesktop.org
> >   S:Maintained
> >   T:git git://anongit.freedesktop.org/drm/drm-misc
> >   F:drivers/gpu/drm/drm_aperture.c
> > +F: drivers/gpu/drm/tiny/ofdrm.c
> >   F:drivers/gpu/drm/tiny/simpledrm.c
> >   F:include/drm/drm_aperture.h
> > diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
> > index 627d637a1e7e..0bc54af42e7f 100644
> > --- a/drivers/gpu/drm/tiny/Kconfig
> > +++ b/drivers/gpu/drm/tiny/Kconfig
> > @@ -51,6 +51,18 @@ config DRM_GM12U320
> >  This is a KMS driver for projectors which use the GM12U320 chipset
> >  for video transfer over USB2/3, such as the Acer C120 mini projector.
> > +config DRM_OFDRM
> > +   tristate "Open Firmware display driver"
> > +   depends on DRM && MMU && PPC
> > +   select DRM_GEM_SHMEM_HELPER
> > +   select DRM_KMS_HELPER
> > +   help
> > + DRM driver for Open Firmware framebuffers.
> > +
> > + This driver assumes that the display hardware has been initialized
> > + by the Open Firmware before the kernel boots. Scanout buffer, size,
> > + and display format must be provided via device tree.
> > +
> >   config DRM_PANEL_MIPI_DBI
> > tristate "DRM support for MIPI DBI compatible panels"
> > depends on DRM && SPI
> > diff --git a/drivers/gpu/drm/tiny/Makefile b/drivers/gpu/drm/tiny/Makefile
> > index 1d9d6227e7ab..76dde89a044b 100644
> > --- a/drivers/gpu/drm/tiny/Makefile
> > +++ b/drivers/gpu/drm/tiny/Makefile
> > @@ -4,6 +4,7 @@ obj-$(CONFIG_DRM_ARCPGU)+= arcpgu.o
> >   obj-$(CONFIG_DRM_BOCHS)   += bochs.o
> >   obj-$(CONFIG_DRM_CIRRUS_QEMU) += cirrus.o
> >   obj-$(CONFIG_DRM_GM12U320)+= gm12u320.o
> > +obj-$(CONFIG_DRM_OFDRM)+= ofdrm.o
> >   obj-$(CONFIG_DRM_PANEL_MIPI_DBI)  += panel-mipi-dbi.o
> >   obj-$(CONFIG_DRM_SIMPLEDRM)   += simpledrm.o
> >   obj-$(CONFIG_TINYDRM_HX8357D) += hx8357d.o
> > diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
> > new file mode 100644
> > index ..aca715b36179
> > --- /dev/null
> > +++ b/drivers/gpu/drm/tiny/ofdrm.c
> > @@ -0,0 +1,748 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > 

Re: [PATCH 2/2] drm/tiny: Add ofdrm for Open Firmware framebuffers

2022-05-18 Thread Michal Suchánek
Hello,

On Wed, May 18, 2022 at 08:30:06PM +0200, Thomas Zimmermann wrote:
> Open Firmware provides basic display output via the 'display' node.
> DT platform code already provides a device that represents the node's
> framebuffer. Add a DRM driver for the device. The display mode and
> color format is pre-initialized by the system's firmware. Runtime
> modesetting via DRM is not possible. The display is useful during
> early boot stages or as error fallback.
> 
> Similar functionality is already provided by fbdev's offb driver,
> which is insufficient for modern userspace. The old driver includes
> support for BootX device tree, which can be found on old 32-bit
> PowerPC Macintosh systems. If these are still in use, the
> functionality can be added to ofdrm or implemented in a new
> driver. As with simepldrm, the fbdev driver cannot be selected is
> ofdrm is already enabled.
> 
> Two noteable points about the driver:
> 
>  * Reading the framebuffer aperture from the device tree is not
> reliable on all systems. Ofdrm takes the heuristics and a comment
> from offb to pick the correct range.
> 
>  * No resource management may be tied to the underlying PCI device.
> Otherwise the handover to the native driver will fail with a resource
> conflict. PCI management is therefore done as part of the platform
> device's cleanup.
> 
> The driver has been tested on qemu's ppc64le emulation. The device
> hand-over has been tested with bochs.
> 
> Signed-off-by: Thomas Zimmermann 
> ---
>  MAINTAINERS   |   1 +
>  drivers/gpu/drm/tiny/Kconfig  |  12 +
>  drivers/gpu/drm/tiny/Makefile |   1 +
>  drivers/gpu/drm/tiny/ofdrm.c  | 748 ++
>  drivers/video/fbdev/Kconfig   |   1 +
>  5 files changed, 763 insertions(+)
>  create mode 100644 drivers/gpu/drm/tiny/ofdrm.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 43d833273ae9..090cbe1aa5e3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6395,6 +6395,7 @@ L:  dri-de...@lists.freedesktop.org
>  S:   Maintained
>  T:   git git://anongit.freedesktop.org/drm/drm-misc
>  F:   drivers/gpu/drm/drm_aperture.c
> +F:   drivers/gpu/drm/tiny/ofdrm.c
>  F:   drivers/gpu/drm/tiny/simpledrm.c
>  F:   include/drm/drm_aperture.h
>  
> diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
> index 627d637a1e7e..0bc54af42e7f 100644
> --- a/drivers/gpu/drm/tiny/Kconfig
> +++ b/drivers/gpu/drm/tiny/Kconfig
> @@ -51,6 +51,18 @@ config DRM_GM12U320
>This is a KMS driver for projectors which use the GM12U320 chipset
>for video transfer over USB2/3, such as the Acer C120 mini projector.
>  
> +config DRM_OFDRM
> + tristate "Open Firmware display driver"
> + depends on DRM && MMU && PPC

Does this build with !PCI?

The driver uses some PCI functions, so it might possibly break with
randconfig. I don't think there are practical !PCI PPC configurations.

Thanks

Michal


Re: [PATCH] powerpc/time: Always set decrementer in timer_interrupt()

2022-04-20 Thread Michal Suchánek
Hello,

On Thu, Apr 21, 2022 at 12:16:57AM +1000, Michael Ellerman wrote:
> This is a partial revert of commit 0faf20a1ad16 ("powerpc/64s/interrupt:
> Don't enable MSR[EE] in irq handlers unless perf is in use").
> 
> Prior to that commit, we always set the decrementer in
> timer_interrupt(), to clear the timer interrupt. Otherwise we could end
> up continuously taking timer interrupts.
> 
> When high res timers are enabled there is no problem seen with leaving
> the decrementer untouched in timer_interrupt(), because it will be
> programmed via hrtimer_interrupt() -> tick_program_event() ->
> clockevents_program_event() -> decrementer_set_next_event().
> 
> However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we

How difficult is it to detect this condition?

Maybe detecting this could be just added?
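
The failure mode described in the quoted commit message can be modelled with a toy simulation. This is a sketch under stated assumptions, not kernel code: `struct toy_cpu`, `toy_set_dec()` and `count_timer_irqs()` are made up, and the model assumes the exception stays pending for as long as the decrementer value is negative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU: only the decrementer register is modelled. */
struct toy_cpu {
	int32_t dec;
};

/* Assumption: the exception is pending for as long as DEC is negative. */
static bool dec_exception_pending(const struct toy_cpu *cpu)
{
	return cpu->dec < 0;
}

/* Made-up stand-in for the kernel's set_dec(). */
static void toy_set_dec(struct toy_cpu *cpu, int32_t val)
{
	cpu->dec = val;
}

/*
 * Run the toy CPU for a small number of ticks and count the timer
 * interrupts taken. With reprogram == false the handler leaves DEC alone.
 */
static unsigned int count_timer_irqs(bool reprogram, unsigned int steps)
{
	struct toy_cpu cpu = { .dec = 2 };
	unsigned int irqs = 0;

	for (unsigned int i = 0; i < steps; i++) {
		cpu.dec--;
		if (!dec_exception_pending(&cpu))
			continue;
		irqs++;
		if (reprogram)
			toy_set_dec(&cpu, INT32_MAX);	/* positive value clears it */
	}
	return irqs;
}
```

In this model a handler that writes a positive value takes one interrupt, while one that leaves the register untouched takes an interrupt on every following tick.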

Thanks

Michal

> see a stall/lockup, because tick_nohz_handler() does not cause a
> reprogram of the decrementer, leading to endless timer interrupts.
> Example trace:
> 
>   [1.898617][T7] Freeing initrd memory: 2624K
>   [   22.680919][C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>   [   22.682281][C1] rcu: 0-: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050
>   [   22.682851][C1]  (detected by 1, t=2102 jiffies, g=-1179, q=476)
>   [   22.683649][C1] Sending NMI from CPU 1 to CPUs 0:
>   [   22.685252][C0] NMI backtrace for cpu 0
>   [   22.685649][C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145
>   [   22.686393][C0] NIP:  c0016d64 LR: c0f6cca4 CTR: c019c6e0
>   [   22.686774][C0] REGS: c2833590 TRAP: 0500   Not tainted  (5.16.0-rc2-00185-g0faf20a1ad16)
>   [   22.687222][C0] MSR:  80009033   CR: 24000222  XER:
>   [   22.688297][C0] CFAR: c000c854 IRQMASK: 0
>   ...
>   [   22.692637][C0] NIP [c0016d64] arch_local_irq_restore+0x174/0x250
>   [   22.694443][C0] LR [c0f6cca4] __do_softirq+0xe4/0x3dc
>   [   22.695762][C0] Call Trace:
>   [   22.696050][C0] [c2833830] [c0f6cc80] __do_softirq+0xc0/0x3dc (unreliable)
>   [   22.697377][C0] [c2833920] [c0151508] __irq_exit_rcu+0xd8/0x130
>   [   22.698739][C0] [c2833950] [c0151730] irq_exit+0x20/0x40
>   [   22.699938][C0] [c2833970] [c0027f40] timer_interrupt+0x270/0x460
>   [   22.701119][C0] [c28339d0] [c00099a8] decrementer_common_virt+0x208/0x210
> 
> Possibly this should be fixed in the lowres timing code, but that would
> be a generic change and could take some time and may not backport
> easily, so for now make the programming of the decrementer unconditional
> again in timer_interrupt() to avoid the stall/lockup.
> 
> Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq 
> handlers unless perf is in use")
> Reported-by: Miguel Ojeda 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kernel/time.c | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index f5cbfe5efd25..f80cce0e3899 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -615,23 +615,22 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
>   return;
>   }
>  
> - /* Conditionally hard-enable interrupts. */
> - if (should_hard_irq_enable()) {
> - /*
> -  * Ensure a positive value is written to the decrementer, or
> -  * else some CPUs will continue to take decrementer exceptions.
> -  * When the PPC_WATCHDOG (decrementer based) is configured,
> -  * keep this at most 31 bits, which is about 4 seconds on most
> -  * systems, which gives the watchdog a chance of catching timer
> -  * interrupt hard lockups.
> -  */
> - if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
> - set_dec(0x7fffffff);
> - else
> - set_dec(decrementer_max);
> + /*
> +  * Ensure a positive value is written to the decrementer, or
> +  * else some CPUs will continue to take decrementer exceptions.
> +  * When the PPC_WATCHDOG (decrementer based) is configured,
> +  * keep this at most 31 bits, which is about 4 seconds on most
> +  * systems, which gives the watchdog a chance of catching timer
> +  * interrupt hard lockups.
> +  */
> + if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
> + set_dec(0x7fffffff);
> + else
> + set_dec(decrementer_max);
>  
> + /* Conditionally hard-enable interrupts. */
> + if (should_hard_irq_enable())
>   do_hard_irq_enable();
> - }
>  
>  #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC)
>   if 

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.18-1 tag

2022-03-30 Thread Michal Suchánek
On Mon, Mar 28, 2022 at 08:07:13PM +1100, Michael Ellerman wrote:
> Linus Torvalds  writes:
> > On Fri, Mar 25, 2022 at 3:25 AM Michael Ellerman  
> > wrote:

> 
> > That said:
> >
> >> There's a series of commits cleaning up function descriptor handling,
> >
> > For some reason I also thought that powerpc had actually moved away
> > from function descriptors, so I'm clearly not keeping up with the
> > times.
> 
> No you're right, we have moved away from them, but not entirely.
> 
> Functions descriptors are still used for 64-bit big endian, but they're
> not used for 64-bit little endian, or 32-bit.

There was a patch to use ABIv2 for ppc64 big endian. I suppose that
would rid us of the function descriptors for good.

Somehow the discussion of that change trailed off without any results.

Maybe it's worth resurrecting?
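
As a reminder of what is at stake, here is a portable model of the difference. It is a sketch only: `struct func_desc` mirrors the three-doubleword ELFv1 descriptor (entry point, TOC base, environment pointer), but the entry is a typed C function pointer so the example runs anywhere, and `add_one()`, `call_elfv1()` and `call_elfv2()` are made-up names.

```c
#include <stdint.h>

typedef int (*entry_fn)(int);

/* Shape of an ELFv1 function descriptor, with a typed entry pointer. */
struct func_desc {
	entry_fn  entry;	/* address of the first instruction */
	uintptr_t toc;		/* TOC base loaded into r2 for the callee */
	uintptr_t env;		/* environment pointer, unused by C */
};

static int add_one(int x)
{
	return x + 1;
}

/* ELFv1 style: a "function pointer" points at a descriptor. */
static int call_elfv1(const struct func_desc *fd, int arg)
{
	return fd->entry(arg);
}

/* ELFv2 style: a function pointer is the entry address itself. */
static int call_elfv2(entry_fn fn, int arg)
{
	return fn(arg);
}
```

The extra indirection in the first form is what the cleanup series has to special-case on 64-bit big endian.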

Thanks

Michal


Re: [PATCH v5 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-02-14 Thread Michal Suchánek
Hello,

On Mon, Feb 14, 2022 at 10:14:16AM -0500, Mimi Zohar wrote:
> Hi Michal,
> 
> On Sun, 2022-02-13 at 21:59 -0500, Mimi Zohar wrote:
> 
> > 
> > On Tue, 2022-01-11 at 12:37 +0100, Michal Suchanek wrote:
> > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > index dea74d7717c0..1cde9b6c5987 100644
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -560,6 +560,22 @@ config KEXEC_FILE
> > >  config ARCH_HAS_KEXEC_PURGATORY
> > > def_bool KEXEC_FILE
> > >  
> > > +config KEXEC_SIG
> > > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> > > +   help
> > > + This option makes kernel signature verification mandatory for

This is actually wrong. KEXEC_SIG makes it mandatory that any signature
that is appended is valid and made by a key that is part of the platform
keyring (which is also wrong, built-in keys should also be accepted).
KEXEC_SIG_FORCE or an IMA policy makes it mandatory that the signature
is present.
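
The distinction can be written down as a small decision function. This models the description in this message only and is not the kernel's code; `enum sig_state` and `kexec_load_allowed()` are hypothetical names.

```c
#include <stdbool.h>

enum sig_state {
	SIG_ABSENT,	/* no appended signature */
	SIG_VALID,	/* signature verifies against a trusted key */
	SIG_INVALID,	/* signature present but does not verify */
};

/*
 * Hypothetical helper: a signature that is present must always verify;
 * a missing one is only rejected when presence is enforced
 * (KEXEC_SIG_FORCE or an IMA policy).
 */
static bool kexec_load_allowed(enum sig_state sig, bool presence_enforced)
{
	switch (sig) {
	case SIG_VALID:
		return true;
	case SIG_INVALID:
		return false;
	case SIG_ABSENT:
		return !presence_enforced;
	}
	return false;
}
```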

> > > + the kexec_file_load() syscall.
> > 
> > When KEXEC_SIG is enabled on other architectures, IMA does not define a
> > kexec 'appraise' policy rule.  Refer to the policy rules in
> > security/ima/ima_efi.c.  Similarly the kexec 'appraise' policy rule in

I suppose you mean security/integrity/ima/ima_efi.c

I also think it's misguided because KEXEC_SIG in itself does not enforce
the signature. KEXEC_SIG_FORCE does.

> > arch/powerpc/kernel/ima_policy.c should not be defined.

I suppose you mean arch/powerpc/kernel/ima_arch.c - see above.


Thanks for taking the time to research and summarize the differences.

> The discussion shouldn't only be about IMA vs. KEXEC_SIG kernel image
> signature verification.  Let's try and reframe the problem a bit.
> 
> 1. Unify and simply the existing kexec signature verification so
> verifying the KEXEC kernel image signature works irrespective of
> signature type - PE, appended signature.
> 
> solution: enable KEXEC_SIG  (This patch set, with the above powerpc IMA
> policy changes.)
> 
> 2. Measure and include the kexec kernel image in a log for attestation,
> if desired.
> 
> solution: enable IMA_ARCH_POLICY 
> - Powerpc: requires trusted boot to be enabled.
> - EFI:   requires  secure boot to be enabled.  The IMA efi policy
> doesn't differentiate between secure and trusted boot.
> 
> 3. Carry the kexec kernel image measurement across kexec, if desired
> and supported on the architecture.
> 
> solution: enable IMA_KEXEC
> 
> Comparison: 
> - Are there any differences between IMA vs. KEXEC_SIG measuring the
> kexec kernel image?
> 
> One of the main differences is "what" is included in the measurement
> list differs.  In both cases, the 'd-ng' field of the IMA measurement
> list template (e.g. ima-ng, ima-sig, ima-modsig) is the full file hash
> including the appended signature.  With IMA and the 'ima-modsig'
> template, an additional hash without the appended signature is defined,
> as well as including the appended signature in the 'sig' field.
> 
> Including the file hash and appended signature in the measurement list
> allows an attestation server, for example, to verify the appended
> signature without having to know the file hash without the signature.

I don't understand this part. Isn't the hash *with* signature always
included, and the distinguishing part about IMA is the hash *without*
signature which is the same irrespective of signature type (PE, appended,
xattr) and irrespective of the key used for signing?
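
A toy illustration of that property, under loud assumptions: FNV-1a stands in for a real cryptographic hash, and `struct signed_image` is a made-up layout of payload bytes followed by `sig_len` signature bytes, not the real appended-signature format.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a stand-in for a real cryptographic hash. */
static uint64_t toy_digest(const unsigned char *buf, size_t len)
{
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 1099511628211ULL;		/* FNV prime */
	}
	return h;
}

/* Made-up layout: payload bytes followed by sig_len signature bytes. */
struct signed_image {
	const unsigned char *data;
	size_t len;
	size_t sig_len;
};

/* Digest over the whole file, signature included. */
static uint64_t digest_with_sig(const struct signed_image *img)
{
	return toy_digest(img->data, img->len);
}

/* Digest over the payload only; independent of who signed it. */
static uint64_t digest_without_sig(const struct signed_image *img)
{
	return toy_digest(img->data, img->len - img->sig_len);
}
```

Two images with the same payload but different signatures share the second digest and differ in the first, which is why the payload-only hash is the useful one for matching or blacklisting a specific binary.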

> Other differences are already included in the Kconfig KEXEC_SIG "Notes"
> section.

Which besides what is already described above would be blacklisting
specific binaries, which is much more effective if you have hashes of
binaries without signature.

Thanks

Michal


Re: No Linux logs when doing `ppc64_cpu --smt=off/8`

2022-02-14 Thread Michal Suchánek
On Mon, Feb 14, 2022 at 01:33:24PM +0100, Paul Menzel wrote:
> Dear Michal,
> 
> 
> Thank you for your reply.
> 
> Am 14.02.22 um 10:43 schrieb Michal Suchánek:
> 
> > On Mon, Feb 14, 2022 at 07:08:07AM +0100, Paul Menzel wrote:
> > > Dear PPC folks,
> > > 
> > > 
> > > On the POWER8 server IBM S822LC running `ppc64_cpu --smt=off` or 
> > > `ppc64_cpu
> > > --smt=8`, Linux 5.17-rc4 does not log anything. I would have expected a
> > > message about the change in number of processing units.
> > 
> > IIRC it was considered too noisy for systems with many CPUs and the
> > message was dropped. You can always check the resulting state with
> > ppc64_cpu or examining sysfs.
> 
> Yes, simple `nproc` suffice, but I was more thinking about, that the Linux
> log is often used for debugging and the changes of amount of processing
> units might be good to have. `ppc64_cpu --smt=off` or `=8` seems to block
> for quite some time, and each thread/processing unit seems to powered
> down/on sequentially, so it takes quite some time and it blocks. So 140
> messages would indeed be quite noise. No idea how `ppc64_cpu` works, and if
> it could log a message at the beginning and end.

Yes, it enables/disables threads one by one. AFAICT the kernel cannot know that
ppc64_cpu will enable/disable more threads later, it can either log each
or none. Rate limiting would not show the whole picture so it's not a
great solution either.
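
A toy model of why rate limiting loses information here. It is only a sketch: the limiter is a made-up burst counter rather than the kernel's ratelimit code, it assumes all events fall within a single interval, and the burst of 10 is an assumed value; the 140 events are the figure mentioned in the quoted message.

```c
/* Toy burst limiter: at most `burst` messages per interval are logged. */
struct toy_ratelimit {
	unsigned int burst;
	unsigned int printed;
	unsigned int suppressed;
};

/* Returns 1 if the message would be logged, 0 if it would be dropped. */
static int toy_ratelimit_ok(struct toy_ratelimit *rl)
{
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->suppressed++;
	return 0;
}

/* Count how many of `events` hotplug messages land in the log. */
static unsigned int logged_messages(unsigned int events, unsigned int burst)
{
	struct toy_ratelimit rl = { .burst = burst };
	unsigned int logged = 0;

	for (unsigned int i = 0; i < events; i++)
		logged += toy_ratelimit_ok(&rl);
	return logged;
}
```

With 140 events and a burst of 10 the log would show the first 10 threads and nothing about the final state.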

Thanks

Michal


Re: No Linux logs when doing `ppc64_cpu --smt=off/8`

2022-02-14 Thread Michal Suchánek
Hello,

On Mon, Feb 14, 2022 at 07:08:07AM +0100, Paul Menzel wrote:
> Dear PPC folks,
> 
> 
> On the POWER8 server IBM S822LC running `ppc64_cpu --smt=off` or `ppc64_cpu
> --smt=8`, Linux 5.17-rc4 does not log anything. I would have expected a
> message about the change in number of processing units.

IIRC it was considered too noisy for systems with many CPUs and the
message was dropped. You can always check the resulting state with
ppc64_cpu or examining sysfs.

Thanks

Michal


Re: [PATCH v5 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2022-02-09 Thread Michal Suchánek
Hello,

On Wed, Feb 09, 2022 at 07:44:15AM +0100, Paul Menzel wrote:
> Dear Michal,
> 
> 
> Thank you for the patch.
> 
> 
> Am 11.01.22 um 12:37 schrieb Michal Suchanek:
> 
> Could you please remove the dot/period at the end of the git commit message
> summary?

Sure

> > Copy the code from s390x
> > 
> > Both powerpc and s390x use appended signature format (as opposed to EFI
> > based patforms using PE format).
> 
> patforms → platforms

Thanks for noticing

> How can this be tested?

Apparently KEXEC_SIG_FORCE is x86 only although the use of the option is
arch neutral:

arch/x86/Kconfig:config KEXEC_SIG_FORCE
kernel/kexec_file.c:if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {

Maybe it should be moved?

I used a patched kernel that enables lockdown in secure boot, and then
verified that signed kernel can be loaded by kexec and unsigned not,
with KEXEC_SIG enabled and IMA_KEXEC disabled.

The lockdown support can be enabled on any platform, and although I
can't find it documented anywhere there appears to be code in kexec_file
to take it into account:
kernel/kexec.c: result = security_locked_down(LOCKDOWN_KEXEC);
kernel/kexec_file.c:security_locked_down(LOCKDOWN_KEXEC))
kernel/module.c:return security_locked_down(LOCKDOWN_MODULE_SIGNATURE);
kernel/params.c:security_locked_down(LOCKDOWN_MODULE_PARAMETERS))
and lockdown can be enabled with a buildtime option, a kernel parameter, or a
debugfs file.

Still for testing lifting KEXEC_SIG_FORCE to some arch-neutral Kconfig file is
probably the simplest option.

The kexec -s option should be used to select kexec_file rather than the old
style kexec which would either fail always or succeed always regardless
of signature.

> > Signed-off-by: Michal Suchanek 
> > ---
> > v3: - Philipp Rudo : Update the comit message with
> >explanation why the s390 code is usable on powerpc.
> >  - Include correct header for mod_check_sig
> >  - Nayna : Mention additional IMA features
> >in kconfig text
> > ---
> >   arch/powerpc/Kconfig| 16 
> >   arch/powerpc/kexec/elf_64.c | 36 
> >   2 files changed, 52 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index dea74d7717c0..1cde9b6c5987 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -560,6 +560,22 @@ config KEXEC_FILE
> >   config ARCH_HAS_KEXEC_PURGATORY
> > def_bool KEXEC_FILE
> > +config KEXEC_SIG
> > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> > +   help
> > + This option makes kernel signature verification mandatory for
> > + the kexec_file_load() syscall.
> > +
> > + In addition to that option, you need to enable signature
> > + verification for the corresponding kernel image type being
> > + loaded in order for this to work.
> > +
> > + Note: on powerpc IMA_ARCH_POLICY also implements kexec'ed kernel
> > + verification. In addition IMA adds kernel hashes to the measurement
> > + list, extends IMA PCR in the TPM, and implements kernel image
> > + blacklist by hash.
> 
> So, what is the takeaway for the user? IMA_ARCH_POLICY is preferred? What is
> the disadvantage, and two implementations(?) needed then? More overhead?

IMA_KEXEC does more than KEXEC_SIG. The overhead is probably not big
unless you are trying to really minimize the kernel code size.

Arguably the simpler implementation has less potential for bugs, too.
Both in code and in user configuration required to enable the feature.

Interestingly IMA_ARCH_POLICY depends on KEXEC_SIG rather than
IMA_KEXEC. Just mind-boggling.

The main problem with IMA_KEXEC from my point of view is it is not portable.
To record the measurements TPM support is required which is not available on
all platforms. It does not support PE so it cannot be used on platforms
that use PE kernel signature format.

> 
> > +
> >   config RELOCATABLE
> > bool "Build a relocatable kernel"
> > depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
> > diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> > index eeb258002d1e..98d1cb5135b4 100644
> > --- a/arch/powerpc/kexec/elf_64.c
> > +++ b/arch/powerpc/kexec/elf_64.c
> > @@ -23,6 +23,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   static void *elf64_load(struct kimage *image, char *kernel_buf,
> > unsigned long kernel_len, char *initrd,
> > @@ -151,7 +152,42 @@ static void *elf64_load(struct kimage *image, char 
> > *kernel_buf,
> > return ret ? ERR_PTR(ret) : NULL;
> >   }
> > +#ifdef CONFIG_KEXEC_SIG
> > +int elf64_verify_sig(const char *kernel, unsigned long kernel_len)
> > +{
> > +   const unsigned long marker_len = sizeof(MODULE_SIG_STRING) - 1;
> > +   struct module_signature *ms;
> > +   unsigned long sig_len;
> 
> Use size_t to match the signature of 

Re: [PATCH v3 4/6] modules: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC

2022-02-08 Thread Michal Suchánek
Hello,

On Thu, Feb 03, 2022 at 11:51:05AM -0800, Luis Chamberlain wrote:
> On Thu, Feb 03, 2022 at 07:05:13AM +, Christophe Leroy wrote:
> > Le 03/02/2022 à 01:01, Luis Chamberlain a écrit :
> > > On Sat, Jan 29, 2022 at 05:02:09PM +, Christophe Leroy wrote:
> > >> diff --git a/kernel/module.c b/kernel/module.c
> > >> index 11f51e17fb9f..f3758115ebaa 100644
> > >> --- a/kernel/module.c
> > >> +++ b/kernel/module.c
> > >> @@ -81,7 +81,9 @@
> > >>   /* If this is set, the section belongs in the init part of the module 
> > >> */
> > >>   #define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
> > >>   
> > >> +#ifndef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> > >>   #define data_layout core_layout
> > >> +#endif
> > >>   
> > >>   /*
> > >>* Mutex protects:
> > >> @@ -111,6 +113,12 @@ static struct mod_tree_root {
> > >>   #define module_addr_min mod_tree.addr_min
> > >>   #define module_addr_max mod_tree.addr_max
> > >>   
> > >> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> > >> +static struct mod_tree_root mod_data_tree __cacheline_aligned = {
> > >> +.addr_min = -1UL,
> > >> +};
> > >> +#endif
> > >> +
> > >>   #ifdef CONFIG_MODULES_TREE_LOOKUP
> > >>   
> > >>   /*
> > >> @@ -186,6 +194,11 @@ static void mod_tree_insert(struct module *mod)
> > >>  __mod_tree_insert(&mod->core_layout.mtn, &mod_tree);
> > >>  if (mod->init_layout.size)
> > >>  __mod_tree_insert(&mod->init_layout.mtn, &mod_tree);
> > >> +
> > >> +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> > >> +mod->data_layout.mtn.mod = mod;
> > >> +__mod_tree_insert(&mod->data_layout.mtn, &mod_data_tree);
> > >> +#endif
> > > 
> > > 
> > > kernel/ directory has quite a few files, module.c is the second to
> > > largest file, and it has tons of stuff. Aaron is doing work to
> > > split things out to make code easier to read and so that its easier
> > > to review changes. See:
> > > 
> > > https://lkml.kernel.org/r/20220130213214.1042497-1-atom...@redhat.com
> > > 
> > > I think this is a good patch example which could benefit from that work.
> > > So I'd much prefer to see that work go in first than this, so to see if
> > > we can make the below changes more compartamentalized.
> > > 
> > > Curious, how much testing has been put into this series?
> > 
> > 
> > I tested the change up to (including) patch 4 to verify it doesn't 
> > introduce regression when not using 
> > CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC,
> 
> > Then I tested with patch 5. I first tried with the 'hello world' test 
> > module. After that I loaded several important modules and checked I 
> > didn't get any regression, both with and without STRICT_MODULES_RWX and 
> > I checked the consistency in /proc/vmallocinfo
> >   /proc/modules /sys/class/modules/*
> 
> I wonder if we have a test for STRICT_MODULES_RWX.
> 
> > I also tested with a hacked module_alloc() to force branch trampolines.
> 
> So to verify that reducing these trampolines actually helps on an
> architecture? I wonder if we can generalize this somehow to let archs
> verify such strategies can help.
> 
> I was hoping for somewhat wider testing, like actual users, etc.
> That does not seem to be the case. So we can get to that by merging this soon
> into modules-next and letting linux-next shake out any issues.
> We are in good time to do this now.
> 
> The kmod tree has tons of tests:
> 
> https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git/
> 
> Can you use that to verify there are no regressions?

openSUSE has the testsuite packaged so it's easy to run on an arbitrary
kernel, but only on ppc64(le) because there is no ppc there anymore.

So yes, it does not regress Book3S/64 as far as the kmod testsuite is
concerned, and building a non-modular s390x kernel also still works, but
that's not saying much.

Thanks

Michal


Re: [PATCH v5 3/6] kexec_file: Don't opencode appended signature verification.

2022-02-03 Thread Michal Suchánek
Hello,

thanks for the review.

On Tue, Jan 25, 2022 at 12:15:56PM -0800, Luis Chamberlain wrote:
> On Tue, Jan 11, 2022 at 12:37:45PM +0100, Michal Suchanek wrote:
> > diff --git a/include/linux/verification.h b/include/linux/verification.h
> > index a655923335ae..32db9287a7b0 100644
> > --- a/include/linux/verification.h
> > +++ b/include/linux/verification.h
> > @@ -60,5 +60,8 @@ extern int verify_pefile_signature(const void *pebuf, 
> > unsigned pelen,
> >enum key_being_used_for usage);
> >  #endif
> >  
> > +int verify_appended_signature(const void *data, unsigned long *len,
> > + struct key *trusted_keys, const char *what);
> > +
> 
> Looks very non-module specific.

Which it is now that the same signature format is used for kernels.

> 
> > diff --git a/kernel/module_signing.c b/kernel/module_signing.c
> > index 8723ae70ea1f..30149969f21f 100644
> > --- a/kernel/module_signing.c
> > +++ b/kernel/module_signing.c
> > @@ -14,32 +14,38 @@
> >  #include 
> >  #include "module-internal.h"
> >  
> > -/*
> > - * Verify the signature on a module.
> > +/**
> > + * verify_appended_signature - Verify the signature on a module with the
> > + * signature marker stripped.
> > + * @data: The data to be verified
> > + * @len: Size of @data.
> > + * @trusted_keys: Keyring to use for verification
> > + * @what: Informational string for log messages
> >   */
> > -int mod_verify_sig(const void *mod, struct load_info *info)
> > +int verify_appended_signature(const void *data, unsigned long *len,
> > + struct key *trusted_keys, const char *what)
> >  {
> > -   struct module_signature ms;
> > -   size_t sig_len, modlen = info->len;
> > +   struct module_signature *ms;
> 
> There goes the abstraction, so why not make this clear where we re-use
> the struct module_signature for various things and call it as it is,
> verify_mod_appended_signature() or some such?

It sounds like the abstraction is actually improved by callers no longer
dealing with struct module_signature when verifying the signature on a
kernel. That is, the structure is misnamed, but it is now hidden behind
an abstraction.

Or am I missing something?

Thanks

Michal


Re: [PATCH v7 1/7] powerpc/pmem: Restrict papr_scm to P8 and above.

2022-01-21 Thread Michal Suchánek
On Fri, Jan 21, 2022 at 02:48:32PM +0530, Aneesh Kumar K.V wrote:
> Michal Suchánek  writes:
> 
> > Hello,
> >
> > On Wed, Jul 01, 2020 at 12:52:29PM +0530, Aneesh Kumar K.V wrote:
> >> The PAPR based virtualized persistent memory devices are only supported on
> >> POWER9 and above. In the followup patch, the kernel will switch the 
> >> persistent
> >> memory cache flush functions to use a new `dcbf` variant instruction. The 
> >> new
> >> instructions even though added in ISA 3.1 works even on P8 and P9 because 
> >> these
> >> are implemented as a variant of existing `dcbf` and `hwsync` and on P8 and
> >> P9 behaves as such.
> >> 
> >> Considering these devices are only supported on P8 and above,  update the 
> >> driver
> >> to prevent a P7-compat guest from using persistent memory devices.
> >> 
> >> We don't update of_pmem driver with the same condition, because, on 
> >> bare-metal,
> >> the firmware enables pmem support only on P9 and above. There the kernel 
> >> depends
> >> on OPAL firmware to restrict exposing persistent memory related device tree
> >> entries on older hardware. of_pmem.ko is written without any arch 
> >> dependency and
> >> we don't want to add ppc64 specific cpu feature check in of_pmem driver.
> >> 
> >> Signed-off-by: Aneesh Kumar K.V 
> >> ---
> >>  arch/powerpc/platforms/pseries/pmem.c | 6 ++
> >>  1 file changed, 6 insertions(+)
> >> 
> >> diff --git a/arch/powerpc/platforms/pseries/pmem.c 
> >> b/arch/powerpc/platforms/pseries/pmem.c
> >> index f860a897a9e0..2347e1038f58 100644
> >> --- a/arch/powerpc/platforms/pseries/pmem.c
> >> +++ b/arch/powerpc/platforms/pseries/pmem.c
> >> @@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
> >>  
> >>  static int pseries_pmem_init(void)
> >>  {
> >> +  /*
> >> +   * Only supported on POWER8 and above.
> >> +   */
> >> +  if (!cpu_has_feature(CPU_FTR_ARCH_207S))
> >> +  return 0;
> >> +
> >
> > This looks superfluous.
> >
> > The hypervisor is responsible for publishing the pmem in devicetree when
> > present, kernel is responsible for using it when supported by the
> > kernel.
> >
> > Or is there a problem that the flush instruction is not available in P7
> > compat mode?
> 
> We want to avoid the usage of persistent memory on p7 compat mode
> because such a guest can LPM migrate to p7 systems. Now ideally I would
> expect hypervisor to avoid such migration, that is a p7 compat mode
> guest running on p10 using persistent memory migrating to p7
> (considering p7 never really had support for persistent memory).

Yes, I would expect the hypervisor to prevent migration to a host that
does not have all the hardware that the guest uses. It could still
migrate to P8 or whatever in compat mode.

> 
> There was also the complexity w.r.t what instructions the userspace will
> use. So it was discussed at that point that we could comfortably state
> and prevent the usage of persistent memory on p7 and below. 

But is that arbitrary or does POWER7 not support the pmem sync instructions?

If that is true then how is POWER7 compat mode behaving WRT those
instructions?

Thanks

Michal


Re: [PATCH v7 1/7] powerpc/pmem: Restrict papr_scm to P8 and above.

2022-01-21 Thread Michal Suchánek
Hello,

On Wed, Jul 01, 2020 at 12:52:29PM +0530, Aneesh Kumar K.V wrote:
> The PAPR based virtualized persistent memory devices are only supported on
> POWER9 and above. In the followup patch, the kernel will switch the persistent
> memory cache flush functions to use a new `dcbf` variant instruction. The new
> instructions even though added in ISA 3.1 works even on P8 and P9 because 
> these
> are implemented as a variant of existing `dcbf` and `hwsync` and on P8 and
> P9 behaves as such.
> 
> Considering these devices are only supported on P8 and above,  update the 
> driver
> to prevent a P7-compat guest from using persistent memory devices.
> 
> We don't update of_pmem driver with the same condition, because, on 
> bare-metal,
> the firmware enables pmem support only on P9 and above. There the kernel 
> depends
> on OPAL firmware to restrict exposing persistent memory related device tree
> entries on older hardware. of_pmem.ko is written without any arch dependency 
> and
> we don't want to add ppc64 specific cpu feature check in of_pmem driver.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/platforms/pseries/pmem.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/pmem.c 
> b/arch/powerpc/platforms/pseries/pmem.c
> index f860a897a9e0..2347e1038f58 100644
> --- a/arch/powerpc/platforms/pseries/pmem.c
> +++ b/arch/powerpc/platforms/pseries/pmem.c
> @@ -147,6 +147,12 @@ const struct of_device_id drc_pmem_match[] = {
>  
>  static int pseries_pmem_init(void)
>  {
> + /*
> +  * Only supported on POWER8 and above.
> +  */
> + if (!cpu_has_feature(CPU_FTR_ARCH_207S))
> + return 0;
> +

This looks superfluous.

The hypervisor is responsible for publishing the pmem in devicetree when
present, kernel is responsible for using it when supported by the
kernel.

Or is there a problem that the flush instruction is not available in P7
compat mode?

Even then volatile regions should still work.

Thanks

Michal


Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2021-12-13 Thread Michal Suchánek
Hello,

On Sun, Dec 12, 2021 at 07:46:53PM -0500, Nayna wrote:
> 
> On 11/25/21 13:02, Michal Suchanek wrote:
> > Copy the code from s390x
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> >   arch/powerpc/Kconfig| 11 +++
> >   arch/powerpc/kexec/elf_64.c | 36 
> >   2 files changed, 47 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index ac0c515552fd..ecc1227a77f1 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -561,6 +561,17 @@ config KEXEC_FILE
> >   config ARCH_HAS_KEXEC_PURGATORY
> > def_bool KEXEC_FILE
> > 
> > +config KEXEC_SIG
> > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> > +   help
> > + This option makes kernel signature verification mandatory for
> > + the kexec_file_load() syscall.
> > +
> 
> Resending my last response as looks like it didn't go through mailing list
> because of some wrong formatting. My apologies to those who are receiving it
> twice.
> 
> Since powerpc also supports IMA_ARCH_POLICY for kernel image signature
> verification, please include the following:
> 
> "An alternative implementation for the powerpc arch is IMA_ARCH_POLICY. It
> verifies the appended kernel image signature and additionally includes both
> the signed and unsigned file hashes in the IMA measurement list, extends the
> IMA PCR in the TPM, and prevents blacklisted binary kernel images from being
> kexec'd."

It also does blacklist based on the file hash?

There is a downstream patch that adds the support for the module
signatures, and when the code is reused for KEXEC_SIG the blacklist
also applies to it.

Which kind of shows that people really want to use the IMA features but
with no support on some major architectures it's not going to work.

Thanks

Michal


Re: [PATCH v2 6/6] module: Move duplicate mod_check_sig users code to mod_parse_sig

2021-12-13 Thread Michal Suchánek
Hello,

On Tue, Dec 07, 2021 at 05:10:34PM +0100, Philipp Rudo wrote:
> Hi Michal,
> 
> On Thu, 25 Nov 2021 19:02:44 +0100
> Michal Suchanek  wrote:
> 
> > Multiple users of mod_check_sig check for the marker, then call
> > mod_check_sig, extract signature length, and remove the signature.
> > 
> > Put this code in one place together with mod_check_sig.
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> >  include/linux/module_signature.h|  1 +
> >  kernel/module_signature.c   | 56 -
> >  kernel/module_signing.c | 26 +++---
> >  security/integrity/ima/ima_modsig.c | 22 ++--
> >  4 files changed, 63 insertions(+), 42 deletions(-)
> > 
> > diff --git a/include/linux/module_signature.h 
> > b/include/linux/module_signature.h
> > index 7eb4b00381ac..1343879b72b3 100644
> > --- a/include/linux/module_signature.h
> > +++ b/include/linux/module_signature.h
> > @@ -42,5 +42,6 @@ struct module_signature {
> >  
> >  int mod_check_sig(const struct module_signature *ms, size_t file_len,
> >   const char *name);
> > +int mod_parse_sig(const void *data, size_t *len, size_t *sig_len, const 
> > char *name);
> >  
> >  #endif /* _LINUX_MODULE_SIGNATURE_H */
> > diff --git a/kernel/module_signature.c b/kernel/module_signature.c
> > index 00132d12487c..784b40575ee4 100644
> > --- a/kernel/module_signature.c
> > +++ b/kernel/module_signature.c
> > @@ -8,14 +8,36 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > +/**
> > + * mod_check_sig_marker - check that the given data has signature marker 
> > at the end
> > + *
> > + * @data:  Data with appended signature
> > + * @len:   Length of data. Signature marker length is subtracted on 
> > success.
> > + */
> > +static inline int mod_check_sig_marker(const void *data, size_t *len)
> 
> I personally don't like it when a function has a "check" in its name
> as it doesn't describe what the function is checking for. For me

It is consistent with mod_check_sig

> mod_has_sig_marker is much more precise. I would use that instead.

It actually would not because it does more than that.
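To illustrate the point, here is a minimal userspace sketch of what the
helper in the quoted patch does. It is not the kernel code itself: the
name check_sig_marker() is hypothetical, and SIG_MARKER is assumed to be
the same string the kernel defines as MODULE_SIG_STRING. Besides testing
for the marker, the function also shortens *len on success, which is the
part a "has" name would hide.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the kernel's MODULE_SIG_STRING. */
#define SIG_MARKER "~Module signature appended~\n"

/*
 * Userspace illustration only (hypothetical name).
 * Returns 0 and shrinks *len by the marker length when the data ends
 * with the marker, -ENODATA otherwise.
 */
int check_sig_marker(const void *data, size_t *len)
{
	const size_t markerlen = sizeof(SIG_MARKER) - 1;
	const unsigned char *p = data;

	/* Too short to carry the marker at all. */
	if (markerlen > *len)
		return -ENODATA;

	/* The marker must be the very last bytes of the data. */
	if (memcmp(p + *len - markerlen, SIG_MARKER, markerlen))
		return -ENODATA;

	/* Side effect: the marker is stripped from the reported length. */
	*len -= markerlen;
	return 0;
}
```

A caller passes the full buffer length and, on success, gets back the
length of everything that precedes the marker.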

Thanks

Michal

> 
> Thanks
> Philipp
> 
> > +{
> > +   const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
> > +
> > +   if (markerlen > *len)
> > +   return -ENODATA;
> > +
> > +   if (memcmp(data + *len - markerlen, MODULE_SIG_STRING,
> > +  markerlen))
> > +   return -ENODATA;
> > +
> > +   *len -= markerlen;
> > +   return 0;
> > +}
> > +
> >  /**
> >   * mod_check_sig - check that the given signature is sane
> >   *
> >   * @ms:Signature to check.
> > - * @file_len:  Size of the file to which @ms is appended.
> > + * @file_len:  Size of the file to which @ms is appended (without the 
> > marker).
> >   * @name:  What is being checked. Used for error messages.
> >   */
> >  int mod_check_sig(const struct module_signature *ms, size_t file_len,
> > @@ -44,3 +66,35 @@ int mod_check_sig(const struct module_signature *ms, 
> > size_t file_len,
> >  
> > return 0;
> >  }
> > +
> > +/**
> > + * mod_parse_sig - check that the given signature is sane and determine 
> > signature length
> > + *
> > + * @data:  Data with appended signature.
> > + * @len:   Length of data. Signature and marker length is subtracted on 
> > success.
> > + * @sig_len:   Length of signature. Filled on success.
> > + * @name:  What is being checked. Used for error messages.
> > + */
> > +int mod_parse_sig(const void *data, size_t *len, size_t *sig_len, const 
> > char *name)
> > +{
> > +   const struct module_signature *sig;
> > +   int rc;
> > +
> > +   rc = mod_check_sig_marker(data, len);
> > +   if (rc)
> > +   return rc;
> > +
> > +   if (*len < sizeof(*sig))
> > +   return -ENODATA;
> > +
> > +   sig = (const struct module_signature *)(data + (*len - sizeof(*sig)));
> > +
> > +   rc = mod_check_sig(sig, *len, name);
> > +   if (rc)
> > +   return rc;
> > +
> > +   *sig_len = be32_to_cpu(sig->sig_len);
> > +   *len -= *sig_len + sizeof(*sig);
> > +
> > +   return 0;
> > +}
> > diff --git a/kernel/module_signing.c b/kernel/module_signing.c
> > index cef72a6f6b5d..02bbca90f467 100644
> > --- a/kernel/module_signing.c
> > +++ b/kernel/module_signing.c
> > @@ -25,35 +25,17 @@ int verify_appended_signature(const void *data, size_t 
> > *len,
> >   struct key *trusted_keys,
> >   enum key_being_used_for purpose)
> >  {
> > -   const unsigned long markerlen = sizeof(MODULE_SIG_STRING) - 1;
> > struct module_signature ms;
> > -   size_t sig_len, modlen = *len;
> > +   size_t sig_len;
> > int ret;
> >  
> > -   pr_devel("==>%s %s(,%zu)\n", __func__, key_being_used_for[purpose], 
> > modlen);  
> > +   pr_devel("==>%s %s(,%zu)\n", __func__, key_being_used_for[purpose], 
> > *len);
> >  
> > -   if (markerlen > modlen)
> > -   return -ENODATA;
> > -
> > -   if 

Re: [PATCH v2 0/6] KEXEC_SIG with appended signature

2021-12-09 Thread Michal Suchánek
Hello,

On Wed, Dec 08, 2021 at 08:50:54PM -0500, Nayna wrote:
> 
> On 11/25/21 13:02, Michal Suchanek wrote:
> > Hello,
> 
> Hi Michael,
> 
> > 
> > This is a resend of the KEXEC_SIG patchset.
> > 
> > The first patch is new because it's a cleanup that does not require any
> > change to the module verification code.
> > 
> > The second patch is the only one that is intended to change any
> > functionality.
> > 
> > The rest only deduplicates code but I did not receive any review on that
> > part so I don't know if it's desirable as implemented.
> > 
> > The first two patches can be applied separately without the rest.
> 
> Patch 2 fails to apply on v5.16-rc4. Can you please also include git
> tree/branch while posting the patches ?

Sorry, I did not have a clean base and the Kconfig had another change.

Here is a tree with the changes applied:
https://github.com/hramrach/kernel/tree/kexec_sig

> 
> Secondly, I see that you add the powerpc support in Patch 2 and then modify
> it again in Patch 5 after cleanup. Why not add the support for powerpc after
> the clean up ? This will reduce some rework and also probably simplify
> patches.

That's because I don't know if the later patches will be accepted. By
queueing this patch first it can be applied standalone to ppc tree
without regard for the other patches. It's a copy of the s390 code so it
needs the same rework - not really adding complexity.

Thanks

Michal


Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2021-12-09 Thread Michal Suchánek
Hello,

On Wed, Dec 08, 2021 at 08:51:47PM -0500, Nayna wrote:
> 
> On 11/25/21 13:02, Michal Suchanek wrote:
> > Copy the code from s390x
> > 
> > Signed-off-by: Michal Suchanek
> > ---
> >   arch/powerpc/Kconfig| 11 +++
> >   arch/powerpc/kexec/elf_64.c | 36 
> >   2 files changed, 47 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index ac0c515552fd..ecc1227a77f1 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -561,6 +561,17 @@ config KEXEC_FILE
> >   config ARCH_HAS_KEXEC_PURGATORY
> > def_bool KEXEC_FILE
> > 
> > +config KEXEC_SIG
> > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> 
> After manually applying the patch, the build is failing with the following
> error:
> 
> build failed with error "arch/powerpc/kexec/elf_64.o: In function
> `elf64_verify_sig':
> /root/kernel/linus/linux/arch/powerpc/kexec/elf_64.c:160: undefined
> reference to `verify_appended_signature'"

This patch does not add a call to verify_appended_signature.

Maybe you applied the following patch as well?

Thanks

Michal


Re: [PATCH v2 0/6] KEXEC_SIG with appended signature

2021-12-07 Thread Michal Suchánek
On Tue, Dec 07, 2021 at 05:10:14PM +0100, Philipp Rudo wrote:
> Hi Michal,
> 
> I finally had the time to take a closer look at the series. Except for
> the nit in patch 4 and my personal preference in patch 6 the code looks
> good to me.
> 
> What I don't like are the commit messages on the first commits. In my
> opinion they are so short that they are almost useless. For example in
> patch 2 there is absolutely no explanation why you can simply copy the
> s390 over to ppc.

They use the same signature format. I suppose I can add a note saying
that.
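For reference, the shared format can be sketched roughly as follows: the
image is followed by the signature, then a fixed-size descriptor, then
the marker string. This is a userspace illustration only. struct
sig_trailer and parse_sig_trailer() are hypothetical names that mirror
struct module_signature and the tail of the proposed mod_parse_sig(),
and the additional sanity checks done by mod_check_sig() (id type,
zeroed fields) are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the kernel's struct module_signature (12 bytes). */
struct sig_trailer {
	uint8_t algo;
	uint8_t hash;
	uint8_t id_type;
	uint8_t signer_len;
	uint8_t key_id_len;
	uint8_t pad[3];
	uint8_t sig_len_be[4];	/* signature length, big-endian */
};

/*
 * Hypothetical helper. data/len describe a blob whose marker string has
 * already been stripped, laid out as: payload | signature | trailer.
 * On success *len becomes the payload length and *sig_len the signature
 * length.
 */
int parse_sig_trailer(const void *data, size_t *len, size_t *sig_len)
{
	const unsigned char *p = data;
	struct sig_trailer t;
	size_t n;

	if (*len < sizeof(t))
		return -ENODATA;

	/* The descriptor sits at the very end of the remaining data. */
	memcpy(&t, p + *len - sizeof(t), sizeof(t));

	n = ((size_t)t.sig_len_be[0] << 24) |
	    ((size_t)t.sig_len_be[1] << 16) |
	    ((size_t)t.sig_len_be[2] << 8) |
	     (size_t)t.sig_len_be[3];

	/* The signature must fit in front of the descriptor. */
	if (n >= *len - sizeof(t))
		return -EBADMSG;

	*sig_len = n;
	*len -= n + sizeof(t);
	return 0;
}
```

Since nothing in this layout is architecture specific, the same parsing
applies to an s390 or a powerpc kernel image.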

> Or in patch 3 you are silently changing the error
> code in kexec from EKEYREJECTED to ENODATA. So I would appreciate it if

Not sure what I should do about this. The different implementations use
different random error codes, and when they are unified the error code
clearly changes for one or the other.

Does anything depend on a particular error code returned?

Thanks

Michal

> you could improve them a little.
> 
> Thanks
> Philipp
> 
> On Thu, 25 Nov 2021 19:02:38 +0100
> Michal Suchanek  wrote:
> 
> > Hello,
> > 
> > This is a resend of the KEXEC_SIG patchset.
> > 
> > The first patch is new because it's a cleanup that does not require any
> > change to the module verification code.
> > 
> > The second patch is the only one that is intended to change any
> > functionality.
> > 
> > The rest only deduplicates code but I did not receive any review on that
> > part so I don't know if it's desirable as implemented.
> > 
> > The first two patches can be applied separately without the rest.
> > 
> > Thanks
> > 
> > Michal
> > 
> > Michal Suchanek (6):
> >   s390/kexec_file: Don't opencode appended signature check.
> >   powerpc/kexec_file: Add KEXEC_SIG support.
> >   kexec_file: Don't opencode appended signature verification.
> >   module: strip the signature marker in the verification function.
> >   module: Use key_being_used_for for log messages in
> > verify_appended_signature
> >   module: Move duplicate mod_check_sig users code to mod_parse_sig
> > 
> >  arch/powerpc/Kconfig | 11 +
> >  arch/powerpc/kexec/elf_64.c  | 14 ++
> >  arch/s390/kernel/machine_kexec_file.c| 42 ++
> >  crypto/asymmetric_keys/asymmetric_type.c |  1 +
> >  include/linux/module_signature.h |  1 +
> >  include/linux/verification.h |  4 ++
> >  kernel/module-internal.h |  2 -
> >  kernel/module.c  | 12 +++--
> >  kernel/module_signature.c| 56 +++-
> >  kernel/module_signing.c  | 33 +++---
> >  security/integrity/ima/ima_modsig.c  | 22 ++
> >  11 files changed, 113 insertions(+), 85 deletions(-)
> > 
> 


Re: [PATCH v2 0/6] KEXEC_SIG with appended signature

2021-12-01 Thread Michal Suchánek
Hello,

On Wed, Dec 01, 2021 at 10:37:47AM +0800, Baoquan He wrote:
> Hi,
> 
> On 11/25/21 at 07:02pm, Michal Suchanek wrote:
> > Hello,
> > 
> > This is a resend of the KEXEC_SIG patchset.
> > 
> > The first patch is new because it's a cleanup that does not require any
> > change to the module verification code.
> > 
> > The second patch is the only one that is intended to change any
> > functionality.
> > 
> > The rest only deduplicates code but I did not receive any review on that
> > part so I don't know if it's desirable as implemented.
> 
> Do you have the link of your 1st version?

This is the previous version:
https://lore.kernel.org/lkml/cover.1635948742.git.msucha...@suse.de/

Thanks

Michal

> And after going through the whole series, it doesn't tell what this
> patch series intends to do in cover-letter or patch log.
> 
> Thanks
> Baoquan
> 
> > 
> > The first two patches can be applied separately without the rest.
> > 
> > Thanks
> > 
> > Michal
> > 
> > Michal Suchanek (6):
> >   s390/kexec_file: Don't opencode appended signature check.
> >   powerpc/kexec_file: Add KEXEC_SIG support.
> >   kexec_file: Don't opencode appended signature verification.
> >   module: strip the signature marker in the verification function.
> >   module: Use key_being_used_for for log messages in
> > verify_appended_signature
> >   module: Move duplicate mod_check_sig users code to mod_parse_sig
> > 
> >  arch/powerpc/Kconfig | 11 +
> >  arch/powerpc/kexec/elf_64.c  | 14 ++
> >  arch/s390/kernel/machine_kexec_file.c| 42 ++
> >  crypto/asymmetric_keys/asymmetric_type.c |  1 +
> >  include/linux/module_signature.h |  1 +
> >  include/linux/verification.h |  4 ++
> >  kernel/module-internal.h |  2 -
> >  kernel/module.c  | 12 +++--
> >  kernel/module_signature.c| 56 +++-
> >  kernel/module_signing.c  | 33 +++---
> >  security/integrity/ima/ima_modsig.c  | 22 ++
> >  11 files changed, 113 insertions(+), 85 deletions(-)
> > 
> > -- 
> > 2.31.1
> > 
> > 
> > ___
> > kexec mailing list
> > ke...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
> > 
> 


Re: [PATCH v3 0/4] powerpc: watchdog fixes

2021-11-25 Thread Michal Suchánek
Hello,

On Thu, Nov 25, 2021 at 04:11:03PM +0100, Laurent Dufour wrote:
> On 25/11/2021, 10:36:43, Michael Ellerman wrote:
> > On Wed, 10 Nov 2021 12:50:52 +1000, Nicholas Piggin wrote:
> >> These are some watchdog fixes and improvements, in particular a
> >> deadlock between the wd_smp_lock and console lock when the watchdog
> >> fires, found by Laurent.
> >>
> >> Thanks,
> >> Nick
> >>
> >> [...]
> > 
> > Applied to powerpc/next.
> > 
> > [1/4] powerpc/watchdog: Fix missed watchdog reset due to memory ordering 
> > race
> >   
> > https://git.kernel.org/powerpc/c/5dad4ba68a2483fc80d70b9dc90bbe16e1f27263
> > [2/4] powerpc/watchdog: tighten non-atomic read-modify-write access
> >   
> > https://git.kernel.org/powerpc/c/858c93c31504ac1507084493d7eafbe7e2302dc2
> > [3/4] powerpc/watchdog: Avoid holding wd_smp_lock over printk and 
> > smp_send_nmi_ipi
> >   
> > https://git.kernel.org/powerpc/c/76521c4b0291ad25723638ade5a0ff4d5f659771
> > [4/4] powerpc/watchdog: read TB close to where it is used
> >   
> > https://git.kernel.org/powerpc/c/1f01bf90765fa5f88fbae452c131c1edf5cda7ba
> > 
> > cheers
> > 
> 
> Hi Michael,
> 
> This series has been superseded by this series (v4)
> http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=272865
> 
> Do you plan to apply that v4?

It has been fixed up in

https://lore.kernel.org/linuxppc-dev/20211125103346.1188958-1-npig...@gmail.com/

Thanks

Michal


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-24 Thread Michal Suchánek
On Wed, Nov 24, 2021 at 08:10:10AM -0500, Mimi Zohar wrote:
> On Wed, 2021-11-24 at 12:09 +0100, Philipp Rudo wrote:
> > Now Michal wants to adapt KEXEC_SIG for ppc too so distros can rely on all
> > architectures using the same mechanism and thus reduce maintenance cost.
> > On the way there he even makes some absolutely reasonable improvements
> > for everybody.
> > 
> > Why is that so controversial? What is the real problem that should be
> > discussed here?
> 
> Nothing is controversial with what Michal wants to do.  I've already
> said, "As for adding KEXEC_SIG appended signature support on PowerPC
> based on the s390 code, it sounds reasonable."

Ok, I will resend the series with the arch-specific changes first to be
independent of the core cleanup.

Thanks

Michal


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-19 Thread Michal Suchánek
Hello,

On Thu, Nov 18, 2021 at 05:34:01PM -0500, Nayna wrote:
> 
> On 11/16/21 04:53, Michal Suchánek wrote:
> > On Mon, Nov 15, 2021 at 06:53:53PM -0500, Nayna wrote:
> > > On 11/12/21 03:30, Michal Suchánek wrote:
> > > > Hello,
> > > > 
> > > > On Thu, Nov 11, 2021 at 05:26:41PM -0500, Nayna wrote:
> > > > > On 11/8/21 07:05, Michal Suchánek wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > The other part is that distributions apply 'lockdown' patches that 
> > > > > > change
> > > > > > the security policy depending on secure boot status which were 
> > > > > > rejected
> > > > > > by upstream which only hook into the _SIG options, and not into the 
> > > > > > IMA_
> > > > > > options. Of course, I expect this to change when the IMA options are
> > > > > > universally available across architectures and the support picked 
> > > > > > up by
> > > > > > distributions.
> > > > > > 
> > > > > > Which brings the third point: IMA features vary across 
> > > > > > architectures,
> > > > > > and KEXEC_SIG is more common than IMA_KEXEC.
> > > > > > 
> > > > > > config/arm64/default:CONFIG_HAVE_IMA_KEXEC=y
> > > > > > config/ppc64le/default:CONFIG_HAVE_IMA_KEXEC=y
> > > > > > 
> > > > > > config/arm64/default:CONFIG_KEXEC_SIG=y
> > > > > > config/s390x/default:CONFIG_KEXEC_SIG=y
> > > > > > config/x86_64/default:CONFIG_KEXEC_SIG=y
> > > > > > 
> > > > > > KEXEC_SIG makes it much easier to get uniform features across
> > > > > > architectures.
> > > > > Architectures use KEXEC_SIG vs IMA_KEXEC based on their requirement.
> > > > > IMA_KEXEC is for the kernel images signed using sign-file (appended
> > > > > signatures, not PECOFF), provides measurement along with 
> > > > > verification, and
> > > > That's certainly not the case. S390 uses appended signatures with
> > > > KEXEC_SIG, arm64 uses PECOFF with both KEXEC_SIG and IMA_KEXEC.
> > > Yes, S390 uses appended signature, but they also do not support
> > > measurements.
> > > 
> > > On the other hand for arm64/x86, PECOFF works only with KEXEC_SIG. Look at
> > > the KEXEC_IMAGE_VERIFY_SIG config dependencies in arch/arm64/Kconfig and
> > > KEXEC_BZIMAGE_VERIFY_SIG config dependencies in arch/x86/Kconfig. Now, if
> > > KEXEC_SIG is not enabled, then IMA appraisal policies are enforced if 
> > > secure
> > > boot is enabled, refer to security/integrity/ima_efi.c . IMA would fail
> > > verification if kernel is not signed with module sig appended signatures 
> > > or
> > > signature verification fails.
> > > 
> > > In short, IMA is used to enforce the existence of a policy if secure boot 
> > > is
> > > enabled. If they don't support module sig appended signatures, by 
> > > definition
> > > it fails. Thus PECOFF doesn't work with both KEXEC_SIG and IMA_KEXEC, but
> > > only with KEXEC_SIG.
> > Then IMA_KEXEC is a no-go. It is not supported on all architectures and
> > it cannot be supported in principle because it does not support PECOFF
> > which is needed to boot the kernel on EFI platforms. To get feature
> > parity across architectures KEXEC_SIG is required.
> 
> I would not say "a no-go", it is based on user requirements.
> 
> The key takeaway from this discussion is that both KEXEC_SIG and IMA_KEXEC
> support functionality with some small degree of overlap, and that
> documenting the differences is needed.  This will help kernel consumers to
> understand the difference and enable the appropriate functionality for their
> environment.

Maybe I was not clear enough. If you happen to focus on an architecture
that supports IMA fully it's great.

My point of view is that of maintaining multiple architectures. Both end
users and people concerned with security are rarely familiar with
architecture specifics. Portability of documentation and debugging
instructions across architectures is a concern.

IMA has a large number of options with varying availability across
architectures for no apparent reason. The situation is complex and hard
to grasp.

In comparison the *_SIG options are widely available. The missing
support for KEXEC_SIG on POWER is trivial to add by copying it from s390.
With that all the documentation that exists already is also trivially
applicable to POWER. Any additional code cleanup is a bonus but not
really needed to enable the kexec lockdown on POWER.

Thanks

Michal


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-16 Thread Michal Suchánek
On Mon, Nov 15, 2021 at 06:53:53PM -0500, Nayna wrote:
> 
> On 11/12/21 03:30, Michal Suchánek wrote:
> > Hello,
> > 
> > On Thu, Nov 11, 2021 at 05:26:41PM -0500, Nayna wrote:
> > > On 11/8/21 07:05, Michal Suchánek wrote:
> > > > Hello,
> > > > 

> > > > The other part is that distributions apply 'lockdown' patches that 
> > > > change
> > > > the security policy depending on secure boot status which were rejected
> > > > by upstream which only hook into the _SIG options, and not into the IMA_
> > > > options. Of course, I expect this to change when the IMA options are
> > > > universally available across architectures and the support picked up by
> > > > distributions.
> > > > 
> > > > Which brings the third point: IMA features vary across architectures,
> > > > and KEXEC_SIG is more common than IMA_KEXEC.
> > > > 
> > > > config/arm64/default:CONFIG_HAVE_IMA_KEXEC=y
> > > > config/ppc64le/default:CONFIG_HAVE_IMA_KEXEC=y
> > > > 
> > > > config/arm64/default:CONFIG_KEXEC_SIG=y
> > > > config/s390x/default:CONFIG_KEXEC_SIG=y
> > > > config/x86_64/default:CONFIG_KEXEC_SIG=y
> > > > 
> > > > KEXEC_SIG makes it much easier to get uniform features across
> > > > architectures.
> > > Architectures use KEXEC_SIG vs IMA_KEXEC based on their requirement.
> > > IMA_KEXEC is for the kernel images signed using sign-file (appended
> > > signatures, not PECOFF), provides measurement along with verification, and
> > That's certainly not the case. S390 uses appended signatures with
> > KEXEC_SIG, arm64 uses PECOFF with both KEXEC_SIG and IMA_KEXEC.
> 
> Yes, S390 uses appended signature, but they also do not support
> measurements.
> 
> On the other hand for arm64/x86, PECOFF works only with KEXEC_SIG. Look at
> the KEXEC_IMAGE_VERIFY_SIG config dependencies in arch/arm64/Kconfig and
> KEXEC_BZIMAGE_VERIFY_SIG config dependencies in arch/x86/Kconfig. Now, if
> KEXEC_SIG is not enabled, then IMA appraisal policies are enforced if secure
> boot is enabled, refer to security/integrity/ima_efi.c . IMA would fail
> verification if kernel is not signed with module sig appended signatures or
> signature verification fails.
> 
> In short, IMA is used to enforce the existence of a policy if secure boot is
> enabled. If they don't support module sig appended signatures, by definition
> it fails. Thus PECOFF doesn't work with both KEXEC_SIG and IMA_KEXEC, but
> only with KEXEC_SIG.

Then IMA_KEXEC is a no-go. It is not supported on all architectures and
it cannot, in principle, be supported because it does not support PECOFF
which is needed to boot the kernel on EFI platforms. To get feature
parity across architectures KEXEC_SIG is required.

> > 
> > > is tied to secureboot state of the system at boot time.
> > In distributions it's also the case with KEXEC_SIG, it's only upstream
> > where this is different. I don't know why Linux upstream has rejected
> > this support for KEXEC_SIG.
> > 
> > Anyway, sounds like the difference is that IMA provides measurement but
> > if you don't use it it does not make any difference except more complex
> > code.
> I am unsure what do you mean by "complex code" here. Can you please
> elaborate ? IMA policies support for secureboot already exists and can be
> used as it is without adding any extra work as in
> arch/powerpc/kernel/ima_arch.c.

The code exists but using it to replace KEXEC_SIG also requires
understanding the code and the implications of using it. At a glance the
IMA codebase is much bigger and more convoluted compared to KEXEC_SIG
and MODULE_SIG.

Thanks

Michal


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-12 Thread Michal Suchánek
Hello,

On Thu, Nov 11, 2021 at 05:26:41PM -0500, Nayna wrote:
> 
> On 11/8/21 07:05, Michal Suchánek wrote:
> > Hello,
> > 
> > On Mon, Nov 08, 2021 at 09:18:56AM +1100, Daniel Axtens wrote:
> > > Michal Suchánek  writes:
> > > 
> > > > On Fri, Nov 05, 2021 at 09:55:52PM +1100, Daniel Axtens wrote:
> > > > > Michal Suchanek  writes:
> > > > > 
> > > > > > S390 uses appended signature for kernel but implements the check
> > > > > > separately from module loader.
> > > > > > 
> > > > > > Support for secure boot on powerpc with appended signature is planned -
> > > > > > grub patches submitted upstream but not yet merged.
> > > > > Power Non-Virtualised / OpenPower already supports secure boot via kexec
> > > > > with signature verification via IMA. I think you have now sent a
> > > > > follow-up series that merges some of the IMA implementation, I just
> > > > > wanted to make sure it was clear that we actually already have support
> > > > So are IMA_KEXEC and KEXEC_SIG redundant?
> > > > 
> > > > I see some architectures have both. I also see there is a lot of overlap
> > > > between the IMA framework and the KEXEC_SIG and MODULE_SIG.
> > > 
> > > Mimi would be much better placed than me to answer this.
> > > 
> > > The limits of my knowledge are basically that signature verification for
> > > modules and kexec kernels can be enforced by IMA policies.
> > > 
> > > For example a secure booted powerpc kernel with module support will have
> > > the following IMA policy set at the arch level:
> > > 
> > > "appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
> > > (in arch/powerpc/kernel/ima_arch.c)
> > > 
> > > Module signature enforcement can be set with either IMA (policy like
> > > "appraise func=MODULE_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig" )
> > > or with CONFIG_MODULE_SIG_FORCE/module.sig_enforce=1.
> > > 
> > > Sometimes this leads to arguably unexpected interactions - for example
> > > commit fa4f3f56ccd2 ("powerpc/ima: Fix secure boot rules in ima arch
> > > policy"), so it might be interesting to see if we can make things easier
> > > to understand.
> > I suspect that is the root of the problem here. Until distributions pick
> > up IMA and properly document step by step in detail how to implement,
> > enable, and debug it the _SIG options are required for users to be able
> > to make use of signatures.
> 
> For secureboot, IMA appraisal policies are configured in kernel at boot time
> based on secureboot state of the system, refer
> arch/powerpc/kernel/ima_arch.c and security/integrity/ima/ima_efi.c. This
> doesn't require any user configuration. Yes, I agree it would be helpful to
> update kernel documentation specifying steps to sign the kernel image using
> sign-file.
> 
> > 
> > The other part is that distributions apply 'lockdown' patches that change
> > the security policy depending on secure boot status which were rejected
> > by upstream which only hook into the _SIG options, and not into the IMA_
> > options. Of course, I expect this to change when the IMA options are
> > universally available across architectures and the support picked up by
> > distributions.
> > 
> > Which brings the third point: IMA features vary across architectures,
> > and KEXEC_SIG is more common than IMA_KEXEC.
> > 
> > config/arm64/default:CONFIG_HAVE_IMA_KEXEC=y
> > config/ppc64le/default:CONFIG_HAVE_IMA_KEXEC=y
> > 
> > config/arm64/default:CONFIG_KEXEC_SIG=y
> > config/s390x/default:CONFIG_KEXEC_SIG=y
> > config/x86_64/default:CONFIG_KEXEC_SIG=y
> > 
> > KEXEC_SIG makes it much easier to get uniform features across
> > architectures.
> 
> Architectures use KEXEC_SIG vs IMA_KEXEC based on their requirement.
> IMA_KEXEC is for the kernel images signed using sign-file (appended
> signatures, not PECOFF), provides measurement along with verification, and

That's certainly not the case. S390 uses appended signatures with
KEXEC_SIG, arm64 uses PECOFF with both KEXEC_SIG and IMA_KEXEC.

> is tied to secureboot state of the system at boot time.

In distributions it's also the case with KEXEC_SIG, it's only upstream
where this is different. I don't know why Linux upstream has rejected
this support for KEXEC_SIG.

Anyway, sounds like the difference is that IMA provides measurement but
if you don't use it it does not make any difference except more complex
code.

Thanks

Michal


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-08 Thread Michal Suchánek
Hello,

On Mon, Nov 08, 2021 at 09:18:56AM +1100, Daniel Axtens wrote:
> Michal Suchánek  writes:
> 
> > On Fri, Nov 05, 2021 at 09:55:52PM +1100, Daniel Axtens wrote:
> >> Michal Suchanek  writes:
> >> 
> >> > S390 uses appended signature for kernel but implements the check
> >> > separately from module loader.
> >> >
> >> > Support for secure boot on powerpc with appended signature is planned -
> >> > grub patches submitted upstream but not yet merged.
> >> 
> >> Power Non-Virtualised / OpenPower already supports secure boot via kexec
> >> with signature verification via IMA. I think you have now sent a
> >> follow-up series that merges some of the IMA implementation, I just
> >> wanted to make sure it was clear that we actually already have support
> >
> > > So are IMA_KEXEC and KEXEC_SIG redundant?
> > >
> > > I see some architectures have both. I also see there is a lot of overlap
> > > between the IMA framework and the KEXEC_SIG and MODULE_SIG.
> 
> 
> Mimi would be much better placed than me to answer this.
> 
> The limits of my knowledge are basically that signature verification for
> modules and kexec kernels can be enforced by IMA policies.
> 
> For example a secure booted powerpc kernel with module support will have
> the following IMA policy set at the arch level:
> 
> "appraise func=KEXEC_KERNEL_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig",
> (in arch/powerpc/kernel/ima_arch.c)
> 
> Module signature enforcement can be set with either IMA (policy like
> "appraise func=MODULE_CHECK appraise_flag=check_blacklist appraise_type=imasig|modsig" )
> or with CONFIG_MODULE_SIG_FORCE/module.sig_enforce=1.
> 
> Sometimes this leads to arguably unexpected interactions - for example
> commit fa4f3f56ccd2 ("powerpc/ima: Fix secure boot rules in ima arch
> policy"), so it might be interesting to see if we can make things easier
> to understand.

I suspect that is the root of the problem here. Until distributions pick
up IMA and properly document step by step in detail how to implement,
enable, and debug it the _SIG options are required for users to be able
to make use of signatures.

The other part is that distributions apply 'lockdown' patches that change
the security policy depending on secure boot status which were rejected
by upstream which only hook into the _SIG options, and not into the IMA_
options. Of course, I expect this to change when the IMA options are
universally available across architectures and the support picked up by
distributions.

Which brings the third point: IMA features vary across architectures,
and KEXEC_SIG is more common than IMA_KEXEC.

config/arm64/default:CONFIG_HAVE_IMA_KEXEC=y
config/ppc64le/default:CONFIG_HAVE_IMA_KEXEC=y

config/arm64/default:CONFIG_KEXEC_SIG=y
config/s390x/default:CONFIG_KEXEC_SIG=y
config/x86_64/default:CONFIG_KEXEC_SIG=y

KEXEC_SIG makes it much easier to get uniform features across
architectures.

Thanks

Michal


Re: [PATCH 0/3] KEXEC_SIG with appended signature

2021-11-05 Thread Michal Suchánek
On Fri, Nov 05, 2021 at 09:55:52PM +1100, Daniel Axtens wrote:
> Michal Suchanek  writes:
> 
> > S390 uses appended signature for kernel but implements the check
> > separately from module loader.
> >
> > Support for secure boot on powerpc with appended signature is planned -
> > grub patches submitted upstream but not yet merged.
> 
> Power Non-Virtualised / OpenPower already supports secure boot via kexec
> with signature verification via IMA. I think you have now sent a
> follow-up series that merges some of the IMA implementation, I just
> wanted to make sure it was clear that we actually already have support

So are IMA_KEXEC and KEXEC_SIG redundant?

I see some architectures have both. I also see there is a lot of overlap
between the IMA framework and the KEXEC_SIG and MODULE_SIG.

Thanks

Michal


Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

2021-11-02 Thread Michal Suchánek
On Thu, Jan 14, 2021 at 11:08:03PM +1000, Nicholas Piggin wrote:
> Excerpts from Michal Suchánek's message of January 14, 2021 10:40 pm:
> > On Mon, Oct 19, 2020 at 02:50:51PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am:
> >> > Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am:
> >> >> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
> >> >>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
> >> >>> > Michal Suchánek  writes:
> >> >>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
> >> >>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
> >> >>> >>> > Hello,
> >> >>> >>> > 
> >> >>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
> >> >>> >>> > Reimplement book3s idle code in C").
> >> >>> >>> > 
> >> >>> >>> > The symptom is host locking up completely after some hours of KVM
> >> >>> >>> > workload with messages like
> >> >>> >>> > 
> >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
> >> >>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
> >> >>> >>> > 
> >> >>> >>> > printed before the host locks up.
> >> >>> >>> > 
> >> >>> >>> > The machines run sandboxed builds which is a mixed workload resulting in
> >> >>> >>> > IO/single core/multiple core load over time and there are periods of no
> >> >>> >>> > activity and no VMs running as well. The VMs are shortlived so VM
> >> >>> >>> > setup/teardown is somewhat exercised as well.
> >> >>> >>> > 
> >> >>> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
> >> >>> >>> > 
> >> >>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
> >> >>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
> >> >>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
> >> >>> >>> > stable.
> >> >>> >>> > 
> >> >>> >>> > Config is attached.
> >> >>> >>> > 
> >> >>> >>> > I cannot easily revert this commit, especially if I want to use the same
> >> >>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
> >> >>> >>> > only to the new idle code.
> >> >>> >>> > 
> >> >>> >>> > Any idea what can be the problem?
> >> >>> >>> 
> >> >>> >>> So hwthread_state is never getting back to HWTHREAD_IN_IDLE on
> >> >>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
> >> >>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
> >> >>> >>> BMC unfortunately.

Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8

2021-11-01 Thread Michal Suchánek
On Fri, Oct 29, 2021 at 02:33:12PM +0200, John Paul Adrian Glaubitz wrote:
> Hi Nicholas!
> 
> On 10/29/21 02:41, Nicholas Piggin wrote:
> > Soft lockup should mean it's taking timer interrupts still, just not 
> > scheduling. Do you have the hard lockup detector enabled as well? Is
> > there anything stuck spinning on another CPU?
> 

> 
> > Could you try a sysrq+w to get a trace of blocked tasks?
> 
> Not sure how to send a magic sysrequest over the IPMI serial console. Any idea?

As on any serial console, sending a break should be equivalent to the
magic SysRq key combo; the command key (here 'w') is typed right after it.

https://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/security-sysrq.html

With ipmitool, a break is sent by typing ~B

https://linux.die.net/man/1/ipmitool
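
If the console is a directly attached serial line rather than an IPMI SOL
session, the same break-then-key sequence can probably be sent from a small C
helper. This is only a minimal sketch: send_serial_sysrq() is a made-up name,
the caller is assumed to have opened the tty (for example /dev/ttyS0) already,
and error handling is reduced to a return code.

```c
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: send BREAK followed by a SysRq command key
 * (e.g. 'w' for blocked tasks) on an already opened serial tty.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int send_serial_sysrq(int fd, char key)
{
	/* A duration of 0 asks for the default break length. */
	if (tcsendbreak(fd, 0) != 0)
		return -1;

	/* The command key has to follow the break promptly. */
	if (write(fd, &key, 1) != 1)
		return -1;

	return 0;
}
```

Calling send_serial_sysrq(fd, 'w') would then request the blocked-task trace
asked for above, provided SysRq is enabled on the target.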

Thanks

Michal


Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8

2021-11-01 Thread Michal Suchánek
Hello,

On Thu, Oct 28, 2021 at 04:15:19PM +0200, John Paul Adrian Glaubitz wrote:
> Hi!
> 
> On 10/28/21 16:05, John Paul Adrian Glaubitz wrote:
> > The following packages were being built at the same time:
> > 
> > - guest 1: virtuoso-opensource and openturns
> > - guest 2: llvm-toolchain-13
> > 
> > I really did a lot of testing today with no issues and just after I sent my report
> > to oss-security that the machine seems to be stable again, the issue showed up :(.
> 
> Do you know whether IPMI features any sort of monitoring for capturing the output
> of the serial console non-interactively? This way I would be able to capture the
> crash besides what I have seen above.

I am pretty sure you can run something like

script ipmitool

to capture output indefinitely, and the same inside screen on a remote
machine.

Thanks

Michal


Re: [PATCH v4 2/2] powerpc/64: Option to use ELF V2 ABI for big-endian kernels

2021-06-11 Thread Michal Suchánek
On Fri, Jun 11, 2021 at 11:58:19AM +0200, Michal Suchánek wrote:
> On Fri, Jun 11, 2021 at 07:39:59PM +1000, Nicholas Piggin wrote:
> > Provide an option to build big-endian kernels using the ELFv2 ABI. This
> > works on GCC only so far, although it is rumored to work with clang
> > that's not been tested yet. A new module version check ensures the
> > module ELF ABI level matches the kernel build.
> > 
> > This can give big-endian kernels some useful advantages of the ELFv2 ABI
> > (e.g., less stack usage, -mprofile-kernel, better compatibility with eBPF
> > tools).
> > 
> > BE+ELFv2 is not officially supported by the GNU toolchain, but it works
> > fine in testing and has been used by some userspace for some time (e.g.,
> > Void Linux).
> > 
> > Tested-by: Michal Suchánek 
> > Reviewed-by: Segher Boessenkool 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  arch/powerpc/Kconfig| 22 ++
> >  arch/powerpc/Makefile   | 18 --
> >  arch/powerpc/boot/Makefile  |  4 +++-
> >  arch/powerpc/include/asm/module.h   | 24 
> >  arch/powerpc/kernel/vdso64/Makefile | 13 +
> >  drivers/crypto/vmx/Makefile |  8 ++--
> >  drivers/crypto/vmx/ppc-xlate.pl | 10 ++
> >  7 files changed, 86 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 088dd2afcfe4..093f973a28b9 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -163,6 +163,7 @@ config PPC
> > select ARCH_WEAK_RELEASE_ACQUIRE
> > select BINFMT_ELF
> > select BUILDTIME_TABLE_SORT
> > +   select PPC64_BUILD_ELF_V2_ABI   if PPC64 && CPU_LITTLE_ENDIAN
> > select CLONE_BACKWARDS
> > select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
> > select DMA_OPS_BYPASS   if PPC64
> > @@ -561,6 +562,27 @@ config KEXEC_FILE
> >  config ARCH_HAS_KEXEC_PURGATORY
> > def_bool KEXEC_FILE
> >  
> > +config PPC64_BUILD_ELF_V2_ABI
> > +   bool
> > +
> > +config PPC64_BUILD_BIG_ENDIAN_ELF_V2_ABI
> > +   bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
> > +   depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
> > +   depends on CC_IS_GCC && LD_VERSION >= 22400
> > +   default n
> > +   select PPC64_BUILD_ELF_V2_ABI
> > +   help
> > + This builds the kernel image using the "Power Architecture 64-Bit ELF
> > + V2 ABI Specification", which has a reduced stack overhead and faster
> > + function calls. This internal kernel ABI option does not affect
> > +  userspace compatibility.
> > +
> > + The V2 ABI is standard for 64-bit little-endian, but for big-endian
> > + it is less well tested by kernel and toolchain. However some distros
> > + build userspace this way, and it can produce a functioning kernel.
> > +
> > + This requires GCC and binutils 2.24 or newer.
> > +
> >  config RELOCATABLE
> > bool "Build a relocatable kernel"
> > depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> > index 3212d076ac6a..b90b5cb799aa 100644
> > --- a/arch/powerpc/Makefile
> > +++ b/arch/powerpc/Makefile
> > @@ -91,10 +91,14 @@ endif
> >  
> >  ifdef CONFIG_PPC64
> >  ifndef CONFIG_CC_IS_CLANG
> > -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
> > -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mcall-aixdesc)
> > -aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
> > -aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2
> > +ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> > +cflags-y   += $(call cc-option,-mabi=elfv2)
> > +aflags-y   += $(call cc-option,-mabi=elfv2)
> > +else
> > +cflags-y   += $(call cc-option,-mabi=elfv1)
> > +cflags-y   += $(call cc-option,-mcall-aixdesc)
> > +aflags-y   += $(call cc-option,-mabi=elfv1)
> > +endif
> >  endif
> >  endif
> >  
> > @@ -142,15 +146,17 @@ endif
> >  
> >  CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no)
> >  ifndef CONFIG_CC_IS_CLANG
> > -ifdef CONFIG_CPU_LITTLE_ENDIAN
> > -CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$

Re: [PATCH v4 2/2] powerpc/64: Option to use ELF V2 ABI for big-endian kernels

2021-06-11 Thread Michal Suchánek
On Fri, Jun 11, 2021 at 07:39:59PM +1000, Nicholas Piggin wrote:
> Provide an option to build big-endian kernels using the ELFv2 ABI. This
> works on GCC only so far, although it is rumored to work with clang
> that's not been tested yet. A new module version check ensures the
> module ELF ABI level matches the kernel build.
> 
> This can give big-endian kernels some useful advantages of the ELFv2 ABI
> (e.g., less stack usage, -mprofile-kernel, better compatibility with eBPF
> tools).
> 
> BE+ELFv2 is not officially supported by the GNU toolchain, but it works
> fine in testing and has been used by some userspace for some time (e.g.,
> Void Linux).
> 
> Tested-by: Michal Suchánek 
> Reviewed-by: Segher Boessenkool 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/Kconfig| 22 ++
>  arch/powerpc/Makefile   | 18 --
>  arch/powerpc/boot/Makefile  |  4 +++-
>  arch/powerpc/include/asm/module.h   | 24 
>  arch/powerpc/kernel/vdso64/Makefile | 13 +
>  drivers/crypto/vmx/Makefile |  8 ++--
>  drivers/crypto/vmx/ppc-xlate.pl | 10 ++
>  7 files changed, 86 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 088dd2afcfe4..093f973a28b9 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -163,6 +163,7 @@ config PPC
>   select ARCH_WEAK_RELEASE_ACQUIRE
>   select BINFMT_ELF
>   select BUILDTIME_TABLE_SORT
> + select PPC64_BUILD_ELF_V2_ABI   if PPC64 && CPU_LITTLE_ENDIAN
>   select CLONE_BACKWARDS
>   select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
>   select DMA_OPS_BYPASS   if PPC64
> @@ -561,6 +562,27 @@ config KEXEC_FILE
>  config ARCH_HAS_KEXEC_PURGATORY
>   def_bool KEXEC_FILE
>  
> +config PPC64_BUILD_ELF_V2_ABI
> + bool
> +
> +config PPC64_BUILD_BIG_ENDIAN_ELF_V2_ABI
> + bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
> + depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
> + depends on CC_IS_GCC && LD_VERSION >= 22400
> + default n
> + select PPC64_BUILD_ELF_V2_ABI
> + help
> +   This builds the kernel image using the "Power Architecture 64-Bit ELF
> +   V2 ABI Specification", which has a reduced stack overhead and faster
> +   function calls. This internal kernel ABI option does not affect
> +  userspace compatibility.
> +
> +   The V2 ABI is standard for 64-bit little-endian, but for big-endian
> +   it is less well tested by kernel and toolchain. However some distros
> +   build userspace this way, and it can produce a functioning kernel.
> +
> +   This requires GCC and binutils 2.24 or newer.
> +
>  config RELOCATABLE
>   bool "Build a relocatable kernel"
>   depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 3212d076ac6a..b90b5cb799aa 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -91,10 +91,14 @@ endif
>  
>  ifdef CONFIG_PPC64
>  ifndef CONFIG_CC_IS_CLANG
> -cflags-$(CONFIG_CPU_BIG_ENDIAN)  += $(call cc-option,-mabi=elfv1)
> -cflags-$(CONFIG_CPU_BIG_ENDIAN)  += $(call cc-option,-mcall-aixdesc)
> -aflags-$(CONFIG_CPU_BIG_ENDIAN)  += $(call cc-option,-mabi=elfv1)
> -aflags-$(CONFIG_CPU_LITTLE_ENDIAN)   += -mabi=elfv2
> +ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> +cflags-y += $(call cc-option,-mabi=elfv2)
> +aflags-y += $(call cc-option,-mabi=elfv2)
> +else
> +cflags-y += $(call cc-option,-mabi=elfv1)
> +cflags-y += $(call cc-option,-mcall-aixdesc)
> +aflags-y += $(call cc-option,-mabi=elfv1)
> +endif
>  endif
>  endif
>  
> @@ -142,15 +146,17 @@ endif
>  
>  CFLAGS-$(CONFIG_PPC64)   := $(call cc-option,-mtraceback=no)
>  ifndef CONFIG_CC_IS_CLANG
> -ifdef CONFIG_CPU_LITTLE_ENDIAN
> -CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc))
> +ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> +CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv2)
>  AFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv2)
>  else
> +# Keep these in synch with arch/powerpc/kernel/vdso64/Makefile
>  CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv1)
>  CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mcall-aixdesc)
>  AFLAGS-$(CONFIG_PPC64)   += $(call cc

Re: [PATCH v3] powerpc/64: Option to use ELFv2 ABI for big-endian kernels

2021-05-14 Thread Michal Suchánek
On Wed, May 05, 2021 at 10:07:29PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Mon, May 03, 2021 at 01:37:57PM +0200, Andreas Schwab wrote:
> >> Should this add a tag to the module vermagic?
> >
> > Would the modules link even if the vermagic was not changed?
> 
> Most modules will require some symbols from the kernel, and those will
> be dot symbols, which won't resolve.
> 
> But there are a few small modules that don't rely on any kernel symbols,
> which can load.
> 
> > I suppose something like this might do it.
> 
> It would, but I feel like we should be handling this at the ELF level.
> ie. we don't allow loading modules with a different ELF machine type, so
> neither should we allow loading a module with the wrong ELF ABI.
> 
> And you can build the kernel without MODVERSIONS, so relying on
> MODVERSIONS still leaves a small exposure (same kernel version
> with/without ELFv2).
> 
> I don't see an existing hook that would do what we want. There's
> elf_check_arch(), but that also applies to userspace binaries, which is
> not what we want.
> 
> Maybe something like below.

The below patch works for me.

Tested-by: Michal Suchánek 

I built a Hello World module for both the v1 and the v2 ABI, and kernels
built with either ABI rejected the module built with the other ABI.

[  100.602943] Module has invalid ELF structures
insmod: ERROR: could not insert module moin_v1.ko: Invalid module format
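
For reference, the check that produces this rejection boils down to comparing
the low two bits of the ELF header e_flags with the ABI the kernel was built
for. Below is a minimal userspace sketch of that logic, modelled on the patch
quoted below; module_abi_matches() and PPC64_ABI_MASK are names made up for
illustration, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low two bits of e_flags carry the ppc64 ELF ABI level (0 = unspecified). */
#define PPC64_ABI_MASK 0x3u

/*
 * Hypothetical helper mirroring the quoted elf_check_module_arch() logic:
 * a kernel built with the ELFv2 ABI accepts only modules marked as ABI 2,
 * any other kernel accepts modules marked 0 (unspecified) or 1.
 */
bool module_abi_matches(uint32_t e_flags, bool kernel_is_elfv2)
{
	uint32_t abi = e_flags & PPC64_ABI_MASK;

	if (kernel_is_elfv2)
		return abi == 2;

	return abi < 2;
}
```

With this, a v1 module is refused by a v2 kernel and a v2 module by a v1
kernel, which matches the test result above; a module with an unspecified
ABI level (0) is treated as v1.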

Thanks

Michal
> 
> cheers
> 
> 
> diff --git a/arch/powerpc/include/asm/module.h b/arch/powerpc/include/asm/module.h
> index 857d9ff24295..d0e9368982d8 100644
> --- a/arch/powerpc/include/asm/module.h
> +++ b/arch/powerpc/include/asm/module.h
> @@ -83,5 +83,28 @@ static inline int module_finalize_ftrace(struct module *mod, const Elf_Shdr *sec
>  }
>  #endif
>  
> +#ifdef CONFIG_PPC64
> +static inline bool elf_check_module_arch(Elf_Ehdr *hdr)
> +{
> + unsigned long flags;
> +
> + if (!elf_check_arch(hdr))
> + return false;
> +
> + flags = hdr->e_flags & 0x3;
> +
> +#ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> + if (flags == 2)
> + return true;
> +#else
> + if (flags < 2)
> + return true;
> +#endif
> + return false;
> +}
> +
> +#define elf_check_module_arch elf_check_module_arch
> +#endif /* CONFIG_PPC64 */
> +
>  #endif /* __KERNEL__ */
>  #endif   /* _ASM_POWERPC_MODULE_H */
> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> index 9e09d11ffe5b..fdc042a84562 100644
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -13,6 +13,11 @@
>   * must be implemented by each architecture.
>   */
>  
> +// Allow arch to optionally do additional checking of module ELF header
> +#ifndef elf_check_module_arch
> +#define elf_check_module_arch elf_check_arch
> +#endif
> +
>  /* Adjust arch-specific sections.  Return 0 on success.  */
>  int module_frob_arch_sections(Elf_Ehdr *hdr,
> Elf_Shdr *sechdrs,
> diff --git a/kernel/module.c b/kernel/module.c
> index b5dd92e35b02..c71889107226 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2941,7 +2941,7 @@ static int elf_validity_check(struct load_info *info)
>  
>   if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
>   || info->hdr->e_type != ET_REL
> - || !elf_check_arch(info->hdr)
> + || !elf_check_module_arch(info->hdr)
>   || info->hdr->e_shentsize != sizeof(Elf_Shdr))
>   return -ENOEXEC;
>  

