Re: [PATCH] powerpc/ftrace: Handle large kernel configs

2022-01-06 Thread Naveen N. Rao

Hi Christophe,
Sorry for the delay, catching up with some of the earlier emails now..


Christophe Leroy wrote:

Hi Naveen,

Le 16/10/2018 à 22:25, Naveen N. Rao a écrit :
...


+/*
+ * If this is a compiler generated long_branch trampoline (essentially, a
+ * trampoline that has a branch to _mcount()), we re-write the branch to
+ * instead go to ftrace_[regs_]caller() and note down the location of this
+ * trampoline.
+ */
+static int setup_mcount_compiler_tramp(unsigned long tramp)
+{
+   int i, op;
+   unsigned long ptr;
+   static unsigned long ftrace_plt_tramps[NUM_FTRACE_TRAMPS];
+
+   /* Is this a known long jump tramp? */
+   for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
+   if (!ftrace_tramps[i])
+   break;
+   else if (ftrace_tramps[i] == tramp)
+   return 0;
+
+   /* Is this a known plt tramp? */
+   for (i = 0; i < NUM_FTRACE_TRAMPS; i++)
+   if (!ftrace_plt_tramps[i])
+   break;
+   else if (ftrace_plt_tramps[i] == tramp)
+   return -1;


I don't understand how this is supposed to work.
ftrace_plt_tramps[] being a static table, it is set to 0s at startup.
So the above loop breaks on the first iteration.

Then ftrace_plt_tramps[i] is never set anywhere.

So I just see it as useless.

Am I missing something?


No, that's correct. I had posted a cleanup of this a year back as part 
of the ftrace_direct enablement. I have updated that series and will be 
posting it out soon (though I should rebase it atop your livepatch 
series):

http://lkml.kernel.org/r/bdc3710137c4bda8393532a789558bed22507cfe.1606412433.git.naveen.n@linux.vnet.ibm.com


- Naveen


Re: [PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2022-01-06 Thread Michael Ellerman
Qu Wenruo  writes:
> On 2022/1/7 00:31, Neal Gompa wrote:
>> On Wed, Jan 5, 2022 at 7:05 AM Qu Wenruo  wrote:
>>>
>>> Hi Christophe,
>>>
>>> I've recently been enhancing the subpage support for btrfs, and my current
>>> branch should solve the problem of btrfs supporting larger page sizes.
>>>
>>> But unfortunately my current test environment can only provide 4K or 64K
>>> page sizes; there is no 16K or 128K/256K support.
>>>
>>> Would you mind testing my new branch on 128K page size systems?
>>> (256K page size support is still lacking though, which will be addressed
>>> in the future)
>>>
>>> https://github.com/adam900710/linux/tree/metadata_subpage_switch
>>>
>>
>> The Linux Asahi folks have a 16K page environment (M1 Macs)...
>
> Su Yue kindly helped me test the 16K page size, and it's pretty OK there.
>
> So I'm not that concerned.
>
> It's the 128K page size that I'm a little concerned about, and I have no
> machine supporting such a large page size to do the test.

Did Christophe say he had a 128K system to test on?

In mainline, powerpc only supports 4K/16K/64K/256K.

AFAIK there's no arch with 128K page size support, but that's only based
on some grepping; maybe it's hidden somewhere.

cheers


Re: [PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2022-01-06 Thread Hector Martin
On 2022/01/07 9:13, Qu Wenruo wrote:
> 
> 
> On 2022/1/7 00:31, Neal Gompa wrote:
>> On Wed, Jan 5, 2022 at 7:05 AM Qu Wenruo  wrote:
>>>
>>> Hi Christophe,
>>>
>>> I've recently been enhancing the subpage support for btrfs, and my current
>>> branch should solve the problem of btrfs supporting larger page sizes.
>>>
>>> But unfortunately my current test environment can only provide 4K or 64K
>>> page sizes; there is no 16K or 128K/256K support.
>>>
>>> Would you mind testing my new branch on 128K page size systems?
>>> (256K page size support is still lacking though, which will be addressed
>>> in the future)
>>>
>>> https://github.com/adam900710/linux/tree/metadata_subpage_switch
>>>
>>
>> The Linux Asahi folks have a 16K page environment (M1 Macs)...
> 
> Su Yue kindly helped me test the 16K page size, and it's pretty OK there.
> 
> So I'm not that concerned.
> 
> It's the 128K page size that I'm a little concerned about, and I have no
> machine supporting such a large page size to do the test.
> 
> Thanks,
> Qu

I'm happy to test things on 16K in the future if you need me to :-)

-- 
Hector Martin (mar...@marcan.st)
Public Key: https://mrcn.st/pub


[powerpc:merge] BUILD SUCCESS 80fdcf45da5a677ff7f284210b0f86f443a836a0

2022-01-06 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: 80fdcf45da5a677ff7f284210b0f86f443a836a0  Automatic merge of 
'master' into merge (2021-12-29 09:33)

elapsed time: 734m

configs tested: 89
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm          defconfig
arm          allyesconfig
arm          allmodconfig
arm64        defconfig
arm64        allyesconfig
arm          lart_defconfig
arm          iop32x_defconfig
powerpc      wii_defconfig
powerpc      chrp32_defconfig
arc          vdk_hs38_smp_defconfig
sh           secureedge5410_defconfig
arm          viper_defconfig
arm          sunxi_defconfig
arm          randconfig-c002-20220106
ia64         allmodconfig
ia64         defconfig
ia64         allyesconfig
m68k         allmodconfig
m68k         defconfig
m68k         allyesconfig
nios2        defconfig
arc          allyesconfig
nds32        allnoconfig
nds32        defconfig
csky         defconfig
alpha        defconfig
alpha        allyesconfig
nios2        allyesconfig
xtensa       allyesconfig
h8300        allyesconfig
arc          defconfig
sh           allmodconfig
parisc       defconfig
s390         allmodconfig
s390         defconfig
parisc       allyesconfig
s390         allyesconfig
i386         allyesconfig
sparc        allyesconfig
sparc        defconfig
i386         defconfig
i386         debian-10.3-kselftests
i386         debian-10.3
mips         allyesconfig
mips         allmodconfig
powerpc      allnoconfig
powerpc      allmodconfig
powerpc      allyesconfig
x86_64       randconfig-a012-20220106
x86_64       randconfig-a014-20220106
x86_64       randconfig-a013-20220106
x86_64       randconfig-a011-20220106
x86_64       randconfig-a016-20220106
x86_64       randconfig-a015-20220106
i386         randconfig-a012-20220106
i386         randconfig-a014-20220106
i386         randconfig-a011-20220106
i386         randconfig-a013-20220106
i386         randconfig-a016-20220106
i386         randconfig-a015-20220106
arc          randconfig-r043-20220106
riscv        randconfig-r042-20220106
s390         randconfig-r044-20220106
riscv        nommu_k210_defconfig
riscv        allyesconfig
riscv        nommu_virt_defconfig
riscv        allnoconfig
riscv        defconfig
riscv        rv32_defconfig
riscv        allmodconfig
um           x86_64_defconfig
um           i386_defconfig
x86_64       defconfig
x86_64       rhel-8.3
x86_64       kexec
x86_64       allyesconfig
x86_64       rhel-8.3-func
x86_64       rhel-8.3-kselftests

clang tested configs:
arm  colibri_pxa300_defconfig
mips   lemote2f_defconfig
powerpc xes_mpc85xx_defconfig
i386 randconfig-a003-20220106
i386 randconfig-a002-20220106
i386 randconfig-a001-20220106
i386 randconfig-a004-20220106
i386 randconfig-a006-20220106
i386 randconfig-a005-20220106
hexagon  randconfig-r041-20220106
hexagon  randconfig-r045-20220106

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: [PATCH 2/2] soc: fsl: guts: Add a missing memory allocation failure check

2022-01-06 Thread Li Yang
On Thu, Nov 4, 2021 at 4:10 AM Christophe JAILLET
 wrote:
>
> If 'devm_kstrdup()' fails, we should return -ENOMEM.
>
> While at it, move the 'of_node_put()' call in the error handling path and
> after the 'machine' has been copied.
> Better safe than sorry.
>
> Suggested-by: Tyrel Datwyler 
> Signed-off-by: Christophe JAILLET 

Applied with Fixes tag and Depends-on tag added.  Thanks.

> ---
> Not sure of which Fixes tag to add. Should be a6fc3b698130, but since
> another commit needs to be reverted for this patch to make sense, I'm
> unsure of what to do. :(
> So, none is given.
> ---
>  drivers/soc/fsl/guts.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> index af7741eafc57..5ed2fc1c53a0 100644
> --- a/drivers/soc/fsl/guts.c
> +++ b/drivers/soc/fsl/guts.c
> @@ -158,9 +158,14 @@ static int fsl_guts_probe(struct platform_device *pdev)
> root = of_find_node_by_path("/");
> if (of_property_read_string(root, "model", &machine))
> of_property_read_string_index(root, "compatible", 0,
> &machine);
> -   of_node_put(root);
> -   if (machine)
> +   if (machine) {
> soc_dev_attr.machine = devm_kstrdup(dev, machine, GFP_KERNEL);
> +   if (!soc_dev_attr.machine) {
> +   of_node_put(root);
> +   return -ENOMEM;
> +   }
> +   }
> +   of_node_put(root);
>
> svr = fsl_guts_get_svr();
> soc_die = fsl_soc_die_match(svr, fsl_soc_die);
> --
> 2.30.2
>


Re: [PATCH 1/2] soc: fsl: guts: Revert commit 3c0d64e867ed

2022-01-06 Thread Li Yang
On Thu, Nov 4, 2021 at 4:09 AM Christophe JAILLET
 wrote:
>
> This reverts commit 3c0d64e867ed
> ("soc: fsl: guts: reuse machine name from device tree").
>
> A following patch will fix the missing memory allocation failure check
> instead.
>
> Suggested-by: Tyrel Datwyler 
> Signed-off-by: Christophe JAILLET 

Applied for next.  Thanks.

> ---
> This is a follow-up of discussion in:
> https://lore.kernel.org/kernel-janitors/b12e8c5c5d6ab3061d9504de8fbaefcad6bbc385.1629321668.git.christophe.jail...@wanadoo.fr/
> ---
>  drivers/soc/fsl/guts.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> index 072473a16f4d..af7741eafc57 100644
> --- a/drivers/soc/fsl/guts.c
> +++ b/drivers/soc/fsl/guts.c
> @@ -28,7 +28,6 @@ struct fsl_soc_die_attr {
>  static struct guts *guts;
>  static struct soc_device_attribute soc_dev_attr;
>  static struct soc_device *soc_dev;
> -static struct device_node *root;
>
>
>  /* SoC die attribute definition for QorIQ platform */
> @@ -138,7 +137,7 @@ static u32 fsl_guts_get_svr(void)
>
>  static int fsl_guts_probe(struct platform_device *pdev)
>  {
> -   struct device_node *np = pdev->dev.of_node;
> +   struct device_node *root, *np = pdev->dev.of_node;
> struct device *dev = &pdev->dev;
> const struct fsl_soc_die_attr *soc_die;
> const char *machine;
> @@ -159,8 +158,9 @@ static int fsl_guts_probe(struct platform_device *pdev)
> root = of_find_node_by_path("/");
> if (of_property_read_string(root, "model", &machine))
> of_property_read_string_index(root, "compatible", 0,
> &machine);
> +   of_node_put(root);
> if (machine)
> -   soc_dev_attr.machine = machine;
> +   soc_dev_attr.machine = devm_kstrdup(dev, machine, GFP_KERNEL);
>
> svr = fsl_guts_get_svr();
> soc_die = fsl_soc_die_match(svr, fsl_soc_die);
> @@ -195,7 +195,6 @@ static int fsl_guts_probe(struct platform_device *pdev)
>  static int fsl_guts_remove(struct platform_device *dev)
>  {
> soc_device_unregister(soc_dev);
> -   of_node_put(root);
> return 0;
>  }
>
> --
> 2.30.2
>


[powerpc:next] BUILD SUCCESS f1aa0e47c29268776205698f2453dc07fab49855

2022-01-06 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: f1aa0e47c29268776205698f2453dc07fab49855  powerpc/xmon: Dump XIVE 
information for online-only processors.

elapsed time: 721m

configs tested: 122
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm          defconfig
arm64        allyesconfig
arm          allyesconfig
arm          allmodconfig
arm64        defconfig
i386         randconfig-c001-20220106
s390         zfcpdump_defconfig
arm          eseries_pxa_defconfig
powerpc      adder875_defconfig
sh           sh7785lcr_defconfig
powerpc      klondike_defconfig
m68k         atari_defconfig
m68k         multi_defconfig
sh           sh7763rdp_defconfig
sh           shmin_defconfig
sh           se7705_defconfig
powerpc      tqm8xx_defconfig
sh           se7206_defconfig
powerpc64    alldefconfig
arc          axs101_defconfig
powerpc      taishan_defconfig
sh           edosk7705_defconfig
sh           r7780mp_defconfig
mips         loongson3_defconfig
ia64         gensparse_defconfig
openrisc     or1ksim_defconfig
xtensa       cadence_csp_defconfig
sh           ap325rxa_defconfig
powerpc      ep88xc_defconfig
m68k         m5407c3_defconfig
mips         cobalt_defconfig
sh           se7721_defconfig
mips         gpr_defconfig
arm          randconfig-c002-20220106
ia64         allmodconfig
ia64         defconfig
ia64         allyesconfig
m68k         allmodconfig
m68k         defconfig
m68k         allyesconfig
nios2        defconfig
arc          allyesconfig
nds32        allnoconfig
nds32        defconfig
nios2        allyesconfig
csky         defconfig
alpha        defconfig
alpha        allyesconfig
xtensa       allyesconfig
h8300        allyesconfig
arc          defconfig
sh           allmodconfig
parisc       defconfig
s390         allyesconfig
s390         allmodconfig
parisc       allyesconfig
s390         defconfig
i386         allyesconfig
sparc        allyesconfig
sparc        defconfig
i386         defconfig
i386         debian-10.3-kselftests
i386         debian-10.3
mips         allyesconfig
mips         allmodconfig
powerpc      allyesconfig
powerpc      allmodconfig
powerpc      allnoconfig
x86_64       randconfig-a012-20220106
x86_64       randconfig-a015-20220106
x86_64       randconfig-a014-20220106
x86_64       randconfig-a013-20220106
x86_64       randconfig-a011-20220106
x86_64       randconfig-a016-20220106
i386         randconfig-a012-20220106
i386         randconfig-a016-20220106
i386         randconfig-a014-20220106
i386         randconfig-a015-20220106
i386         randconfig-a011-20220106
i386         randconfig-a013-20220106
s390         randconfig-r044-20220106
arc          randconfig-r043-20220106
riscv        randconfig-r042-20220106
riscv        nommu_k210_defconfig
riscv        allyesconfig
riscv        nommu_virt_defconfig
riscv        allnoconfig
riscv        defconfig
riscv        rv32_defconfig
riscv        allmodconfig
x86_64       rhel-8.3-kselftests
um           x86_64_defconfig
um           i386_defconfig
x86_64       allyesconfig
x86_64       defconfig
x86_64       rhel-8.3
x86_64       rhel-8.3-func
x86_64       kexec

clang tested configs:
mips randconfig-c004-20220106
arm  randconfig

Re: [PATCH] soc: fsl: qe: fix typo in a comment

2022-01-06 Thread Li Yang
On Sat, Dec 11, 2021 at 5:12 PM Jason Wang  wrote:
>
> The word `is' in the comment at line 150 is duplicated. Remove one
> of them from the comment.

Looks like you also removed a redundant tab on an empty line.  We
can probably squeeze this trivial cleanup in, but we need to mention
it.

>
> Signed-off-by: Jason Wang 

Applied for next with commit message updated.

> ---
>  drivers/soc/fsl/qe/qe.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
> index 4d38c80f8be8..b3c226eb5292 100644
> --- a/drivers/soc/fsl/qe/qe.c
> +++ b/drivers/soc/fsl/qe/qe.c
> @@ -147,7 +147,7 @@ EXPORT_SYMBOL(qe_issue_cmd);
>   * memory mapped space.
>   * The BRG clock is the QE clock divided by 2.
>   * It was set up long ago during the initial boot phase and is
> - * is given to us.
> + * given to us.
>   * Baud rate clocks are zero-based in the driver code (as that maps
>   * to port numbers). Documentation uses 1-based numbering.
>   */
> @@ -421,7 +421,7 @@ static void qe_upload_microcode(const void *base,
>
> for (i = 0; i < be32_to_cpu(ucode->count); i++)
> iowrite32be(be32_to_cpu(code[i]), &qe_immr->iram.idata);
> -
> +
> /* Set I-RAM Ready Register */
> iowrite32be(QE_IRAM_READY, &qe_immr->iram.iready);
>  }
> --
> 2.34.1
>


Re: [PATCH] soc: fsl: qe: Check of ioremap return value

2022-01-06 Thread Li Yang
On Thu, Dec 30, 2021 at 9:47 AM Jiasheng Jiang  wrote:
>
> Since ioremap() can fail, par_io could be NULL. It is therefore better
> to check it and return an error in order to guarantee that the
> initialization succeeds.
> I also notice that callers like mpc85xx_qe_par_io_init() in
> `arch/powerpc/platforms/85xx/common.c` don't check the return value of
> par_io_init().
> Those callers need to check the return value to handle the potential
> error; I will submit another patch to fix that.
> Either way, par_io_init() itself should be fixed.
>
> Fixes: 7aa1aa6ecec2 ("QE: Move QE from arch/powerpc to drivers/soc")
> Signed-off-by: Jiasheng Jiang 

Applied for next.  Thanks.

> ---
>  drivers/soc/fsl/qe/qe_io.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/soc/fsl/qe/qe_io.c b/drivers/soc/fsl/qe/qe_io.c
> index e277c827bdf3..a5e2d0e5ab51 100644
> --- a/drivers/soc/fsl/qe/qe_io.c
> +++ b/drivers/soc/fsl/qe/qe_io.c
> @@ -35,6 +35,8 @@ int par_io_init(struct device_node *np)
> if (ret)
> return ret;
> par_io = ioremap(res.start, resource_size(&res));
> +   if (!par_io)
> +   return -ENOMEM;
>
> if (!of_property_read_u32(np, "num-ports", &num_ports))
> num_par_io_ports = num_ports;
> --
> 2.25.1
>


Re: [PATCH v2 6/7] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-06 Thread Alexey Kardashevskiy




On 07/01/2022 07:03, Fabiano Rosas wrote:

If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

   /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Signed-off-by: Fabiano Rosas 



Looks right.
Reviewed-by: Alexey Kardashevskiy 

but this means that if I want to keep debugging those kvm selftests in 
comfort, I'll have to add some exception handlers in the VM, as 
otherwise the failing $pc is lost after this change :)



---
  arch/powerpc/kvm/powerpc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a2e78229d645..50e08635e18a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
-   r = RESUME_HOST;
+   r = RESUME_GUEST;
break;
}
default:


--
Alexey


Re: [PATCH v2 3/7] KVM: PPC: Fix mmio length message

2022-01-06 Thread Alexey Kardashevskiy




On 07/01/2022 07:03, Fabiano Rosas wrote:

We check against 'bytes' but print 'run->mmio.len' which at that point
has an old value.

e.g. 16-byte load:

before:
__kvmppc_handle_load: bad MMIO length: 8

now:
__kvmppc_handle_load: bad MMIO length: 16

Signed-off-by: Fabiano Rosas 
---
  arch/powerpc/kvm/powerpc.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 92e552ab5a77..0b0818d032e1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
  
  	if (bytes > sizeof(run->mmio.data)) {

printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);



"return EMULATE_FAIL;" here and below as there is really no point in 
trashing kvm_run::mmio (not much harm too but still) and this code does 
not handle more than 8 bytes anyway.





}
  
  	run->mmio.phys_addr = vcpu->arch.paddr_accessed;

@@ -1335,7 +1335,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
  
  	if (bytes > sizeof(run->mmio.data)) {

printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
}
  
  	run->mmio.phys_addr = vcpu->arch.paddr_accessed;


--
Alexey


Re: [PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2022-01-06 Thread Qu Wenruo




On 2022/1/7 00:31, Neal Gompa wrote:

On Wed, Jan 5, 2022 at 7:05 AM Qu Wenruo  wrote:


Hi Christophe,

I've recently been enhancing the subpage support for btrfs, and my current
branch should solve the problem of btrfs supporting larger page sizes.

But unfortunately my current test environment can only provide 4K or 64K
page sizes; there is no 16K or 128K/256K support.

Would you mind testing my new branch on 128K page size systems?
(256K page size support is still lacking though, which will be addressed
in the future)

https://github.com/adam900710/linux/tree/metadata_subpage_switch



The Linux Asahi folks have a 16K page environment (M1 Macs)...


Su Yue kindly helped me test the 16K page size, and it's pretty OK there.

So I'm not that concerned.

It's the 128K page size that I'm a little concerned about, and I have no
machine supporting such a large page size to do the test.

Thanks,
Qu



Hector, could you look at it too?





Re: [PATCH v4] powerpc/pseries: read the lpar name from the firmware

2022-01-06 Thread Michael Ellerman
Laurent Dufour  writes:
> Happy New Year, Michael!
>
> Do you consider taking that patch soon?

I did but I was hoping you and Nathan could come to an agreement.

Looks like you did while I was sleeping, perfect :)

I'll pick up v5.

cheers


> On 07/12/2021, 18:11:09, Laurent Dufour wrote:
>> The LPAR name may be changed after the LPAR has been started in the HMC.
>> In that case, the lparstat command does not report the updated value,
>> because it reads it from the device tree, which is read at boot time.
>> 
>> However, this value can be read from RTAS.
>> 
>> Adding this value to the /proc/powerpc/lparcfg output allows reading the
>> updated value.
>> 
>> Cc: Nathan Lynch 
>> Signed-off-by: Laurent Dufour 
>> ---
>> v4:
>>  address Nathan's new comments limiting size of the buffer.
>> v3:
>>  address Michael's comments.
>> v2:
>>  address Nathan's comments.
>>  change title to partition_name aligning with existing partition_id
>> ---
>>  arch/powerpc/platforms/pseries/lparcfg.c | 54 
>>  1 file changed, 54 insertions(+)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
>> b/arch/powerpc/platforms/pseries/lparcfg.c
>> index f71eac74ea92..058d9a5fe545 100644
>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>> @@ -311,6 +311,59 @@ static void parse_mpp_x_data(struct seq_file *m)
>>  seq_printf(m, "coalesce_pool_spurr=%ld\n", 
>> mpp_x_data.pool_spurr_cycles);
>>  }
>>  
>> +/*
>> + * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 
>> to
>> + * read the LPAR name, and the largest output data to 4000 + 2 bytes length.
>> + */
>> +#define SPLPAR_LPAR_NAME_TOKEN  55
>> +#define GET_SYS_PARM_BUF_SIZE   4002
>> +#if GET_SYS_PARM_BUF_SIZE > RTAS_DATA_BUF_SIZE
>> +#error "GET_SYS_PARM_BUF_SIZE is larger than RTAS_DATA_BUF_SIZE"
>> +#endif
>> +static void read_lpar_name(struct seq_file *m)
>> +{
>> +int rc, len, token;
>> +union {
>> +char raw_buffer[GET_SYS_PARM_BUF_SIZE];
>> +struct {
>> +__be16 len;
>> +char name[GET_SYS_PARM_BUF_SIZE-2];
>> +};
>> +} *local_buffer;
>> +
>> +token = rtas_token("ibm,get-system-parameter");
>> +if (token == RTAS_UNKNOWN_SERVICE)
>> +return;
>> +
>> +local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
>> +if (!local_buffer)
>> +return;
>> +
>> +do {
>> +spin_lock(&rtas_data_buf_lock);
>> +memset(rtas_data_buf, 0, sizeof(*local_buffer));
>> +rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN,
>> +   __pa(rtas_data_buf), sizeof(*local_buffer));
>> +if (!rc)
>> +memcpy(local_buffer->raw_buffer, rtas_data_buf,
>> +   sizeof(local_buffer->raw_buffer));
>> +spin_unlock(&rtas_data_buf_lock);
>> +} while (rtas_busy_delay(rc));
>> +
>> +if (!rc) {
>> +/* Force end of string */
>> +len = min((int) be16_to_cpu(local_buffer->len),
>> +  (int) sizeof(local_buffer->name)-1);
>> +local_buffer->name[len] = '\0';
>> +
>> +seq_printf(m, "partition_name=%s\n", local_buffer->name);
>> +} else
>> +pr_err_once("Error calling get-system-parameter (0x%x)\n", rc);
>> +
>> +kfree(local_buffer);
>> +}
>> +
>> +
>>  #define SPLPAR_CHARACTERISTICS_TOKEN 20
>>  #define SPLPAR_MAXLENGTH 1026*(sizeof(char))
>>  
>> @@ -496,6 +549,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void 
>> *v)
>>  
>>  if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
>>  /* this call handles the ibm,get-system-parameter contents */
>> +read_lpar_name(m);
>>  parse_system_parameter_string(m);
>>  parse_ppp_data(m);
>>  parse_mpp_data(m);


Re: [5.16.0-rc5][ppc][net] kernel oops when hotplug remove of vNIC interface

2022-01-06 Thread Sukadev Bhattiprolu
Michael Ellerman [m...@ellerman.id.au] wrote:
> Jakub Kicinski  writes:
> > On Wed, 5 Jan 2022 13:56:53 +0530 Abdul Haleem wrote:
> >> Greeting's
> >> 
> >> Mainline kernel 5.16.0-rc5 panics when DLPAR ADD of vNIC device on my 
> >> Powerpc LPAR
> >> 
> >> Perform below dlpar commands in a loop from linux OS
> >> 
> >> drmgr -r -c slot -s U9080.HEX.134C488-V1-C3 -w 5 -d 1
> >> drmgr -a -c slot -s U9080.HEX.134C488-V1-C3 -w 5 -d 1
> >> 
> >> after 7th iteration, the kernel panics with below messages
> >> 
> >> console messages:
> >> [102056] ibmvnic 3003 env3: Sending CRQ: 801e00086400 
> >> 0060
> >>  ibmvnic 3003 env3: Handling CRQ: 809e0008 
> >> 
> >> [102056] ibmvnic 3003 env3: Disabling tx_scrq[0] irq
> >> [102056] ibmvnic 3003 env3: Disabling tx_scrq[1] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[0] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[1] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[2] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[3] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[4] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[5] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[6] irq
> >> [102056] ibmvnic 3003 env3: Disabling rx_scrq[7] irq
> >> [102056] ibmvnic 3003 env3: Replenished 8 pools
> >> Kernel attempted to read user page (10) - exploit attempt? (uid: 0)
> >> BUG: Kernel NULL pointer dereference on read at 0x0010
> >> Faulting instruction address: 0xc0a3c840
> >> Oops: Kernel access of bad area, sig: 11 [#1]
> >> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> >> Modules linked in: bridge stp llc ib_core rpadlpar_io rpaphp nfnetlink 
> >> tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag 
> >> bonding rfkill ibmvnic sunrpc pseries_rng xts vmx_crypto gf128mul 
> >> sch_fq_codel binfmt_misc ip_tables ext4 mbcache jbd2 dm_service_time 
> >> sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror 
> >> dm_region_hash dm_log dm_mod fuse
> >> CPU: 9 PID: 102056 Comm: kworker/9:2 Kdump: loaded Not tainted 
> >> 5.16.0-rc5-autotest-g6441998e2e37 #1
> >> Workqueue: events_long __ibmvnic_reset [ibmvnic]
> >> NIP:  c0a3c840 LR: c008029b5378 CTR: c0a3c820
> >> REGS: c000548e37e0 TRAP: 0300   Not tainted 
> >> (5.16.0-rc5-autotest-g6441998e2e37)
> >> MSR:  80009033   CR: 28248484  XER: 0004
> >> CFAR: c008029bdd24 DAR: 0010 DSISR: 4000 IRQMASK: 0
> >> GPR00: c008029b55d0 c000548e3a80 c28f0200 
> >> GPR04: c00c7d1a7e00 fff6 0027 c00c7d1a7e08
> >> GPR08: 0023  0010 c008029bdd10
> >> GPR12: c0a3c820 c00c7fca6680  c00133016bf8
> >> GPR16: 03fe 1000 0002 0008
> >> GPR20: c00133016eb0   0003
> >> GPR24: c00133016000 c00133017168 2000 c00133016a00
> >> GPR28: 0006 c00133016a00 0001 c00133016000
> >> NIP [c0a3c840] napi_enable+0x20/0xc0
> >> LR [c008029b5378] __ibmvnic_open+0xf0/0x430 [ibmvnic]
> >> Call Trace:
> >> [c000548e3a80] [0006] 0x6 (unreliable)
> >> [c000548e3ab0] [c008029b55d0] __ibmvnic_open+0x348/0x430 [ibmvnic]
> >> [c000548e3b40] [c008029bcc28] __ibmvnic_reset+0x500/0xdf0 [ibmvnic]
> >> [c000548e3c60] [c0176228] process_one_work+0x288/0x570
> >> [c000548e3d00] [c0176588] worker_thread+0x78/0x660
> >> [c000548e3da0] [c01822f0] kthread+0x1c0/0x1d0
> >> [c000548e3e10] [c000cf64] ret_from_kernel_thread+0x5c/0x64
> >> Instruction dump:
> >> 7d2948f8 792307e0 4e800020 6000 3c4c01eb 384239e0 f821ffd1 39430010
> >> 38a0fff6 e92d1100 f9210028 3920  f9010020 6042 e9210020
> >> ---[ end trace 5f8033b08fd27706 ]---
> >> radix-mmu: Page sizes from device-tree:
> >> 
> >> the fault instruction points to
> >> 
> >> [root@ltcden11-lp1 boot]# gdb -batch 
> >> vmlinuz-5.16.0-rc5-autotest-g6441998e2e37 -ex 'list *(0xc0a3c840)'
> >> 0xc0a3c840 is in napi_enable (net/core/dev.c:6966).
> >> 6961    void napi_enable(struct napi_struct *n)
> >> 6962    {
> >> 6963        unsigned long val, new;
> >> 6964
> >> 6965        do {
> >> 6966            val = READ_ONCE(n->state);
> >
> > If n is NULL here that's gotta be a driver problem.
> 
> Definitely looks like it, the disassembly is:
> 
>   not r9,r9
>   clrldi  r3,r9,63
>   blr # end of previous function
>   nop
>   addis   r2,r12,491  # function entry
>   addir2,r2,14816
>   stdur1,-48(r1)  # stack frame creation
>   li  r5,-10
>   ld  r9,4352(r13)
>   std r9,40(r1)
>   li  r9,0
>   ld  r8,16(r3)   # load from r3 
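
[Illustration: the faulting offset lines up with the layout at the start
of struct napi_struct, where 'state' sits right after the embedded
list_head. A minimal user-space sketch, using an assumed simplified
layout rather than the real kernel structure:]

	#include <stddef.h>
	#include <stdio.h>

	/*
	 * Simplified stand-in for the start of struct napi_struct: a
	 * list_head (two pointers, 16 bytes on 64-bit) followed by
	 * 'state'. With n == NULL, READ_ONCE(n->state) becomes a load
	 * from address 0x10 -- matching DAR = 0x10 and the
	 * "ld r8,16(r3)" in the disassembly above.
	 */
	struct napi_sketch {
		void *poll_list_next;
		void *poll_list_prev;
		unsigned long state;
	};

	int main(void)
	{
		printf("offsetof(state) = %#zx\n",
		       offsetof(struct napi_sketch, state));
		return 0;
	}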

Re: [PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2022-01-06 Thread Neal Gompa
On Wed, Jan 5, 2022 at 7:05 AM Qu Wenruo  wrote:
>
> Hi Christophe,
>
> I've recently been enhancing the subpage support for btrfs, and my current
> branch should solve the problem of btrfs supporting larger page sizes.
>
> But unfortunately my current test environment can only provide 4K or 64K
> page sizes; there is no 16K or 128K/256K support.
>
> Would you mind testing my new branch on 128K page size systems?
> (256K page size support is still lacking though, which will be addressed
> in the future)
>
> https://github.com/adam900710/linux/tree/metadata_subpage_switch
>

The Linux Asahi folks have a 16K page environment (M1 Macs)...

Hector, could you look at it too?



-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: [PATCH v5] powerpc/pseries: read the lpar name from the firmware

2022-01-06 Thread Nathan Lynch
Laurent Dufour  writes:
> The LPAR name may be changed after the LPAR has been started in the HMC.
> In that case, the lparstat command does not report the updated value,
> because it reads it from the device tree, which is read at boot time.
>
> However, this value can be read from RTAS.
>
> Adding this value to the /proc/powerpc/lparcfg output allows reading the
> updated value.
>
> However, the hypervisor (e.g. QEMU/KVM) may not support this RTAS
> parameter. In that case, the value reported in lparcfg is read from the
> device tree and so is not updated.
>
> Cc: Nathan Lynch 
> Signed-off-by: Laurent Dufour 
> ---
> v5:
>  fallback to the device tree value if RTAS is not providing the value.
> v4:
>  address Nathan's new comments limiting size of the buffer.
> v3:
>  address Michael's comments.
> v2:
>  address Nathan's comments.
>  change title to partition_name aligning with existing partition_id

Thanks Laurent.

Reviewed-by: Nathan Lynch 


Re: [PATCH V2 1/2] tools/perf: Include global and local variants for p_stage_cyc sort key

2022-01-06 Thread Arnaldo Carvalho de Melo
Em Thu, Jan 06, 2022 at 04:21:05PM +0530, Athira Rajeev escreveu:
> 
> 
> > On 08-Dec-2021, at 9:21 AM, Nageswara Sastry  wrote:
> > 
> > 
> > 
> > On 07/12/21 8:22 pm, Arnaldo Carvalho de Melo wrote:
> >> Em Fri, Dec 03, 2021 at 07:50:37AM +0530, Athira Rajeev escreveu:
> >>> Sort key p_stage_cyc is used to present the latency
> >>> cycles spent in pipeline stages. The perf tool has a local
> >>> p_stage_cyc sort key to display this info. There is no
> >>> global variant available for this sort key. The local variant
> >>> shows latency in a single sample, whereas the global value
> >>> will be useful to present the total latency (sum of
> >>> latencies) in the hist entry. It represents the latency
> >>> number multiplied by the number of samples.
> >>> 
> >>> Add a global (p_stage_cyc) and a local variant
> >>> (local_p_stage_cyc) for this sort key. Use
> >>> local_p_stage_cyc as the default option for "mem" sort mode.
> >>> Also add this to the list of dynamic sort keys, and make
> >>> "dynamic_headers" and "arch_specific_sort_keys" static.
> >>> 
> >>> Signed-off-by: Athira Rajeev 
> >>> Reported-by: Namhyung Kim 
> >> I got this for v1, does it stand for v2?
> >> Tested-by: Nageswara R Sastry 
> > 
> > 
> > Tested with v2 also.
> 
> Hi Arnaldo,
> 
> If this patchset looks fine to you, can you please consider pulling it.

Thanks, applied to perf/core, for 5.17.

- Arnaldo


[PATCH v2 7/7] KVM: PPC: mmio: Reject instructions that access more than mmio.data size

2022-01-06 Thread Fabiano Rosas
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer, but we don't fail the emulation until
later, at kvmppc_complete_mmio_load, and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq); there's no report of an actual guest
in the wild running instructions that are not properly emulated.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 50e08635e18a..a1643ca988e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1247,6 +1247,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
   bytes);
+   return EMULATE_FAIL;
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1336,6 +1337,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
   bytes);
+   return EMULATE_FAIL;
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
-- 
2.33.1



[PATCH v2 6/7] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-06 Thread Fabiano Rosas
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a2e78229d645..50e08635e18a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
-   r = RESUME_HOST;
+   r = RESUME_GUEST;
break;
}
default:
-- 
2.33.1



[PATCH v2 5/7] KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio

2022-01-06 Thread Fabiano Rosas
If MMIO emulation fails, we queue a Program interrupt to the
guest. Move that line up into kvmppc_emulate_mmio, which is where we
set RESUME_GUEST/HOST.

No functional change, just separation of responsibilities.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/emulate_loadstore.c | 4 +---
 arch/powerpc/kvm/powerpc.c   | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c 
b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..ef50e8cfd988 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -355,10 +355,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
}
}
 
-   if (emulated == EMULATE_FAIL) {
+   if (emulated == EMULATE_FAIL)
advance = 0;
-   kvmppc_core_queue_program(vcpu, 0);
-   }
 
trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3fc8057db4b4..a2e78229d645 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,7 +307,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
u32 last_inst;
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
-   /* XXX Deliver Program interrupt to guest. */
+   kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
r = RESUME_HOST;
break;
-- 
2.33.1



[PATCH v2 4/7] KVM: PPC: Don't use pr_emerg when mmio emulation fails

2022-01-06 Thread Fabiano Rosas
If MMIO emulation fails we deliver a Program interrupt to the
guest. This is a normal event for the host, so use pr_info.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 0b0818d032e1..3fc8057db4b4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -308,7 +308,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
/* XXX Deliver Program interrupt to guest. */
-   pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
+   pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
r = RESUME_HOST;
break;
}
-- 
2.33.1



[PATCH v2 3/7] KVM: PPC: Fix mmio length message

2022-01-06 Thread Fabiano Rosas
We check against 'bytes' but print 'run->mmio.len' which at that point
has an old value.

e.g. 16-byte load:

before:
__kvmppc_handle_load: bad MMIO length: 8

now:
__kvmppc_handle_load: bad MMIO length: 16

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 92e552ab5a77..0b0818d032e1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1335,7 +1335,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
-- 
2.33.1



[PATCH v2 2/7] KVM: PPC: Fix vmx/vsx mixup in mmio emulation

2022-01-06 Thread Fabiano Rosas
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1e130bb087c4..92e552ab5a77 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1507,7 +1507,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1604,7 +1604,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
unsigned int index = rs & KVM_MMIO_REG_MASK;
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
vcpu->arch.io_gpr = rs;
-- 
2.33.1



[PATCH v2 1/7] KVM: PPC: Book3S HV: Stop returning internal values to userspace

2022-01-06 Thread Fabiano Rosas
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected.
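
[Illustration: a rough sketch of the kind of check the generic KVM
selftest harness performs; the helper and macro names here follow
selftest conventions but are assumptions, not the literal test source:]

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/*
	 * KVM_RUN is documented to return 0 on success, so leaking a
	 * positive RESUME_HOST value out of kvm_arch_vcpu_ioctl_run()
	 * trips an assertion like this one (TEST_ASSERT is the selftest
	 * framework's assert macro).
	 */
	static void vcpu_run_sketch(int vcpu_fd)
	{
		int rc = ioctl(vcpu_fd, KVM_RUN, NULL);

		TEST_ASSERT(rc == 0, "KVM_RUN returned %d instead of 0", rc);
	}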
---
 arch/powerpc/kvm/powerpc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a72920f4f221..1e130bb087c4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1849,6 +1849,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+   /*
+* We're already returning to userspace, don't pass the
+* RESUME_HOST flags along.
+*/
+   if (r > 0)
+   r = 0;
+
vcpu_put(vcpu);
return r;
 }
-- 
2.33.1



[PATCH v2 0/7] KVM: PPC: MMIO fixes

2022-01-06 Thread Fabiano Rosas
The change from v1 is that I have altered the MMIO size check to fail
the emulation in case the size exceeds 8 bytes. That brought some
cleanups and another fix along with it: we were returning to userspace
in case of failure instead of to the guest.

This has now become an MMIO series, but since the first commit has
been reviewed already, I'm leaving it here.

v1:
https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (7):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: Fix mmio length message
  KVM: PPC: Don't use pr_emerg when mmio emulation fails
  KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: mmio: Reject instructions that access more than mmio.data
size

 arch/powerpc/kvm/emulate_loadstore.c |  4 +---
 arch/powerpc/kvm/powerpc.c   | 24 +---
 2 files changed, 18 insertions(+), 10 deletions(-)

-- 
2.33.1



Re: [PATCH]powerpc/xive: Export XIVE IPI information for online-only processors.

2022-01-06 Thread Cédric Le Goater

On 1/6/22 12:03, Sachin Sant wrote:

Cédric pointed out that the XIVE IPI information exported via sysfs
(debug/powerpc/xive) displays empty lines for processors which are
not online.

Switch to using for_each_online_cpu() so that information is
displayed only for online processors.

Reported-by: Cédric Le Goater 
Signed-off-by: Sachin Sant 


Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
diff -Naurp a/arch/powerpc/sysdev/xive/common.c 
b/arch/powerpc/sysdev/xive/common.c
--- a/arch/powerpc/sysdev/xive/common.c 2022-01-05 08:52:59.460118219 -0500
+++ b/arch/powerpc/sysdev/xive/common.c 2022-01-06 02:34:20.994513145 -0500
@@ -1791,7 +1791,7 @@ static int xive_ipi_debug_show(struct se
if (xive_ops->debug_show)
xive_ops->debug_show(m, private);
  
-	for_each_possible_cpu(cpu)
+	for_each_online_cpu(cpu)
xive_debug_show_ipi(m, cpu);
return 0;
  }





[PATCH v5] powerpc/pseries: read the lpar name from the firmware

2022-01-06 Thread Laurent Dufour
The LPAR name may be changed after the LPAR has been started in the HMC.
In that case, the lparstat command does not report the updated value,
because it reads it from the device tree, which is read at boot time.

However, this value can be read from RTAS.

Adding this value to the /proc/powerpc/lparcfg output allows reading the
updated value.

However, the hypervisor (e.g. QEMU/KVM) may not support this RTAS
parameter. In that case, the value reported in lparcfg is read from the
device tree and so is not updated.

Cc: Nathan Lynch 
Signed-off-by: Laurent Dufour 
---
v5:
 fallback to the device tree value if RTAS is not providing the value.
v4:
 address Nathan's new comments limiting size of the buffer.
v3:
 address Michael's comments.
v2:
 address Nathan's comments.
 change title to partition_name aligning with existing partition_id
---
 arch/powerpc/platforms/pseries/lparcfg.c | 93 
 1 file changed, 93 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c 
b/arch/powerpc/platforms/pseries/lparcfg.c
index c7940fcfc911..8ca08fc306e7 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -311,6 +311,98 @@ static void parse_mpp_x_data(struct seq_file *m)
seq_printf(m, "coalesce_pool_spurr=%ld\n", 
mpp_x_data.pool_spurr_cycles);
 }
 
+/*
+ * PAPR defines, in section "7.3.16 System Parameters Option", the token 55 to
+ * read the LPAR name, and the largest output data to 4000 + 2 bytes length.
+ */
+#define SPLPAR_LPAR_NAME_TOKEN 55
+#define GET_SYS_PARM_BUF_SIZE  4002
+#if GET_SYS_PARM_BUF_SIZE > RTAS_DATA_BUF_SIZE
+#error "GET_SYS_PARM_BUF_SIZE is larger than RTAS_DATA_BUF_SIZE"
+#endif
+
+/**
+ * Read the lpar name using the RTAS ibm,get-system-parameter call.
+ *
+ * The name read through this call is updated if changes are made by the end
+ * user on the hypervisor side.
+ *
+ * Some hypervisors (like QEMU) may not provide this value. In that case, a
+ * non-zero value is returned.
+ */
+static int read_RTAS_lpar_name(struct seq_file *m)
+{
+   int rc, len, token;
+   union {
+   char raw_buffer[GET_SYS_PARM_BUF_SIZE];
+   struct {
+   __be16 len;
+   char name[GET_SYS_PARM_BUF_SIZE-2];
+   };
+   } *local_buffer;
+
+   token = rtas_token("ibm,get-system-parameter");
+   if (token == RTAS_UNKNOWN_SERVICE)
+   return -EINVAL;
+
+   local_buffer = kmalloc(sizeof(*local_buffer), GFP_KERNEL);
+   if (!local_buffer)
+   return -ENOMEM;
+
+   do {
+   spin_lock(&rtas_data_buf_lock);
+   memset(rtas_data_buf, 0, sizeof(*local_buffer));
+   rc = rtas_call(token, 3, 1, NULL, SPLPAR_LPAR_NAME_TOKEN,
+  __pa(rtas_data_buf), sizeof(*local_buffer));
+   if (!rc)
+   memcpy(local_buffer->raw_buffer, rtas_data_buf,
+  sizeof(local_buffer->raw_buffer));
+   spin_unlock(&rtas_data_buf_lock);
+   } while (rtas_busy_delay(rc));
+
+   if (!rc) {
+   /* Force end of string */
+   len = min((int) be16_to_cpu(local_buffer->len),
+ (int) sizeof(local_buffer->name)-1);
+   local_buffer->name[len] = '\0';
+
+   seq_printf(m, "partition_name=%s\n", local_buffer->name);
+   } else
+   rc = -ENODATA;
+
+   kfree(local_buffer);
+   return rc;
+}
+
+/**
+ * Read the LPAR name from the Device Tree.
+ *
+ * The value read in the DT is not updated if the end-user is touching the LPAR
+ * name on the hypervisor side.
+ */
+static int read_DT_lpar_name(struct seq_file *m)
+{
+   struct device_node *rootdn;
+   const char *name;
+
+   rootdn = of_find_node_by_path("/");
+   if (!rootdn)
+   return -ENOENT;
+
+   name = of_get_property(rootdn, "ibm,partition-name", NULL);
+   if (!name)
+   return -ENOENT;
+
+   seq_printf(m, "partition_name=%s\n", name);
+   return 0;
+}
+
+static void read_lpar_name(struct seq_file *m)
+{
+   if (read_RTAS_lpar_name(m) && read_DT_lpar_name(m))
+   pr_err_once("Error can't get the LPAR name");
+}
+
 #define SPLPAR_CHARACTERISTICS_TOKEN 20
 #define SPLPAR_MAXLENGTH 1026*(sizeof(char))
 
@@ -496,6 +588,7 @@ static int pseries_lparcfg_data(struct seq_file *m, void *v)
 
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
/* this call handles the ibm,get-system-parameter contents */
+   read_lpar_name(m);
parse_system_parameter_string(m);
parse_ppp_data(m);
parse_mpp_data(m);
-- 
2.34.1



[PATCH]powerpc/xive: Export XIVE IPI information for online-only processors.

2022-01-06 Thread Sachin Sant
Cédric pointed out that the XIVE IPI information exported via sysfs
(debug/powerpc/xive) displays empty lines for processors which are
not online.

Switch to using for_each_online_cpu() so that information is
displayed only for online processors.

Reported-by: Cédric Le Goater 
Signed-off-by: Sachin Sant 
--- 
diff -Naurp a/arch/powerpc/sysdev/xive/common.c 
b/arch/powerpc/sysdev/xive/common.c
--- a/arch/powerpc/sysdev/xive/common.c 2022-01-05 08:52:59.460118219 -0500
+++ b/arch/powerpc/sysdev/xive/common.c 2022-01-06 02:34:20.994513145 -0500
@@ -1791,7 +1791,7 @@ static int xive_ipi_debug_show(struct se
if (xive_ops->debug_show)
xive_ops->debug_show(m, private);
 
-   for_each_possible_cpu(cpu)
+   for_each_online_cpu(cpu)
xive_debug_show_ipi(m, cpu);
return 0;
 }


Re: [PATCH] ethernet: ibmveth: use default_groups in kobj_type

2022-01-06 Thread patchwork-bot+netdevbpf
Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller :

On Wed,  5 Jan 2022 19:41:01 +0100 you wrote:
> There are currently 2 ways to create a set of sysfs files for a
> kobj_type, through the default_attrs field, and the default_groups
> field.  Move the ibmveth sysfs code to use default_groups
> field which has been the preferred way since aa30f47cf666 ("kobject: Add
> support for default attribute groups to kobj_type") so that we can soon
> get rid of the obsolete default_attrs field.
> 
> [...]
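
[For context, the conversion generally follows the sketch below; this is
illustrative rather than the literal ibmveth diff, and the attribute
names are assumptions. ATTRIBUTE_GROUPS() generates the *_groups array
from the *_attrs list:]

	static struct attribute *veth_pool_attrs[] = {
		&veth_active_attr.attr,		/* assumed attribute names */
		&veth_num_attr.attr,
		&veth_size_attr.attr,
		NULL,
	};
	ATTRIBUTE_GROUPS(veth_pool);		/* emits veth_pool_groups */

	static struct kobj_type ktype_veth_pool = {
		.sysfs_ops	= &veth_pool_ops,
		/* was: .default_attrs = veth_pool_attrs */
		.default_groups	= veth_pool_groups,
	};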

Here is the summary with links:
  - ethernet: ibmveth: use default_groups in kobj_type
https://git.kernel.org/netdev/net-next/c/c288bc0db2d1

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH 1/2] mm/cma: provide option to opt out from exposing pages on activation failure

2022-01-06 Thread Hari Bathini




On 22/12/21 12:18 am, David Hildenbrand wrote:

On 20.12.21 20:34, Hari Bathini wrote:

Commit 072355c1cf2d ("mm/cma: expose all pages to the buddy if
activation of an area fails") started exposing all pages to the buddy
allocator on CMA activation failure. But there can be CMA users that
want to handle the reserved memory differently on CMA activation
failure. Provide an option to opt out from exposing pages to the buddy
allocator for such cases.


Hi David,

Sorry, I could not get back to this sooner. I went on vacation
and missed this.



Can you elaborate why that is important and what the target user can
actually do with it?

Previously, firmware-assisted dump [1] used to reserve memory that it
needs for booting a capture kernel and offloading /proc/vmcore.

This memory is reserved, basically blocked from being used by the
production kernel, to ensure the kernel crash context is not lost on
booting into a capture kernel from this memory chunk.

But [2] started using CMA instead, to let the memory be used at least
in some cases, as long as this memory is not going to hold kernel pages.
So, the intention in using CMA was to keep the memory unused if CMA
activation fails, and only let it be used for some purpose, if at all,
if CMA activation succeeds. But [3] breaks that assumption, reporting
weird errors on vmcores captured with fadump when CMA activation fails.

To answer the question: fadump does not want the memory to be used for
kernel pages if CMA activation fails (see the sketch below the
references).


[1] 
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/firmware-assisted-dump.rst

[2] https://github.com/torvalds/linux/commit/a4e92ce8e4c8
[3] https://github.com/torvalds/linux/commit/072355c1cf2d
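
[A minimal sketch of how such an opt-out could be consumed by fadump,
assuming an API along the lines of a cma_reserve_pages_on_error()
helper; the helper name and exact shape are assumptions here:]

	/*
	 * Hypothetical usage: mark the fadump CMA area so that, if
	 * activation fails, its pages stay reserved instead of being
	 * released to the buddy allocator (where the kernel could
	 * place unmovable pages).
	 */
	static struct cma *fadump_cma;

	static void fadump_cma_opt_out(void)
	{
		if (fadump_cma)
			cma_reserve_pages_on_error(fadump_cma);
	}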

Thanks
Hari


[PATCH 13/13] powerpc64/bpf: Optimize instruction sequence used for function calls

2022-01-06 Thread Naveen N. Rao
When calling BPF helpers, we load the function address to call into a
register. This can result in up to 5 instructions. Optimize this by
instead using the kernel TOC in r2 and adjusting the offset to the BPF
helper. This works since all BPF helpers are part of kernel text, and
all BPF programs/functions utilize the kernel TOC.

Furthermore:
- load the actual function entry address on ELF v1, rather than loading
  it through the function descriptor address.
- load the Local Entry Point (LEP) on ELF v2, skipping the TOC setup.
- consolidate code across ELF ABI v1 and v2 by using r12 on both.
- skip the initial instruction setting up r2 in the case of BPF function
  calls, since we already have the kernel TOC set up in r2.

Reported-by: Anton Blanchard 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 5da8e54d4d70b6..72179295f8e8c8 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -155,22 +155,20 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
 static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, 
u64 func)
 {
unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
+   long reladdr;
 
if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
return -EINVAL;
 
-#ifdef PPC64_ELF_ABI_v1
-   /* func points to the function descriptor */
-   PPC_LI64(b2p[TMP_REG_2], func);
-   /* Load actual entry point from function descriptor */
-   PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
-   /* ... and move it to CTR */
-   EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
-#else
-   /* We can clobber r12 */
-   PPC_FUNC_ADDR(12, func);
-   EMIT(PPC_RAW_MTCTR(12));
-#endif
+   reladdr = func_addr - kernel_toc_addr();
+   if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
+   pr_err("eBPF: address of %ps out of range of kernel_toc.\n", 
(void *)func);
+   return -ERANGE;
+   }
+
+   EMIT(PPC_RAW_ADDIS(_R12, _R2, PPC_HA(reladdr)));
+   EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr)));
+   EMIT(PPC_RAW_MTCTR(_R12));
EMIT(PPC_RAW_BCTRL());
 
return 0;
@@ -183,6 +181,9 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 func
if (WARN_ON_ONCE(func && is_module_text_address(func)))
return -EINVAL;
 
+   /* skip past descriptor (elf v1) and toc load (elf v2) */
+   func += FUNCTION_DESCR_SIZE + 4;
+
/* Load function address into r12 */
PPC_LI64(12, func);
 
@@ -199,11 +200,6 @@ int bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 func
for (i = ctx->idx - ctx_idx; i < 5; i++)
EMIT(PPC_RAW_NOP());
 
-#ifdef PPC64_ELF_ABI_v1
-   /* Load actual entry point from function descriptor */
-   PPC_BPF_LL(12, 12, 0);
-#endif
-
EMIT(PPC_RAW_MTCTR(12));
EMIT(PPC_RAW_BCTRL());
 
-- 
2.34.1



[PATCH 12/13] powerpc64/bpf elfv1: Do not load TOC before calling functions

2022-01-06 Thread Naveen N. Rao
BPF helpers always reside in the core kernel and all BPF programs use
the kernel TOC. As such, there is no need to load the TOC before calling
helpers or other BPF functions. Drop the code that does this.

Add a check to ensure we don't proceed if this assumption ever changes
in the future.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h|  2 +-
 arch/powerpc/net/bpf_jit_comp.c   |  4 +++-
 arch/powerpc/net/bpf_jit_comp32.c |  8 +--
 arch/powerpc/net/bpf_jit_comp64.c | 39 ---
 4 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 3b5c44c0b6638d..5cb3efd76715a9 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -181,7 +181,7 @@ static inline void bpf_clear_seen_register(struct 
codegen_context *ctx, int i)
ctx->seen &= ~(1 << (31 - i));
 }
 
-void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
+int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context 
*ctx,
   u32 *addrs, int pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 141e64585b6458..635f7448ff7952 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -59,7 +59,9 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 
*image,
 */
tmp_idx = ctx->idx;
ctx->idx = addrs[i] / 4;
-   bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   if (ret)
+   return ret;
 
/*
 * Restore ctx->idx here. This is safe as the length
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 8c918db4c2c486..ce753aca5b3321 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -185,7 +185,7 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_BLR());
 }
 
-void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func)
+int bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 
func)
 {
s32 rel = (s32)func - (s32)(image + ctx->idx);
 
@@ -201,6 +201,8 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 fun
EMIT(PPC_RAW_MTCTR(_R0));
EMIT(PPC_RAW_BCTRL());
}
+
+   return 0;
 }
 
 static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 
out)
@@ -953,7 +955,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
EMIT(PPC_RAW_STW(bpf_to_ppc(ctx, BPF_REG_5), 
_R1, 12));
}
 
-   bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   ret = bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+   if (ret)
+   return ret;
 
EMIT(PPC_RAW_MR(bpf_to_ppc(ctx, BPF_REG_0) - 1, _R3));
EMIT(PPC_RAW_MR(bpf_to_ppc(ctx, BPF_REG_0), _R4));
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index e05b577d95bf11..5da8e54d4d70b6 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -152,9 +152,13 @@ void bpf_jit_build_epilogue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_BLR());
 }
 
-static void bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx,
-  u64 func)
+static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx, 
u64 func)
 {
+   unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0;
+
+   if (WARN_ON_ONCE(!core_kernel_text(func_addr)))
+   return -EINVAL;
+
 #ifdef PPC64_ELF_ABI_v1
/* func points to the function descriptor */
PPC_LI64(b2p[TMP_REG_2], func);
@@ -162,25 +166,23 @@ static void bpf_jit_emit_func_call_hlp(u32 *image, struct 
codegen_context *ctx,
PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
/* ... and move it to CTR */
EMIT(PPC_RAW_MTCTR(b2p[TMP_REG_1]));
-   /*
-* Load TOC from function descriptor at offset 8.
-* We can clobber r2 since we get called through a
-* function pointer (so caller will save/restore r2)
-* and since we don't use a TOC ourself.
-*/
-   PPC_BPF_LL(2, b2p[TMP_REG_2], 8);
 #else
/* We can clobber r12 */
PPC_FUNC_ADDR(12, func);
EMIT(PPC_RAW_MTCTR(12));
 #endif
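
For background on the ELFv1 path above: an ELFv1 function "pointer" refers
to a descriptor, not directly to instructions. A rough sketch of the layout
(field names illustrative; the kernel's actual type differs):

  /* ELFv1 ppc64 function descriptor (sketch). The JIT loads .entry into
   * CTR for the indirect call; after this patch it no longer loads .toc
   * into r2 beforehand, since r2 already holds the kernel TOC. */
  struct func_descriptor {
          unsigned long entry;    /* address of the first instruction */
          unsigned long toc;      /* callee's TOC base, normally for r2 */
          unsigned long env;      /* environment pointer, unused by C */
  };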

[PATCH 11/13] powerpc64/bpf elfv2: Setup kernel TOC in r2 on entry

2022-01-06 Thread Naveen N. Rao
In preparation for using the kernel TOC, load it in r2 on entry. With
ELFv1, the kernel TOC is already set up by our caller, so we just emit a
nop. We adjust the number of instructions to skip on a tail call
accordingly.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index ce4fc59bbd6a92..e05b577d95bf11 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -73,6 +73,12 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 {
int i;
 
+#ifdef PPC64_ELF_ABI_v2
+   PPC_BPF_LL(_R2, _R13, offsetof(struct paca_struct, kernel_toc));
+#else
+   EMIT(PPC_RAW_NOP());
+#endif
+
/*
 * Initialize tail_call_cnt if we do tail calls.
 * Otherwise, put in NOPs so that it can be skipped when we are
@@ -87,7 +93,7 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
EMIT(PPC_RAW_NOP());
}
 
-#define BPF_TAILCALL_PROLOGUE_SIZE 8
+#define BPF_TAILCALL_PROLOGUE_SIZE 12
 
if (bpf_has_stack_frame(ctx)) {
/*
-- 
2.34.1
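
An illustrative view of the resulting prologue layout (offsets inferred
from the diff above, not copied from the kernel); a tail call enters at
bpf_func + BPF_TAILCALL_PROLOGUE_SIZE, skipping the entry-only setup:

  enum {
          TOC_LOAD_OR_NOP = 0,    /* ld r2,kernel_toc(r13) on ELFv2, nop on ELFv1 */
          TCC_INIT_A      = 4,    /* tail_call_cnt initialization (or nops) */
          TCC_INIT_B      = 8,
          TAILCALL_ENTRY  = 12,   /* the new BPF_TAILCALL_PROLOGUE_SIZE */
  };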



[PATCH 10/13] powerpc64/bpf: Use r12 for constant blinding

2022-01-06 Thread Naveen N. Rao
In preparation for preserving the kernel TOC in r2, switch BPF_REG_AX
from r2 to r12. r12 is not used by the BPF JIT except during external
helper/bpf calls, or with BPF_NOSPEC, and those sequences aren't emitted
when BPF_REG_AX is used for constant blinding and other purposes.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index b63b35e45e558c..82cdfee412784a 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -56,7 +56,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
/* frame pointer aka BPF_REG_10 */
[BPF_REG_FP] = 31,
/* eBPF jit internal registers */
-   [BPF_REG_AX] = 2,
+   [BPF_REG_AX] = 12,
[TMP_REG_1] = 9,
[TMP_REG_2] = 10
 };
-- 
2.34.1



[PATCH 09/13] powerpc64/bpf: Do not save/restore LR on each call to bpf_stf_barrier()

2022-01-06 Thread Naveen N. Rao
Instead of saving and restoring LR around each invocation of
bpf_stf_barrier(), set the SEEN_FUNC flag so that LR is saved/restored
once in the prologue/epilogue.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 7d38b4be26c3a5..ce4fc59bbd6a92 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -690,11 +690,10 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
EMIT(PPC_RAW_ORI(_R31, _R31, 0));
break;
case STF_BARRIER_FALLBACK:
-   EMIT(PPC_RAW_MFLR(b2p[TMP_REG_1]));
+   ctx->seen |= SEEN_FUNC;
PPC_LI64(12, 
dereference_kernel_function_descriptor(bpf_stf_barrier));
EMIT(PPC_RAW_MTCTR(12));
EMIT(PPC_RAW_BCTRL());
-   EMIT(PPC_RAW_MTLR(b2p[TMP_REG_1]));
break;
case STF_BARRIER_NONE:
break;
-- 
2.34.1
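
A minimal model of the bookkeeping, with stand-in names: code that emits an
LR-clobbering call just records the fact, and the single save/restore is
emitted by the prologue/epilogue:

  #define SEEN_FUNC 0x1                   /* stand-in for the JIT's flag */

  struct codegen { unsigned int seen; };

  static void emit_stf_barrier_fallback(struct codegen *cg)
  {
          cg->seen |= SEEN_FUNC;  /* bctrl clobbers LR; prologue/epilogue cover it */
          /* ... emit mtctr/bctrl here, with no mflr/mtlr around the call ... */
  }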



[PATCH 08/13] powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06

2022-01-06 Thread Naveen N. Rao
Johan reported the below crash with test_bpf on ppc64 e5500:

  test_bpf: #296 ALU_END_FROM_LE 64: 0x0123456789abcdef -> 0x67452301 jited:1
  Oops: Exception in kernel mode, sig: 4 [#1]
  BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
  Modules linked in: test_bpf(+)
  CPU: 0 PID: 76 Comm: insmod Not tainted 5.14.0-03771-g98c2059e008a-dirty #1
  NIP:  80061c3c LR: 806dea64 CTR: 80061c18
  REGS: c32d3420 TRAP: 0700   Not tainted 
(5.14.0-03771-g98c2059e008a-dirty)
  MSR:  80089000   CR: 88002822  XER: 2000 IRQMASK: 0
  <...>
  NIP [80061c3c] 0x80061c3c
  LR [806dea64] .__run_one+0x104/0x17c [test_bpf]
  Call Trace:
   .__run_one+0x60/0x17c [test_bpf] (unreliable)
   .test_bpf_init+0x6a8/0xdc8 [test_bpf]
   .do_one_initcall+0x6c/0x28c
   .do_init_module+0x68/0x28c
   .load_module+0x2460/0x2abc
   .__do_sys_init_module+0x120/0x18c
   .system_call_exception+0x110/0x1b8
   system_call_common+0xf0/0x210
  --- interrupt: c00 at 0x101d0acc
  <...>
  ---[ end trace 47b2bf19090bb3d0 ]---

  Illegal instruction

The illegal instruction turned out to be 'ldbrx' emitted for
BPF_FROM_[L|B]E, which was only introduced in ISA v2.06. Guard its use
and implement an alternative approach for older processors.

Acked-by: Johan Almbladh 
Tested-by: Johan Almbladh 
Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended 
BPF")
Reported-by: Johan Almbladh 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/include/asm/ppc-opcode.h |  1 +
 arch/powerpc/net/bpf_jit_comp64.c | 22 +-
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index efad07081cc0e5..9675303b724e93 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -500,6 +500,7 @@
 #define PPC_RAW_LDX(r, base, b)	(0x7c00002a | ___PPC_RT(r) | 
___PPC_RA(base) | ___PPC_RB(b))
 #define PPC_RAW_LHZ(r, base, i)	(0xa0000000 | ___PPC_RT(r) | 
___PPC_RA(base) | IMM_L(i))
 #define PPC_RAW_LHBRX(r, base, b)  (0x7c00062c | ___PPC_RT(r) | 
___PPC_RA(base) | ___PPC_RB(b))
+#define PPC_RAW_LWBRX(r, base, b)  (0x7c00042c | ___PPC_RT(r) | 
___PPC_RA(base) | ___PPC_RB(b))
 #define PPC_RAW_LDBRX(r, base, b)  (0x7c000428 | ___PPC_RT(r) | 
___PPC_RA(base) | ___PPC_RB(b))
 #define PPC_RAW_STWCX(s, a, b) (0x7c00012d | ___PPC_RS(s) | 
___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_CMPWI(a, i)	(0x2c000000 | ___PPC_RA(a) | IMM_L(i))
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 48d2ca3fe126dd..7d38b4be26c3a5 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -634,17 +634,21 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
EMIT(PPC_RAW_MR(dst_reg, b2p[TMP_REG_1]));
break;
case 64:
-   /*
-* Way easier and faster(?) to store the value
-* into stack and then use ldbrx
-*
-* ctx->seen will be reliable in pass2, but
-* the instructions generated will remain the
-* same across all passes
-*/
+   /* Store the value to stack and then use 
byte-reverse loads */
PPC_BPF_STL(dst_reg, 1, 
bpf_jit_stack_local(ctx));
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], 1, 
bpf_jit_stack_local(ctx)));
-   EMIT(PPC_RAW_LDBRX(dst_reg, 0, b2p[TMP_REG_1]));
+   if (cpu_has_feature(CPU_FTR_ARCH_206)) {
+   EMIT(PPC_RAW_LDBRX(dst_reg, 0, 
b2p[TMP_REG_1]));
+   } else {
+   EMIT(PPC_RAW_LWBRX(dst_reg, 0, 
b2p[TMP_REG_1]));
+   if 
(IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN))
+   EMIT(PPC_RAW_SLDI(dst_reg, 
dst_reg, 32));
+   EMIT(PPC_RAW_LI(b2p[TMP_REG_2], 4));
+   EMIT(PPC_RAW_LWBRX(b2p[TMP_REG_2], 
b2p[TMP_REG_2], b2p[TMP_REG_1]));
+   if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
+   
EMIT(PPC_RAW_SLDI(b2p[TMP_REG_2], b2p[TMP_REG_2], 32));
+   EMIT(PPC_RAW_OR(dst_reg, dst_reg, 
b2p[TMP_REG_2]));
+   }
break;
}
break;
-- 
2.34.1
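
A portable model of the pre-ISA-v2.06 fallback above (userspace sketch,
not the kernel's code): byte-reverse each 32-bit half, then swap the
halves, which is what the two lwbrx loads plus shift and or achieve:

  #include <stdint.h>
  #include <stdio.h>

  static uint32_t bswap32(uint32_t v)
  {
          return (v >> 24) | ((v >> 8) & 0xff00) |
                 ((v << 8) & 0xff0000) | (v << 24);
  }

  static uint64_t bswap64_two_loads(uint64_t v)
  {
          uint32_t lo = (uint32_t)v;
          uint32_t hi = (uint32_t)(v >> 32);

          return ((uint64_t)bswap32(lo) << 32) | bswap32(hi);
  }

  int main(void)
  {
          /* 0x0123456789abcdef -> 0xefcdab8967452301, as in the failing test */
          printf("%016llx\n",
                 (unsigned long long)bswap64_two_loads(0x0123456789abcdefULL));
          return 0;
  }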



[PATCH 07/13] powerpc/bpf: Handle large branch ranges with BPF_EXIT

2022-01-06 Thread Naveen N. Rao
In some scenarios, it is possible that the program epilogue is outside
the branch range for a BPF_EXIT instruction. Instead of rejecting such
programs, emit the epilogue as an alternate exit point from the program,
and track its location so that subsequent exits can take either of the
two paths.

Reported-by: Jordan Niethe 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h        |  2 ++
 arch/powerpc/net/bpf_jit_comp.c   | 22 +-
 arch/powerpc/net/bpf_jit_comp32.c |  7 +--
 arch/powerpc/net/bpf_jit_comp64.c |  7 +--
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 9cdd33d6be4cc0..3b5c44c0b6638d 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -151,6 +151,7 @@ struct codegen_context {
unsigned int stack_size;
int b2p[ARRAY_SIZE(b2p)];
unsigned int exentry_idx;
+   unsigned int alt_exit_addr;
 };
 
 #ifdef CONFIG_PPC32
@@ -186,6 +187,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
+int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr);
 
 int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct 
codegen_context *ctx,
  int insn_idx, int jmp_off, int dst_reg);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 56dd1f4e3e4447..141e64585b6458 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -89,6 +89,22 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 
*image,
return 0;
 }
 
+int bpf_jit_emit_exit_insn(u32 *image, struct codegen_context *ctx, int 
tmp_reg, long exit_addr)
+{
+   if (!exit_addr || is_offset_in_branch_range(exit_addr - (ctx->idx * 
4))) {
+   PPC_JMP(exit_addr);
+   } else if (ctx->alt_exit_addr) {
+   if (WARN_ON(!is_offset_in_branch_range((long)ctx->alt_exit_addr - (ctx->idx * 4))))
+   return -1;
+   PPC_JMP(ctx->alt_exit_addr);
+   } else {
+   ctx->alt_exit_addr = ctx->idx * 4;
+   bpf_jit_build_epilogue(image, ctx);
+   }
+
+   return 0;
+}
+
 struct powerpc64_jit_data {
struct bpf_binary_header *header;
u32 *addrs;
@@ -177,8 +193,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 * If we have seen a tail call, we need a second pass.
 * This is because bpf_jit_emit_common_epilogue() is called
 * from bpf_jit_emit_tail_call() with a not yet stable ctx->seen.
+* We also need a second pass if we ended up with too large
+* a program so as to ensure BPF_EXIT branches are in range.
 */
-   if (cgctx.seen & SEEN_TAILCALL) {
+   if (cgctx.seen & SEEN_TAILCALL || 
!is_offset_in_branch_range((long)cgctx.idx * 4)) {
cgctx.idx = 0;
		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
fp = org_fp;
@@ -193,6 +211,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 * calculate total size from idx.
 */
	bpf_jit_build_prologue(0, &cgctx);
+   addrs[fp->len] = cgctx.idx * 4;
	bpf_jit_build_epilogue(0, &cgctx);
 
fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
@@ -233,6 +252,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
for (pass = 1; pass < 3; pass++) {
/* Now build the prologue, body code & epilogue for real. */
cgctx.idx = 0;
+   cgctx.alt_exit_addr = 0;
		bpf_jit_build_prologue(code_base, &cgctx);
		if (bpf_jit_build_body(fp, code_base, &cgctx, addrs, pass)) {
bpf_jit_binary_free(bpf_hdr);
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 72c2c47612964d..8c918db4c2c486 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -929,8 +929,11 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
 * the epilogue. If we _are_ the last instruction,
 * we'll just fall through to the epilogue.
 */
-   if (i != flen - 1)
-   PPC_JMP(exit_addr);
+   if (i != flen - 1) {
+   ret = bpf_jit_emit_exit_insn(image, ctx, _R0, 
exit_addr);
+   if (ret)
+   return ret;
+   }
/* else fall through to the epilogue */
break;
 
diff --git 
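
For reference, the ranges at stake in this patch, as a hedged sketch (the
limits follow the Power ISA branch encodings and mirror what the kernel's
helpers check): an unconditional 'b' carries a 26-bit signed offset (about
+/-32MB), a conditional 'bc' only 16 bits (about +/-32KB), both word
aligned:

  static int is_offset_in_branch_range(long offset)
  {
          return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
  }

  static int is_offset_in_cond_branch_range(long offset)
  {
          return offset >= -0x8000 && offset <= 0x7fff && !(offset & 0x3);
  }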

[PATCH 06/13] powerpc/bpf: Emit a single branch instruction for known short branch ranges

2022-01-06 Thread Naveen N. Rao
PPC_BCC() emits two instructions to accommodate scenarios where we need
to branch outside the range of a conditional branch. PPC_BCC_SHORT()
emits a single branch instruction and can be used when the branch is
known to be within a conditional branch range.

Convert some of the uses of PPC_BCC() in the powerpc BPF JIT over to
PPC_BCC_SHORT() where we know the branch range.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp32.c | 8 
 arch/powerpc/net/bpf_jit_comp64.c | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 2258d3886d02ec..72c2c47612964d 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -221,7 +221,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
EMIT(PPC_RAW_LWZ(_R0, b2p_bpf_array, offsetof(struct bpf_array, 
map.max_entries)));
EMIT(PPC_RAW_CMPLW(b2p_index, _R0));
EMIT(PPC_RAW_LWZ(_R0, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
-   PPC_BCC(COND_GE, out);
+   PPC_BCC_SHORT(COND_GE, out);
 
/*
 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -230,7 +230,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
EMIT(PPC_RAW_CMPLWI(_R0, MAX_TAIL_CALL_CNT));
/* tail_call_cnt++; */
EMIT(PPC_RAW_ADDIC(_R0, _R0, 1));
-   PPC_BCC(COND_GT, out);
+   PPC_BCC_SHORT(COND_GT, out);
 
/* prog = array->ptrs[index]; */
EMIT(PPC_RAW_RLWINM(_R3, b2p_index, 2, 0, 29));
@@ -243,7 +243,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
 *   goto out;
 */
EMIT(PPC_RAW_CMPLWI(_R3, 0));
-   PPC_BCC(COND_EQ, out);
+   PPC_BCC_SHORT(COND_EQ, out);
 
/* goto *(prog->bpf_func + prologue_size); */
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_prog, bpf_func)));
@@ -834,7 +834,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
if (BPF_MODE(code) == BPF_PROBE_MEM) {
PPC_LI32(_R0, TASK_SIZE - off);
EMIT(PPC_RAW_CMPLW(src_reg, _R0));
-   PPC_BCC(COND_GT, (ctx->idx + 5) * 4);
+   PPC_BCC_SHORT(COND_GT, (ctx->idx + 4) * 4);
EMIT(PPC_RAW_LI(dst_reg, 0));
/*
 * For BPF_DW case, "li reg_h,0" would be 
needed when
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 3d018ecc475b2b..2b291d435d2e26 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -225,7 +225,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
EMIT(PPC_RAW_LWZ(b2p[TMP_REG_1], b2p_bpf_array, offsetof(struct 
bpf_array, map.max_entries)));
EMIT(PPC_RAW_RLWINM(b2p_index, b2p_index, 0, 0, 31));
EMIT(PPC_RAW_CMPLW(b2p_index, b2p[TMP_REG_1]));
-   PPC_BCC(COND_GE, out);
+   PPC_BCC_SHORT(COND_GE, out);
 
/*
 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
@@ -233,7 +233,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
 */
PPC_BPF_LL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
EMIT(PPC_RAW_CMPLWI(b2p[TMP_REG_1], MAX_TAIL_CALL_CNT));
-   PPC_BCC(COND_GT, out);
+   PPC_BCC_SHORT(COND_GT, out);
 
/*
 * tail_call_cnt++;
@@ -251,7 +251,7 @@ static int bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32 o
 *   goto out;
 */
EMIT(PPC_RAW_CMPLDI(b2p[TMP_REG_1], 0));
-   PPC_BCC(COND_EQ, out);
+   PPC_BCC_SHORT(COND_EQ, out);
 
/* goto *(prog->bpf_func + prologue_size); */
PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_1], offsetof(struct bpf_prog, 
bpf_func));
@@ -803,7 +803,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
else /* BOOK3S_64 */
PPC_LI64(b2p[TMP_REG_2], PAGE_OFFSET);
EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], 
b2p[TMP_REG_2]));
-   PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+   PPC_BCC_SHORT(COND_GT, (ctx->idx + 3) * 4);
EMIT(PPC_RAW_LI(dst_reg, 0));
/*
 * Check if 'off' is word aligned because 
PPC_BPF_LL()
-- 
2.34.1
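
For reference, a sketch of the two emitted sequences (mnemonics only,
encodings elided; the point is one instruction versus two):

  /*
   * PPC_BCC_SHORT(cond, dest):
   *         bc<cond>  dest        ; one conditional branch, ~ +/-32KB reach
   *
   * PPC_BCC(cond, dest):
   *         bc<!cond> $+8         ; inverted condition hops over ...
   *         b         dest        ; ... an unconditional branch, ~ +/-32MB
   */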



[PATCH 05/13] powerpc/bpf: Skip branch range validation during first pass

2022-01-06 Thread Naveen N. Rao
During the first pass, addrs[] is still being populated, so all branches
to subsequent instructions will appear to be going to the start of the
JIT program. Ignore branch range validation for such instructions and
assume them to be in range. Branch range validation will happen during
the second pass, after addrs[] is set up properly.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index b20a2a83a6e75b..9cdd33d6be4cc0 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -27,7 +27,7 @@
 #define PPC_JMP(dest)\
do {  \
long offset = (long)(dest) - (ctx->idx * 4);  \
-   if (!is_offset_in_branch_range(offset)) { \
+   if ((dest) != 0 && !is_offset_in_branch_range(offset)) {
  \
pr_err_ratelimited("Branch offset 0x%lx (@%u) out of 
range\n", offset, ctx->idx);   \
return -ERANGE;   \
} \
@@ -41,7 +41,7 @@
 #define PPC_BCC_SHORT(cond, dest)\
do {  \
long offset = (long)(dest) - (ctx->idx * 4);  \
-   if (!is_offset_in_cond_branch_range(offset)) {\
+   if ((dest) != 0 && !is_offset_in_cond_branch_range(offset)) {   
  \
pr_err_ratelimited("Conditional branch offset 0x%lx 
(@%u) out of range\n", offset, ctx->idx);   \
return -ERANGE;   \
} \
-- 
2.34.1
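
A small model of why a zero destination is safe to wave through in pass 1
(simplified, names assumed): addrs[] starts zeroed, so a branch to a
not-yet-emitted instruction computes a destination of 0 and cannot be
meaningfully range-checked until pass 2 has filled addrs[] in:

  static long branch_dest(const unsigned int *addrs, int i, short off)
  {
          /* still 0 in pass 1 for targets beyond the current instruction */
          return addrs[i + 1 + off];
  }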



[PATCH 04/13] tools/bpf: Rename 'struct event' to avoid naming conflict

2022-01-06 Thread Naveen N. Rao
On ppc64le, trying to build bpf selftests throws the below warning:
  In file included from runqslower.bpf.c:5:
  ./runqslower.h:7:8: error: redefinition of 'event'
  struct event {
 ^
  
/home/naveen/linux/tools/testing/selftests/bpf/tools/build/runqslower/vmlinux.h:156602:8:
  note: previous definition is here
  struct event {
 ^

This happens since 'struct event' is defined in
drivers/net/ethernet/alteon/acenic.h. Rename the one in runqslower to
the more appropriate 'runq_event' to avoid the naming conflict.

Signed-off-by: Naveen N. Rao 
---
 tools/bpf/runqslower/runqslower.bpf.c | 2 +-
 tools/bpf/runqslower/runqslower.c | 2 +-
 tools/bpf/runqslower/runqslower.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/runqslower/runqslower.bpf.c 
b/tools/bpf/runqslower/runqslower.bpf.c
index ab9353f2fd46ab..9a5c1f008fe603 100644
--- a/tools/bpf/runqslower/runqslower.bpf.c
+++ b/tools/bpf/runqslower/runqslower.bpf.c
@@ -68,7 +68,7 @@ int handle__sched_switch(u64 *ctx)
 */
struct task_struct *prev = (struct task_struct *)ctx[1];
struct task_struct *next = (struct task_struct *)ctx[2];
-   struct event event = {};
+   struct runq_event event = {};
u64 *tsp, delta_us;
long state;
u32 pid;
diff --git a/tools/bpf/runqslower/runqslower.c 
b/tools/bpf/runqslower/runqslower.c
index d8971584495213..ff7f5e8485e937 100644
--- a/tools/bpf/runqslower/runqslower.c
+++ b/tools/bpf/runqslower/runqslower.c
@@ -100,7 +100,7 @@ static int bump_memlock_rlimit(void)
 
 void handle_event(void *ctx, int cpu, void *data, __u32 data_sz)
 {
-   const struct event *e = data;
+   const struct runq_event *e = data;
struct tm *tm;
char ts[32];
time_t t;
diff --git a/tools/bpf/runqslower/runqslower.h 
b/tools/bpf/runqslower/runqslower.h
index 9db225425e5ff9..4f70f07200c23d 100644
--- a/tools/bpf/runqslower/runqslower.h
+++ b/tools/bpf/runqslower/runqslower.h
@@ -4,7 +4,7 @@
 
 #define TASK_COMM_LEN 16
 
-struct event {
+struct runq_event {
char task[TASK_COMM_LEN];
__u64 delta_us;
pid_t pid;
-- 
2.34.1



[PATCH 03/13] powerpc/bpf: Update ldimm64 instructions during extra pass

2022-01-06 Thread Naveen N. Rao
These instructions are updated after the initial JIT, so redo codegen
during the extra pass. Rename bpf_jit_fixup_subprog_calls() to clarify
that it now handles more than just subprog calls.

Fixes: 69c087ba6225b5 ("bpf: Add bpf_for_each_map_elem() helper")
Cc: sta...@vger.kernel.org # v5.15
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp.c   | 29 +++--
 arch/powerpc/net/bpf_jit_comp32.c |  6 ++
 arch/powerpc/net/bpf_jit_comp64.c |  7 ++-
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index d6ffdd0f2309d0..56dd1f4e3e4447 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -23,15 +23,15 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int 
size)
memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
-/* Fix the branch target addresses for subprog calls */
-static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, u32 *image,
-  struct codegen_context *ctx, u32 *addrs)
+/* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass 
*/
+static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
+  struct codegen_context *ctx, u32 *addrs)
 {
const struct bpf_insn *insn = fp->insnsi;
bool func_addr_fixed;
u64 func_addr;
u32 tmp_idx;
-   int i, ret;
+   int i, j, ret;
 
for (i = 0; i < fp->len; i++) {
/*
@@ -66,6 +66,23 @@ static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, 
u32 *image,
 * of the JITed sequence remains unchanged.
 */
ctx->idx = tmp_idx;
+   } else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+   tmp_idx = ctx->idx;
+   ctx->idx = addrs[i] / 4;
+#ifdef CONFIG_PPC32
+   PPC_LI32(ctx->b2p[insn[i].dst_reg] - 1, (u32)insn[i + 
1].imm);
+   PPC_LI32(ctx->b2p[insn[i].dst_reg], (u32)insn[i].imm);
+   for (j = ctx->idx - addrs[i] / 4; j < 4; j++)
+   EMIT(PPC_RAW_NOP());
+#else
+   func_addr = ((u64)(u32)insn[i].imm) | 
(((u64)(u32)insn[i + 1].imm) << 32);
+   PPC_LI64(b2p[insn[i].dst_reg], func_addr);
+   /* overwrite rest with nops */
+   for (j = ctx->idx - addrs[i] / 4; j < 5; j++)
+   EMIT(PPC_RAW_NOP());
+#endif
+   ctx->idx = tmp_idx;
+   i++;
}
}
 
@@ -200,13 +217,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/*
 * Do not touch the prologue and epilogue as they will remain
 * unchanged. Only fix the branch target address for subprog
-* calls in the body.
+* calls in the body, and ldimm64 instructions.
 *
 * This does not change the offsets and lengths of the subprog
 * call instruction sequences and hence, the size of the JITed
 * image as well.
 */
-   bpf_jit_fixup_subprog_calls(fp, code_base, &cgctx, addrs);
+   bpf_jit_fixup_addresses(fp, code_base, &cgctx, addrs);
 
/* There is no need to perform the usual passes. */
goto skip_codegen_passes;
diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index 997a47fa615b30..2258d3886d02ec 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -293,6 +293,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
bool func_addr_fixed;
u64 func_addr;
u32 true_cond;
+   u32 tmp_idx;
+   int j;
 
/*
 * addrs[] maps a BPF bytecode address into a real offset from
@@ -908,8 +910,12 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, 
struct codegen_context *
 * 16 byte instruction that uses two 'struct bpf_insn'
 */
case BPF_LD | BPF_IMM | BPF_DW: /* dst = (u64) imm */
+   tmp_idx = ctx->idx;
PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
PPC_LI32(dst_reg, (u32)insn[i].imm);
+   /* padding to allow full 4 instructions for later 
patching */
+   for (j = ctx->idx - tmp_idx; j < 4; j++)
+   EMIT(PPC_RAW_NOP());
/* Adjust for two bpf instructions */
addrs[++i] = ctx->idx * 4;
break;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 
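
A runnable model of the fixed-size slot technique (placeholder encodings;
slot width per the ppc64 worst case for PPC_LI64):

  #include <stdio.h>

  #define NOP           0x60000000u     /* ori r0,r0,0 */
  #define SLOT_INSNS    5               /* ppc64 worst case for PPC_LI64 */

  /* Stand-in for PPC_LI64: shorter immediates need fewer instructions. */
  static int emit_li64(unsigned int *buf, unsigned long long imm)
  {
          int n = 0;

          do {
                  buf[n++] = NOP;       /* placeholder encoding */
                  imm >>= 16;
          } while (imm && n < SLOT_INSNS);
          return n;
  }

  int main(void)
  {
          unsigned int slot[SLOT_INSNS];
          int n = emit_li64(slot, 0x1234);

          while (n < SLOT_INSNS)        /* pad so the extra pass can patch in place */
                  slot[n++] = NOP;
          printf("ldimm64 slot is always %d words\n", n);
          return 0;
  }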

[PATCH 02/13] powerpc32/bpf: Fix codegen for bpf-to-bpf calls

2022-01-06 Thread Naveen N. Rao
Pad instructions emitted for BPF_CALL so that the number of instructions
generated does not change for different function addresses. This is
especially important for calls to other bpf functions, whose address
will only be known during the extra pass.

Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: sta...@vger.kernel.org # v5.13+
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/net/bpf_jit_comp32.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c 
b/arch/powerpc/net/bpf_jit_comp32.c
index d3a52cd42f5346..997a47fa615b30 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -191,6 +191,9 @@ void bpf_jit_emit_func_call_rel(u32 *image, struct 
codegen_context *ctx, u64 fun
 
	if (image && rel < 0x2000000 && rel >= -0x2000000) {
PPC_BL_ABS(func);
+   EMIT(PPC_RAW_NOP());
+   EMIT(PPC_RAW_NOP());
+   EMIT(PPC_RAW_NOP());
} else {
/* Load function address into r0 */
EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));
-- 
2.34.1
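
A sketch of the invariant the padding buys (placeholder encodings; the
far-call tail is assumed from the surrounding JIT code): both call shapes
occupy four words, so the image size no longer depends on whether the
callee address is near, far, or still unknown:

  #define NOP 0x60000000u

  static int emit_call(unsigned int *buf, long rel)
  {
          int n = 0;

          if (rel < 0x2000000 && rel >= -0x2000000) {
                  buf[n++] = 0x48000001u;  /* bl func */
                  while (n < 4)
                          buf[n++] = NOP;  /* padding added by this patch */
          } else {
                  buf[n++] = 0x3c000000u;  /* lis   r0,func@h */
                  buf[n++] = 0x60000000u;  /* ori   r0,r0,func@l */
                  buf[n++] = 0x7c0903a6u;  /* mtctr r0 */
                  buf[n++] = 0x4e800421u;  /* bctrl */
          }
          return n;                        /* always 4 */
  }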



[PATCH 01/13] bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()

2022-01-06 Thread Naveen N. Rao
task_pt_regs() can return NULL on powerpc for kernel threads. The
returned regs are then used in __bpf_get_stack() to check for user mode,
resulting in a kernel oops. Guard against this by checking the return
value of task_pt_regs() before trying to obtain the call chain.

Fixes: fa28dcb82a38f8 ("bpf: Introduce helper bpf_get_task_stack()")
Cc: sta...@vger.kernel.org # v5.9+
Signed-off-by: Naveen N. Rao 
---
 kernel/bpf/stackmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 6e75bbee39f0b5..0dcaed4d3f4cec 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -525,13 +525,14 @@ BPF_CALL_4(bpf_get_task_stack, struct task_struct *, 
task, void *, buf,
   u32, size, u64, flags)
 {
struct pt_regs *regs;
-   long res;
+   long res = -EINVAL;
 
if (!try_get_task_stack(task))
return -EFAULT;
 
regs = task_pt_regs(task);
-   res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+   if (regs)
+   res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
put_task_stack(task);
 
return res;
-- 
2.34.1



[PATCH 00/13] powerpc/bpf: Some fixes and updates

2022-01-06 Thread Naveen N. Rao
A set of fixes and updates to powerpc BPF JIT:
- Patches 1-3 fix issues with the existing powerpc JIT and are tagged 
  for -stable.
- Patch 4 fixes a build issue with bpf selftests on powerpc.
- Patches 5-9 handle some corner cases and make some small improvements.
- Patches 10-13 optimize how function calls are handled in ppc64. 

Patches 7 and 8 were previously posted, and while patch 7 has no 
changes, patch 8 has been reworked to handle BPF_EXIT differently.


- Naveen


Naveen N. Rao (13):
  bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
  powerpc32/bpf: Fix codegen for bpf-to-bpf calls
  powerpc/bpf: Update ldimm64 instructions during extra pass
  tools/bpf: Rename 'struct event' to avoid naming conflict
  powerpc/bpf: Skip branch range validation during first pass
  powerpc/bpf: Emit a single branch instruction for known short branch
ranges
  powerpc/bpf: Handle large branch ranges with BPF_EXIT
  powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
  powerpc64/bpf: Do not save/restore LR on each call to
bpf_stf_barrier()
  powerpc64/bpf: Use r12 for constant blinding
  powerpc64/bpf elfv2: Setup kernel TOC in r2 on entry
  powerpc64/bpf elfv1: Do not load TOC before calling functions
  powerpc64/bpf: Optimize instruction sequence used for function calls

 arch/powerpc/include/asm/ppc-opcode.h |   1 +
 arch/powerpc/net/bpf_jit.h            |   8 +-
 arch/powerpc/net/bpf_jit64.h  |   2 +-
 arch/powerpc/net/bpf_jit_comp.c   |  55 ++--
 arch/powerpc/net/bpf_jit_comp32.c |  32 +--
 arch/powerpc/net/bpf_jit_comp64.c | 124 ++
 kernel/bpf/stackmap.c |   5 +-
 tools/bpf/runqslower/runqslower.bpf.c |   2 +-
 tools/bpf/runqslower/runqslower.c |   2 +-
 tools/bpf/runqslower/runqslower.h |   2 +-
 10 files changed, 153 insertions(+), 80 deletions(-)


base-commit: bdcf18e133f656b2c97390a594fc95e37849e682
-- 
2.34.1



Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8

2022-01-06 Thread Michael Ellerman
John Paul Adrian Glaubitz  writes:
> Hi Michael!
>
> Sorry for the long time without any responses. Shall we continue debugging 
> this?

Yes!

Sorry also that I haven't been able to fix it yet; I had to stop chasing
this bug and work on other things before the end of the year.

> We're currently running 5.15.x on the host system and the guests and the 
> testsuite
> for gcc-9 still reproducibly kills the KVM host.

Have you been able to try the different -smp options I suggested?

Can you separately test with (on the host):

 # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes


cheers

> On 11/1/21 07:53, Michael Ellerman wrote:
>> Sure, will give that a try.
>> 
>> I was able to crash my machine over the weekend, building openjdk, but I
>> haven't been able to reproduce it for ~24 hours now (I didn't change
>> anything).
>> 
>> 
>> Can you try running your guests with no SMT threads?
>> 
>> I think one of your guests was using:
>> 
>>   -smp 32,sockets=1,dies=1,cores=8,threads=4
>> 
>> Can you change that to:
>> 
>>   -smp 8,sockets=1,dies=1,cores=8,threads=1
>> 
>> 
>> And something similar for the other guest(s).
>> 
>> If the system is stable with those settings that would be useful
>> information, and would also mean you could use the system without it
>> crashing semi regularly.
>> 
>> cheers
> -- 
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer - glaub...@debian.org
> `. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
>   `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH V2 1/2] tools/perf: Include global and local variants for p_stage_cyc sort key

2022-01-06 Thread Athira Rajeev



> On 08-Dec-2021, at 9:21 AM, Nageswara Sastry  wrote:
> 
> 
> 
> On 07/12/21 8:22 pm, Arnaldo Carvalho de Melo wrote:
>> Em Fri, Dec 03, 2021 at 07:50:37AM +0530, Athira Rajeev escreveu:
>>> Sort key p_stage_cyc is used to present the latency
>>> cycles spent in pipeline stages. The perf tool has a local
>>> p_stage_cyc sort key to display this info. There is no
>>> global variant available for this sort key. The local variant
>>> shows latency in a single sample, whereas a global value
>>> is useful to present the total latency (sum of
>>> latencies) in the hist entry. It represents the latency
>>> number multiplied by the number of samples.
>>> 
>>> Add a global (p_stage_cyc) and a local variant
>>> (local_p_stage_cyc) for this sort key. Use
>>> local_p_stage_cyc as the default option for the "mem" sort mode.
>>> Also add this to the list of dynamic sort keys, and make
>>> "dynamic_headers" and "arch_specific_sort_keys" static.
>>> 
>>> Signed-off-by: Athira Rajeev 
>>> Reported-by: Namhyung Kim 
>> I got this for v1, does it stand for v2?
>> Tested-by: Nageswara R Sastry 
> 
> 
> Tested with v2 also.

Hi Arnaldo,

If this patchset looks fine to you, can you please consider pulling it.

Thanks
Athira
> 
> 
> Tested-by: Nageswara R Sastry 
> 
>>> ---
>>> Changelog:
>>> v1 -> v2:
>>>  Addressed review comments from Jiri by making the
>>>  "dynamic_headers" and "arch_specific_sort_keys"
>>>  as static.
>>> 
>>>  tools/perf/util/hist.c |  4 +++-
>>>  tools/perf/util/hist.h |  3 ++-
>>>  tools/perf/util/sort.c | 34 +-
>>>  tools/perf/util/sort.h |  3 ++-
>>>  4 files changed, 32 insertions(+), 12 deletions(-)
>>> 
>>> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
>>> index b776465e04ef..0a8033b09e28 100644
>>> --- a/tools/perf/util/hist.c
>>> +++ b/tools/perf/util/hist.c
>>> @@ -211,7 +211,9 @@ void hists__calc_col_len(struct hists *hists, struct 
>>> hist_entry *h)
>>> hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
>>> hists__new_col_len(hists, HISTC_LOCAL_INS_LAT, 13);
>>> hists__new_col_len(hists, HISTC_GLOBAL_INS_LAT, 13);
>>> -   hists__new_col_len(hists, HISTC_P_STAGE_CYC, 13);
>>> +   hists__new_col_len(hists, HISTC_LOCAL_P_STAGE_CYC, 13);
>>> +   hists__new_col_len(hists, HISTC_GLOBAL_P_STAGE_CYC, 13);
>>> +
>>> if (symbol_conf.nanosecs)
>>> hists__new_col_len(hists, HISTC_TIME, 16);
>>> else
>>> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
>>> index 621f35ae1efa..2a15e22fb89c 100644
>>> --- a/tools/perf/util/hist.h
>>> +++ b/tools/perf/util/hist.h
>>> @@ -75,7 +75,8 @@ enum hist_column {
>>> HISTC_MEM_BLOCKED,
>>> HISTC_LOCAL_INS_LAT,
>>> HISTC_GLOBAL_INS_LAT,
>>> -   HISTC_P_STAGE_CYC,
>>> +   HISTC_LOCAL_P_STAGE_CYC,
>>> +   HISTC_GLOBAL_P_STAGE_CYC,
>>> HISTC_NR_COLS, /* Last entry */
>>>  };
>>>  diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
>>> index a111065b484e..e417e47f51b9 100644
>>> --- a/tools/perf/util/sort.c
>>> +++ b/tools/perf/util/sort.c
>>> @@ -37,7 +37,7 @@ const char default_parent_pattern[] = 
>>> "^sys_|^do_page_fault";
>>>  const char *parent_pattern = default_parent_pattern;
>>>  const char *default_sort_order = "comm,dso,symbol";
>>>  const char default_branch_sort_order[] = 
>>> "comm,dso_from,symbol_from,symbol_to,cycles";
>>> -const char default_mem_sort_order[] = 
>>> "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc";
>>> +const char default_mem_sort_order[] = 
>>> "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc";
>>>  const char default_top_sort_order[] = "dso,symbol";
>>>  const char default_diff_sort_order[] = "dso,symbol";
>>>  const char default_tracepoint_sort_order[] = "trace";
>>>  const char *field_order;
>>>  regex_t ignore_callees_regex;
>>>  int have_ignore_callees = 0;
>>>  enum sort_mode sort__mode = SORT_MODE__NORMAL;
>>> -const char *dynamic_headers[] = {"local_ins_lat", "p_stage_cyc"};
>>> -const char *arch_specific_sort_keys[] = {"p_stage_cyc"};
>>> +static const char *const dynamic_headers[] = {"local_ins_lat", "ins_lat", 
>>> "local_p_stage_cyc", "p_stage_cyc"};
>>> +static const char *const arch_specific_sort_keys[] = {"local_p_stage_cyc", 
>>> "p_stage_cyc"};
>>>/*
>>>   * Replaces all occurrences of a char used with the:
>>> @@ -1392,22 +1392,37 @@ struct sort_entry sort_global_ins_lat = {
>>>  };
>>>static int64_t
>>> -sort__global_p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry 
>>> *right)
>>> +sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right)
>>>  {
>>> return left->p_stage_cyc - right->p_stage_cyc;
>>>  }
>>>  +static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, 
>>> char *bf,
>>> +   size_t size, unsigned int width)
>>> +{
>>> +   return repsep_snprintf(bf, size, 

Re: [PATCH v4] powerpc/pseries: read the lpar name from the firmware

2022-01-06 Thread Laurent Dufour
On 06/01/2022, 02:17:21, Tyrel Datwyler wrote:
> On 1/5/22 3:19 PM, Nathan Lynch wrote:
>> Laurent Dufour  writes:
>>> On 07/12/2021, 18:11:09, Laurent Dufour wrote:
 The LPAR name may be changed after the LPAR has been started in the HMC.
 In that case lparstat command is not reporting the updated value because it
 reads it from the device tree which is read at boot time.

 However this value could be read from RTAS.

 Adding this value in the /proc/powerpc/lparcfg output allows to read the
 updated value.
>>>
>>> Do you consider taking that patch soon?
>>
>> This version prints an error on non-PowerVM guests the first time
>> lparcfg is read.
> 
> I assume because QEMU doesn't implement the LPAR_NAME token for get_sysparm.
> 
>>
>> And I still contend that having this function fall back to reporting the
>> partition name in the DT would provide a beneficial consistency in the
>> user-facing API, allowing programs to avoid hypervisor-specific branches
>> in their code. 
> 
> Agreed, if the get_sysparm fails just report the lpar-name from the device 
> tree.

My aim is to not do in the kernel what can easily be done in user space,
but avoiding hypervisor-specific branches in user space programs is a
good point.

Note that if the RTAS call had been available to unprivileged users, all
of this would have been done in user space, and thus been
hypervisor-specific...

Anyway, I'll work on a new version that fetches the DT value in case the
RTAS call fails.

Thanks,
Laurent.
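
A minimal userspace sketch of the fallback shape being discussed, with
assumed helper names (not the actual patch):

  #include <stdio.h>

  /* Stand-ins for the two sources; names and behavior are assumptions. */
  static int rtas_get_lpar_name(char *buf, size_t len)
  {
          (void)buf; (void)len;
          return -1;      /* e.g. hypervisor without the LPAR name parameter */
  }

  static void dt_get_lpar_name(char *buf, size_t len)
  {
          snprintf(buf, len, "lpar-name-from-device-tree");
  }

  static void get_lpar_name(char *buf, size_t len)
  {
          if (rtas_get_lpar_name(buf, len))       /* RTAS first: reflects renames */
                  dt_get_lpar_name(buf, len);     /* fall back to boot-time value */
  }

  int main(void)
  {
          char name[64];

          get_lpar_name(name, sizeof(name));
          printf("partition_name=%s\n", name);
          return 0;
  }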