[PATCH 2/2] ASoC: fsl_spdif: Add support for imx6sx platform
The one difference on imx6sx platform is that the root clock is shared with ASRC module, so we add a new flags "ind_root_clk" which means the root clock is independent, then we will not do the clk_set_rate and clk_round_rate to avoid impact ASRC module usage. As add a new flags, we include the soc specific data struct. Signed-off-by: Shengjiu Wang --- sound/soc/fsl/fsl_spdif.c | 43 +++ 1 file changed, 39 insertions(+), 4 deletions(-) diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c index 1b2e516f9162..00e06803d32f 100644 --- a/sound/soc/fsl/fsl_spdif.c +++ b/sound/soc/fsl/fsl_spdif.c @@ -42,6 +42,17 @@ static u8 srpc_dpll_locked[] = { 0x0, 0x1, 0x2, 0x3, 0x4, 0xa, 0xb }; #define DEFAULT_RXCLK_SRC 1 +/** + * struct fsl_spdif_soc_data: soc specific data + * + * @imx: for imx platform + * @ind_root_clk: flag for round clk rate + */ +struct fsl_spdif_soc_data { + bool imx; + bool ind_root_clk; +}; + /* * SPDIF control structure * Defines channel status, subcode and Q sub @@ -93,6 +104,7 @@ struct fsl_spdif_priv { struct snd_soc_dai_driver cpu_dai_drv; struct platform_device *pdev; struct regmap *regmap; + const struct fsl_spdif_soc_data *soc; bool dpll_locked; u32 txrate[SPDIF_TXRATE_MAX]; u8 txclk_df[SPDIF_TXRATE_MAX]; @@ -110,6 +122,21 @@ struct fsl_spdif_priv { u32 regcache_srpc; }; +static struct fsl_spdif_soc_data fsl_spdif_vf610 = { + .imx = false, + .ind_root_clk = true, +}; + +static struct fsl_spdif_soc_data fsl_spdif_imx35 = { + .imx = true, + .ind_root_clk = true, +}; + +static struct fsl_spdif_soc_data fsl_spdif_imx6sx = { + .imx = true, + .ind_root_clk = false, +}; + /* DPLL locked and lock loss interrupt handler */ static void spdif_irq_dpll_lock(struct fsl_spdif_priv *spdif_priv) { @@ -421,7 +448,7 @@ static int spdif_set_sample_rate(struct snd_pcm_substream *substream, sysclk_df = spdif_priv->sysclk_df[rate]; /* Don't mess up the clocks from other modules */ - if (clk != STC_TXCLK_SPDIF_ROOT) + if (clk != STC_TXCLK_SPDIF_ROOT || !spdif_priv->soc->ind_root_clk) goto clk_set_bypass; /* The S/PDIF block needs a clock of 64 * fs * txclk_df */ @@ -1186,7 +1213,8 @@ static int fsl_spdif_probe_txclk(struct fsl_spdif_priv *spdif_priv, continue; ret = fsl_spdif_txclk_caldiv(spdif_priv, clk, savesub, index, -i == STC_TXCLK_SPDIF_ROOT); +i == STC_TXCLK_SPDIF_ROOT && +spdif_priv->soc->ind_root_clk); if (savesub == ret) continue; @@ -1230,6 +1258,12 @@ static int fsl_spdif_probe(struct platform_device *pdev) spdif_priv->pdev = pdev; + spdif_priv->soc = of_device_get_match_data(&pdev->dev); + if (!spdif_priv->soc) { + dev_err(&pdev->dev, "failed to get soc data\n"); + return -ENODEV; + } + /* Initialize this copy of the CPU DAI driver structure */ memcpy(&spdif_priv->cpu_dai_drv, &fsl_spdif_dai, sizeof(fsl_spdif_dai)); spdif_priv->cpu_dai_drv.name = dev_name(&pdev->dev); @@ -1359,8 +1393,9 @@ static const struct dev_pm_ops fsl_spdif_pm = { }; static const struct of_device_id fsl_spdif_dt_ids[] = { - { .compatible = "fsl,imx35-spdif", }, - { .compatible = "fsl,vf610-spdif", }, + { .compatible = "fsl,imx35-spdif", .data = &fsl_spdif_imx35, }, + { .compatible = "fsl,vf610-spdif", .data = &fsl_spdif_vf610, }, + { .compatible = "fsl,imx6sx-spdif", .data = &fsl_spdif_imx6sx, }, {} }; MODULE_DEVICE_TABLE(of, fsl_spdif_dt_ids); -- 2.21.0
[PATCH 1/2] ASoC: bindings: fsl_spdif: Add new compatible string for imx6sx
Add new compatible string "fsl,imx6sx-spdif" in the binding document. And add compatible string "fsl,vf610-spdif" which was missed before. Signed-off-by: Shengjiu Wang --- Documentation/devicetree/bindings/sound/fsl,spdif.txt | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/sound/fsl,spdif.txt b/Documentation/devicetree/bindings/sound/fsl,spdif.txt index 8b324f82a782..e1365b0ee1e9 100644 --- a/Documentation/devicetree/bindings/sound/fsl,spdif.txt +++ b/Documentation/devicetree/bindings/sound/fsl,spdif.txt @@ -6,7 +6,11 @@ a fibre cable. Required properties: - - compatible : Compatible list, must contain "fsl,imx35-spdif". + - compatible : Compatible list, should contain one of the following + compatibles: + "fsl,imx35-spdif", + "fsl,vf610-spdif", + "fsl,imx6sx-spdif", - reg: Offset and length of the register set for the device. -- 2.21.0
Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
On Mon 15-06-20 21:57:16, Waiman Long wrote: > The kzfree() function is normally used to clear some sensitive > information, like encryption keys, in the buffer before freeing it back > to the pool. Memset() is currently used for the buffer clearing. However, > it is entirely possible that the compiler may choose to optimize away the > memory clearing especially if LTO is being used. To make sure that this > optimization will not happen, memzero_explicit(), which is introduced > in v3.18, is now used in kzfree() to do the clearing. > > Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") > Cc: sta...@vger.kernel.org > Signed-off-by: Waiman Long Acked-by: Michal Hocko Although I am not really sure this is a stable material. Is there any known instance where the memset was optimized out from kzfree? > --- > mm/slab_common.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 9e72ba224175..37d48a56431d 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -1726,7 +1726,7 @@ void kzfree(const void *p) > if (unlikely(ZERO_OR_NULL_PTR(mem))) > return; > ks = ksize(mem); > - memset(mem, 0, ks); > + memzero_explicit(mem, ks); > kfree(mem); > } > EXPORT_SYMBOL(kzfree); > -- > 2.18.1 > -- Michal Hocko SUSE Labs
Re: [PATCH] powerpc/ptdump: Fix build failure in hashpagetable.c
Christophe Leroy writes: > H_SUCCESS is only defined when CONFIG_PPC_PSERIES is defined. It's always defined in hvcall.h, but it doesn't always get included via plpar_wrappers.h. It looks to be CONFIG_SMP=n that causes that, for SMP=y we get a copy via asm/spinlock.h > != H_SUCCESS means != 0. Modify the test accordingly. I guess that's an OK solution. > Reported-by: kernel test robot > Fixes: 65e701b2d2a8 ("powerpc/ptdump: drop non vital #ifdefs") > Cc: sta...@vger.kernel.org I don't think it needs to go to stable, none of the defconfigs hit it, we don't even really support SMP=n for 64-bit book3s. cheers > Signed-off-by: Christophe Leroy > --- > arch/powerpc/mm/ptdump/hashpagetable.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/mm/ptdump/hashpagetable.c > b/arch/powerpc/mm/ptdump/hashpagetable.c > index a2c33efc7ce8..5b8bd34cd3a1 100644 > --- a/arch/powerpc/mm/ptdump/hashpagetable.c > +++ b/arch/powerpc/mm/ptdump/hashpagetable.c > @@ -258,7 +258,7 @@ static int pseries_find(unsigned long ea, int psize, bool > primary, u64 *v, u64 * > for (i = 0; i < HPTES_PER_GROUP; i += 4, hpte_group += 4) { > lpar_rc = plpar_pte_read_4(0, hpte_group, (void *)ptes); > > - if (lpar_rc != H_SUCCESS) > + if (lpar_rc) > continue; > for (j = 0; j < 4; j++) { > if (HPTE_V_COMPARE(ptes[j].v, want_v) && > -- > 2.25.0
Re: [PATCH v3] ASoC: fsl_ssi: Fix bclk calculation for mono channel
On Tue, Jun 16, 2020 at 10:53:48AM +0800, Shengjiu Wang wrote: > For mono channel, SSI will switch to Normal mode. > > In Normal mode and Network mode, the Word Length Control bits > control the word length divider in clock generator, which is > different with I2S Master mode (the word length is fixed to > 32bit), it should be the value of params_width(hw_params). > > The condition "slots == 2" is not good for I2S Master mode, > because for Network mode and Normal mode, the slots can also > be 2. Then we need to use (ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) > to check if it is I2S Master mode. > > So we refine the formula for mono channel, otherwise there > will be sound issue for S24_LE. > > Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot > number and width") > Signed-off-by: Shengjiu Wang Reviewed-by: Nicolin Chen
[PATCH v3] All arch: remove system call sys_sysctl
Since the commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"), sys_sysctl is actually unavailable: any input can only return an error. We have been warning about people using the sysctl system call for years and believe there are no more users. Even if there are users of this interface if they have not complained or fixed their code by now they probably are not going to, so there is no point in warning them any longer. So completely remove sys_sysctl on all architectures. Signed-off-by: Xiaoming Ni Acked-by: Will Deacon Acked-by: "Eric W. Biederman" changes in v3: restore include/uapi/linux/sysctl.h rebase on commit bc7d17d55762 ("Add linux-next specific files for 20200615") Conflicts: arch/sh/include/uapi/asm/unistd_64.h arch/sh/kernel/syscalls_64.S v2: According to Kees Cook's suggestion, completely remove sys_sysctl on all arch According to Eric W. Biederman's suggestion, update the commit log V1: https://lore.kernel.org/lkml/1591683605-8585-1-git-send-email-nixiaom...@huawei.com/ Delete the code of sys_sysctl and return -ENOSYS directly at the function entry --- arch/alpha/kernel/syscalls/syscall.tbl| 2 +- arch/arm/configs/am200epdkit_defconfig| 1 - arch/arm/tools/syscall.tbl| 2 +- arch/arm64/include/asm/unistd32.h | 4 +- arch/ia64/kernel/syscalls/syscall.tbl | 2 +- arch/m68k/kernel/syscalls/syscall.tbl | 2 +- arch/microblaze/kernel/syscalls/syscall.tbl | 2 +- arch/mips/configs/cu1000-neo_defconfig| 1 - arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +- arch/mips/kernel/syscalls/syscall_n64.tbl | 2 +- arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +- arch/parisc/kernel/syscalls/syscall.tbl | 2 +- arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- arch/s390/kernel/syscalls/syscall.tbl | 2 +- arch/sh/configs/dreamcast_defconfig | 1 - arch/sh/configs/espt_defconfig| 1 - arch/sh/configs/hp6xx_defconfig | 1 - arch/sh/configs/landisk_defconfig | 1 - arch/sh/configs/lboxre2_defconfig | 1 - arch/sh/configs/microdev_defconfig| 1 - arch/sh/configs/migor_defconfig | 1 - arch/sh/configs/r7780mp_defconfig | 1 - arch/sh/configs/r7785rp_defconfig | 1 - arch/sh/configs/rts7751r2d1_defconfig | 1 - arch/sh/configs/rts7751r2dplus_defconfig | 1 - arch/sh/configs/se7206_defconfig | 1 - arch/sh/configs/se7343_defconfig | 1 - arch/sh/configs/se7619_defconfig | 1 - arch/sh/configs/se7705_defconfig | 1 - arch/sh/configs/se7750_defconfig | 1 - arch/sh/configs/se7751_defconfig | 1 - arch/sh/configs/secureedge5410_defconfig | 1 - arch/sh/configs/sh03_defconfig| 1 - arch/sh/configs/sh7710voipgw_defconfig| 1 - arch/sh/configs/sh7757lcr_defconfig | 1 - arch/sh/configs/sh7763rdp_defconfig | 1 - arch/sh/configs/shmin_defconfig | 1 - arch/sh/configs/titan_defconfig | 1 - arch/sh/kernel/syscalls/syscall.tbl | 2 +- arch/sparc/kernel/syscalls/syscall.tbl| 2 +- arch/x86/entry/syscalls/syscall_32.tbl| 2 +- arch/x86/entry/syscalls/syscall_64.tbl| 2 +- arch/xtensa/kernel/syscalls/syscall.tbl | 2 +- include/linux/compat.h| 1 - include/linux/syscalls.h | 2 - include/linux/sysctl.h| 6 +- kernel/Makefile | 2 +- kernel/sys_ni.c | 1 - kernel/sysctl_binary.c| 171 -- .../arch/powerpc/entry/syscalls/syscall.tbl | 2 +- .../perf/arch/s390/entry/syscalls/syscall.tbl | 2 +- .../arch/x86/entry/syscalls/syscall_64.tbl| 2 +- 52 files changed, 24 insertions(+), 227 deletions(-) delete mode 100644 kernel/sysctl_binary.c diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index b249824c8267..0da7f1c61529 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -249,7 +249,7 @@ 316common mlockallsys_mlockall 317common munlockall sys_munlockall 318common sysinfo sys_sysinfo -319common _sysctl sys_sysctl +319common _sysctl sys_ni_syscall # 320 was sys_idle 321common oldumount sys_oldumount 322common swapon sys_swapon diff --git a/arch/arm/configs/am200epdkit_defconfig b/arch/arm/configs/am200epdkit_defconfig index f56ac394caf1..4e49d6cb2f62 100644 ---
Re: [PATCH 2/2] cpufreq: Specify default governor on command line
On 15-06-20, 17:55, Quentin Perret wrote: > +static void cpufreq_get_default_governor(void) > +{ > + default_governor = cpufreq_parse_governor(cpufreq_param_governor); > + if (!default_governor) { > + if (*cpufreq_param_governor) > + pr_warn("Failed to find %s\n", cpufreq_param_governor); > + default_governor = cpufreq_default_governor(); A module_get() never happened for this case and so maybe a module_put() should never get called. > + } > +} > + > +static void cpufreq_put_default_governor(void) > +{ > + if (!default_governor) > + return; > + module_put(default_governor->owner); > + default_governor = NULL; > +} > + > static int cpufreq_init_governor(struct cpufreq_policy *policy) > { > int ret; > @@ -2701,6 +2721,8 @@ int cpufreq_register_driver(struct cpufreq_driver > *driver_data) > > if (driver_data->setpolicy) > driver_data->flags |= CPUFREQ_CONST_LOOPS; > + else > + cpufreq_get_default_governor(); > > if (cpufreq_boost_supported()) { > ret = create_boost_sysfs_file(); > @@ -2769,6 +2791,7 @@ int cpufreq_unregister_driver(struct cpufreq_driver > *driver) > subsys_interface_unregister(&cpufreq_interface); > remove_boost_sysfs_file(); > cpuhp_remove_state_nocalls_cpuslocked(hp_online); > + cpufreq_put_default_governor(); > > write_lock_irqsave(&cpufreq_driver_lock, flags); > > @@ -2792,4 +2815,5 @@ static int __init cpufreq_core_init(void) > return 0; > } And since this is a per boot thing, there is perhaps no need of doing these at driver register/unregister, I would rather do it at: cpufreq_core_init() time itself and so we will never need to run cpufreq_put_default_governor() and so can be removed. And another thing I am not able to understand (despite you commenting about that in the commit log) is what happens if the default governor chosen is built as a module ? -- viresh
Re: [PATCH 1/2] cpufreq: Register governors at core_initcall
On 15-06-20, 17:55, Quentin Perret wrote: > Currently, most CPUFreq governors are registered at core_initcall time > when used as default, and module_init otherwise. In preparation for > letting users specify the default governor on the kernel command line, > change all of them to use core_initcall unconditionally, as is already > the case for schedutil and performance. This will enable us to assume > builtin governors have been registered before the builtin CPUFreq > drivers probe. > > And since all governors now have similar init/exit patterns, introduce > two new macros cpufreq_governor_{init,exit}() to factorize the code. > > Signed-off-by: Quentin Perret > --- > Note: I couldn't boot-test the change to spudemand, by lack of hardware. > But I can confirm cell_defconfig compiles just fine. > --- > .../platforms/cell/cpufreq_spudemand.c| 26 ++- > drivers/cpufreq/cpufreq_conservative.c| 22 > drivers/cpufreq/cpufreq_ondemand.c| 24 + > drivers/cpufreq/cpufreq_performance.c | 14 ++ > drivers/cpufreq/cpufreq_powersave.c | 18 +++-- > drivers/cpufreq/cpufreq_userspace.c | 18 +++-- > include/linux/cpufreq.h | 14 ++ > kernel/sched/cpufreq_schedutil.c | 6 + > 8 files changed, 36 insertions(+), 106 deletions(-) Acked-by: Viresh Kumar -- viresh
Re: [PATCH 00/17] spelling.txt: /decriptors/descriptors/
On Tue, 9 Jun 2020 13:45:53 +0100, Kieran Bingham wrote: > I wouldn't normally go through spelling fixes, but I caught sight of > this typo twice, and then foolishly grepped the tree for it, and saw how > pervasive it was. > > so here I am ... fixing a typo globally... but with an addition in > scripts/spelling.txt so it shouldn't re-appear ;-) > > [...] Applied to 5.9/scsi-queue, thanks! [06/17] scsi: Fix trivial spelling https://git.kernel.org/mkp/scsi/c/0a19a725c0ed -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
On Mon, Jun 15, 2020 at 09:57:16PM -0400, Waiman Long wrote: > The kzfree() function is normally used to clear some sensitive > information, like encryption keys, in the buffer before freeing it back > to the pool. Memset() is currently used for the buffer clearing. However, > it is entirely possible that the compiler may choose to optimize away the > memory clearing especially if LTO is being used. To make sure that this > optimization will not happen, memzero_explicit(), which is introduced > in v3.18, is now used in kzfree() to do the clearing. > > Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") > Cc: sta...@vger.kernel.org > Signed-off-by: Waiman Long > --- > mm/slab_common.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 9e72ba224175..37d48a56431d 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -1726,7 +1726,7 @@ void kzfree(const void *p) > if (unlikely(ZERO_OR_NULL_PTR(mem))) > return; > ks = ksize(mem); > - memset(mem, 0, ks); > + memzero_explicit(mem, ks); > kfree(mem); > } > EXPORT_SYMBOL(kzfree); This is a good change, but the commit message isn't really accurate. AFAIK, no one has found any case where this memset() gets optimized out. And even with LTO, it would be virtually impossible due to all the synchronization and global data structures that kfree() uses. (Remember that this isn't the C standard function "free()", so the compiler can't assign it any special meaning.) Not to mention that LTO support isn't actually upstream yet. I still agree with the change, but it might be helpful if the commit message were honest that this is really a hardening measure and about properly conveying the intent. As-is this sounds like a critical fix, which might confuse people. - Eric
Re: [PATCH] kbuild: reuse vmlinux.o in vmlinux_link
On Tue, Jun 16, 2020 at 6:47 AM Sami Tolvanen wrote: > > On Sat, May 23, 2020 at 8:13 AM Masahiro Yamada wrote: > > > > Hi Nicholas, > > (+CC: Sam Ravnborg) > > > > > > On Sat, May 23, 2020 at 7:06 PM Nicholas Piggin wrote: > > > > > > Excerpts from Masahiro Yamada's message of May 23, 2020 3:44 am: > > > > + Michael, and PPC ML. > > > > > > > > They may know something about the reason of failure. > > > > > > Because the linker can't put branch stubs within object code sections, > > > so when you incrementally link them too large, the linker can't resolve > > > branches into other object files. > > > > > > Ah, you are right. > > > > So, this is a problem not only for PPC > > but also for ARM (both 32 and 64 bit), etc. > > > > ARM needs to insert a veneer to jump far. > > > > Prior to thin archive, we could not compile > > ARCH=arm allyesconfig because > > drivers/built-in.o was too large. > > > > This patch gets us back to the too large > > incremental object situation. > > > > With my quick compile-testing, > > ARCH=arm allyesconfig > > and ARCH=arm64 allyesconfig are broken. > > Thanks for looking into this! Clang doesn't appear to have this issue > with LTO because it always enables both -ffunction-sections and > -fdata-sections. I confirmed that -ffunction-sections also fixes arm64 > allyesconfig with this patch. While I'm fine with reusing vmlinux.o > only with LTO, how would you feel about enabling -ffunction-sections > in the kernel by default? I am OK if it works. Please do compile tests for some architectures. (especially, ARCH=powerpc defconfig, and ARCH=arm(64) allyesconfig) Thank you. -- Best Regards Masahiro Yamada
[PATCH v3] ASoC: fsl_ssi: Fix bclk calculation for mono channel
For mono channel, SSI will switch to Normal mode. In Normal mode and Network mode, the Word Length Control bits control the word length divider in clock generator, which is different with I2S Master mode (the word length is fixed to 32bit), it should be the value of params_width(hw_params). The condition "slots == 2" is not good for I2S Master mode, because for Network mode and Normal mode, the slots can also be 2. Then we need to use (ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) to check if it is I2S Master mode. So we refine the formula for mono channel, otherwise there will be sound issue for S24_LE. Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot number and width") Signed-off-by: Shengjiu Wang --- changes in v3 - update according to Nicolin's comments changes in v2 - refine patch for Network mode and Normal mode. sound/soc/fsl/fsl_ssi.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c index bad89b0d129e..1a2fa7f18142 100644 --- a/sound/soc/fsl/fsl_ssi.c +++ b/sound/soc/fsl/fsl_ssi.c @@ -678,8 +678,9 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream, struct regmap *regs = ssi->regs; u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; unsigned long clkrate, baudrate, tmprate; - unsigned int slots = params_channels(hw_params); - unsigned int slot_width = 32; + unsigned int channels = params_channels(hw_params); + unsigned int slot_width = params_width(hw_params); + unsigned int slots = 2; u64 sub, savesub = 10; unsigned int freq; bool baudclk_is_used; @@ -688,10 +689,14 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream, /* Override slots and slot_width if being specifically set... */ if (ssi->slots) slots = ssi->slots; - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ - if (ssi->slot_width && slots != 2) + if (ssi->slot_width) slot_width = ssi->slot_width; + /* ...but force 32 bits for stereo audio using I2S Master Mode */ + if (channels == 2 && + (ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) == SSI_SCR_I2S_MODE_MASTER) + slot_width = 32; + /* Generate bit clock based on the slot number and slot width */ freq = slots * slot_width * params_rate(hw_params); -- 2.21.0
[PATCH 2/2] cpufreq: Specify default governor on command line
Currently, the only way to specify the default CPUfreq governor is via Kconfig options, which suits users who can build the kernel themselves perfectly. However, for those who use a distro-like kernel (such as Android, with the Generic Kernel Image project), the only way to use a different default is to boot to userspace, and to then switch using the sysfs interface. Being able to specify the default governor on the command line, like is the case for cpuidle, would enable those users to specify their governor of choice earlier on, and to simplify slighlty the userspace boot procedure. To support this use-case, add a kernel command line parameter enabling to specify a default governor for CPUfreq, which takes precedence over the builtin default. This implementation has one notable limitation: the default governor must be registered before the driver. This is solved for builtin governors and drivers using appropriate *_initcall() functions. And in the modular case, this must be reflected as a constraint on the module loading order. Signed-off-by: Quentin Perret --- .../admin-guide/kernel-parameters.txt | 5 +++ Documentation/admin-guide/pm/cpufreq.rst | 6 ++-- drivers/cpufreq/cpufreq.c | 34 --- 3 files changed, 37 insertions(+), 8 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fb95fad81c79..5fd3c9f187eb 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -703,6 +703,11 @@ cpufreq.off=1 [CPU_FREQ] disable the cpufreq sub-system + cpufreq.default_governor= + [CPU_FREQ] Name of the default cpufreq governor to use. + This governor must be registered in the kernel before + the cpufreq driver probes. + cpu_init_udelay=N [X86] Delay for N microsec between assert and de-assert of APIC INIT to start processors. This delay occurs diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst index 0c74a7784964..368e612145d2 100644 --- a/Documentation/admin-guide/pm/cpufreq.rst +++ b/Documentation/admin-guide/pm/cpufreq.rst @@ -147,9 +147,9 @@ CPUs in it. The next major initialization step for a new policy object is to attach a scaling governor to it (to begin with, that is the default scaling governor -determined by the kernel configuration, but it may be changed later -via ``sysfs``). First, a pointer to the new policy object is passed to the -governor's ``->init()`` callback which is expected to initialize all of the +determined by the kernel command line or configuration, but it may be changed +later via ``sysfs``). First, a pointer to the new policy object is passed to +the governor's ``->init()`` callback which is expected to initialize all of the data structures necessary to handle the given policy and, possibly, to add a governor ``sysfs`` interface to it. Next, the governor is started by invoking its ``->start()`` callback. diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0128de3603df..0f05caedc320 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -50,6 +50,9 @@ static LIST_HEAD(cpufreq_governor_list); #define for_each_governor(__governor) \ list_for_each_entry(__governor, &cpufreq_governor_list, governor_list) +static char cpufreq_param_governor[CPUFREQ_NAME_LEN]; +static struct cpufreq_governor *default_governor; + /** * The "cpufreq driver" - the arch- or hardware-dependent low * level driver of CPUFreq support, and its spinlock. This lock @@ -1055,7 +1058,6 @@ __weak struct cpufreq_governor *cpufreq_default_governor(void) static int cpufreq_init_policy(struct cpufreq_policy *policy) { - struct cpufreq_governor *def_gov = cpufreq_default_governor(); struct cpufreq_governor *gov = NULL; unsigned int pol = CPUFREQ_POLICY_UNKNOWN; @@ -1065,8 +1067,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy) if (gov) { pr_debug("Restoring governor %s for cpu %d\n", policy->governor->name, policy->cpu); - } else if (def_gov) { - gov = def_gov; + } else if (default_governor) { + gov = default_governor; } else { return -ENODATA; } @@ -1074,8 +1076,8 @@ static int cpufreq_init_policy(struct cpufreq_policy *policy) /* Use the default policy if there is no last_policy. */ if (policy->last_policy) { pol = policy->last_policy; - } else if (def_gov) { - pol = cpufreq_parse_policy(def_gov->name); +
Re: [PATCH v13 3/6] powerpc/papr_scm: Fetch nvdimm health information from PHYP
On Mon, Jun 15, 2020 at 06:14:04PM +0530, Vaibhav Jain wrote: > Implement support for fetching nvdimm health information via > H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair > of 64-bit bitmap, bitwise-and of which is then stored in > 'struct papr_scm_priv' and subsequently partially exposed to > user-space via newly introduced dimm specific attribute > 'papr/flags'. Since the hcall is costly, the health information is > cached and only re-queried, 60s after the previous successful hcall. > > The patch also adds a documentation text describing flags reported by > the the new sysfs attribute 'papr/flags' is also introduced at > Documentation/ABI/testing/sysfs-bus-papr-pmem. > > [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for > PAPR hcalls") > > Cc: "Aneesh Kumar K . V" > Cc: Dan Williams > Cc: Michael Ellerman > Cc: Ira Weiny Reviewed-by: Ira Weiny > Signed-off-by: Vaibhav Jain > --- > Changelog: > > v12..v13: > * None > > v11..v12: > * None > > v10..v11: > * None > > v9..v10: > * Removed an avoidable 'goto' in __drc_pmem_query_health. [ Ira ]. > > Resend: > * Added ack from Aneesh. > > v8..v9: > * Rename some variables and defines to reduce usage of term SCM > replacing it with PMEM [Dan Williams, Aneesh] > * s/PAPR_SCM_DIMM/PAPR_PMEM/g > * s/papr_scm_nd_attributes/papr_nd_attributes/g > * s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g > * s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g > * Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem > > v7..v8: > * Update type of variable 'rc' in __drc_pmem_query_health() and > drc_pmem_query_health() to long and int respectively. [ Ira ] > * Updated the patch description to s/64 bit Big Endian Number/64-bit > bitmap/ [ Ira, Aneesh ]. > > Resend: > * None > > v6..v7 : > * Used the exported buf_seq_printf() function to generate content for > 'papr/flags' > * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c > and removed the papr_scm.h file [Mpe] > * Some minor consistency issued in sysfs-bus-papr-scm > documentation. [Mpe] > * s/dimm_mutex/health_mutex/g [Mpe] > * Split drc_pmem_query_health() into two function one of which takes > care of caching and locking. [Mpe] > * Fixed a local copy creation of dimm health information using > READ_ONCE(). [Mpe] > > v5..v6 : > * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags' > [Dan Williams] > * Include documentation for 'papr/flags' attr [Dan Williams] > * Change flag 'save_fail' to 'flush_fail' [Dan Williams] > * Caching of health bitmap to reduce expensive hcalls [Dan Williams] > * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe] > * Replaced two __be64 integers from papr_scm_priv to a single u64 > integer [Mpe] > * Updated patch description to reflect the changes made in this > version. > * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from > flags_show() [Dan Williams] > > v4..v5 : > * None > > v3..v4 : > * None > > v2..v3 : > * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for >NVDIMM unarmed [Aneesh] > > v1..v2 : > * New patch in the series. > --- > Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++ > arch/powerpc/platforms/pseries/papr_scm.c | 168 +- > 2 files changed, 193 insertions(+), 2 deletions(-) > create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem > > diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem > b/Documentation/ABI/testing/sysfs-bus-papr-pmem > new file mode 100644 > index ..5b10d036a8d4 > --- /dev/null > +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem > @@ -0,0 +1,27 @@ > +What:/sys/bus/nd/devices/nmemX/papr/flags > +Date:Apr, 2020 > +KernelVersion: v5.8 > +Contact: linuxppc-dev , > linux-nvd...@lists.01.org, > +Description: > + (RO) Report flags indicating various states of a > + papr-pmem NVDIMM device. Each flag maps to a one or > + more bits set in the dimm-health-bitmap retrieved in > + response to H_SCM_HEALTH hcall. The details of the bit > + flags returned in response to this hcall is available > + at 'Documentation/powerpc/papr_hcalls.rst' . Below are > + the flags reported in this sysfs file: > + > + * "not_armed" : Indicates that NVDIMM contents will not > + survive a power cycle. > + * "flush_fail" : Indicates that NVDIMM contents > + couldn't be flushed during last > + shut-down event. > + * "restore_fail": Indicates that NVDIMM contents > + couldn't be restored during NVDIMM > + initialization. > + * "encrypted" : NVDIMM contents are encrypted. > + * "smart_notify": There is health event for the N
Re: [PATCH v2] ASoC: fsl_ssi: Fix bclk calculation for mono channel
On Tue, Jun 16, 2020 at 9:59 AM Nicolin Chen wrote: > > On Tue, Jun 16, 2020 at 09:48:39AM +0800, Shengjiu Wang wrote: > > On Tue, Jun 16, 2020 at 7:11 AM Nicolin Chen wrote: > > > > > > On Mon, Jun 15, 2020 at 01:56:18PM +0800, Shengjiu Wang wrote: > > > > For mono channel, SSI will switch to Normal mode. > > > > > > > > In Normal mode and Network mode, the Word Length Control bits > > > > control the word length divider in clock generator, which is > > > > different with I2S Master mode (the word length is fixed to > > > > 32bit), it should be the value of params_width(hw_params). > > > > > > > > The condition "slots == 2" is not good for I2S Master mode, > > > > because for Network mode and Normal mode, the slots can also > > > > be 2. Then we need to use (ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) > > > > to check if it is I2S Master mode. > > > > > > The fsl_ssi_set_bclk is only called when fsl_ssi_is_i2s_master, > > > though I agree that that line of comments sounds confusing now. > > > > Actually I think fsl_ssi_is_i2s_master is not accurate, it just checks > > the Master mode, but didn't check the I2S mode. > > > > > > > > > So we refine the famula for mono channel, otherwise there > > > > > > famula => formula? > > > > > > > will be sound issue for S24_LE. > > > > > > > > Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot > > > > number and width") > > > > Signed-off-by: Shengjiu Wang > > > > --- > > > > changes in v2 > > > > - refine patch for Network mode and Normal mode. > > > > > > > > sound/soc/fsl/fsl_ssi.c | 15 +++ > > > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c > > > > index bad89b0d129e..cbf67d132fda 100644 > > > > --- a/sound/soc/fsl/fsl_ssi.c > > > > +++ b/sound/soc/fsl/fsl_ssi.c > > > > @@ -678,7 +678,8 @@ static int fsl_ssi_set_bclk(struct > > > > snd_pcm_substream *substream, > > > > struct regmap *regs = ssi->regs; > > > > u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; > > > > unsigned long clkrate, baudrate, tmprate; > > > > - unsigned int slots = params_channels(hw_params); > > > > + unsigned int channels = params_channels(hw_params); > > > > + unsigned int slots; > > > > unsigned int slot_width = 32; > > > > u64 sub, savesub = 10; > > > > unsigned int freq; > > > > @@ -688,9 +689,15 @@ static int fsl_ssi_set_bclk(struct > > > > snd_pcm_substream *substream, > > > > /* Override slots and slot_width if being specifically set... */ > > > > if (ssi->slots) > > > > slots = ssi->slots; > > > > - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ > > > > - if (ssi->slot_width && slots != 2) > > > > - slot_width = ssi->slot_width; > > > > + else > > > > + /* Apply two slots for mono channel, because DC = 2 */ > > > > + slots = (channels == 1) ? 2 : channels; > > > > + > > > > + /* ...but keep 32 bits if I2S Master mode */ > > > > + if ((ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) != > > > > SSI_SCR_I2S_MODE_MASTER || > > > > + channels == 1) > > > > + slot_width = ssi->slot_width ? ssi->slot_width : > > > > > > This looks very complicated...can you review and try mine? > > > (Basically, take 32-bit out of default but force it later) > > > > > > @@ -678,8 +678,9 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > > > *substream, > > > struct regmap *regs = ssi->regs; > > > u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; > > > unsigned long clkrate, baudrate, tmprate; > > > - unsigned int slots = params_channels(hw_params); > > > - unsigned int slot_width = 32; > > > + unsigned int channels = params_channels(hw_params); > > > + unsigned int slot_width = params_width(hw_params); > > > + unsigned int slots = 2; > > > u64 sub, savesub = 10; > > > unsigned int freq; > > > bool baudclk_is_used; > > > @@ -688,10 +689,16 @@ static int fsl_ssi_set_bclk(struct > > > snd_pcm_substream *substream, > > > /* Override slots and slot_width if being specifically set... */ > > > if (ssi->slots) > > > slots = ssi->slots; > > > - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ > > > - if (ssi->slot_width && slots != 2) > > > + if (ssi->slot_width) > > > slot_width = ssi->slot_width; > > > > > > + /* > > > +* ...but force 32 bits for stereo audio. Note that mono audio is > > > also > > > +* sent in 2 slots via NORMAL mode, so check both slots and > > > channels. > > > +*/ > > > + if (slots == 2 && channels == 2) > > > + slot_width = 32; > > > > slots ==2 && channels ==2 does not mean the I2S Master mode. > > For LEFT_J, it is also slots =2 & channels = 2, then the slot_width > > should be params_width
Re: [PATCH v2] ASoC: fsl_ssi: Fix bclk calculation for mono channel
On Tue, Jun 16, 2020 at 09:48:39AM +0800, Shengjiu Wang wrote: > On Tue, Jun 16, 2020 at 7:11 AM Nicolin Chen wrote: > > > > On Mon, Jun 15, 2020 at 01:56:18PM +0800, Shengjiu Wang wrote: > > > For mono channel, SSI will switch to Normal mode. > > > > > > In Normal mode and Network mode, the Word Length Control bits > > > control the word length divider in clock generator, which is > > > different with I2S Master mode (the word length is fixed to > > > 32bit), it should be the value of params_width(hw_params). > > > > > > The condition "slots == 2" is not good for I2S Master mode, > > > because for Network mode and Normal mode, the slots can also > > > be 2. Then we need to use (ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) > > > to check if it is I2S Master mode. > > > > The fsl_ssi_set_bclk is only called when fsl_ssi_is_i2s_master, > > though I agree that that line of comments sounds confusing now. > > Actually I think fsl_ssi_is_i2s_master is not accurate, it just checks > the Master mode, but didn't check the I2S mode. > > > > > > So we refine the famula for mono channel, otherwise there > > > > famula => formula? > > > > > will be sound issue for S24_LE. > > > > > > Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot > > > number and width") > > > Signed-off-by: Shengjiu Wang > > > --- > > > changes in v2 > > > - refine patch for Network mode and Normal mode. > > > > > > sound/soc/fsl/fsl_ssi.c | 15 +++ > > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > > > diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c > > > index bad89b0d129e..cbf67d132fda 100644 > > > --- a/sound/soc/fsl/fsl_ssi.c > > > +++ b/sound/soc/fsl/fsl_ssi.c > > > @@ -678,7 +678,8 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > > > *substream, > > > struct regmap *regs = ssi->regs; > > > u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; > > > unsigned long clkrate, baudrate, tmprate; > > > - unsigned int slots = params_channels(hw_params); > > > + unsigned int channels = params_channels(hw_params); > > > + unsigned int slots; > > > unsigned int slot_width = 32; > > > u64 sub, savesub = 10; > > > unsigned int freq; > > > @@ -688,9 +689,15 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > > > *substream, > > > /* Override slots and slot_width if being specifically set... */ > > > if (ssi->slots) > > > slots = ssi->slots; > > > - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ > > > - if (ssi->slot_width && slots != 2) > > > - slot_width = ssi->slot_width; > > > + else > > > + /* Apply two slots for mono channel, because DC = 2 */ > > > + slots = (channels == 1) ? 2 : channels; > > > + > > > + /* ...but keep 32 bits if I2S Master mode */ > > > + if ((ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) != > > > SSI_SCR_I2S_MODE_MASTER || > > > + channels == 1) > > > + slot_width = ssi->slot_width ? ssi->slot_width : > > > > This looks very complicated...can you review and try mine? > > (Basically, take 32-bit out of default but force it later) > > > > @@ -678,8 +678,9 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > > *substream, > > struct regmap *regs = ssi->regs; > > u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; > > unsigned long clkrate, baudrate, tmprate; > > - unsigned int slots = params_channels(hw_params); > > - unsigned int slot_width = 32; > > + unsigned int channels = params_channels(hw_params); > > + unsigned int slot_width = params_width(hw_params); > > + unsigned int slots = 2; > > u64 sub, savesub = 10; > > unsigned int freq; > > bool baudclk_is_used; > > @@ -688,10 +689,16 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > > *substream, > > /* Override slots and slot_width if being specifically set... */ > > if (ssi->slots) > > slots = ssi->slots; > > - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ > > - if (ssi->slot_width && slots != 2) > > + if (ssi->slot_width) > > slot_width = ssi->slot_width; > > > > + /* > > +* ...but force 32 bits for stereo audio. Note that mono audio is > > also > > +* sent in 2 slots via NORMAL mode, so check both slots and > > channels. > > +*/ > > + if (slots == 2 && channels == 2) > > + slot_width = 32; > > slots ==2 && channels ==2 does not mean the I2S Master mode. > For LEFT_J, it is also slots =2 & channels = 2, then the slot_width > should be params_width(hw_params). > and DSP_A/B also supports stereo. I think you have a point. Then would this condition work? + /* ...but force 32 bits for stereo audio using I2S Master Mode */ + if (channels == 2 && + ssi->i2s_net & SSI_SCR_I2S_MODE_MASK == SSI_SCR_
[PATCH v4 3/3] btrfs: Use kfree() in btrfs_ioctl_get_subvol_info()
In btrfs_ioctl_get_subvol_info(), there is a classic case where kzalloc() was incorrectly paired with kzfree(). According to David Sterba, there isn't any sensitive information in the subvol_info that needs to be cleared before freeing. So kfree_sensitive() isn't really needed, use kfree() instead. Reported-by: David Sterba Signed-off-by: Waiman Long --- fs/btrfs/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f1dd9e4271e9..e8f7c5f00894 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2692,7 +2692,7 @@ static int btrfs_ioctl_get_subvol_info(struct file *file, void __user *argp) btrfs_put_root(root); out_free: btrfs_free_path(path); - kfree_sensitive(subvol_info); + kfree(subvol_info); return ret; } -- 2.18.1
[PATCH v4 2/3] mm, treewide: Rename kzfree() to kfree_sensitive()
As said by Linus: A symmetric naming is only helpful if it implies symmetries in use. Otherwise it's actively misleading. In "kzalloc()", the z is meaningful and an important part of what the caller wants. In "kzfree()", the z is actively detrimental, because maybe in the future we really _might_ want to use that "memfill(0xdeadbeef)" or something. The "zero" part of the interface isn't even _relevant_. The main reason that kzfree() exists is to clear sensitive information that should not be leaked to other future users of the same memory objects. Rename kzfree() to kfree_sensitive() to follow the example of the recently added kvfree_sensitive() and make the intention of the API more explicit. In addition, memzero_explicit() is used to clear the memory to make sure that it won't get optimized away by the compiler. The renaming is done by using the command sequence: git grep -w --name-only kzfree |\ xargs sed -i 's/\bkzfree\b/kfree_sensitive/' followed by some editing of the kfree_sensitive() kerneldoc and the use of memzero_explicit() instead of memset(). Suggested-by: Joe Perches Acked-by: David Howells Acked-by: Michal Hocko Acked-by: Johannes Weiner Signed-off-by: Waiman Long --- arch/s390/crypto/prng.c | 4 +-- arch/x86/power/hibernate.c| 2 +- crypto/adiantum.c | 2 +- crypto/ahash.c| 4 +-- crypto/api.c | 2 +- crypto/asymmetric_keys/verify_pefile.c| 4 +-- crypto/deflate.c | 2 +- crypto/drbg.c | 10 +++--- crypto/ecc.c | 8 ++--- crypto/ecdh.c | 2 +- crypto/gcm.c | 2 +- crypto/gf128mul.c | 4 +-- crypto/jitterentropy-kcapi.c | 2 +- crypto/rng.c | 2 +- crypto/rsa-pkcs1pad.c | 6 ++-- crypto/seqiv.c| 2 +- crypto/shash.c| 2 +- crypto/skcipher.c | 2 +- crypto/testmgr.c | 6 ++-- crypto/zstd.c | 2 +- .../allwinner/sun8i-ce/sun8i-ce-cipher.c | 2 +- .../allwinner/sun8i-ss/sun8i-ss-cipher.c | 2 +- drivers/crypto/amlogic/amlogic-gxl-cipher.c | 4 +-- drivers/crypto/atmel-ecc.c| 2 +- drivers/crypto/caam/caampkc.c | 28 +++ drivers/crypto/cavium/cpt/cptvf_main.c| 6 ++-- drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 12 +++ drivers/crypto/cavium/nitrox/nitrox_lib.c | 4 +-- drivers/crypto/cavium/zip/zip_crypto.c| 6 ++-- drivers/crypto/ccp/ccp-crypto-rsa.c | 6 ++-- drivers/crypto/ccree/cc_aead.c| 4 +-- drivers/crypto/ccree/cc_buffer_mgr.c | 4 +-- drivers/crypto/ccree/cc_cipher.c | 6 ++-- drivers/crypto/ccree/cc_hash.c| 8 ++--- drivers/crypto/ccree/cc_request_mgr.c | 2 +- drivers/crypto/marvell/cesa/hash.c| 2 +- .../crypto/marvell/octeontx/otx_cptvf_main.c | 6 ++-- .../marvell/octeontx/otx_cptvf_reqmgr.h | 2 +- drivers/crypto/mediatek/mtk-aes.c | 2 +- drivers/crypto/nx/nx.c| 4 +-- drivers/crypto/virtio/virtio_crypto_algs.c| 12 +++ drivers/crypto/virtio/virtio_crypto_core.c| 2 +- drivers/md/dm-crypt.c | 32 - drivers/md/dm-integrity.c | 6 ++-- drivers/misc/ibmvmc.c | 6 ++-- .../hisilicon/hns3/hns3pf/hclge_mbx.c | 2 +- .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 6 ++-- drivers/net/ppp/ppp_mppe.c| 6 ++-- drivers/net/wireguard/noise.c | 4 +-- drivers/net/wireguard/peer.c | 2 +- drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 +- .../net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 6 ++-- drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 6 ++-- drivers/net/wireless/intersil/orinoco/wext.c | 4 +-- drivers/s390/crypto/ap_bus.h | 4 +-- drivers/staging/ks7010/ks_hostif.c| 2 +- drivers/staging/rtl8723bs/core/rtw_security.c | 2 +- drivers/staging/wlan-ng/p80211netdev.c| 2 +- drivers/target/iscsi/iscsi_target_auth.c | 2 +- fs/btrfs/ioctl.c | 2 +- fs/cifs/cifsencrypt.c | 2 +- fs/cifs/connect.c | 10 +++--- fs/cifs/dfs_cache.c | 2 +- fs/cifs/misc.c| 8 ++--- fs/crypto/keyring.c | 6 ++-- fs/crypto/keysetup_v1.c | 4 +-- fs/ecryptfs/keystore.c
[PATCH v4 1/3] mm/slab: Use memzero_explicit() in kzfree()
The kzfree() function is normally used to clear some sensitive information, like encryption keys, in the buffer before freeing it back to the pool. Memset() is currently used for the buffer clearing. However, it is entirely possible that the compiler may choose to optimize away the memory clearing especially if LTO is being used. To make sure that this optimization will not happen, memzero_explicit(), which is introduced in v3.18, is now used in kzfree() to do the clearing. Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()") Cc: sta...@vger.kernel.org Signed-off-by: Waiman Long --- mm/slab_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 9e72ba224175..37d48a56431d 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1726,7 +1726,7 @@ void kzfree(const void *p) if (unlikely(ZERO_OR_NULL_PTR(mem))) return; ks = ksize(mem); - memset(mem, 0, ks); + memzero_explicit(mem, ks); kfree(mem); } EXPORT_SYMBOL(kzfree); -- 2.18.1
[PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
v4: - Break out the memzero_explicit() change as suggested by Dan Carpenter so that it can be backported to stable. - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for now as there can be a bit more discussion on what is best. It will be introduced as a separate patch later on after this one is merged. This patchset makes a global rename of the kzfree() to kfree_sensitive() to highlight the fact buffer clearing is only needed if the data objects contain sensitive information like encrpytion key. The fact that kzfree() uses memset() to do the clearing isn't totally safe either as compiler may compile out the clearing in their optimizer especially if LTO is used. Instead, the new kfree_sensitive() uses memzero_explicit() which won't get compiled out. Waiman Long (3): mm/slab: Use memzero_explicit() in kzfree() mm, treewide: Rename kzfree() to kfree_sensitive() btrfs: Use kfree() in btrfs_ioctl_get_subvol_info() arch/s390/crypto/prng.c | 4 +-- arch/x86/power/hibernate.c| 2 +- crypto/adiantum.c | 2 +- crypto/ahash.c| 4 +-- crypto/api.c | 2 +- crypto/asymmetric_keys/verify_pefile.c| 4 +-- crypto/deflate.c | 2 +- crypto/drbg.c | 10 +++--- crypto/ecc.c | 8 ++--- crypto/ecdh.c | 2 +- crypto/gcm.c | 2 +- crypto/gf128mul.c | 4 +-- crypto/jitterentropy-kcapi.c | 2 +- crypto/rng.c | 2 +- crypto/rsa-pkcs1pad.c | 6 ++-- crypto/seqiv.c| 2 +- crypto/shash.c| 2 +- crypto/skcipher.c | 2 +- crypto/testmgr.c | 6 ++-- crypto/zstd.c | 2 +- .../allwinner/sun8i-ce/sun8i-ce-cipher.c | 2 +- .../allwinner/sun8i-ss/sun8i-ss-cipher.c | 2 +- drivers/crypto/amlogic/amlogic-gxl-cipher.c | 4 +-- drivers/crypto/atmel-ecc.c| 2 +- drivers/crypto/caam/caampkc.c | 28 +++ drivers/crypto/cavium/cpt/cptvf_main.c| 6 ++-- drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 12 +++ drivers/crypto/cavium/nitrox/nitrox_lib.c | 4 +-- drivers/crypto/cavium/zip/zip_crypto.c| 6 ++-- drivers/crypto/ccp/ccp-crypto-rsa.c | 6 ++-- drivers/crypto/ccree/cc_aead.c| 4 +-- drivers/crypto/ccree/cc_buffer_mgr.c | 4 +-- drivers/crypto/ccree/cc_cipher.c | 6 ++-- drivers/crypto/ccree/cc_hash.c| 8 ++--- drivers/crypto/ccree/cc_request_mgr.c | 2 +- drivers/crypto/marvell/cesa/hash.c| 2 +- .../crypto/marvell/octeontx/otx_cptvf_main.c | 6 ++-- .../marvell/octeontx/otx_cptvf_reqmgr.h | 2 +- drivers/crypto/mediatek/mtk-aes.c | 2 +- drivers/crypto/nx/nx.c| 4 +-- drivers/crypto/virtio/virtio_crypto_algs.c| 12 +++ drivers/crypto/virtio/virtio_crypto_core.c| 2 +- drivers/md/dm-crypt.c | 32 - drivers/md/dm-integrity.c | 6 ++-- drivers/misc/ibmvmc.c | 6 ++-- .../hisilicon/hns3/hns3pf/hclge_mbx.c | 2 +- .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 6 ++-- drivers/net/ppp/ppp_mppe.c| 6 ++-- drivers/net/wireguard/noise.c | 4 +-- drivers/net/wireguard/peer.c | 2 +- drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 2 +- .../net/wireless/intel/iwlwifi/pcie/tx-gen2.c | 6 ++-- drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 6 ++-- drivers/net/wireless/intersil/orinoco/wext.c | 4 +-- drivers/s390/crypto/ap_bus.h | 4 +-- drivers/staging/ks7010/ks_hostif.c| 2 +- drivers/staging/rtl8723bs/core/rtw_security.c | 2 +- drivers/staging/wlan-ng/p80211netdev.c| 2 +- drivers/target/iscsi/iscsi_target_auth.c | 2 +- fs/btrfs/ioctl.c | 2 +- fs/cifs/cifsencrypt.c | 2 +- fs/cifs/connect.c | 10 +++--- fs/cifs/dfs_cache.c | 2 +- fs/cifs/misc.c| 8 ++--- fs/crypto/keyring.c | 6 ++-- fs/crypto/keysetup_v1.c | 4 +-- fs/ecryptfs/keystore.c| 4 +-- fs/ecryptfs/messaging.c | 2 +- include/crypto/aead.h | 2 +- include/crypto/akcipher.h | 2 +- include/crypto/gf128mul.h | 2 +- include/crypto/hash.h
Re: [PATCH v2] ASoC: fsl_ssi: Fix bclk calculation for mono channel
On Tue, Jun 16, 2020 at 7:11 AM Nicolin Chen wrote: > > On Mon, Jun 15, 2020 at 01:56:18PM +0800, Shengjiu Wang wrote: > > For mono channel, SSI will switch to Normal mode. > > > > In Normal mode and Network mode, the Word Length Control bits > > control the word length divider in clock generator, which is > > different with I2S Master mode (the word length is fixed to > > 32bit), it should be the value of params_width(hw_params). > > > > The condition "slots == 2" is not good for I2S Master mode, > > because for Network mode and Normal mode, the slots can also > > be 2. Then we need to use (ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) > > to check if it is I2S Master mode. > > The fsl_ssi_set_bclk is only called when fsl_ssi_is_i2s_master, > though I agree that that line of comments sounds confusing now. Actually I think fsl_ssi_is_i2s_master is not accurate, it just checks the Master mode, but didn't check the I2S mode. > > > So we refine the famula for mono channel, otherwise there > > famula => formula? > > > will be sound issue for S24_LE. > > > > Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot > > number and width") > > Signed-off-by: Shengjiu Wang > > --- > > changes in v2 > > - refine patch for Network mode and Normal mode. > > > > sound/soc/fsl/fsl_ssi.c | 15 +++ > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c > > index bad89b0d129e..cbf67d132fda 100644 > > --- a/sound/soc/fsl/fsl_ssi.c > > +++ b/sound/soc/fsl/fsl_ssi.c > > @@ -678,7 +678,8 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > > *substream, > > struct regmap *regs = ssi->regs; > > u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; > > unsigned long clkrate, baudrate, tmprate; > > - unsigned int slots = params_channels(hw_params); > > + unsigned int channels = params_channels(hw_params); > > + unsigned int slots; > > unsigned int slot_width = 32; > > u64 sub, savesub = 10; > > unsigned int freq; > > @@ -688,9 +689,15 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > > *substream, > > /* Override slots and slot_width if being specifically set... */ > > if (ssi->slots) > > slots = ssi->slots; > > - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ > > - if (ssi->slot_width && slots != 2) > > - slot_width = ssi->slot_width; > > + else > > + /* Apply two slots for mono channel, because DC = 2 */ > > + slots = (channels == 1) ? 2 : channels; > > + > > + /* ...but keep 32 bits if I2S Master mode */ > > + if ((ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) != SSI_SCR_I2S_MODE_MASTER > > || > > + channels == 1) > > + slot_width = ssi->slot_width ? ssi->slot_width : > > This looks very complicated...can you review and try mine? > (Basically, take 32-bit out of default but force it later) > > @@ -678,8 +678,9 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > *substream, > struct regmap *regs = ssi->regs; > u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; > unsigned long clkrate, baudrate, tmprate; > - unsigned int slots = params_channels(hw_params); > - unsigned int slot_width = 32; > + unsigned int channels = params_channels(hw_params); > + unsigned int slot_width = params_width(hw_params); > + unsigned int slots = 2; > u64 sub, savesub = 10; > unsigned int freq; > bool baudclk_is_used; > @@ -688,10 +689,16 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > *substream, > /* Override slots and slot_width if being specifically set... */ > if (ssi->slots) > slots = ssi->slots; > - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ > - if (ssi->slot_width && slots != 2) > + if (ssi->slot_width) > slot_width = ssi->slot_width; > > + /* > +* ...but force 32 bits for stereo audio. Note that mono audio is also > +* sent in 2 slots via NORMAL mode, so check both slots and channels. > +*/ > + if (slots == 2 && channels == 2) > + slot_width = 32; slots ==2 && channels ==2 does not mean the I2S Master mode. For LEFT_J, it is also slots =2 & channels = 2, then the slot_width should be params_width(hw_params). and DSP_A/B also supports stereo.
Re: [PATCH v5 01/13] powerpc: Remove Xilinx PPC405/PPC440 support
On Thu, May 21, 2020 at 04:55:52PM +, Christophe Leroy wrote: > From: Michal Simek > > The latest Xilinx design tools called ISE and EDK has been released in > October 2013. New tool doesn't support any PPC405/PPC440 new designs. > These platforms are no longer supported and tested. > > PowerPC 405/440 port is orphan from 2013 by > commit cdeb89943bfc ("MAINTAINERS: Fix incorrect status tag") and > commit 19624236cce1 ("MAINTAINERS: Update Grant's email address and > maintainership") > that's why it is time to remove the support fot these platforms. > > Signed-off-by: Michal Simek > Acked-by: Arnd Bergmann > Signed-off-by: Christophe Leroy This patch causes qemu-system-ppc to fail to load ppc44x_defconfig: $ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc-linux- O=out/ppc distclean ppc44x_defconfig zImage $ timeout --foreground 30s unbuffer \ qemu-system-ppc \ -machine bamboo \ -no-reboot \ -append "console=ttyS0" \ -display none \ -initrd ../../cbl/github/boot-utils/images/ppc32/rootfs.cpio \ -kernel out/ppc/arch/powerpc/boot/zImage \ -m 128m \ -nodefaults \ -serial mon:stdio qemu-system-ppc: could not load kernel 'out/ppc/arch/powerpc/boot/zImage' $ ls out/ppc/arch/powerpc/boot/zImage out/ppc/arch/powerpc/boot/zImage Is this expected? Is there some other config or machine that we should be testing instead? Cheers, Nathan
Re: [PATCH 00/17] spelling.txt: /decriptors/descriptors/
On Tue, 9 Jun 2020 13:45:53 +0100, Kieran Bingham wrote: > I wouldn't normally go through spelling fixes, but I caught sight of > this typo twice, and then foolishly grepped the tree for it, and saw how > pervasive it was. > > so here I am ... fixing a typo globally... but with an addition in > scripts/spelling.txt so it shouldn't re-appear ;-) > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git for-next Thanks! [1/2] regulator: Fix trivial spelling commit: d3f3723387f97118c337689fc73e4199fb4331ce [2/2] regulator: gpio: Fix trivial spelling commit: 1f0b740004f09d2f1b716fd6c2fdca81004ded05 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
Re: [PATCH 0/3] ASoC: fsl_easrc: Fix several warnings
On Wed, 3 Jun 2020 11:39:38 +0800, Shengjiu Wang wrote: > Fix several warnings with "make W=1" > > Shengjiu Wang (3): > ASoC: fsl_easrc: Fix -Wmissing-prototypes warning > ASoC: fsl_easrc: Fix -Wunused-but-set-variable > ASoC: fsl_easrc: Fix "Function parameter not described" warnings > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [1/3] ASoC: fsl_easrc: Fix -Wmissing-prototypes warning commit: e4cc0aaac390a87f80ae542c75d4c84de08816f9 [2/3] ASoC: fsl_easrc: Fix -Wunused-but-set-variable commit: 633a2c7d6e621e748d69423fa85be88c7dcd4f94 [3/3] ASoC: fsl_easrc: Fix "Function parameter not described" warnings commit: d73d682a9e87fa494868e8094fcc5b6a6b505464 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
Re: [PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
Hi, Baolu, On Sat, Jun 13, 2020 at 08:17:40PM +0800, Lu Baolu wrote: > Hi Fenghua, > > On 2020/6/13 8:41, Fenghua Yu wrote: > >+implement implement fairness or ensure forward progress can be made. > > Repeated "implement". Will fix this. > >+For example, the Intel Data Streaming Accelerator (DSA) uses > >+intel_svm_bind_mm(), which will do the following. > > The Intel SVM APIs have been deprecated. Drivers should use > iommu_sva_bind_device() instead. Please also update other places in > this document. Will fix this. Thanks. -Fenghua
Re: [PATCH v2] ASoC: fsl_ssi: Fix bclk calculation for mono channel
On Mon, Jun 15, 2020 at 01:56:18PM +0800, Shengjiu Wang wrote: > For mono channel, SSI will switch to Normal mode. > > In Normal mode and Network mode, the Word Length Control bits > control the word length divider in clock generator, which is > different with I2S Master mode (the word length is fixed to > 32bit), it should be the value of params_width(hw_params). > > The condition "slots == 2" is not good for I2S Master mode, > because for Network mode and Normal mode, the slots can also > be 2. Then we need to use (ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) > to check if it is I2S Master mode. The fsl_ssi_set_bclk is only called when fsl_ssi_is_i2s_master, though I agree that that line of comments sounds confusing now. > So we refine the famula for mono channel, otherwise there famula => formula? > will be sound issue for S24_LE. > > Fixes: b0a7043d5c2c ("ASoC: fsl_ssi: Caculate bit clock rate using slot > number and width") > Signed-off-by: Shengjiu Wang > --- > changes in v2 > - refine patch for Network mode and Normal mode. > > sound/soc/fsl/fsl_ssi.c | 15 +++ > 1 file changed, 11 insertions(+), 4 deletions(-) > > diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c > index bad89b0d129e..cbf67d132fda 100644 > --- a/sound/soc/fsl/fsl_ssi.c > +++ b/sound/soc/fsl/fsl_ssi.c > @@ -678,7 +678,8 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > *substream, > struct regmap *regs = ssi->regs; > u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; > unsigned long clkrate, baudrate, tmprate; > - unsigned int slots = params_channels(hw_params); > + unsigned int channels = params_channels(hw_params); > + unsigned int slots; > unsigned int slot_width = 32; > u64 sub, savesub = 10; > unsigned int freq; > @@ -688,9 +689,15 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream > *substream, > /* Override slots and slot_width if being specifically set... */ > if (ssi->slots) > slots = ssi->slots; > - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ > - if (ssi->slot_width && slots != 2) > - slot_width = ssi->slot_width; > + else > + /* Apply two slots for mono channel, because DC = 2 */ > + slots = (channels == 1) ? 2 : channels; > + > + /* ...but keep 32 bits if I2S Master mode */ > + if ((ssi->i2s_net & SSI_SCR_I2S_MODE_MASK) != SSI_SCR_I2S_MODE_MASTER || > + channels == 1) > + slot_width = ssi->slot_width ? ssi->slot_width : This looks very complicated...can you review and try mine? (Basically, take 32-bit out of default but force it later) @@ -678,8 +678,9 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream, struct regmap *regs = ssi->regs; u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i; unsigned long clkrate, baudrate, tmprate; - unsigned int slots = params_channels(hw_params); - unsigned int slot_width = 32; + unsigned int channels = params_channels(hw_params); + unsigned int slot_width = params_width(hw_params); + unsigned int slots = 2; u64 sub, savesub = 10; unsigned int freq; bool baudclk_is_used; @@ -688,10 +689,16 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream *substream, /* Override slots and slot_width if being specifically set... */ if (ssi->slots) slots = ssi->slots; - /* ...but keep 32 bits if slots is 2 -- I2S Master mode */ - if (ssi->slot_width && slots != 2) + if (ssi->slot_width) slot_width = ssi->slot_width; + /* +* ...but force 32 bits for stereo audio. Note that mono audio is also +* sent in 2 slots via NORMAL mode, so check both slots and channels. +*/ + if (slots == 2 && channels == 2) + slot_width = 32; + /* Generate bit clock based on the slot number and slot width */ freq = slots * slot_width * params_rate(hw_params);
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote: > > I don't get why you need a rdmsr here, or why not having one would > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > > > +*/ > > > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > > > + return false; > > > > + > > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); > > How much more expensive is the wrmsr over the rdmsr? Can't we just > unconditionally write the current PASID and call it a day? The reason to check the rdmsr() is because we are using a hueristic taking GP faults. If we already setup the MSR, but we get it a second time it means the reason is something other than PASID_MSR not being set. Ideally we should use the TIF_ to track this would be cheaper, but we were told those bits aren't easy to give out. Cheers, Ashok
[PATCH 17/25] mm/powerpc: Use mm_fault_accounting()
Use the new mm_fault_accounting() helper for page fault accounting. cmo_account_page_fault() is special. Keep that. CC: Michael Ellerman CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: linuxppc-dev@lists.ozlabs.org Signed-off-by: Peter Xu --- arch/powerpc/mm/fault.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 84af6c8eecf7..6043b639ae42 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -481,8 +481,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, if (!arch_irq_disabled_regs(regs)) local_irq_enable(); - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); - if (error_code & DSISR_KEYFAULT) return bad_key_fault_exception(regs, address, get_mm_addr_key(mm, address)); @@ -604,14 +602,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, /* * Major/minor page fault accounting. */ - if (major) { - current->maj_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); + if (major) cmo_account_page_fault(); - } else { - current->min_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); - } + + mm_fault_accounting(current, regs, address, major); + return 0; } NOKPROBE_SYMBOL(__do_page_fault); -- 2.26.2
[Bug 208197] OF: /pci@f2000000/mac-io@17/gpio@50/...: could not find phandle
https://bugzilla.kernel.org/show_bug.cgi?id=208197 --- Comment #1 from Erhard F. (erhar...@mailbox.org) --- Created attachment 289687 --> https://bugzilla.kernel.org/attachment.cgi?id=289687&action=edit kernel .config (5.8-rc1, PowerMac G4 DP) -- You are receiving this mail because: You are watching the assignee of the bug.
[Bug 208197] New: OF: /pci@f2000000/mac-io@17/gpio@50/...: could not find phandle
https://bugzilla.kernel.org/show_bug.cgi?id=208197 Bug ID: 208197 Summary: OF: /pci@f200/mac-io@17/gpio@50/...: could not find phandle Product: Platform Specific/Hardware Version: 2.5 Kernel Version: 5.8-rc1 Hardware: PPC-32 OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: PPC-32 Assignee: platform_ppc...@kernel-bugs.osdl.org Reporter: erhar...@mailbox.org Regression: Yes Created attachment 289685 --> https://bugzilla.kernel.org/attachment.cgi?id=289685&action=edit dmesg (5.8-rc1, PowerMac G4 DP) Since 5.8-rc1 I get these OF errors on my PowerMac G4 DP: [...] T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/gpio5@6f: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/extint-gpio15@67: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/gpio6@70: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/extint-gpio16@68: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/extint-gpio14@66: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/gpio12@76: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/gpio11@75: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/gpio5@6f: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/gpio6@70: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/extint-gpio4@5c: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/gpio11@75: could not find phandle T600 kernel: OF: /pci@f200/mac-io@17/gpio@50/extint-gpio15@67: could not find phandle[...] This has not been the case in kernel 5.7 or earlier. -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 01:17:35PM -0700, Fenghua Yu wrote: > Hi, Peter, > > On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote: > > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: > > > > > Or do you suggest to add a random new flag in struct thread_info instead > > > of a TIF flag? > > > > Why thread_info? What's wrong with something simple like the below. It > > takes a bit from the 'strictly current' flags word. > > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index b62e6aaf28f0..fca830b97055 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -801,6 +801,9 @@ struct task_struct { > > /* Stalled due to lack of memory */ > > unsignedin_memstall:1; > > #endif > > +#ifdef CONFIG_PCI_PASID > > + unsignedhas_valid_pasid:1; > > +#endif > > > > unsigned long atomic_flags; /* Flags requiring atomic > > access. */ > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 142b23645d82..10b3891be99e 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct > > task_struct *orig, int node) > > tsk->use_memdelay = 0; > > #endif > > > > +#ifdef CONFIG_PCI_PASID > > + tsk->has_valid_pasid = 0; > > +#endif > > + > > #ifdef CONFIG_MEMCG > > tsk->active_memcg = NULL; > > #endif > > The PASID MSR is x86 specific although PASID is PCIe concept and per-mm. > Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work. > The flag should be cleared in cloned()/forked() and is only set and > read in fixup() in x86 #GP for heuristic. It's not used anywhere outside > of x86. > > That's why we think the flag should be in x86 struct thread_info instead > of in generice struct task_struct. I don't think anybody really cares, it's just one bit, we have plenty left. On x86_64 there's a u32 sized alignment hole in thread_info, also we don't use the upper 32bit of thread_info::flags, however using those would still mean you have to use atomic ops, which you really don't need.
RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> So what’s the RDMSR for? Surely you > have some state somewhere that says “this task has a PASID.” > Can’t you just make sure that stays in sync with the MSR? Then, on #GP, if > the task already has a PASID, you know the MSR is set. We have state that says the process ("mm") has been allocated a PASID. But not for each task. E.g. a process may clone a bunch of tasks, then one of them opens a device that needs a PASID. We did internally have patches to go hunt down all those other tasks and force a PASID onto each. But the code was big and ugly. Also maybe the wrong thing to do if those threads didn't ever need to access the device. PeterZ suggested that we can add a bit to the task structure for this purpose. Fenghua is hesitant about adding an x86 only bit there. -Tony
Re: [PATCH v13 2/6] seq_buf: Export seq_buf_printf
On Mon, 15 Jun 2020 14:55:52 +0200 Borislav Petkov wrote: > Can you please resend your patchset once a week like everyone else and > not flood inboxes with it? Boris, Haven't you automated your inbox yet? I have patchwork reading my INBOX and it's smart enough to understand new series, and the old one simply disappears as "superseded". Email is just the wires to our infrastructure. Get with the times man! ;-) -- Steve
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> On Jun 15, 2020, at 1:56 PM, Luck, Tony wrote: > > >> >> Are we planning to keep PASID live once a task has used it once or are we >> going to swap it lazily? If the latter, a percpu variable might be better. > > Current plan is "touch it once and the task owns it until exit(2)" > > Maybe someday in the future when we have data on how applications > actually use accelerators we could look at something more complex > if usage patterns look like it would be beneficial. > > So what’s the RDMSR for? Surely you have some state somewhere that says “this task has a PASID.” Can’t you just make sure that stays in sync with the MSR? Then, on #GP, if the task already has a PASID, you know the MSR is set.
RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> Are we planning to keep PASID live once a task has used it once or are we > going to swap it lazily? If the latter, a percpu variable might be better. Current plan is "touch it once and the task owns it until exit(2)" Maybe someday in the future when we have data on how applications actually use accelerators we could look at something more complex if usage patterns look like it would be beneficial. -Tony
[Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
https://bugzilla.kernel.org/show_bug.cgi?id=206203 Erhard F. (erhar...@mailbox.org) changed: What|Removed |Added Attachment #287675|0 |1 is obsolete|| --- Comment #15 from Erhard F. (erhar...@mailbox.org) --- Created attachment 289679 --> https://bugzilla.kernel.org/attachment.cgi?id=289679&action=edit kernel .config (kernel 5.8-rc1, PowerMac G5 11,2) -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
> On Jun 15, 2020, at 1:17 PM, Fenghua Yu wrote: > > Hi, Peter, > >> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote: >>> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: >>> >>> Or do you suggest to add a random new flag in struct thread_info instead >>> of a TIF flag? >> >> Why thread_info? What's wrong with something simple like the below. It >> takes a bit from the 'strictly current' flags word. >> >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index b62e6aaf28f0..fca830b97055 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -801,6 +801,9 @@ struct task_struct { >>/* Stalled due to lack of memory */ >>unsignedin_memstall:1; >> #endif >> +#ifdef CONFIG_PCI_PASID >> +unsignedhas_valid_pasid:1; >> +#endif >> >>unsigned longatomic_flags; /* Flags requiring atomic access. >> */ >> >> diff --git a/kernel/fork.c b/kernel/fork.c >> index 142b23645d82..10b3891be99e 100644 >> --- a/kernel/fork.c >> +++ b/kernel/fork.c >> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct >> task_struct *orig, int node) >>tsk->use_memdelay = 0; >> #endif >> >> +#ifdef CONFIG_PCI_PASID >> +tsk->has_valid_pasid = 0; >> +#endif >> + >> #ifdef CONFIG_MEMCG >>tsk->active_memcg = NULL; >> #endif > > The PASID MSR is x86 specific although PASID is PCIe concept and per-mm. > Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work. > The flag should be cleared in cloned()/forked() and is only set and > read in fixup() in x86 #GP for heuristic. It's not used anywhere outside > of x86. > > That's why we think the flag should be in x86 struct thread_info instead > of in generice struct task_struct. > Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily? If the latter, a percpu variable might be better. > Please advice. > > Thanks. > > -Fenghua
[Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
https://bugzilla.kernel.org/show_bug.cgi?id=206203 Erhard F. (erhar...@mailbox.org) changed: What|Removed |Added Attachment #287673|0 |1 is obsolete|| Attachment #288567|0 |1 is obsolete|| --- Comment #14 from Erhard F. (erhar...@mailbox.org) --- Created attachment 289677 --> https://bugzilla.kernel.org/attachment.cgi?id=289677&action=edit dmesg (kernel 5.8-rc1, PowerMac G5 11,2) -- You are receiving this mail because: You are watching the assignee of the bug.
[Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
https://bugzilla.kernel.org/show_bug.cgi?id=206203 Erhard F. (erhar...@mailbox.org) changed: What|Removed |Added Attachment #287671|0 |1 is obsolete|| Attachment #288565|0 |1 is obsolete|| --- Comment #13 from Erhard F. (erhar...@mailbox.org) --- Created attachment 289675 --> https://bugzilla.kernel.org/attachment.cgi?id=289675&action=edit kmemleak output (kernel 5.8-rc1, PowerMac G5 11,2) -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH net v2] ibmvnic: Harden device login requests
From: Thomas Falcon Date: Mon, 15 Jun 2020 10:29:23 -0500 > The VNIC driver's "login" command sequence is the final step > in the driver's initialization process with device firmware, > confirming the available device queue resources to be utilized > by the driver. Under high system load, firmware may not respond > to the request in a timely manner or may abort the request. In > such cases, the driver should reattempt the login command > sequence. In case of a device error, the number of retries > is bounded. > > Signed-off-by: Thomas Falcon > --- > v2: declare variables in Reverse Christmas tree format Applied, thanks.
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
Hi, Peter, On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote: > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: > > > Or do you suggest to add a random new flag in struct thread_info instead > > of a TIF flag? > > Why thread_info? What's wrong with something simple like the below. It > takes a bit from the 'strictly current' flags word. > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index b62e6aaf28f0..fca830b97055 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -801,6 +801,9 @@ struct task_struct { > /* Stalled due to lack of memory */ > unsignedin_memstall:1; > #endif > +#ifdef CONFIG_PCI_PASID > + unsignedhas_valid_pasid:1; > +#endif > > unsigned long atomic_flags; /* Flags requiring atomic > access. */ > > diff --git a/kernel/fork.c b/kernel/fork.c > index 142b23645d82..10b3891be99e 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct > task_struct *orig, int node) > tsk->use_memdelay = 0; > #endif > > +#ifdef CONFIG_PCI_PASID > + tsk->has_valid_pasid = 0; > +#endif > + > #ifdef CONFIG_MEMCG > tsk->active_memcg = NULL; > #endif The PASID MSR is x86 specific although PASID is PCIe concept and per-mm. Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work. The flag should be cleared in cloned()/forked() and is only set and read in fixup() in x86 #GP for heuristic. It's not used anywhere outside of x86. That's why we think the flag should be in x86 struct thread_info instead of in generice struct task_struct. Please advice. Thanks. -Fenghua
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 2:47 PM Arnd Bergmann wrote: > > On Mon, Jun 15, 2020 at 4:48 PM Brian Gerst wrote: > > On Mon, Jun 15, 2020 at 10:13 AM Christoph Hellwig wrote: > > > On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote: > > > > > > > I'd rather keep it in common code as that allows all the low-level > > > exec stuff to be marked static, and avoid us growing new pointless > > > compat variants through copy and paste. > > > smart compiler to d > > > > > > > I don't really understand > > > > the comment, why can't this just use this? > > > > > > That errors out with: > > > > > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to > > > `__x32_sys_execve' > > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to > > > `__x32_sys_execveat' > > > make: *** [Makefile:1139: vmlinux] Error 1 > > > > I think I have a fix for this, by modifying the syscall wrappers to > > add an alias for the __x32 variant to the native __x64_sys_foo(). > > I'll get back to you with a patch. > > Do we actually need the __x32 prefix any more, or could we just > change all x32 specific calls to use __x64_compat_sys_foo()? I suppose that would work too. The prefix really describes the register mapping. -- Brian Gerst
Re: [PATCH v13 2/6] seq_buf: Export seq_buf_printf
On Mon, Jun 15, 2020 at 5:56 AM Borislav Petkov wrote: > > On Mon, Jun 15, 2020 at 06:14:03PM +0530, Vaibhav Jain wrote: > > 'seq_buf' provides a very useful abstraction for writing to a string > > buffer without needing to worry about it over-flowing. However even > > though the API has been stable for couple of years now its still not > > exported to kernel loadable modules limiting its usage. > > > > Hence this patch proposes update to 'seq_buf.c' to mark > > seq_buf_printf() which is part of the seq_buf API to be exported to > > kernel loadable GPL modules. This symbol will be used in later parts > > of this patch-set to simplify content creation for a sysfs attribute. > > > > Cc: Piotr Maziarz > > Cc: Cezary Rojewski > > Cc: Christoph Hellwig > > Cc: Steven Rostedt > > Cc: Borislav Petkov > > Acked-by: Steven Rostedt (VMware) > > Signed-off-by: Vaibhav Jain > > --- > > Changelog: > > > > v12..v13: > > * None > > > > v11..v12: > > * None > > Can you please resend your patchset once a week like everyone else and > not flood inboxes with it? Hi Boris, I gave Vaibhav some long shot hope that his series could be included in my libnvdimm pull request for -rc1. Save for a last minute clang report that I misread as a gcc warning, I likely would have included. This spin is looking to address the last of the comments I had and something I would consider for -rc2. So, in this case the resends were requested by me and I'll take the grumbles on Vaibhav's behalf.
Re: [PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)
Hello Catalin, Will, On Tue, Jun 2, 2020 at 10:54 AM Bhupesh Sharma wrote: > > Hello, > > On Thu, May 14, 2020 at 12:22 AM Bhupesh Sharma wrote: > > > > Apologies for the delayed update. Its been quite some time since I > > posted the last version (v5), but I have been really caught up in some > > other critical issues. > > > > Changes since v5: > > > > - v5 can be viewed here: > > http://lists.infradead.org/pipermail/kexec/2019-November/024055.html > > - Addressed review comments from James Morse and Boris. > > - Added Tested-by received from John on v5 patchset. > > - Rebased against arm64 (for-next/ptr-auth) branch which has Amit's > > patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo > > applied. > > > > Changes since v4: > > > > - v4 can be seen here: > > http://lists.infradead.org/pipermail/kexec/2019-November/023961.html > > - Addressed comments from Dave and added patches for documenting > > new variables appended to vmcoreinfo documentation. > > - Added testing report shared by Akashi for PATCH 2/5. > > > > Changes since v3: > > > > - v3 can be seen here: > > http://lists.infradead.org/pipermail/kexec/2019-March/022590.html > > - Addressed comments from James and exported TCR_EL1.T1SZ in vmcoreinfo > > instead of PTRS_PER_PGD. > > - Added a new patch (via [PATCH 3/3]), which fixes a simple typo in > > 'Documentation/arm64/memory.rst' > > > > Changes since v2: > > > > - v2 can be seen here: > > http://lists.infradead.org/pipermail/kexec/2019-March/022531.html > > - Protected 'MAX_PHYSMEM_BITS' vmcoreinfo variable under CONFIG_SPARSEMEM > > ifdef sections, as suggested by Kazu. > > - Updated vmcoreinfo documentation to add description about > > 'MAX_PHYSMEM_BITS' variable (via [PATCH 3/3]). > > > > Changes since v1: > > > > - v1 was sent out as a single patch which can be seen here: > > http://lists.infradead.org/pipermail/kexec/2019-February/022411.html > > > > - v2 breaks the single patch into two independent patches: > > [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas > > [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code > > (all archs) > > > > This patchset primarily fixes the regression reported in user-space > > utilities like 'makedumpfile' and 'crash-utility' on arm64 architecture > > with the availability of 52-bit address space feature in underlying > > kernel. These regressions have been reported both on CPUs which don't > > support ARMv8.2 extensions (i.e. LVA, LPA) and are running newer kernels > > and also on prototype platforms (like ARMv8 FVP simulator model) which > > support ARMv8.2 extensions and are running newer kernels. > > > > The reason for these regressions is that right now user-space tools > > have no direct access to these values (since these are not exported > > from the kernel) and hence need to rely on a best-guess method of > > determining value of 'vabits_actual' and 'MAX_PHYSMEM_BITS' supported > > by underlying kernel. > > > > Exporting these values via vmcoreinfo will help user-land in such cases. > > In addition, as per suggestion from makedumpfile maintainer (Kazu), > > it makes more sense to append 'MAX_PHYSMEM_BITS' to > > vmcoreinfo in the core code itself rather than in arm64 arch-specific > > code, so that the user-space code for other archs can also benefit from > > this addition to the vmcoreinfo and use it as a standard way of > > determining 'SECTIONS_SHIFT' value in user-land. > > > > Cc: Boris Petkov > > Cc: Ingo Molnar > > Cc: Thomas Gleixner > > Cc: Jonathan Corbet > > Cc: James Morse > > Cc: Mark Rutland > > Cc: Will Deacon > > Cc: Steve Capper > > Cc: Catalin Marinas > > Cc: Ard Biesheuvel > > Cc: Michael Ellerman > > Cc: Paul Mackerras > > Cc: Benjamin Herrenschmidt > > Cc: Dave Anderson > > Cc: Kazuhito Hagio > > Cc: John Donnelly > > Cc: scott.bran...@broadcom.com > > Cc: Amit Kachhap > > Cc: x...@kernel.org > > Cc: linuxppc-dev@lists.ozlabs.org > > Cc: linux-arm-ker...@lists.infradead.org > > Cc: linux-ker...@vger.kernel.org > > Cc: linux-...@vger.kernel.org > > Cc: ke...@lists.infradead.org > > > > Bhupesh Sharma (2): > > crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo > > arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo > > > > Documentation/admin-guide/kdump/vmcoreinfo.rst | 16 > > arch/arm64/include/asm/pgtable-hwdef.h | 1 + > > arch/arm64/kernel/crash_core.c | 10 ++ > > kernel/crash_core.c| 1 + > > 4 files changed, 28 insertions(+) > > Ping. @James Morse , Others > > Please share if you have some comments regarding this patchset. Ping. I think we have two Tested-by available from Oracle and Marvell folks on this patchset and no further review-comments. Can you please help review/pick this patchset via the arm64 tree? User-space utilities
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote: > Or do you suggest to add a random new flag in struct thread_info instead > of a TIF flag? Why thread_info? What's wrong with something simple like the below. It takes a bit from the 'strictly current' flags word. diff --git a/include/linux/sched.h b/include/linux/sched.h index b62e6aaf28f0..fca830b97055 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -801,6 +801,9 @@ struct task_struct { /* Stalled due to lack of memory */ unsignedin_memstall:1; #endif +#ifdef CONFIG_PCI_PASID + unsignedhas_valid_pasid:1; +#endif unsigned long atomic_flags; /* Flags requiring atomic access. */ diff --git a/kernel/fork.c b/kernel/fork.c index 142b23645d82..10b3891be99e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) tsk->use_memdelay = 0; #endif +#ifdef CONFIG_PCI_PASID + tsk->has_valid_pasid = 0; +#endif + #ifdef CONFIG_MEMCG tsk->active_memcg = NULL; #endif
Re: [PATCH 00/17] spelling.txt: /decriptors/descriptors/
On Tue, Jun 09, 2020 at 01:45:53PM +0100, Kieran Bingham wrote: > I wouldn't normally go through spelling fixes, but I caught sight of > this typo twice, and then foolishly grepped the tree for it, and saw how > pervasive it was. > > so here I am ... fixing a typo globally... but with an addition in > scripts/spelling.txt so it shouldn't re-appear ;-) > > Cc: linux-arm-ker...@lists.infradead.org (moderated list:TI DAVINCI MACHINE > SUPPORT) > Cc: linux-ker...@vger.kernel.org (open list) > Cc: linux...@vger.kernel.org (open list:DEVICE FREQUENCY EVENT > (DEVFREQ-EVENT)) > Cc: linux-g...@vger.kernel.org (open list:GPIO SUBSYSTEM) > Cc: dri-de...@lists.freedesktop.org (open list:DRM DRIVERS) > Cc: linux-r...@vger.kernel.org (open list:HFI1 DRIVER) > Cc: linux-in...@vger.kernel.org (open list:INPUT (KEYBOARD, MOUSE, JOYSTICK, > TOUCHSCREEN)...) > Cc: linux-...@lists.infradead.org (open list:NAND FLASH SUBSYSTEM) > Cc: net...@vger.kernel.org (open list:NETWORKING DRIVERS) > Cc: ath...@lists.infradead.org (open list:QUALCOMM ATHEROS ATH10K WIRELESS > DRIVER) > Cc: linux-wirel...@vger.kernel.org (open list:NETWORKING DRIVERS (WIRELESS)) > Cc: linux-s...@vger.kernel.org (open list:IBM Power Virtual FC Device Drivers) > Cc: linuxppc-dev@lists.ozlabs.org (open list:LINUX FOR POWERPC (32-BIT AND > 64-BIT)) > Cc: linux-...@vger.kernel.org (open list:USB SUBSYSTEM) > Cc: virtualizat...@lists.linux-foundation.org (open list:VIRTIO CORE AND NET > DRIVERS) > Cc: linux...@kvack.org (open list:MEMORY MANAGEMENT) > > > Kieran Bingham (17): > arch: arm: mach-davinci: Fix trivial spelling > drivers: infiniband: Fix trivial spelling > drivers: infiniband: Fix trivial spelling I took these two RDMA patches and merged them, thanks Jason
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
Hi, Peter, On Mon, Jun 15, 2020 at 08:31:16PM +0200, Peter Zijlstra wrote: > On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote: > > > I don't get why you need a rdmsr here, or why not having one would > > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > > My concern is TIF flags are precious (only 3 slots available). Defining > > a new TIF flag may be not worth it while rdmsr() can check if PASID > > is valid in the MSR. And performance here might not be a big issue > > in #GP. > > > > But if you think using TIF flag is better, I can define a new TIF flag > > and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()). > > Then we can avoid using rdmsr() to check valid PASID in the MSR. > > WHY ?!?! What do you need a TIF flag for? We need "a way" to check if the per thread MSR has a valid PASID. If yes, no need to fix up the MSR (wrmsr()), and let other handler to handle the #GP. Otherwise, apply the heuristics and fix up the MSR and exit the #GP. The way to check the valid PASID in the MSR is rdmsr() in this series. A TIF flag will be much faster than rdmsr() and seems a sutiable way to check valid PASID status per thread. That's why it could replace rdmsr() to check PASID in the MSR. Or do you suggest to add a random new flag in struct thread_info instead of a TIF flag? Thanks. -Fenghua
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 4:48 PM Brian Gerst wrote: > On Mon, Jun 15, 2020 at 10:13 AM Christoph Hellwig wrote: > > On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote: > > > > I'd rather keep it in common code as that allows all the low-level > > exec stuff to be marked static, and avoid us growing new pointless > > compat variants through copy and paste. > > smart compiler to d > > > > > I don't really understand > > > the comment, why can't this just use this? > > > > That errors out with: > > > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to > > `__x32_sys_execve' > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to > > `__x32_sys_execveat' > > make: *** [Makefile:1139: vmlinux] Error 1 > > I think I have a fix for this, by modifying the syscall wrappers to > add an alias for the __x32 variant to the native __x64_sys_foo(). > I'll get back to you with a patch. Do we actually need the __x32 prefix any more, or could we just change all x32 specific calls to use __x64_compat_sys_foo()? Arnd
Re: [PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()
On 6/15/20 2:07 PM, Dan Carpenter wrote: On Mon, Apr 13, 2020 at 05:15:49PM -0400, Waiman Long wrote: diff --git a/mm/slab_common.c b/mm/slab_common.c index 23c7500eea7d..c08bc7eb20bd 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1707,17 +1707,17 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags) EXPORT_SYMBOL(krealloc); /** - * kzfree - like kfree but zero memory + * kfree_sensitive - Clear sensitive information in memory before freeing * @p: object to free memory of * * The memory of the object @p points to is zeroed before freed. - * If @p is %NULL, kzfree() does nothing. + * If @p is %NULL, kfree_sensitive() does nothing. * * Note: this function zeroes the whole allocated buffer which can be a good * deal bigger than the requested buffer size passed to kmalloc(). So be * careful when using this function in performance sensitive code. */ -void kzfree(const void *p) +void kfree_sensitive(const void *p) { size_t ks; void *mem = (void *)p; @@ -1725,10 +1725,10 @@ void kzfree(const void *p) if (unlikely(ZERO_OR_NULL_PTR(mem))) return; ks = ksize(mem); - memset(mem, 0, ks); + memzero_explicit(mem, ks); ^ This is an unrelated bug fix. It really needs to be pulled into a separate patch by itself and back ported to stable kernels. kfree(mem); } -EXPORT_SYMBOL(kzfree); +EXPORT_SYMBOL(kfree_sensitive); /** * ksize - get the actual amount of memory allocated for a given object regards, dan carpenter Thanks for the suggestion. I will break it out and post a version soon. Cheers, Longman
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 11:19:21AM -0700, Raj, Ashok wrote: > On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote: > > > > I don't get why you need a rdmsr here, or why not having one would > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > > > > > + */ > > > > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > > > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > > > > + return false; > > > > > + > > > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); > > > > How much more expensive is the wrmsr over the rdmsr? Can't we just > > unconditionally write the current PASID and call it a day? > > The reason to check the rdmsr() is because we are using a hueristic taking > GP faults. If we already setup the MSR, but we get it a second time it > means the reason is something other than PASID_MSR not being set. > > Ideally we should use the TIF_ to track this would be cheaper, but we were > told those bits aren't easy to give out. Why do you need a TIF flag? Why not any other random flag in current?
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote: > > I don't get why you need a rdmsr here, or why not having one would > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > My concern is TIF flags are precious (only 3 slots available). Defining > a new TIF flag may be not worth it while rdmsr() can check if PASID > is valid in the MSR. And performance here might not be a big issue > in #GP. > > But if you think using TIF flag is better, I can define a new TIF flag > and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()). > Then we can avoid using rdmsr() to check valid PASID in the MSR. WHY ?!?! What do you need a TIF flag for?
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote: > On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote: > > Hi, Peter, > > On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote: > > > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > > > > +/* > > > > + * Apply some heuristics to see if the #GP fault was caused by a thread > > > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like > > > > that > > > > + * is the problem, try initializing the IA32_PASID MSR. If the > > > > heuristic > > > > + * guesses incorrectly, take one more #GP fault. > > > > > > How is that going to help? Aren't we then going to run this same > > > heuristic again and again and again? > > > > The heuristic always initializes the MSR with the per mm PASID IIF the > > mm has a valid PASID but the MSR doesn't have one. This heuristic usually > > happens only once on the first #GP in a thread. > > But it doesn't guarantee the PASID is the right one. Suppose both the mm > has a PASID and the MSR has a VALID one, but the MSR isn't the mm one. > Then we NO-OP. So if the exception was due to us having the wrong PASID, > we stuck. The MSR for each thread was cleared during fork() and clone(). The PASID was cleared during mm_init(). The per-mm PASID was assigned when fist bind_mm() is called and remains the same value until process exit(). The MSR is only fixed up when the first ENQCMD is executed in a thread: bit 31 in the MSR is 0 and the PASID in the mm is non-zero. The MSR remains the same PASID value once it's fixed up until the thread exits. So the work flow ensures the PASID goes from mm's PASID to the MSR. The PASID could be unbund from the mm. In this case, iommu will generate #PF and kernel oops instead of #GP. > > > If the next #GP still comes in, the heuristic finds out the MSR already > > has a valid PASID and thus will not fixup the MSR any more. The fixup() > > returns "false" and lets others to handle the #GP. > > > > So the heuristic will be executed once (at most) and won't be executed > > again and again. > > So I get that you want to set the PASID on-demand, but I find this all > really weird code to make that happen. We could keep PASID same in all threads sychronously by propogating the MSRs when the PASID is bound to the mm via IPIs or taskworks to all threads in the process. But the code is complex and error-prone and overhead could be high: 1. The user can call driver to do binding and unbinding multiple times. The IPIs or taskworks will be sent multiple times to make sure only the last IPIs or taskworks take action. 2. Even if a thread never executes ENQCMD and thus never uses the MSR, the MSR still needs to be updated whenever bind_mm() and needs to be context switched each time. This could cause high overhead. Setting the PASID on-demand is simpler and cleaner and was recommended by Thomas. > > > > > +bool __fixup_pasid_exception(void) > > > > +{ > > > > + u64 pasid_msr; > > > > + unsigned int pasid; > > > > + > > > > + /* > > > > +* This function is called only when this #GP was triggered > > > > from user > > > > +* space. So the mm cannot be NULL. > > > > +*/ > > > > + pasid = current->mm->pasid; > > > > + /* If the mm doesn't have a valid PASID, then can't help. */ > > > > + if (invalid_pasid(pasid)) > > > > + return false; > > > > + > > > > + /* > > > > +* Since IRQ is disabled now, the current task still owns the > > > > FPU on > > > > > > That's just weird and confusing. What you want to say is that you rely > > > on the exception disabling the interrupt. > > > > I checked SDM again. You are right. #GP can be interrupted by machine check > > or other interrupts. So I cannot assume the current task still owns the FPU. > > Instead of directly rdmsr() and wrmsr(), I will add helpers that can access > > either the MSR on the processor or the PASID state in the memory. > > That's not in fact what I meant, but yes, you can take exceptions while > !IF just fine. > > > > > +* this CPU and the PASID MSR can be directly accessed. > > > > +* > > > > +* If the MSR has a valid PASID, the #GP must be for some other > > > > reason. > > > > +* > > > > +* If rdmsr() is really a performance issue, a TIF_ flag may be > > > > +* added to check if the thread has a valid PASID instead of > > > > rdmsr(). > > > > > > I don't understand any of this. Nobody except us writes to this MSR, we > > > should bloody well know what's in it. What gives? > > > > Patch 4 describes how to manage the MSR and patch 7 describes the format > > of the MSR (20-bit PASID value and bit 31 is valid bit). > > > > Are they sufficient to help? Or do you mean something else? > > I don't get why you need a rdmsr here, or why not having one would > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? My conc
Re: [PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()
On Mon, Apr 13, 2020 at 05:15:49PM -0400, Waiman Long wrote: > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 23c7500eea7d..c08bc7eb20bd 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -1707,17 +1707,17 @@ void *krealloc(const void *p, size_t new_size, gfp_t > flags) > EXPORT_SYMBOL(krealloc); > > /** > - * kzfree - like kfree but zero memory > + * kfree_sensitive - Clear sensitive information in memory before freeing > * @p: object to free memory of > * > * The memory of the object @p points to is zeroed before freed. > - * If @p is %NULL, kzfree() does nothing. > + * If @p is %NULL, kfree_sensitive() does nothing. > * > * Note: this function zeroes the whole allocated buffer which can be a good > * deal bigger than the requested buffer size passed to kmalloc(). So be > * careful when using this function in performance sensitive code. > */ > -void kzfree(const void *p) > +void kfree_sensitive(const void *p) > { > size_t ks; > void *mem = (void *)p; > @@ -1725,10 +1725,10 @@ void kzfree(const void *p) > if (unlikely(ZERO_OR_NULL_PTR(mem))) > return; > ks = ksize(mem); > - memset(mem, 0, ks); > + memzero_explicit(mem, ks); ^ This is an unrelated bug fix. It really needs to be pulled into a separate patch by itself and back ported to stable kernels. > kfree(mem); > } > -EXPORT_SYMBOL(kzfree); > +EXPORT_SYMBOL(kfree_sensitive); > > /** > * ksize - get the actual amount of memory allocated for a given object regards, dan carpenter
RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
>> The heuristic always initializes the MSR with the per mm PASID IIF the >> mm has a valid PASID but the MSR doesn't have one. This heuristic usually >> happens only once on the first #GP in a thread. > > But it doesn't guarantee the PASID is the right one. Suppose both the mm > has a PASID and the MSR has a VALID one, but the MSR isn't the mm one. > Then we NO-OP. So if the exception was due to us having the wrong PASID, > we stuck. ENQCMD only checks the 'valid' bit of the IA32_PASID MSR to decide whether to #GP or not. H/W has no concept of the "right" pasid value. If IA32_PASID is valid with the wrong value ... then the system is about to see some major corruption because the operations in the accelerator are not going to translate to the physical addresses for pages owned by the process that issued the ENQCMD. -Tony
Re: [PATCH v1 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory
Le 31/05/2020 à 04:27, Ram Pai a écrit : From: Laurent Dufour When a memory slot is hot plugged to a SVM, GFNs associated with that memory slot automatically default to secure GFN. Hence migrate the PFNs associated with these GFNs to device-PFNs. uv_migrate_mem_slot() is called to achieve that. It will not call UV_PAGE_IN since this request is ignored by the Ultravisor. NOTE: Ultravisor does not trust any page content provided by the Hypervisor, ones the VM turns secure. Cc: Paul Mackerras Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Cc: Bharata B Rao Cc: Aneesh Kumar K.V Cc: Sukadev Bhattiprolu Cc: Laurent Dufour Cc: Thiago Jung Bauermann Cc: David Gibson Cc: Claudio Carvalho Cc: kvm-...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Ram Pai (fixed merge conflicts. Modified the commit message) Signed-off-by: Laurent Dufour --- arch/powerpc/include/asm/kvm_book3s_uvmem.h | 4 arch/powerpc/kvm/book3s_hv.c| 11 +++ arch/powerpc/kvm/book3s_hv_uvmem.c | 3 +-- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h index f0c5708..2ec2e5afb 100644 --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h @@ -23,6 +23,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm, void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free, struct kvm *kvm, bool skip_page_out, bool purge_gfn); +int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot); #else static inline int kvmppc_uvmem_init(void) { @@ -78,5 +79,8 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn) kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free, struct kvm *kvm, bool skip_page_out, bool purge_gfn) { } + +static int uv_migrate_mem_slot(struct kvm *kvm, + const struct kvm_memory_slot *memslot); #endif /* CONFIG_PPC_UV */ #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */ diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 4c62bfe..604d062 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4516,13 +4516,16 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm, case KVM_MR_CREATE: if (kvmppc_uvmem_slot_init(kvm, new)) return; - uv_register_mem_slot(kvm->arch.lpid, -new->base_gfn << PAGE_SHIFT, -new->npages * PAGE_SIZE, -0, new->id); + if (uv_register_mem_slot(kvm->arch.lpid, +new->base_gfn << PAGE_SHIFT, +new->npages * PAGE_SIZE, +0, new->id)) + return; + uv_migrate_mem_slot(kvm, new); break; case KVM_MR_DELETE: uv_unregister_mem_slot(kvm->arch.lpid, old->id); + kvmppc_uvmem_drop_pages(old, kvm, true, true); My mistake, kvmppc_radix_flush_memslot() called just before is already triggering the call to kvmppc_uvmem_drop_pages(), so that call is useless. You should remove it in your v2. kvmppc_uvmem_slot_free(kvm, old); break; default: diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index 36dda1d..1fa5f2a 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -377,8 +377,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct *vma, return ret; } -static int uv_migrate_mem_slot(struct kvm *kvm, - const struct kvm_memory_slot *memslot) +int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot) { unsigned long gfn = memslot->base_gfn; unsigned long end;
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 11:33:49AM -0400, Brian Gerst wrote: > If you move those aliases above all the __SYSCALL_* defines it will > work, since that will get the forward declaration too. This would be > the simplest workaround. That compiles and also passes my exaustive x32 tests (chroot + ls -l). This is the updated version: http://git.infradead.org/users/hch/misc.git/commitdiff/c8d319711ad2f53be003ae8e9be08519068bdcee
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote: > Hi, Peter, > On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote: > > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > > > +/* > > > + * Apply some heuristics to see if the #GP fault was caused by a thread > > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that > > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic > > > + * guesses incorrectly, take one more #GP fault. > > > > How is that going to help? Aren't we then going to run this same > > heuristic again and again and again? > > The heuristic always initializes the MSR with the per mm PASID IIF the > mm has a valid PASID but the MSR doesn't have one. This heuristic usually > happens only once on the first #GP in a thread. But it doesn't guarantee the PASID is the right one. Suppose both the mm has a PASID and the MSR has a VALID one, but the MSR isn't the mm one. Then we NO-OP. So if the exception was due to us having the wrong PASID, we stuck. > If the next #GP still comes in, the heuristic finds out the MSR already > has a valid PASID and thus will not fixup the MSR any more. The fixup() > returns "false" and lets others to handle the #GP. > > So the heuristic will be executed once (at most) and won't be executed > again and again. So I get that you want to set the PASID on-demand, but I find this all really weird code to make that happen. > > > +bool __fixup_pasid_exception(void) > > > +{ > > > + u64 pasid_msr; > > > + unsigned int pasid; > > > + > > > + /* > > > + * This function is called only when this #GP was triggered from user > > > + * space. So the mm cannot be NULL. > > > + */ > > > + pasid = current->mm->pasid; > > > + /* If the mm doesn't have a valid PASID, then can't help. */ > > > + if (invalid_pasid(pasid)) > > > + return false; > > > + > > > + /* > > > + * Since IRQ is disabled now, the current task still owns the FPU on > > > > That's just weird and confusing. What you want to say is that you rely > > on the exception disabling the interrupt. > > I checked SDM again. You are right. #GP can be interrupted by machine check > or other interrupts. So I cannot assume the current task still owns the FPU. > Instead of directly rdmsr() and wrmsr(), I will add helpers that can access > either the MSR on the processor or the PASID state in the memory. That's not in fact what I meant, but yes, you can take exceptions while !IF just fine. > > > + * this CPU and the PASID MSR can be directly accessed. > > > + * > > > + * If the MSR has a valid PASID, the #GP must be for some other reason. > > > + * > > > + * If rdmsr() is really a performance issue, a TIF_ flag may be > > > + * added to check if the thread has a valid PASID instead of rdmsr(). > > > > I don't understand any of this. Nobody except us writes to this MSR, we > > should bloody well know what's in it. What gives? > > Patch 4 describes how to manage the MSR and patch 7 describes the format > of the MSR (20-bit PASID value and bit 31 is valid bit). > > Are they sufficient to help? Or do you mean something else? I don't get why you need a rdmsr here, or why not having one would require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed? > > > + */ > > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > > + return false; > > > + > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); How much more expensive is the wrmsr over the rdmsr? Can't we just unconditionally write the current PASID and call it a day?
Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
Hi, Peter, On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote: > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote: > > +/* > > + * Apply some heuristics to see if the #GP fault was caused by a thread > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic > > + * guesses incorrectly, take one more #GP fault. > > How is that going to help? Aren't we then going to run this same > heuristic again and again and again? The heuristic always initializes the MSR with the per mm PASID IIF the mm has a valid PASID but the MSR doesn't have one. This heuristic usually happens only once on the first #GP in a thread. If the next #GP still comes in, the heuristic finds out the MSR already has a valid PASID and thus will not fixup the MSR any more. The fixup() returns "false" and lets others to handle the #GP. So the heuristic will be executed once (at most) and won't be executed again and again. > > > + */ > > +bool __fixup_pasid_exception(void) > > +{ > > + u64 pasid_msr; > > + unsigned int pasid; > > + > > + /* > > +* This function is called only when this #GP was triggered from user > > +* space. So the mm cannot be NULL. > > +*/ > > + pasid = current->mm->pasid; > > + /* If the mm doesn't have a valid PASID, then can't help. */ > > + if (invalid_pasid(pasid)) > > + return false; > > + > > + /* > > +* Since IRQ is disabled now, the current task still owns the FPU on > > That's just weird and confusing. What you want to say is that you rely > on the exception disabling the interrupt. I checked SDM again. You are right. #GP can be interrupted by machine check or other interrupts. So I cannot assume the current task still owns the FPU. Instead of directly rdmsr() and wrmsr(), I will add helpers that can access either the MSR on the processor or the PASID state in the memory. > > > +* this CPU and the PASID MSR can be directly accessed. > > +* > > +* If the MSR has a valid PASID, the #GP must be for some other reason. > > +* > > +* If rdmsr() is really a performance issue, a TIF_ flag may be > > +* added to check if the thread has a valid PASID instead of rdmsr(). > > I don't understand any of this. Nobody except us writes to this MSR, we > should bloody well know what's in it. What gives? Patch 4 describes how to manage the MSR and patch 7 describes the format of the MSR (20-bit PASID value and bit 31 is valid bit). Are they sufficient to help? Or do you mean something else? > > +*/ > > + rdmsrl(MSR_IA32_PASID, pasid_msr); > > + if (pasid_msr & MSR_IA32_PASID_VALID) > > + return false; > > + > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */ > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID); > > + > > + return true; > > +} > > -- > > 2.19.1 > > Thanks. -Fenghua
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 11:10 AM Christoph Hellwig wrote: > > On Mon, Jun 15, 2020 at 04:46:15PM +0200, Arnd Bergmann wrote: > > How about this one: > > > > diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c > > index 3d8d70d3896c..0ce15807cf54 100644 > > --- a/arch/x86/entry/syscall_x32.c > > +++ b/arch/x86/entry/syscall_x32.c > > @@ -16,6 +16,9 @@ > > #undef __SYSCALL_X32 > > #undef __SYSCALL_COMMON > > > > +#define __x32_sys_execve __x64_sys_execve > > +#define __x32_sys_execveat __x64_sys_execveat > > + > > > arch/x86/entry/syscall_x32.c:19:26: error: ‘__x64_sys_execve’ undeclared here > (not in a function); did you mean ‘__x32_sys_execve’? >19 | #define __x32_sys_execve __x64_sys_execve > | ^~~~ > arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro > ‘__x32_sys_execve’ >22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym, > | ^~ > ./arch/x86/include/generated/asm/syscalls_64.h:344:1: note: in expansion of > macro ‘__SYSCALL_X32’ > 344 | __SYSCALL_X32(520, sys_execve) > | ^ > arch/x86/entry/syscall_x32.c:20:28: error: ‘__x64_sys_execveat’ undeclared > here (not in a function); did you mean ‘__x32_sys_execveat’? >20 | #define __x32_sys_execveat __x64_sys_execveat > |^~ > arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro > ‘__x32_sys_execveat’ >22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym, > | ^~ > ./arch/x86/include/generated/asm/syscalls_64.h:369:1: note: in expansion of > macro ‘__SYSCALL_X32’ > 369 | __SYSCALL_X32(545, sys_execveat) > | ^ > make[2]: *** [scripts/Makefile.build:281: arch/x86/entry/syscall_x32.o] Error > 1 > make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2 > make[1]: *** Waiting for unfinished jobs > kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable > instruction > make: *** [Makefile:1764: arch/x86] Error 2 > make: *** Waiting for unfinished jobs If you move those aliases above all the __SYSCALL_* defines it will work, since that will get the forward declaration too. This would be the simplest workaround. -- Brian Gerst
[PATCH net v2] ibmvnic: Harden device login requests
The VNIC driver's "login" command sequence is the final step in the driver's initialization process with device firmware, confirming the available device queue resources to be utilized by the driver. Under high system load, firmware may not respond to the request in a timely manner or may abort the request. In such cases, the driver should reattempt the login command sequence. In case of a device error, the number of retries is bounded. Signed-off-by: Thomas Falcon --- v2: declare variables in Reverse Christmas tree format --- drivers/net/ethernet/ibm/ibmvnic.c | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 1b4d04e..2baf7b3 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -842,12 +842,13 @@ static int ibmvnic_login(struct net_device *netdev) struct ibmvnic_adapter *adapter = netdev_priv(netdev); unsigned long timeout = msecs_to_jiffies(3); int retry_count = 0; + int retries = 10; bool retry; int rc; do { retry = false; - if (retry_count > IBMVNIC_MAX_QUEUES) { + if (retry_count > retries) { netdev_warn(netdev, "Login attempts exceeded\n"); return -1; } @@ -862,11 +863,23 @@ static int ibmvnic_login(struct net_device *netdev) if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { - netdev_warn(netdev, "Login timed out\n"); - return -1; + netdev_warn(netdev, "Login timed out, retrying...\n"); + retry = true; + adapter->init_done_rc = 0; + retry_count++; + continue; } - if (adapter->init_done_rc == PARTIALSUCCESS) { + if (adapter->init_done_rc == ABORTED) { + netdev_warn(netdev, "Login aborted, retrying...\n"); + retry = true; + adapter->init_done_rc = 0; + retry_count++; + /* FW or device may be busy, so +* wait a bit before retrying login +*/ + msleep(500); + } else if (adapter->init_done_rc == PARTIALSUCCESS) { retry_count++; release_sub_crqs(adapter, 1); -- 1.8.3.1
Re: [PATCH] scsi: target/sbp: remove firewire SBP target driver
On 15/06/2020 00:28, Finn Thain wrote: > On Sun, 14 Jun 2020, Chris Boot wrote: > >> I expect that if someone finds this useful it can stick around (but >> that's not my call). > > Who's call is that? If the patch had said "From: Martin K. Petersen" and > "This driver is being removed because it has the following defects..." > that would be some indication of a good-faith willingness to accept users > as developers in the spirit of the GPL, which is what you seem to be > alluding to (?). If you're asking me, I'd say it was martin's call: > SCSI TARGET SUBSYSTEM > > M: "Martin K. Petersen" > [...] > F: drivers/target/ > > F: include/target/ > >> I just don't have the time or inclination or hardware to be able to >> maintain it anymore, so someone else would have to pick it up. >> > > Which is why most drivers get orphaned, right? Sure, but that's not what Martin asked me to do, hence this patch. -- Chris Boot bo...@boo.tc
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 04:46:15PM +0200, Arnd Bergmann wrote: > How about this one: > > diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c > index 3d8d70d3896c..0ce15807cf54 100644 > --- a/arch/x86/entry/syscall_x32.c > +++ b/arch/x86/entry/syscall_x32.c > @@ -16,6 +16,9 @@ > #undef __SYSCALL_X32 > #undef __SYSCALL_COMMON > > +#define __x32_sys_execve __x64_sys_execve > +#define __x32_sys_execveat __x64_sys_execveat > + arch/x86/entry/syscall_x32.c:19:26: error: ‘__x64_sys_execve’ undeclared here (not in a function); did you mean ‘__x32_sys_execve’? 19 | #define __x32_sys_execve __x64_sys_execve | ^~~~ arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro ‘__x32_sys_execve’ 22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym, | ^~ ./arch/x86/include/generated/asm/syscalls_64.h:344:1: note: in expansion of macro ‘__SYSCALL_X32’ 344 | __SYSCALL_X32(520, sys_execve) | ^ arch/x86/entry/syscall_x32.c:20:28: error: ‘__x64_sys_execveat’ undeclared here (not in a function); did you mean ‘__x32_sys_execveat’? 20 | #define __x32_sys_execveat __x64_sys_execveat |^~ arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro ‘__x32_sys_execveat’ 22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym, | ^~ ./arch/x86/include/generated/asm/syscalls_64.h:369:1: note: in expansion of macro ‘__SYSCALL_X32’ 369 | __SYSCALL_X32(545, sys_execveat) | ^ make[2]: *** [scripts/Makefile.build:281: arch/x86/entry/syscall_x32.o] Error 1 make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2 make[1]: *** Waiting for unfinished jobs kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable instruction make: *** [Makefile:1764: arch/x86] Error 2 make: *** Waiting for unfinished jobs
Re: [PATCH] powerpc/fsl_booke/32: fix build with CONFIG_RANDOMIZE_BASE
On Sat, 2020-06-13 at 23:28 +0700, Arseny Solokha wrote: > Building the current 5.8 kernel for a e500 machine with > CONFIG_RANDOMIZE_BASE set yields the following failure: > > arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init': > arch/powerpc/mm/nohash/kaslr_booke.c:387:2: error: implicit declaration > of function 'flush_icache_range'; did you mean 'flush_tlb_range'? > [-Werror=implicit-function-declaration] > > Indeed, including asm/cacheflush.h into kaslr_booke.c fixes the build. > > The issue dates back to the introduction of that file and probably went > unnoticed because there's no in-tree defconfig with CONFIG_RANDOMIZE_BASE > set. > > Fixes: 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR infrastructure") > Cc: sta...@vger.kernel.org > Signed-off-by: Arseny Solokha > --- > arch/powerpc/mm/nohash/kaslr_booke.c | 1 + > 1 file changed, 1 insertion(+) Acked-by: Scott Wood -Scott
Re: [PATCH v2 00/12] x86: tag application address space for devices
Hi, Peter, On Mon, Jun 15, 2020 at 09:52:02AM +0200, Peter Zijlstra wrote: > On Fri, Jun 12, 2020 at 05:41:21PM -0700, Fenghua Yu wrote: > > > This series only provides simple and basic support for ENQCMD and the MSR: > > 1. Clean up type definitions (patch 1-3). These patches can be in a > >separate series. > >- Define "pasid" as "unsigned int" consistently (patch 1 and 2). > >- Define "flags" as "unsigned int" > > 2. Explain different various technical terms used in the series (patch 4). > > 3. Enumerate support for ENQCMD in the processor (patch 5). > > 4. Handle FPU PASID state and the MSR during context switch (patches 6-7). > > 5. Define "pasid" in mm_struct (patch 8). > > 5. Clear PASID state for new mm and forked and cloned thread (patch 9-10). > > 6. Allocate and free PASID for a process (patch 11). > > 7. Fix up the PASID MSR in #GP handler when one thread in a process > >executes ENQCMD for the first time (patches 12). > > If this is per mm, should not switch_mm() update the MSR ? I'm not > seeing that, nor do I see it explained why not. PASID value is per mm and all threads in a process have the same PASID value in the MSR. However, the MSR is per thread and is context switched by XSAVES/XRSTROS in patches 6-7. Thanks. -Fenghua
Re: [PATCH net] ibmvnic: Harden device login requests
On 6/12/20 4:10 PM, David Miller wrote: From: Thomas Falcon Date: Fri, 12 Jun 2020 13:31:39 -0500 @@ -841,13 +841,14 @@ static int ibmvnic_login(struct net_device *netdev) { struct ibmvnic_adapter *adapter = netdev_priv(netdev); unsigned long timeout = msecs_to_jiffies(3); + int retries = 10; int retry_count = 0; bool retry; int rc; Reverse christmas tree, please. Oops, sending a v2 soon. Thanks, Tom
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 10:13 AM Christoph Hellwig wrote: > > On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote: > > > #ifdef CONFIG_COMPAT > > > - if (unlikely(argv.is_compat)) { > > > + if (in_compat_syscall()) { > > > + const compat_uptr_t __user *compat_argv = > > > + compat_ptr((unsigned long)argv); > > > compat_uptr_t compat; > > > > > > - if (get_user(compat, argv.ptr.compat + nr)) > > > + if (get_user(compat, compat_argv + nr)) > > > return ERR_PTR(-EFAULT); > > > > > > return compat_ptr(compat); > > > } > > > #endif > > > > I would expect that the "#ifdef CONFIG_COMPAT" can be removed > > now, since compat_ptr() and in_compat_syscall() are now defined > > unconditionally. I have not tried that though. > > True, I'll give it a spin. > > > > +/* > > > + * x32 syscalls are listed in the same table as x86_64 ones, so we need > > > to > > > + * define compat syscalls that are exactly the same as the native > > > version for > > > + * the syscall table machinery to work. Sigh.. > > > + */ > > > +#ifdef CONFIG_X86_X32 > > > COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, > > > - const compat_uptr_t __user *, argv, > > > - const compat_uptr_t __user *, envp) > > > + const char __user *const __user *, argv, > > > + const char __user *const __user *, envp) > > > { > > > - return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, > > > 0); > > > + return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, > > > NULL); > > > } > > > > Maybe move it to arch/x86/kernel/process_64.c or > > arch/x86/entry/syscall_x32.c > > to keep it out of the common code if this is needed. > > I'd rather keep it in common code as that allows all the low-level > exec stuff to be marked static, and avoid us growing new pointless > compat variants through copy and paste. > smart compiler to d > > > I don't really understand > > the comment, why can't this just use this? > > That errors out with: > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to > `__x32_sys_execve' > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to > `__x32_sys_execveat' > make: *** [Makefile:1139: vmlinux] Error 1 I think I have a fix for this, by modifying the syscall wrappers to add an alias for the __x32 variant to the native __x64_sys_foo(). I'll get back to you with a patch. -- Brian Gerst
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 4:43 PM Christoph Hellwig wrote: > > On Mon, Jun 15, 2020 at 04:40:28PM +0200, Arnd Bergmann wrote: > > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to > > > `__x32_sys_execve' > > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to > > > `__x32_sys_execveat' > > > make: *** [Makefile:1139: vmlinux] Error 1 > > > > Ah, I see: it's marked x32-only, so arch/x86/entry/syscall_x32.c > > uses the __x32 prefix instead of the __x64 one. Marking it 'common' > > instead would make it work, but also create an extra entry point > > for native processes, something that commit > > 6365b842aae4 ("x86/syscalls: Split the x32 syscalls into their own table") > > was trying to avoid. > > Marking it common also doesn't compile at all because __NR_execve > and __NR_execveat get redefined in unistd_64.h. I then tried to rename > the x32 versions, which failed in yet another way. At that point I gave > up instead of digging myself into a deeper hole.. How about this one: diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c index 3d8d70d3896c..0ce15807cf54 100644 --- a/arch/x86/entry/syscall_x32.c +++ b/arch/x86/entry/syscall_x32.c @@ -16,6 +16,9 @@ #undef __SYSCALL_X32 #undef __SYSCALL_COMMON +#define __x32_sys_execve __x64_sys_execve +#define __x32_sys_execveat __x64_sys_execveat + #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym, #define __SYSCALL_COMMON(nr, sym) [nr] = __x64_##sym, Still ugly, but much simpler and more localized (if it works). Arnd
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 4:12 PM Christoph Hellwig wrote: > On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote: > > > I don't really understand > > the comment, why can't this just use this? > > That errors out with: > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to > `__x32_sys_execve' > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to > `__x32_sys_execveat' > make: *** [Makefile:1139: vmlinux] Error 1 Ah, I see: it's marked x32-only, so arch/x86/entry/syscall_x32.c uses the __x32 prefix instead of the __x64 one. Marking it 'common' instead would make it work, but also create an extra entry point for native processes, something that commit 6365b842aae4 ("x86/syscalls: Split the x32 syscalls into their own table") was trying to avoid. Arnd
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 04:40:28PM +0200, Arnd Bergmann wrote: > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to > > `__x32_sys_execve' > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to > > `__x32_sys_execveat' > > make: *** [Makefile:1139: vmlinux] Error 1 > > Ah, I see: it's marked x32-only, so arch/x86/entry/syscall_x32.c > uses the __x32 prefix instead of the __x64 one. Marking it 'common' > instead would make it work, but also create an extra entry point > for native processes, something that commit > 6365b842aae4 ("x86/syscalls: Split the x32 syscalls into their own table") > was trying to avoid. Marking it common also doesn't compile at all because __NR_execve and __NR_execveat get redefined in unistd_64.h. I then tried to rename the x32 versions, which failed in yet another way. At that point I gave up instead of digging myself into a deeper hole..
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote: > > #ifdef CONFIG_COMPAT > > - if (unlikely(argv.is_compat)) { > > + if (in_compat_syscall()) { > > + const compat_uptr_t __user *compat_argv = > > + compat_ptr((unsigned long)argv); > > compat_uptr_t compat; > > > > - if (get_user(compat, argv.ptr.compat + nr)) > > + if (get_user(compat, compat_argv + nr)) > > return ERR_PTR(-EFAULT); > > > > return compat_ptr(compat); > > } > > #endif > > I would expect that the "#ifdef CONFIG_COMPAT" can be removed > now, since compat_ptr() and in_compat_syscall() are now defined > unconditionally. I have not tried that though. True, I'll give it a spin. > > +/* > > + * x32 syscalls are listed in the same table as x86_64 ones, so we need to > > + * define compat syscalls that are exactly the same as the native version > > for > > + * the syscall table machinery to work. Sigh.. > > + */ > > +#ifdef CONFIG_X86_X32 > > COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, > > - const compat_uptr_t __user *, argv, > > - const compat_uptr_t __user *, envp) > > + const char __user *const __user *, argv, > > + const char __user *const __user *, envp) > > { > > - return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0); > > + return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, > > NULL); > > } > > Maybe move it to arch/x86/kernel/process_64.c or arch/x86/entry/syscall_x32.c > to keep it out of the common code if this is needed. I'd rather keep it in common code as that allows all the low-level exec stuff to be marked static, and avoid us growing new pointless compat variants through copy and paste. smart compiler to d > I don't really understand > the comment, why can't this just use this? That errors out with: ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to `__x32_sys_execve' ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to `__x32_sys_execveat' make: *** [Makefile:1139: vmlinux] Error 1
Re: [PATCH 0/8 v2] PCI: Align return values of PCIe capability and PCI accessors
On Mon, Jun 15, 2020 at 09:32:17AM +0200, refactormys...@gmail.com wrote: > From: Bolarinwa Olayemi Saheed > > > PATCH 1/8 to 7/8: > PCIBIOS_ error codes have positive values and they are passed down the > call heirarchy from accessors. For functions which are meant to return > only a negative value on failure, passing on this value is a bug. > To mitigate this, call pcibios_err_to_errno() before passing on return > value from PCIe capability accessors call heirarchy. This function > converts any positive PCIBIOS_ error codes to negative generic error > values. > > PATCH 8/8: > The PCIe capability accessors can return 0, -EINVAL, or any PCIBIOS_ error > code. The pci accessor on the other hand can only return 0 or any PCIBIOS_ > error code.This inconsistency among these accessor makes it harder for > callers to check for errors. > Return PCIBIOS_BAD_REGISTER_NUMBER instead of -EINVAL in all PCIe > capability accessors. > > MERGING: > These may all be merged via the PCI tree, since it is a collection of > similar fixes. This way they all get merged at once. I prefer this not happen for active trees, it just risks needless merge conflicts. I will take the hfi1 patches at least, let me know when they are reviewed Thanks, Jason
Re: [PATCH 2/6] exec: simplify the compat syscall handling
On Mon, Jun 15, 2020 at 3:00 PM Christoph Hellwig wrote: > > The only differenence betweeen the compat exec* syscalls and their > native versions is that compat_ptr sign extension, and the fact that > the pointer arithmetics for the two dimensional arrays needs to use > the compat pointer size. Instead of the compat wrappers and the > struct user_arg_ptr machinery just use in_compat_syscall() to do the > right thing for the compat case deep inside get_user_arg_ptr(). > > Signed-off-by: Christoph Hellwig Nice! I have an older patch doing the same for sys_mount() somewhere, but never got around to send that. One day we should see if we can just do this for all of them. > - > -static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr) > +static const char __user * > +get_user_arg_ptr(const char __user *const __user *argv, int nr) > { > const char __user *native; > > #ifdef CONFIG_COMPAT > - if (unlikely(argv.is_compat)) { > + if (in_compat_syscall()) { > + const compat_uptr_t __user *compat_argv = > + compat_ptr((unsigned long)argv); > compat_uptr_t compat; > > - if (get_user(compat, argv.ptr.compat + nr)) > + if (get_user(compat, compat_argv + nr)) > return ERR_PTR(-EFAULT); > > return compat_ptr(compat); > } > #endif I would expect that the "#ifdef CONFIG_COMPAT" can be removed now, since compat_ptr() and in_compat_syscall() are now defined unconditionally. I have not tried that though. > +/* > + * x32 syscalls are listed in the same table as x86_64 ones, so we need to > + * define compat syscalls that are exactly the same as the native version for > + * the syscall table machinery to work. Sigh.. > + */ > +#ifdef CONFIG_X86_X32 > COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, > - const compat_uptr_t __user *, argv, > - const compat_uptr_t __user *, envp) > + const char __user *const __user *, argv, > + const char __user *const __user *, envp) > { > - return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0); > + return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL); > } Maybe move it to arch/x86/kernel/process_64.c or arch/x86/entry/syscall_x32.c to keep it out of the common code if this is needed. I don't really understand the comment, why can't this just use this? --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -379,7 +379,7 @@ 517x32 recvfromcompat_sys_recvfrom 518x32 sendmsg compat_sys_sendmsg 519x32 recvmsg compat_sys_recvmsg -520x32 execve compat_sys_execve +520x32 execve sys_execve 521x32 ptrace compat_sys_ptrace 522x32 rt_sigpending compat_sys_rt_sigpending 523x32 rt_sigtimedwait compat_sys_rt_sigtimedwait_time64 @@ -404,7 +404,7 @@ 542x32 getsockopt compat_sys_getsockopt 543x32 io_setupcompat_sys_io_setup 544x32 io_submit compat_sys_io_submit -545x32 execveatcompat_sys_execveat +545x32 execveatsys_execveat 546x32 preadv2 compat_sys_preadv64v2 547x32 pwritev2compat_sys_pwritev64v2 548x32 process_madvise compat_sys_process_madvise Arnd
Re: properly support exec and wait with kernel pointers
On Mon, Jun 15, 2020 at 3:00 PM Christoph Hellwig wrote: > > Hi all, > > this series first cleans up the exec code and then adds proper > kernel_execveat and kernel_wait callers instead of relying on the fact > that the early init code and kernel threads implicitly run with > the address limit set to KERNEL_DS. > > Note that the cleanup removes the compat execve(at) handlers (almost) > entirely, as we can handle the compat difference very nicely in a > unified codebase. The only exception is x86 where this would list the > handlers twice in the same syscall table due to the messed up x32 > design. I had to add an extra compat handler just for that case, but > maybe someone has a better idea. I looked at all the patches and I like it a lot. I replied with some suggestions for x32, but maybe I misunderstood what its problem is, as I don't see anything preventing us from having two entries in the x32 table pointing to the same function. Arnd
Re: [PATCH 3/3] powerpc/8xx: Provide ptep_get() with 16k pages
On Mon, Jun 15, 2020 at 12:57:59PM +, Christophe Leroy wrote: > READ_ONCE() now enforces atomic read, which leads to: > Fixes: 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() > memory accesses") > Cc: Will Deacon > Signed-off-by: Christophe Leroy > --- > arch/powerpc/include/asm/nohash/32/pgtable.h | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h > b/arch/powerpc/include/asm/nohash/32/pgtable.h > index b56f14160ae5..77addb599ce7 100644 > --- a/arch/powerpc/include/asm/nohash/32/pgtable.h > +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h > @@ -286,6 +286,16 @@ static inline pte_t ptep_get_and_clear(struct mm_struct > *mm, unsigned long addr, > return __pte(pte_update(mm, addr, ptep, ~0, 0, 0)); > } > > +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES) > +#define __HAVE_ARCH_PTEP_GET > +static inline pte_t ptep_get(pte_t *ptep) > +{ > + pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0}; > + > + return pte; > +} > +#endif Would it make sense to have a comment with this magic? The casual reader might wonder WTH just happened when he stumbles on this :-) Other than that, the series looks good to me, Thanks! Acked-by: Peter Zijlstra (Intel)
[PATCH] powerpc/ptdump: Fix build failure in hashpagetable.c
H_SUCCESS is only defined when CONFIG_PPC_PSERIES is defined. != H_SUCCESS means != 0. Modify the test accordingly. Reported-by: kernel test robot Fixes: 65e701b2d2a8 ("powerpc/ptdump: drop non vital #ifdefs") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- arch/powerpc/mm/ptdump/hashpagetable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/ptdump/hashpagetable.c b/arch/powerpc/mm/ptdump/hashpagetable.c index a2c33efc7ce8..5b8bd34cd3a1 100644 --- a/arch/powerpc/mm/ptdump/hashpagetable.c +++ b/arch/powerpc/mm/ptdump/hashpagetable.c @@ -258,7 +258,7 @@ static int pseries_find(unsigned long ea, int psize, bool primary, u64 *v, u64 * for (i = 0; i < HPTES_PER_GROUP; i += 4, hpte_group += 4) { lpar_rc = plpar_pte_read_4(0, hpte_group, (void *)ptes); - if (lpar_rc != H_SUCCESS) + if (lpar_rc) continue; for (j = 0; j < 4; j++) { if (HPTE_V_COMPARE(ptes[j].v, want_v) && -- 2.25.0
[PATCH 6/6] kernel: add a kernel_wait helper
Add a helper that waits for a pid and stores the status in the passed in kernel pointer. Use it to fix the usage of kernel_wait4 in call_usermodehelper_exec_sync that only happens to work due to the implicit set_fs(KERNEL_DS) for kernel threads. Signed-off-by: Christoph Hellwig --- include/linux/sched/task.h | 1 + kernel/exit.c | 16 kernel/umh.c | 29 - 3 files changed, 21 insertions(+), 25 deletions(-) diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 38359071236ad7..a80007df396e95 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -102,6 +102,7 @@ struct task_struct *fork_idle(int); struct mm_struct *copy_init_mm(void); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); extern long kernel_wait4(pid_t, int __user *, int, struct rusage *); +int kernel_wait(pid_t pid, int *stat); extern void free_task(struct task_struct *tsk); diff --git a/kernel/exit.c b/kernel/exit.c index 727150f2810338..fd598846df0b17 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1626,6 +1626,22 @@ long kernel_wait4(pid_t upid, int __user *stat_addr, int options, return ret; } +int kernel_wait(pid_t pid, int *stat) +{ + struct wait_opts wo = { + .wo_type= PIDTYPE_PID, + .wo_pid = find_get_pid(pid), + .wo_flags = WEXITED, + }; + int ret; + + ret = do_wait(&wo); + if (ret > 0 && wo.wo_stat) + *stat = wo.wo_stat; + put_pid(wo.wo_pid); + return ret; +} + SYSCALL_DEFINE4(wait4, pid_t, upid, int __user *, stat_addr, int, options, struct rusage __user *, ru) { diff --git a/kernel/umh.c b/kernel/umh.c index 1284823dbad338..6fd948e478bec4 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -126,37 +126,16 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) { pid_t pid; - /* If SIGCLD is ignored kernel_wait4 won't populate the status. */ + /* If SIGCLD is ignored do_wait won't populate the status. */ kernel_sigaction(SIGCHLD, SIG_DFL); pid = kernel_thread(call_usermodehelper_exec_async, sub_info, SIGCHLD); - if (pid < 0) { + if (pid < 0) sub_info->retval = pid; - } else { - int ret = -ECHILD; - /* -* Normally it is bogus to call wait4() from in-kernel because -* wait4() wants to write the exit code to a userspace address. -* But call_usermodehelper_exec_sync() always runs as kernel -* thread (workqueue) and put_user() to a kernel address works -* OK for kernel threads, due to their having an mm_segment_t -* which spans the entire address space. -* -* Thus the __user pointer cast is valid here. -*/ - kernel_wait4(pid, (int __user *)&ret, 0, NULL); - - /* -* If ret is 0, either call_usermodehelper_exec_async failed and -* the real error code is already in sub_info->retval or -* sub_info->retval is 0 anyway, so don't mess with it then. -*/ - if (ret) - sub_info->retval = ret; - } + else + kernel_wait(pid, &sub_info->retval); /* Restore default kernel sig handler */ kernel_sigaction(SIGCHLD, SIG_IGN); - umh_complete(sub_info); } -- 2.26.2
[PATCH 5/6] exec: add a kernel_execveat helper
Add a kernel_execveat helper to execute a binary with kernel space argv and envp pointers. Switch executing init and user mode helpers to this new helper instead of relying on the implicit set_fs(KERNEL_DS) for early init code and kernel threads, and move the getname call into the do_execve helper. Signed-off-by: Christoph Hellwig --- fs/exec.c | 116 +++- include/linux/binfmts.h | 6 +-- init/main.c | 6 +-- kernel/umh.c| 8 ++- 4 files changed, 97 insertions(+), 39 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 696b1e8180d7d8..9e16d56aa53e86 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -439,6 +439,21 @@ static int count_strings(const char __user *const __user *argv) return i; } +static int count_kernel_strings(const char *const *argv) +{ + int i; + + if (!argv) + return 0; + + for (i = 0; argv[i]; i++) { + if (i >= MAX_ARG_STRINGS) + return -E2BIG; + } + + return i; +} + static int check_arg_limit(struct linux_binprm *bprm) { unsigned long limit, ptr_size; @@ -615,6 +630,19 @@ int copy_string_kernel(const char *arg, struct linux_binprm *bprm) } EXPORT_SYMBOL(copy_string_kernel); +static int copy_strings_kernel(int argc, const char *const *argv, + struct linux_binprm *bprm) +{ + int ret; + + while (argc-- > 0) { + ret = copy_string_kernel(argv[argc], bprm); + if (ret) + break; + } + return ret; +} + #ifdef CONFIG_MMU /* @@ -1797,9 +1825,11 @@ static int exec_binprm(struct linux_binprm *bprm) return 0; } -int do_execveat(int fd, struct filename *filename, +static int __do_execveat(int fd, struct filename *filename, const char __user *const __user *argv, const char __user *const __user *envp, + const char *const *kernel_argv, + const char *const *kernel_envp, int flags, struct file *file) { char *pathbuf = NULL; @@ -1880,16 +1910,30 @@ int do_execveat(int fd, struct filename *filename, if (retval) goto out_unmark; - bprm->argc = count_strings(argv); - if (bprm->argc < 0) { - retval = bprm->argc; - goto out; - } + if (unlikely(kernel_argv)) { + bprm->argc = count_kernel_strings(kernel_argv); + if (bprm->argc < 0) { + retval = bprm->argc; + goto out; + } - bprm->envc = count_strings(envp); - if (bprm->envc < 0) { - retval = bprm->envc; - goto out; + bprm->envc = count_kernel_strings(kernel_envp); + if (bprm->envc < 0) { + retval = bprm->envc; + goto out; + } + } else { + bprm->argc = count_strings(argv); + if (bprm->argc < 0) { + retval = bprm->argc; + goto out; + } + + bprm->envc = count_strings(envp); + if (bprm->envc < 0) { + retval = bprm->envc; + goto out; + } } retval = check_arg_limit(bprm); @@ -1906,13 +1950,22 @@ int do_execveat(int fd, struct filename *filename, goto out; bprm->exec = bprm->p; - retval = copy_strings(bprm->envc, envp, bprm); - if (retval < 0) - goto out; - retval = copy_strings(bprm->argc, argv, bprm); - if (retval < 0) - goto out; + if (unlikely(kernel_argv)) { + retval = copy_strings_kernel(bprm->envc, kernel_envp, bprm); + if (retval < 0) + goto out; + retval = copy_strings_kernel(bprm->argc, kernel_argv, bprm); + if (retval < 0) + goto out; + } else { + retval = copy_strings(bprm->envc, envp, bprm); + if (retval < 0) + goto out; + retval = copy_strings(bprm->argc, argv, bprm); + if (retval < 0) + goto out; + } retval = exec_binprm(bprm); if (retval < 0) @@ -1963,6 +2016,23 @@ int do_execveat(int fd, struct filename *filename, return retval; } +static int do_execveat(int fd, const char *filename, + const char __user *const __user *argv, + const char __user *const __user *envp, int flags) +{ + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; + struct filename *name = getname_flags(filename, lookup_flags, NULL); + + return __do_execveat(fd, name, argv, envp, NULL, NULL, flags, NULL); +} + +int kernel_execveat(int fd, const char *file
[PATCH 4/6] exec: split prepare_arg_pages
Move counting the arguments and enviroment variables out of prepare_arg_pages and rename the rest of the function to check_arg_limit. This prepares for a version of do_execvat that takes kernel pointers. Signed-off-by: Christoph Hellwig --- fs/exec.c | 26 ++ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 6e4d9d1ffa35fa..696b1e8180d7d8 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -439,20 +439,10 @@ static int count_strings(const char __user *const __user *argv) return i; } -static int prepare_arg_pages(struct linux_binprm *bprm, - const char __user *const __user *argv, - const char __user *const __user *envp) +static int check_arg_limit(struct linux_binprm *bprm) { unsigned long limit, ptr_size; - bprm->argc = count_strings(argv); - if (bprm->argc < 0) - return bprm->argc; - - bprm->envc = count_strings(envp); - if (bprm->envc < 0) - return bprm->envc; - /* * Limit to 1/4 of the max stack size or 3/4 of _STK_LIM * (whichever is smaller) for the argv+env strings. @@ -1890,7 +1880,19 @@ int do_execveat(int fd, struct filename *filename, if (retval) goto out_unmark; - retval = prepare_arg_pages(bprm, argv, envp); + bprm->argc = count_strings(argv); + if (bprm->argc < 0) { + retval = bprm->argc; + goto out; + } + + bprm->envc = count_strings(envp); + if (bprm->envc < 0) { + retval = bprm->envc; + goto out; + } + + retval = check_arg_limit(bprm); if (retval < 0) goto out; -- 2.26.2
[PATCH 3/6] exec: cleanup the count() function
Remove the max argument as it is hard wired to MAX_ARG_STRINGS, and give the function a slightly less generic name. Signed-off-by: Christoph Hellwig --- fs/exec.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 94107eceda8a67..6e4d9d1ffa35fa 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -411,9 +411,9 @@ get_user_arg_ptr(const char __user *const __user *argv, int nr) } /* - * count() counts the number of strings in array ARGV. + * count_strings() counts the number of strings in array ARGV. */ -static int count(const char __user *const __user *argv, int max) +static int count_strings(const char __user *const __user *argv) { int i = 0; @@ -427,7 +427,7 @@ static int count(const char __user *const __user *argv, int max) if (IS_ERR(p)) return -EFAULT; - if (i >= max) + if (i >= MAX_ARG_STRINGS) return -E2BIG; ++i; @@ -445,11 +445,11 @@ static int prepare_arg_pages(struct linux_binprm *bprm, { unsigned long limit, ptr_size; - bprm->argc = count(argv, MAX_ARG_STRINGS); + bprm->argc = count_strings(argv); if (bprm->argc < 0) return bprm->argc; - bprm->envc = count(envp, MAX_ARG_STRINGS); + bprm->envc = count_strings(envp); if (bprm->envc < 0) return bprm->envc; -- 2.26.2
[PATCH 2/6] exec: simplify the compat syscall handling
The only differenence betweeen the compat exec* syscalls and their native versions is that compat_ptr sign extension, and the fact that the pointer arithmetics for the two dimensional arrays needs to use the compat pointer size. Instead of the compat wrappers and the struct user_arg_ptr machinery just use in_compat_syscall() to do the right thing for the compat case deep inside get_user_arg_ptr(). Signed-off-by: Christoph Hellwig --- arch/arm64/include/asm/unistd32.h | 4 +- arch/mips/kernel/syscalls/syscall_n32.tbl | 4 +- arch/mips/kernel/syscalls/syscall_o32.tbl | 4 +- arch/parisc/kernel/syscalls/syscall.tbl | 4 +- arch/powerpc/kernel/syscalls/syscall.tbl | 4 +- arch/s390/kernel/syscalls/syscall.tbl | 4 +- arch/sparc/kernel/syscalls.S | 4 +- arch/x86/entry/syscalls/syscall_32.tbl| 4 +- fs/exec.c | 89 ++- include/linux/compat.h| 7 -- include/uapi/asm-generic/unistd.h | 4 +- tools/include/uapi/asm-generic/unistd.h | 4 +- .../arch/powerpc/entry/syscalls/syscall.tbl | 4 +- .../perf/arch/s390/entry/syscalls/syscall.tbl | 4 +- 14 files changed, 53 insertions(+), 91 deletions(-) diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 6d95d0c8bf2f47..141f5d2ff1c34f 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -33,7 +33,7 @@ __SYSCALL(__NR_link, sys_link) #define __NR_unlink 10 __SYSCALL(__NR_unlink, sys_unlink) #define __NR_execve 11 -__SYSCALL(__NR_execve, compat_sys_execve) +__SYSCALL(__NR_execve, sys_execve) #define __NR_chdir 12 __SYSCALL(__NR_chdir, sys_chdir) /* 13 was sys_time */ @@ -785,7 +785,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create) #define __NR_bpf 386 __SYSCALL(__NR_bpf, sys_bpf) #define __NR_execveat 387 -__SYSCALL(__NR_execveat, compat_sys_execveat) +__SYSCALL(__NR_execveat, sys_execveat) #define __NR_userfaultfd 388 __SYSCALL(__NR_userfaultfd, sys_userfaultfd) #define __NR_membarrier 389 diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index f777141f52568f..e861b5ab7179c9 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -64,7 +64,7 @@ 54 n32 getsockopt compat_sys_getsockopt 55 n32 clone __sys_clone 56 n32 fork__sys_fork -57 n32 execve compat_sys_execve +57 n32 execve sys_execve 58 n32 exitsys_exit 59 n32 wait4 compat_sys_wait4 60 n32 killsys_kill @@ -328,7 +328,7 @@ 317n32 getrandom sys_getrandom 318n32 memfd_createsys_memfd_create 319n32 bpf sys_bpf -320n32 execveatcompat_sys_execveat +320n32 execveatsys_execveat 321n32 userfaultfd sys_userfaultfd 322n32 membarrier sys_membarrier 323n32 mlock2 sys_mlock2 diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 13280625d312e9..bba80f74e9968e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -18,7 +18,7 @@ 8 o32 creat sys_creat 9 o32 linksys_link 10 o32 unlink sys_unlink -11 o32 execve sys_execve compat_sys_execve +11 o32 execve sys_execve 12 o32 chdir sys_chdir 13 o32 timesys_time32 14 o32 mknod sys_mknod @@ -367,7 +367,7 @@ 353o32 getrandom sys_getrandom 354o32 memfd_createsys_memfd_create 355o32 bpf sys_bpf -356o32 execveatsys_execveat compat_sys_execveat +356o32 execveatsys_execveat 357o32 userfaultfd sys_userfaultfd 358o32 membarrier sys_membarrier 359o32 mlock2 sys_mlock2 diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 5a758fa6ec5242..23fa0d0edf3384 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -18,7 +18,7 @@ 8 common creat
properly support exec and wait with kernel pointers
Hi all, this series first cleans up the exec code and then adds proper kernel_execveat and kernel_wait callers instead of relying on the fact that the early init code and kernel threads implicitly run with the address limit set to KERNEL_DS. Note that the cleanup removes the compat execve(at) handlers (almost) entirely, as we can handle the compat difference very nicely in a unified codebase. The only exception is x86 where this would list the handlers twice in the same syscall table due to the messed up x32 design. I had to add an extra compat handler just for that case, but maybe someone has a better idea.
[PATCH 1/6] exec: cleanup the execve wrappers
Remove a whole bunch of wrappers that eventually all call __do_execve_file, and consolidate the execvce helpers to: (1) __do_execveat, which is the lowest level helper implementing the actual functionality (2) do_execvat, which is used by all callers that want native pointers (3) do_compat_execve, which is used by all compat syscalls Signed-off-by: Christoph Hellwig --- fs/exec.c | 98 +++-- include/linux/binfmts.h | 12 ++--- init/main.c | 7 +-- kernel/umh.c| 16 +++ 4 files changed, 41 insertions(+), 92 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index e6e8a9a7032784..354fdaa536ae7d 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1815,10 +1815,7 @@ static int exec_binprm(struct linux_binprm *bprm) return 0; } -/* - * sys_execve() executes a new program. - */ -static int __do_execve_file(int fd, struct filename *filename, +static int __do_execveat(int fd, struct filename *filename, struct user_arg_ptr argv, struct user_arg_ptr envp, int flags, struct file *file) @@ -1972,74 +1969,16 @@ static int __do_execve_file(int fd, struct filename *filename, return retval; } -static int do_execveat_common(int fd, struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp, - int flags) -{ - return __do_execve_file(fd, filename, argv, envp, flags, NULL); -} - -int do_execve_file(struct file *file, void *__argv, void *__envp) -{ - struct user_arg_ptr argv = { .ptr.native = __argv }; - struct user_arg_ptr envp = { .ptr.native = __envp }; - - return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file); -} - -int do_execve(struct filename *filename, - const char __user *const __user *__argv, - const char __user *const __user *__envp) -{ - struct user_arg_ptr argv = { .ptr.native = __argv }; - struct user_arg_ptr envp = { .ptr.native = __envp }; - return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); -} - int do_execveat(int fd, struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp, - int flags) + int flags, struct file *file) { struct user_arg_ptr argv = { .ptr.native = __argv }; struct user_arg_ptr envp = { .ptr.native = __envp }; - return do_execveat_common(fd, filename, argv, envp, flags); -} - -#ifdef CONFIG_COMPAT -static int compat_do_execve(struct filename *filename, - const compat_uptr_t __user *__argv, - const compat_uptr_t __user *__envp) -{ - struct user_arg_ptr argv = { - .is_compat = true, - .ptr.compat = __argv, - }; - struct user_arg_ptr envp = { - .is_compat = true, - .ptr.compat = __envp, - }; - return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); -} - -static int compat_do_execveat(int fd, struct filename *filename, - const compat_uptr_t __user *__argv, - const compat_uptr_t __user *__envp, - int flags) -{ - struct user_arg_ptr argv = { - .is_compat = true, - .ptr.compat = __argv, - }; - struct user_arg_ptr envp = { - .is_compat = true, - .ptr.compat = __envp, - }; - return do_execveat_common(fd, filename, argv, envp, flags); + return __do_execveat(fd, filename, argv, envp, flags, file); } -#endif void set_binfmt(struct linux_binfmt *new) { @@ -2070,7 +2009,7 @@ SYSCALL_DEFINE3(execve, const char __user *const __user *, argv, const char __user *const __user *, envp) { - return do_execve(getname(filename), argv, envp); + return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL); } SYSCALL_DEFINE5(execveat, @@ -2080,18 +2019,34 @@ SYSCALL_DEFINE5(execveat, int, flags) { int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; + struct filename *name = getname_flags(filename, lookup_flags, NULL); - return do_execveat(fd, - getname_flags(filename, lookup_flags, NULL), - argv, envp, flags); + return do_execveat(fd, name, argv, envp, flags, NULL); } #ifdef CONFIG_COMPAT +static int do_compat_execve(int fd, struct filename *filename, + const compat_uptr_t __user *__argv, + const compat_uptr_t __user *__envp, + int flags) +{ + struct user_arg_ptr argv = { + .is_compat = true, + .ptr.compat = __argv, + }; + struct user_arg_ptr envp = { + .is_compat = true, +
[PATCH 3/3] powerpc/8xx: Provide ptep_get() with 16k pages
READ_ONCE() now enforces atomic read, which leads to: CC mm/gup.o In file included from ./include/linux/kernel.h:11:0, from mm/gup.c:2: In function 'gup_hugepte.constprop', inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8: ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^ ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ ^ ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type' compiletime_assert_rwonce_type(x);\ ^ mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE' pte = READ_ONCE(*ptep); ^ In function 'gup_get_pte', inlined from 'gup_pte_range' at mm/gup.c:2228:9, inlined from 'gup_pmd_range' at mm/gup.c:2613:15, inlined from 'gup_pud_range' at mm/gup.c:2641:15, inlined from 'gup_p4d_range' at mm/gup.c:2666:15, inlined from 'gup_pgd_range' at mm/gup.c:2694:15, inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3: ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^ ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ ^ ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type' compiletime_assert_rwonce_type(x);\ ^ mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE' return READ_ONCE(*ptep); ^ make[2]: *** [mm/gup.o] Error 1 Define ptep_get() on 8xx when using 16k pages. Fixes: 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") Cc: Will Deacon Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/nohash/32/pgtable.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h index b56f14160ae5..77addb599ce7 100644 --- a/arch/powerpc/include/asm/nohash/32/pgtable.h +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h @@ -286,6 +286,16 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, return __pte(pte_update(mm, addr, ptep, ~0, 0, 0)); } +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES) +#define __HAVE_ARCH_PTEP_GET +static inline pte_t ptep_get(pte_t *ptep) +{ + pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0}; + + return pte; +} +#endif + #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) -- 2.25.0
[PATCH 0/3] Fix build failure with v5.8-rc1
Commit 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") leads to following build failure on powerpc 8xx. To fix it, this small series introduces a new helper named ptep_get() to replace the direct access with READ_ONCE(). This new helper can be overriden by architectures. CC mm/gup.o In file included from ./include/linux/kernel.h:11:0, from mm/gup.c:2: In function 'gup_hugepte.constprop', inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8: ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^ ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ ^ ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type' compiletime_assert_rwonce_type(x);\ ^ mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE' pte = READ_ONCE(*ptep); ^ In function 'gup_get_pte', inlined from 'gup_pte_range' at mm/gup.c:2228:9, inlined from 'gup_pmd_range' at mm/gup.c:2613:15, inlined from 'gup_pud_range' at mm/gup.c:2641:15, inlined from 'gup_p4d_range' at mm/gup.c:2666:15, inlined from 'gup_pgd_range' at mm/gup.c:2694:15, inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3: ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^ ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ^ ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ ^ ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type' compiletime_assert_rwonce_type(x);\ ^ mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE' return READ_ONCE(*ptep); ^ make[2]: *** [mm/gup.o] Error 1 Christophe Leroy (3): mm/gup: Use huge_ptep_get() in gup_hugepte() mm: Allow arches to provide ptep_get() powerpc/8xx: Provide ptep_get() with 16k pages arch/powerpc/include/asm/nohash/32/pgtable.h | 10 ++ include/asm-generic/hugetlb.h| 2 +- include/linux/pgtable.h | 7 +++ mm/gup.c | 4 ++-- 4 files changed, 20 insertions(+), 3 deletions(-) -- 2.25.0
[PATCH 1/3] mm/gup: Use huge_ptep_get() in gup_hugepte()
gup_hugepte() reads hugepage table entries, it can't read them directly, huge_ptep_get() must be used. Fixes: 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") Cc: Will Deacon Signed-off-by: Christophe Leroy --- mm/gup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/gup.c b/mm/gup.c index de9e36262ccb..761df4944ef5 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2425,7 +2425,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, if (pte_end < end) end = pte_end; - pte = READ_ONCE(*ptep); + pte = huge_ptep_get(ptep); if (!pte_access_permitted(pte, flags & FOLL_WRITE)) return 0; -- 2.25.0
[PATCH 2/3] mm: Allow arches to provide ptep_get()
Since commit 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses"), it is not possible anymore to use READ_ONCE() to access complex page table entries like the one defined for powerpc 8xx with 16k size pages. Define a ptep_get() helper that architectures can override instead of performing a READ_ONCE() on the page table entry pointer. Fixes: 2ab3a0a02905 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses") Cc: Will Deacon Signed-off-by: Christophe Leroy --- include/asm-generic/hugetlb.h | 2 +- include/linux/pgtable.h | 7 +++ mm/gup.c | 2 +- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index 40f85decc2ee..8e1e6244a89d 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -122,7 +122,7 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, #ifndef __HAVE_ARCH_HUGE_PTEP_GET static inline pte_t huge_ptep_get(pte_t *ptep) { - return READ_ONCE(*ptep); + return ptep_get(ptep); } #endif diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 32b6c52d41b9..56c1e8eb7bb0 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -249,6 +249,13 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, } #endif +#ifndef __HAVE_ARCH_PTEP_GET +static inline pte_t ptep_get(pte_t *ptep) +{ + return READ_ONCE(*ptep); +} +#endif + #ifdef CONFIG_TRANSPARENT_HUGEPAGE #ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, diff --git a/mm/gup.c b/mm/gup.c index 761df4944ef5..6f47697f8fb0 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2196,7 +2196,7 @@ static inline pte_t gup_get_pte(pte_t *ptep) */ static inline pte_t gup_get_pte(pte_t *ptep) { - return READ_ONCE(*ptep); + return ptep_get(ptep); } #endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */ -- 2.25.0
Re: [PATCH v13 2/6] seq_buf: Export seq_buf_printf
On Mon, Jun 15, 2020 at 06:14:03PM +0530, Vaibhav Jain wrote: > 'seq_buf' provides a very useful abstraction for writing to a string > buffer without needing to worry about it over-flowing. However even > though the API has been stable for couple of years now its still not > exported to kernel loadable modules limiting its usage. > > Hence this patch proposes update to 'seq_buf.c' to mark > seq_buf_printf() which is part of the seq_buf API to be exported to > kernel loadable GPL modules. This symbol will be used in later parts > of this patch-set to simplify content creation for a sysfs attribute. > > Cc: Piotr Maziarz > Cc: Cezary Rojewski > Cc: Christoph Hellwig > Cc: Steven Rostedt > Cc: Borislav Petkov > Acked-by: Steven Rostedt (VMware) > Signed-off-by: Vaibhav Jain > --- > Changelog: > > v12..v13: > * None > > v11..v12: > * None Can you please resend your patchset once a week like everyone else and not flood inboxes with it? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v12 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
This accidently got reposted. Please ignore. v13 version of the patch series located at https://lore.kernel.org/linux-nvdimm/20200615124407.32596-1-vaib...@linux.ibm.com Vaibhav Jain writes: > Changes since v11 [1]: > * Minor update to 'papr_pdsm.h' fixing a misleading comment about > 'possible' padding being added by GCC which doesn't apply in case > structs are marked as __packed. > * Fix the order of initialization of 'struct nd_papr_pdsm_health' in > papr_pdsm_health(). > * Added acks from Ira for various patches. > > [1] > https://lore.kernel.org/linux-nvdimm/20200607131339.476036-1-vaib...@linux.ibm.com > --- > > The PAPR standard[2][4] provides mechanisms to query the health and > performance stats of an NVDIMM via various hcalls as described in > Ref[3]. Until now these stats were never available nor exposed to the > user-space tools like 'ndctl'. This is partly due to PAPR platform not > having support for ACPI and NFIT. Hence 'ndctl' is unable to query and > report the dimm health status and a user had no way to determine the > current health status of a NDVIMM. > > To overcome this limitation, this patch-set updates papr_scm kernel > module to query and fetch NVDIMM health stats using hcalls described > in Ref[3]. This health and performance stats are then exposed to > userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by > libndctl. > > These changes coupled with proposed ndtcl changes located at Ref[5] > should provide a way for the user to retrieve NVDIMM health status > using ndtcl. > > Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM > in a emulation environment: > > # ndctl list -DH > [ > { > "dev":"nmem0", > "health":{ > "health_state":"fatal", > "shutdown_state":"dirty" > } > } > ] > > Dimm health report output on a pseries guest lpar with vPMEM or HMS > based NVDIMMs that are in perfectly healthy conditions: > > # ndctl list -d nmem0 -H > [ > { > "dev":"nmem0", > "health":{ > "health_state":"ok", > "shutdown_state":"clean" > } > } > ] > > PAPR NVDIMM-Specific-Methods(PDSM) > == > > PDSM requests are issued by vendor specific code in libndctl to > execute certain operations or fetch information from NVDIMMS. PDSMs > requests can be sent to papr_scm module via libndctl(userspace) and > libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be > handled in the dimm control function papr_scm_ndctl(). Current > patchset proposes a single PDSM to retrieve NVDIMM health, defined in > the newly introduced uapi header named 'papr_pdsm.h'. Support for > more PDSMs will be added in future. > > Structure of the patch-set > == > > The patch-set starts with a doc patch documenting details of hcall > H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf() > thats used in subsequent patches to generate sysfs attribute content. > > Third patch implements support for fetching NVDIMM health information > from PHYP and partially exposing it to user-space via a NVDIMM sysfs > flag. > > Fourth patch updates papr_scm_ndctl() to handle a possible error case > and also improve debug logging. > > Fifth patch deals with implementing support for servicing PDSM > commands in papr_scm module. > > Finally the last patch implements support for servicing PDSM > 'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to > libndctl. > > References: > [2] "Power Architecture Platform Reference" > https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference > [3] commit 58b278f568f0 > ("powerpc: Provide initial documentation for PAPR hcalls") > [4] "Linux on Power Architecture Platform Reference" > https://members.openpowerfoundation.org/document/dl/469 > [5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v12 > > --- > > Vaibhav Jain (6): > powerpc: Document details on H_SCM_HEALTH hcall > seq_buf: Export seq_buf_printf > powerpc/papr_scm: Fetch nvdimm health information from PHYP > powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl() > ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods > powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH > > Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++ > Documentation/powerpc/papr_hcalls.rst | 46 ++- > arch/powerpc/include/uapi/asm/papr_pdsm.h | 125 ++ > arch/powerpc/platforms/pseries/papr_scm.c | 373 +- > include/uapi/linux/ndctl.h| 1 + > lib/seq_buf.c | 1 + > 6 files changed, 562 insertions(+), 11 deletions(-) > create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem > create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h > > -- > 2.26.2 > -- Cheers ~ Vaibhav
Re: [PATCH v12 1/6] powerpc: Document details on H_SCM_HEALTH hcall
This accidently got reposted. Please ignore. Vaibhav Jain writes: > Add documentation to 'papr_hcalls.rst' describing the bitmap flags > that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM > specification. > > Cc: "Aneesh Kumar K . V" > Cc: Dan Williams > Cc: Michael Ellerman > Cc: Ira Weiny > Acked-by: Ira Weiny > Signed-off-by: Vaibhav Jain > --- > Changelog: > > v11..v12: > * None > > v10..v11: > * None > > v9..v10: > * Added ack from Ira. > > Resend: > * None > > v8..v9: > * s/SCM/PMEM device. [ Dan Williams, Aneesh ] > > v7..v8: > * Added a clarification on bit-ordering of Health Bitmap > > Resend: > * None > > v6..v7: > * None > > v5..v6: > * New patch in the series > --- > Documentation/powerpc/papr_hcalls.rst | 46 --- > 1 file changed, 42 insertions(+), 4 deletions(-) > > diff --git a/Documentation/powerpc/papr_hcalls.rst > b/Documentation/powerpc/papr_hcalls.rst > index 3493631a60f8..48fcf1255a33 100644 > --- a/Documentation/powerpc/papr_hcalls.rst > +++ b/Documentation/powerpc/papr_hcalls.rst > @@ -220,13 +220,51 @@ from the LPAR memory. > **H_SCM_HEALTH** > > | Input: drcIndex > -| Out: *health-bitmap, health-bit-valid-bitmap* > +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)* > | Return Value: *H_Success, H_Parameter, H_Hardware* > > Given a DRC Index return the info on predictive failure and overall health of > -the NVDIMM. The asserted bits in the health-bitmap indicate a single > predictive > -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are > -valid. > +the PMEM device. The asserted bits in the health-bitmap indicate one or more > states > +(described in table below) of the PMEM device and health-bit-valid-bitmap > indicate > +which bits in health-bitmap are valid. The bits are reported in > +reverse bit ordering for example a value of 0xC400 > +indicates bits 0, 1, and 5 are valid. > + > +Health Bitmap Flags: > + > ++--+---+ > +| Bit | Definition > | > ++==+===+ > +| 00 | PMEM device is unable to persist memory contents. > | > +| | If the system is powered down, nothing will be saved. > | > ++--+---+ > +| 01 | PMEM device failed to persist memory contents. Either contents were > | > +| | not saved successfully on power down or were not restored properly > on | > +| | power up. > | > ++--+---+ > +| 02 | PMEM device contents are persisted from previous IPL. The data from > | > +| | the last boot were successfully restored. > | > ++--+---+ > +| 03 | PMEM device contents are not persisted from previous IPL. There was > no| > +| | data to restore from the last boot. > | > ++--+---+ > +| 04 | PMEM device memory life remaining is critically low > | > ++--+---+ > +| 05 | PMEM device will be garded off next IPL due to failure > | > ++--+---+ > +| 06 | PMEM device contents cannot persist due to current platform health > | > +| | status. A hardware failure may prevent data from being saved or > | > +| | restored. > | > ++--+---+ > +| 07 | PMEM device is unable to persist memory contents in certain > conditions| > ++--+---+ > +| 08 | PMEM device is encrypted > | > ++--+---+ > +| 09 | PMEM device has successfully completed a requested erase or secure > | > +| | erase procedure. > | > ++--+---+ > +|10:63 | Reserved / Unused > | > ++--+---+ > > **H_SCM_PERFORMANCE_STATS** > > -- > 2.26.2 > -- Cheers ~ Vaibhav
[PATCH v13 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm module and add the command family NVDIMM_FAMILY_PAPR to the white list of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the nvdimm command mask and implement necessary scaffolding in the module to handle ND_CMD_CALL ioctl and PDSM requests that we receive. The layout of the PDSM request as we expect from libnvdimm/libndctl is described in newly introduced uapi header 'papr_pdsm.h' which defines a 'struct nd_pkg_pdsm' and a maximal union named 'nd_pdsm_payload'. These new structs together with 'struct nd_cmd_pkg' for a pdsm envelop thats sent by libndctl to libnvdimm and serviced by papr_scm in 'papr_scm_service_pdsm()'. The PDSM request is communicated by member 'struct nd_cmd_pkg.nd_command' together with other information on the pdsm payload (size-in, size-out). The patch also introduces 'struct pdsm_cmd_desc' instances of which are stored in an array __pdsm_cmd_descriptors[] indexed with PDSM cmd and corresponding access function pdsm_cmd_desc() is introduced. 'struct pdsm_cdm_desc' holds the service function for a given PDSM and corresponding payload in/out sizes. A new function papr_scm_service_pdsm() is introduced and is called from papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL command from libnvdimm. The function performs validation on the PDSM payload based on info present in corresponding PDSM descriptor and if valid calls the 'struct pdcm_cmd_desc.service' function to service the PDSM. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v12..v13: * s/struct nd_pdsm_cmd_pkg/struct nd_pkg_pdsm/ * Removed instance of 'struct nd_cmd_pkg hdr' from 'struct nd_pdsm_cmd_pkg'. * Introduced 'union nd_pdsm_payload' thats a maximal union of all possible payload structs. * Instead of having flexible 'payload' member 'struct nd_pdsm_cmd_pkg' replace it with a 'union nd_pdsm_payload'. * Introduce pdsm descriptor 'struct pdsm_cmd_desc' and its array __pdsm_cmd_descriptors[] that holds a payload 'size_[in|out]' and service function for each pdsm. This is analogus to '__nd_cmd_dimm_descs[]' * Introduce function 'pdsm_cmd_desc()' to fetch the corrosponding pdsm descriptor for each valid pdsm. * Updated papr_scm_service_pdsm() to use 'pdsm_cmd_desc()' and apply checks on 'nd_cmd_pkg' payload based on psdm descriptor members and finally service the pdsm using the 'service' member of the descriptor. * Updated 'papr_scm_ndctl()' to send the 'struct nd_cmd_pkg*' instance to papr_scm_service_pdsm() instead of a 'struct nd_pdsm_cmd_pkg*'. * Updated documentation in 'papr_psdm.h' to reflect new pdsm envelope layout. * Update patch description. v11..v12: * Updated a misleading comment in 'papr_pdsm.h' regarding payload size. [ Ira ] v10..v11: * Moved in-lines 'nd_pdsm_cmd_pkg()' and 'pdsm_cmd_to_payload()' from 'papr_pdsm.h' header to 'papr_scm.c'. The avoids a potential license incompatibility issue with non-GPL-2.0 user-space code trying to include the header in its code. [ Ira ] * Verified papr_pdsm.h with UAPI_HEADER_TEST config. * Moved the is_cmd_valid() check in papr_scm_ndctl() before check for cmd_rc == NULL. This prevents cmd_rc to be updated in case the nd-cmd is invalid or unknown. v9..v10: * Simplified 'struct nd_pdsm_cmd_pkg' by removing the 'payload_version' field. * Removed the corrosponding documentation on versioning and backward compatibility from 'papr_pdsm.h' * Reduced the size of reserved fields to 4-bytes making 'struct nd_pdsm_cmd_pkg' 64 + 8 bytes long. * Updated is_cmd_valid() to enforce validation checks on pdsm commands. [ Dan Williams ] * Added check for reserved fields being set to '0' in is_cmd_valid() [ Ira ] * Moved changes for checking cmd_rc == NULL and logging improvements to a separate prelim patch [ Ira ]. * Moved pdsm package validation checks from papr_scm_service_pdsm() to is_cmd_valid(). * Marked papr_scm_service_pdsm() return type as 'void' since errors are reported in nd_pdsm_cmd_pkg.cmd_status field. Resend: * Added ack from Aneesh. v8..v9: * Reduced the usage of term SCM replacing it with appropriate replacement [ Dan Williams, Aneesh ] * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h' * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'. * Minor update to patch description. v7..v8: * Removed the 'payload_offset' field from 'struct nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ] * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg', 'reserved' field of 10-bytes is introduced. [ Aneesh ] * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h [ Ira ] Resend: * None v6..v7 : * Removed the re-definitions of __packed macro from papr_scm_pdsm.h [Mpe]. * R
[PATCH v13 2/6] seq_buf: Export seq_buf_printf
'seq_buf' provides a very useful abstraction for writing to a string buffer without needing to worry about it over-flowing. However even though the API has been stable for couple of years now its still not exported to kernel loadable modules limiting its usage. Hence this patch proposes update to 'seq_buf.c' to mark seq_buf_printf() which is part of the seq_buf API to be exported to kernel loadable GPL modules. This symbol will be used in later parts of this patch-set to simplify content creation for a sysfs attribute. Cc: Piotr Maziarz Cc: Cezary Rojewski Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Borislav Petkov Acked-by: Steven Rostedt (VMware) Signed-off-by: Vaibhav Jain --- Changelog: v12..v13: * None v11..v12: * None v10..v11: * None v9..v10: * None Resend: * Added ack from Steven Rostedt v8..v9: * None v7..v8: * Updated the patch title [ Christoph Hellwig ] * Updated patch description to replace confusing term 'external kernel modules' to 'kernel lodable modules'. Resend: * Added ack from Steven Rostedt v6..v7: * New patch in the series --- lib/seq_buf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/seq_buf.c b/lib/seq_buf.c index 4e865d42ab03..707453f5d58e 100644 --- a/lib/seq_buf.c +++ b/lib/seq_buf.c @@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...) return ret; } +EXPORT_SYMBOL_GPL(seq_buf_printf); #ifdef CONFIG_BINARY_PRINTF /** -- 2.26.2
[PATCH v13 4/6] powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
Since papr_scm_ndctl() can be called from outside papr_scm, its exposed to the possibility of receiving NULL as value of 'cmd_rc' argument. This patch updates papr_scm_ndctl() to protect against such possibility by assigning it pointer to a local variable in case cmd_rc == NULL. Finally the patch also updates the 'default' add a debug log unknown 'cmd' values. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Reviewed-by: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v12..v13: * None v11..v12: * Added ack from Ira v10..v11: * Instead of returning *cmd_rd just return '0' in case nd_cmd is handled. In case of unknown nd-cmd return -EINVAL [ Ira and Dan Williams ] * Updated patch description. v9..v10 * New patch in the series --- arch/powerpc/platforms/pseries/papr_scm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 0c091622b15e..692ad3d79826 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -355,11 +355,16 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, { struct nd_cmd_get_config_size *get_size_hdr; struct papr_scm_priv *p; + int rc; /* Only dimm-specific calls are supported atm */ if (!nvdimm) return -EINVAL; + /* Use a local variable in case cmd_rc pointer is NULL */ + if (!cmd_rc) + cmd_rc = &rc; + p = nvdimm_provider_data(nvdimm); switch (cmd) { @@ -381,6 +386,7 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, break; default: + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd); return -EINVAL; } -- 2.26.2
[PATCH v13 6/6] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH' that returns a newly introduced 'struct nd_papr_pdsm_health' instance containing dimm health information back to user space in response to ND_CMD_CALL. This functionality is implemented in newly introduced papr_pdsm_health() that queries the nvdimm health information and then copies this information to the package payload whose layout is defined by 'struct nd_papr_pdsm_health'. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v12..v13: * Added the 'struct nd_papr_pdsm_health' to 'union nd_pdsm_payload' * Added pdsm descriptor of 'PAPR_PDSM_HEALTH' to __pdsm_cmd_descriptors[]. * Updated papr_pdsm_health() to copy the dimm health information directly into PDSM payload rather then copying it to an intermediate struct and then memcpy it to the PDSM payload area. v11..v12: * Minor: Reodered the initialization of 'struct nd_papr_pdsm_health' fields to match order present in its definition. [ Ira ] * Added ack from Ira v10..v11: * Changed the definition of 'struct nd_papr_pdsm_health' to a maximal struct 184 bytes in size [ Dan Williams ] * Added new field 'extension_flags' to 'struct nd_papr_pdsm_health' [ Dan Williams ] * Updated papr_pdsm_health() to set field 'extension_flags' to 0. * Introduced a define ND_PDSM_PAYLOAD_MAX_SIZE that indicates the maximum size of a payload. * Fixed a suspicious conversion from u64 to u8 in papr_pdsm_health that was preventing correct initialization of 'struct nd_papr_pdsm_health'. [ Ira ] v9..v10: * Removed code in papr_pdsm_health that performed validation on pdsm payload version and corrosponding struct and defines used for validation of payload version. * Dropped usage of struct papr_pdsm_health in 'struct papr_scm_priv'. Instead papr_psdm_health() now uses 'papr_scm_priv.health_bitmap' to populate the pdsm payload. * Above change also fixes the problem where this patch was removing the code that was previously introduced in this patch-series. [ Ira ] * Introduced a new def ND_PDSM_ENVELOPE_HDR_SIZE that indicates the space allocated to 'struct nd_pdsm_cmd_pkg' fields except 'struct nd_cmd_pkg'. This def is useful in validating payload sizes. * Reworked papr_pdsm_health() to enforce a specific payload size for 'PAPR_PDSM_HEALTH' pdsm request. Resend: * Added ack from Aneesh. v8..v9: * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ] * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g * Renamed papr_scm_get_health() to papr_psdm_health() * Updated patch description to replace papr-scm dimm with nvdimm. v7..v8: * None Resend: * None v6..v7: * Updated flags_show() to use seq_buf_printf(). [Mpe] * Updated papr_scm_get_health() to use newly introduced __drc_pmem_query_health() bypassing the cache [Mpe]. v5..v6: * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to gaurd against possibility of different compilers adding different paddings to the struct [ Dan Williams ] * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of 'bool' and also updated drc_pmem_query_health() to take this into account. [ Dan Williams ] v4..v5: * None v3..v4: * Call the DSM_PAPR_SCM_HEALTH service function from papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh] v2..v3: * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types as its exported to the userspace [Aneesh] * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health from enum to #defines [Aneesh] v1..v2: * New patch in the series --- arch/powerpc/include/uapi/asm/papr_pdsm.h | 37 arch/powerpc/platforms/pseries/papr_scm.c | 51 +++ 2 files changed, 88 insertions(+) diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h index 28115152aa4e..9ccecc1d6840 100644 --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h @@ -66,17 +66,54 @@ #define ND_PDSM_HDR_SIZE \ (sizeof(struct nd_pkg_pdsm) - ND_PDSM_PAYLOAD_MAX_SIZE) +/* Various nvdimm health indicators */ +#define PAPR_PDSM_DIMM_HEALTHY 0 +#define PAPR_PDSM_DIMM_UNHEALTHY 1 +#define PAPR_PDSM_DIMM_CRITICAL 2 +#define PAPR_PDSM_DIMM_FATAL 3 + +/* + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH + * Various flags indicate the health status of the dimm. + * + * extension_flags : Any extension fields present in the struct. + * dimm_unarmed: Dimm not armed. So contents wont persist. + * dimm_bad_shutdown : Previous shutdown did not persist contents. + * dimm_bad_restore: Contents from previous shutdown werent restored. + * dimm_scrubbed : Contents of the dimm have been scrubbed. + * dimm_locked : Contents of the dimm cant be modified until CEC reboot + * dimm_encrypted : Contents of dimm are encrypted. + * dimm_health
[PATCH v13 3/6] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Implement support for fetching nvdimm health information via H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair of 64-bit bitmap, bitwise-and of which is then stored in 'struct papr_scm_priv' and subsequently partially exposed to user-space via newly introduced dimm specific attribute 'papr/flags'. Since the hcall is costly, the health information is cached and only re-queried, 60s after the previous successful hcall. The patch also adds a documentation text describing flags reported by the the new sysfs attribute 'papr/flags' is also introduced at Documentation/ABI/testing/sysfs-bus-papr-pmem. [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for PAPR hcalls") Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v12..v13: * None v11..v12: * None v10..v11: * None v9..v10: * Removed an avoidable 'goto' in __drc_pmem_query_health. [ Ira ]. Resend: * Added ack from Aneesh. v8..v9: * Rename some variables and defines to reduce usage of term SCM replacing it with PMEM [Dan Williams, Aneesh] * s/PAPR_SCM_DIMM/PAPR_PMEM/g * s/papr_scm_nd_attributes/papr_nd_attributes/g * s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g * s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g * Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem v7..v8: * Update type of variable 'rc' in __drc_pmem_query_health() and drc_pmem_query_health() to long and int respectively. [ Ira ] * Updated the patch description to s/64 bit Big Endian Number/64-bit bitmap/ [ Ira, Aneesh ]. Resend: * None v6..v7 : * Used the exported buf_seq_printf() function to generate content for 'papr/flags' * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c and removed the papr_scm.h file [Mpe] * Some minor consistency issued in sysfs-bus-papr-scm documentation. [Mpe] * s/dimm_mutex/health_mutex/g [Mpe] * Split drc_pmem_query_health() into two function one of which takes care of caching and locking. [Mpe] * Fixed a local copy creation of dimm health information using READ_ONCE(). [Mpe] v5..v6 : * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags' [Dan Williams] * Include documentation for 'papr/flags' attr [Dan Williams] * Change flag 'save_fail' to 'flush_fail' [Dan Williams] * Caching of health bitmap to reduce expensive hcalls [Dan Williams] * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe] * Replaced two __be64 integers from papr_scm_priv to a single u64 integer [Mpe] * Updated patch description to reflect the changes made in this version. * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from flags_show() [Dan Williams] v4..v5 : * None v3..v4 : * None v2..v3 : * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for NVDIMM unarmed [Aneesh] v1..v2 : * New patch in the series. --- Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++ arch/powerpc/platforms/pseries/papr_scm.c | 168 +- 2 files changed, 193 insertions(+), 2 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem new file mode 100644 index ..5b10d036a8d4 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem @@ -0,0 +1,27 @@ +What: /sys/bus/nd/devices/nmemX/papr/flags +Date: Apr, 2020 +KernelVersion: v5.8 +Contact: linuxppc-dev , linux-nvd...@lists.01.org, +Description: + (RO) Report flags indicating various states of a + papr-pmem NVDIMM device. Each flag maps to a one or + more bits set in the dimm-health-bitmap retrieved in + response to H_SCM_HEALTH hcall. The details of the bit + flags returned in response to this hcall is available + at 'Documentation/powerpc/papr_hcalls.rst' . Below are + the flags reported in this sysfs file: + + * "not_armed" : Indicates that NVDIMM contents will not + survive a power cycle. + * "flush_fail" : Indicates that NVDIMM contents + couldn't be flushed during last + shut-down event. + * "restore_fail": Indicates that NVDIMM contents + couldn't be restored during NVDIMM + initialization. + * "encrypted" : NVDIMM contents are encrypted. + * "smart_notify": There is health event for the NVDIMM. + * "scrubbed": Indicating that contents of the + NVDIMM have been scrubbed. + * "locked" : Indicating that NVDIMM contents cant + be modified until next power cycle. diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/po
[PATCH v13 1/6] powerpc: Document details on H_SCM_HEALTH hcall
Add documentation to 'papr_hcalls.rst' describing the bitmap flags that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM specification. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Acked-by: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v12..v13: * None v11..v12: * None v10..v11: * None v9..v10: * Added ack from Ira. Resend: * None v8..v9: * s/SCM/PMEM device. [ Dan Williams, Aneesh ] v7..v8: * Added a clarification on bit-ordering of Health Bitmap Resend: * None v6..v7: * None v5..v6: * New patch in the series --- Documentation/powerpc/papr_hcalls.rst | 46 --- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst index 3493631a60f8..48fcf1255a33 100644 --- a/Documentation/powerpc/papr_hcalls.rst +++ b/Documentation/powerpc/papr_hcalls.rst @@ -220,13 +220,51 @@ from the LPAR memory. **H_SCM_HEALTH** | Input: drcIndex -| Out: *health-bitmap, health-bit-valid-bitmap* +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)* | Return Value: *H_Success, H_Parameter, H_Hardware* Given a DRC Index return the info on predictive failure and overall health of -the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are -valid. +the PMEM device. The asserted bits in the health-bitmap indicate one or more states +(described in table below) of the PMEM device and health-bit-valid-bitmap indicate +which bits in health-bitmap are valid. The bits are reported in +reverse bit ordering for example a value of 0xC400 +indicates bits 0, 1, and 5 are valid. + +Health Bitmap Flags: + ++--+---+ +| Bit | Definition | ++==+===+ +| 00 | PMEM device is unable to persist memory contents. | +| | If the system is powered down, nothing will be saved. | ++--+---+ +| 01 | PMEM device failed to persist memory contents. Either contents were | +| | not saved successfully on power down or were not restored properly on | +| | power up. | ++--+---+ +| 02 | PMEM device contents are persisted from previous IPL. The data from | +| | the last boot were successfully restored. | ++--+---+ +| 03 | PMEM device contents are not persisted from previous IPL. There was no| +| | data to restore from the last boot. | ++--+---+ +| 04 | PMEM device memory life remaining is critically low | ++--+---+ +| 05 | PMEM device will be garded off next IPL due to failure | ++--+---+ +| 06 | PMEM device contents cannot persist due to current platform health | +| | status. A hardware failure may prevent data from being saved or | +| | restored. | ++--+---+ +| 07 | PMEM device is unable to persist memory contents in certain conditions| ++--+---+ +| 08 | PMEM device is encrypted | ++--+---+ +| 09 | PMEM device has successfully completed a requested erase or secure | +| | erase procedure. | ++--+---+ +|10:63 | Reserved / Unused | ++--+---+ **H_SCM_PERFORMANCE_STATS** -- 2.26.2
[PATCH v13 0/6] powerpc/papr_scm: Add support for reporting nvdimm health
Changes since v12 [1]: * Fixed the clang warning regarding variable length object' being not at the end of the 'struct nd_pdsm_cmd_pkg' by introducing a new layout. * Removed instance of 'struct nd_cmd_pkg hdr' from 'struct nd_pdsm_cmd_pkg' and renamed the struct to 'struct nd_pkg_pdsm' to match the libndctl naming convention. * Introduced 'union nd_pdsm_payload' thats a maximal union of all possible payload structs and use it instead of having flexible 'payload' member 'struct nd_pdsm_cmd_pkg'. * Introduce pdsm descriptor 'struct pdsm_cmd_desc' and its array __pdsm_cmd_descriptors[] that holds a payload 'size_[in|out]' and service function for each pdsm. This is analogues to '__nd_cmd_dimm_descs[]' * Introduce function 'pdsm_cmd_desc()' to fetch the corresponding pdsm descriptor for each valid pdsm. * Updated papr_scm_service_pdsm() to use 'pdsm_cmd_desc()' and apply checks on 'nd_cmd_pkg' payload based on psdm descriptor members and finally service the pdsm using the 'service' member of the descriptor. * Updated Patch-5 that to use the updated 'struct nd_pkg_pdsm' definition. [1] https://lore.kernel.org/linux-nvdimm/20200608211026.67573-1-vaib...@linux.ibm.com --- The PAPR standard[2][4] provides mechanisms to query the health and performance stats of an NVDIMM via various hcalls as described in Ref[3]. Until now these stats were never available nor exposed to the user-space tools like 'ndctl'. This is partly due to PAPR platform not having support for ACPI and NFIT. Hence 'ndctl' is unable to query and report the dimm health status and a user had no way to determine the current health status of a NDVIMM. To overcome this limitation, this patch-set updates papr_scm kernel module to query and fetch NVDIMM health stats using hcalls described in Ref[3]. This health and performance stats are then exposed to userspace via sysfs and PAPR-NVDIMM-Specific-Methods(PDSM) issued by libndctl. These changes coupled with proposed ndtcl changes located at Ref[5] should provide a way for the user to retrieve NVDIMM health status using ndtcl. Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM in a emulation environment: # ndctl list -DH [ { "dev":"nmem0", "health":{ "health_state":"fatal", "shutdown_state":"dirty" } } ] Dimm health report output on a pseries guest lpar with vPMEM or HMS based NVDIMMs that are in perfectly healthy conditions: # ndctl list -d nmem0 -H [ { "dev":"nmem0", "health":{ "health_state":"ok", "shutdown_state":"clean" } } ] PAPR NVDIMM-Specific-Methods(PDSM) == PDSM requests are issued by vendor specific code in libndctl to execute certain operations or fetch information from NVDIMMS. PDSMs requests can be sent to papr_scm module via libndctl(userspace) and libnvdimm (kernel) using the ND_CMD_CALL ioctl command which can be handled in the dimm control function papr_scm_ndctl(). Current patchset proposes a single PDSM to retrieve NVDIMM health, defined in the newly introduced uapi header named 'papr_pdsm.h'. Support for more PDSMs will be added in future. Structure of the patch-set == The patch-set starts with a doc patch documenting details of hcall H_SCM_HEALTH. Second patch exports kernel symbol seq_buf_printf() thats used in subsequent patches to generate sysfs attribute content. Third patch implements support for fetching NVDIMM health information from PHYP and partially exposing it to user-space via a NVDIMM sysfs flag. Fourth patch updates papr_scm_ndctl() to handle a possible error case and also improve debug logging. Fifth patch deals with implementing support for servicing PDSM commands in papr_scm module. Finally the last patch implements support for servicing PDSM 'PAPR_PDSM_HEALTH' that returns the NVDIMM health information to libndctl. References: [2] "Power Architecture Platform Reference" https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference [3] commit 58b278f568f0 ("powerpc: Provide initial documentation for PAPR hcalls") [4] "Linux on Power Architecture Platform Reference" https://members.openpowerfoundation.org/document/dl/469 [5] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v13 --- Vaibhav Jain (6): powerpc: Document details on H_SCM_HEALTH hcall seq_buf: Export seq_buf_printf powerpc/papr_scm: Fetch nvdimm health information from PHYP powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl() ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++ Documentation/powerpc/papr_hcalls.rst | 46 +- arch/powerpc/include/uapi/asm/papr_pdsm.h | 132 ++ arch/powerpc/platforms/pseries/papr_scm.c | 420 +- include/uapi/linux/ndctl.h| 1 + lib/seq_buf.c
Re: [PATCH v2] SUNRPC: Add missing definition of ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
> On Jun 15, 2020, at 2:25 AM, Christophe Leroy > wrote: > > Even if that's only a warning, not including asm/cacheflush.h > leads to svc_flush_bvec() being empty allthough powerpc defines > ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE. > > CC net/sunrpc/svcsock.o > net/sunrpc/svcsock.c:227:5: warning: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is > not defined [-Wundef] > #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE > ^ > > Include linux/highmem.h so that asm/cacheflush.h will be included. > > Reported-by: Christophe Leroy > Reported-by: kernel test robot > Cc: Chuck Lever > Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()") > Signed-off-by: Christophe Leroy LGTM. Acked-by: Chuck Lever > --- > v2: Use linux/highmem.h instead of asm/cacheflush.sh > Signed-off-by: Christophe Leroy > --- > net/sunrpc/svcsock.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c > index 5c4ec9386f81..c537272f9c7e 100644 > --- a/net/sunrpc/svcsock.c > +++ b/net/sunrpc/svcsock.c > @@ -44,6 +44,7 @@ > #include > #include > #include > +#include > #include > > #include > -- > 2.25.0 > -- Chuck Lever
[PATCH v12 6/6] powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH' that returns a newly introduced 'struct nd_papr_pdsm_health' instance containing dimm health information back to user space in response to ND_CMD_CALL. This functionality is implemented in newly introduced papr_pdsm_health() that queries the nvdimm health information and then copies this information to the package payload whose layout is defined by 'struct nd_papr_pdsm_health'. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Reviewed-by: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v11..v12: * Minor: Reodered the initialization of 'struct nd_papr_pdsm_health' fields to match order present in its definition. [ Ira ] * Added ack from Ira v10..v11: * Changed the definition of 'struct nd_papr_pdsm_health' to a maximal struct 184 bytes in size [ Dan Williams ] * Added new field 'extension_flags' to 'struct nd_papr_pdsm_health' [ Dan Williams ] * Updated papr_pdsm_health() to set field 'extension_flags' to 0. * Introduced a define ND_PDSM_PAYLOAD_MAX_SIZE that indicates the maximum size of a payload. * Fixed a suspicious conversion from u64 to u8 in papr_pdsm_health that was preventing correct initialization of 'struct nd_papr_pdsm_health'. [ Ira ] v9..v10: * Removed code in papr_pdsm_health that performed validation on pdsm payload version and corrosponding struct and defines used for validation of payload version. * Dropped usage of struct papr_pdsm_health in 'struct papr_scm_priv'. Instead papr_psdm_health() now uses 'papr_scm_priv.health_bitmap' to populate the pdsm payload. * Above change also fixes the problem where this patch was removing the code that was previously introduced in this patch-series. [ Ira ] * Introduced a new def ND_PDSM_ENVELOPE_HDR_SIZE that indicates the space allocated to 'struct nd_pdsm_cmd_pkg' fields except 'struct nd_cmd_pkg'. This def is useful in validating payload sizes. * Reworked papr_pdsm_health() to enforce a specific payload size for 'PAPR_PDSM_HEALTH' pdsm request. Resend: * Added ack from Aneesh. v8..v9: * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ] * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g * Renamed papr_scm_get_health() to papr_psdm_health() * Updated patch description to replace papr-scm dimm with nvdimm. v7..v8: * None Resend: * None v6..v7: * Updated flags_show() to use seq_buf_printf(). [Mpe] * Updated papr_scm_get_health() to use newly introduced __drc_pmem_query_health() bypassing the cache [Mpe]. v5..v6: * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to gaurd against possibility of different compilers adding different paddings to the struct [ Dan Williams ] * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of 'bool' and also updated drc_pmem_query_health() to take this into account. [ Dan Williams ] v4..v5: * None v3..v4: * Call the DSM_PAPR_SCM_HEALTH service function from papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh] v2..v3: * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types as its exported to the userspace [Aneesh] * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health from enum to #defines [Aneesh] v1..v2: * New patch in the series --- arch/powerpc/include/uapi/asm/papr_pdsm.h | 43 ++ arch/powerpc/platforms/pseries/papr_scm.c | 71 +++ 2 files changed, 114 insertions(+) diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h index 34d1a41d2406..d453baea13c4 100644 --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h @@ -70,13 +70,56 @@ struct nd_pdsm_cmd_pkg { __u8 payload[]; /* In/Out: Sub-cmd data buffer */ } __packed; +/* Calculate size used by the pdsm header fields minus 'struct nd_cmd_pkg' */ +#define ND_PDSM_HDR_SIZE \ + (sizeof(struct nd_pdsm_cmd_pkg) - sizeof(struct nd_cmd_pkg)) + +/* Max payload size that we can handle */ +#define ND_PDSM_PAYLOAD_MAX_SIZE 184 + /* * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct */ enum papr_pdsm { PAPR_PDSM_MIN = 0x0, + PAPR_PDSM_HEALTH, PAPR_PDSM_MAX, }; +/* Various nvdimm health indicators */ +#define PAPR_PDSM_DIMM_HEALTHY 0 +#define PAPR_PDSM_DIMM_UNHEALTHY 1 +#define PAPR_PDSM_DIMM_CRITICAL 2 +#define PAPR_PDSM_DIMM_FATAL 3 + +/* + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH + * Various flags indicate the health status of the dimm. + * + * extension_flags : Any extension fields present in the struct. + * dimm_unarmed: Dimm not armed. So contents wont persist. + * dimm_bad_shutdown : Previous shutdown did not persist contents. + * dimm_bad_restore: Contents from previous shutdown werent restored. + * dimm_scrubbe
[PATCH v12 5/6] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm module and add the command family NVDIMM_FAMILY_PAPR to the white list of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the nvdimm command mask and implement necessary scaffolding in the module to handle ND_CMD_CALL ioctl and PDSM requests that we receive. The layout of the PDSM request as we expect from libnvdimm/libndctl is described in newly introduced uapi header 'papr_pdsm.h' which defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used to communicate the PDSM request via member 'nd_cmd_pkg.nd_command' and size of payload that need to be sent/received for servicing the PDSM. A new function is_cmd_valid() is implemented that reads the args to papr_scm_ndctl() and performs sanity tests on them. A new function papr_scm_service_pdsm() is introduced and is called from papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL command from libnvdimm. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v11..v12: * Updated a misleading comment in 'papr_pdsm.h' regarding payload size. [ Ira ] v10..v11: * Moved in-lines 'nd_pdsm_cmd_pkg()' and 'pdsm_cmd_to_payload()' from 'papr_pdsm.h' header to 'papr_scm.c'. The avoids a potential license incompatibility issue with non-GPL-2.0 user-space code trying to include the header in its code. [ Ira ] * Verified papr_pdsm.h with UAPI_HEADER_TEST config. * Moved the is_cmd_valid() check in papr_scm_ndctl() before check for cmd_rc == NULL. This prevents cmd_rc to be updated in case the nd-cmd is invalid or unknown. v9..v10: * Simplified 'struct nd_pdsm_cmd_pkg' by removing the 'payload_version' field. * Removed the corrosponding documentation on versioning and backward compatibility from 'papr_pdsm.h' * Reduced the size of reserved fields to 4-bytes making 'struct nd_pdsm_cmd_pkg' 64 + 8 bytes long. * Updated is_cmd_valid() to enforce validation checks on pdsm commands. [ Dan Williams ] * Added check for reserved fields being set to '0' in is_cmd_valid() [ Ira ] * Moved changes for checking cmd_rc == NULL and logging improvements to a separate prelim patch [ Ira ]. * Moved pdsm package validation checks from papr_scm_service_pdsm() to is_cmd_valid(). * Marked papr_scm_service_pdsm() return type as 'void' since errors are reported in nd_pdsm_cmd_pkg.cmd_status field. Resend: * Added ack from Aneesh. v8..v9: * Reduced the usage of term SCM replacing it with appropriate replacement [ Dan Williams, Aneesh ] * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h' * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'. * Minor update to patch description. v7..v8: * Removed the 'payload_offset' field from 'struct nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ] * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg', 'reserved' field of 10-bytes is introduced. [ Aneesh ] * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h [ Ira ] Resend: * None v6..v7 : * Removed the re-definitions of __packed macro from papr_scm_pdsm.h [Mpe]. * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe]. * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h [Mpe]. * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe] v5..v6 : * Changed the usage of the term DSM to PDSM to distinguish it from the ACPI term [ Dan Williams ] * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct to reflect the new terminology. * Updated the patch description and title to reflect the new terminology. * Squashed patch to introduce new command family in 'ndctl.h' with this patch [ Dan Williams ] * Updated the papr_scm_pdsm method starting index from 0x1 to 0x0 [ Dan Williams ] * Removed redundant license text from the papr_scm_psdm.h file. [ Dan Williams ] * s/envelop/envelope/ at various places [ Dan Williams ] * Added '__packed' attribute to command package header to gaurd against different compiler adding paddings between the fields. [ Dan Williams] * Converted various pr_debug to dev_debug [ Dan Williams ] v4..v5 : * None v3..v4 : * None v2..v3 : * Updated the patch prefix to 'ndctl/uapi' [Aneesh] v1..v2 : * None --- arch/powerpc/include/uapi/asm/papr_pdsm.h | 82 ++ arch/powerpc/platforms/pseries/papr_scm.c | 126 +- include/uapi/linux/ndctl.h| 1 + 3 files changed, 205 insertions(+), 4 deletions(-) create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h new file mode 100644 index ..34d1a41d2406 --- /dev/null +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h @@ -0,0 +1,82 @@ +/*
[PATCH v12 4/6] powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
Since papr_scm_ndctl() can be called from outside papr_scm, its exposed to the possibility of receiving NULL as value of 'cmd_rc' argument. This patch updates papr_scm_ndctl() to protect against such possibility by assigning it pointer to a local variable in case cmd_rc == NULL. Finally the patch also updates the 'default' add a debug log unknown 'cmd' values. Cc: "Aneesh Kumar K . V" Cc: Dan Williams Cc: Michael Ellerman Cc: Ira Weiny Reviewed-by: Ira Weiny Signed-off-by: Vaibhav Jain --- Changelog: v11..v12: * Added ack from Ira v10..v11: * Instead of returning *cmd_rd just return '0' in case nd_cmd is handled. In case of unknown nd-cmd return -EINVAL [ Ira and Dan Williams ] * Updated patch description. v9..v10 * New patch in the series --- arch/powerpc/platforms/pseries/papr_scm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 0c091622b15e..692ad3d79826 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -355,11 +355,16 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, { struct nd_cmd_get_config_size *get_size_hdr; struct papr_scm_priv *p; + int rc; /* Only dimm-specific calls are supported atm */ if (!nvdimm) return -EINVAL; + /* Use a local variable in case cmd_rc pointer is NULL */ + if (!cmd_rc) + cmd_rc = &rc; + p = nvdimm_provider_data(nvdimm); switch (cmd) { @@ -381,6 +386,7 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc, break; default: + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd); return -EINVAL; } -- 2.26.2