date:20200924

[rcu:dev.2020.09.22a] BUILD SUCCESS 1d39c91bee0e9fcd601235404d60e704fd020ff6

2020-09-24 Thread kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2020.09.22a
branch HEAD: 1d39c91bee0e9fcd601235404d60e704fd020ff6  torture: Allow alternative forms of kvm.sh command-line arguments

elapsed time: 768m

configs tested: 135
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm         defconfig
arm         allyesconfig
arm         allmodconfig
arm64       allyesconfig
arm64       defconfig
sh          rsk7203_defconfig
nios2       10m50_defconfig
powerpc     mpc834x_itxgp_defconfig
m68k        mvme16x_defconfig
powerpc     holly_defconfig
mips        maltasmvp_eva_defconfig
m68k        apollo_defconfig
mips        bmips_stb_defconfig
arc         nsimosci_hs_defconfig
powerpc     tqm8xx_defconfig
arm         bcm2835_defconfig
arm         stm32_defconfig
arm         collie_defconfig
mips        cobalt_defconfig
arm         clps711x_defconfig
powerpc     ppc64e_defconfig
xtensa      audio_kc705_defconfig
h8300       alldefconfig
arm         trizeps4_defconfig
powerpc     mpc85xx_cds_defconfig
mips        rt305x_defconfig
arm         mainstone_defconfig
powerpc64   defconfig
xtensa      defconfig
m68k        stmark2_defconfig
powerpc     motionpro_defconfig
sh          kfr2r09_defconfig
powerpc     linkstation_defconfig
powerpc     sbc8548_defconfig
sh          shx3_defconfig
arm         efm32_defconfig
powerpc     akebono_defconfig
sh          apsh4a3a_defconfig
sh          se7712_defconfig
sh          sh7757lcr_defconfig
powerpc     adder875_defconfig
arc         axs103_smp_defconfig
m68k        amiga_defconfig
powerpc     mpc7448_hpc2_defconfig
powerpc     tqm8540_defconfig
sh          rts7751r2d1_defconfig
riscv       nommu_k210_defconfig
riscv       allyesconfig
mips        bcm47xx_defconfig
mips        tb0226_defconfig
mips        workpad_defconfig
powerpc     ep8248e_defconfig
mips        capcella_defconfig
riscv       rv32_defconfig
powerpc     ppa8548_defconfig
x86_64      defconfig
ia64        allmodconfig
ia64        defconfig
ia64        allyesconfig
m68k        allmodconfig
m68k        defconfig
m68k        allyesconfig
nios2       defconfig
arc         allyesconfig
nds32       allnoconfig
c6x         allyesconfig
nds32       defconfig
nios2       allyesconfig
csky        defconfig
alpha       defconfig
alpha       allyesconfig
xtensa      allyesconfig
h8300       allyesconfig
arc         defconfig
sh          allmodconfig
parisc      defconfig
s390        allyesconfig
parisc      allyesconfig
s390        defconfig
i386        allyesconfig
sparc       allyesconfig
sparc       defconfig
i386        defconfig
mips        allyesconfig
mips        allmodconfig
powerpc     allyesconfig
powerpc     allmodconfig
powerpc     allnoconfig
i386        randconfig-a002-20200924
i386        randconfig-a006-20200924
i386        randconfig-a003-20200924
i386        randconfig-a004-20200924
i386        randconfig-a005-20200924
i386        randconfig-a001-20200924
i386        randconfig-a002-20200923
i386        randconfig-a006-20200923
i386        randconfig-a003-20200923
i386        randconfig-a004-20200923
i386        randconfig-a005-20200923
i386        randconfig-a001

Re: [PATCH] tpm: of: avoid __va() translation for event log address

2020-09-24 Thread Jarkko Sakkinen

On Tue, Sep 22, 2020 at 11:41:28AM +0200, Ard Biesheuvel wrote:
> The TPM event log is provided to the OS by the firmware, by loading
> it into an area in memory and passing the physical address via a node
> in the device tree.
> 
> Currently, we use __va() to access the memory via the kernel's linear
> map: however, it is not guaranteed that the linear map covers this
> particular address, as we may be running under HIGHMEM on a 32-bit
> architecture, or running firmware that uses a memory type for the
> event log that is omitted from the linear map (such as EfiReserved).

Makes perfect sense to the level that I wonder if this should have a
fixes tag and/or needs to be backported to the stable kernels?

> So instead, use memremap(), which will reuse the linear mapping if
> it is valid, or create another mapping otherwise.
> 
> Cc: Peter Huewe 
> Cc: Jarkko Sakkinen 
> Cc: Jason Gunthorpe 
> Signed-off-by: Ard Biesheuvel 
> ---
>  drivers/char/tpm/eventlog/of.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/char/tpm/eventlog/of.c b/drivers/char/tpm/eventlog/of.c
> index a9ce66d09a75..9178547589a3 100644
> --- a/drivers/char/tpm/eventlog/of.c
> +++ b/drivers/char/tpm/eventlog/of.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -25,6 +26,7 @@ int tpm_read_log_of(struct tpm_chip *chip)
>   struct tpm_bios_log *log;
>   u32 size;
>   u64 base;
> + void *p;

I'd just use 'ptr' for readability's sake.

>   log = &chip->log;
>   if (chip->dev.parent && chip->dev.parent->of_node)
> @@ -65,7 +67,11 @@ int tpm_read_log_of(struct tpm_chip *chip)
>   return -EIO;
>   }
>  
> - log->bios_event_log = kmemdup(__va(base), size, GFP_KERNEL);
> + p = memremap(base, size, MEMREMAP_WB);
> + if (!p)
> + return -ENOMEM;
> + log->bios_event_log = kmemdup(p, size, GFP_KERNEL);
> + memunmap(p);
>   if (!log->bios_event_log)
>   return -ENOMEM;
>  
> -- 
> 2.17.1
> 

This is a really great catch!

I'm a bit late with my PR because of the SGX upstreaming madness
(sending v39 soon). If you can answer my question above, I can make
that nitpick change to the patch and get it into my v5.10 PR.

PS. Just so that you know, once I've applied it, it will be available
here:

git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git

I'll include MAINTAINERS update to that PR.

/Jarkko

Re: [PATCH v1 0/6] seccomp: Implement constant action bitmaps

2020-09-24 Thread Rasmus Villemoes

On 24/09/2020 15.58, YiFei Zhu wrote:
> On Thu, Sep 24, 2020 at 8:46 AM Rasmus Villemoes
>  wrote:
>> But one thing I'm wondering about and I haven't seen addressed anywhere:
>> Why build the bitmap on the kernel side (with all the complexity of
>> having to emulate the filter for all syscalls)? Why can't userspace just
>> hand the kernel "here's a new filter: the syscalls in this bitmap are
>> always allowed noquestionsasked, for the rest, run this bpf". Sure, that
>> might require a new syscall or extending seccomp(2) somewhat, but isn't
>> that a _lot_ simpler? It would probably also mean that the bpf we do get
>> handed is a lot smaller. Userspace might need to pass a couple of
>> bitmaps, one for each relevant arch, but you get the overall idea.
> 
> Perhaps. The thing is, the current API expects any attached filter to
> be "additive". If a new filter gets attached that says "disallow read",
> then no matter what has been attached already, "read" shall not be
> allowed at the next syscall, bypassing all previous allowlist bitmaps
> (so you need to emulate the bpf anyway here?). We should also not
> have an API that could let anyone escape the seccomp jail. Say "prctl"
> is permitted but "read" is not; one must not be allowed to
> attach a bitmap so that "read" now appears in the allowlist. The only
> way this could potentially work is to attach a BPF filter and a bitmap
> at the same time in the same syscall, which might mean an API redesign?

Yes, the man page would read something like

   SECCOMP_SET_MODE_FILTER_BITMAP
          The system calls allowed are defined by a pointer to a
          Berkeley Packet Filter (BPF) passed via args.  This
          argument is a pointer to a struct sock_fprog_bitmap;

with that struct containing whatever information/extra pointers needed
for passing the bitmap(s) in addition to the bpf prog.

And SECCOMP_SET_MODE_FILTER would internally just be updated to work
as if all-zero allow-bitmaps were passed along. The internal kernel
bitmap would just be the AND of the bitmaps in the filter stack.

Sure, it's UAPI, so it would certainly need more careful thought on the
details of just what the arg struct looks like etc. etc., but I was
wondering why it hadn't been discussed at all.
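A minimal userspace sketch of the semantics proposed above, assuming a hypothetical per-filter allow-bitmap (struct filter_bitmap and every helper name here are made up; none of this is seccomp UAPI). The effective bitmap is the AND over the filter stack, so a filter attached later can only remove syscalls from the "skip BPF" fast path, never add them back, which is the escape YiFei says must be impossible.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: number of syscall numbers the bitmap covers. */
#define SKETCH_NR_SYSCALLS 512
#define SKETCH_WORDS (SKETCH_NR_SYSCALLS / 64)

/* Hypothetical per-filter allow-bitmap: a set bit means "always allowed". */
struct filter_bitmap {
	uint64_t allow[SKETCH_WORDS];
};

/* Before any filter is attached, start from all ones (the AND identity). */
static void effective_init(struct filter_bitmap *eff)
{
	memset(eff->allow, 0xff, sizeof(eff->allow));
}

/* Attaching a filter ANDs its bitmap in; a plain SECCOMP_SET_MODE_FILTER
 * would behave as if it had passed an all-zero bitmap. */
static void effective_attach(struct filter_bitmap *eff,
			     const struct filter_bitmap *f)
{
	for (int i = 0; i < SKETCH_WORDS; i++)
		eff->allow[i] &= f->allow[i];
}

/* True if syscall nr may skip running the BPF programs entirely. */
static bool effective_allows(const struct filter_bitmap *eff, unsigned int nr)
{
	if (nr >= SKETCH_NR_SYSCALLS)
		return false;
	return (eff->allow[nr / 64] >> (nr % 64)) & 1;
}

static void bitmap_set(struct filter_bitmap *f, unsigned int nr)
{
	f->allow[nr / 64] |= UINT64_C(1) << (nr % 64);
}
```

Every syscall whose bit is clear still runs the whole BPF stack, so the bitmap is purely an optimization layered on the existing additive semantics.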

>> I'm also a bit worried about the performance of doing that emulation;
>> that's constant extra overhead for, say, launching a docker container.
> 
> IMO, launching a docker container is so expensive this should be negligible.

Regardless, I'd like to see some numbers, certainly for the "how much
faster does a getpid() or read() or any of the other syscalls that
nobody disallows" get, but also "what's the cost of doing that emulation
at seccomp(2) time".
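For the first number asked for above, a crude userspace microbenchmark could time a cheap syscall with and without the filter under test installed. This is only a sketch: it assumes that averaging clock_gettime(CLOCK_MONOTONIC) over many raw syscall(SYS_getpid) calls is good enough, and installing the filter itself is left out; a real measurement would also want CPU pinning, warm-up and repeated runs.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock cost, in nanoseconds, of one raw getpid() syscall.
 * syscall(SYS_getpid) always enters the kernel, so any seccomp filter
 * attached to the thread runs on every iteration. Returns -1.0 if the
 * clock cannot be read. */
static double avg_getpid_ns(long iters)
{
	struct timespec start, end;
	int64_t ns;

	if (iters <= 0)
		return 0.0;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;
	for (long i = 0; i < iters; i++)
		syscall(SYS_getpid);
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;

	ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	     (int64_t)(end.tv_nsec - start.tv_nsec);
	return (double)ns / (double)iters;
}
```

Running it once in a plain process and once after installing the filter gives the per-syscall difference; the cost of the emulation at seccomp(2) time would need its own timer around the seccomp() call.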

Rasmus

Re: [PATCH] tpm: of: avoid __va() translation for event log address

2020-09-24 Thread Jarkko Sakkinen

On Fri, Sep 25, 2020 at 08:56:30AM +0300, Jarkko Sakkinen wrote:
> On Tue, Sep 22, 2020 at 11:41:28AM +0200, Ard Biesheuvel wrote:
> > The TPM event log is provided to the OS by the firmware, by loading
> > it into an area in memory and passing the physical address via a node
> > in the device tree.
> > 
> > Currently, we use __va() to access the memory via the kernel's linear
> > map: however, it is not guaranteed that the linear map covers this
> > particular address, as we may be running under HIGHMEM on a 32-bit
> > architecture, or running firmware that uses a memory type for the
> > event log that is omitted from the linear map (such as EfiReserved).
> 
> Makes perfect sense to the level that I wonder if this should have a
> fixes tag and/or needs to be backported to the stable kernels?
> 
> > So instead, use memremap(), which will reuse the linear mapping if
> > it is valid, or create another mapping otherwise.
> > 
> > Cc: Peter Huewe 
> > Cc: Jarkko Sakkinen 
> > Cc: Jason Gunthorpe 
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  drivers/char/tpm/eventlog/of.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/char/tpm/eventlog/of.c b/drivers/char/tpm/eventlog/of.c
> > index a9ce66d09a75..9178547589a3 100644
> > --- a/drivers/char/tpm/eventlog/of.c
> > +++ b/drivers/char/tpm/eventlog/of.c
> > @@ -11,6 +11,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -25,6 +26,7 @@ int tpm_read_log_of(struct tpm_chip *chip)
> > struct tpm_bios_log *log;
> > u32 size;
> > u64 base;
> > +   void *p;
> 
> I'd just use 'ptr' for readability's sake.
> 
> > log = &chip->log;
> > if (chip->dev.parent && chip->dev.parent->of_node)
> > @@ -65,7 +67,11 @@ int tpm_read_log_of(struct tpm_chip *chip)
> > return -EIO;
> > }
> >  
> > -   log->bios_event_log = kmemdup(__va(base), size, GFP_KERNEL);
> > +   p = memremap(base, size, MEMREMAP_WB);
> > +   if (!p)
> > +   return -ENOMEM;
> > +   log->bios_event_log = kmemdup(p, size, GFP_KERNEL);
> > +   memunmap(p);
> > if (!log->bios_event_log)
> > return -ENOMEM;
> >  
> > -- 
> > 2.17.1
> > 
> 
> This is a really great catch!
> 
> I'm a bit late with my PR because of the SGX upstreaming madness
> (sending v39 soon). If you can answer my question above, I can make
> that nitpick change to the patch and get it into my v5.10 PR.
> 
> PS. Just so that you know, once I've applied it, it will be available
> here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git
> 
> I'll include MAINTAINERS update to that PR.

Forgot this:

Reviewed-by: Jarkko Sakkinen 

/Jarkko

Re: [PATCH] rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config

2020-09-24 Thread Neeraj Upadhyay


Hi Paul,

On 9/25/2020 4:29 AM, Paul E. McKenney wrote:

On Thu, Sep 24, 2020 at 12:04:10PM +0530, Neeraj Upadhyay wrote:

Clarify the "x" in rcuox/N naming in RCU_NOCB_CPU config
description.

Signed-off-by: Neeraj Upadhyay 


Applied with a few additional updates as shown below.  As always, please
let me know if I messed anything up.



Looks good! thanks!


Thanks
Neeraj


Thanx, Paul



commit 8d1d776b4998896a6f8f4608edb0b258bd37ec9f
Author: Neeraj Upadhyay 
Date:   Thu Sep 24 12:04:10 2020 +0530

 rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
 
 This commit clarifies that the "p" and the "s" in the RCU_NOCB_CPU
 config-option description refer to the "x" in the "rcuox/N" kthread name.
 
 Signed-off-by: Neeraj Upadhyay 

 [ paulmck: While in the area, update description and advice. ]
 Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index b71e21f..cdc57b4 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -221,19 +221,23 @@ config RCU_NOCB_CPU
  Use this option to reduce OS jitter for aggressive HPC or
  real-time workloads.  It can also be used to offload RCU
  callback invocation to energy-efficient CPUs in battery-powered
- asymmetric multiprocessors.
+ asymmetric multiprocessors.  The price of this reduced jitter
+ is that the overhead of call_rcu() increases and that some
+ workloads will incur significant increases in context-switch
+ rates.
  
 This option offloads callback invocation from the set of CPUs
  specified at boot time by the rcu_nocbs parameter.  For each
  such CPU, a kthread ("rcuox/N") will be created to invoke
  callbacks, where the "N" is the CPU being offloaded, and where
- the "p" for RCU-preempt (PREEMPTION kernels) and "s" for RCU-sched
- (!PREEMPTION kernels).  Nothing prevents this kthread from running
- on the specified CPUs, but (1) the kthreads may be preempted
- between each callback, and (2) affinity or cgroups can be used
- to force the kthreads to run on whatever set of CPUs is desired.
-
- Say Y here if you want to help to debug reduced OS jitter.
+ the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
+ RCU-sched (!PREEMPTION kernels).  Nothing prevents this kthread
+ from running on the specified CPUs, but (1) the kthreads may be
+ preempted between each callback, and (2) affinity or cgroups can
+ be used to force the kthreads to run on whatever set of CPUs is
+ desired.
+
+ Say Y here if you need reduced OS jitter, despite added overhead.
  Say N here if you are unsure.
  
  config TASKS_TRACE_RCU_READ_MB

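To make the naming concrete, a small userspace sketch that formats the name the Kconfig text describes: "rcuo", then the flavor letter ("p" for RCU-preempt in PREEMPTION kernels, "s" for RCU-sched otherwise), then "/" and the number of the offloaded CPU. The helper nocb_kthread_name() is hypothetical and only illustrates the pattern; it is not the kernel's own code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: build "rcuo<x>/<N>", where x is 'p' (RCU-preempt,
 * PREEMPTION kernels) or 's' (RCU-sched, !PREEMPTION kernels) and N is the
 * offloaded CPU. Returns snprintf()'s result (length of the full name). */
static int nocb_kthread_name(char *buf, size_t len, bool preemption, int cpu)
{
	char flavor = preemption ? 'p' : 's';

	return snprintf(buf, len, "rcuo%c/%d", flavor, cpu);
}
```

For example, on a PREEMPTION kernel booted with rcu_nocbs=3, the callback-offload kthread for CPU 3 would be named rcuop/3.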



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

Re: [PATCH v3 5/7] rtc: New driver for RTC in Netronix embedded controller

2020-09-24 Thread Uwe Kleine-König

Hello Jonathan,

On Thu, Sep 24, 2020 at 09:24:53PM +0200, Jonathan Neuschäfer wrote:
> +#define NTXEC_REG_WRITE_YEAR     0x10
> +#define NTXEC_REG_WRITE_MONTH    0x11
> +#define NTXEC_REG_WRITE_DAY      0x12
> +#define NTXEC_REG_WRITE_HOUR     0x13
> +#define NTXEC_REG_WRITE_MINUTE   0x14
> +#define NTXEC_REG_WRITE_SECOND   0x15
> +
> +#define NTXEC_REG_READ_YM        0x20
> +#define NTXEC_REG_READ_DH        0x21
> +#define NTXEC_REG_READ_MS        0x23

Is this naming official? I think at least ..._MS is a poor name.
Maybe consider ..._MINSEC instead, and make the other two names a bit
longer for consistency?

> +static int ntxec_read_time(struct device *dev, struct rtc_time *tm)
> +{
> + struct ntxec_rtc *rtc = dev_get_drvdata(dev);
> + unsigned int value;
> + int res;
> +
> + res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_YM, &value);
> + if (res < 0)
> + return res;
> +
> + tm->tm_year = (value >> 8) + 100;
> + tm->tm_mon = (value & 0xff) - 1;
> +
> + res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_DH, &value);
> + if (res < 0)
> + return res;
> +
> + tm->tm_mday = value >> 8;
> + tm->tm_hour = value & 0xff;
> +
> + res = regmap_read(rtc->ec->regmap, NTXEC_REG_READ_MS, &value);
> + if (res < 0)
> + return res;
> +
> + tm->tm_min = value >> 8;
> + tm->tm_sec = value & 0xff;
> +
> + return 0;
> +}
> +
> +static int ntxec_set_time(struct device *dev, struct rtc_time *tm)
> +{
> + struct ntxec_rtc *rtc = dev_get_drvdata(dev);
> + int res = 0;
> +
> + res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_YEAR, ntxec_reg8(tm->tm_year - 100));
> + if (res)
> + return res;
> +
> + res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_MONTH, ntxec_reg8(tm->tm_mon + 1));
> + if (res)
> + return res;
> +
> + res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_DAY, ntxec_reg8(tm->tm_mday));
> + if (res)
> + return res;
> +
> + res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_HOUR, ntxec_reg8(tm->tm_hour));
> + if (res)
> + return res;
> +
> + res = regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_MINUTE, ntxec_reg8(tm->tm_min));
> + if (res)
> + return res;
> +
> + return regmap_write(rtc->ec->regmap, NTXEC_REG_WRITE_SECOND, ntxec_reg8(tm->tm_sec));

I wonder: Is this racy? If you write the minute, do the seconds reset to
zero or something like that? Or can it happen that, after writing the
minute register and before writing the second register, the seconds
overflow and you end up with the time set to a minute later than
intended? If so, it might be worth setting the seconds to 0 at the start
of the function (with an explanatory comment).
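One way to act on that suggestion, sketched in userspace against a simulated register file. The regs[] array, ec_write() and the REG_* indices are hypothetical stand-ins for regmap_write() and the NTXEC_REG_WRITE_* registers, and whether the EC's seconds counter behaves this way is exactly the open question: park the seconds at 0 first, write the remaining fields, then write the real seconds last.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the NTXEC_REG_WRITE_* registers. */
enum { REG_YEAR, REG_MONTH, REG_DAY, REG_HOUR, REG_MINUTE, REG_SECOND, NREGS };

/* Already-converted register values (e.g. year - 100, month + 1). */
struct ec_time { unsigned int year, mon, mday, hour, min, sec; };

static uint8_t regs[NREGS];
static unsigned int write_log[16];	/* order of register writes */
static unsigned int nwrites;

/* Stand-in for regmap_write(): returns 0 on success. */
static int ec_write(unsigned int reg, unsigned int val)
{
	if (reg >= NREGS || nwrites >= 16)
		return -1;
	regs[reg] = (uint8_t)val;
	write_log[nwrites++] = reg;
	return 0;
}

static int set_time_sketch(const struct ec_time *tm)
{
	/* Seconds are parked at 0 first so that, assuming the EC counter keeps
	 * running from the written value, almost a minute passes before a
	 * rollover could carry into the minute while the rest is written. */
	const unsigned int reg[] = { REG_SECOND, REG_YEAR, REG_MONTH,
				     REG_DAY, REG_HOUR, REG_MINUTE, REG_SECOND };
	const unsigned int val[] = { 0, tm->year, tm->mon, tm->mday,
				     tm->hour, tm->min, tm->sec };
	int res;

	for (unsigned int i = 0; i < sizeof(reg) / sizeof(reg[0]); i++) {
		res = ec_write(reg[i], val[i]);
		if (res)
			return res;
	}
	return 0;	/* the real seconds value was written last */
}
```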

.read_time has a similar race. What happens if minutes overflow between
reading NTXEC_REG_READ_DH and NTXEC_REG_READ_MS?
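A conventional way to close the read-side race, again sketched against a simulated EC. The REG_* indices, ec_read() and the sim_* state are hypothetical stand-ins for regmap_read() and the NTXEC_REG_READ_* registers, and the driver's +100 / -1 field adjustments are left out. Read minutes:seconds, then the other registers, then minutes:seconds again; if the two readings differ, a tick happened in between and the snapshot is retried.

```c
#include <errno.h>

/* Hypothetical stand-ins for NTXEC_REG_READ_YM/DH/MS; each packs two
 * 8-bit fields, high byte first, as in the quoted ntxec_read_time(). */
enum { REG_YM, REG_DH, REG_MS, NREAD_REGS };

struct ec_tm { unsigned int year, mon, mday, hour, min, sec; };

/* Simulated EC: on read number sim_tick_after, all registers jump to sim_next. */
static unsigned int sim_regs[NREAD_REGS], sim_next[NREAD_REGS];
static int sim_reads, sim_tick_after = -1;

/* Stand-in for regmap_read(): returns 0 on success. */
static int ec_read(unsigned int reg, unsigned int *val)
{
	if (reg >= NREAD_REGS)
		return -EINVAL;
	if (sim_reads++ == sim_tick_after)
		for (int i = 0; i < NREAD_REGS; i++)
			sim_regs[i] = sim_next[i];
	*val = sim_regs[reg];
	return 0;
}

static int read_time_sketch(struct ec_tm *tm)
{
	unsigned int ms1, ms2, ym, dh;
	int res, tries = 5;

	do {
		if ((res = ec_read(REG_MS, &ms1)) ||
		    (res = ec_read(REG_YM, &ym)) ||
		    (res = ec_read(REG_DH, &dh)) ||
		    (res = ec_read(REG_MS, &ms2)))
			return res;
		/* Unchanged min:sec means no tick happened in between,
		 * so YM and DH belong to the same snapshot. */
	} while (ms1 != ms2 && --tries > 0);

	if (ms1 != ms2)
		return -EAGAIN;

	tm->year = ym >> 8;
	tm->mon = ym & 0xff;
	tm->mday = dh >> 8;
	tm->hour = dh & 0xff;
	tm->min = ms2 >> 8;
	tm->sec = ms2 & 0xff;
	return 0;
}
```

With the original YM, DH, MS order, a tick from 23:59:59 to 00:00:00 between the DH and MS reads would yield 23:00:00 on the old day; the retry loop returns the new day at 00:00:00 instead.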

> +static struct platform_driver ntxec_rtc_driver = {
> + .driver = {
> + .name = "ntxec-rtc",
> + },
> + .probe = ntxec_rtc_probe,

No .remove function?

> +};
> +module_platform_driver(ntxec_rtc_driver);
> +
> +MODULE_AUTHOR("Jonathan Neuschäfer ");
> +MODULE_DESCRIPTION("RTC driver for Netronix EC");
> +MODULE_LICENSE("GPL");

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |



Re: [PATCH 05/13] x86: Add early TPM1.2/TPM2.0 interface support for Secure Launch

2020-09-24 Thread Jarkko Sakkinen

On Thu, Sep 24, 2020 at 10:58:33AM -0400, Ross Philipson wrote:
> From: "Daniel P. Smith" 
> 
> This commit introduces an abstraction for TPM1.2 and TPM2.0 devices
> above the TPM hardware interface.
> 
> Signed-off-by: Daniel P. Smith 
> Signed-off-by: Ross Philipson 

This is way, way too PoC. I wonder why there is no RFC tag.

Please also read section 2 of

https://www.kernel.org/doc/html/v5.8/process/submitting-patches.html

You should leverage existing TPM code one way or another. Refine it so
that it scales for your purpose and then compile it into your thing
(just include the necessary C files with relative paths).

How it is now is never going to fly.

/Jarkko

Re: [PATCH v3] i2c: imx: Fix external abort on interrupt in exit paths

2020-09-24 Thread Oleksij Rempel

On Sun, Sep 20, 2020 at 11:12:38PM +0200, Krzysztof Kozlowski wrote:
> If an interrupt comes late, during the probe error path or device removal
> (this could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
> i2c_imx_isr() will access registers with the clock disabled.  This
> leads to an external abort on non-linefetch on the Toradex Colibri VF50
> module (with Vybrid VF5xx):
> 
> Unhandled fault: external abort on non-linefetch (0x1008) at 0x8882d003
> Internal error: : 1008 [#1] ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper Not tainted 5.7.0 #607
> Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
>   (i2c_imx_isr) from [<8017009c>] (free_irq+0x25c/0x3b0)
>   (free_irq) from [<805844ec>] (release_nodes+0x178/0x284)
>   (release_nodes) from [<80580030>] (really_probe+0x10c/0x348)
>   (really_probe) from [<80580380>] (driver_probe_device+0x60/0x170)
>   (driver_probe_device) from [<80580630>] (device_driver_attach+0x58/0x60)
>   (device_driver_attach) from [<805806bc>] (__driver_attach+0x84/0xc0)
>   (__driver_attach) from [<8057e228>] (bus_for_each_dev+0x68/0xb4)
>   (bus_for_each_dev) from [<8057f3ec>] (bus_add_driver+0x144/0x1ec)
>   (bus_add_driver) from [<80581320>] (driver_register+0x78/0x110)
>   (driver_register) from [<8010213c>] (do_one_initcall+0xa8/0x2f4)
>   (do_one_initcall) from [<80c0100c>] (kernel_init_freeable+0x178/0x1dc)
>   (kernel_init_freeable) from [<80807048>] (kernel_init+0x8/0x110)
>   (kernel_init) from [<80100114>] (ret_from_fork+0x14/0x20)
> 
> Additionally, i2c_imx_isr() could wake up the wait queue
> (imx_i2c_struct->queue) before its initialization happens.
> 
> The resource-managed framework should not be used for interrupt handling,
> because the resource will be released too late, after the clocks have been
> disabled. The interrupt handler is not prepared for such a case.
> 
> Fixes: 1c4b6c3bcf30 ("i2c: imx: implement bus recovery")
> Cc: 
> Signed-off-by: Krzysztof Kozlowski 

Acked-by: Oleksij Rempel 

Thank you!

> ---
> 
> Changes since v2:
> 1. Rebase
> 
> Changes since v1:
> 1. Remove the devm- and use regular methods.
> ---
>  drivers/i2c/busses/i2c-imx.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
> index 63f4367c312b..c98529c76348 100644
> --- a/drivers/i2c/busses/i2c-imx.c
> +++ b/drivers/i2c/busses/i2c-imx.c
> @@ -1169,14 +1169,6 @@ static int i2c_imx_probe(struct platform_device *pdev)
>   return ret;
>   }
>  
> - /* Request IRQ */
> - ret = devm_request_irq(&pdev->dev, irq, i2c_imx_isr, IRQF_SHARED,
> - pdev->name, i2c_imx);
> - if (ret) {
> - dev_err(&pdev->dev, "can't claim irq %d\n", irq);
> - goto clk_disable;
> - }
> -
>   /* Init queue */
>   init_waitqueue_head(&i2c_imx->queue);
>  
> @@ -1195,6 +1187,14 @@ static int i2c_imx_probe(struct platform_device *pdev)
>   if (ret < 0)
>   goto rpm_disable;
>  
> + /* Request IRQ */
> + ret = request_threaded_irq(irq, i2c_imx_isr, NULL, IRQF_SHARED,
> +pdev->name, i2c_imx);
> + if (ret) {
> + dev_err(&pdev->dev, "can't claim irq %d\n", irq);
> + goto rpm_disable;
> + }
> +
>   /* Set up clock divider */
>   i2c_imx->bitrate = I2C_MAX_STANDARD_MODE_FREQ;
>   ret = of_property_read_u32(pdev->dev.of_node,
> @@ -1237,13 +1237,12 @@ static int i2c_imx_probe(struct platform_device *pdev)
>  
>  clk_notifier_unregister:
>   clk_notifier_unregister(i2c_imx->clk, &i2c_imx->clk_change_nb);
> + free_irq(irq, i2c_imx);
>  rpm_disable:
>   pm_runtime_put_noidle(&pdev->dev);
>   pm_runtime_disable(&pdev->dev);
>   pm_runtime_set_suspended(&pdev->dev);
>   pm_runtime_dont_use_autosuspend(&pdev->dev);
> -
> -clk_disable:
>   clk_disable_unprepare(i2c_imx->clk);
>   return ret;
>  }
> @@ -1251,7 +1250,7 @@ static int i2c_imx_probe(struct platform_device *pdev)
>  static int i2c_imx_remove(struct platform_device *pdev)
>  {
>   struct imx_i2c_struct *i2c_imx = platform_get_drvdata(pdev);
> - int ret;
> + int irq, ret;
>  
>   ret = pm_runtime_get_sync(&pdev->dev);
>   if (ret < 0)
> @@ -1271,6 +1270,9 @@ static int i2c_imx_remove(struct platform_device *pdev)
>   imx_i2c_write_reg(0, i2c_imx, IMX_I2C_I2SR);
>  
>   clk_notifier_unregister(i2c_imx->clk, &i2c_imx->clk_change_nb);
> + irq = platform_get_irq(pdev, 0);
> + if (irq >= 0)
> + free_irq(irq, i2c_imx);
>   clk_disable_unprepare(i2c_imx->clk);
>  
>   pm_runtime_put_noidle(&pdev->dev);
> -- 
> 2.17.1
> 
> 
> 

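The ordering rule the patch applies (request the IRQ only after everything its handler touches is ready, and free it before the clock is turned off) can be sketched as a userspace model of a goto-unwind probe path, loosely modeled on the patched i2c_imx_probe(). The trace buffer, step(), record() and fail_at are hypothetical; they only record the order of operations so the invariant can be checked.

```c
#include <string.h>

/* Hypothetical trace of operations, used only to check their order. */
static const char *trace[16];
static int ntrace;
static int fail_at = -1;	/* index of the step that should fail; -1: none */

static void record(const char *what)
{
	if (ntrace < 16)
		trace[ntrace++] = what;
}

/* A fallible stand-in step: records itself, fails if idx == fail_at. */
static int step(const char *what, int idx)
{
	record(what);
	return idx == fail_at ? -1 : 0;
}

static int trace_index(const char *what)
{
	for (int i = 0; i < ntrace; i++)
		if (strcmp(trace[i], what) == 0)
			return i;
	return -1;
}

static int probe_sketch(void)
{
	int ret;

	ret = step("clk_enable", 0);
	if (ret)
		return ret;
	record("init_waitqueue");		/* the handler may wake it */
	ret = step("pm_runtime_get", 1);
	if (ret)
		goto rpm_disable;
	ret = step("request_irq", 2);		/* handler can run from here on */
	if (ret)
		goto rpm_disable;
	record("clk_notifier_register");
	ret = step("add_adapter", 3);
	if (ret)
		goto clk_notifier_unregister;
	return 0;

clk_notifier_unregister:
	record("clk_notifier_unregister");
	record("free_irq");			/* still before the clock goes off */
rpm_disable:
	record("pm_runtime_disable");
	record("clk_disable");
	return ret;
}
```

With devm_request_irq() the free would instead happen after the driver's own error path, i.e. after clk_disable, which is the window the patch closes.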
-- 
Pengutronix e.K.   |

linux-next: manual merge of the tip tree with the pci tree

2020-09-24 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  arch/x86/kernel/apic/msi.c

between commit:

  2705b8e4d46f ("x86/apic/msi: Use Real PCI DMA device when configuring IRTE")

from the pci tree and commit:

  7ca435cf857d ("x86/irq: Cleanup the arch_*_msi_irqs() leftovers")

from the tip tree.

I fixed it up (the latter removed the code updated by the former, so I
did that) and can carry the fix as necessary. This is now fixed as far as
linux-next is concerned, but any non trivial conflicts should be mentioned
to your upstream maintainer when your tree is submitted for merging.
You may also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH v9 08/20] gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL

2020-09-24 Thread Kent Gibson

On Wed, Sep 23, 2020 at 06:41:45PM +0300, Andy Shevchenko wrote:
> On Tue, Sep 22, 2020 at 5:35 AM Kent Gibson  wrote:
> >
> > Add support for GPIO_V2_GET_LINEINFO_IOCTL and
> > GPIO_V2_GET_LINEINFO_WATCH_IOCTL.
> >
> > +#ifdef CONFIG_GPIO_CDEV_V1
> > +static int lineinfo_ensure_abi_version(struct gpio_chardev_data *cdata,
> > +  unsigned int version)
> > +{
> 
> > +   int abiv = atomic_read(&cdata->watch_abi_version);
> > +
> > +   if (abiv == 0) {
> 
> > +   atomic_cmpxchg(&cdata->watch_abi_version, 0, version);
> > +   abiv = atomic_read(&cdata->watch_abi_version);
> 
> atomic_cmpxchg() returns a value.
> Also there are no barriers here...
> 
> > +   }
> > +   if (abiv != version)
> > +   return -EPERM;
> 
> I'm not sure I understand why this is atomic.
> 
> Also this seems to be racy if cdata changed in the background.
> 
> Shouldn't it rather be
> 
> if (atomic_cmpxchg() == 0) {
>   if (atomic_read() != version)
> return ...;
> }
> 

Perhaps this is what you had in mind:

static bool lineinfo_is_abi_version(struct gpio_chardev_data *cdata,
				    unsigned int version)
{
	int abiv = atomic_cmpxchg(&cdata->watch_abi_version, 0, version);

	if (abiv == version)
		return true;
	if (abiv == 0)
		/* first (and only) setter sees initial 0 value */
		return true;

	return false;
}

That is functionally equivalent, but slightly shorter.

I was thinking the atomic_cmpxchg() is more expensive than an
atomic_read(), and so initially checked the value with
that, but for the cmp failure case it is probably the same, or at least
not worth the bother - this isn't even vaguely near a hot path.

I've also switched the return value to bool in response to your other
comments.
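For readers more used to C11 than to the kernel's atomic_t API, the same first-setter-wins latch can be sketched in userspace with <stdatomic.h>. This is an analogue, not the gpiolib code, and is_abi_version() is a made-up name. atomic_compare_exchange_strong() writes the current value back into `expected` on failure, which plays the role of atomic_cmpxchg() returning the old value; the C11 call defaults to sequentially consistent ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means "no ABI version chosen yet"; the first caller latches its
 * version, and later callers succeed only if they ask for the same one. */
static bool is_abi_version(atomic_uint *watch_abi_version, unsigned int version)
{
	unsigned int expected = 0;

	if (atomic_compare_exchange_strong(watch_abi_version, &expected, version))
		return true;	/* we saw the initial 0 and set it */

	/* On failure, expected now holds the value another caller latched. */
	return expected == version;
}
```

As in Kent's version, a single compare-and-exchange both sets and checks the value, so there is no window between a read and a separate update.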

Cheers,
Kent.

Re: [PATCH 00/13] x86: Trenchboot secure dynamic launch Linux kernel support

2020-09-24 Thread Jarkko Sakkinen

On Thu, Sep 24, 2020 at 10:58:28AM -0400, Ross Philipson wrote:
> The Trenchboot project's focus on boot security has led to the enabling of
> the Linux kernel to be directly invocable by the x86 Dynamic Launch
> instruction(s) for establishing a Dynamic Root of Trust for Measurement
> (DRTM). The dynamic launch will be initiated by a boot loader with

What is "the dynamic launch"?

> associated support added to it, for example the first targeted boot
> loader will be GRUB2. An integral part of establishing the DRTM involves
> measuring everything that is intended to be run (kernel image, initrd,
> etc) and everything that will configure that kernel to run (command
> line, boot params, etc) into specific PCRs, the DRTM PCRs (17-22), in
> the TPM. Another key aspect is the dynamic launch is rooted in hardware,
> that is to say the hardware (CPU) is what takes the first measurement
> for the chain of integrity measurements. On Intel this is done using
> the GETSEC instruction provided by Intel's TXT and the SKINIT
> instruction provided by AMD's AMD-V. Information on these technologies
> can be readily found online. This patchset introduces Intel TXT support.
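For reference, "measuring into a PCR" means the TPM extend operation: a PCR is never written directly; each measurement digest is folded into the PCR's current value with the bank's hash algorithm H (SHA-1 for TPM1.2, for example SHA-256 for a TPM2.0 SHA-256 bank). The final value therefore depends on every measurement and on their order:

```latex
\mathrm{PCR}_i \leftarrow H\!\left(\mathrm{PCR}_i \,\|\, H(\text{measured data})\right)
```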

Why not both Intel and AMD? You should explain this in the cover letter.

I'd be more motivated to review and test a full, all-encompassing x86
solution. It would increase the patch set size but would also give it
a better test coverage, which I think would be a huge plus in such a
complex patch set.

> To enable the kernel to be launched by GETSEC, a stub must be built
> into the setup section of the compressed kernel to handle the specific
> state that the dynamic launch process leaves the BSP in. This is
> analogous to the EFI stub that is found in the same area. Also this stub

How is it analogous?

> must measure everything that is going to be used as early as possible.
> This stub code and subsequent code must also deal with the specific
> state that the dynamic launch leaves the APs in.

What is "the specific state"?

> A quick note on terminology. The larger open source project itself is
> called Trenchboot, which is hosted on Github (links below). The kernel
> feature enabling the use of the x86 technology is referred to as "Secure
> Launch" within the kernel code. As such the prefixes sl_/SL_ or
> slaunch/SLAUNCH will be seen in the code. The stub code discussed above
> is referred to as the SL stub.

Is this only for Trenchboot? I'm a bit lost. What is it anyway?

> The basic flow is:
> 
>  - Entry from the dynamic launch jumps to the SL stub
>  - SL stub fixes up the world on the BSP

What is "SL"?

>  - For TXT, SL stub wakes the APs, fixes up their worlds
>  - For TXT, APs are left halted waiting for an NMI to wake them
>  - SL stub jumps to startup_32
>  - SL main runs to measure configuration and module information into the
>    DRTM PCRs. It also locates the TPM event log.
>  - Kernel boot proceeds normally from this point.
>  - During early setup, slaunch_setup() runs to finish some validation
>    and setup tasks.

What are "some" validation and setup tasks?

>  - The SMP bringup code is modified to wake the waiting APs. APs vector
>    to rmpiggy and start up normally from that point.
>  - Kernel boot finishes booting normally
>  - SL securityfs module is present to allow reading and writing of the
>    TPM event log.

What is the SL securityfs module? Why is it needed? We already have a
securityfs file for the event log. Why does it need to be writable?

>  - SEXIT support to leave SMX mode is present on the kexec path and
>    the various reboot paths (poweroff, reset, halt).

What does SEXIT do, and why is it required on the kexec path?

/Jarkko

linux-next: manual merge of the tip tree with the vfs tree

2020-09-24 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got conflicts in:

  arch/ia64/Kconfig
  arch/s390/Kconfig

between commit:

  5e6e9852d6f7 ("uaccess: add infrastructure for kernel builds with set_fs()")

from the vfs tree and commit:

  077ee78e3928 ("PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/ia64/Kconfig
index 3414e67229b3,7ff5b3bbf160..
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@@ -55,7 -56,7 +55,8 @@@ config IA6
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
select NUMA if !FLATMEM
+   select PCI_MSI_ARCH_FALLBACKS
 +  select SET_FS
default y
help
  The Itanium Processor Family is Intel's 64-bit successor to
diff --cc arch/s390/Kconfig
index dde501bc6304,63dd5a0aa252..
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@@ -190,7 -185,7 +190,8 @@@ config S39
select OLD_SIGSUSPEND3
select PCI_DOMAINS  if PCI
select PCI_MSI  if PCI
+   select PCI_MSI_ARCH_FALLBACKS
 +  select SET_FS
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK



linux-next: manual merge of the tip tree with the kbuild tree

2020-09-24 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  arch/arm/Makefile

between commit:

  596b0474d3d9 ("kbuild: preprocess module linker script")

from the kbuild tree and commit:

  5a17850e251a ("arm/build: Warn on orphan section placement")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm/Makefile
index e15f76ca2887,e589da3c8949..
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@@ -16,6 -16,14 +16,10 @@@ LDFLAGS_vmlinux += --be
  KBUILD_LDFLAGS_MODULE += --be8
  endif
  
+ # We never want expected sections to be placed heuristically by the
+ # linker. All sections should be explicitly named in the linker script.
+ LDFLAGS_vmlinux += $(call ld-option, --orphan-handling=warn)
+ 
 -ifeq ($(CONFIG_ARM_MODULE_PLTS),y)
 -KBUILD_LDS_MODULE += $(srctree)/arch/arm/kernel/module.lds
 -endif
 -
  GZFLAGS   :=-9
  #KBUILD_CFLAGS+=-pipe
  



Re: [PATCH v2 1/1] kdump: append uts_namespace.name offset to VMCOREINFO

2020-09-24 Thread Bhupesh Sharma

Hi Alexander,

On Thu, Sep 24, 2020 at 6:18 PM Alexander Egorenkov
 wrote:
>
> The offset of the field 'init_uts_ns.name' has changed
> since
>
> commit 9a56493f6942c0e2df1579986128721da96e00d8
> Author: Kirill Tkhai 
> Date:   Mon Aug 3 13:16:21 2020 +0300
>
> uts: Use generic ns_common::count

A minor nitpick:
You can add the following line to the [alias] section of your .gitconfig:
one = show -s --pretty='format:%h (\"%s\")'

Running '$ git one' with a commit ID will then give you the
abbreviated form to use when referring to existing git commits in
the log message. For example, in this case the output would be something
like:

$ git one 9a56493f6942c0e2df1579986128721da96e00d8
9a56493f6942 ("uts: Use generic ns_common::count")

Then you can use '9a56493f6942 ("uts: Use generic ns_common::count")'
to refer to an existing upstream patch in the log message.

But I think this can be fixed while applying the patch (if there are
no further revisions required).
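The `%h (\"%s\")` format in the alias produces the conventional `<abbreviated hash> ("<subject>")` reference. As an illustration only, the same string can be built in C. `format_commit_ref()` is a hypothetical helper, and the fixed 12-character abbreviation is an assumption chosen to match the example output (git's `%h` length actually depends on `core.abbrev` and the size of the repository).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed abbreviation length, matching the example output above. */
#define ABBREV_LEN 12

/* Hypothetical helper: write e.g.
 *   9a56493f6942 ("uts: Use generic ns_common::count")
 * into buf. Returns 0 on success, -1 on a short hash or truncation. */
int format_commit_ref(char *buf, size_t len, const char *sha,
		      const char *subject)
{
	assert(buf != NULL && sha != NULL && subject != NULL);

	/* Refuse hashes shorter than the abbreviation length. */
	if (strlen(sha) < ABBREV_LEN)
		return -1;

	/* "%.*s" prints only the first ABBREV_LEN characters of the hash. */
	int n = snprintf(buf, len, "%.*s (\"%s\")", ABBREV_LEN, sha, subject);

	/* Treat truncation as an error instead of returning a cut string. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For instance, passing the full hash 9a56493f6942c0e2df1579986128721da96e00d8 and the subject "uts: Use generic ns_common::count" yields the same line that `git one` prints.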

> Link: 
> https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain
>
> Make the offset of the field 'uts_namespace.name' available
> in VMCOREINFO because tools like 'crash-utility' and
> 'makedumpfile' must be able to read it from crash dumps.
>
> Signed-off-by: Alexander Egorenkov 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..173fdc261882 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -447,6 +447,7 @@ static int __init crash_save_vmcoreinfo_init(void)
> VMCOREINFO_PAGESIZE(PAGE_SIZE);
>
> VMCOREINFO_SYMBOL(init_uts_ns);
> +   VMCOREINFO_OFFSET(uts_namespace, name);
> VMCOREINFO_SYMBOL(node_online_map);
>  #ifdef CONFIG_MMU
> VMCOREINFO_SYMBOL_ARRAY(swapper_pg_dir);
> --
> 2.26.2
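To see why the offset matters to tools such as makedumpfile and crash-utility, it may help to picture what the VMCOREINFO note carries. What follows is a userspace sketch under stated assumptions. The kernel's `VMCOREINFO_OFFSET()` appends a text line of roughly the form `OFFSET(uts_namespace.name)=<n>`. `struct uts_namespace` here is a hypothetical, simplified stand-in (the real layout differs), and `lookup_offset()` is a hypothetical consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct uts_namespace.
 * The real structure has different members; only the idea matters here. */
struct uts_namespace {
	int refcount;		/* placeholder for the fields before 'name' */
	char name[65];		/* placeholder for struct new_utsname */
};

/* Userspace imitation of VMCOREINFO_OFFSET(): emit one text line of the
 * form "OFFSET(<struct>.<field>)=<offset>" into buf. */
#define DEMO_VMCOREINFO_OFFSET(buf, len, name, field)                 \
	snprintf((buf), (len), "OFFSET(%s.%s)=%lu\n", #name, #field, \
		 (unsigned long)offsetof(struct name, field))

/* What a dump tool might do: find "OFFSET(<key>)=" and parse the number.
 * Returns -1 if the entry is missing or malformed. */
long lookup_offset(const char *note, const char *key)
{
	char pattern[128];
	unsigned long value;
	const char *p;

	assert(note != NULL && key != NULL);
	snprintf(pattern, sizeof(pattern), "OFFSET(%s)=", key);
	p = strstr(note, pattern);
	if (p == NULL || sscanf(p + strlen(pattern), "%lu", &value) != 1)
		return -1;
	return (long)value;
}
```

Because the tool reads the offset from the dump instead of hard-coding it, a layout change such as the one in 9a56493f6942 stops being a problem once the offset is exported.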

Thanks for making the changes we discussed in the v1 review. Otherwise
the patch looks fine to me, so:

Reviewed-by: Bhupesh Sharma

linux-next: manual merge of the tip tree with the pci tree

2020-09-24 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/pci/controller/vmd.c

between commit:

  42443f036042 ("PCI: vmd: Create IRQ Domain configuration helper")

from the pci tree and commit:

  585dfe8abc44 ("PCI: vmd: Dont abuse vector irqomain as parent")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non-trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/pci/controller/vmd.c
index 3c4418cbde1c,aa1b12bac9a1..
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@@ -304,50 -298,6 +304,50 @@@ static struct msi_domain_info vmd_msi_d
.chip   = &vmd_msi_controller,
  };
  
 +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable)
 +{
 +  u16 reg;
 +
 +  pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, &reg);
 +  reg = enable ? (reg & ~0x2) : (reg | 0x2);
 +  pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
 +}
 +
 +static int vmd_create_irq_domain(struct vmd_dev *vmd)
 +{
 +  struct fwnode_handle *fn;
 +
 +  fn = irq_domain_alloc_named_id_fwnode("VMD-MSI", vmd->sysdata.domain);
 +  if (!fn)
 +  return -ENODEV;
 +
 +  vmd->irq_domain = pci_msi_create_irq_domain(fn, &vmd_msi_domain_info,
-   x86_vector_domain);
++  NULL);
 +  if (!vmd->irq_domain) {
 +  irq_domain_free_fwnode(fn);
 +  return -ENODEV;
 +  }
 +
 +  return 0;
 +}
 +
 +static void vmd_remove_irq_domain(struct vmd_dev *vmd)
 +{
 +  /*
 +   * Some production BIOS won't enable remapping between soft reboots.
 +   * Ensure remapping is restored before unloading the driver
 +   */
 +  if (!vmd->msix_count)
 +  vmd_enable_msi_remapping(vmd, true);
 +
 +  if (vmd->irq_domain) {
 +  struct fwnode_handle *fn = vmd->irq_domain->fwnode;
 +
 +  irq_domain_remove(vmd->irq_domain);
 +  irq_domain_free_fwnode(fn);
 +  }
 +}
 +
  static char __iomem *vmd_cfg_addr(struct vmd_dev *vmd, struct pci_bus *bus,
  unsigned int devfn, int reg, int len)
  {
@@@ -717,12 -568,24 +717,18 @@@ static int vmd_enable_domain(struct vmd
  
sd->node = pcibus_to_node(vmd->dev->bus);
  
 -  fn = irq_domain_alloc_named_id_fwnode("VMD-MSI", vmd->sysdata.domain);
 -  if (!fn)
 -  return -ENODEV;
 -
 -  vmd->irq_domain = pci_msi_create_irq_domain(fn, &vmd_msi_domain_info,
 -  NULL);
 -
 -  if (!vmd->irq_domain) {
 -  irq_domain_free_fwnode(fn);
 -  return -ENODEV;
 +  if (vmd->msix_count) {
 +  ret = vmd_create_irq_domain(vmd);
 +  if (ret)
 +  return ret;
}
  
+   /*
+* Override the irq domain bus token so the domain can be distinguished
+* from a regular PCI/MSI domain.
+*/
+   irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
+ 
pci_add_resource(&resources, &vmd->resources[0]);
pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
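The remapping toggle in `vmd_enable_msi_remapping()` is a read-modify-write of a 16-bit config register in which, going by the resolved code, a cleared bit 1 (0x2) means remapping is enabled. The sketch below models that in plain C. A hypothetical in-memory register takes the place of `pci_read_config_word()`/`pci_write_config_word()`, and the `VMCONFIG_MSI_REMAP_DISABLE` name is invented for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for bit 1 of the VMCONFIG register; in the driver the
 * mask is the literal 0x2. Setting the bit disables MSI remapping. */
#define VMCONFIG_MSI_REMAP_DISABLE 0x2

/* Stand-in for the device's config space word; not a real PCI access. */
struct fake_vmd {
	uint16_t vmconfig;
};

void demo_enable_msi_remapping(struct fake_vmd *vmd, bool enable)
{
	assert(vmd != NULL);
	uint16_t reg = vmd->vmconfig;			/* read */

	/* Clear the bit to enable remapping, set it to disable; all other
	 * bits are preserved, as in the driver's read-modify-write. */
	if (enable)
		reg &= (uint16_t)~VMCONFIG_MSI_REMAP_DISABLE;
	else
		reg |= VMCONFIG_MSI_REMAP_DISABLE;

	vmd->vmconfig = reg;				/* write back */
}
```

In the resolved `vmd_remove_irq_domain()`, this helper is called with `true` only when `msix_count` is zero, which matches the comment about BIOSes that do not re-enable remapping across soft reboots.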



[rcu:rcu/test 55/64] kernel/rcu/rcutorture.c:1507:13: warning: no previous prototype for function 'rcu_nocb_cpu_offload'

2020-09-24 Thread kernel test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git rcu/test
head:   516c2f87d0d1204a9ea1b36298283f74056a7eab
commit: 4df5d8a622235d0b26aba92c881d190667e4d6c3 [55/64] rcutorture: Test runtime toggling of CPUs' callback offloading
config: riscv-randconfig-r036-20200923 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project c32e69b2ce7abfb151a87ba363ac9e25abf7d417)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=4df5d8a622235d0b26aba92c881d190667e4d6c3
git remote add rcu https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
git fetch --no-tags rcu rcu/test
git checkout 4df5d8a622235d0b26aba92c881d190667e4d6c3
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> kernel/rcu/rcutorture.c:1507:13: warning: no previous prototype for function 'rcu_nocb_cpu_offload' [-Wmissing-prototypes]
   void __weak rcu_nocb_cpu_offload(int cpu) {}
   ^
   kernel/rcu/rcutorture.c:1507:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __weak rcu_nocb_cpu_offload(int cpu) {}
   ^
   static 
>> kernel/rcu/rcutorture.c:1508:13: warning: no previous prototype for function 'rcu_nocb_cpu_deoffload' [-Wmissing-prototypes]
   void __weak rcu_nocb_cpu_deoffload(int cpu) {}
   ^
   kernel/rcu/rcutorture.c:1508:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __weak rcu_nocb_cpu_deoffload(int cpu) {}
   ^
   static 
   2 warnings generated.

vim +/rcu_nocb_cpu_offload +1507 kernel/rcu/rcutorture.c

  1506  
> 1507  void __weak rcu_nocb_cpu_offload(int cpu) {}
> 1508  void __weak rcu_nocb_cpu_deoffload(int cpu) {}
  1509  
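`-Wmissing-prototypes` fires because these `__weak` definitions have external linkage but no earlier declaration in scope. Making them `static`, as clang suggests, would defeat the purpose of a weak default that another file can override. A generic way to silence the warning is to make a prototype visible before the definition, normally from a shared header. The sketch below illustrates this with GCC/clang's `__attribute__((weak))`. The `demo_` names and the call counter are hypothetical, and this is not necessarily how the RCU tree resolved the report.

```c
#include <assert.h>

/* Prototypes that would normally come from a shared header. With these in
 * scope, -Wmissing-prototypes no longer complains about the definitions. */
void demo_nocb_cpu_offload(int cpu);
void demo_nocb_cpu_deoffload(int cpu);

/* Counter used only so the example has observable behaviour. */
int demo_stub_calls;

/* Weak defaults: a strong definition elsewhere in the link would replace
 * them. Without one, these stubs (which only count calls here) are used. */
__attribute__((weak)) void demo_nocb_cpu_offload(int cpu)
{
	assert(cpu >= 0);
	demo_stub_calls++;
}

__attribute__((weak)) void demo_nocb_cpu_deoffload(int cpu)
{
	assert(cpu >= 0);
	demo_stub_calls++;
}
```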

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



KMSAN: uninit-value in udf_evict_inode

2020-09-24 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:c5a13b33 kmsan: clang-format core
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=142cf80990
kernel config:  https://syzkaller.appspot.com/x/.config?x=20f149ad694ba4be
dashboard link: https://syzkaller.appspot.com/bug?extid=91f02b28f9bb5f5f1341
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+91f02b28f9bb5f5f1...@syzkaller.appspotmail.com

UDF-fs: warning (device loop3): udf_load_vrs: No anchor found
UDF-fs: Scanning with blocksize 512 failed
UDF-fs: INFO Mounting volume 'LinuxUDF', timestamp 2020/09/19 18:44 (1000)
UDF-fs: error (device loop3): udf_read_inode: (ino 1328) failed !bh
=
BUG: KMSAN: uninit-value in udf_evict_inode+0x382/0x7d0 fs/udf/inode.c:150
CPU: 0 PID: 13956 Comm: syz-executor.3 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
 udf_evict_inode+0x382/0x7d0 fs/udf/inode.c:150
 evict+0x4d3/0xec0 fs/inode.c:576
 iput_final fs/inode.c:1652 [inline]
 iput+0xc7b/0xf50 fs/inode.c:1678
 iget_failed+0x2cb/0x390 fs/bad_inode.c:242
 __udf_iget+0x111d/0x4650 fs/udf/inode.c:1914
 udf_iget fs/udf/udfdecl.h:147 [inline]
 udf_fill_super+0x2fa3/0x3330 fs/udf/super.c:2284
 mount_bdev+0x622/0x910 fs/super.c:1417
 udf_mount+0xc9/0xe0 fs/udf/super.c:127
 legacy_get_tree+0x163/0x2e0 fs/fs_context.c:592
 vfs_get_tree+0xd8/0x5d0 fs/super.c:1547
 do_new_mount fs/namespace.c:2875 [inline]
 path_mount+0x3d1a/0x5d40 fs/namespace.c:3192
 do_mount+0x1c6/0x220 fs/namespace.c:3205
 __do_compat_sys_mount fs/compat.c:122 [inline]
 __se_compat_sys_mount+0x7b5/0xaa0 fs/compat.c:89
 __ia32_compat_sys_mount+0x62/0x80 fs/compat.c:89
 do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
 __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
RIP: 0023:0xf7f94549
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 
00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 
eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:f558df20 EFLAGS: 0292 ORIG_RAX: 0015
RAX: ffda RBX: f558df7c RCX: 2100
RDX: 2000 RSI:  RDI: f558dfbc
RBP: f558df7c R08:  R09: 
R10:  R11:  R12: 
R13:  R14:  R15: 

Uninit was stored to memory at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline]
 kmsan_internal_chain_origin+0xad/0x130 mm/kmsan/kmsan.c:311
 __msan_chain_origin+0x50/0x90 mm/kmsan/kmsan_instr.c:169
 udf_alloc_inode+0x2ab/0x2d0 fs/udf/super.c:154
 alloc_inode fs/inode.c:232 [inline]
 iget_locked+0x37a/0x13f0 fs/inode.c:1193
 __udf_iget+0x152/0x4650 fs/udf/inode.c:1902
 udf_iget fs/udf/udfdecl.h:147 [inline]
 udf_fill_super+0x2fa3/0x3330 fs/udf/super.c:2284
 mount_bdev+0x622/0x910 fs/super.c:1417
 udf_mount+0xc9/0xe0 fs/udf/super.c:127
 legacy_get_tree+0x163/0x2e0 fs/fs_context.c:592
 vfs_get_tree+0xd8/0x5d0 fs/super.c:1547
 do_new_mount fs/namespace.c:2875 [inline]
 path_mount+0x3d1a/0x5d40 fs/namespace.c:3192
 do_mount+0x1c6/0x220 fs/namespace.c:3205
 __do_compat_sys_mount fs/compat.c:122 [inline]
 __se_compat_sys_mount+0x7b5/0xaa0 fs/compat.c:89
 __ia32_compat_sys_mount+0x62/0x80 fs/compat.c:89
 do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
 __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Uninit was created at:
 kmsan_save_stack_with_flags+0x3c/0x90 mm/kmsan/kmsan.c:143
 kmsan_internal_alloc_meta_for_pages mm/kmsan/kmsan_shadow.c:268 [inline]
 kmsan_alloc_page+0xc5/0x1a0 mm/kmsan/kmsan_shadow.c:292
 __alloc_pages_nodemask+0xf34/0x1120 mm/page_alloc.c:4927
 alloc_pages_current+0x685/0xb50 mm/mempolicy.c:2275
 alloc_pages include/linux/gfp.h:545 [inline]
 alloc_slab_page mm/slub.c:1634 [inline]
 allocate_slab+0x2fe/0x1180 mm/slub.c:1777
 new_slab mm/slub.c:1838 [inline]
 new_slab_objects mm/slub.c:2595 [inline]
 ___slab_alloc+0xd65/0x1940 mm/slub.c:2756
 __slab_alloc mm/slub.c:2796 [inline]
 slab_alloc_node
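KMSAN reports a value that was used before ever being written. In this report, the origin chain shows the uninitialized value being stored in `udf_alloc_inode()` into memory that came fresh from the slab allocator. `udf_evict_inode()` later uses it on the error path where `__udf_iget()` fails and `iget_failed()` evicts the inode. The sketch below is a generic userspace illustration of that class of bug, not a diagnosis of this report. It shows a hypothetical allocator that initializes every field the teardown path reads. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical filesystem-private inode; the field names are invented. */
struct demo_inode {
	unsigned long i_ino;
	bool i_has_extents;	/* read unconditionally at eviction time */
	unsigned int i_lenalloc;
};

/* malloc(), like a slab allocation, returns uninitialized memory, so every
 * field that a later path may read must be set here. */
struct demo_inode *demo_alloc_inode(unsigned long ino)
{
	struct demo_inode *inode = malloc(sizeof(*inode));

	if (inode == NULL)
		return NULL;
	inode->i_ino = ino;
	inode->i_has_extents = false;	/* without this, eviction reads garbage */
	inode->i_lenalloc = 0;
	return inode;
}

/* Teardown path, which also runs when reading the on-disk inode failed
 * before any of the fields were filled in from disk. */
unsigned int demo_evict_inode(struct demo_inode *inode)
{
	unsigned int freed = 0;

	assert(inode != NULL);
	if (inode->i_has_extents)
		freed = inode->i_lenalloc;
	free(inode);
	return freed;
}
```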

KMSAN: uninit-value in f2fs_lookup

2020-09-24 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:c5a13b33 kmsan: clang-format core
git tree:   https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=14f5b19b90
kernel config:  https://syzkaller.appspot.com/x/.config?x=20f149ad694ba4be
dashboard link: https://syzkaller.appspot.com/bug?extid=0eac6f0bbd558fd866d7
compiler:   clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+0eac6f0bbd558fd86...@syzkaller.appspotmail.com

=
BUG: KMSAN: uninit-value in f2fs_lookup+0xe05/0x1a80 fs/f2fs/namei.c:503
CPU: 0 PID: 20216 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x21c/0x280 lib/dump_stack.c:118
 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122
 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219
 f2fs_lookup+0xe05/0x1a80 fs/f2fs/namei.c:503
 lookup_open fs/namei.c:3082 [inline]
 open_last_lookups fs/namei.c:3177 [inline]
 path_openat+0x2729/0x6a90 fs/namei.c:3365
 do_filp_open+0x2b8/0x710 fs/namei.c:3395
 do_sys_openat2+0xa88/0x1140 fs/open.c:1168
 do_sys_open fs/open.c:1184 [inline]
 __do_compat_sys_openat fs/open.c:1242 [inline]
 __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240
 __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240
 do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
 __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
RIP: 0023:0xf7f73549
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 
00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 
eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:f554c0cc EFLAGS: 0296 ORIG_RAX: 0127
RAX: ffda RBX: ff9c RCX: 2980
RDX: 0002f042 RSI:  RDI: 
RBP:  R08:  R09: 
R10:  R11:  R12: 
R13:  R14:  R15: 

Local variable page@f2fs_lookup created at:
 f2fs_lookup+0x8f/0x1a80 fs/f2fs/namei.c:477
 f2fs_lookup+0x8f/0x1a80 fs/f2fs/namei.c:477
=
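In this report, the uninitialized value is the stack variable `page` declared in `f2fs_lookup()`. A common shape for this kind of bug is a helper that fills an out-parameter only on some paths while the caller inspects it on all paths. The sketch below shows that pattern and the usual defensive fix of initializing the local. It is a generic, hypothetical illustration (all names invented), not the actual f2fs code or fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_page {
	int index;
};

static struct demo_page demo_pages[2] = { { 0 }, { 1 } };

/* Hypothetical lookup helper: sets *res_page only when the name is found. */
int demo_find_entry(const char *name, struct demo_page **res_page)
{
	assert(name != NULL && res_page != NULL);
	if (strcmp(name, "present") != 0)
		return -1;	/* *res_page deliberately left untouched */
	*res_page = &demo_pages[1];
	return 0;
}

/* Caller: initializing 'page' to NULL means the check below reads a
 * defined value even when demo_find_entry() fails. */
int demo_lookup(const char *name)
{
	struct demo_page *page = NULL;

	(void)demo_find_entry(name, &page);
	if (page == NULL)
		return -1;
	return page->index;
}
```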


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Re: [Intel-wired-lan] [PATCH v3] e1000e: Increase iteration on polling MDIC ready bit

2020-09-24 Thread Paul Menzel


Dear Kai-Heng,


Thank you for patch version 3.

Am 24.09.20 um 18:45 schrieb Kai-Heng Feng:

We are seeing the following error after S3 resume:
[  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
[  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
[  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
[  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
[  704.943155] e1000e :00:1f.6 eno1: MDI Error
...
[  705.108161] e1000e :00:1f.6 eno1: Hardware Error

As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and indeed
increasing the polling iterations can resolve the issue.

The root cause is quite likely the Intel ME. Since it's a black box to the
kernel, the only approach we can take is to be patient and wait
longer.

Signed-off-by: Kai-Heng Feng 
---
v3:
  - Moving the delay to the end of the loop doesn't save any time, so move it back.
  - Point out this is quitely likely caused by Intel ME.


quietly

You seem to have missed my comments regarding patch version 3. It’d be 
great if you improved the commit message with my suggestions.


Without knowing which hardware this happened on, nobody, not even someone
who later gets the hardware, can reproduce your results. If you say the ME
is involved, please also document the ME firmware version used
here.



v2:
  - Increase polling iteration instead of powering down the phy.

  drivers/net/ethernet/intel/e1000e/phy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index e11c877595fb..e6d4acd90937 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -203,7 +203,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
 * Increasing the time out as testing showed failures with
 * the lower time out
 */
-   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
+   for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 10); i++) {
udelay(50);
mdic = er32(MDIC);
if (mdic & E1000_MDIC_READY)


In the PCI subsystem, a warning is shown when something takes more than
100 ms. As you increase the timeout to over 320 ms, a warning should be
printed once it passes 100 ms, so that people know to talk to the firmware folks.
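Paul's suggestion amounts to keeping the longer bound while reporting when the wait becomes unusually long. Each poll waits 50 µs, so the worst-case wait is the iteration count times 50 µs. The patch raises that count from `E1000_GEN_POLL_TIMEOUT * 3` to `E1000_GEN_POLL_TIMEOUT * 10`. The sketch below models the loop with a hypothetical register-read callback and a simulated clock instead of `udelay()` and `er32(MDIC)`. `DEMO_POLL_LIMIT` (6400 iterations, a 320 ms worst case picked only to echo the figure above) and `DEMO_READY_BIT` are assumptions. The 100 ms threshold follows Paul's description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_POLL_DELAY_US	50u		/* per-iteration delay, as in the patch */
#define DEMO_POLL_LIMIT		6400u		/* hypothetical bound: 6400 * 50 us = 320 ms */
#define DEMO_WARN_US		100000u		/* warn once the wait passes 100 ms */
#define DEMO_READY_BIT		(1u << 28)	/* hypothetical ready flag */

/* Callback standing in for er32(MDIC); ctx carries simulated device state. */
typedef uint32_t (*demo_read_fn)(void *ctx);

/* Poll until the ready bit is set. Returns the elapsed simulated time in
 * microseconds, or -1 on timeout. Warns once after the 100 ms mark. */
long demo_poll_ready(demo_read_fn read_reg, void *ctx)
{
	uint32_t elapsed_us = 0;
	bool warned = false;

	assert(read_reg != NULL);
	for (uint32_t i = 0; i < DEMO_POLL_LIMIT; i++) {
		elapsed_us += DEMO_POLL_DELAY_US;	/* models udelay(50) */
		if (read_reg(ctx) & DEMO_READY_BIT)
			return (long)elapsed_us;
		if (!warned && elapsed_us > DEMO_WARN_US) {
			fprintf(stderr, "MDIC still busy after %u us\n",
				(unsigned)elapsed_us);
			warned = true;
		}
	}
	return -1;
}

/* Simulated device that becomes ready on the ready_after-th read. */
struct demo_dev {
	uint32_t reads;
	uint32_t ready_after;
};

uint32_t demo_read(void *ctx)
{
	struct demo_dev *dev = ctx;

	return ++dev->reads >= dev->ready_after ? DEMO_READY_BIT : 0;
}
```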



Kind regards,

Paul

Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-24 Thread Greg Kroah-Hartman

On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> From stable rc 4.18.1 onwards to today's stable rc 4.19.147
> 
> There are two problems while running LTP tracing tests
> 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> 2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
> Please refer to the full test logs from below links.
> 
> The first bad commit found by git bisect.
>commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
>tracing: Centralize preemptirq tracepoints and unify their usage
> 
> Reported-by: Naresh Kamboju 

So is this also reproducible in 5.4 and Linus's tree right now?

Or are newer kernels working fine?

thanks,

greg k-h

Re: [PATCH v3 1/1] PCI/ERR: Fix reset logic in pcie_do_recovery() call

2020-09-24 Thread Kuppuswamy, Sathyanarayanan





On 9/24/20 1:52 PM, Sinan Kaya wrote:

On 9/24/2020 12:06 AM, Kuppuswamy, Sathyanarayanan wrote:

For the problem description, please check the following details.

The current pcie_do_recovery() implementation has the following two issues:

1. Fatal (DPC) error recovery is currently broken for non-hotplug
capable devices. The current fatal error recovery implementation relies
on the PCIe hotplug (pciehp) handler for detaching and re-enumerating
the affected devices/drivers. The pciehp handler listens for DLLSC state
changes and handles device/driver detachment on a DLLSC_LINK_DOWN event
and re-enumeration on a DLLSC_LINK_UP event. So when dealing with
non-hotplug capable devices, the recovery code does not restore the state
of the affected devices correctly. A correct implementation should
restore the device state and call report_slot_reset() after
resetting the link to restore the state of the device/driver.



So, this is a matter of moving the save/restore logic from the hotplug
driver into common code so that DPC slot reset takes advantage of it?

We are not moving it out of the hotplug path, but fixing it in this code path.
With this fix, we will not depend on the hotplug driver to restore the state.


If that's the direction we are going, this is a fine change IMO.


You can find fatal non-hotplug related issues reported in the following links:

https://lore.kernel.org/linux-pci/20200527083130.4137-1-zhiqiang@nxp.com/

https://lore.kernel.org/linux-pci/12115.1588207324@famine/
https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.158584.git.sathyanarayanan.kuppusw...@linux.intel.com/


2. For non-fatal errors, if the report_error_detected() or
report_mmio_enabled() functions request PCI_ERS_RESULT_NEED_RESET, the
current pcie_do_recovery() implementation does not do the requested
explicit device reset; instead, it just calls report_slot_reset() on all
affected devices. Notifying about the reset via report_slot_reset()
without doing the actual device reset is incorrect.



This makes sense too. There seems to be an ordering issue per your
description.

Yes



To fix the above issues, use PCI_ERS_RESULT_NEED_RESET as the error state after
a successful reset_link() operation. This will ensure ->slot_reset() is
called after the reset_link() operation for fatal errors.


You lost me here. Why do we want to do a secondary bus reset on top of
the DPC reset?

For non-hotplug capable slots, when reset (PCI_ERS_RESULT_NEED_RESET) is
requested, we want to reset it before calling ->slot_reset() callback.



Also call
pci_bus_reset() to do a slot/bus reset before triggering the device-specific
->slot_reset() callback. Using pci_bus_reset() will also restore the state
of the devices after performing the reset operation.

Even though using pci_bus_reset() will do a redundant reset operation after
->reset_link() for fatal errors, it should not affect the functional
behavior.
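The proposed ordering can be summarized as a small state machine: for fatal errors, reset the link and then treat the result as "needs reset"; whenever a reset is needed, perform a bus reset and only then notify drivers through `->slot_reset()`. This is a simplified, hypothetical sketch. The enum values are modeled loosely on `PCI_ERS_RESULT_*`, the steps merely log the order in which they run, and other recovery steps (such as mmio_enabled and resume) are omitted. It is not the actual `pcie_do_recovery()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for a few PCI_ERS_RESULT_* values. */
enum demo_ers_result {
	DEMO_ERS_CAN_RECOVER,
	DEMO_ERS_NEED_RESET,
	DEMO_ERS_RECOVERED,
};

/* Records the order of recovery steps, one letter per step:
 * L = reset_link, B = slot/bus reset, S = ->slot_reset() notification. */
struct demo_log {
	char steps[8];
	int n;
};

void demo_log_init(struct demo_log *log)
{
	memset(log, 0, sizeof(*log));
}

static void demo_step(struct demo_log *log, char c)
{
	assert(log->n < (int)sizeof(log->steps) - 1);
	log->steps[log->n++] = c;
	log->steps[log->n] = '\0';
}

/* fatal: the error was fatal and the link had to be reset (e.g. after DPC).
 * driver_vote: what the drivers' error_detected/mmio_enabled returned. */
enum demo_ers_result demo_do_recovery(bool fatal,
				      enum demo_ers_result driver_vote,
				      struct demo_log *log)
{
	enum demo_ers_result state = driver_vote;

	if (fatal) {
		demo_step(log, 'L');		/* reset_link() */
		state = DEMO_ERS_NEED_RESET;	/* proposed: then require a reset */
	}

	if (state == DEMO_ERS_NEED_RESET) {
		demo_step(log, 'B');		/* actual reset, restores state */
		demo_step(log, 'S');		/* only then ->slot_reset() */
		state = DEMO_ERS_RECOVERED;
	}
	return state;
}
```

The second case in the usage (a non-fatal error where a driver votes for a reset) corresponds to issue 2: the bus reset now happens before `->slot_reset()` is called instead of being skipped.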




--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

Re: [PATCH v3 2/3] misc: bcm-vk: add Broadcom VK driver

2020-09-24 Thread Greg Kroah-Hartman

On Thu, Sep 24, 2020 at 02:40:08PM -0700, Scott Branden wrote:
> > Ugh, yes, it is uncommon because those are two different things.  Why do
> > you need/want a misc driver to control a tty device?  Why do you need a
> > tty device?  What really is this beast?
> The beast consists of a PCI card.  The PCI card communicates to the host via 
> shared memory in BAR space and MSIX interrupts.
> The host communicates through shared memory in BAR space and generates 
> mailbox interrupts.

That describes any PCI card :)

> In addition, the PCI card has DMA access to host memory to access data
> for processing.
> The misc driver handles these operations.  Multiple user space processes
> access the misc device at the same time to perform simultaneous offload
> operations.  Each process opens the device, sends multiple commands with
> write operations, and receives solicited and unsolicited responses with
> read operations.  In addition, there are sysfs entries to collect
> statistics and errors.
> Ioctl operations are present for loading new firmware images to the card
> and for reset operations.
> 
> In addition, the card has multiple physical UART connections to access
> consoles on multiple processors on the card.
> But in a real system, there is no serial cable connected from the card to
> the server.  And there could be 16 PCIe cards plugged into a server, which
> would make it even more unrealistic to access these consoles via a UART.
> 
> Fortunately, the data sent out to each physical UART exists in a circular
> buffer.  These circular buffers can be accessed by the host in BAR space.
> So we added tty device nodes per UART (2 per PCIe card).  These are not
> the misc device node; these are separate tty device nodes that are opened
> and behave like tty devices.
> 
> We are not using the misc driver to control tty.
>   1) The PCI driver is the foundation, which provides the physical
> interface to the card; 2) the misc driver is a presentation to user space
> for its regular messages - this allows users to treat it as a char device,
> so it's like fopen/read/write/close etc.; and 3) tty is an additional
> debug channel which could be on or off.
> 
> Please suggest how we can access the misc device and tty device nodes other 
> than how we have implemented it in a single driver?
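The console path Scott describes, in which UART output is mirrored into a circular buffer that the host reads through BAR space, is a classic single-producer, single-consumer ring. The sketch below shows both sides in plain C under stated assumptions. The `demo_ring` layout, the free-running 32-bit write/read counters, and the power-of-two size are all hypothetical and not taken from the bcm-vk driver. A real driver would read the card's memory with MMIO accessors rather than plain loads and would need appropriate memory ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 16u	/* hypothetical ring size */

/* Free-running counters only index correctly across wraparound when the
 * size divides 2^32, i.e. is a power of two. */
_Static_assert((DEMO_RING_SIZE & (DEMO_RING_SIZE - 1)) == 0,
	       "ring size must be a power of two");

/* Hypothetical ring layout: the card advances 'wr', the host advances 'rd'.
 * Both counters run freely and are reduced modulo the size on access. */
struct demo_ring {
	uint32_t wr;
	uint32_t rd;
	char data[DEMO_RING_SIZE];
};

/* Producer side, done by the card's firmware in the real system.
 * Returns false (and stores nothing) when the ring is full. */
bool demo_ring_put(struct demo_ring *r, char c)
{
	assert(r != NULL);
	if (r->wr - r->rd == DEMO_RING_SIZE)
		return false;
	r->data[r->wr % DEMO_RING_SIZE] = c;
	r->wr++;
	return true;
}

/* Consumer side (host): copy up to 'len' pending bytes into 'out' and
 * return how many were copied. */
size_t demo_ring_drain(struct demo_ring *r, char *out, size_t len)
{
	size_t n = 0;

	assert(r != NULL && out != NULL);
	while (r->rd != r->wr && n < len) {
		out[n++] = r->data[r->rd % DEMO_RING_SIZE];
		r->rd++;
	}
	return n;
}
```

In this model, a tty node per UART would call the drain side from its read path, so no physical serial cable is involved.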

You haven't described what the card actually is for.  You describe how
you have tried to hook it up to userspace, but what is the use-case
here?  Why even have a kernel driver at all and not do everything in a
uio driver?

Creating random misc devices and tty devices implies that the device
does not fit into any of the different existing kernel apis, so you need
to write your own "special" one, which is always worrying.

Without actually knowing what this device is, it's hard to judge if you
really should be using something existing or not.

thanks,

greg k-h

[PATCH v3 7/7] ARM: dts: imx50-kobo-aura: Add Netronix embedded controller

2020-09-24 Thread Jonathan Neuschäfer

Enable the Netronix EC on the Kobo Aura ebook reader.

Several features are still missing:
 - Frontlight/backlight. The vendor kernel drives the frontlight LED
   using the PWM output of the EC and an additional boost pin that
   increases the brightness.
 - Battery monitoring
 - Interrupts for RTC alarm and low-battery events

Signed-off-by: Jonathan Neuschäfer 
---

v3:
- Remove interrupt-controller property from embedded-controller node
- Move the subnodes of the embedded-controller node into the main node

v2:
- https://lore.kernel.org/lkml/20200905144503.1067124-3-j.neuschae...@gmx.net/
- Fix pwm-cells property (should be 2, not 1)
---
 arch/arm/boot/dts/imx50-kobo-aura.dts | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/imx50-kobo-aura.dts b/arch/arm/boot/dts/imx50-kobo-aura.dts
index a0eaf869b9135..2d1a59091a37c 100644
--- a/arch/arm/boot/dts/imx50-kobo-aura.dts
+++ b/arch/arm/boot/dts/imx50-kobo-aura.dts
@@ -6,6 +6,7 @@
 /dts-v1/;
 #include "imx50.dtsi"
 #include 
+#include 

 / {
model = "Kobo Aura (N514)";
@@ -135,10 +136,24 @@  {
pinctrl-0 = <&pinctrl_i2c3>;
status = "okay";

-   /* TODO: embedded controller at 0x43 */
+   embedded-controller@43 {
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_ec>;
+   compatible = "netronix,ntxec";
+   reg = <0x43>;
+   system-power-controller;
+   interrupts-extended = <&gpio4 11 IRQ_TYPE_EDGE_FALLING>;
+   #pwm-cells = <2>;
+   };
 };

  {
+   pinctrl_ec: ec {
+   fsl,pins = <
+   MX50_PAD_CSPI_SS0__GPIO4_11 0x0 /* INT */
+   >;
+   };
+
pinctrl_gpiokeys: gpiokeys {
fsl,pins = <
MX50_PAD_CSPI_MISO__GPIO4_10 0x0
--
2.28.0

[PATCH v2] kernel/kthread.c: kthread_worker: add work status check in timer_fn

2020-09-24 Thread qiang.zhang

From: Zqiang 

When delayed work is queued to a worker, the timer_fn is called at some
later point and adds the work to the worker's work_list. By that time the
work may have been canceled, so add a "work->canceling" check of the
current work status.

Signed-off-by: Zqiang 
---
 v1->v2:
 Change description information.

 kernel/kthread.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3edaa380dc7b..85a2c9b32049 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -897,7 +897,8 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
/* Move the work from worker->delayed_work_list. */
WARN_ON_ONCE(list_empty(&work->node));
list_del_init(&work->node);
-   kthread_insert_work(worker, work, &worker->work_list);
+   if (!work->canceling)
+   kthread_insert_work(worker, work, &worker->work_list);
 
raw_spin_unlock_irqrestore(&worker->lock, flags);
 }
-- 
2.17.1
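The race the patch closes can be modeled without threads. The timer callback moves a delayed work item from the delayed list to the worker's run list, and the proposed check skips the second step when a cancel is already in progress. The sketch below is a single-threaded, simplified model that uses hypothetical types and boolean flags in place of the kernel's list heads and `worker->lock`. It shows only the decision the patch adds, not the real `kthread_delayed_work_timer_fn()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened model of a kthread delayed work item. */
struct demo_work {
	bool on_delayed_list;	/* stands in for work->node on delayed_work_list */
	bool on_work_list;	/* stands in for work->node on worker->work_list */
	int canceling;		/* nonzero while a cancel is in progress */
};

/* Model of the timer callback, which runs under the worker lock in the
 * real code. Returns true if the work was queued for execution. */
bool demo_delayed_work_timer_fn(struct demo_work *work)
{
	assert(work != NULL);

	/* Move the work off the delayed list unconditionally... */
	assert(work->on_delayed_list);
	work->on_delayed_list = false;

	/* ...but queue it only if nobody is canceling it (the added check). */
	if (work->canceling)
		return false;
	work->on_work_list = true;
	return true;
}
```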

[PATCH] kernel/kthread.c: kthread_worker: add work status check in timer_fn

2020-09-24 Thread qiang.zhang

From: Zqiang 

When delayed work is queued to a worker, the timer_fn is called at some
later point and adds the work to the worker's work_list. By that time the
work may have been canceled, so add a "queuing_blocked" check of the
current work status.

Signed-off-by: Zqiang 
---
 kernel/kthread.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3edaa380dc7b..85a2c9b32049 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -897,7 +897,8 @@ void kthread_delayed_work_timer_fn(struct timer_list *t)
/* Move the work from worker->delayed_work_list. */
WARN_ON_ONCE(list_empty(&work->node));
list_del_init(&work->node);
-   kthread_insert_work(worker, work, &worker->work_list);
+   if (!work->canceling)
+   kthread_insert_work(worker, work, &worker->work_list);
 
raw_spin_unlock_irqrestore(&worker->lock, flags);
 }
-- 
2.17.1

Re: [PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020-09-24 Thread Joonsoo Kim

On Fri, Aug 28, 2020 at 8:54 AM Joonsoo Kim wrote:
>
> On Thu, Aug 27, 2020 at 10:35 PM Mel Gorman wrote:
> >
> > On Wed, Aug 26, 2020 at 02:12:44PM +0900, Joonsoo Kim wrote:
> > > > > And, it requires breaking the current code
> > > > > layering in which order-0 pages are always handled by the pcplist. I'd prefer
> > > > > to avoid that, so this patch uses a different way to skip CMA page allocation
> > > > > from the pcplist.
> > > >
> > > > Well, it would be much simpler and won't affect most allocations. Better than
> > > > flushing pcplists IMHO.
> > >
> > > Hmm...Still, I'd prefer my approach.
> >
> > I prefer the pcp bypass approach. It's simpler and it does not incur a
> > pcp drain/refill penalty.
> >
> > > There are two reasons. First, the
> > > layering problem mentioned above. In rmqueue(), there is code for
> > > MIGRATE_HIGHATOMIC. As the name shows, it's for high-order atomic
> > > allocation. But after skipping pcplist allocation as you suggested,
> > > we could get there with an order-0 request.
> >
> > I guess your concern is that under some circumstances that a request that
> > passes a watermark check could fail due to a highatomic reserve and to
> > an extent this is true. However, in that case the system is already low
> > on memory depending on the allocation context, the pcp lists may get
> > flushed anyway.
>
> My concern is that non-highorder (order-0) allocation could pollute/use the
> MIGRATE_HIGHATOMIC pageblock. It's reserved for highorder atomic
> allocation so it's not good if an order-0 request could get there. It would
> cause more fragmentation on that pageblock.
>
> > > We can also
> > > change this code, but I'd like to keep the current layering. Second,
> > > a performance reason. After the nocma flag is set, a burst of nocma
> > > allocations could come. After flushing the pcplist once, we can use
> > > the free pages on the pcplist as usual until the context changes.
> >
> > It's not guaranteed because CMA pages could be freed between the nocma save
> > and restore triggering further drains due to a reschedule.  Similarly,
> > a CMA allocation in parallel could refill with CMA pages on the per-cpu
> > list. While both cases are unlikely, it's more unpredictable than a
> > straight-forward pcp bypass.
>
> Agreed that it's less predictable than the pcp bypass. But, as you said,
> those cases would be rare.
>
> > I don't really see it as a layering violation of the API because all
> > order-0 pages go through the PCP lists. The fact that order-0 is serviced
> > from the pcp list is an internal implementation detail, the API doesn't
> > care.
>
> What I mean is an internal implementation layering violation. We could make
> a rule even in the internal implementation to make the code simpler and more
> maintainable. I guess that order-0 being serviced from the pcp list is one of those.
>
> Anyway, although I prefer my approach, I'm okay with using pcp bypass.

Hello, Andrew and Vlastimil.

It would be better to fix this possible bug, introduced in v5.9-rc1, before
v5.9 is released.
Which approach do you prefer?
Once that is decided, I will immediately send a patch as you suggest.

Thanks.
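The two options under discussion differ in where the check lives. With the pcp bypass that Mel prefers, an order-0 request made while CMA is disallowed skips the per-CPU list and goes straight to the buddy allocator, which can then avoid CMA pageblocks. Joonsoo's approach instead keeps using the pcplist and flushes it so that CMA pages do not remain there. The sketch below models only the bypass decision. The flag, the two sources, and all names are hypothetical and do not reproduce `rmqueue()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which path serves an allocation in this model. */
enum demo_source {
	DEMO_FROM_PCPLIST,	/* per-CPU free list, may hold CMA pages */
	DEMO_FROM_BUDDY,	/* buddy allocator, can skip CMA pageblocks */
};

/* Hypothetical allocation context; 'nocma' mirrors the effect of being
 * inside a memalloc_nocma_save()/memalloc_nocma_restore() section. */
struct demo_alloc_ctx {
	unsigned int order;
	bool nocma;
};

/* Pcp-bypass rule: only order-0 requests use the per-CPU list, and only
 * when CMA pages are acceptable; everything else goes to the buddy path. */
enum demo_source demo_pick_source(const struct demo_alloc_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->order == 0 && !ctx->nocma)
		return DEMO_FROM_PCPLIST;
	return DEMO_FROM_BUDDY;
}
```

The second case in the usage is where Joonsoo's layering concern shows up: with the bypass, an order-0 request can reach the buddy path, including code written with high-order atomic reserves in mind.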

Re: [PATCH] KVM: SVM: Mark SEV launch secret pages as dirty.

2020-09-24 Thread Greg KH

On Thu, Sep 24, 2020 at 07:00:11PM -0700, Cfir Cohen wrote:
> The LAUNCH_SECRET command performs encryption of the
> launch secret memory contents. Mark pinned pages as
> dirty, before unpinning them.
> This matches the logic in sev_launch_update_data().
> 
> Fixes: 9c5e0afaf157 ("KVM: SVM: Add support for SEV LAUNCH_SECRET command")
> Signed-off-by: Cfir Cohen 
> ---
> Changelog since v2:
>  - Added 'Fixes' tag, updated comments.
> Changelog since v1:
>  - Updated commit message.
> 
>  arch/x86/kvm/svm/sev.c | 26 +-
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

[PATCH 5/9] fs: remove various compat readv/writev helpers

2020-09-24 Thread Christoph Hellwig

Now that import_iovec handles compat iovecs as well, the duplicated code
in the compat readv/writev helpers is no longer needed.  Remove them
and switch the compat syscall handlers to use the native helpers.

Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c| 179 +++--
 include/linux/compat.h |  20 ++---
 2 files changed, 40 insertions(+), 159 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 0a68037580b455..eab427b7cc0a3f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1068,226 +1068,107 @@ SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
return do_pwritev(fd, vec, vlen, pos, flags);
 }
 
+/*
+ * Various compat syscalls.  Note that they all pretend to take a native
+ * iovec - import_iovec will properly treat those as compat_iovecs based on
+ * in_compat_syscall().
+ */
 #ifdef CONFIG_COMPAT
-static size_t compat_readv(struct file *file,
-  const struct compat_iovec __user *vec,
-  unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *iov = iovstack;
-   struct iov_iter iter;
-   ssize_t ret;
-
-   ret = import_iovec(READ, (const struct iovec __user *)vec, vlen,
-  UIO_FASTIOV, &iov, &iter);
-   if (ret >= 0) {
-   ret = do_iter_read(file, &iter, pos, flags);
-   kfree(iov);
-   }
-   if (ret > 0)
-   add_rchar(current, ret);
-   inc_syscr(current);
-   return ret;
-}
-
-static size_t do_compat_readv(compat_ulong_t fd,
-const struct compat_iovec __user *vec,
-compat_ulong_t vlen, rwf_t flags)
-{
-   struct fd f = fdget_pos(fd);
-   ssize_t ret;
-   loff_t pos;
-
-   if (!f.file)
-   return -EBADF;
-   pos = f.file->f_pos;
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   if (ret >= 0)
-   f.file->f_pos = pos;
-   fdput_pos(f);
-   return ret;
-
-}
-
 COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen)
 {
-   return do_compat_readv(fd, vec, vlen, 0);
-}
-
-static long do_compat_preadv64(unsigned long fd,
- const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos, rwf_t flags)
-{
-   struct fd f;
-   ssize_t ret;
-
-   if (pos < 0)
-   return -EINVAL;
-   f = fdget(fd);
-   if (!f.file)
-   return -EBADF;
-   ret = -ESPIPE;
-   if (f.file->f_mode & FMODE_PREAD)
-   ret = compat_readv(f.file, vec, vlen, &pos, flags);
-   fdput(f);
-   return ret;
+   return do_readv(fd, vec, vlen, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos)
 {
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-   return do_compat_preadv64(fd, vec, vlen, pos, 0);
+   return do_preadv(fd, vec, vlen, pos, 0);
 }
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
 COMPAT_SYSCALL_DEFINE5(preadv64v2, unsigned long, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
unsigned long, vlen, loff_t, pos, rwf_t, flags)
 {
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
+   return do_readv(fd, vec, vlen, flags);
+   return do_preadv(fd, vec, vlen, pos, flags);
 }
 #endif
 
 COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
-   const struct compat_iovec __user *,vec,
+   const struct iovec __user *, vec,
compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
rwf_t, flags)
 {
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
if (pos == -1)
-   return do_compat_readv(fd, vec, vlen, flags);
-
-   return do_compat_preadv64(fd, vec, vlen, pos, flags);
-}
-
-static size_t compat_writev(struct file *file,
-   const struct compat_iovec __user *vec,
-   unsigned long vlen, loff_t *pos, rwf_t flags)
-{
-   struct iovec iovstack[UIO_FASTIOV];
-   struct iovec *iov =

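
The compat preadv/preadv2 handlers in the patch above receive the 64-bit file offset split into pos_low and pos_high and rebuild it as ((loff_t)pos_high << 32) | pos_low. The following is only a userspace sketch of that arithmetic: combine_pos() is a hypothetical helper, int64_t stands in for loff_t, and the shift is done on an unsigned type so that it stays well defined in standard C. It also shows why 0xffffffff in both halves yields the -1 sentinel that preadv2 checks before falling back to do_readv().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuild a 64-bit offset from two 32-bit halves,
 * as the compat preadv/preadv2 entry points do.  int64_t stands in for
 * loff_t; shifting a uint64_t avoids signed-shift overflow in ISO C. */
static int64_t combine_pos(uint32_t pos_low, uint32_t pos_high)
{
	uint64_t pos = ((uint64_t)pos_high << 32) | pos_low;

	/* Out-of-range conversion to int64_t is implementation-defined;
	 * GCC and Clang use two's complement, like loff_t in the kernel. */
	return (int64_t)pos;
}
```

For example, combine_pos(2, 1) gives 0x100000002, and combine_pos(0xffffffff, 0xffffffff) gives -1.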
[PATCH 9/9] security/keys: remove compat_keyctl_instantiate_key_iov

2020-09-24 Thread Christoph Hellwig

Now that import_iovec handles compat iovecs, the native version of
keyctl_instantiate_key_iov can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 security/keys/compat.c   | 36 ++--
 security/keys/internal.h |  5 -
 security/keys/keyctl.c   |  2 +-
 3 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/security/keys/compat.c b/security/keys/compat.c
index 7ae531db031cf8..1545efdca56227 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -11,38 +11,6 @@
 #include 
 #include "internal.h"
 
-/*
- * Instantiate a key with the specified compatibility multipart payload and
- * link the key into the destination keyring if one is given.
- *
- * The caller must have the appropriate instantiation permit set for this to
- * work (see keyctl_assume_authority).  No other permissions are required.
- *
- * If successful, 0 will be returned.
- */
-static long compat_keyctl_instantiate_key_iov(
-   key_serial_t id,
-   const struct compat_iovec __user *_payload_iov,
-   unsigned ioc,
-   key_serial_t ringid)
-{
-   struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
-   struct iov_iter from;
-   long ret;
-
-   if (!_payload_iov)
-   ioc = 0;
-
-   ret = import_iovec(WRITE, (const struct iovec __user *)_payload_iov,
-  ioc, ARRAY_SIZE(iovstack), &iov, &from);
-   if (ret < 0)
-   return ret;
-
-   ret = keyctl_instantiate_key_common(id, &from, ringid);
-   kfree(iov);
-   return ret;
-}
-
 /*
  * The key control system call, 32-bit compatibility version for 64-bit archs
  */
@@ -113,8 +81,8 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
return keyctl_reject_key(arg2, arg3, arg4, arg5);
 
case KEYCTL_INSTANTIATE_IOV:
-   return compat_keyctl_instantiate_key_iov(
-   arg2, compat_ptr(arg3), arg4, arg5);
+   return keyctl_instantiate_key_iov(arg2, compat_ptr(arg3), arg4,
+ arg5);
 
case KEYCTL_INVALIDATE:
return keyctl_invalidate_key(arg2);
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 338a526cbfa516..9b9cf3b6fcbb4d 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -262,11 +262,6 @@ extern long keyctl_instantiate_key_iov(key_serial_t,
   const struct iovec __user *,
   unsigned, key_serial_t);
 extern long keyctl_invalidate_key(key_serial_t);
-
-struct iov_iter;
-extern long keyctl_instantiate_key_common(key_serial_t,
- struct iov_iter *,
- key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
const char __user *_type,
const char __user *_restriction);
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 9febd37a168fd0..e26bbccda7ccee 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -1164,7 +1164,7 @@ static int keyctl_change_reqkey_auth(struct key *key)
  *
  * If successful, 0 will be returned.
  */
-long keyctl_instantiate_key_common(key_serial_t id,
+static long keyctl_instantiate_key_common(key_serial_t id,
   struct iov_iter *from,
   key_serial_t ringid)
 {
-- 
2.28.0

[PATCH 7/9] fs: remove compat_sys_vmsplice

2020-09-24 Thread Christoph Hellwig

Now that import_iovec handles compat iovecs, the native vmsplice syscall
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  2 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  2 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  2 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  2 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
 arch/s390/kernel/syscalls/syscall.tbl |  2 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  2 +-
 arch/x86/entry/syscall_x32.c  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl|  2 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 fs/splice.c   | 57 +--
 include/linux/compat.h|  4 --
 include/uapi/asm-generic/unistd.h |  2 +-
 tools/include/uapi/asm-generic/unistd.h   |  2 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  2 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  2 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
 17 files changed, 28 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 4a236493dca5b9..11dfae3a8563bd 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -697,7 +697,7 @@ __SYSCALL(__NR_sync_file_range2, compat_sys_aarch32_sync_file_range2)
 #define __NR_tee 342
 __SYSCALL(__NR_tee, sys_tee)
 #define __NR_vmsplice 343
-__SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages 344
 __SYSCALL(__NR_move_pages, compat_sys_move_pages)
 #define __NR_getcpu 345
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index c99a92646f8ee9..5a39d4de0ac85b 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -278,7 +278,7 @@
 267  n32  splice           sys_splice
 268  n32  sync_file_range  sys_sync_file_range
 269  n32  tee              sys_tee
-270  n32  vmsplice         compat_sys_vmsplice
+270  n32  vmsplice         sys_vmsplice
 271  n32  move_pages       compat_sys_move_pages
 272  n32  set_robust_list  compat_sys_set_robust_list
 273  n32  get_robust_list  compat_sys_get_robust_list
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 075064d10661bf..136efc6b8c5444 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -318,7 +318,7 @@
 304  o32  splice           sys_splice
 305  o32  sync_file_range  sys_sync_file_range  sys32_sync_file_range
 306  o32  tee              sys_tee
-307  o32  vmsplice         sys_vmsplice         compat_sys_vmsplice
+307  o32  vmsplice         sys_vmsplice
 308  o32  move_pages       sys_move_pages       compat_sys_move_pages
 309  o32  set_robust_list  sys_set_robust_list  compat_sys_set_robust_list
 310  o32  get_robust_list  sys_get_robust_list  compat_sys_get_robust_list
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index 192abde0001d9d..a9e184192caedd 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -330,7 +330,7 @@
 292  32      sync_file_range  parisc_sync_file_range
 292  64      sync_file_range  sys_sync_file_range
 293  common  tee              sys_tee
-294  common  vmsplice         sys_vmsplice     compat_sys_vmsplice
+294  common  vmsplice         sys_vmsplice
 295  common  move_pages       sys_move_pages   compat_sys_move_pages
 296  common  getcpu           sys_getcpu
 297  common  epoll_pwait      sys_epoll_pwait  compat_sys_epoll_pwait
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 6f1e2ecf0edad9..0d4985919ca34d 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -369,7 +369,7 @@
 282  common  unshare   sys_unshare
 283  common  splice    sys_splice
 284  common  tee       sys_tee
-285  common  vmsplice  sys_vmsplice  compat_sys_vmsplice
+285  common  vmsplice  sys_vmsplice
 286  common  openat    sys_openat    compat_sys_openat
 287  common

[PATCH 4/9] iov_iter: transparently handle compat iovecs in import_iovec

2020-09-24 Thread Christoph Hellwig

Use in_compat_syscall() to import either native or compat iovecs, and
remove the now superfluous compat_import_iovec.

This removes the need for special compat logic in most callers, and
the remaining ones can still be simplified by using __import_iovec
with a bool compat parameter.
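
A rough userspace model of the behaviour described above follows. It assumes a 32-bit compat layout of two u32 fields, and every name in it (compat_iovec_model, current_is_compat, the model_* functions) is a hypothetical stand-in, not kernel API; a const void pointer replaces the kernel's const struct iovec __user * cast. The flag-taking variant reads whichever layout it is told to, while the plain variant consults the calling context instead of making each caller decide. Length and user-address validation is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical model of the 32-bit compat layout: both fields are 32 bits. */
struct compat_iovec_model {
	uint32_t iov_base;	/* 32-bit user pointer */
	uint32_t iov_len;
};

/* Hypothetical stand-in for in_compat_syscall(). */
static bool current_is_compat;

/* Plays the role of __import_iovec(..., compat): 'uvec' is read as compat
 * or native iovecs depending on the flag, widened into 'out', and the
 * total length is returned.  No validation is done in this sketch. */
static ssize_t model_import_iovec_flag(const void *uvec, unsigned int nr_segs,
				       struct iovec *out, bool compat)
{
	ssize_t total = 0;

	for (unsigned int i = 0; i < nr_segs; i++) {
		if (compat) {
			const struct compat_iovec_model *c = uvec;

			/* zero-extend the 32-bit pointer */
			out[i].iov_base = (void *)(uintptr_t)c[i].iov_base;
			out[i].iov_len = c[i].iov_len;
		} else {
			const struct iovec *n = uvec;

			out[i] = n[i];
		}
		total += (ssize_t)out[i].iov_len;
	}
	return total;
}

/* Plays the role of import_iovec(): the layout is chosen from the calling
 * context, not by each caller. */
static ssize_t model_import_iovec(const void *uvec, unsigned int nr_segs,
				  struct iovec *out)
{
	return model_import_iovec_flag(uvec, nr_segs, out, current_is_compat);
}
```

With the decision made inside the helper, callers such as sg_io() no longer need an #ifdef CONFIG_COMPAT branch of their own, which is what the hunks in this patch remove.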

Signed-off-by: Christoph Hellwig 
---
 block/scsi_ioctl.c | 12 ++--
 drivers/scsi/sg.c  |  9 +
 fs/aio.c   |  8 ++--
 fs/io_uring.c  | 20 
 fs/read_write.c|  6 --
 fs/splice.c|  2 +-
 include/linux/uio.h|  8 
 lib/iov_iter.c | 14 ++
 mm/process_vm_access.c |  3 ++-
 net/compat.c   |  4 ++--
 security/keys/compat.c |  5 ++---
 11 files changed, 26 insertions(+), 65 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index ef722f04f88a93..e08df86866ee5d 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -333,16 +333,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
struct iov_iter i;
struct iovec *iov = NULL;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   ret = compat_import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
-   else
-#endif
-   ret = import_iovec(rq_data_dir(rq),
-  hdr->dxferp, hdr->iovec_count,
-  0, &iov, &i);
+   ret = import_iovec(rq_data_dir(rq), hdr->dxferp,
+  hdr->iovec_count, 0, &iov, &i);
if (ret < 0)
goto out_free_cdb;
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 20472aaaf630a4..bfa8d77322d732 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1820,14 +1820,7 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
struct iovec *iov = NULL;
struct iov_iter i;
 
-#ifdef CONFIG_COMPAT
-   if (in_compat_syscall())
-   res = compat_import_iovec(rw, hp->dxferp, iov_count,
- 0, &iov, &i);
-   else
-#endif
-   res = import_iovec(rw, hp->dxferp, iov_count,
-  0, &iov, &i);
+   res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
if (res < 0)
return res;
 
diff --git a/fs/aio.c b/fs/aio.c
index d5ec303855669d..c45c20d875388c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1489,12 +1489,8 @@ static ssize_t aio_setup_rw(int rw, const struct iocb *iocb,
*iovec = NULL;
return ret;
}
-#ifdef CONFIG_COMPAT
-   if (compat)
-   return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
-   iter);
-#endif
-   return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
+
+   return __import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter, compat);
 }
 
 static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8b426aa29668cb..8c27dc28da182a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2852,13 +2852,8 @@ static ssize_t __io_import_iovec(int rw, struct io_kiocb *req,
return ret;
}
 
-#ifdef CONFIG_COMPAT
-   if (req->ctx->compat)
-   return compat_import_iovec(rw, buf, sqe_len, UIO_FASTIOV,
-   iovec, iter);
-#endif
-
-   return import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter);
+   return __import_iovec(rw, buf, sqe_len, UIO_FASTIOV, iovec, iter,
+ req->ctx->compat);
 }
 
 static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
@@ -4200,8 +4195,9 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
sr->len);
iomsg->iov = NULL;
} else {
-   ret = import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
-   &iomsg->iov, &iomsg->msg.msg_iter);
+   ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+&iomsg->iov, &iomsg->msg.msg_iter,
+false);
if (ret > 0)
ret = 0;
}
@@ -4241,9 +4237,9 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
sr->len = iomsg->iov[0].iov_len;
iomsg->iov = NULL;
} else {
-   ret = compat_import_iovec(READ, uiov, len, UIO_FASTIOV,
-   &iomsg->iov,
-   &iomsg->msg.msg_iter);
+   ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+  UIO_FASTIOV, &iomsg->iov,
+  &iomsg->msg.msg_iter,

[PATCH 8/9] mm: remove compat_process_vm_{readv,writev}

2020-09-24 Thread Christoph Hellwig

Now that import_iovec handles compat iovecs, the native syscalls
can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h |  4 +-
 arch/mips/kernel/syscalls/syscall_n32.tbl |  4 +-
 arch/mips/kernel/syscalls/syscall_o32.tbl |  4 +-
 arch/parisc/kernel/syscalls/syscall.tbl   |  4 +-
 arch/powerpc/kernel/syscalls/syscall.tbl  |  4 +-
 arch/s390/kernel/syscalls/syscall.tbl |  4 +-
 arch/sparc/kernel/syscalls/syscall.tbl|  4 +-
 arch/x86/entry/syscall_x32.c  |  2 +
 arch/x86/entry/syscalls/syscall_32.tbl|  4 +-
 arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 include/linux/compat.h|  8 ---
 include/uapi/asm-generic/unistd.h |  6 +-
 mm/process_vm_access.c| 69 ---
 tools/include/uapi/asm-generic/unistd.h   |  6 +-
 .../arch/powerpc/entry/syscalls/syscall.tbl   |  4 +-
 .../perf/arch/s390/entry/syscalls/syscall.tbl |  4 +-
 .../arch/x86/entry/syscalls/syscall_64.tbl|  4 +-
 17 files changed, 30 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 11dfae3a8563bd..0c280a05f699bf 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -763,9 +763,9 @@ __SYSCALL(__NR_sendmmsg, compat_sys_sendmmsg)
 #define __NR_setns 375
 __SYSCALL(__NR_setns, sys_setns)
 #define __NR_process_vm_readv 376
-__SYSCALL(__NR_process_vm_readv, compat_sys_process_vm_readv)
+__SYSCALL(__NR_process_vm_readv, sys_process_vm_readv)
 #define __NR_process_vm_writev 377
-__SYSCALL(__NR_process_vm_writev, compat_sys_process_vm_writev)
+__SYSCALL(__NR_process_vm_writev, sys_process_vm_writev)
 #define __NR_kcmp 378
 __SYSCALL(__NR_kcmp, sys_kcmp)
 #define __NR_finit_module 379
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 5a39d4de0ac85b..0bc2e0fcf1ee56 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -317,8 +317,8 @@
 306  n32  syncfs             sys_syncfs
 307  n32  sendmmsg           compat_sys_sendmmsg
 308  n32  setns              sys_setns
-309  n32  process_vm_readv   compat_sys_process_vm_readv
-310  n32  process_vm_writev  compat_sys_process_vm_writev
+309  n32  process_vm_readv   sys_process_vm_readv
+310  n32  process_vm_writev  sys_process_vm_writev
 311  n32  kcmp               sys_kcmp
 312  n32  finit_module       sys_finit_module
 313  n32  sched_setattr      sys_sched_setattr
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 136efc6b8c5444..b408c13b934296 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -356,8 +356,8 @@
 342  o32  syncfs             sys_syncfs
 343  o32  sendmmsg           sys_sendmmsg           compat_sys_sendmmsg
 344  o32  setns              sys_setns
-345  o32  process_vm_readv   sys_process_vm_readv   compat_sys_process_vm_readv
-346  o32  process_vm_writev  sys_process_vm_writev  compat_sys_process_vm_writev
+345  o32  process_vm_readv   sys_process_vm_readv
+346  o32  process_vm_writev  sys_process_vm_writev
 347  o32  kcmp               sys_kcmp
 348  o32  finit_module       sys_finit_module
 349  o32  sched_setattr      sys_sched_setattr
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index a9e184192caedd..2015a5124b78ad 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -372,8 +372,8 @@
 327  common  syncfs             sys_syncfs
 328  common  setns              sys_setns
 329  common  sendmmsg           sys_sendmmsg           compat_sys_sendmmsg
-330  common  process_vm_readv   sys_process_vm_readv   compat_sys_process_vm_readv
-331  common  process_vm_writev  sys_process_vm_writev  compat_sys_process_vm_writev
+330  common  process_vm_readv   sys_process_vm_readv
+331  common  process_vm_writev  sys_process_vm_writev
 332  common  kcmp               sys_kcmp
 333  common  finit_module       sys_finit_module
 334  common  sched_setattr      sys_sched_setattr
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 0d4985919ca34d..66a472aa635d3f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++

[PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c

2020-09-24 Thread Christoph Hellwig

From: David Laight 

This lets the compiler inline it into import_iovec() generating
much better code.
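
rw_copy_check_uvector(), whose kernel-doc moves with this patch, uses the caller's small on-stack array when nr_segs fits in fast_segs and only allocates otherwise. Callers in this thread, such as compat_readv() and compat_keyctl_instantiate_key_iov(), call kfree(iov) unconditionally after import_iovec(); that relies on import_iovec() setting iov to NULL when the on-stack array was used, since kfree(NULL) is a no-op. A minimal userspace sketch of that pattern, assuming malloc()/free() in place of kmalloc_array()/kfree(), with hypothetical function names and with MODEL_FASTIOV and MODEL_MAXIOV only playing the roles of UIO_FASTIOV and UIO_MAXIOV:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

#define MODEL_FASTIOV	8	/* plays the role of UIO_FASTIOV */
#define MODEL_MAXIOV	1024	/* plays the role of UIO_MAXIOV */

/* Copy nr_segs iovecs from 'src' into 'fast' if they fit, otherwise into a
 * newly allocated array.  Returns the array actually used, or NULL with
 * *err set to a negative errno value. */
static struct iovec *model_copy_iovecs(const struct iovec *src,
				       unsigned long nr_segs,
				       struct iovec *fast,
				       unsigned long fast_segs, int *err)
{
	struct iovec *iov = fast;

	*err = 0;
	if (nr_segs > MODEL_MAXIOV) {
		*err = -EINVAL;
		return NULL;
	}
	if (nr_segs > fast_segs) {
		iov = malloc(nr_segs * sizeof(*iov));
		if (!iov) {
			*err = -ENOMEM;
			return NULL;
		}
	}
	if (nr_segs)
		memcpy(iov, src, nr_segs * sizeof(*iov));
	return iov;
}

/* Hand-off rule: report NULL when the caller's own stack array was used,
 * so the caller can free() the result unconditionally. */
static struct iovec *model_freeable(struct iovec *used, struct iovec *fast)
{
	return used == fast ? NULL : used;
}
```

The design keeps the common small case allocation-free while giving callers a single, branch-free cleanup path.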

Signed-off-by: David Laight 
Signed-off-by: Christoph Hellwig 
---
 fs/read_write.c | 179 
 lib/iov_iter.c  | 176 +++
 2 files changed, 176 insertions(+), 179 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- * into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- * either @fast_pointer, a newly allocated kernel array, or NULL,
- * depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer)
-{
-   unsigned long seg;
-   ssize_t ret;
-   struct iovec *iov = fast_pointer;
-
-   /*
-* SuS says "The readv() function *may* fail if the iovcnt argument
-* was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-* traditionally returned zero for zero segments, so...
-*/
-   if (nr_segs == 0) {
-   ret = 0;
-   goto out;
-   }
-
-   /*
-* First get the "struct iovec" from user memory and
-* verify all the pointers
-*/
-   if (nr_segs > UIO_MAXIOV) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (nr_segs > fast_segs) {
-   iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-   if (iov == NULL) {
-   ret = -ENOMEM;
-   goto out;
-   }
-   }
-   if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-   ret = -EFAULT;
-   goto out;
-   }
-
-   /*
-* According to the Single Unix Specification we should return EINVAL
-* if an element length is < 0 when cast to ssize_t or if the
-* total length would overflow the ssize_t return value of the
-* system call.
-*
-* Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-* overflow case.
-*/
-   ret = 0;
-   for (seg = 0; seg < nr_segs; seg++) {
-   void __user *buf = iov[seg].iov_base;
-   ssize_t len = (ssize_t)iov[seg].iov_len;
-
-   /* see if we we're about to use an invalid len or if
-* it's about to overflow ssize_t */
-   if (len < 0) {
-   ret = -EINVAL;
-   goto out;
-   }
-   if (type >= 0
-   && unlikely(!access_ok(buf, len))) {
-   ret = -EFAULT;
-   goto out;
-   }
-   if (len > MAX_RW_COUNT - ret) {
-   len = MAX_RW_COUNT - ret;
-   iov[seg].iov_len = len;
-   }
-   ret += len;
-   }
-out:
-   *ret_pointer = iov;
-   return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-   const struct compat_iovec __user *uvector, unsigned long nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer,
-   struct iovec **ret_pointer)
-{
-   compat_ssize_t tot_len;
-   struct iovec *iov = *ret_pointer = fast_pointer;
-   ssize_t ret = 0;
-   int seg;
-
-   /*
-* SuS says

[PATCH 6/9] fs: remove the compat readv/writev syscalls

2020-09-24 Thread Christoph Hellwig

Now that import_iovec handles compat iovecs, the native readv and writev
syscalls can be used for the compat case as well.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/unistd32.h  |  4 ++--
 arch/mips/kernel/syscalls/syscall_n32.tbl  |  4 ++--
 arch/mips/kernel/syscalls/syscall_o32.tbl  |  4 ++--
 arch/parisc/kernel/syscalls/syscall.tbl|  4 ++--
 arch/powerpc/kernel/syscalls/syscall.tbl   |  4 ++--
 arch/s390/kernel/syscalls/syscall.tbl  |  4 ++--
 arch/sparc/kernel/syscalls/syscall.tbl |  4 ++--
 arch/x86/entry/syscall_x32.c   |  2 ++
 arch/x86/entry/syscalls/syscall_32.tbl |  4 ++--
 arch/x86/entry/syscalls/syscall_64.tbl |  4 ++--
 fs/read_write.c| 14 --
 include/linux/compat.h |  4 
 include/uapi/asm-generic/unistd.h  |  4 ++--
 tools/include/uapi/asm-generic/unistd.h|  4 ++--
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |  4 ++--
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|  4 ++--
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |  4 ++--
 17 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 734860ac7cf9d5..4a236493dca5b9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -301,9 +301,9 @@ __SYSCALL(__NR_flock, sys_flock)
 #define __NR_msync 144
 __SYSCALL(__NR_msync, sys_msync)
 #define __NR_readv 145
-__SYSCALL(__NR_readv, compat_sys_readv)
+__SYSCALL(__NR_readv, sys_readv)
 #define __NR_writev 146
-__SYSCALL(__NR_writev, compat_sys_writev)
+__SYSCALL(__NR_writev, sys_writev)
 #define __NR_getsid 147
 __SYSCALL(__NR_getsid, sys_getsid)
 #define __NR_fdatasync 148
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index f9df9edb67a407..c99a92646f8ee9 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -25,8 +25,8 @@
 15  n32  ioctl       compat_sys_ioctl
 16  n32  pread64     sys_pread64
 17  n32  pwrite64    sys_pwrite64
-18  n32  readv       compat_sys_readv
-19  n32  writev      compat_sys_writev
+18  n32  readv       sys_readv
+19  n32  writev      sys_writev
 20  n32  access      sys_access
 21  n32  pipe        sysm_pipe
 22  n32  _newselect  compat_sys_select
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 195b43cf27c848..075064d10661bf 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -156,8 +156,8 @@
 142  o32  _newselect  sys_select      compat_sys_select
 143  o32  flock       sys_flock
 144  o32  msync       sys_msync
-145  o32  readv       sys_readv       compat_sys_readv
-146  o32  writev      sys_writev      compat_sys_writev
+145  o32  readv       sys_readv
+146  o32  writev      sys_writev
 147  o32  cacheflush  sys_cacheflush
 148  o32  cachectl    sys_cachectl
 149  o32  sysmips     __sys_sysmips
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index def64d221cd4fb..192abde0001d9d 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -159,8 +159,8 @@
 142  common  _newselect  sys_select     compat_sys_select
 143  common  flock       sys_flock
 144  common  msync       sys_msync
-145  common  readv       sys_readv      compat_sys_readv
-146  common  writev      sys_writev     compat_sys_writev
+145  common  readv       sys_readv
+146  common  writev      sys_writev
 147  common  getsid      sys_getsid
 148  common  fdatasync   sys_fdatasync
 149  common  _sysctl     sys_ni_syscall
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index c2d737ff2e7bec..6f1e2ecf0edad9 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -193,8 +193,8 @@
 142  common  _newselect  sys_select  compat_sys_select
 143  common

[PATCH 3/9] iov_iter: refactor rw_copy_check_uvector and import_iovec

2020-09-24 Thread Christoph Hellwig

Split rw_copy_check_uvector into two new helpers with more sensible
calling conventions:

 - iovec_from_user copies an array of iovecs from userspace, either into
   the provided stack buffer if it fits or into a newly allocated buffer,
   and returns the array actually used.  It also verifies that each
   iov_len fits in a signed type, and handles compat iovecs if the compat
   flag is set.
 - __import_iovec consolidates the native and compat versions of
   import_iovec.  It calls iovec_from_user, then validates that each iovec
   actually points to user addresses, and ensures the total length
   doesn't overflow.

This has two major implications:

 - the access_process_vm case loses the total length checking, which
   wasn't required anyway, given that each call receives two iovecs
   for the local and remote side of the operation, and it verifies
   the total length on the local side already.
 - instead of a single loop there are now two loops over the iovecs.
   Given that the iovecs are cache hot, this doesn't make a major
   difference.
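
The two passes described above can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: the function names are hypothetical, max_total stands in for MAX_RW_COUNT, and the per-segment access_ok() check has no userspace equivalent, so it is left out. As in the removed rw_copy_check_uvector() loop, a request whose total exceeds the limit is trimmed rather than rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */
#include <sys/uio.h>	/* struct iovec */

/* First pass (the iovec_from_user() role): reject any length that does not
 * fit in a signed ssize_t.  The cast relies on two's complement wrapping,
 * which is implementation-defined in ISO C but what GCC and Clang do. */
static int model_check_lens(const struct iovec *iov, unsigned long nr_segs)
{
	for (unsigned long i = 0; i < nr_segs; i++)
		if ((ssize_t)iov[i].iov_len < 0)
			return -EINVAL;
	return 0;
}

/* Second pass (the __import_iovec() role): accumulate the total length,
 * trimming segments so that it never exceeds max_total. */
static ssize_t model_total_len(struct iovec *iov, unsigned long nr_segs,
			       ssize_t max_total)
{
	ssize_t total = 0;

	for (unsigned long i = 0; i < nr_segs; i++) {
		ssize_t len = (ssize_t)iov[i].iov_len;

		if (len > max_total - total) {
			len = max_total - total;
			iov[i].iov_len = (size_t)len;
		}
		total += len;
	}
	return total;
}
```

For instance, segments of 10, 20 and 30 bytes with a limit of 25 come back as 10, 15 and 0 bytes, for a total of 25.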

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h |   6 -
 include/linux/fs.h |  13 --
 include/linux/uio.h|  12 +-
 lib/iov_iter.c | 300 -
 mm/process_vm_access.c |  34 +++--
 5 files changed, 138 insertions(+), 227 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index 654c1ec36671a4..b930de791ff16b 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -451,12 +451,6 @@ extern long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 
 struct epoll_event;/* fortunately, this one is fixed-layout */
 
-extern ssize_t compat_rw_copy_check_uvector(int type,
-   const struct compat_iovec __user *uvector,
-   unsigned long nr_segs,
-   unsigned long fast_segs, struct iovec *fast_pointer,
-   struct iovec **ret_pointer);
-
 extern void __user *compat_alloc_user_space(unsigned long len);
 
 int compat_restore_altstack(const compat_stack_t __user *uss);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a082c..e69b45b6cc7b5f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -178,14 +178,6 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports async buffered reads */
 #define FMODE_BUF_RASYNC   ((__force fmode_t)0x4000)
 
-/*
- * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
- * that indicates that they should check the contents of the iovec are
- * valid, but not check the memory that the iovec elements
- * points too.
- */
-#define CHECK_IOVEC_ONLY -1
-
 /*
  * Attribute flags.  These should be or-ed together to figure out what
  * has been changed!
@@ -1887,11 +1879,6 @@ static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
return file->f_op->mmap(file, vma);
 }
 
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
- unsigned long nr_segs, unsigned long fast_segs,
- struct iovec *fast_pointer,
- struct iovec **ret_pointer);
-
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 3835a8a8e9eae0..92c11fe41c6228 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -266,9 +266,15 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct
 size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp,
struct iov_iter *i);
 
-ssize_t import_iovec(int type, const struct iovec __user * uvector,
-unsigned nr_segs, unsigned fast_segs,
-struct iovec **iov, struct iov_iter *i);
+struct iovec *iovec_from_user(const struct iovec __user *uvector,
+   unsigned long nr_segs, unsigned long fast_segs,
+   struct iovec *fast_iov, bool compat);
+ssize_t import_iovec(int type, const struct iovec __user *uvec,
+unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+struct iov_iter *i);
+ssize_t __import_iovec(int type, const struct iovec __user *uvec,
+unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
+struct iov_iter *i, bool compat);
 
 #ifdef CONFIG_COMPAT
 struct compat_iovec;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ccea9db3f72be8..d5d8afe31fca16 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1650,107 +1651,133 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
-/**
- * rw_copy_check_uvector() - Copy an array of  iovec from userspace
- *

let import_iovec deal with compat_iovecs as well v4

2020-09-24 Thread Christoph Hellwig

Hi Al,

this series changes import_iovec to transparently deal with compat iovec
structures, and then cleans up a lot of code duplication.
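As a rough illustration of what dealing with compat iovec structures involves: a 32-bit task lays out each iovec as a 32-bit pointer and a 32-bit length, and those entries have to be widened into native `struct iovec` entries before the rest of the import path can treat them like any other vector. The userspace sketch below shows only that widening step. `struct compat_iovec_sketch` and `widen_iovecs()` are hypothetical names; the real `iovec_from_user(..., compat)` path also copies the array from user memory and validates the lengths, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical mirror of a 32-bit task's iovec layout (pointer and length
 * are both 32 bits wide), standing in for the kernel's struct compat_iovec. */
struct compat_iovec_sketch {
	uint32_t iov_base;
	uint32_t iov_len;
};

/*
 * Widen nr_segs compat entries into native struct iovec entries and return
 * the total length. The kernel additionally copies the array from user
 * memory and rejects oversized lengths; both steps are left out here.
 */
static size_t widen_iovecs(const struct compat_iovec_sketch *src,
			   struct iovec *dst, size_t nr_segs)
{
	size_t total = 0;

	for (size_t i = 0; i < nr_segs; i++) {
		dst[i].iov_base = (void *)(uintptr_t)src[i].iov_base;
		dst[i].iov_len = src[i].iov_len;
		total += src[i].iov_len;
	}
	return total;
}
```

Once widened, the entries can go through the same checks as a native vector, which is the duplication the series removes.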

Changes since v3:
 - fix up changed prototypes in compat.h as well

Changes since v2:
 - revert the switch of the access process vm syscalls to iov_iter
 - refactor the import_iovec internals differently
 - switch aio to use __import_iovec

Changes since v1:
 - improve a commit message
 - drop a pointless unlikely
 - drop the PF_FORCE_COMPAT flag
 - add a few more cleanups (including two from David Laight)

Diffstat:
 arch/arm64/include/asm/unistd32.h  |   10 
 arch/mips/kernel/syscalls/syscall_n32.tbl  |   10 
 arch/mips/kernel/syscalls/syscall_o32.tbl  |   10 
 arch/parisc/kernel/syscalls/syscall.tbl|   10 
 arch/powerpc/kernel/syscalls/syscall.tbl   |   10 
 arch/s390/kernel/syscalls/syscall.tbl  |   10 
 arch/sparc/kernel/syscalls/syscall.tbl |   10 
 arch/x86/entry/syscall_x32.c   |5 
 arch/x86/entry/syscalls/syscall_32.tbl |   10 
 arch/x86/entry/syscalls/syscall_64.tbl |   10 
 block/scsi_ioctl.c |   12 
 drivers/scsi/sg.c  |9 
 fs/aio.c   |   38 --
 fs/io_uring.c  |   20 -
 fs/read_write.c|  362 +
 fs/splice.c|   57 ---
 include/linux/compat.h |   24 -
 include/linux/fs.h |   11 
 include/linux/uio.h|   10 
 include/uapi/asm-generic/unistd.h  |   12 
 lib/iov_iter.c |  161 +++--
 mm/process_vm_access.c |   85 
 net/compat.c   |4 
 security/keys/compat.c |   37 --
 security/keys/internal.h   |5 
 security/keys/keyctl.c |2 
 tools/include/uapi/asm-generic/unistd.h|   12 
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl |   10 
 tools/perf/arch/s390/entry/syscalls/syscall.tbl|   10 
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  |   10 
 30 files changed, 280 insertions(+), 706 deletions(-)

[PATCH 1/9] compat.h: fix a spelling error in

2020-09-24 Thread Christoph Hellwig

There is no compat_sys_readv64v2 syscall, only a compat_sys_preadv64v2
one.

Signed-off-by: Christoph Hellwig 
---
 include/linux/compat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index b354ce58966e2d..654c1ec36671a4 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -812,7 +812,7 @@ asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high, rwf_t flags);
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64V2
-asmlinkage long  compat_sys_readv64v2(unsigned long fd,
+asmlinkage long  compat_sys_preadv64v2(unsigned long fd,
const struct compat_iovec __user *vec,
unsigned long vlen, loff_t pos, rwf_t flags);
 #endif
-- 
2.28.0

Re: [PATCH 5/6] mmc: sdhci_am654: Add support for software tuning

2020-09-24 Thread Kishon Vijay Abraham I




On 23/09/20 4:22 pm, Faiz Abbas wrote:
> With the new SW tuning App note[1], a custom tuning algorithm is
> required for eMMC HS200, HS400 and SD card UHS modes. The algorithm
> involves running through the 32 possible input tap delay values and
> sending the appropriate tuning command (CMD19/21) for each of them
> to get a fail or pass result for each of the values. Typically, the
> range will have a small contiguous failing window. Considering the
> tuning range as a circular buffer, the algorithm then sets a final
> tuned value directly opposite to the failing window.
> 
> [1] https://www.ti.com/lit/pdf/spract9
> 
> Signed-off-by: Faiz Abbas 

Reviewed-by: Kishon Vijay Abraham I 
> ---
>  drivers/mmc/host/sdhci_am654.c | 41 ++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c
> index 1213b711e60a..5af7638ad606 100644
> --- a/drivers/mmc/host/sdhci_am654.c
> +++ b/drivers/mmc/host/sdhci_am654.c
> @@ -396,7 +396,46 @@ static u32 sdhci_am654_cqhci_irq(struct sdhci_host *host, u32 intmask)
>   return 0;
>  }
>  
> +#define ITAP_MAX 32
> +static int sdhci_am654_platform_execute_tuning(struct sdhci_host *host,
> +u32 opcode)
> +{
> + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> + struct sdhci_am654_data *sdhci_am654 = sdhci_pltfm_priv(pltfm_host);
> + int cur_val, prev_val = 1, fail_len = 0, pass_window = 0, pass_len;
> + u32 itap;
> +
> + /* Enable ITAPDLY */
> + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPDLYENA_MASK,
> +1 << ITAPDLYENA_SHIFT);
> +
> + for (itap = 0; itap < ITAP_MAX; itap++) {
> + sdhci_am654_write_itapdly(sdhci_am654, itap);
> +
> + cur_val = !mmc_send_tuning(host->mmc, opcode, NULL);
> + if (cur_val && !prev_val)
> + pass_window = itap;
> +
> + if (!cur_val)
> + fail_len++;
> +
> + prev_val = cur_val;
> + }
> + /*
> +  * Having determined the length of the failing window and start of
> +  * the passing window calculate the length of the passing window and
> +  * set the final value halfway through it considering the range as a
> +  * circular buffer
> +  */
> + pass_len = ITAP_MAX - fail_len;
> + itap = (pass_window + (pass_len >> 1)) % ITAP_MAX;
> + sdhci_am654_write_itapdly(sdhci_am654, itap);
> +
> + return 0;
> +}
> +
>  static struct sdhci_ops sdhci_am654_ops = {
> + .platform_execute_tuning = sdhci_am654_platform_execute_tuning,
>   .get_max_clock = sdhci_pltfm_clk_get_max_clock,
>   .get_timeout_clock = sdhci_pltfm_clk_get_max_clock,
>   .set_uhs_signaling = sdhci_set_uhs_signaling,
> @@ -426,6 +465,7 @@ static const struct sdhci_am654_driver_data sdhci_am654_drvdata = {
>  };
>  
>  static struct sdhci_ops sdhci_j721e_8bit_ops = {
> + .platform_execute_tuning = sdhci_am654_platform_execute_tuning,
>   .get_max_clock = sdhci_pltfm_clk_get_max_clock,
>   .get_timeout_clock = sdhci_pltfm_clk_get_max_clock,
>   .set_uhs_signaling = sdhci_set_uhs_signaling,
> @@ -449,6 +489,7 @@ static const struct sdhci_am654_driver_data sdhci_j721e_8bit_drvdata = {
>  };
>  
>  static struct sdhci_ops sdhci_j721e_4bit_ops = {
> + .platform_execute_tuning = sdhci_am654_platform_execute_tuning,
>   .get_max_clock = sdhci_pltfm_clk_get_max_clock,
>   .get_timeout_clock = sdhci_pltfm_clk_get_max_clock,
>   .set_uhs_signaling = sdhci_set_uhs_signaling,
>
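The tap-selection arithmetic in sdhci_am654_platform_execute_tuning() can be checked in isolation. The sketch below is a userspace model, not driver code: the hypothetical `pick_itap()` takes a pass/fail array in place of the mmc_send_tuning() results and, like the patch, assumes a single contiguous failing window that may wrap from the last tap back to tap 0.

```c
#include <assert.h>
#include <stdbool.h>

#define ITAP_MAX 32

/*
 * Model of the tap-selection loop: pass[i] is true when the tuning command
 * succeeded at input tap delay i. Assumes one contiguous failing window,
 * possibly wrapping from the last tap back to tap 0.
 */
static unsigned int pick_itap(const bool pass[ITAP_MAX])
{
	bool prev = true;	/* like prev_val = 1 in the patch */
	unsigned int fail_len = 0, pass_start = 0;

	for (unsigned int itap = 0; itap < ITAP_MAX; itap++) {
		/* A pass right after a fail marks the start of the passing window. */
		if (pass[itap] && !prev)
			pass_start = itap;
		if (!pass[itap])
			fail_len++;
		prev = pass[itap];
	}

	/* Centre of the passing window, treating the taps as a circular buffer. */
	return (pass_start + (ITAP_MAX - fail_len) / 2) % ITAP_MAX;
}
```

With no failures this picks tap 16; with failures at taps 10 to 13 the passing window starts at tap 14 and is 28 taps long, so it picks tap 28, roughly opposite the failing window.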

[stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-24 Thread Naresh Kamboju

From stable rc 4.18.1 onwards up to today's stable rc 4.19.147,
there are two problems while running the LTP tracing tests:
1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
2) "segfault at 0 ip" and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
Please refer to the full test logs at the links below.

The first bad commit found by git bisect.
   commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
   tracing: Centralize preemptirq tracepoints and unify their usage

Reported-by: Naresh Kamboju 

Easily reproducible on QEMU.
Steps to reproduce:
# Boot qemu x86_64 with trace configs enabled.
# cd /opt/ltp
# ./runltp -f tracing

metadata:
  git branch: linux-4.19.y
  git repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
  make_kernelversion: 4.19.147
  kernel-config:
https://builds.tuxbuild.com/lOpUmeYR2e1pzvYdlLgGqw/kernel.config


Crash log on qemu_i386
-

ftrace-stress-test 1 TINFO: Start pid15=2414
/opt/ltp/testcases/bin/ftrace_stress/ftrace_buffer_size_kb.sh
ftrace-stress-test 1 TINFO: Start pid16=2415
/opt/ltp/testcases/bin/ftrace_stress/ftrace_tracing_cpumask.sh
ftrace-stress-test 1 TINFO: Start pid17=2416
/opt/ltp/testcases/bin/ftrace_stress/ftrace_set_ftrace_filter.sh
[   38.479869] Scheduler tracepoints stat_sleep, stat_iowait,
stat_blocked and stat_runtime require the kernel parameter
schedstats=enable or kernel.sched_schedstats=1
Sep 23 18:39:40 intel-core2-32 user.warn kernel: [   38.479869]
Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and
stat_runtime require the kernel parameter schedstats=enable or
kernel.sched_schedstats=1
[   38.549712] cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error
4 in ld-2.27.so[b7f6c000+25000]
[   38.550427] sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error
4 in ld-2.27.so[b7f9f000+25000]
[   38.551386] Code: 50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10
89 c2 83 f8 ff 0f 84 72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56
01 00 00 <81> 38 6c 64 2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f
85 10
[   38.552710] Code: 40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1
74 dd 80 f9 00 74 05 40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b
54 24 08 <8a> 01 3a 02 75 09 41 42 84 c0 75 f4 31 c0 c3 b8 01 00 00 00
b9 ff
[   38.556010] systemd-journal[1327]: segfault at 5e ip b7c61e12 sp
bff45044 error 6 in libc-2.27.so[b7b29000+1cc000]
[   38.558971] sh[2584]: segfault at 0 ip b7f30c15 sp bfbce710 error 14
[   38.559387] audit: type=1701 audit(1600886380.372:3):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2582
comm=\"sh\" exe=\"/bin/bash.bash\" sig=11 res=1
[   38.559411] audit: type=1701 audit(1600886380.372:4):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2583
comm=\"cat\" exe=\"/bin/cat.coreutils\" sig=11 res=1
[   38.560079] Code: 66 0f 7f 5c 3a f0 72 30 66 0f 6f 54 38 10 83 e9
20 66 0f 6f 5c 38 20 66 0f 6f cb 66 0f 3a 0f da 08 66 0f 3a 0f d4 08
8d 7f 20 <66> 0f 7f 54 3a e0 66 0f 7f 5c 3a f0 73 a0 8d 49 20 01 cf 01
fa 8d
[   38.560811] audit: type=1701 audit(1600886380.373:5):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1327
comm=\"systemd-journal\" exe=\"/lib/systemd/systemd-journald\" sig=11
res=1
[   38.561615] Code: Bad RIP value.
[   38.564712] Core dump to |/bin/false pipe failed
[   38.566144] Core dump to |/bin/false pipe failed
[   38.566213] Core dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.549712]
cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error 4 in
ld-2.27.so[b7f6c000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.550427]
sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error 4 in
ld-2.27.so[b7f9f000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.551386] Code:
50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10 89 c2 83 f8 ff 0f 84
72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56 01 00 00 <81> 38 6c 64
2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f 85 10
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.552710] Code:
40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1 74 dd 80 f9 00 74 05
40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b 54 24 08 <8a> 01 3a 02
75 09 41 42 84 c0 75[   38.582519] systemd[1]: segfault at 1 ip
b7de036e sp bfd888e0 error 7 in
libsystemd-shared-237.so[b7cd4000+1e2000]
 f4 31 c0 c3 b8 [   38.584227] Code: 46 18 83 e0 1f 83 c8 20 88 46 18
89 f0 e8 ba da ff ff 85 c0 89 c3 0f 88 e0 00 00 00 8b 44 24 24 31 db
85 c0 74 06 8b 44 24 24 <89> 30 83 c4 0c 89 d8 5b 5e 5f 5d c3 8d b6 00
00 00 00 8d 83 1c 1c
01 00 00 00 b9 ff
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.556010]
systemd-journal[1327]: segfau[   38.587783] systemd[1]: segfault at 0
ip b7a9fbe3 sp bfd88000 error 7 in libc-2.27.so[b79e5000+1cc000]
lt at 5e ip b7c6[   38.589349] Code: 14 8b 4c 24 10 8b 5c 24 0c b8 72
00 00 00 65 ff 15 10 00 00 00 5b 5e 3d 01 f0 ff ff 0f 83 75 e3 f5 ff
c3 66 90 66 90 55 57 56 <53> 83 ec 1c 8b 5c 24 30 8b 4c 24 34 8b 54 24
38 8b

[PATCH] xfrm: Use correct address family in xfrm_state_find

2020-09-24 Thread Herbert Xu

Resend with proper subject.
 
---8<---
The struct flowi must never be interpreted by itself as its size
depends on the address family.  Therefore it must always be grouped
with its original family value.

In this particular instance, the original family value is lost in
the function xfrm_state_find.  Therefore we get a bogus read when
it's coupled with the wrong family which would occur with inter-
family xfrm states.

This patch fixes it by keeping the original family value.

Note that the same bug could potentially occur in LSM through
the xfrm_state_pol_flow_match hook.  I checked the current code
there and it seems to be safe for now as only secid is used which
is part of struct flowi_common.  But that API should be changed
so that we don't get new bugs in the future.  We could
do that by replacing fl with just secid or adding a family field.

Reported-by: syzbot+577fbac3145a6eb2e...@syzkaller.appspotmail.com
Fixes: 48b8d78315bf ("[XFRM]: State selection update to use inner...")
Signed-off-by: Herbert Xu 

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 69520ad3d83b..9b5f2c2b9770 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1019,7 +1019,8 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
 */
if (x->km.state == XFRM_STATE_VALID) {
if ((x->sel.family &&
-!xfrm_selector_match(>sel, fl, x->sel.family)) ||
+(x->sel.family != family ||
+ !xfrm_selector_match(>sel, fl, family))) ||
!security_xfrm_state_pol_flow_match(x, pol, fl))
return;
 
@@ -1032,7 +1033,9 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
*acq_in_progress = 1;
} else if (x->km.state == XFRM_STATE_ERROR ||
   x->km.state == XFRM_STATE_EXPIRED) {
-   if (xfrm_selector_match(>sel, fl, x->sel.family) &&
+   if ((!x->sel.family ||
+(x->sel.family == family &&
+ xfrm_selector_match(>sel, fl, family))) &&
security_xfrm_state_pol_flow_match(x, pol, fl))
*error = -ESRCH;
}
@@ -1072,7 +1075,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
(tmpl->id.spi == x->id.spi || !tmpl->id.spi))
-   xfrm_state_look_at(pol, x, fl, encap_family,
+   xfrm_state_look_at(pol, x, fl, family,
   , _in_progress, );
}
if (best || acquire_in_progress)
@@ -1089,7 +1092,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
(tmpl->id.spi == x->id.spi || !tmpl->id.spi))
-   xfrm_state_look_at(pol, x, fl, encap_family,
+   xfrm_state_look_at(pol, x, fl, family,
   , _in_progress, );
}
 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
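The rule from the commit message, that a flow key is only meaningful together with the family it was built for, can be modelled in userspace. In this sketch `struct flow_sketch`, `struct selector_sketch` and `selector_match()` are hypothetical stand-ins for struct flowi, the xfrm selector and xfrm_selector_match(). They mirror the patched checks (a zero selector family matches anything; otherwise the families must agree before any address comparison) and compare only destination addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical flow key: which union member is valid depends on the family. */
struct flow_sketch {
	union {
		struct in_addr v4;
		struct in6_addr v6;
	} daddr;
};

/* Hypothetical selector: family 0 means "match any family". */
struct selector_sketch {
	unsigned short family;
	struct flow_sketch addr;
};

/*
 * Mirrors the patched checks: a selector with a family only matches when that
 * family equals the one the flow key was built for, and only then are the
 * addresses compared using that family's layout.
 */
static bool selector_match(const struct selector_sketch *sel,
			   const struct flow_sketch *fl, unsigned short family)
{
	if (!sel->family)
		return true;
	if (sel->family != family)
		return false;

	switch (family) {
	case AF_INET:
		return sel->addr.daddr.v4.s_addr == fl->daddr.v4.s_addr;
	case AF_INET6:
		return memcmp(&sel->addr.daddr.v6, &fl->daddr.v6,
			      sizeof(struct in6_addr)) == 0;
	default:
		return false;
	}
}
```

The point of the design is that the family passed in is the one the flow key was built with, so a mismatched selector is rejected before its layout is ever applied to the key.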

linux-next: build failure after merge of the mmc tree

2020-09-24 Thread Stephen Rothwell

Hi all,

After merging the mmc tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/mmc/host/davinci_mmc.c: In function 'mmc_davinci_set_ios':
drivers/mmc/host/davinci_mmc.c:696:19: error: 'mmc' redeclared as different kind of symbol
  696 |  struct mmc_host *mmc = mmc_from_priv(host);
  |   ^~~
drivers/mmc/host/davinci_mmc.c:691:50: note: previous definition of 'mmc' was here
  691 | static void mmc_davinci_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
  | ~^~~

Caused by commit

  359c6349a771 ("mmc: davinci: Drop pointer to mmc_host from mmc_davinci_host")

I have used the mmc tree from next-20200924 for today.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH v2 1/5] docs: cdomain.py: add support for two new Sphinx 3.1+ tags

2020-09-24 Thread Mauro Carvalho Chehab

On Thu, 24 Sep 2020 10:24:39 -0600
Jonathan Corbet wrote:

> On Thu, 24 Sep 2020 18:21:45 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > As part of changing the documentation subsystem to properly
> > build with Sphinx 3.1+, add support for two tags:
> > 
> > - :c:expr:`foo`
> > - .. c:namespace::"  
> 
> So is there a reason we need :c:expr: ?  What does it add for us?

Good point. This came from a suggestion from the Sphinx issue.

The :c:expr: tag actually helped to identify two wrong declarations in the
DVB uAPI docs[1], but in practice it doesn't do much beyond using a
different font, e.g. placing "struct" in italics.

I was expecting it to also create cross-references to the structs
mentioned there, but, at least on my tests, it didn't work like that.

Anyway, I guess we can just get rid of it.

I'll send a v3 series without that.

Thanks,
Mauro

Re: [PATCH] mtd: spi-nor: controllers: intel-spi: Add support for command line partitions

2020-09-24 Thread Vignesh Raghavendra




On 4/17/20 9:37 PM, Mika Westerberg wrote:
> On Fri, Apr 17, 2020 at 08:26:11AM -0700, Ronald G. Minnich wrote:
>> On Intel platforms, the usable SPI area is located several
>> MiB in from the start, to leave room for descriptors and
>> the Management Engine binary. Further, not all the remaining
>> space can be used, as the last 16 MiB contains firmware.
>>
>> To make the SPI usable for mtdblock and other devices,
>> it is necessary to enable command line partitions so the
>> middle usable region can be specified.
>>
>> Add a part_probes array which includes only "cmdlinepart",
>> and change the mtd_device_parse_register() call to use this part_probes.
>>
>> Signed-off-by: Ronald G. Minnich 
> 
> Reviewed-by: Mika Westerberg 
> 

scripts/checkpatch.pl --strict complains:

CHECK: Alignment should match open parenthesis
#46: FILE: drivers/mtd/spi-nor/controllers/intel-spi.c:956:
+   ret = mtd_device_parse_register(>nor.mtd, part_probes,
+  NULL, , 1);

WARNING: Missing Signed-off-by: line by nominal patch author '"Ronald G. Minnich" '

Regards
Vignesh

Re: a saner API for allocating DMA addressable pages v3

2020-09-24 Thread Christoph Hellwig

This is in dma-mapping for-next now.

Re: [PATCH 1/4] ARM/omap1: switch to use dma_direct_set_offset for lbus DMA offsets

2020-09-24 Thread Christoph Hellwig

On Mon, Sep 21, 2020 at 09:44:18AM +0300, Tony Lindgren wrote:
> > > Looks nice to me :) I still can't test this probably for few more weeks
> > > though but hopefully Aaro or Janusz (Added to Cc) can test it.
> > 
> > Works for me on Amstrad Delta (tested with a USB ethernet adapter).
> > 
> > Tested-by: Janusz Krzysztofik 
> 
> Great, good to hear! And thanks for testing it.
> 
> Christoph, feel free to queue this along with the other patches:
> 
> Acked-by: Tony Lindgren 
> 
> Or let me know if you want me to pick it up.

I've pulled it in now that the other patches are unlikely to be
tested in time for 5.10.

mmotm 2020-09-24-21-14 uploaded

2020-09-24 Thread akpm

The mm-of-the-moment snapshot 2020-09-24-21-14 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.
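The marker rule above is easy to apply mechanically. A minimal sketch that lists the linux-next patches from a quilt `series` file, assuming each marker line begins with the exact token `#NEXT_PATCHES_START` or `#NEXT_PATCHES_END` (anything after the token is ignored) and that other lines starting with `#` are comments. `is_next_patch()` and `list_next_patches()` are hypothetical helpers, not part of the mmotm tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char start_marker[] = "#NEXT_PATCHES_START";
static const char end_marker[] = "#NEXT_PATCHES_END";

/* Update *inside on marker lines; return 1 if the line names a linux-next patch. */
static int is_next_patch(const char *line, int *inside)
{
	if (strncmp(line, start_marker, sizeof(start_marker) - 1) == 0) {
		*inside = 1;
		return 0;
	}
	if (strncmp(line, end_marker, sizeof(end_marker) - 1) == 0) {
		*inside = 0;
		return 0;
	}
	/* Skip blank lines and other comments inside a section. */
	return *inside && line[0] != '\0' && line[0] != '\n' && line[0] != '#';
}

/* Print every patch name between the markers; returns how many were found. */
static int list_next_patches(FILE *series, FILE *out)
{
	char line[512];
	int inside = 0, count = 0;

	while (fgets(line, sizeof(line), series)) {
		if (is_next_patch(line, &inside)) {
			fputs(line, out);
			count++;
		}
	}
	return count;
}
```

For example, passing an opened `series` file and stdout to list_next_patches() prints one patch name per line; multiple START/END sections are handled because the state simply toggles at each marker.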


A copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

https://github.com/hnaz/linux-mm

The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is also available at

https://github.com/hnaz/linux-mm



This mmotm tree contains the following patches against 5.9-rc6:
(patches marked "*" will be included in linux-next)

  origin.patch
* mm-thp-swap-fix-allocating-cluster-for-swapfile-by-mistake.patch
* mm-memcontrol-fix-missing-suffix-of-workingset_restore.patch
* mm-gup-fix-gup_fast-with-dynamic-page-table-folding.patch
* mm-validate-pmd-after-splitting.patch
* mm-migrate-correct-thp-migration-stats.patch
* mm-swapfile-avoid-split_swap_cluster-null-pointer-dereference.patch
* lib-stringc-implement-stpcpy.patch
* lib-include-memregionh-in-memregionc.patch
* arch-x86-lib-usercopy_64c-fix-__copy_user_flushcache-cache-writeback.patch
* mm-replace-memmap_context-by-meminit_context.patch
* mm-dont-rely-on-system-state-to-detect-hot-plug-operations.patch
* checkpatch-test-git_dir-changes.patch
* proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
* proc-kpageflags-do-not-use-uninitialized-struct-pages.patch
* mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patch
* compiler-clang-add-build-check-for-clang-1001.patch
* revert-kbuild-disable-clangs-default-use-of-fmerge-all-constants.patch
* revert-arm64-bti-require-clang-=-1001-for-in-kernel-bti-support.patch
* revert-arm64-vdso-fix-compilation-with-clang-older-than-8.patch
* partially-revert-arm-8905-1-emit-__gnu_mcount_nc-when-using-clang-1000-or-newer.patch
* kasan-remove-mentions-of-unsupported-clang-versions.patch
* compiler-gcc-improve-version-error.patch
* ntfs-add-check-for-mft-record-size-in-superblock.patch
* fs-ocfs2-delete-repeated-words-in-comments.patch
* ocfs2-clear-links-count-in-ocfs2_mknod-if-an-error-occurs.patch
* ocfs2-fix-ocfs2-corrupt-when-iputting-an-inode.patch
* ramfs-support-o_tmpfile.patch
* fs-xattrc-fix-kernel-doc-warnings-for-setxattr-removexattr.patch
* kernel-watchdog-flush-all-printk-nmi-buffers-when-hardlockup-detected.patch
  mm.patch
* mm-slabc-clean-code-by-removing-redundant-if-condition.patch
* include-linux-slabh-fix-a-typo-error-in-comment.patch
* mm-slub-branch-optimization-in-free-slowpath.patch
* mm-slub-fix-missing-alloc_slowpath-stat-when-bulk-alloc.patch
* mm-slub-make-add_full-condition-more-explicit.patch
* mm-kmemleak-rely-on-rcu-for-task-stack-scanning.patch
* x86-numa-cleanup-configuration-dependent-command-line-options.patch
* x86-numa-add-nohmat-option.patch
* x86-numa-add-nohmat-option-fix.patch
* efi-fake_mem-arrange-for-a-resource-entry-per-efi_fake_mem-instance.patch
* acpi-hmat-refactor-hmat_register_target_device-to-hmem_register_device.patch
* acpi-hmat-refactor-hmat_register_target_device-to-hmem_register_device-fix.patch
* resource-report-parent-to-walk_iomem_res_desc-callback.patch
* mm-memory_hotplug-introduce-default-phys_to_target_node-implementation.patch
* mm-memory_hotplug-introduce-default-phys_to_target_node-implementation-fix.patch
* acpi-hmat-attach-a-device-for-each-soft-reserved-range.patch
* acpi-hmat-attach-a-device-for-each-soft-reserved-range-fix.patch
* device-dax-drop-the-dax_regionpfn_flags-attribute.patch
* device-dax-move-instance-creation-parameters-to-struct-dev_dax_data.patch
* device-dax-make-pgmap-optional-for-instance-creation.patch
* device-dax-make-pgmap-optional-for-instance-creation-fix.patch
* device-dax-kill-dax_kmem_res.patch
* device-dax-add-an-allocation-interface-for-device-dax-instances.patch
* device-dax-introduce-seed-devices.patch
*

Re: [PATCH 1/2] docs: cdomain.py: add support for two new Sphinx 3.1+ tags

2020-09-24 Thread Mauro Carvalho Chehab

On Thu, 24 Sep 2020 10:22:25 -0600
Jonathan Corbet wrote:

> On Thu, 24 Sep 2020 18:13:54 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > > How can this possibly work without a "global namespace" declaration in
> > > markup_namespace()?
> > 
> > ... While I'm not a python expert, the namespace variable is global
> > because it was defined outside the "markup_namespace" function.  
> 
> Assignments within functions are *always* local unless declared global.
> 
> Try this:
> 
>   $ python3
>   >>> x = 0
>   >>> def y(v):
>   ...     x = v
>   ...
>   >>> y(1)
>   >>> x  
>   0
>   >>>  
> 
> So your assignment to "namespace" in markup_namespace() cannot change the
> global, since it's not declared global.

Ok! Thanks for helping with this. I'll declare namespace as global for
the next version.

Thanks,
Mauro

Re: [PATCH 3/3] dma-mapping: better document dma_addr_t and DMA_MAPPING_ERROR

2020-09-24 Thread 'Christoph Hellwig'

On Tue, Sep 22, 2020 at 01:56:46PM +, David Laight wrote:
> > +/*
> > + * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
> > + * be given to a device to use as a DMA source or target.  A CPU cannot
> > + * reference a dma_addr_t directly because there may be translation between its
> > + * physical address space and the bus address space.
> 
> It can't access it 'directly' because it isn't a virtual address
> 
> > + *
> > + * DMA_MAPPING_ERROR is the magic error code if a mapping failed.  It should not
> > + * be used directly in drivers, but checked for using dma_mapping_error()
> > + * instead.
> > + */
> 
> I think it might be worth adding:
> 
> A dma_addr_t value may be device dependent and differ from the
> 'physical address' of the memory.

This is what I've committed:

 * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
 * be given to a device to use as a DMA source or target.  It is specific to a
 * given device and there may be a translation between the CPU physical address
 * space and the bus address space.

Re: [PATCH v3] e1000e: Increase iteration on polling MDIC ready bit

2020-09-24 Thread Kai-Heng Feng




> On Sep 25, 2020, at 03:57, Andrew Lunn  wrote:
> 
> On Fri, Sep 25, 2020 at 12:45:42AM +0800, Kai-Heng Feng wrote:
>> We are seeing the following error after S3 resume:
>> [  704.746874] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.844232] e1000e :00:1f.6 eno1: MDI Write did not complete
>> [  704.902817] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903075] e1000e :00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
>> [  704.903281] e1000e :00:1f.6 eno1: Setting page 0x6020
>> [  704.903486] e1000e :00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
>> shifted) reg 0x17
>> [  704.943155] e1000e :00:1f.6 eno1: MDI Error
>> ...
>> [  705.108161] e1000e :00:1f.6 eno1: Hardware Error
>> 
>> As Andrew Lunn pointed out, MDIO has nothing to do with the PHY, and
>> indeed increasing the polling iteration count can resolve the issue.
>> 
>> The root cause is quite likely Intel ME; since it's a black box to the
>> kernel, the only approach we can take is to be patient and wait
>> longer.
> 
> Please could you explain how you see Intel ME being responsible for
> this. I'm not convinced.

Some other occurrences:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d17c7868b2f8e329dcee4ecd2f5d16cfc9b26ac8
https://lore.kernel.org/netdev/20200323191639.48826-1-aaron...@canonical.com/

Of course, we need an ACK from Intel that this one is also related to ME.

Kai-Heng

> 
>  Andrew
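The argument for a larger iteration count is simply that a bounded poll fails whenever the ready bit arrives after the budget runs out. The sketch below models that with a fake register; it says nothing about whether Intel ME is the cause, which is the open question in this thread. `MDIC_READY_SKETCH` (assumed to be bit 28; check the driver's headers), `mdic_wait_ready()` and `struct fake_mdic` are hypothetical, and the per-iteration delay a real driver uses is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the MDIC ready bit; verify against the driver headers. */
#define MDIC_READY_SKETCH 0x10000000u

/* Hypothetical register accessor so the loop can run without hardware. */
typedef uint32_t (*mdic_read_fn)(void *ctx);

/*
 * Read the register up to max_iters times and report whether the ready bit
 * was seen. A real driver delays between reads; that is left out here.
 */
static bool mdic_wait_ready(mdic_read_fn read_reg, void *ctx,
			    unsigned int max_iters)
{
	for (unsigned int i = 0; i < max_iters; i++) {
		if (read_reg(ctx) & MDIC_READY_SKETCH)
			return true;
	}
	return false;
}

/* Fake register that only reports ready after `delay` reads. */
struct fake_mdic {
	unsigned int reads;
	unsigned int delay;
};

static uint32_t fake_mdic_read(void *ctx)
{
	struct fake_mdic *f = ctx;

	return ++f->reads > f->delay ? MDIC_READY_SKETCH : 0;
}
```

With `delay` set to 100, a budget of 64 reads gives up while a budget of 640 succeeds on the 101st read; a larger budget only helps if the bit does eventually arrive.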

PING: [PATCH] block: add io_error stat for block device

2020-09-24 Thread zhenwei pi


Hi, Jens

What do you think about an error stat for block devices?

On 9/10/20 10:20 AM, zhenwei pi wrote:

Currently, when a block request fails, the block layer only prints a
rate-limited error log, so an agent has to parse the kernel log to
record what happened.

This patch adds read/write/discard/flush stat counters to record
I/O errors.

Signed-off-by: zhenwei pi 
---
  block/blk-core.c  | 14 +++---
  block/genhd.c | 19 +++
  include/linux/part_stat.h |  1 +
  3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 10c08ac50697..8f1424835700 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1558,9 +1558,17 @@ bool blk_update_request(struct request *req, blk_status_t error,
req->q->integrity.profile->complete_fn(req, nr_bytes);
  #endif
  
-	if (unlikely(error && !blk_rq_is_passthrough(req) &&
-!(req->rq_flags & RQF_QUIET)))
-   print_req_error(req, error, __func__);
+   if (unlikely(error && !blk_rq_is_passthrough(req))) {
+   if (op_is_flush(req_op(req)))
+   part_stat_inc(>rq_disk->part0,
+   io_errors[STAT_FLUSH]);
+   else
+   part_stat_inc(>rq_disk->part0,
+   io_errors[op_stat_group(req_op(req))]);
+
+   if (!(req->rq_flags & RQF_QUIET))
+   print_req_error(req, error, __func__);
+   }
  
  	blk_account_io_completion(req, nr_bytes);
  
diff --git a/block/genhd.c b/block/genhd.c
index 99c64641c314..852035095485 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -104,6 +104,7 @@ static void part_stat_read_all(struct hd_struct *part, struct disk_stats *stat)
stat->sectors[group] += ptr->sectors[group];
stat->ios[group] += ptr->ios[group];
stat->merges[group] += ptr->merges[group];
+   stat->io_errors[group] += ptr->io_errors[group];
}
  
  		stat->io_ticks += ptr->io_ticks;
@@ -1374,6 +1375,22 @@ static ssize_t disk_discard_alignment_show(struct device *dev,
return sprintf(buf, "%d\n", queue_discard_alignment(disk->queue));
  }
  
+static ssize_t io_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct hd_struct *p = dev_to_part(dev);
+   struct disk_stats stat;
+
+   part_stat_read_all(p, );
+
+   return sprintf(buf,
+   "%8lu %8lu %8lu %8lu\n",
+   stat.io_errors[STAT_READ],
+   stat.io_errors[STAT_WRITE],
+   stat.io_errors[STAT_DISCARD],
+   stat.io_errors[STAT_FLUSH]);
+}
+
  static DEVICE_ATTR(range, 0444, disk_range_show, NULL);
  static DEVICE_ATTR(ext_range, 0444, disk_ext_range_show, NULL);
  static DEVICE_ATTR(removable, 0444, disk_removable_show, NULL);
@@ -1386,6 +1403,7 @@ static DEVICE_ATTR(capability, 0444, disk_capability_show, NULL);
  static DEVICE_ATTR(stat, 0444, part_stat_show, NULL);
  static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
  static DEVICE_ATTR(badblocks, 0644, disk_badblocks_show, disk_badblocks_store);
+static DEVICE_ATTR(io_error, 0444, io_error_show, NULL);
  
  #ifdef CONFIG_FAIL_MAKE_REQUEST
  ssize_t part_fail_show(struct device *dev,
@@ -1437,6 +1455,7 @@ static struct attribute *disk_attrs[] = {
  #ifdef CONFIG_FAIL_IO_TIMEOUT
_attr_fail_timeout.attr,
  #endif
+   _attr_io_error.attr,
NULL
  };
  
diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
index 24125778ef3e..4fe3836d2308 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -9,6 +9,7 @@ struct disk_stats {
unsigned long sectors[NR_STAT_GROUPS];
unsigned long ios[NR_STAT_GROUPS];
unsigned long merges[NR_STAT_GROUPS];
+   unsigned long io_errors[NR_STAT_GROUPS];
unsigned long io_ticks;
local_t in_flight[2];
  };



--
zhenwei pi
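If the attribute were merged as posted, a monitoring agent could read four counters instead of parsing the kernel log. A minimal sketch, assuming the read/write/discard/flush order printed by io_error_show() above; `struct io_error_counts` and `parse_io_error()` are hypothetical names, and the file location (presumably /sys/block/<disk>/io_error, since the attribute is added to disk_attrs) is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Counters in the order the proposed io_error_show() prints them. */
struct io_error_counts {
	unsigned long read;
	unsigned long write;
	unsigned long discard;
	unsigned long flush;
};

/* Parse one "%8lu %8lu %8lu %8lu\n" line; returns 0 on success, -1 otherwise. */
static int parse_io_error(const char *line, struct io_error_counts *out)
{
	/* %lu skips the leading padding produced by the %8lu field width. */
	if (sscanf(line, "%lu %lu %lu %lu", &out->read, &out->write,
		   &out->discard, &out->flush) != 4)
		return -1;
	return 0;
}
```

An agent would read the attribute file into a buffer and pass it to parse_io_error(), then compare the counters against the previous sample.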

Re: [PATCH 1/2] mips: Add strong UC ordering config

2020-09-24 Thread Jiaxun Yang





On 2020/9/20 19:00, Serge Semin wrote:

In accordance with [1, 2] memory transactions using CCA=2 (Uncached
Cacheability and Coherency Attribute) are always strongly ordered. This
means the younger memory accesses using CCA=2 are never allowed to be
executed before older memory accesses using CCA=2 (no bypassing is
allowed), and Loads and Stores using CCA=2 are never speculative. It is
expected by the specification that the rest of the system maintains these
properties for processor initiated uncached accesses. So the system IO
interconnect doesn't reorder uncached transactions once they have left the
processor subsystem. Taking into account these properties and what [3]
says about the relaxed IO-accessors we can infer that normal Loads and
Stores from/to CCA=2 memory and without any additional execution barriers
will fully comply with the {read,write}X_relaxed() methods requirements.

Let's then convert the currently generated relaxed IO-accessors to being pure
Loads and Stores. Seeing the commit 3d474dacae72 ("MIPS: Enforce strong
ordering for MMIO accessors") and commit 8b656253a7a4 ("MIPS: Provide
actually relaxed MMIO accessors") have already made a preparation in the
corresponding macro, we can do that just by replacing the "barrier"
parameter utilization with the "relax" one. Note the "barrier" macro
argument can be removed, since it isn't fully used anyway other than being
always assigned to 1.

Of course it would be foolish to believe that all the available MIPS-based
CPUs completely follow the denoted specification, especially considering
how old the architecture is. Instead, we introduce a dedicated kernel
config, which when enabled will convert the relaxed IO-accessors to
pure Loads and Stores without any additional barriers around them. So if a
CPU supports strongly ordered UC memory access, it can enable that
config and use fully optimized relaxed IO-methods. For instance, the
Baikal-T1 architecture support code will do that.

[1] MIPS Coherence Protocol Specification, Document Number: MD00605,
 Revision 01.01. September 14, 2015, 4.2 Execution Order Behavior,
 p. 33

[2] MIPS Coherence Protocol Specification, Document Number: MD00605,
 Revision 01.01. September 14, 2015, 4.8.1 IO Device Access, p. 58

[3] "LINUX KERNEL MEMORY BARRIERS", Documentation/memory-barriers.txt,
 Section "KERNEL I/O BARRIER EFFECTS"
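The effect of the option can be illustrated with a small user-space sketch. It is not the patch's code: the macro `STRONG_UC_ORDERING` stands in for the Kconfig symbol (whose real name is not shown here), `sketch_writel_relaxed()`/`sketch_readl_relaxed()` are hypothetical names, and GCC/Clang's `__sync_synchronize()` stands in for a MIPS `sync`. With the option set, the relaxed accessors reduce to plain volatile loads and stores; without it, the sketch keeps a full barrier next to each access.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the Kconfig option. Define it to 1 for CPUs
 * whose uncached (CCA=2) accesses are strongly ordered.
 */
#ifndef STRONG_UC_ORDERING
#define STRONG_UC_ORDERING 0
#endif

/* Counts the barriers issued by the accessors (demonstration only). */
static unsigned long barriers_emitted;

static inline void io_barrier(void)
{
	barriers_emitted++;
	__sync_synchronize();	/* full compiler + hardware barrier (GCC/Clang builtin) */
}

/* Relaxed 32-bit MMIO write: just a volatile store when UC ordering is strong. */
static inline void sketch_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	if (!STRONG_UC_ORDERING)
		io_barrier();
	*addr = val;
}

/* Relaxed 32-bit MMIO read: just a volatile load when UC ordering is strong. */
static inline uint32_t sketch_readl_relaxed(const volatile uint32_t *addr)
{
	uint32_t val = *addr;

	if (!STRONG_UC_ORDERING)
		io_barrier();
	return val;
}
```

Because `STRONG_UC_ORDERING` is a compile-time constant, the compiler drops the barrier branch entirely when it is 1, which mirrors the "pure Loads and Stores" outcome described above.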

Signed-off-by: Serge Semin 
Cc: Maciej W. Rozycki 

Reviewed-by: Jiaxun Yang 


Based on #mipslinus discussions, I suspect this option can be selected by
most modern MIPS processors including all IMG/MTI cores,
Ingenic and Loongson.

Thanks.

- Jiaxun


---
  arch/mips/Kconfig  |  8 
  arch/mips/include/asm/io.h | 20 ++--
  2 files changed, 18 insertions(+), 10 deletions(-)

[v3] cpuidle: add riscv cpuidle driver

2020-09-24 Thread liush

This patch adds a simple cpuidle driver for RISC-V systems using
the WFI state. Other states will be supported in the future.

Reported-by: kernel test robot 
Signed-off-by: liush 
---
Changes in v3:
- fix the issue reported by kernel test robot
  "drivers/cpuidle/cpuidle-riscv.c:22:12: warning: no previous prototype 
   for 'riscv_low_level_suspend_enter' [-Wmissing-prototypes]"
Changes in v2:
- call "mb()" before running "WFI" in cpu_do_idle
- modify commit description
- place "select CPU_IDLE" in alphabetical order
- replace "__asm__ __volatile__ ("wfi")" with "wait_for_interrupt()"
- delete "cpuidle.c", move "cpu_do_idle()" to cpuidle.h
- modify "arch_cpu_idle" so that it calls "cpu_do_idle"
- fix space/tab issues
- make riscv_low_level_suspend_enter __weak

 arch/riscv/Kconfig   |  7 +
 arch/riscv/include/asm/cpuidle.h | 16 
 arch/riscv/kernel/process.c  |  3 ++-
 drivers/cpuidle/Kconfig  |  5 
 drivers/cpuidle/Kconfig.riscv| 11 
 drivers/cpuidle/Makefile |  4 +++
 drivers/cpuidle/cpuidle-riscv.c  | 55 
 7 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/cpuidle.h
 create mode 100644 drivers/cpuidle/Kconfig.riscv
 create mode 100644 drivers/cpuidle/cpuidle-riscv.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index df18372..799bf86 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -33,6 +33,7 @@ config RISCV
select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
select CLONE_BACKWARDS
select COMMON_CLK
+   select CPU_IDLE
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY if SMP
select GENERIC_ATOMIC64 if !64BIT
@@ -407,6 +408,12 @@ config BUILTIN_DTB
depends on RISCV_M_MODE
depends on OF
 
+menu "CPU Power Management"
+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
 menu "Power management options"
 
 source "kernel/power/Kconfig"
diff --git a/arch/riscv/include/asm/cpuidle.h b/arch/riscv/include/asm/cpuidle.h
new file mode 100644
index ..599b810
--- /dev/null
+++ b/arch/riscv/include/asm/cpuidle.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __RISCV_CPUIDLE_H
+#define __RISCV_CPUIDLE_H
+
+static inline void cpu_do_idle(void)
+{
+   /*
+    * Add mb() here to ensure that all
+    * IO/MEM accesses are completed prior
+    * to entering WFI.
+    */
+   mb();
+   wait_for_interrupt();
+}
+
+#endif
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 2b97c49..5431aaa 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 register unsigned long gp_in_global __asm__("gp");
 
@@ -35,7 +36,7 @@ extern asmlinkage void ret_from_kernel_thread(void);
 
 void arch_cpu_idle(void)
 {
-   wait_for_interrupt();
+   cpu_do_idle();
local_irq_enable();
 }
 
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index c0aeedd..f6be0fd 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -62,6 +62,11 @@ depends on PPC
 source "drivers/cpuidle/Kconfig.powerpc"
 endmenu
 
+menu "RISCV CPU Idle Drivers"
+depends on RISCV
+source "drivers/cpuidle/Kconfig.riscv"
+endmenu
+
 config HALTPOLL_CPUIDLE
tristate "Halt poll cpuidle driver"
depends on X86 && KVM_GUEST
diff --git a/drivers/cpuidle/Kconfig.riscv b/drivers/cpuidle/Kconfig.riscv
new file mode 100644
index ..7bec059
--- /dev/null
+++ b/drivers/cpuidle/Kconfig.riscv
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# RISCV CPU Idle drivers
+#
+config RISCV_CPUIDLE
+   bool "Generic RISCV CPU idle Driver"
+   select DT_IDLE_STATES
+   select CPU_IDLE_MULTIPLE_DRIVERS
+   help
+ Select this option to enable the generic cpuidle driver for RISCV.
+ Currently only the C0 state is supported.
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 26bbc5e..4c83c4e 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -34,3 +34,7 @@ obj-$(CONFIG_MIPS_CPS_CPUIDLE)+= cpuidle-cps.o
 # POWERPC drivers
 obj-$(CONFIG_PSERIES_CPUIDLE)  += cpuidle-pseries.o
 obj-$(CONFIG_POWERNV_CPUIDLE)  += cpuidle-powernv.o
+
+###
+# RISCV drivers
+obj-$(CONFIG_RISCV_CPUIDLE)+= cpuidle-riscv.o
diff --git a/drivers/cpuidle/cpuidle-riscv.c b/drivers/cpuidle/cpuidle-riscv.c
new file mode 100644
index ..5dddcfa
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-riscv.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * RISC-V CPU idle driver.
+ *
+ * Copyright (C) 2020-2022 Allwinner Ltd
+ *
+ * Based on code - driver/cpuidle/cpuidle-at91.c
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+

Re: INFO: rcu detected stall in sys_exit_group (6)

2020-09-24 Thread Paul E. McKenney

On Fri, Sep 25, 2020 at 09:46:16AM +0800, Hillf Danton wrote:
> 
> On Thu, 24 Sep 2020 15:26:55 -0700 Paul E. McKenney wrote:
> > 
> > On Thu, Sep 24, 2020 at 03:06:10PM +0300, Dan Carpenter wrote:
> > > 
> > > On Thu, Sep 24, 2020 at 07:07:31PM +0800, Hillf Danton wrote:
> > > > 
> > > > Thu, 24 Sep 2020 02:26:25 -0700
> > > > > syzbot found the following issue on:
> > > > > 
> > > > > HEAD commit:805c6d3c Merge branch 'fixes' of 
> > > > > git://git.kernel.org/pub/..
> > > > > git tree:   upstream
> > > > > console output: 
> > > > > https://syzkaller.appspot.com/x/log.txt?x=11fef4e390
> > > > > kernel config:  
> > > > > https://syzkaller.appspot.com/x/.config?x=af502ec9a451c9fc
> > > > > dashboard link: 
> > > > > https://syzkaller.appspot.com/bug?extid=f04854e1c5c9e913cc27
> > > > > compiler:   clang version 10.0.0 
> > > > > (https://github.com/llvm/llvm-project/ 
> > > > > c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> > > > > syz repro:  
> > > > > https://syzkaller.appspot.com/x/repro.syz?x=12d419c590
> > > > > C reproducer:   
> > > > > https://syzkaller.appspot.com/x/repro.c?x=1686b4bb90
> > > > > 
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the 
> > > > > commit:
> > > > > Reported-by: syzbot+f04854e1c5c9e913c...@syzkaller.appspotmail.com
> > > > > 
> > > > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > > rcu:  Tasks blocked on level-0 rcu_node (CPUs 0-1):
> > > > > [ cut here ]
> > > > > WARNING: CPU: 1 PID: 27130 at kernel/sched/core.c:3013 rq_unlock 
> > > > > kernel/sched/sched.h:1325 [inline]
> > > > > WARNING: CPU: 1 PID: 27130 at kernel/sched/core.c:3013 
> > > > > try_invoke_on_locked_down_task+0x12d/0x270 kernel/sched/core.c:3019
> > > > > Kernel panic - not syncing: panic_on_warn set ...
> > > > > CPU: 1 PID: 27130 Comm: syz-executor076 Not tainted 
> > > > > 5.9.0-rc6-syzkaller #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, 
> > > > > BIOS Google 01/01/2011
> > > > > Call Trace:
> > > > >  
> > > > >  __dump_stack lib/dump_stack.c:77 [inline]
> > > > >  dump_stack+0x1d6/0x29e lib/dump_stack.c:118
> > > > >  panic+0x2c0/0x800 kernel/panic.c:231
> > > > >  __warn+0x227/0x250 kernel/panic.c:600
> > > > >  report_bug+0x1b1/0x2e0 lib/bug.c:198
> > > > >  handle_bug+0x42/0x80 arch/x86/kernel/traps.c:234
> > > > >  exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:254
> > > > >  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
> > > > > RIP: 0010:try_invoke_on_locked_down_task+0x12d/0x270 
> > > > > kernel/sched/sched.h:1325
> > > > > Code: f8 48 c1 e8 03 42 8a 04 38 84 c0 0f 85 10 01 00 00 8b 74 24 18 
> > > > > 48 89 ef e8 90 47 09 00 4c 89 ef e8 48 4c fb 06 e9 a4 00 00 00 <0f> 
> > > > > 0b e9 2b ff ff ff 48 c7 c1 34 d6 af 89 80 e1 07 80 c1 03 38 c1
> > > > > RSP: 0018:c9da8c50 EFLAGS: 00010046
> > > > > RAX:  RBX: 888097326380 RCX: 6195009cdd28a200
> > > > > RDX: c9da8d00 RSI: 8162e8d0 RDI: 888089c50500
> > > > > RBP: 8162e8d0 R08: dc00 R09: fbfff12df8f9
> > > > > R10: fbfff12df8f9 R11:  R12: 
> > > > > R13: 896ff600 R14: 888089c50500 R15: dc00
> > > > >  rcu_print_task_stall kernel/rcu/tree_stall.h:267 [inline]
> > > > >  print_other_cpu_stall kernel/rcu/tree_stall.h:475 [inline]
> > > > >  check_cpu_stall kernel/rcu/tree_stall.h:634 [inline]
> > > > >  rcu_pending kernel/rcu/tree.c:3637 [inline]
> > > > >  rcu_sched_clock_irq+0x12bc/0x1eb0 kernel/rcu/tree.c:2519
> > > > >  update_process_times+0x130/0x1b0 kernel/time/timer.c:1710
> > > > >  tick_sched_handle kernel/time/tick-sched.c:176 [inline]
> > > > >  tick_sched_timer+0x25e/0x410 kernel/time/tick-sched.c:1328
> > > > >  __run_hrtimer kernel/time/hrtimer.c:1524 [inline]
> > > > >  __hrtimer_run_queues+0x42d/0x930 kernel/time/hrtimer.c:1588
> > > > >  hrtimer_interrupt+0x373/0xd60 kernel/time/hrtimer.c:1650
> > > > >  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
> > > > >  __sysvec_apic_timer_interrupt+0xf0/0x260 
> > > > > arch/x86/kernel/apic/apic.c:1097
> > > > >  asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
> > > > >  
> > > > >  __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
> > > > >  run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
> > > > >  sysvec_apic_timer_interrupt+0x94/0xf0 
> > > > > arch/x86/kernel/apic/apic.c:1091
> > > > >  asm_sysvec_apic_timer_interrupt+0x12/0x20 
> > > > > arch/x86/include/asm/idtentry.h:581
> > > > > RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 
> > > > > [inline]
> > > > > RIP: 0010:__raw_spin_unlock_irqrestore 
> > > > > include/linux/spinlock_api_smp.h:160 [inline]
> > > > > RIP: 0010:_raw_spin_unlock_irqrestore+0x63/0x90 
> > > > > kernel/locking/spinlock.c:191
> > > > > Code: b9 00 00 00 00 00 fc ff df 80 3c 08 00 74 0c 48 c7

Re: [PATCH] Bluetooth: btusb: Avoid unnecessary reset upon system resume

2020-09-24 Thread Abhishek Pandit-Subedi

+ Alex Lu (who contributed the original change)

Hi Kai-Heng,


On Thu, Sep 24, 2020 at 12:10 AM Kai-Heng Feng
 wrote:
>
> [+Cc linux-usb]
>
> Hi Abhishek,
>
> > On Sep 24, 2020, at 04:41, Abhishek Pandit-Subedi 
> >  wrote:
> >
> > Hi Kai-Heng,
> >
> > Which Realtek controller is this on?
>
> The issue happens on 8821CE.
>
> >
> > Specifically for RTL8822CE, we tested without reset_resume being set
> > and that was causing the controller to be reset without bluez ever
> > learning about it (resulting in devices being unusable without
> > toggling the BT power).
>
> The reset is done by the kernel, so how does that affect bluez?
>
> From what you described, it sounds more like runtime resume since bluez is 
> already running.
> If we need reset resume for runtime resume, maybe it's another bug which 
> needs to be addressed?

From btusb.c:
https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/tree/drivers/bluetooth/btusb.c#n4189
/* Realtek devices lose their updated firmware over global
* suspend that means host doesn't send SET_FEATURE
* (DEVICE_REMOTE_WAKEUP)
*/

Runtime suspend always requires remote wakeup to be set and reset
resume isn't used there.

During system suspend, when remote wakeup is not set, RTL8822CE loses
the FW loaded by the driver and any state currently in the controller.
This causes the kernel and the controller state to go out of sync.
One of the issues we observed on the Realtek controller without the
reset resume quirk was that paired or connected devices would just
stop working after resume.

>
> > If the firmware doesn't cut off power during suspend, maybe you
> > shouldn't set the BTUSB_WAKEUP_DISABLE flag for that controller.
>
> We don't know beforehand if the platform firmware (BIOS in my case) will cut
> power off or not.
>
> In general, laptops will cut off the USB power during S3.
> When AC is plugged in, some laptops cut USB power off and some don't. This also
> applies to many desktops, not to mention that there can be BIOS options to control
> USB power under S3/S4/S5...
>
> So we don't know beforehand.
>

I think the confusion here stems from what is actually being turned
off between our two boards and what we're referring to as firmware :)

In your case, the Realtek controller retains firmware unless the
platform cuts off power to USB (which it does during S3).
In my case, the Realtek controller loses firmware when Remote Wakeup
isn't set, even if the platform doesn't cut power to USB.

In your case, since you don't need to enforce the 'Remote Wakeup' bit,
if you unset the BTUSB_WAKEUP_DISABLE for that VID:PID, you should get
the desirable behavior (which is actually the default behavior; remote
wake will always be asserted instead of only during Runtime Suspend).
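The two cases described above can be summarized as a small truth table. The sketch below is a hypothetical model of the behavior reported in this thread, not btusb code: `struct rtl_sleep_case` and both helper functions are illustrative names, and the only rule encoded is that firmware is lost when the platform cuts USB power, or when an RTL8822CE-like controller suspends without remote wakeup set.

```c
#include <stdbool.h>

/* Hypothetical model of the boards discussed in this thread; not btusb code. */
struct rtl_sleep_case {
	bool platform_cuts_usb_power;	/* e.g. BIOS drops USB power in S3 */
	bool remote_wakeup_set;		/* host sent SET_FEATURE(DEVICE_REMOTE_WAKEUP) */
	bool loses_fw_without_wakeup;	/* RTL8822CE-like behavior reported above */
};

/* True if the firmware loaded by the driver survives system suspend. */
static bool rtl_fw_retained(const struct rtl_sleep_case *c)
{
	if (c->platform_cuts_usb_power)
		return false;
	if (c->loses_fw_without_wakeup && !c->remote_wakeup_set)
		return false;
	return true;
}

/* A reset-resume (and firmware reload) is only needed once the firmware is gone. */
static bool rtl_needs_reset_resume(const struct rtl_sleep_case *c)
{
	return !rtl_fw_retained(c);
}
```

In this model, the RTL8821CE on a platform that keeps USB power needs no reset-resume, while the RTL8822CE without remote wakeup does, which is why a single per-vendor policy fits neither board.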

@Alex -- What is the common behavior for Realtek controllers? Should
we set BTUSB_WAKEUP_DISABLE only on RTL8822CE or should we unset it
only on RTL8821CE?

> >
> > I would prefer this doesn't get accepted in its current state.
>
> Of course.
> I think we need to find the root cause for your case before applying this one.
>
> Kai-Heng
>
> >
> > Abhishek
> >
> > On Wed, Sep 23, 2020 at 10:56 AM Kai-Heng Feng
> >  wrote:
> >>
> >> Realtek bluetooth controller may fail to work after system sleep:
> >> [ 1272.707670] Bluetooth: hci0: command 0x1001 tx timeout
> >> [ 1280.835712] Bluetooth: hci0: RTL: HCI_OP_READ_LOCAL_VERSION failed 
> >> (-110)
> >>
> >> If platform firmware doesn't cut power off during suspend, the firmware
> >> is considered retained in controller but the driver is still asking USB
> >> core to perform a reset-resume. This can make bluetooth controller
> >> unusable.
> >>
> >> So avoid unnecessary reset to resolve the issue.
> >>
> >> For devices that really lose power during suspend, USB core will detect
> >> and handle reset-resume correctly.
> >>
> >> Signed-off-by: Kai-Heng Feng 
> >> ---
> >> drivers/bluetooth/btusb.c | 8 +++-
> >> 1 file changed, 3 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> >> index 8d2608ddfd08..de86ef4388f9 100644
> >> --- a/drivers/bluetooth/btusb.c
> >> +++ b/drivers/bluetooth/btusb.c
> >> @@ -4255,17 +4255,15 @@ static int btusb_suspend(struct usb_interface 
> >> *intf, pm_message_t message)
> >>enable_irq(data->oob_wake_irq);
> >>}
> >>
> >> -   /* For global suspend, Realtek devices lose the loaded fw
> >> -* in them. But for autosuspend, firmware should remain.
> >> -* Actually, it depends on whether the usb host sends
> >> +   /* For global suspend, Realtek devices lose the loaded fw in them 
> >> if
> >> +* platform firmware cut power off. But for autosuspend, firmware
> >> +* should remain.  Actually, it depends on whether the usb host 
> >> sends
> >> * set feature (enable wakeup) or not.
> >> */
> >>if (test_bit(BTUSB_WAKEUP_DISABLE, >flags)) {
> >>if

Re: [rcu:rcu/test 55/64] kernel/rcu/rcutorture.c:1507:13: warning: no previous prototype for 'rcu_nocb_cpu_offload'

2020-09-24 Thread Paul E. McKenney

On Fri, Sep 25, 2020 at 11:08:19AM +0800, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
> rcu/test
> head:   516c2f87d0d1204a9ea1b36298283f74056a7eab
> commit: 4df5d8a622235d0b26aba92c881d190667e4d6c3 [55/64] rcutorture: Test 
> runtime toggling of CPUs' callback offloading
> config: microblaze-randconfig-r033-20200923 (attached as .config)
> compiler: microblaze-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout 4df5d8a622235d0b26aba92c881d190667e4d6c3
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
> ARCH=microblaze 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All warnings (new ones prefixed by >>):
> 
> >> kernel/rcu/rcutorture.c:1507:13: warning: no previous prototype for 
> >> 'rcu_nocb_cpu_offload' [-Wmissing-prototypes]
> 1507 | void __weak rcu_nocb_cpu_offload(int cpu) {}
>  | ^~~~
> >> kernel/rcu/rcutorture.c:1508:13: warning: no previous prototype for 
> >> 'rcu_nocb_cpu_deoffload' [-Wmissing-prototypes]
> 1508 | void __weak rcu_nocb_cpu_deoffload(int cpu) {}
>  | ^~

The previous prototype will be showing up later, once the code that
this is testing appears.  At which point these two lines will go away.
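For context on the warning: `-Wmissing-prototypes` fires for any non-static function definition that has no earlier declaration in scope, so it goes away as soon as a prototype is visible. A minimal sketch, assuming GCC or Clang; placing the prototype in a header is the usual arrangement, and the `stub_calls` counter exists only for demonstration.

```c
/*
 * The prototype that a header would normally provide. With it in scope,
 * -Wmissing-prototypes no longer warns about the definition below.
 */
void rcu_nocb_cpu_offload(int cpu);

/* Number of times the default stub ran (demonstration only). */
static int stub_calls;

/*
 * Weak default: the kernel's __weak expands to __attribute__((__weak__)).
 * A strong definition elsewhere, once it exists, wins at link time.
 */
__attribute__((__weak__)) void rcu_nocb_cpu_offload(int cpu)
{
	(void)cpu;
	stub_calls++;
}
```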

Thanx, Paul

> # 
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?id=4df5d8a622235d0b26aba92c881d190667e4d6c3
> git remote add rcu 
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> git fetch --no-tags rcu rcu/test
> git checkout 4df5d8a622235d0b26aba92c881d190667e4d6c3
> vim +/rcu_nocb_cpu_offload +1507 kernel/rcu/rcutorture.c
> 
>   1506
> > 1507void __weak rcu_nocb_cpu_offload(int cpu) {}
> > 1508void __weak rcu_nocb_cpu_deoffload(int cpu) {}
>   1509
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

[PATCH 6/8] arm64: dts: meson: disable vrtc for VIM3L boards meson-khadas-vim3

2020-09-24 Thread Artem Lapkin

The vrtc node is not used on meson-khadas-vim3, so disable it.

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
index 3111bf35c0f..81bb88a76d5 100644
--- a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
@@ -284,6 +284,10 @@ _ef {
 pinctrl-names = "default";
 };
 
+ {
+   status = "disabled";
+};
+
  {
status = "okay";
vref-supply = <_1v8>;
-- 
2.25.1

[PATCH 7/8] arm64: dts: meson: enable RTC for VIM1 meson-gxl-s905x-khadas-vim

2020-09-24 Thread Artem Lapkin

Enable the RTC for VIM1 (meson-gxl-s905x-khadas-vim).

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-gxl-s905x-khadas-vim.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxl-s905x-khadas-vim.dts 
b/arch/arm64/boot/dts/amlogic/meson-gxl-s905x-khadas-vim.dts
index 8bcdffdf55d..354f21d4171 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxl-s905x-khadas-vim.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxl-s905x-khadas-vim.dts
@@ -98,7 +98,7 @@ _B {
 
rtc: rtc@51 {
/* has to be enabled manually when a battery is connected: */
-   status = "disabled";
+   status = "okay";
compatible = "haoyu,hym8563";
reg = <0x51>;
#clock-cells = <0>;
-- 
2.25.1

[PATCH 4/8] arm64: dts: meson: remove fixed memory size for Khadas VIM3/VIM3L meson-khadas-vim3

2020-09-24 Thread Artem Lapkin

There is no need to force a fixed memory size: VIM3 boards come in
2GB and 4GB variants, and the memory size is detected automatically.

Mainline U-Boot works properly in either case, but the old vendor
U-Boot does not work properly with the 4GB variants.

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
index 7e137399257..3111bf35c0f 100644
--- a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
@@ -20,10 +20,15 @@ chosen {
stdout-path = "serial0:115200n8";
};
 
+/* No need to force a fixed memory size:
+VIM3 boards have 2GB and 4GB variants,
+and the memory size is detected automatically.
+
memory@0 {
device_type = "memory";
reg = <0x0 0x0 0x0 0x8000>;
};
+*/
 
adc-keys {
compatible = "adc-keys";
-- 
2.25.1

[PATCH 5/8] arm64: dts: meson: remove reset-gpios from ethernet node for VIM2 meson-gxm-khadas-vim2

2020-09-24 Thread Artem Lapkin

1) Fix bringing the ethernet interface down and up again by removing
reset-gpios from the ethernet node; with reset-gpios present, the
following does not work:

`ifconfig eth0 down && ifconfig eth0 up`

2) Add a max-speed of 1Gbit.

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts 
b/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
index a6baf865aa2..70343da2811 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
@@ -195,7 +195,7 @@ external_phy: ethernet-phy@0 {
 
reset-assert-us = <1>;
reset-deassert-us = <3>;
-   reset-gpios = < GPIOZ_14 GPIO_ACTIVE_LOW>;
+   max-speed = <1000>;
 
interrupt-parent = <_intc>;
/* MAC_INTR on GPIOZ_15 */
-- 
2.25.1

[PATCH 8/8] arm64: dts: meson: enable RTC for VIM2 meson-gxm-khadas-vim2

2020-09-24 Thread Artem Lapkin

Enable the RTC for VIM2 (meson-gxm-khadas-vim2).

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts 
b/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
index 70343da2811..76b7e34a9a3 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
@@ -229,7 +229,7 @@ _B {
 
rtc: rtc@51 {
/* has to be enabled manually when a battery is connected: */
-   status = "disabled";
+   status = "okay";
compatible = "haoyu,hym8563";
reg = <0x51>;
#clock-cells = <0>;
-- 
2.25.1

Re: [PATCH] random: use correct memory barriers for crng_node_pool

2020-09-24 Thread Paul E. McKenney

On Thu, Sep 24, 2020 at 07:09:08PM -0700, Eric Biggers wrote:
> On Thu, Sep 24, 2020 at 05:59:34PM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 22, 2020 at 02:55:58PM -0700, Eric Biggers wrote:
> > > On Tue, Sep 22, 2020 at 01:56:28PM -0700, Paul E. McKenney wrote:
> > > > > You're missing the point here.  b and c could easily be allocated by 
> > > > > a function
> > > > > alloc_b() that's in another file.
> > > > 
> > > > I am still missing something.
> > > > 
> > > > If by "allocated" you mean something like kmalloc(), the compiler 
> > > > doesn't
> > > > know the address.  If you instead mean that there is a function that
> > > > returns the address of another translation unit's static variable, then
> > > > any needed ordering should preferably be built into that function's API.
> > > > Either way, one would hope for some documentation of anything the caller
> > > > needed to be careful of.
> > > > 
> > > > > > Besides which, control dependencies should be used only by LKMM 
> > > > > > experts
> > > > > > at this point.  
> > > > > 
> > > > > What does that even mean?  Control dependencies are everywhere.
> > > > 
> > > > Does the following work better for you?
> > > > 
> > > > "... the non-local ordering properties of control dependencies should be
> > > > relied on only by LKMM experts ...".
> > > 
> > > No.  I don't know what that means.  And I think very few people would 
> > > know.
> > > 
> > > I just want to know if I use the one-time init pattern with a pointer to 
> > > a data
> > > structure foo, are the readers using foo_use() supposed to use 
> > > READ_ONCE() or
> > > are they supposed to use smp_load_acquire().
> > > 
> > > It seems the answer is that smp_load_acquire() is the only safe choice, 
> > > since
> > > foo_use() *might* involve a control dependency, or might in the future 
> > > since
> > > it's part of another kernel subsystem and its implementation could change.
> > 
> > First, the specific issue of one-time init.
> > 
> > If you are the one writing the code implementing one-time init, it is your
> > choice.  It seems like you prefer smp_load_acquire().  If someone sees
> > performance problems due to the resulting memory-barrier instructions,
> > they have the option of submitting a patch and either forking the
> > implementation or taking your implementation over from you, depending
> > on how that conversation goes.
> 
> It doesn't matter what I "prefer".  The question is, how to write code that is
> actually guaranteed to be correct on all supported Linux architectures, 
> without
> assuming internal implementation details of other kernel subsystems.

And that question allows ample room for personal preferences.

There are after all tradeoffs.  Do you want to live within the current
knowledge of your users, or are you willing to invest time and energy
into teaching them something new?  If someone wants a level of performance
that is accommodated only by a difficult-to-use pattern, will you choose
to accommodate them, or will you tell them to write their own?

There are often a number of ways to make something work, and they all
have advantages and disadvantages.  There are tradeoffs, and preferences
have a role to play as well.

> > Third, your first pointer-based variant works with smp_load_acquire(),
> > but could instead use READ_ONCE() as long as it is reworked to something
> > like this:
> > 
> > static struct foo *foo;
> > static DEFINE_MUTEX(foo_init_mutex);
> > 
> > // The returned pointer must be handled carefully in the same
> > // way as would a pointer returned from rcu_dereference().
> > // See Documentation/RCU/rcu_dereference.rst.
> > struct foo *init_foo_if_needed(void)
> > {
> > 	struct foo *retp;
> > 
> > 	/* pairs with smp_store_release() below */
> > 	retp = READ_ONCE(foo);
> > 	if (retp)
> > 		return retp;
> > 
> > 	mutex_lock(&foo_init_mutex);
> > 	if (!foo) {
> > 		// The compiler doesn't know the address:
> > 		struct foo *p = alloc_foo();
> > 
> > 		if (p) /* pairs with READ_ONCE() above */
> > 			smp_store_release(&foo, p);
> > 	}
> > 	retp = foo;
> > 	mutex_unlock(&foo_init_mutex);
> > 	return retp ? retp : ERR_PTR(-ENOMEM);
> > }
> > 
> > There are more than 2,000 instances of the rcu_dereference() family of
> > primitives in the v5.8 kernel, so it should not be hard to find people
> > who are familiar with it.  TL;DR:  Just dereference the pointer and you
> > will be fine.
> 
> I don't understand why you are saying READ_ONCE() is safe here.  alloc_foo()
> could very well initialize some static data as well as foo itself, and after
> 'if (retp)', use_foo() can be called that happens to use the static data as 
> well
> as foo itself.  That's a control

[PATCH 1/8] arm64: dts: meson: update spifc node on Khadas VIM2 meson-gxm-khadas-vim2

2020-09-24 Thread Artem Lapkin

1) The VIM2 boards use only the w25q128 SPI chip, not the w25q32 or
   w25q16. This is not really serious, because the 'jedec,spi-nor'
   compatible provides automatic chip identification.

2) The maximum SPI frequency is 104MHz.

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts 
b/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
index bff8ec2c1c7..a6baf865aa2 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxm-khadas-vim2.dts
@@ -336,12 +336,12 @@  {
pinctrl-0 = <_pins>;
pinctrl-names = "default";
 
-   w25q32: spi-flash@0 {
+   w25q128: spi-flash@0 {
#address-cells = <1>;
#size-cells = <1>;
-   compatible = "winbond,w25q16", "jedec,spi-nor";
+   compatible = "winbond,w25q128fw", "jedec,spi-nor";
reg = <0>;
-   spi-max-frequency = <300>;
+   spi-max-frequency = <10400>;
};
 };
 
-- 
2.25.1

[PATCH 2/8] arm64: dts: meson: update leds node on Khadas VIM3/VIM3L boards meson-khadas-vim3

2020-09-24 Thread Artem Lapkin

Replace GPIO_ACTIVE_LOW with GPIO_ACTIVE_HIGH for the white and red LEDs.

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
index 94f75b44650..73783692e30 100644
--- a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
@@ -41,13 +41,13 @@ leds {
 
led-white {
label = "vim3:white:sys";
-   gpios = <_ao GPIOAO_4 GPIO_ACTIVE_LOW>;
+   gpios = <_ao GPIOAO_4 GPIO_ACTIVE_HIGH>;
linux,default-trigger = "heartbeat";
};
 
led-red {
label = "vim3:red";
-   gpios = <_expander 5 GPIO_ACTIVE_LOW>;
+   gpios = <_expander 5 GPIO_ACTIVE_HIGH>;
};
};
 
-- 
2.25.1

[PATCH 3/8] arm64: dts: meson: update leds node on Khadas VIM3/VIM3L board meson-khadas-vim3

2020-09-24 Thread Artem Lapkin

Add the alias names led_white and led_red for the white and red LEDs.

Signed-off-by: Artem Lapkin 
---
 arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi 
b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
index 73783692e30..7e137399257 100644
--- a/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-khadas-vim3.dtsi
@@ -12,6 +12,8 @@ / {
aliases {
serial0 = _AO;
ethernet0 = 
+   led_red = _red;
+   led_white = _white;
};
 
chosen {
@@ -39,13 +41,13 @@ button-function {
leds {
compatible = "gpio-leds";
 
-   led-white {
+   led_white: led-white {
label = "vim3:white:sys";
gpios = <_ao GPIOAO_4 GPIO_ACTIVE_HIGH>;
linux,default-trigger = "heartbeat";
};
 
-   led-red {
+   led_red: led-red {
label = "vim3:red";
gpios = <_expander 5 GPIO_ACTIVE_HIGH>;
};
-- 
2.25.1

[PATCH 0/8] dts updates and fixes for Khadas VIM1 VIM2 VIM3 VIML boards

2020-09-24 Thread Artem Lapkin

dts updates and fixes for Khadas VIM1 VIM2 VIM3 VIML boards

Artem Lapkin (8):
  arm64: dts: meson: update spifc node on Khadas VIM2
meson-gxm-khadas-vim2
  arm64: dts: meson: update leds node on Khadas VIM3/VIM3L boards
meson-khadas-vim3
  arm64: dts: meson: update leds node on Khadas VIM3/VIM3L board
meson-khadas-vim3
  arm64: dts: meson: remove fixed memory size for Khadas VIM3/VIM3L
meson-khadas-vim3
  arm64: dts: meson: remove reset-gpios from ethernet node for VIM2
meson-gxm-khadas-vim2
  arm64: dts: meson: disable vrtc for VIM3L boards meson-khadas-vim3
  arm64: dts: meson: enable RTC for VIM1 meson-gxl-s905x-khadas-vim
  arm64: dts: meson: enable RTC for VIM2 meson-gxm-khadas-vim2

 .../amlogic/meson-gxl-s905x-khadas-vim.dts|  2 +-
 .../dts/amlogic/meson-gxm-khadas-vim2.dts | 10 +-
 .../boot/dts/amlogic/meson-khadas-vim3.dtsi   | 19 +++
 3 files changed, 21 insertions(+), 10 deletions(-)

-- 
2.25.1

Re: [PATCH v2 seccomp 2/6] asm/syscall.h: Add syscall_arches[] array

2020-09-24 Thread YiFei Zhu

On Thu, Sep 24, 2020 at 10:09 PM Kees Cook  wrote:
> Right, sorry, I may not have been clear. When building my RFC I noticed
> that I couldn't use NR_syscall very "early" in the header file include
> stack on arm64, which complicated things. So I guess what I mean is
> something like "it's probably better to do all these seccomp-specific
> macros/etc in asm/include/seccomp.h rather than in syscall.h because I
> know at least one architecture that might cause trouble."

Ah. Makes sense.

> Ironically, that's the only place I actually know for sure where people
> are using x32, because it shows a measurable (10%) speed-up for builders:
> https://lore.kernel.org/lkml/caoesgmgu1i3p7xmzucetj63t-st_jh+bfahy-k6lhgqnrik...@mail.gmail.com

Wow. 10% is significant. Makes you wonder why x32 hasn't conquered the world.

> So, yes, as you and Jann both point out, it wouldn't be terrible to just
> ignore x32, but it seems a shame to penalize it. That said, if the masking
> step from my v1 is actually noticeable on a native workload, then yeah,
> probably x32 should be ignored. My instinct (not measured) is that it's
> faster than walking a small array.[citation needed]

My instinct: should be pretty similar, with the loop unrolled.

You've convinced me that penalizing x32 support would be a pity :( The
10% is so nice I want it.

> It's easier to do a per-arch revert (i.e. all the -stable tree
> machinery, etc) with a single SHA instead of having to write a partial
> revert, etc.

I see. Thanks for clarifying.

How about this? Rather than specifically designing names for bitmasks
(native, compat, multiplex), just have SECCOMP_ARCH_{1,2,3}? Each arch
number would provide the size of its bitmap and a static inline
function that checks whether the given seccomp_data belongs to that
arch and, if so, which bit in the bitmap it maps to. There is no need
for the shifts and madness in seccomp.c; it's arch-dependent code in
each arch's own seccomp.h. We let the preprocessor and compiler
optimize things.
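The SECCOMP_ARCH_{1,2,3} idea above could look roughly like the userspace sketch below. Everything named here is hypothetical: SECCOMP_ARCH_1_NR/SECCOMP_ARCH_2_NR are placeholder sizes (not real syscall counts), struct sd_sketch is a two-field stand-in for struct seccomp_data, and the helper names are invented. Only the AUDIT_ARCH values match <linux/audit.h>. It shows each arch number answering "is this mine, and which bit?", with the second arch compiled out when its macro is not defined.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the uapi struct seccomp_data (two fields only). */
struct sd_sketch {
	int nr;
	uint32_t arch;
};

/* Values as defined in <linux/audit.h>. */
#define SKETCH_AUDIT_ARCH_X86_64 0xC000003Eu
#define SKETCH_AUDIT_ARCH_I386   0x40000003u

/* Hypothetical per-arch bitmap sizes: placeholders, not real syscall counts. */
#define SECCOMP_ARCH_1_NR 400	/* native (x86_64 in this sketch) */
#define SECCOMP_ARCH_2_NR 300	/* compat (i386); leave undefined to drop it */

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_LONGS(n) (((n) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

/* Each arch number answers: is this seccomp_data mine, and if so, which bit? */
static inline bool seccomp_arch_1_bit(const struct sd_sketch *sd, int *bit)
{
	if (sd->arch != SKETCH_AUDIT_ARCH_X86_64 ||
	    sd->nr < 0 || sd->nr >= SECCOMP_ARCH_1_NR)
		return false;
	*bit = sd->nr;
	return true;
}

#ifdef SECCOMP_ARCH_2_NR
static inline bool seccomp_arch_2_bit(const struct sd_sketch *sd, int *bit)
{
	if (sd->arch != SKETCH_AUDIT_ARCH_I386 ||
	    sd->nr < 0 || sd->nr >= SECCOMP_ARCH_2_NR)
		return false;
	*bit = sd->nr;
	return true;
}
#endif

struct cache_sketch {
	unsigned long allow_1[SK_LONGS(SECCOMP_ARCH_1_NR)];
#ifdef SECCOMP_ARCH_2_NR
	unsigned long allow_2[SK_LONGS(SECCOMP_ARCH_2_NR)];
#endif
};

static void sk_set_bit(unsigned long *map, int bit)
{
	map[bit / SK_BITS_PER_LONG] |= 1UL << (bit % SK_BITS_PER_LONG);
}

static bool sk_test_bit(const unsigned long *map, int bit)
{
	return (map[bit / SK_BITS_PER_LONG] >> (bit % SK_BITS_PER_LONG)) & 1UL;
}

/* Generic code tries each arch number; an unknown arch means "run the filter". */
static bool cache_allows(const struct cache_sketch *c, const struct sd_sketch *sd)
{
	int bit;

	if (seccomp_arch_1_bit(sd, &bit))
		return sk_test_bit(c->allow_1, bit);
#ifdef SECCOMP_ARCH_2_NR
	if (seccomp_arch_2_bit(sd, &bit))
		return sk_test_bit(c->allow_2, bit);
#endif
	return false;
}
```

Because the per-arch helpers are static inline and guarded by the preprocessor, an arch without a compat table simply never compiles the second check.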

YiFei Zhu

Re: [PATCH v2 seccomp 6/6] seccomp/cache: Report cache data through /proc/pid/seccomp_cache

2020-09-24 Thread Kees Cook

On Thu, Sep 24, 2020 at 10:11:17PM -0500, YiFei Zhu wrote:
> On Thu, Sep 24, 2020 at 6:56 PM Kees Cook  wrote:
> > > This file is guarded by CONFIG_PROC_SECCOMP_CACHE with a default
> > The question of permissions is my central concern here: who should see
> > this? Some contained processes have been intentionally blocked from
> > self-introspection so even the "standard" high bar of "ptrace attach
> > allowed?" can't always be sufficient.
> >
> > My compromise about filter visibility in the past was saying that
> > CAP_SYS_ADMIN was required (see seccomp_get_filter()). I'm nervous to
> > weaken this. (There is some work that hasn't been sent upstream yet that
> > is looking to expose the filter _contents_ via /proc that has been
> > nervous too.)
> >
> > Now full contents vs "allow"/"filter" are certainly different things,
> > but I don't feel like I've got enough evidence to show that this
> > introspection would help debugging enough to justify the partially
> > imagined safety of not exposing it to potential attackers.
> 
> Agreed. I'm inclined to make it CONFIG_DEBUG_SECCOMP_CACHE and guarded
> by a CAP just to make it "debug only".

Yeah; I just can't quite see what the best direction is here. I will
ponder this more. As I mentioned, it does seem handy. :)

> Is there something to stop a config from being enabled in an
> allyesconfig? I remember seeing something like that. Else if someone
> is manually selecting we can add a help text with a big banner...

Yeah, allyesconfig and allmodconfig both effectively set
CONFIG_COMPILE_TEST. Anyway, likely a caps test will end up being the
way to do it.

> 
> > But behavior-wise, yeah, I like it; I'm fine with human-readable and
> > full AUDIT_ARCH values. (Though, as devil's advocate again, to repeat
> > Jann's own words back: do we want to add this only to have a new UAPI to
> > support going forward?)
> 
> Is this something we want to keep stable?

The Prime Directive of "never break userspace" is really "never break
userspace in a way that someone notices". So if nothing ever parses that
file, then we don't have to keep it stable, but if something does, and
we change it, we have to fix it.

So, a capability test means very few things will touch it, and if we
decide it's not a big deal, we can relax permissions in the future.

-- 
Kees Cook

Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-24 Thread Huang, Ying

Rafael Aquini  writes:
>> Or, can you help to run the test with a debug kernel based on upstream
>> kernel.  I can provide some debug patch.
>> 
>
> Sure, I can set your patches to run with the test cases we have that tend to 
> reproduce the issue with some degree of success.

Thanks!

I found a race condition.  During THP splitting, "head" may be unlocked
before calling split_swap_cluster(), because head != page during
deferred splitting.  So we should call split_swap_cluster() before
unlocking.  The debug patch to do that is as below.  Can you help to
test it?

Best Regards,
Huang, Ying

8<
From 24ce0736a9f587d2dba12f12491c88d3e296a491 Mon Sep 17 00:00:00 2001
From: Huang Ying 
Date: Fri, 25 Sep 2020 11:10:56 +0800
Subject: [PATCH] dbg: Call split_swap_cluster() before unlocking page during
 split THP

---
 mm/huge_memory.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index faadc449cca5..8d79e5e6b46e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2444,6 +2444,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
remap_page(head);
 
+   if (PageSwapCache(head)) {
+   swp_entry_t entry = { .val = page_private(head) };
+
+   split_swap_cluster(entry);
+   }
+
for (i = 0; i < HPAGE_PMD_NR; i++) {
struct page *subpage = head + i;
if (subpage == page)
@@ -2678,12 +2684,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
}
 
__split_huge_page(page, list, end, flags);
-   if (PageSwapCache(head)) {
-   swp_entry_t entry = { .val = page_private(head) };
-
-   ret = split_swap_cluster(entry);
-   } else
-   ret = 0;
+   ret = 0;
} else {
if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
pr_alert("total_mapcount: %u, page_count(): %u\n",
-- 
2.28.0

Re: [PATCH v3 1/5] fpga: dfl: rename the bus type "dfl" to "fpga-dfl"

2020-09-24 Thread Xu Yilun

On Thu, Sep 24, 2020 at 12:01:55PM -0700, Tom Rix wrote:
> 
> On 9/24/20 9:59 AM, Xu Yilun wrote:
> > Now the DFL device drivers could be made as independent modules and put
> > in different subsystems according to their functionalities. So the name
> > should be descriptive and unique in the whole kernel.
> >
> > The patch changes the naming of dfl bus related structures, functions,
> > APIs and documentation.
> 
> This set is largely a mechanical change, it looks ok.
> 
> 
> The big thing I would recommend giving some thought to is:
> 
> include/uapi/linux/fpga-dfl.h uses the prefix DFL_FPGA,
> 
> this change uses fpga_dfl.
> 
> I do not think it matters where 'fpga' is in the internal names,
> 
> so for consistency with the external interface the prefix should be dfl_fpga.

I think fpga_dfl is a more descriptive name from the perspective of
other subsystems. Usually the bigger namespace comes first. Actually, so
far we are consistent on this for the kernel-wide namespace: Kconfig
symbols and header file names all have the prefix FPGA_DFL or fpga_dfl.

Thanks,
Yilun

> 
> >
> > Signed-off-by: Xu Yilun 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-dfl  |  15 --
> >  Documentation/ABI/testing/sysfs-bus-fpga-dfl |  15 ++
> >  MAINTAINERS  |   2 +-
> >  drivers/fpga/dfl.c   | 254 ++-
> >  drivers/fpga/dfl.h   |  77 
> >  5 files changed, 184 insertions(+), 179 deletions(-)
> >  delete mode 100644 Documentation/ABI/testing/sysfs-bus-dfl
> >  create mode 100644 Documentation/ABI/testing/sysfs-bus-fpga-dfl
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-dfl b/Documentation/ABI/testing/sysfs-bus-dfl
> > deleted file mode 100644
> > index 23543be..000
> > --- a/Documentation/ABI/testing/sysfs-bus-dfl
> > +++ /dev/null
> > @@ -1,15 +0,0 @@
> > -What:  /sys/bus/dfl/devices/dfl_dev.X/type
> > -Date:  Aug 2020
> > -KernelVersion: 5.10
> > -Contact:   Xu Yilun 
> > -Description:   Read-only. It returns type of DFL FIU of the device. Now DFL
> > -   supports 2 FIU types, 0 for FME, 1 for PORT.
> > -   Format: 0x%x
> > -
> > -What:  /sys/bus/dfl/devices/dfl_dev.X/feature_id
> > -Date:  Aug 2020
> > -KernelVersion: 5.10
> > -Contact:   Xu Yilun 
> > -Description:   Read-only. It returns feature identifier local to its DFL FIU
> > -   type.
> > -   Format: 0x%x
> > diff --git a/Documentation/ABI/testing/sysfs-bus-fpga-dfl b/Documentation/ABI/testing/sysfs-bus-fpga-dfl
> > new file mode 100644
> > index 000..072decf
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-bus-fpga-dfl
> > @@ -0,0 +1,15 @@
> > +What:  /sys/bus/fpga-dfl/devices/fpga_dfl_dev.X/type
> > +Date:  Aug 2020
> > +KernelVersion: 5.10
> > +Contact:   Xu Yilun 
> > +Description:   Read-only. It returns type of FPGA DFL FIU of the device. Now
> > +   FPGA DFL supports 2 FIU types, 0 for FME, 1 for PORT.
> > +   Format: 0x%x
> > +
> > +What:  /sys/bus/fpga-dfl/devices/fpga_dfl_dev.X/feature_id
> 
> fpga-dfl and fpga_dfl, the prefix needs to be consistent or..
> 
> /sys/bus/fpga-dfl/ already has the fpga and dfl info, so the second one is
> redundant; it could be
> 
> /sys/bus/fpga-dfl/devices/dev.X/feature_id
> 
> 
> re: mfd's comment.
> 
> This is a new, unreleased interface; now is the only time we can change it,
> so let's do it now.
> 
> I'd even suggest splitting the patch to get the change into 5.10.
> 
> Tom
> 
> > +Date:  Aug 2020
> > +KernelVersion: 5.10
> > +Contact:   Xu Yilun 
> > +Description:   Read-only. It returns feature identifier local to its FPGA DFL
> > +   FIU type.
> > +   Format: 0x%x
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0924930..48c0859 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6839,7 +6839,7 @@ M:Wu Hao 
> >  R: Tom Rix 
> >  L: linux-f...@vger.kernel.org
> >  S: Maintained
> > -F: Documentation/ABI/testing/sysfs-bus-dfl
> > +F: Documentation/ABI/testing/sysfs-bus-fpga-dfl
> >  F: Documentation/fpga/dfl.rst
> >  F: drivers/fpga/dfl*
> >  F: include/uapi/linux/fpga-dfl.h
> > diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
> > index b450870..f146cda 100644
> > --- a/drivers/fpga/dfl.c
> > +++ b/drivers/fpga/dfl.c
> > @@ -20,10 +20,10 @@ static DEFINE_MUTEX(dfl_id_mutex);
> >  
> >  /*
> >   * when adding a new feature dev support in DFL framework, it's required to
> > - * add a new item in enum dfl_id_type and provide related information in below
> > - * dfl_devs table which is indexed by dfl_id_type, e.g. name string used for
> > - * platform device creation (define name strings in dfl.h, as they could be
> > - * reused by platform device drivers).
> > + * add a new item in enum fpga_dfl_id_type and provide related information 
> >

Re: [PATCH net-next 0/6] net: hns3: updates for -next

2020-09-24 Thread David Miller

From: Huazhong Tan 
Date: Fri, 25 Sep 2020 08:26:12 +0800

> There are some updates for the HNS3 ethernet driver.
> #1 & #2 are two cleanups.
> #3 adds new hardware error for the client.
> #4 adds debugfs support for the pf's interrupt resources.
> #5 adds new pci device id for 200G device.
> #6 renames the unsuitable macro of vf's pci device id.

Series applied, thank you.

Re: [PATCH v2 1/1] kdump: append uts_namespace.name offset to VMCOREINFO

2020-09-24 Thread Baoquan He

On 09/24/20 at 02:46pm, Alexander Egorenkov wrote:
> The offset of the field 'init_uts_ns.name' has changed
> since
> 
> commit 9a56493f6942c0e2df1579986128721da96e00d8
> Author: Kirill Tkhai 
> Date:   Mon Aug 3 13:16:21 2020 +0300
> 
> uts: Use generic ns_common::count
> 
> Link: https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain

There seems to be some argument about the generic ns_common::count in
the thread at the link above. Apart from that, adding the offset of
uts_namespace.name looks good to me.

Acked-by: Baoquan He 


> 
> Make the offset of the field 'uts_namespace.name' available
> in VMCOREINFO because tools like 'crash-utility' and
> 'makedumpfile' must be able to read it from crash dumps.
> 
> Signed-off-by: Alexander Egorenkov 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..173fdc261882 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -447,6 +447,7 @@ static int __init crash_save_vmcoreinfo_init(void)
>   VMCOREINFO_PAGESIZE(PAGE_SIZE);
>  
>   VMCOREINFO_SYMBOL(init_uts_ns);
> + VMCOREINFO_OFFSET(uts_namespace, name);
>   VMCOREINFO_SYMBOL(node_online_map);
>  #ifdef CONFIG_MMU
>   VMCOREINFO_SYMBOL_ARRAY(swapper_pg_dir);
> -- 
> 2.26.2
>
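What the one-line change gives the dump tools: VMCOREINFO_OFFSET() appends a text line of the form `OFFSET(<struct>.<field>)=<value>` computed with offsetof(), which crash-utility and makedumpfile can then look up instead of hard-coding the layout. The userspace sketch below mimics that format; struct uts_namespace here is a made-up layout (the real one differs), and SKETCH_VMCOREINFO_OFFSET and vmcoreinfo_lookup are hypothetical helpers, not kernel or tool APIs.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Made-up layouts: only the offsetof() mechanism matters here. */
struct new_utsname_sk {
	char sysname[65];
	char nodename[65];
	char release[65];
};

struct uts_namespace {
	int count;			/* placeholder leading member */
	struct new_utsname_sk name;
};

/* Producer side: same text format as VMCOREINFO_OFFSET() emits. */
#define SKETCH_VMCOREINFO_OFFSET(buf, len, name, field)			\
	snprintf((buf), (len), "OFFSET(%s.%s)=%lu\n", #name, #field,	\
		 (unsigned long)offsetof(struct name, field))

/* Consumer side: find "key=" in the note text and return the value, or -1. */
static long vmcoreinfo_lookup(const char *note, const char *key)
{
	const char *p = strstr(note, key);

	if (!p || p[strlen(key)] != '=')
		return -1;
	return strtol(p + strlen(key) + 1, NULL, 10);
}
```

With this stand-in layout the producer writes `OFFSET(uts_namespace.name)=4`, so a tool reading a dump no longer cares if a member such as the old refcount moves out of the struct.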

[PATCH] fs/d_path.c: Supply missing internal.h include file

2020-09-24 Thread Pujin Shi

If the header file containing a function's prototype isn't included by
the source file containing the associated function, the build system
complains of missing prototypes.

Fixes the following W=1 kernel build warning(s):

fs/d_path.c:311:7: warning: no previous prototype for ‘simple_dname’ [-Wmissing-prototypes]

Signed-off-by: Pujin Shi 
---
 fs/d_path.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/d_path.c b/fs/d_path.c
index 0f1fc1743302..745dc1f77787 100644
--- a/fs/d_path.c
+++ b/fs/d_path.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include "mount.h"
+#include "internal.h"
 
 static int prepend(char **buffer, int *buflen, const char *str, int namelen)
 {
-- 
2.18.1

Re: [PATCH v2 seccomp 6/6] seccomp/cache: Report cache data through /proc/pid/seccomp_cache

2020-09-24 Thread YiFei Zhu

On Thu, Sep 24, 2020 at 6:56 PM Kees Cook  wrote:
> > This file is guarded by CONFIG_PROC_SECCOMP_CACHE with a default
> The question of permissions is my central concern here: who should see
> this? Some contained processes have been intentionally blocked from
> self-introspection so even the "standard" high bar of "ptrace attach
> allowed?" can't always be sufficient.
>
> My compromise about filter visibility in the past was saying that
> CAP_SYS_ADMIN was required (see seccomp_get_filter()). I'm nervous to
> weaken this. (There is some work that hasn't been sent upstream yet that
> is looking to expose the filter _contents_ via /proc that has been
> nervous too.)
>
> Now full contents vs "allow"/"filter" are certainly different things,
> but I don't feel like I've got enough evidence to show that this
> introspection would help debugging enough to justify the partially
> imagined safety of not exposing it to potential attackers.

Agreed. I'm inclined to make it CONFIG_DEBUG_SECCOMP_CACHE and guarded
by a CAP just to make it "debug only".

> I suspect it _is_ the right thing to do (just look at my own RFC's
> "debug" patch), but I'd like this to be well justified in the commit
> log.
>
> And yes, while it does hide behind a CONFIG, I'd still want it justified,
> especially since distros have a tendency to just turn everything on
> anyway. ;)

Is there something to stop a config from being enabled in an
allyesconfig? I remember seeing something like that. Else if someone
is manually selecting we can add a help text with a big banner...

> But behavior-wise, yeah, I like it; I'm fine with human-readable and
> full AUDIT_ARCH values. (Though, as devil's advocate again, to repeat
> Jann's own words back: do we want to add this only to have a new UAPI to
> support going forward?)

Is this something we want to keep stable?

YiFei Zhu

Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-24 Thread Andrew Morton

On Fri, 25 Sep 2020 11:06:53 +0800 "Huang\, Ying"  wrote:

> >> UGH! I missed adding it to my cc list. Shall I just forward it, now, or
> >> do you prefer a fresh repost?
> >
> > I added the cc:stable to my copy.
> 
> Please don't merge this patch.  This patch doesn't fix the bug; it hides
> the real bug.  I will work with Rafael on root-causing and fixing it.

Yup, I saw the discussion and have put the patch on hold pending
the outcome.

Re: [PATCH v2 seccomp 2/6] asm/syscall.h: Add syscall_arches[] array

2020-09-24 Thread Kees Cook

On Thu, Sep 24, 2020 at 08:27:40PM -0500, YiFei Zhu wrote:
> [resending this too]
> 
> On Thu, Sep 24, 2020 at 6:01 PM Kees Cook  wrote:
> > Disregarding the "how" of this, yeah, we'll certainly need something to
> > tell seccomp about the arrangement of syscall tables and how to find
> > them.
> >
> > However, I'd still prefer to do this on a per-arch basis, and include
> > more detail, as I've got in my v1.
> >
> > Something missing from both styles, though, is a consolidation of
> > values, where the AUDIT_ARCH* isn't reused in both the seccomp info and
> > the syscall_get_arch() return. The problems here were two-fold:
> >
> > 1) putting this in syscall.h meant you do not have full NR_syscall*
> >visibility on some architectures (e.g. arm64 plays weird games with
> >header include order).
> 
> I don't get this one -- I'm not playing with NR_syscall here.

Right, sorry, I may not have been clear. When building my RFC I noticed
that I couldn't use NR_syscall very "early" in the header file include
stack on arm64, which complicated things. So I guess what I mean is
something like "it's probably better to do all these seccomp-specific
macros/etc in asm/include/seccomp.h rather than in syscall.h because I
know at least one architecture that might cause trouble."

> > 2) seccomp needs to handle "multiplexed" tables like x86_x32 (distros
> >haven't removed CONFIG_X86_X32 widely yet, so it is a reality that
> >it must be dealt with), which means seccomp's idea of the arch
> >"number" can't be the same as the AUDIT_ARCH.
> 
> Why so? Does anyone actually use x32 in a container? The memory cost
> and analysis cost is on everyone. The worst case scenario if we don't
> support it is that the syscall is not accelerated.

Ironically, that's the only place I actually know for sure where people
are using x32, because it shows a measurable (10%) speed-up for builders:
https://lore.kernel.org/lkml/caoesgmgu1i3p7xmzucetj63t-st_jh+bfahy-k6lhgqnrik...@mail.gmail.com

So, yes, as you and Jann both point out, it wouldn't be terrible to just
ignore x32, but it seems a shame to penalize it. That said, if the masking
step from my v1 is actually noticeable on a native workload, then yeah,
probably x32 should be ignored. My instinct (not measured) is that it's
faster than walking a small array.[citation needed]
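To make point 2 concrete: native x86-64 and x32 syscalls both report AUDIT_ARCH_X86_64, and x32 is told apart only by __X32_SYSCALL_BIT (0x40000000) in the syscall number, so a cache slot has to be chosen from (arch, nr) rather than from the arch value alone. This is a minimal sketch assuming a three-slot layout; the slot enum and sketch_slot_of() are hypothetical, while the constant values match the kernel headers.

```c
#include <stdint.h>

#define SKETCH_AUDIT_ARCH_X86_64 0xC000003Eu
#define SKETCH_AUDIT_ARCH_I386   0x40000003u
#define SKETCH_X32_SYSCALL_BIT   0x40000000

/* Hypothetical seccomp-internal slots: more slots than AUDIT_ARCH values. */
enum sketch_slot { SLOT_NATIVE, SLOT_COMPAT, SLOT_X32, SLOT_NONE };

/* Pick the cache slot and the bit in it; the masking step strips the x32 bit. */
static enum sketch_slot sketch_slot_of(uint32_t arch, int nr, int *bit)
{
	if (nr < 0)
		return SLOT_NONE;

	switch (arch) {
	case SKETCH_AUDIT_ARCH_X86_64:
		if (nr & SKETCH_X32_SYSCALL_BIT) {
			*bit = nr & ~SKETCH_X32_SYSCALL_BIT;
			return SLOT_X32;
		}
		*bit = nr;
		return SLOT_NATIVE;
	case SKETCH_AUDIT_ARCH_I386:
		*bit = nr;
		return SLOT_COMPAT;
	default:
		return SLOT_NONE;	/* unknown arch: no cache, run the filter */
	}
}
```

The extra cost for native callers is a single AND and branch on the syscall number, which is the masking step being weighed against walking a small array.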

> > So, likely a combo of approaches is needed: an array (or more likely,
> > enum), declared in the per-arch seccomp.h file. And I don't see a way
> > to solve #1 cleanly.
> >
> > Regardless, it needs to be split per architecture so that regressions
> > can be bisected/reverted/isolated cleanly. And if we can't actually test
> > it at runtime (or find someone who can) it's not a good idea to make the
> > change. :)
> 
> You have a good point regarding tests. Don't see how it affects
> regressions though. Only one file here is ever included per-build.

It's easier to do a per-arch revert (i.e. all the -stable tree
machinery, etc) with a single SHA instead of having to write a partial
revert, etc.

-- 
Kees Cook

Re: KASAN: stack-out-of-bounds Read in xfrm_selector_match (2)

2020-09-24 Thread Herbert Xu

On Mon, Sep 21, 2020 at 07:56:20AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:eb5f95f1 Merge tag 's390-5.9-6' of git://git.kernel.org/pu..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13996ad590
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ffe85b197a57c180
> dashboard link: https://syzkaller.appspot.com/bug?extid=577fbac3145a6eb2e7a5
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+577fbac3145a6eb2e...@syzkaller.appspotmail.com
> 
> ==
> BUG: KASAN: stack-out-of-bounds in xfrm_flowi_dport include/net/xfrm.h:877 [inline]
> BUG: KASAN: stack-out-of-bounds in __xfrm6_selector_match net/xfrm/xfrm_policy.c:216 [inline]
> BUG: KASAN: stack-out-of-bounds in xfrm_selector_match+0xf36/0xf60 net/xfrm/xfrm_policy.c:229
> Read of size 2 at addr c9001914f55c by task syz-executor.4/15633
> 
> CPU: 0 PID: 15633 Comm: syz-executor.4 Not tainted 5.9.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x198/0x1fd lib/dump_stack.c:118
>  print_address_description.constprop.0.cold+0x5/0x497 mm/kasan/report.c:383
>  __kasan_report mm/kasan/report.c:513 [inline]
>  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
>  xfrm_flowi_dport include/net/xfrm.h:877 [inline]

This one goes back more than ten years.  This patch should fix
it.

---8<---
The struct flowi must never be interpreted by itself as its size
depends on the address family.  Therefore it must always be grouped
with its original family value.

In this particular instance, the original family value is lost in
the function xfrm_state_find.  Therefore we get a bogus read when
it's coupled with the wrong family which would occur with inter-
family xfrm states.

This patch fixes it by keeping the original family value.

Note that the same bug could potentially occur in LSM through
the xfrm_state_pol_flow_match hook.  I checked the current code
there and it seems to be safe for now as only secid is used which
is part of struct flowi_common.  But that API should be changed
so that we don't get new bugs in the future.  We could
do that by replacing fl with just secid or adding a family field.

Reported-by: syzbot+577fbac3145a6eb2e...@syzkaller.appspotmail.com
Fixes: 48b8d78315bf ("[XFRM]: State selection update to use inner...")
Signed-off-by: Herbert Xu 

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 69520ad3d83b..9b5f2c2b9770 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1019,7 +1019,8 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
 */
if (x->km.state == XFRM_STATE_VALID) {
if ((x->sel.family &&
-!xfrm_selector_match(&x->sel, fl, x->sel.family)) ||
+(x->sel.family != family ||
+ !xfrm_selector_match(&x->sel, fl, family))) ||
!security_xfrm_state_pol_flow_match(x, pol, fl))
return;
 
@@ -1032,7 +1033,9 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x,
*acq_in_progress = 1;
} else if (x->km.state == XFRM_STATE_ERROR ||
   x->km.state == XFRM_STATE_EXPIRED) {
-   if (xfrm_selector_match(&x->sel, fl, x->sel.family) &&
+   if ((!x->sel.family ||
+(x->sel.family == family &&
+ xfrm_selector_match(&x->sel, fl, family))) &&
security_xfrm_state_pol_flow_match(x, pol, fl))
*error = -ESRCH;
}
@@ -1072,7 +1075,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
(tmpl->id.spi == x->id.spi || !tmpl->id.spi))
-   xfrm_state_look_at(pol, x, fl, encap_family,
+   xfrm_state_look_at(pol, x, fl, family,
   &best, &acquire_in_progress, &error);
}
if (best || acquire_in_progress)
@@ -1089,7 +1092,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
(tmpl->id.spi == x->id.spi || !tmpl->id.spi))
-   xfrm_state_look_at(pol, x, fl, encap_family,
+   xfrm_state_look_at(pol, x, fl, family,
   &best, &acquire_in_progress, &error);
}
 
-- 
Email: Herbert Xu 
Home Page:
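The rule in the commit message, that a struct flowi is only meaningful together with the family it was built for, can be modelled in plain C. The types here are simplified stand-ins (not the kernel's struct flowi or struct xfrm_selector) with an invented field layout; the point is that the selector's family is compared with the flow's own family before any family-specific field is read, which is what the patch does with `x->sel.family != family`.

```c
#include <stdbool.h>
#include <stdint.h>

enum { SK_AF_INET = 2, SK_AF_INET6 = 10 };

/* Stand-ins: the v4 flow is smaller, so reading it as v6 would overrun. */
struct flow4_sk { uint32_t daddr; uint16_t dport; };
struct flow6_sk { uint8_t daddr[16]; uint16_t dport; };

struct sel_sk { uint16_t family; uint16_t dport; };

/* Only valid when fl really points at the variant that family names. */
static uint16_t flow_dport(const void *fl, uint16_t family)
{
	if (family == SK_AF_INET)
		return ((const struct flow4_sk *)fl)->dport;
	return ((const struct flow6_sk *)fl)->dport;
}

/* fl_family is the family the flow was built with, never the selector's. */
static bool sel_match(const struct sel_sk *sel, const void *fl, uint16_t fl_family)
{
	if (!sel->family)
		return true;		/* no selector family: selector not checked */
	if (sel->family != fl_family)
		return false;		/* never reinterpret fl under another family */
	return flow_dport(fl, fl_family) == sel->dport;
}
```

Passing the selector's family into flow_dport() instead of fl_family would reproduce the out-of-bounds read KASAN reported for inter-family states.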

Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-24 Thread Huang, Ying

Hi, Andrew,

Andrew Morton  writes:

> On Wed, 23 Sep 2020 09:42:51 -0400 Rafael Aquini  wrote:
>
>> On Tue, Sep 22, 2020 at 12:47:50PM -0700, Andrew Morton wrote:
>> > On Tue, 22 Sep 2020 14:48:38 -0400 Rafael Aquini  wrote:
>> > 
>> > > The swap area descriptor only gets struct swap_cluster_info *cluster_info
>> > > allocated if the swapfile is backed by non-rotational storage.
>> > > When the swap area is laid on top of ordinary disk spindles, lock_cluster()
>> > > will naturally return NULL.
>> > > 
>> > > CONFIG_THP_SWAP exposes cluster_info infrastructure to a broader number of
>> > > use cases, and split_swap_cluster(), which is the counterpart of split_huge_page()
>> > > for the THPs in the swapcache, misses checking the return of lock_cluster before
>> > > operating on the cluster_info pointer.
>> > > 
>> > > This patch addresses that issue by adding a proper check for the pointer
>> > > not being NULL in the wrappers cluster_{is,clear}_huge(), in order to avoid
>> > > crashes similar to the one below:
>> > > 
>> > > ...
>> > >
>> > > Fixes: 59807685a7e77 ("mm, THP, swap: support splitting THP for THP swap out")
>> > > Signed-off-by: Rafael Aquini 
>> > 
>> > Did you consider cc:stable?
>> >
>> 
>> UGH! I missed adding it to my cc list. Shall I just forward it, now, or
>> do you prefer a fresh repost?
>
> I added the cc:stable to my copy.

Please don't merge this patch.  This patch doesn't fix the bug; it hides
the real bug.  I will work with Rafael on root-causing and fixing it.

Best Regards,
Huang, Ying

Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-24 Thread Dave Young

Hi,

On 09/24/20 at 01:16pm, boris.ostrov...@oracle.com wrote:
> 
> On 9/24/20 12:43 PM, Michael Kelley wrote:
> > From: Eric W. Biederman  Sent: Thursday, September 24, 2020 9:26 AM
> >> Michael Kelley  writes:
> >>
> > Added Hyper-V people and people who created the param, it is below
> > commit, I also want to remove it if possible, let's see how people
> > think, but at the least we should disable the auto setting in both
> > systemd and kernel:
> >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> >>> panic'ed.  Informing the host is particularly important in a public cloud
> >>> such as Azure so that the cloud software can alert the customer, and can
> >>> track cloud-wide reliability statistics.   Whether a kdump is taken is controlled
> >>> entirely by the customer and how he configures the VM, and we want
> >>> the host to be informed either way.
> >> Why?
> >>
> >> Why does the host care?
> >> Especially if the VM continues executing into a kdump kernel?
> > The host itself doesn't care.  But the host is a convenient out-of-band
> > channel for recording that a panic has occurred and to collect basic data
> > about the panic.  This out-of-band channel is then used to notify the end
> > customer that his VM has panic'ed.  Sure, the customer should be running
> > his own monitoring software, but customers don't always do what they
> > should.  Equally important, the out-of-band channel allows the cloud
> > infrastructure software to notice trends, such as that the rate of Linux
> > panics has increased, and that perhaps there is a cloud problem that
> > should be investigated.
> 
> 
> In many cases (especially in cloud environments) your dump device is remote
> (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity 
> issues (which could be cause of the panic in the first place). So it is quite 
> desirable to inform the infrastructure that the VM is on its way out without 
> waiting for kdump to complete.

That can probably be done in the kdump kernel if it is really needed, say
by informing the host that a panic happened and a kdump kernel is running.

But I still think setting crash_kexec_post_notifiers by default is bad.

> 
> 
> >
> >> Further, as I have mentioned every time something like this has come up,
> >> a call on the kexec-on-panic code path should be a direct call (that can
> >> be audited), not something hidden in a notifier call chain (which cannot).
> >>
> 
> We btw already have a direct call from panic() to kmsg_dump() which is 
> indirectly controlled by crash_kexec_post_notifiers, and it would also be 
> preferable to be able to call it before kdump as well.

Right, that is the same thing we are talking about.
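A simplified userspace model of the ordering under discussion, following panic() in kernel/panic.c around v5.9: without crash_kexec_post_notifiers the kdump kernel is entered before the panic notifiers (for example the Hyper-V host report) and kmsg_dump() run; with it set, both run first and kdump follows. The names below are stand-ins that only record the order; SMP stop, printk flushing and the "no kdump kernel loaded" path are deliberately left out, and a loaded kdump kernel is assumed.

```c
#include <stdbool.h>
#include <string.h>

/* Records the order of steps; stand-in for the real side effects. */
struct panic_trace { char steps[128]; };

static void step(struct panic_trace *t, const char *what)
{
	if (t->steps[0])
		strcat(t->steps, ",");
	strcat(t->steps, what);
}

/*
 * Ordering model, assuming a kdump kernel is loaded: entering it never
 * returns, so nothing after "kdump" runs in the old kernel.
 */
static void model_panic(struct panic_trace *t, bool post_notifiers)
{
	t->steps[0] = '\0';
	if (!post_notifiers) {
		step(t, "kdump");
		return;			/* the kdump kernel takes over here */
	}
	step(t, "notifiers");		/* e.g. Hyper-V reporting to the host */
	step(t, "kmsg_dump");
	step(t, "kdump");
}
```

The trade-off in the thread follows directly: the post-notifiers order lets the host hear about the panic even if kdump later hangs, at the cost of running non-audited notifier code before the crash kernel.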

Thanks
Dave

Re: [PATCH] net/ethernet/broadcom: fix spelling typo

2020-09-24 Thread David Miller

From: Wang Qing 
Date: Thu, 24 Sep 2020 14:50:24 +0800

> Modify the comment typo: "compliment" -> "complement".
> 
> Signed-off-by: Wang Qing 

Applied, thank you.

Re: [PATCH v2 seccomp 3/6] seccomp/cache: Add "emulator" to check if filter is arg-dependent

2020-09-24 Thread YiFei Zhu

[resending this, forgot to hit reply all...]

On Thu, Sep 24, 2020 at 6:25 PM Kees Cook  wrote:
> I'm not interested in seccomp having a config option for this. It should
> entirely exist or not, and that depends on the per-architecture support.
> You mentioned in another thread that you wanted it to let people play
> with this support in some way. Can you elaborate on this? My perspective
> is that of distro and vendor kernels: there is _one_ config and end
> users can't really do anything about it without rolling their own
> kernels.

That's one. The other is to allow future optional extensions, like
syscall-argument-capable accelerators.

Distro / vendor kernels will keep defaults anyways, no?

> So, as Jann pointed out, using NR_syscalls only accidentally works --
> they're actually different sizes and there isn't strictly any reason to
> expect one to be smaller than another. So, we need to either choose the
> max() in asm/linux/seccomp.h or be more efficient with space usage and
> use explicitly named bitmaps (how my v1 does things).

Right.
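The two sizing options Kees lists, sketched with placeholder syscall counts (SKETCH_NR_NATIVE and SKETCH_NR_COMPAT are made up, as are the struct names): one row per arch sized by max() always has room but wastes space when the tables differ, while explicitly named bitmaps are each sized for their own table.

```c
#include <limits.h>
#include <stddef.h>

/* Placeholder table sizes; real counts differ per architecture and version. */
#define SKETCH_NR_NATIVE 440
#define SKETCH_NR_COMPAT 300

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_BITMAP_LONGS(n) (((n) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)
#define SK_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Option 1: one row per arch, every row sized for the largest table. */
struct cache_max_sketch {
	unsigned long allow[2][SK_BITMAP_LONGS(SK_MAX(SKETCH_NR_NATIVE,
						      SKETCH_NR_COMPAT))];
};

/* Option 2: explicitly named bitmaps, each sized for its own table. */
struct cache_named_sketch {
	unsigned long allow_native[SK_BITMAP_LONGS(SKETCH_NR_NATIVE)];
	unsigned long allow_compat[SK_BITMAP_LONGS(SKETCH_NR_COMPAT)];
};
```

With these placeholder counts on a 64-bit build, option 1 takes 112 bytes (7 + 7 longs) and option 2 takes 96 bytes (7 + 5 longs). Sizing the compat row with the native NR_syscalls is only correct when the native table happens to be the larger one, which is the accident Jann pointed out.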

> This isn't used in this patch; likely leftover/in need of moving?

Correct. Will remove.

> I moved this up in the structure to see if I could benefit from cache
> line sharing. In either case, we must verify (with "pahole") that we do
> not induce massive padding in the struct.
>
> But yes, attaching this to the filter is the right way to go.

Right. I don't think it would cause massive padding with all I know
about padding learnt from [1].

I'm used to using gdb to look at structure layouts, and this is what I see:
(gdb) ptype /o struct seccomp_filter
/* offset|  size */  type = struct seccomp_filter {
/*0  | 4 */refcount_t refs;
/*4  | 4 */refcount_t users;
/*8  | 1 */bool log;
/* XXX  7-byte hole  */
/*   16  | 8 */struct seccomp_filter *prev;
[...]
/*  264  |   112 */struct seccomp_cache_filter_data {
/*  264  |   112 */unsigned long syscall_ok[2][7];

   /* total size (bytes):  112 */
   } cache;

   /* total size (bytes):  376 */
 }

The bitmaps are long-aligned; so is the prev-pointer. If we want we
can put the cache struct right before prev and that should not
introduce any new holes. It's the refcounts and the bool that are not
cooperative.
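The layout argument can be checked with offsetof() on a cut-down struct. This sketch uses stand-in types (int for the 4-byte refcount_t, a bool, a self pointer, and the 112-byte cache from the gdb output) and drops the other members, so absolute offsets differ from the real struct seccomp_filter; `pahole` on the real object remains the authoritative check. It compares the hole after `log` with the cache placed after `prev` versus right before it.

```c
#include <stdbool.h>
#include <stddef.h>

struct cache_sk { unsigned long syscall_ok[2][7]; };

/* Trimmed version of the order shown by gdb: cache after prev. */
struct filter_orig_sk {
	int refs;			/* stand-in for refcount_t (4 bytes) */
	int users;
	bool log;			/* padding follows to align the next member */
	struct filter_orig_sk *prev;
	struct cache_sk cache;
};

/* Cache moved up to sit right before prev, as suggested above. */
struct filter_moved_sk {
	int refs;
	int users;
	bool log;
	struct cache_sk cache;
	struct filter_moved_sk *prev;
};

/* Bytes of padding between the end of log and the member after it. */
static size_t hole_after_log_orig(void)
{
	return offsetof(struct filter_orig_sk, prev) -
	       (offsetof(struct filter_orig_sk, log) + sizeof(bool));
}

static size_t hole_after_log_moved(void)
{
	return offsetof(struct filter_moved_sk, cache) -
	       (offsetof(struct filter_moved_sk, log) + sizeof(bool));
}
```

On x86-64 both variants come to 136 bytes with the same 7-byte hole after `log`: the long-aligned cache slots in exactly where the pointer was, so moving it adds no padding.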

> nit: "ok" is too vague. We mean either "constant action" or "allow" (or
> "filter" in the negative case).

Right.

> Why is this split out? (i.e. why is it not just a self-contained loop
> the way Jann wrote it?)

Because my brain thinks like a finite state machine and this function
is a state transition. ;) Though yeah I agree a loop is probably more
readable.

> I appreciate the -errno intent, but it actually risks making these
> changes break existing userspace filters: if something is unhandled in
> the emulator in a way we don't find during design and testing, the
> filter load will actually _fail_ instead of just falling back to "run
> filter". Failures should be reported (WARN_ON_ONCE()), but my v1
> intentionally lets this continue.

Right.
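A minimal sketch of the fail-safe behaviour Kees describes: the per-syscall emulation result is tri-state, and a result the emulator cannot decide keeps that syscall on the normal "run the filter" path and records a one-time warning, instead of failing the filter load. The enum, the emu_fn callback, SK_NR and example_emulate() are hypothetical; in the kernel the warning would be WARN_ON_ONCE(). In this sketch the allow bitmap starts empty and a bit is set only when emulation proves a constant allow.

```c
#include <limits.h>
#include <stdbool.h>

enum emu_result { EMU_ALLOW, EMU_FILTER, EMU_UNKNOWN };

/* Hypothetical emulator hook: decides one syscall number for one filter. */
typedef enum emu_result (*emu_fn)(int nr);

#define SK_NR 64
#define SK_BPL (sizeof(unsigned long) * CHAR_BIT)

struct prep_sketch {
	unsigned long allow[(SK_NR + SK_BPL - 1) / SK_BPL];
	bool warned;			/* stand-in for WARN_ON_ONCE() state */
};

/* Never fails: undecidable syscalls simply stay on the slow path. */
static void cache_prepare_sketch(struct prep_sketch *p, emu_fn emulate)
{
	for (int nr = 0; nr < SK_NR; nr++) {
		enum emu_result r = emulate(nr);

		if (r == EMU_UNKNOWN)
			p->warned = true;	/* report it, but keep loading */
		if (r == EMU_ALLOW)
			p->allow[nr / SK_BPL] |= 1UL << (nr % SK_BPL);
	}
}

static bool cache_allows_sketch(const struct prep_sketch *p, int nr)
{
	return nr >= 0 && nr < SK_NR &&
	       ((p->allow[nr / SK_BPL] >> (nr % SK_BPL)) & 1UL);
}

/* Example emulator for the demo: 0-9 allowed, 13 undecidable, rest filtered. */
static enum emu_result example_emulate(int nr)
{
	if (nr >= 0 && nr < 10)
		return EMU_ALLOW;
	if (nr == 13)
		return EMU_UNKNOWN;
	return EMU_FILTER;
}
```

Whatever the emulator gets wrong in the "unknown" direction only costs speed, since the real filter still runs for that syscall.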

> This version appears to have removed all the comments; I liked Jann's
> comments and I had rearranged things a bit to make it more readable
> (IMO) for people that do not immediately understand BPF. :)

Right.

> > +/**
> > + * seccomp_cache_prepare - emulate the filter to find cachable syscalls
> > + * @sfilter: The seccomp filter
> > + *
> > + * Returns 0 if successful or -errno if error occurred.
> > + */
> > +int seccomp_cache_prepare(struct seccomp_filter *sfilter)
> > +{
> > + struct sock_fprog_kern *fprog = sfilter->prog->orig_prog;
> > + struct sock_filter *filter = fprog->filter;
> > + int arch, nr, res = 0;
> > +
> > + for (arch = 0; arch < ARRAY_SIZE(syscall_arches); arch++) {
> > + for (nr = 0; nr < NR_syscalls; nr++) {
> > + struct seccomp_emu_env env = {0};

Btw, do you know what the initial state of the A register is at the
start of BPF execution? In my RFC I assumed it was unknown, but in
v1, after the "reg_known" removal, the register is assumed to be 0. Idk
if it is correct to assume so.

> I don't really like the complexity here, passing around syscall_ok, etc.
> I feel like seccomp_emu_step() should be self-contained to say "allow or
> filter" directly.

Ok.

> I also prefer an inversion to the logic: if we start bitmaps as "default
> allow", we only ever increase the filtering cases: we can never
> accidentally ADD an allow to the bitmap. (This was an intentional design
> in the RFC and v1 to do as much as possible to fail safe.)

Wait, why? If it's default allow, what if you hit an error? You can
accidentally fail to remove an allow from the bitmap, and that is much
more of an issue than accidentally failing to add an allow. I don't
understand your reasoning about "accidentally ADD an allow"; an action
will only occur when everything is

Re: [net] net: mscc: ocelot: fix fields offset in SG_CONFIG_REG_3

2020-09-24 Thread David Miller

From: Xiaoliang Yang 
Date: Thu, 24 Sep 2020 10:11:13 +0800

> INIT_IPS and GATE_ENABLE fields have a wrong offset in SG_CONFIG_REG_3.
> This register is used by stream gate control of PSFP, and it has not
> been used before, because PSFP is not implemented in ocelot driver.
> 
> Signed-off-by: Xiaoliang Yang 

Applied and queued up for -stable.

Re: [PATCH net] hinic: fix wrong return value of mac-set cmd

2020-09-24 Thread David Miller

From: Luo bin 
Date: Thu, 24 Sep 2020 09:31:51 +0800

> It should also be regarded as an error when the hw returns status=4 for
> the PF's set-MAC cmd. Only when the PF returns status=4 to a VF should
> this cmd be given special treatment.
> 
> Fixes: 7dd29ee12865 ("hinic: add sriov feature support")
> Signed-off-by: Luo bin 

Applied and queued up for -stable.

Re: [net] net: dsa: felix: convert TAS link speed based on phylink speed

2020-09-24 Thread David Miller

From: Xiaoliang Yang 
Date: Thu, 24 Sep 2020 09:57:46 +0800

> state->speed holds a value of 10, 100, 1000 or 2500, but
> QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the
> speed to a proper value.
> 
> Fixes: de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via 
> taprio offload")
> Signed-off-by: Xiaoliang Yang 
> Reviewed-by: Vladimir Oltean 

Applied and queued up for -stable.

[PATCH] Revert "net: ethernet: ixgbe: check the return value of ixgbe_mii_bus_init()"

2020-09-24 Thread Yongxin Liu

This reverts commit 09ef193fef7efb0175a04634853862d717adbb95.

For the C3000 family of SoCs, four ixgbe devices share a single MDIO bus.
ixgbe_mii_bus_init() returns -ENODEV for the other three devices, and
propagating that error code causes those three ixgbe devices to be
unregistered.

Signed-off-by: Yongxin Liu 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2f8a4cfc5fa1..5e5223becf86 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -11031,14 +11031,10 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
IXGBE_LINK_SPEED_10GB_FULL | IXGBE_LINK_SPEED_1GB_FULL,
true);
 
-   err = ixgbe_mii_bus_init(hw);
-   if (err)
-   goto err_netdev;
+   ixgbe_mii_bus_init(hw);
 
return 0;
 
-err_netdev:
-   unregister_netdev(netdev);
 err_register:
ixgbe_release_hw_control(adapter);
ixgbe_clear_interrupt_scheme(adapter);
-- 
2.14.4

Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference

2020-09-24 Thread Andrew Morton

On Wed, 23 Sep 2020 09:42:51 -0400 Rafael Aquini  wrote:

> On Tue, Sep 22, 2020 at 12:47:50PM -0700, Andrew Morton wrote:
> > On Tue, 22 Sep 2020 14:48:38 -0400 Rafael Aquini  wrote:
> > 
> > > The swap area descriptor only gets struct swap_cluster_info *cluster_info
> > > allocated if the swapfile is backed by non-rotational storage.
> > > When the swap area is laid on top of ordinary disk spindles, lock_cluster()
> > > will naturally return NULL.
> > > 
> > > CONFIG_THP_SWAP exposes cluster_info infrastructure to a broader number of
> > > use cases, and split_swap_cluster(), which is the counterpart of split_huge_page()
> > > for the THPs in the swapcache, misses checking the return of lock_cluster before
> > > operating on the cluster_info pointer.
> > > 
> > > This patch addresses that issue by adding a proper check for the pointer
> > > not being NULL in the wrappers cluster_{is,clear}_huge(), in order to avoid
> > > crashes similar to the one below:
> > > 
> > > ...
> > >
> > > Fixes: 59807685a7e77 ("mm, THP, swap: support splitting THP for THP swap out")
> > > Signed-off-by: Rafael Aquini 
> > 
> > Did you consider cc:stable?
> >
> 
> UGH! I missed adding it to my cc list. Shall I just forward it, now, or
> do you prefer a fresh repost?

I added the cc:stable to my copy.

Re: [MPTCP][PATCH net-next 00/16] mptcp: RM_ADDR/ADD_ADDR enhancements

2020-09-24 Thread David Miller

From: Geliang Tang 
Date: Thu, 24 Sep 2020 08:29:46 +0800

> This series includes two enhancements for the MPTCP path management,
> namely RM_ADDR support and ADD_ADDR echo support, as specified by RFC
> sections 3.4.1 and 3.4.2.
> 
> 1 RM_ADDR support includes 9 patches (1-3 and 8-13):
> 
> Patch 1 is the helper for patch 2, these two patches add the RM_ADDR
> outgoing functions, which are derived from ADD_ADDR's corresponding
> functions.
> 
> Patch 3 adds the RM_ADDR incoming logic, when RM_ADDR suboption is
> received, close the subflow matching the rm_id, and update PM counter.
> 
> Patch 8 is the main remove routine. When the PM netlink removes an address,
> we traverse all the existing msk sockets to find the relevant sockets, then
> trigger the RM_ADDR signal and remove the subflow that uses this local
> address. The subflow removal functions are implemented in patch 9.
> 
> Finally, patches 10-13 are the self-tests for RM_ADDR.
> 
> 2 ADD_ADDR echo support includes 7 patches (4-7 and 14-16).
> 
> Patch 4 adds the ADD_ADDR echo logic, when the ADD_ADDR suboption has been
> received, send out the same ADD_ADDR suboption with echo-flag, and no HMAC
> included.
> 
> Patches 5 and 6 are the self-tests for ADD_ADDR echo. Patch 7 is a small
> cleanup.
> 
> Patches 14 and 15 are the helpers for patch 16. These three patches add
> the ADD_ADDR retransmission when no ADD_ADDR echo is received.

Series applied, thank you.

Re: [PATCH net v2] drivers/net/wan/x25_asy: Correct the ndo_open and ndo_stop functions

2020-09-24 Thread David Miller

From: Xie He 
Date: Wed, 23 Sep 2020 11:18:18 -0700

> 1.
> Move the lapb_register/lapb_unregister calls into the ndo_open/ndo_stop
> functions.
> This makes the LAPB protocol start/stop when the network interface
> starts/stops. When the network interface is down, the LAPB protocol
> shouldn't be running and the LAPB module shouldn't be generating control
> frames.
> 
> 2.
> Move netif_start_queue/netif_stop_queue into the ndo_open/ndo_stop
> functions.
> This makes the TX queue start/stop when the network interface
> starts/stops.
> (netif_stop_queue was originally in the ndo_stop function. But to make
> the code look better, I created a new function to use as ndo_stop, and
> made it call the original ndo_stop function. I moved netif_stop_queue
> from the original ndo_stop function to the new ndo_stop function.)
> 
> Cc: Martin Schiller 
> Signed-off-by: Xie He 

Applied, thank you.

Re: [PATCH v4 6/6] io_uring: add support for zone-append

2020-09-24 Thread Damien Le Moal

On 2020/09/25 2:20, Kanchan Joshi wrote:
> On Tue, Sep 8, 2020 at 8:48 PM h...@infradead.org  wrote:
>>
>> On Mon, Sep 07, 2020 at 12:31:42PM +0530, Kanchan Joshi wrote:
>>> But there are use-cases which benefit from supporting zone-append on
>>> raw block-dev path.
>>> Certain user-space log-structured/cow FS/DB will use the device that
>>> way. Aerospike is one example.
>>> Pass-through is synchronous, and we lose the ability to use io-uring.
>>
>> So use zonefs, which is designed exactly for that use case.
> 
> Not specific to zone-append, but in general it may not be good to lock
> new features/interfaces to ZoneFS alone, given that direct-block
> interface has its own merits.
> Mapping one file to one zone is good for some use-cases, but
> limiting for others.
> Some user-space FS/DBs would be more efficient (less meta, indirection)
> with the freedom to decide file-to-zone mapping/placement.

There is no metadata in zonefs. One file == one zone and the mapping between
zonefs files and zones is static, determined at mount time simply using report
zones. Zonefs files can be neither renamed nor deleted in any way. Choosing a zonefs
file *is* the same as choosing a zone. Zonefs is *not* a POSIX file system doing
dynamic block allocation to files. The backing storage of files in zonefs is
static and fixed to the zone they represent. The difference between zonefs vs
raw zoned block device is the API that has to be used by the application, that
is, file descriptor representing the entire disk for raw disk vs file descriptor
representing one zone in zonefs. Note that the latter has *a lot* of advantages
over the former: enables O_APPEND use, protects against bugs caused by mistakes
in user write offsets, adds consistency of cached data against zone resets, and more.

> - Rocksdb and those LSM style DBs would map SSTable to zone, but
> SSTable file may be too small (initially) and may become too large
> (after compaction) for a zone.

You are contradicting yourself here. If a SSTable is mapped to a zone, then its
size cannot exceed the zone capacity, regardless of the interface used to access
the zones. And except for L0 tables which can be smaller (and are in memory
anyway), tables at all levels have the same maximum size, which for zoned drives
must be the zone capacity. In any case, solving any problem in this area does
not depend in any way on zonefs vs raw disk interface. The implementation will
differ depending on the chosen interface, but what needs to be done to map
SSTables to zones is the same in both cases.

> - The internal parallelism of a single zone is a design choice, and
> depends on the drive. Writing multiple zones in parallel (striped/RAID
> way) can give better performance than writing to one. In that case one
> would want a file that seamlessly combines multiple zones in a
> striped fashion.

Then write a FS for that... Or have a library do it in user space. For the
library case, the implementation will differ for zonefs vs raw disk due to the
different API (regular file vs block device file), but the principles to follow
for striping zones into a single storage object remain the same.

> Also it seems difficult (compared to block dev) to fit simple-copy TP
> in ZoneFS. The new
> command needs: one NVMe drive, list of source LBAs and one destination
> LBA. In ZoneFS, we would deal with N+1 file-descriptors (N source zone
> files, and one destination zone file) for that, while with the block
> interface, we do not need more than one file-descriptor representing
> the entire device. With more zone-files, we face open/close overhead too.

Are you expecting simple-copy to allow requests that are not zone aligned? I do
not think that will ever happen. Otherwise, the gotcha cases for it would be far
too numerous. Simple-copy is essentially an optimized regular write command.
Similarly to that command, it will not allow copies over zone boundaries and
will need the destination LBA to be aligned to the destination zone WP. I have
not checked the TP though and, given the NVMe NDA, I will stop the discussion
here.

filesend() could be used as the interface for simple-copy. Implementing that in
zonefs would not be that hard. What is your plan for the simple-copy interface for
raw block device? An ioctl? filesend() too? As with any other user level
API, we should not be restricted to a particular device type if we can avoid it,
so in-kernel emulation of the feature is needed for devices that do not have
simple-copy or scsi extended copy. filesend() seems to me like the best choice
since all of that is already implemented there.

As for the open()/close() overhead for zonefs, maybe some use cases suffer
from it, but my tests with LevelDB+zonefs did not show any significant
difference. zonefs open()/close() operations are way faster than for a regular
file system since there is no metadata and all inodes always exist in-memory.
And zonefs now supports MAR/MOR limits for O_WRONLY open(). That can simplify

Re: [v3 1/2] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

2020-09-24 Thread Andrew Morton

On Wed, 23 Sep 2020 14:27:30 -0700 Vijay Balakrishna 
 wrote:

> Can this patch be included? Kirill is OK with the patch now.

He is?  I can't immediately find that email.

Do we have an acked-by?

Re: [External] Re: [PATCH] tasklet: Introduce tasklet tracepoints

2020-09-24 Thread Muchun Song

On Fri, Sep 25, 2020 at 1:59 AM Thomas Gleixner  wrote:
>
> On Sat, Sep 05 2020 at 14:04, Muchun Song wrote:
>
> > Introduce tracepoints for tasklets just like softirq does.
>
> What does softirq?

I mean that softirqs have tracepoints, and tasklets may need them
too.

>
> > In this case, we can calculate tasklet latency and know which tasklet
> > runs.
>
> I rather see people working on removal of tasklets.

I was not aware of this. Why are tasklets being removed?

>
> Thanks,
>
> tglx

Thanks.

-- 
Yours,
Muchun

Re: [PATCH v2] mm/mempool: Add 'else' to split mutually exclusive case

2020-09-24 Thread linmiaohe

Andrew Morton  wrote:
> On Thu, 24 Sep 2020 07:16:41 -0400 Miaohe Lin  wrote:
>
>> Add else to split mutually exclusive cases and avoid some unnecessary checks.
>> It doesn't seem to change code generation (compiler is smart), but I 
>> think it helps readability.
>
> OK, I guess.  But the comments are now in the wrong place.
>

Many thanks for the kind reply and for fixing the comment placement.
Have a good day!

> --- a/mm/mempool.c~mm-mempool-add-else-to-split-mutually-exclusive-case-fix
> +++ a/mm/mempool.c
> @@ -60,8 +60,8 @@ static void check_element(mempool_t *poo
>   /* Mempools backed by slab allocator */
>   if (pool->free == mempool_free_slab || pool->free == mempool_kfree) {
>   __check_element(pool, element, ksize(element));
> - /* Mempools backed by page allocator */
>   } else if (pool->free == mempool_free_pages) {
> + /* Mempools backed by page allocator */
>   int order = (int)(long)pool->pool_data;
>   void *addr = kmap_atomic((struct page *)element);
>  
> @@ -83,8 +83,8 @@ static void poison_element(mempool_t *po
>   /* Mempools backed by slab allocator */
>   if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc) {
>   __poison_element(element, ksize(element));
> - /* Mempools backed by page allocator */
>   } else if (pool->alloc == mempool_alloc_pages) {
> + /* Mempools backed by page allocator */
>   int order = (int)(long)pool->pool_data;
>   void *addr = kmap_atomic((struct page *)element);
>  
> _
>
>

Re: [PATCH RFC 3/4] mm/page_alloc: always move pages to the tail of the freelist in unset_migratetype_isolate()

2020-09-24 Thread Wei Yang

On Thu, Sep 24, 2020 at 01:13:29PM +0200, Vlastimil Babka wrote:
>On 9/16/20 8:34 PM, David Hildenbrand wrote:
>> Page isolation doesn't actually touch the pages, it simply isolates
>> pageblocks and moves all free pages to the MIGRATE_ISOLATE freelist.
>> 
>> We already place pages at the tail of the freelists when undoing
>> isolation via __putback_isolated_page(), let's do it in any case
>> (e.g., if order == pageblock_order) and document the behavior.
>> 
>> This change results in all pages getting onlined via online_pages() to
>> be placed at the tail of the freelist.
>> 
>> Cc: Andrew Morton 
>> Cc: Alexander Duyck 
>> Cc: Mel Gorman 
>> Cc: Michal Hocko 
>> Cc: Dave Hansen 
>> Cc: Vlastimil Babka 
>> Cc: Wei Yang 
>> Cc: Oscar Salvador 
>> Cc: Mike Rapoport 
>> Cc: Scott Cheloha 
>> Cc: Michael Ellerman 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  include/linux/page-isolation.h |  2 ++
>>  mm/page_alloc.c| 36 +-
>>  mm/page_isolation.c|  8 ++--
>>  3 files changed, 39 insertions(+), 7 deletions(-)
>> 
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index 572458016331..a36be2cf4dbb 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -38,6 +38,8 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>  void set_pageblock_migratetype(struct page *page, int migratetype);
>>  int move_freepages_block(struct zone *zone, struct page *page,
>>  int migratetype, int *num_movable);
>> +int move_freepages_block_tail(struct zone *zone, struct page *page,
>> +  int migratetype);
>>  
>>  /*
>>   * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index bba9a0f60c70..75b0f49b4022 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -899,6 +899,15 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
>>  list_move(&page->lru, &area->free_list[migratetype]);
>>  }
>>  
>> +/* Used for pages which are on another list */
>> +static inline void move_to_free_list_tail(struct page *page, struct zone *zone,
>> +  unsigned int order, int migratetype)
>> +{
>> +struct free_area *area = &zone->free_area[order];
>> +
>> +list_move_tail(&page->lru, &area->free_list[migratetype]);
>> +}
>
>There are just 3 callers of move_to_free_list() before this patch, I would just
>add the to_tail parameter there instead of new wrapper. For callers with
>constant parameter, the inline will eliminate it anyway.

Got the same feeling :-)

>
>>  static inline void del_page_from_free_list(struct page *page, struct zone *zone,
>> unsigned int order)
>>  {
>> @@ -2323,7 +2332,7 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
>>   */
>>  static int move_freepages(struct zone *zone,
>>struct page *start_page, struct page *end_page,
>> -  int migratetype, int *num_movable)
>> +  int migratetype, int *num_movable, bool to_tail)
>>  {
>>  struct page *page;
>>  unsigned int order;
>> @@ -2354,7 +2363,10 @@ static int move_freepages(struct zone *zone,
>>  VM_BUG_ON_PAGE(page_zone(page) != zone, page);
>>  
>>  order = page_order(page);
>> -move_to_free_list(page, zone, order, migratetype);
>> +if (to_tail)
>> +move_to_free_list_tail(page, zone, order, migratetype);
>> +else
>> +move_to_free_list(page, zone, order, migratetype);
>>  page += 1 << order;
>>  pages_moved += 1 << order;
>>  }
>> @@ -2362,8 +2374,9 @@ static int move_freepages(struct zone *zone,
>>  return pages_moved;
>>  }
>>  
>> -int move_freepages_block(struct zone *zone, struct page *page,
>> -int migratetype, int *num_movable)
>> +static int __move_freepages_block(struct zone *zone, struct page *page,
>> +  int migratetype, int *num_movable,
>> +  bool to_tail)
>>  {
>>  unsigned long start_pfn, end_pfn;
>>  struct page *start_page, *end_page;
>> @@ -2384,7 +2397,20 @@ int move_freepages_block(struct zone *zone, struct page *page,
>>  return 0;
>>  
>>  return move_freepages(zone, start_page, end_page, migratetype,
>> -num_movable);
>> +  num_movable, to_tail);
>> +}
>> +
>> +int move_freepages_block(struct zone *zone, struct page *page,
>> + int migratetype, int *num_movable)
>> +{
>> +return __move_freepages_block(zone, page, migratetype, num_movable,
>> +  false);
>> +}
>> +
>> +int move_freepages_block_tail(struct

Re: [PATCH v3 1/5] fpga: dfl: rename the bus type "dfl" to "fpga-dfl"

2020-09-24 Thread Moritz Fischer

On Thu, Sep 24, 2020 at 12:01:55PM -0700, Tom Rix wrote:
> 
> On 9/24/20 9:59 AM, Xu Yilun wrote:
> > Now the DFL device drivers could be made as independent modules and put
> > in different subsystems according to their functionalities. So the name
> > should be descriptive and unique in the whole kernel.
> >
> > The patch changes the naming of dfl bus related structures, functions,
> > APIs and documentations.
> 
> This set is largely a mechanical change, it looks ok.
> 
> 
> The big thing I would recommend giving some thought to is that
> 
> include/uapi/linux/fpga-dfl.h uses the prefix DFL_FPGA,
> 
> while this change uses fpga_dfl.
> 
> I do not think it matters where 'fpga' is in the internal names,
> 
> so for consistency with the external interface the prefix should be dfl_fpga.
> 
> >
> > Signed-off-by: Xu Yilun 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-dfl  |  15 --
> >  Documentation/ABI/testing/sysfs-bus-fpga-dfl |  15 ++
> >  MAINTAINERS  |   2 +-
> >  drivers/fpga/dfl.c   | 254 
> > ++-
> >  drivers/fpga/dfl.h   |  77 
> >  5 files changed, 184 insertions(+), 179 deletions(-)
> >  delete mode 100644 Documentation/ABI/testing/sysfs-bus-dfl
> >  create mode 100644 Documentation/ABI/testing/sysfs-bus-fpga-dfl
> >
> > diff --git a/Documentation/ABI/testing/sysfs-bus-dfl b/Documentation/ABI/testing/sysfs-bus-dfl
> > deleted file mode 100644
> > index 23543be..000
> > --- a/Documentation/ABI/testing/sysfs-bus-dfl
> > +++ /dev/null
> > @@ -1,15 +0,0 @@
> > -What:  /sys/bus/dfl/devices/dfl_dev.X/type
> > -Date:  Aug 2020
> > -KernelVersion: 5.10
> > -Contact:   Xu Yilun 
> > -Description:   Read-only. It returns type of DFL FIU of the device. Now DFL
> > -   supports 2 FIU types, 0 for FME, 1 for PORT.
> > -   Format: 0x%x
> > -
> > -What:  /sys/bus/dfl/devices/dfl_dev.X/feature_id
> > -Date:  Aug 2020
> > -KernelVersion: 5.10
> > -Contact:   Xu Yilun 
> > -Description:   Read-only. It returns feature identifier local to its DFL FIU
> > -   type.
> > -   Format: 0x%x
> > diff --git a/Documentation/ABI/testing/sysfs-bus-fpga-dfl b/Documentation/ABI/testing/sysfs-bus-fpga-dfl
> > new file mode 100644
> > index 000..072decf
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-bus-fpga-dfl
> > @@ -0,0 +1,15 @@
> > +What:  /sys/bus/fpga-dfl/devices/fpga_dfl_dev.X/type
> > +Date:  Aug 2020
> > +KernelVersion: 5.10
> > +Contact:   Xu Yilun 
> > +Description:   Read-only. It returns type of FPGA DFL FIU of the device. Now
> > +   FPGA DFL supports 2 FIU types, 0 for FME, 1 for PORT.
> > +   Format: 0x%x
> > +
> > +What:  /sys/bus/fpga-dfl/devices/fpga_dfl_dev.X/feature_id
> 
> fpga-dfl and fpga_dfl, the prefix needs to be consistent or..
> 
> /sys/bus/fpga-dfl/ already has the fpga and dfl info, so the second one is
> redundant; it could be
> 
> /sys/bus/fpga-dfl/devices/dev.X/feature_id
> 
> 
> re: mfd's comment.
> 
> This is a new, unreleased interface; now is the only time we can change it,
> so let's do it now.

Imho the question is more whether it is worth the churn on the sysfs
part or not ...

I think changing the stuff that touches files where other subsystems are
involved is more critical, and that's how I read Greg's comment.
> 
> I'd even suggest splitting the patch to get the change into 5.10.

If we think that's enough of a big deal that this needs to happen, then
yes, splitting the sysfs part out into a separate patch will increase
the likelihood of it actually happening.

> 
> Tom
> 
> > +Date:  Aug 2020
> > +KernelVersion: 5.10
> > +Contact:   Xu Yilun 
> > +Description:   Read-only. It returns feature identifier local to its FPGA DFL
> > +   FIU type.
> > +   Format: 0x%x
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0924930..48c0859 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6839,7 +6839,7 @@ M:Wu Hao 
> >  R: Tom Rix 
> >  L: linux-f...@vger.kernel.org
> >  S: Maintained
> > -F: Documentation/ABI/testing/sysfs-bus-dfl
> > +F: Documentation/ABI/testing/sysfs-bus-fpga-dfl
> >  F: Documentation/fpga/dfl.rst
> >  F: drivers/fpga/dfl*
> >  F: include/uapi/linux/fpga-dfl.h
> > diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
> > index b450870..f146cda 100644
> > --- a/drivers/fpga/dfl.c
> > +++ b/drivers/fpga/dfl.c
> > @@ -20,10 +20,10 @@ static DEFINE_MUTEX(dfl_id_mutex);
> >  
> >  /*
> >   * when adding a new feature dev support in DFL framework, it's required to
> > - * add a new item in enum dfl_id_type and provide related information in below
> > - * dfl_devs table which is indexed by dfl_id_type, e.g. name string used for
> > - * platform device creation (define name strings in dfl.h, as they could be
> > - * reused by

Re: [PATCH] ide/falconide: Fix module unload

2020-09-24 Thread Michael Schmitz


Hi Finn,

thanks for catching this!

Reviewed-By: Michael Schmitz 

On 25.09.2020 at 13:39, Finn Thain wrote:

Unloading the falconide module results in a crash:

Unable to handle kernel NULL pointer dereference at virtual address 
Oops: 
Modules linked in: falconide(-)
PC: [<002930b2>] ide_host_remove+0x2e/0x1d2
SR: 2000  SP: 00b49e28  a2: 009b0f90
d0: d1: 009b0f90d2: d3: 00b48000
d4: 003cef32d5: 00299188a0: 0086d000a1: 0086d000
Process rmmod (pid: 322, task=009b0f90)
Frame format=7 eff addr= ssw=0505 faddr=
wb 1 stat/addr/data:   
wb 2 stat/addr/data:   
wb 3 stat/addr/data:   00018da9
push data:    
Stack from 00b49e90:
004c456a 0027f176 0027cb0a 0027cb9e  0086d00a 2187d3f0 0027f0e0
00b49ebc 2187d1f6  00b49ec8 002811e8 0086d000 00b49ef0 0028024c
0086d00a 002800d6 00279a1a 0001 0001 0086d00a 2187d3f0 00279a58
00b49f1c 002802e0 0086d00a 2187d3f0 004c456a 0086d00a ef96af74 
2187d3f0 002805d2 800de064 00b49f44 0027f088 2187d3f0 00ac1cf4 2187d3f0
004c43be 2187d3f0  2187d3f0 800b66a8 00b49f5c 00280776 2187d3f0
Call Trace: [<0027f176>] __device_driver_unlock+0x0/0x48
 [<0027cb0a>] device_links_busy+0x0/0x94
 [<0027cb9e>] device_links_unbind_consumers+0x0/0x130
 [<0027f0e0>] __device_driver_lock+0x0/0x5a
 [<2187d1f6>] falconide_remove+0x12/0x18 [falconide]
 [<002811e8>] platform_drv_remove+0x1c/0x28
 [<0028024c>] device_release_driver_internal+0x176/0x17c
 [<002800d6>] device_release_driver_internal+0x0/0x17c
 [<00279a1a>] get_device+0x0/0x22
 [<00279a58>] put_device+0x0/0x18
 [<002802e0>] driver_detach+0x56/0x82
 [<002805d2>] driver_remove_file+0x0/0x24
 [<0027f088>] bus_remove_driver+0x4c/0xa4
 [<00280776>] driver_unregister+0x28/0x5a
 [<00281a00>] platform_driver_unregister+0x12/0x18
 [<2187d2a0>] ide_falcon_driver_exit+0x10/0x16 [falconide]
 [<000764f0>] sys_delete_module+0x110/0x1f2
 [<000e83ea>] sys_rename+0x1a/0x1e
 [<2e0c>] syscall+0x8/0xc
 [<00188004>] ext4_multi_mount_protect+0x35a/0x3ce
Code: 0029 9188 4bf9 0027 aa1c 283c 003c ef32 <265c> 4a8b 6700 00b8 2043 2028 
000c 0280 00ff ff00 6600 0176 40c0 7202 b2b9 004c
Disabling lock debugging due to kernel taint

This happens because the driver_data pointer is uninitialized.
Add the missing platform_set_drvdata() call. For clarity, use the
matching platform_get_drvdata() as well.

Cc: Michael Schmitz 
Cc: Bartlomiej Zolnierkiewicz 
Cc: linux-m...@lists.linux-m68k.org
Fixes: 5ed0794cde593 ("m68k/atari: Convert Falcon IDE drivers to platform drivers")
Signed-off-by: Finn Thain 
---
This patch was tested using Aranym.
---
 drivers/ide/falconide.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ide/falconide.c b/drivers/ide/falconide.c
index dbeb2605e5f6e..607c44bc50f1b 100644
--- a/drivers/ide/falconide.c
+++ b/drivers/ide/falconide.c
@@ -166,6 +166,7 @@ static int __init falconide_init(struct platform_device *pdev)
if (rc)
goto err_free;

+   platform_set_drvdata(pdev, host);
return 0;
 err_free:
ide_host_free(host);
@@ -176,7 +177,7 @@ static int __init falconide_init(struct platform_device *pdev)

 static int falconide_remove(struct platform_device *pdev)
 {
-   struct ide_host *host = dev_get_drvdata(&pdev->dev);
+   struct ide_host *host = platform_get_drvdata(pdev);

ide_host_remove(host);

Re: [PATCH v2] mm/mempool: Add 'else' to split mutually exclusive case

2020-09-24 Thread Andrew Morton

On Thu, 24 Sep 2020 07:16:41 -0400 Miaohe Lin  wrote:

> Add else to split mutually exclusive cases and avoid some unnecessary checks.
> It doesn't seem to change code generation (compiler is smart), but I think
> it helps readability.
> 
> ...
>
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -58,11 +58,10 @@ static void __check_element(mempool_t *pool, void *element, size_t size)
>  static void check_element(mempool_t *pool, void *element)
>  {
>   /* Mempools backed by slab allocator */
> - if (pool->free == mempool_free_slab || pool->free == mempool_kfree)
> + if (pool->free == mempool_free_slab || pool->free == mempool_kfree) {
>   __check_element(pool, element, ksize(element));
> -
>   /* Mempools backed by page allocator */
> - if (pool->free == mempool_free_pages) {
> + } else if (pool->free == mempool_free_pages) {
>   int order = (int)(long)pool->pool_data;
>   void *addr = kmap_atomic((struct page *)element);
>  
> @@ -82,11 +81,10 @@ static void __poison_element(void *element, size_t size)
>  static void poison_element(mempool_t *pool, void *element)
>  {
>   /* Mempools backed by slab allocator */
> - if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc)
> + if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc) {
>   __poison_element(element, ksize(element));
> -
>   /* Mempools backed by page allocator */
> - if (pool->alloc == mempool_alloc_pages) {
> + } else if (pool->alloc == mempool_alloc_pages) {
>   int order = (int)(long)pool->pool_data;
>   void *addr = kmap_atomic((struct page *)element);
>  

OK, I guess.  But the comments are now in the wrong place.

--- a/mm/mempool.c~mm-mempool-add-else-to-split-mutually-exclusive-case-fix
+++ a/mm/mempool.c
@@ -60,8 +60,8 @@ static void check_element(mempool_t *poo
/* Mempools backed by slab allocator */
if (pool->free == mempool_free_slab || pool->free == mempool_kfree) {
__check_element(pool, element, ksize(element));
-   /* Mempools backed by page allocator */
} else if (pool->free == mempool_free_pages) {
+   /* Mempools backed by page allocator */
int order = (int)(long)pool->pool_data;
void *addr = kmap_atomic((struct page *)element);
 
@@ -83,8 +83,8 @@ static void poison_element(mempool_t *po
/* Mempools backed by slab allocator */
if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc) {
__poison_element(element, ksize(element));
-   /* Mempools backed by page allocator */
} else if (pool->alloc == mempool_alloc_pages) {
+   /* Mempools backed by page allocator */
int order = (int)(long)pool->pool_data;
void *addr = kmap_atomic((struct page *)element);
 
_

[PATCH 5/5] PCI/ERR: don't mix io state not changed and no driver together

2020-09-24 Thread Ethan Zhao

When 'can't recover (no error_detected callback)' shows up on the console,
the real reason may be that the io state was not changed by
pci_dev_set_io_state(), which is confusing. Fix it by reporting the two
cases separately.

Signed-off-by: Ethan Zhao 
Tested-by: Wen jin 
Tested-by: Shanshan Zhang 
---
 drivers/pci/pcie/err.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index e35c4480c86b..d85f27c90c26 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -55,8 +55,10 @@ static int report_error_detected(struct pci_dev *dev,
if (!pci_dev_get(dev))
return 0;
device_lock(&dev->dev);
-   if (!pci_dev_set_io_state(dev, state) ||
-   !dev->driver ||
+   if (!pci_dev_set_io_state(dev, state)) {
+   pci_dbg(dev, "Device might already being in error handling ...\n");
+   vote = PCI_ERS_RESULT_NONE;
+   } else if (!dev->driver ||
!dev->driver->err_handler ||
!dev->driver->err_handler->error_detected) {
/*
-- 
2.18.4

[PATCH 4/5] PCI: only return true when dev io state is really changed

2020-09-24 Thread Ethan Zhao

When an uncorrectable error happens, the AER and DPC driver interrupt
handlers are likely to call
pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with
pci_channel_io_frozen at the same time.
If pci_dev_set_io_state() returns true even when the original state is
already pci_channel_io_frozen, the AER or DPC handler re-enters the
error detection and recovery procedure one after the other, and the
recovery flows of AER and DPC get mixed together.
So simplify pci_dev_set_io_state() to return true only when
dev->error_state is actually changed.

Signed-off-by: Ethan Zhao 
Tested-by: Wen jin 
Tested-by: Shanshan Zhang 
---
 drivers/pci/pci.h | 31 +++----------------------------------
 1 file changed, 3 insertions(+), 28 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fa12f7cbc1a0..d420bb977f3b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -362,35 +362,10 @@ static inline bool pci_dev_set_io_state(struct pci_dev *dev,
bool changed = false;
 
device_lock_assert(&dev->dev);
-   switch (new) {
-   case pci_channel_io_perm_failure:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   case pci_channel_io_perm_failure:
-   changed = true;
-   break;
-   }
-   break;
-   case pci_channel_io_frozen:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   changed = true;
-   break;
-   }
-   break;
-   case pci_channel_io_normal:
-   switch (dev->error_state) {
-   case pci_channel_io_frozen:
-   case pci_channel_io_normal:
-   changed = true;
-   break;
-   }
-   break;
-   }
-   if (changed)
+   if (dev->error_state != new) {
dev->error_state = new;
+   changed = true;
+   }
return changed;
 }
 
-- 
2.18.4
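To compare the two versions of pci_dev_set_io_state(), the removed nested switch and the new "changed only if different" rule can be written as standalone functions. This is a sketch only: `enum io_state`, `set_io_state_old()` and `set_io_state_new()` are hypothetical names mirroring the three pci_channel_state values used in the diff, and the device lock is omitted.

```c
#include <stdbool.h>

/* Local mirror of the pci_channel_state values used in the diff. */
enum io_state { IO_NORMAL, IO_FROZEN, IO_PERM_FAILURE };

/* Rules removed by the patch (the nested switch). */
static bool set_io_state_old(enum io_state *cur, enum io_state new)
{
	bool changed = false;

	switch (new) {
	case IO_PERM_FAILURE:
		/* every current state may move to perm_failure */
		changed = true;
		break;
	case IO_FROZEN:
	case IO_NORMAL:
		/* only normal or frozen devices may move to frozen/normal */
		changed = (*cur == IO_FROZEN || *cur == IO_NORMAL);
		break;
	}
	if (changed)
		*cur = new;
	return changed;
}

/* Rule introduced by the patch: report a change only if the value differs. */
static bool set_io_state_new(enum io_state *cur, enum io_state new)
{
	if (*cur == new)
		return false;
	*cur = new;
	return true;
}
```

The first pair of calls shows the fix: frozen -> frozen returned true before (letting a second handler re-enter recovery) and returns false now. The comparison also shows a side effect the commit message does not discuss: the old rules refused to move a device out of pci_channel_io_perm_failure, while the new rule lets it return to normal or frozen.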

[PATCH 0/5] Fix DPC hotplug race and enhance error handling

2020-09-24 Thread Ethan Zhao

This simple patch set fixes some serious security issues found during DPC
error injection and NVMe SSD brute-force hotplug testing: a race condition
between the DPC handler, pciehp, and the AER interrupt handlers caused
system hangs, and systems with the DPC feature could not recover to a
normal working state as expected (NVMe instance lost, mount operation hung,
racing PCIe accesses caused uncorrectable errors to be reported alternately,
etc.).

With this patch set applied, stable 5.9-rc6 passes the PCIe Gen4 NVMe SSD
brute-force hotplug test tens of times, with any time interval between
hot-remove and plug-in, without any errors, and the system works normally.

With this patch set applied, a system with the DPC feature recovers from
NON-FATAL and FATAL error injection tests and works as expected.

The system works smoothly when errors happen during hotplug, and no
uncorrectable errors were found.

Brute-force DPC error injection script:

for i in {0..100}
do
    setpci -s 64:02.0 0x196.w=000a
    setpci -s 65:00.0 0x04.w=0544
    mount /dev/nvme0n1p1 /root/nvme
    sleep 1
done
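The second setpci write in the script stores 0x0544 in the Command register (config offset 0x04) of device 65:00.0. Decoding that value shows Bus Master, Parity Error Response, SERR# Enable and Interrupt Disable set, while Memory Space Enable (bit 1) is cleared, so the MMIO accesses triggered by the following mount are presumably no longer decoded by the NVMe device, which is what provokes the errors DPC then has to contain. The small decoder below is a sketch; `cmd_bit_set()` and `decode_pci_command()` are hypothetical helpers, only a subset of the Command bits is listed, and the first write (offset 0x196 on 64:02.0) is not decoded because the original does not say what that register is.

```c
#include <stdint.h>
#include <stdio.h>

/* A subset of the PCI Command register (config offset 0x04) bits. */
static const struct {
	unsigned bit;
	const char *name;
} cmd_bits[] = {
	{ 0,  "I/O Space Enable" },
	{ 1,  "Memory Space Enable" },
	{ 2,  "Bus Master Enable" },
	{ 6,  "Parity Error Response" },
	{ 8,  "SERR# Enable" },
	{ 10, "Interrupt Disable" },
};

/* Return 1 if the given bit is set in a 16-bit Command register value. */
static int cmd_bit_set(uint16_t val, unsigned bit)
{
	return (val >> bit) & 1u;
}

/* Print the state of each listed bit for a Command register value. */
static void decode_pci_command(uint16_t val)
{
	for (size_t i = 0; i < sizeof(cmd_bits) / sizeof(cmd_bits[0]); i++)
		printf("%-22s %s\n", cmd_bits[i].name,
		       cmd_bit_set(val, cmd_bits[i].bit) ? "on" : "off");
}
```

Calling `decode_pci_command(0x0544)` lists I/O Space and Memory Space as off and the other four bits as on.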

For other details, see the description in each commit.

This patch set can be applied directly to stable 5.9-rc6.

Please help review and test.

Thanks,
Ethan


Ethan Zhao (5):
  PCI: define a function to check and wait till port finish DPC handling
  PCI: pciehp: check and wait port status out of DPC before handling DLLSC and PDC
  PCI/ERR: get device before call device driver to avoid null pointer reference
  PCI: only return true when dev io state is really changed
  PCI/ERR: don't mix io state not changed and no driver together

 drivers/pci/hotplug/pciehp_hpc.c |  4 +++-
 drivers/pci/pci.h| 31 +++
 drivers/pci/pcie/err.c   | 18 --
 include/linux/pci.h  | 31 +++
 4 files changed, 53 insertions(+), 31 deletions(-)

-- 
2.18.4

Re: [PATCH v6 0/6] mm: introduce memfd_secret system call to create "secret" memory areas

2020-09-24 Thread Andrew Morton

On Thu, 24 Sep 2020 16:28:58 +0300 Mike Rapoport  wrote:

> From: Mike Rapoport 
> 
> Hi,
> 
> This is an implementation of "secret" mappings backed by a file descriptor. 
> I've dropped the boot time reservation patch for now as it is not strictly
> required for the basic usage and can be easily added later either with or
> without CMA.
> 
> ...
> 
> The file descriptor backing secret memory mappings is created using a
> dedicated memfd_secret system call. The desired protection mode for the
> memory is configured using the flags parameter of the system call. The mmap()
> of the file descriptor created with memfd_secret() will create a "secret"
> memory mapping. The pages in that mapping will be marked as not present in
> the direct map and will have desired protection bits set in the user page
> table. For instance, the current implementation allows uncached mappings.
> 
> Although normally Linux userspace mappings are protected from other users,
> such secret mappings are useful for environments where a hostile tenant is
> trying to trick the kernel into giving them access to other tenants'
> mappings.
> 
> Additionally, the secret mappings may be used as a means to protect guest
> memory in a virtual machine host.
> 
> To demonstrate secret memory usage, we've created a userspace library
> [1] that does two things: the first is to act as a preloader for openssl to

I can find no [1].

I'm not a fan of the enumerated footnote thing.  Why not inline the url
right here so readers don't need to jump around?
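The memfd_secret() flow described in the quoted cover letter (create a file descriptor, size it, mmap() it) can be sketched in userspace as below. This is a hedged sketch against the system call as later merged into mainline (Linux 5.14), whose only argument is a flags word; the protection-mode flags discussed in this v6 posting, such as uncached mappings, are not modelled, and `secret_page_roundtrip()` is a hypothetical name. The call may be missing from the kernel or libc headers, or disabled on the running system, so every failure is treated as "unavailable" rather than as an error.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Try to create one page of secret memory, write to it and read it back.
 * Returns 1 on success, 0 if any step is unavailable or fails.
 */
static int secret_page_roundtrip(void)
{
#ifdef SYS_memfd_secret
	long page = sysconf(_SC_PAGESIZE);
	/* Mainline signature: memfd_secret(unsigned int flags); 0 = no flags. */
	int fd = (int)syscall(SYS_memfd_secret, 0);
	if (fd < 0)
		return 0;

	int ok = 0;
	/* The file must be sized before it can be mapped. */
	if (ftruncate(fd, page) == 0) {
		char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p != MAP_FAILED) {
			/* First touch faults the page in; the kernel removes it
			 * from the direct map. */
			strcpy(p, "secret");
			ok = strcmp(p, "secret") == 0;
			munmap(p, (size_t)page);
		}
	}
	close(fd);
	return ok;
#else
	return 0; /* headers do not know the system call */
#endif
}
```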

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-24 Thread Ming Lei

On Fri, Sep 25, 2020 at 09:14:16AM +0800, Ming Lei wrote:
> On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > On Thu, Sep 24, 2020 at 08:59:01AM +0800, Ming Lei wrote:
> > > 
> > > The list corruption issue can be reproduced on a kvm/qemu guest too when
> > > running xfstests(ext4) generic/038.
> > > 
> > > However, the issue may stop reproducing when memory debug options
> > > are added or removed, such as adding KASAN.
> > 
> > Can you reliably reproduce this crash?  And if so, with what config and
> > what kernel version?
> 
> Yeah, it can be reproduced reliably by running xfstests(ext4)
> generic/038 over virtio-scsi(test device)/virtio-blk(scratch device).
> 
> The kernel is -rc4; I have not tested -rc5 yet.
> 
> It is exactly the config you provided; I only enabled CDROM & ISOFS
> on top of your config for passing cloud-init data to the VM via CD-ROM.
> 
> > One of the reasons why I had gone silent on this bug is that I've been
> > trying many, many configurations, and configurations which reliably
> > reproduced on, say, -rc4 would not reproduce on -rc5, and then I would
> > get a completely different set of results on -rc6.  So I've been
> > trying to run a lot of different experiments to try to understand what
> > might be going on, since it seems pretty clear this must be a very
> > timing-sensitive failure.
> > 
> > I also found that the recurrence rate went down significantly if I enabled
> > KASAN, and while it didn't go away, I wasn't able to get a KASAN
> > failure to trigger, either.  Turning off CONFIG_PROVE_LOCKING and a
> > *lot* of other debugging configs made the problem vanish in -rc4, but
> > that trick didn't work with -rc5 or -rc6.
> 
> The issue can be reproduced reliably in my environment after
> disabling only LOCKDEP or only KMEMLEAK.
> 
> But after disabling DEBUG_OBJECTS, it becomes hard to trigger.
> 
> > 
> > Each time I discovered one of these things, I was about to post to the
> > e-mail thread, only to have a confirmation test run on a different
> > kernel version make the problem go away.  In particular, your revert
> > helped with -rc4 and -rc6 IIRC, but it didn't help in -rc5.
> > 
> > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > that one of the stack traces had virtio balloon in the trace, I
> > realized that when I switched the GCE VM type from e1-standard-2 to
> > n1-standard-2 (where e1 VMs are cheaper because they use
> > virtio-balloon to better manage host OS memory utilization), the problem
> > has become much, *much* rarer (and possibly has gone away, although
> > I'm going to want to run a lot more tests before I say that
> > conclusively) on my test setup.  At the very least, using an n1 VM
> > (which doesn't have virtio-balloon enabled in the hypervisor) is
> > enough to unblock ext4 development.
> > 
> > Any chance your kvm/qemu configuration might have been using
> > virtio-balloon?  Because other ext4 developers who have been using
> > kvm-xfstests have not had any problems.
> 
> I don't set up virtio-balloon; see the attached script for setting up the VM.
> 
> > 
> > > When I enable PAGE_POISONING, double free on kmalloc(192) is captured:
> > > 
> > > [ 1198.317139] slab: double free detected in cache 'kmalloc-192', objp 89ada7584300
> > > [ 1198.326651] ------------[ cut here ]------------
> > > [ 1198.327969] kernel BUG at mm/slab.c:2535!
> > > [ 1198.329129] invalid opcode: 0000 [#1] SMP PTI
> > > [ 1198.333776] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc4_quiesce_srcu-xfstests #102
> > > [ 1198.336085] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
> > > [ 1198.339826] RIP: 0010:free_block.cold.92+0x13/0x15
> > > [ 1198.341472] Code: 8d 44 05 f0 eb d0 48 63 83 e0 00 00 00 48 8d 54 05 f8 e9 4b 81 ff ff 48 8b 73 58 48 89 ea 48 c7 c7 98 e7 4a 9c e8 20 c3 eb ff <0f> 0b 48 8b 73 58 48 c7 c2 20 e8 4a 9c 48 c7 c7 70 32 22 9c e8 19
> > > [ 1198.347331] RSP: 0018:982e40710be8 EFLAGS: 00010046
> > > [ 1198.349091] RAX: 0048 RBX: 89adb6441400 RCX:
> > > [ 1198.351839] RDX:  RSI: 89adbaa97800 RDI: 89adbaa97800
> > > [ 1198.354572] RBP: 89ada7584300 R08: 0417 R09: 0057
> > > [ 1198.357150] R10: 0001 R11: 982e40710aa5 R12: 89adbaaae598
> > > [ 1198.359067] R13: e7bc819d6108 R14: e7bc819d6100 R15: 89adb6442280
> > > [ 1198.360975] FS:  () GS:89adbaa8() knlGS:
> > > [ 1198.363202] CS:  0010 DS:  ES:  CR0: 80050033
> > > [ 1198.365986] CR2: 55f6a3811318 CR3: 00017adca005 CR4: 00770ee0
> > > [ 1198.368679] DR0:  DR1:  DR2:
> > > [ 1198.371386] DR3:  DR6:
