date:20200604

Re: [PATCH 1/1] docs: dev-tools: coccinelle: underlines

2020-06-04 Thread Julia Lawall




On Fri, 5 Jun 2020, Heinrich Schuchardt wrote:

> Underline lengths should match the lengths of headings to avoid build
> warnings with Sphinx.
>
> Signed-off-by: Heinrich Schuchardt 


Acked-by: Julia Lawall 

Thanks for your help.

> ---
>  Documentation/dev-tools/coccinelle.rst | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/dev-tools/coccinelle.rst 
> b/Documentation/dev-tools/coccinelle.rst
> index 00a3409b0c28..70274c3f5f5a 100644
> --- a/Documentation/dev-tools/coccinelle.rst
> +++ b/Documentation/dev-tools/coccinelle.rst
> @@ -14,7 +14,7 @@ many uses in kernel development, including the application 
> of complex,
>  tree-wide patches and detection of problematic programming patterns.
>
>  Getting Coccinelle
> 
> +--
>
>  The semantic patches included in the kernel use features and options
>  which are provided by Coccinelle version 1.0.0-rc11 and above.
> @@ -56,7 +56,7 @@ found at:
>  https://github.com/coccinelle/coccinelle/blob/master/install.txt
>
>  Supplemental documentation
> 
> +--
>
>  For supplemental documentation refer to the wiki:
>
> @@ -128,7 +128,7 @@ To enable verbose messages set the V= variable, for 
> example::
> make coccicheck MODE=report V=1
>
>  Coccinelle parallelization
> 
> +--
>
>  By default, coccicheck tries to run as parallel as possible. To change
>  the parallelism, set the J= variable. For example, to run across 4 CPUs::
> @@ -333,7 +333,7 @@ as an example if requiring at least Coccinelle >= 1.0.5::
>   // Requires: 1.0.5
>
>  Proposing new semantic patches
> 
> +--
>
>  New semantic patches can be proposed and submitted by kernel
>  developers. For sake of clarity, they should be organized in the
> --
> 2.26.2
>
>

Re: [PATCH] PCI: tegra: handle failure case of pm_runtime_get_sync

2020-06-04 Thread Jon Hunter



On 05/06/2020 04:12, Navid Emamdoost wrote:
> Calling pm_runtime_get_sync increments the counter even in case of
> failure, causing incorrect ref count. Call pm_runtime_put if
> pm_runtime_get_sync fails.
> 
> Signed-off-by: Navid Emamdoost 
> ---
>  drivers/pci/controller/pci-tegra.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/pci/controller/pci-tegra.c 
> b/drivers/pci/controller/pci-tegra.c
> index 3e64ba6a36a8..3d4b448fd8df 100644
> --- a/drivers/pci/controller/pci-tegra.c
> +++ b/drivers/pci/controller/pci-tegra.c
> @@ -2712,6 +2712,7 @@ static int tegra_pcie_probe(struct platform_device 
> *pdev)
>   err = pm_runtime_get_sync(pcie->dev);
>   if (err < 0) {
>   dev_err(dev, "fail to enable pcie controller: %d\n", err);
> + pm_runtime_put_sync(pcie->dev);
>   goto teardown_msi;
>   }

Same thing for this patch, there is already a put in the error path and
so it is not necessary to add the put call here. Just update the goto
label.
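
The control flow suggested here can be sketched in plain userspace C. Everything below is a hypothetical mock (mock_get_sync(), mock_put() and probe_sketch() only stand in for pm_runtime_get_sync(), pm_runtime_put() and a driver probe function); it is not the tegra driver code, just an illustration of jumping to a label that already drops the reference.

```c
#include <assert.h>

/* Hypothetical mock of a runtime-PM usage counter; not the kernel API. */
struct mock_dev {
	int usage_count;
	int fail_resume;	/* non-zero: simulate a failed resume */
};

/* Like pm_runtime_get_sync(): the counter is bumped even on failure. */
static int mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	return dev->fail_resume ? -1 : 0;
}

static void mock_put(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/*
 * Probe-like function with a single unwind path. A failed get jumps to
 * the label that already drops the reference, so no extra put is needed
 * at the call site.
 */
static int probe_sketch(struct mock_dev *dev, int fail_later)
{
	int err;

	err = mock_get_sync(dev);
	if (err < 0)
		goto put_ref;	/* not a label that skips the put */

	if (fail_later) {
		err = -2;
		goto put_ref;
	}

	return 0;		/* success: the reference stays held */

put_ref:
	mock_put(dev);
	return err;
}
```

With this shape the counter is balanced on every failure path, and only the goto target has to change when a new failure point is added.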

Jon

-- 
nvpublic

linux-next: manual merge of the akpm tree with the sh tree

2020-06-04 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the akpm tree got a conflict in:

  arch/sh/include/asm/pgtable_64.h

between commit:

  37744feebc08 ("sh: remove sh5 support")

from the sh tree and patch:

  "mm: consolidate pte_index() and pte_offset_*() definitions"

from the akpm tree.

I fixed it up (the former deleted the file, so I did that) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH] dmaengine: tegra210-adma: handle pm_runtime_get_sync failure cases

2020-06-04 Thread Jon Hunter



On 04/06/2020 21:10, Navid Emamdoost wrote:
> Calling pm_runtime_get_sync increments the counter even in case of
> failure, causing incorrect ref count. Call pm_runtime_put if
> pm_runtime_get_sync fails.
> 
> Signed-off-by: Navid Emamdoost 
> ---
>  drivers/dma/tegra210-adma.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/tegra210-adma.c b/drivers/dma/tegra210-adma.c
> index c4ce5dfb149b..899eaaf9fc48 100644
> --- a/drivers/dma/tegra210-adma.c
> +++ b/drivers/dma/tegra210-adma.c
> @@ -659,6 +659,7 @@ static int tegra_adma_alloc_chan_resources(struct 
> dma_chan *dc)
>   ret = pm_runtime_get_sync(tdc2dev(tdc));
>   if (ret < 0) {
>   free_irq(tdc->irq, tdc);
> + pm_runtime_put(tdc2dev(tdc));
>   return ret;
>   }
>  
> @@ -870,7 +871,7 @@ static int tegra_adma_probe(struct platform_device *pdev)
>  
>   ret = pm_runtime_get_sync(&pdev->dev);
>   if (ret < 0)
> - goto rpm_disable;
> + goto rpm_put;
>  
>   ret = tegra_adma_init(tdma);
>   if (ret)
> 

The label rpm_disable should now be removed. You should also update the
subject-prefix to be [PATCH V2] to make it clear that this is the
updated patch.

Jon

-- 
nvpublic

drivers/tty/serial/ucc_uart.c:264:21: sparse: sparse: incorrect type in argument 1 (different address spaces)

2020-06-04 Thread kernel test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   435faf5c218a47fd6258187f62d9bb1009717896
commit: 5a35435ef4e6e4bd2aabd6706b146b298a9cffe5 soc: fsl: qe: remove PPC32 
dependency from CONFIG_QUICC_ENGINE
date:   6 months ago
config: arm-randconfig-s031-20200605 (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-246-g41f651b4-dirty
git checkout 5a35435ef4e6e4bd2aabd6706b146b298a9cffe5
# save the attached .config to linux build tree
make W=1 C=1 ARCH=arm CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)

>> drivers/tty/serial/ucc_uart.c:264:21: sparse: sparse: incorrect type in 
>> argument 1 (different address spaces) @@ expected void const volatile 
>> [noderef]  * @@ got restricted __be16 * @@
>> drivers/tty/serial/ucc_uart.c:264:21: sparse: expected void const 
>> volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:264:21: sparse: got restricted __be16 *
>> drivers/tty/serial/ucc_uart.c:264:21: sparse: sparse: incorrect type in 
>> argument 1 (different address spaces) @@ expected void const volatile 
>> [noderef]  * @@ got restricted __be16 * @@
>> drivers/tty/serial/ucc_uart.c:264:21: sparse: expected void const 
>> volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:264:21: sparse: got restricted __be16 *
>> drivers/tty/serial/ucc_uart.c:264:21: sparse: sparse: incorrect type in 
>> argument 1 (different address spaces) @@ expected void const volatile 
>> [noderef]  * @@ got restricted __be16 * @@
>> drivers/tty/serial/ucc_uart.c:264:21: sparse: expected void const 
>> volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:264:21: sparse: got restricted __be16 *
>> drivers/tty/serial/ucc_uart.c:264:21: sparse: sparse: incorrect type in 
>> argument 1 (different address spaces) @@ expected void const volatile 
>> [noderef]  * @@ got restricted __be16 * @@
>> drivers/tty/serial/ucc_uart.c:264:21: sparse: expected void const 
>> volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:264:21: sparse: got restricted __be16 *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const volatile 
[noderef]  * @@ got restricted __be16 * @@
   drivers/tty/serial/ucc_uart.c:268:21: sparse: expected void const 
volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: got restricted __be16 *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const volatile 
[noderef]  * @@ got restricted __be16 * @@
   drivers/tty/serial/ucc_uart.c:268:21: sparse: expected void const 
volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: got restricted __be16 *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const volatile 
[noderef]  * @@ got restricted __be16 * @@
   drivers/tty/serial/ucc_uart.c:268:21: sparse: expected void const 
volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: got restricted __be16 *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const volatile 
[noderef]  * @@ got restricted __be16 * @@
   drivers/tty/serial/ucc_uart.c:268:21: sparse: expected void const 
volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:268:21: sparse: got restricted __be16 *
   drivers/tty/serial/ucc_uart.c:286:6: sparse: sparse: symbol 
'qe_uart_set_mctrl' was not declared. Should it be static?
   drivers/tty/serial/ucc_uart.c:349:17: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const volatile 
[noderef]  * @@ got restricted __be16 * @@
   drivers/tty/serial/ucc_uart.c:349:17: sparse: expected void const 
volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:349:17: sparse: got restricted __be16 *
   drivers/tty/serial/ucc_uart.c:350:17: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const volatile 
[noderef]  * @@ got restricted __be16 * @@
   drivers/tty/serial/ucc_uart.c:350:17: sparse: expected void const 
volatile [noderef]  *
   drivers/tty/serial/ucc_uart.c:350:17: sparse: got restricted __be16 *
   drivers/tty/serial/ucc_uart.c:350:17: sparse: sparse: incorrect type in 
argument 1 (different address spaces) @@ expected void const volatile 
[noderef]  * @@ got restricted __be16 * @@
   drivers/tty/serial/ucc_uart.c:350:17: sparse: expected void const 
volatile [noderef]  *

linux-next: manual merge of the akpm tree with the sh tree

2020-06-04 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the akpm tree got conflicts in:

  arch/sh/kernel/traps_64.c
  arch/sh/kernel/signal_64.c
  arch/sh/kernel/ptrace_64.c
  arch/sh/kernel/process_64.c

between commit:

  37744feebc08 ("sh: remove sh5 support")

from the sh tree and patch:

  "mm: don't include asm/pgtable.h if linux/mm.h is already included"

from the akpm tree.

I fixed it up (the former deleted the file, so I did that) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH v2] coccinelle: api: add kzfree script

2020-06-04 Thread Markus Elfring

> On the other hand, do you really require E to be a pointer?
> If you do that, it will have to find the type of E.

I suggest reconsidering this information.


> If E refers to a structure field, then the type might not be available
> in the current function, and you may need command line arguments like
> --all-includes or --recursive-includes.

Will the software documentation need corresponding extensions for
the safe application of the semantic patch language?

Will the used data structure access operator like arrow or dot
influence the interpretation of the software situation?


> Is avoiding transforming the case where E is not verified
> to be a pointer a concern?

I would find it desirable to express constraints for pointer data types
according to the applied programming interfaces.

Regards,
Markus

linux-next: manual merge of the akpm tree with the sh tree

2020-06-04 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the akpm tree got a conflict in:

  arch/sh/mm/tlbex_64.c
  arch/sh/mm/cache-sh5.c

between commit:

  37744feebc08 ("sh: remove sh5 support")

from the sh tree and commit:

  "sh: add support for folded p4d page tables"

from the akpm tree.

I fixed it up (the former deleted the files, so I did that) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



linux-next: manual merge of the akpm tree with the sh tree

2020-06-04 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the akpm tree got a conflict in:

  arch/sh/include/asm/pgtable_64.h

between commit:

  37744feebc08 ("sh: remove sh5 support")

from the sh tree and patch:

  "sh: drop __pXd_offset() macros that duplicate pXd_index() ones"

from the akpm tree.

I fixed it up (the former deleted the file, so I did that) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



linux-next: manual merge of the akpm tree with the sh tree

2020-06-04 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the akpm tree got a conflict in:

  arch/sh/kernel/process_64.c

between commit:

  37744feebc08 ("sh: remove sh5 support")

from the sh tree and patch:

  "kernel: rename show_stack_loglvl() => show_stack()"

from the akpm tree.

I fixed it up (the former removed the file, so I did that) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell



linux-next 04 June: warning: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined

2020-06-04 Thread Christophe Leroy


Hi all,

Getting the following warning on linux-next from yesterday,

  CC  net/sunrpc/svcsock.o
net/sunrpc/svcsock.c:227:5: warning: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined [-Wundef]

 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
 ^

Bisected to ca07eda33e01 (refs/bisect/bad) SUNRPC: Refactor svc_recvfrom()

Missing #include 
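
For context, -Wundef fires because an identifier that is not defined as a macro silently evaluates to 0 inside #if. A minimal sketch of that preprocessor rule follows. SKETCH_ARCH_FLAG and sketch_flush_needed() are made-up names, and the fallback definition is only there to illustrate the rule; the fix proposed above is to include the header that defines the real macro.

```c
#include <assert.h>

/*
 * SKETCH_ARCH_FLAG is a hypothetical stand-in for a macro that an arch
 * header would normally define as 0 or 1.
 */
#ifndef SKETCH_ARCH_FLAG
/*
 * Without a definition, "#if SKETCH_ARCH_FLAG" would quietly evaluate
 * to 0, which is exactly what gcc -Wundef warns about.
 */
#define SKETCH_ARCH_FLAG 0
#endif

/* Compile-time check that the flag is a plain 0/1 value. */
static_assert(SKETCH_ARCH_FLAG == 0 || SKETCH_ARCH_FLAG == 1,
	      "flag must be 0 or 1");

static int sketch_flush_needed(void)
{
#if SKETCH_ARCH_FLAG
	return 1;
#else
	return 0;
#endif
}
```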

Christophe

Re: Reply: [PATCH][v6] KVM: X86: support APERF/MPERF registers

2020-06-04 Thread Like Xu


On 2020/6/5 12:23, Li,Rongqing wrote:




-----Original Message-----
From: Xu, Like [mailto:like...@intel.com]
Sent: June 5, 2020 10:32
To: Li,Rongqing 
Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org; x...@kernel.org;
h...@zytor.com; b...@alien8.de; mi...@redhat.com; t...@linutronix.de;
jmatt...@google.com; wanpen...@tencent.com; vkuzn...@redhat.com;
sean.j.christopher...@intel.com; pbonz...@redhat.com; xiaoyao...@intel.com;
wei.hua...@amd.com
Subject: Re: [PATCH][v6] KVM: X86: support APERF/MPERF registers

Hi RongQing,

On 2020/6/5 9:44, Li RongQing wrote:

Guest kernel reports a fixed CPU frequency in /proc/cpuinfo, which is
confusing to users when turbo is enabled. aperf/mperf can be used to
show the current CPU frequency after 7d5905dc14a
("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"),
so the guest should support the aperf/mperf capability.

This patch implements aperf/mperf in three modes: none, software
emulation, and pass-through.

None: default mode, guest does not support aperf/mperf

s/None/Note


Software emulation: the periods of aperf/mperf in guest mode are
accumulated as the emulated value

Pass-through: it is only suitable for KVM_HINTS_REALTIME, because that
hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed no
over-commit.

The flag "KVM_HINTS_REALTIME 0" (in the Documentation/virt/kvm/cpuid.rst)
is claimed as "guest checks this feature bit to determine that vCPUs are never
preempted for an unlimited time allowing optimizations".

I couldn't see its relationship with "1:1 vCPU: pCPU binding".
The patch doesn't check this flag as well for your pass-through purpose.

Thanks,
Like Xu



I think it is user space's job to bind HINT_REALTIME and mperf passthrough; 
KVM just does what userspace wants.



It's fine for user space to bind HINT_REALTIME and mperf passthrough,
but I was asking why HINT_REALTIME means "1:1 vCPU: pCPU binding".

As you said, "Pass-through: it is only suitable for KVM_HINTS_REALTIME",
which means, KVM needs to make sure the kvm->arch.aperfmperf_mode value
could "only" be set to KVM_APERFMPERF_PT when the check
kvm_para_has_hint(KVM_HINTS_REALTIME) is passed.

Specifically, the KVM_HINTS_REALTIME is a per-kvm capability
while the kvm_aperfmperf_mode is a per-vm capability. It's unresolved.
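
The kind of check being asked for could look roughly like the sketch below. It is only an illustration: sketch_check_mode() and its two boolean parameters are hypothetical, standing in for the host CPU feature test and for whatever test of KVM_HINTS_REALTIME turns out to be appropriate, and none of it is an existing KVM interface. Only the enum mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the enum added by the patch under review. */
enum kvm_aperfmperf_mode {
	KVM_APERFMPERF_NONE,
	KVM_APERFMPERF_SOFT,
	KVM_APERFMPERF_PT,
};

/* The proposed api.rst text documents pass-through as mode 2. */
static_assert(KVM_APERFMPERF_PT == 2, "pass-through is mode 2");

/*
 * Hypothetical helper: returns 0 if the requested mode may be enabled,
 * -1 otherwise.
 */
static int sketch_check_mode(enum kvm_aperfmperf_mode mode,
			     bool host_has_aperfmperf,
			     bool vm_has_realtime_hint)
{
	switch (mode) {
	case KVM_APERFMPERF_NONE:
		return 0;
	case KVM_APERFMPERF_SOFT:
		return host_has_aperfmperf ? 0 : -1;
	case KVM_APERFMPERF_PT:
		/* Pass-through only when the realtime hint is set. */
		return (host_has_aperfmperf && vm_has_realtime_hint) ? 0 : -1;
	}
	return -1;	/* unknown mode */
}
```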

KVM doesn't always do what userspace wants, especially
when you're trying to expose some features about
power and thermal management in the virtualization context.


And this gives user space a possibility: a guest with passthrough mperf/aperf 
but without HINT_REALTIME can get a coarse CPU frequency with no performance 
impact, if the guest can tolerate an occasionally wrong frequency.






-Li

[PATCH] irqchip/gic-v4.1: Use readx_poll_timeout_atomic() to fix sleep in atomic

2020-06-04 Thread Zenghui Yu

readx_poll_timeout() can sleep if @sleep_us is specified by the caller,
and is therefore unsafe to use in atomic context, which is the case
when we use it to poll the GICR_VPENDBASER.Dirty bit in the
irq_set_vcpu_affinity() callback.

Let's convert to its atomic version instead, which helps to get the v4.1
board back to life!
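
As a rough userspace illustration of the difference, an atomic-style poll only spins between reads and never yields. The sketch below uses hypothetical names throughout (struct mock_reg, MOCK_DIRTY_BIT, mock_busy_delay(), poll_clean_atomic()) and a logical time budget instead of a real clock; it is not the kernel's iopoll implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "dirty" status bit. */
#define MOCK_DIRTY_BIT (UINT64_C(1) << 60)

/* Hypothetical register model: stays dirty for a fixed number of reads. */
struct mock_reg {
	uint64_t value;
	unsigned int reads_until_clean;
};

static uint64_t mock_read(struct mock_reg *reg)
{
	if (reg->reads_until_clean > 0)
		reg->reads_until_clean--;
	else
		reg->value &= ~MOCK_DIRTY_BIT;
	return reg->value;
}

/* Stand-in for a busy-wait delay: spins, never sleeps or yields. */
static void mock_busy_delay(unsigned int us)
{
	volatile unsigned int spin = us;

	while (spin > 0)
		spin--;
}

/*
 * Busy-poll until the dirty bit clears or the budget is used up.
 * Returns 0 on success, -1 on timeout.
 */
static int poll_clean_atomic(struct mock_reg *reg, unsigned int delay_us,
			     unsigned int timeout_us)
{
	unsigned int waited_us = 0;

	assert(reg != 0);
	for (;;) {
		if (!(mock_read(reg) & MOCK_DIRTY_BIT))
			return 0;
		if (waited_us >= timeout_us)
			return -1;
		mock_busy_delay(delay_us);
		/* Always make progress, even with a zero delay. */
		waited_us += delay_us ? delay_us : 1;
	}
}
```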

Fixes: 96806229ca03 ("irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling")
Signed-off-by: Zenghui Yu 
---
 drivers/irqchip/irq-gic-v3-its.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index cd685f521c77..6a5a87fc4601 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3797,10 +3797,10 @@ static void its_wait_vpt_parse_complete(void)
if (!gic_rdists->has_vpend_valid_dirty)
return;
 
-   WARN_ON_ONCE(readq_relaxed_poll_timeout(vlpi_base + GICR_VPENDBASER,
-   val,
-   !(val & GICR_VPENDBASER_Dirty),
-   10, 500));
+   WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER,
+  val,
+  !(val & GICR_VPENDBASER_Dirty),
+  10, 500));
 }
 
 static void its_vpe_schedule(struct its_vpe *vpe)
-- 
2.19.1

[PATCH v2 1/1] riscv: Select ARCH_SUPPORTS_ATOMIC_RMW by default

2020-06-04 Thread Chenxi Mao

Select ARCH_SUPPORTS_ATOMIC_RMW by default to enable osq locks.

PS2: Add signed off info.

Signed-off-by: Chenxi Mao 
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a31e1a41913a..cbdc605d20d9 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -68,6 +68,7 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select HAVE_COPY_THREAD_TLS
select HAVE_ARCH_KASAN if MMU && 64BIT
+   select ARCH_SUPPORTS_ATOMIC_RMW
 
 config ARCH_MMAP_RND_BITS_MIN
default 18 if 64BIT
-- 
2.25.1

Re: [PATCH v3] mm: Fix mremap not considering huge pmd devmap

2020-06-04 Thread Fan Yang

Hi Ajay,

> On Jun 5, 2020 at 02:23, Ajay Kaher  wrote:
> 
> So, v4.9.y should be vulnerable; however, I am not able to reproduce it on v4.9.y.
> Does any specific scenario need to be tested for v4.9.y?
> 
> For v4.9, modified test program as MAP_SHARED_VALIDATE is not available:
> - return mmap(NULL, REGION_PM_SIZE, PROT, MAP_SHARED_VALIDATE|MAP_SYNC,
> + return mmap(NULL, REGION_PM_SIZE, PROT, MAP_SHARED|MAP_SYNC,
> 
> Let me know if I need to test some other way for v4.9.y.
> 

I looked into this further.  In v4.9, fsdax (mount a dax file
system, then open, mmap, mremap) does not suffer from this issue
because fsdax does not use huge pages (FS_DAX_PMD is marked
BROKEN).

fs/dax.c:dax_pmd_fault:

if (!IS_ENABLED(CONFIG_FS_DAX_PMD))
return VM_FAULT_FALLBACK;

fs/Kconfig:

config FS_DAX_PMD
bool
default FS_DAX
depends on FS_DAX
depends on ZONE_DEVICE
depends on TRANSPARENT_HUGEPAGE
depends on BROKEN


However, I can reproduce the issue in devdax mode.  Here is how
I reproduce it:

1. It seems some interface changed for ndctl.  So I use an old
   commit (4295f1ea614a26e1304ed590fb7209c8c78270ab) in the repo
   https://github.com/pmem/ndctl.
2. sudo ./ndctl/ndctl create-namespace -f -t pmem -m dax -e 'namespace0.0'
3. then use the following program:

#define _GNU_SOURCE
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define PROT                PROT_READ|PROT_WRITE

#define REGION_PM_TMP_PATH  "/dev/dax0.0"

#define REGION_MEM_SIZE 1024ULL*1024*1024*2
#define REGION_PM_SIZE  1024ULL*1024*1024*4
#define REMAP_MEM_OFF   1024ULL*1024*1024*1
#define REMAP_PM_OFF    1024ULL*1024*1024*3
#define REMAP_SIZE      1024ULL*1024*1024*1

#define REGION_MEM_PTR  ((void *)0x7fd4ULL)
#define REGION_PM_PTR   ((void *)0x4fd3ULL)

char * map_tmp_pm_region(void)
{
	int fd;

	fd = open(REGION_PM_TMP_PATH, O_RDWR, 0644);
	if (fd < 0) {
		perror(REGION_PM_TMP_PATH);
		exit(-1);
	}

	return mmap(REGION_PM_PTR, REGION_PM_SIZE, PROT, MAP_SHARED|MAP_SYNC,
		    fd, 0);
}

int main(int argc, char **argv)
{
	char *regm, *regp, *remap;
	int ret;

	regm = mmap(REGION_MEM_PTR, REGION_MEM_SIZE, PROT,
		    MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (regm == MAP_FAILED) {
		perror("regm");
		return -1;
	}

	regp = map_tmp_pm_region();
	if (regp == MAP_FAILED) {
		perror("regp");
		return -1;
	}

	memset(regm, 'a', REGION_MEM_SIZE);
	memset(regp, 'i', REGION_PM_SIZE);

	remap = mremap(regp + REMAP_PM_OFF, REMAP_SIZE, REMAP_SIZE,
		       MREMAP_MAYMOVE|MREMAP_FIXED, regm + REMAP_MEM_OFF);
	if (remap != regm + REMAP_MEM_OFF) {
		perror("mremap");
		return -1;
	}

	*(regm + REMAP_MEM_OFF) = 0x00;
	return 0;
}

4. Then I was able to see the "Corrupted page table" message in dmesg.

Best regards,
Fan

Re: [PATCH] drm/i915/gvt: print actionable error message when gm runs out

2020-06-04 Thread Zhenyu Wang

On 2020.06.03 14:33:21 +0200, Julian Stecklina wrote:
> When a user tries to allocate too many or too big vGPUs and runs out
> of graphics memory, the resulting error message is not actionable and
> looks like an internal error.
> 
> Change the error message to clearly point out what actions a user can
> take to resolve this situation.
> 
> Cc: Thomas Prescher 
> Cc: Zhenyu Wang 
> Signed-off-by: Julian Stecklina 
> ---
>  drivers/gpu/drm/i915/gvt/aperture_gm.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c 
> b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> index 0d6d598713082..5c5c8e871dae2 100644
> --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> @@ -69,9 +69,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
> start, end, flags);
>   mmio_hw_access_post(gt);
>   mutex_unlock(&gt->ggtt->vm.mutex);
> - if (ret)
> - gvt_err("fail to alloc %s gm space from host\n",
> - high_gm ? "high" : "low");
> + if (ret) {
> + gvt_err("vgpu%d: failed to allocate %s gm space from host\n",
> + vgpu->id, high_gm ? "high" : "low");
> + gvt_err("vgpu%d: destroying vGPUs, decreasing vGPU memory size or increasing GPU aperture size may resolve this\n",
> + vgpu->id);

Currently we can't decrease the vGPU memory size, as it is defined by the
mdev type, so you would actually have to try a different vGPU type. The
aperture size is also handled for the supported vGPU mdev types, so I assume
the user should already be aware of that too. I just don't want us to be
too chatty. :)

> + }
>  
>   return ret;
>  }
> -- 
> 2.26.2
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



Re: [PATCH][v6] KVM: X86: support APERF/MPERF registers

2020-06-04 Thread Xiaoyao Li


On 6/5/2020 9:44 AM, Li RongQing wrote:

Guest kernel reports a fixed CPU frequency in /proc/cpuinfo,
which is confusing to users when turbo is enabled. aperf/mperf
can be used to show the current CPU frequency after 7d5905dc14a
("x86 / CPU: Always show current CPU frequency in /proc/cpuinfo"),
so the guest should support the aperf/mperf capability.

This patch implements aperf/mperf in three modes: none, software
emulation, and pass-through.

None: default mode, guest does not support aperf/mperf

Software emulation: the periods of aperf/mperf in guest mode are
accumulated as the emulated value

Pass-through: it is only suitable for KVM_HINTS_REALTIME, because
that hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed
no over-commit.

And a per-VM capability is added to configure aperfmperf mode

Signed-off-by: Li RongQing 
Signed-off-by: Chai Wen 
Signed-off-by: Jia Lina 
---
diff v5:
return error if guest is configured with mperf/aperf, but host cpu has not

diff v4:
fix maybe-uninitialized warning

diff v3:
fix interception of MSR_IA32_MPERF/APERF in svm

diff v2:
support aperfmperf pass though
move common codes to kvm_get_msr_common

diff v1:
1. support AMD, but not test
2. support per-vm capability to enable


  Documentation/virt/kvm/api.rst  | 10 ++
  arch/x86/include/asm/kvm_host.h | 11 +++
  arch/x86/kvm/cpuid.c| 15 ++-
  arch/x86/kvm/svm/svm.c  |  8 
  arch/x86/kvm/vmx/vmx.c  |  6 ++
  arch/x86/kvm/x86.c  | 42 +
  arch/x86/kvm/x86.h  | 15 +++
  include/uapi/linux/kvm.h|  1 +
  8 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index d871dacb984e..f854f4da6fd8 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6126,3 +6126,13 @@ KVM can therefore start protected VMs.
  This capability governs the KVM_S390_PV_COMMAND ioctl and the
  KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
  guests when the state change is invalid.
+
+8.23 KVM_CAP_APERFMPERF
+
+
+:Architectures: x86
+:Parameters: args[0] is aperfmperf mode;
+ 0 for not support, 1 for software emulation, 2 for pass-through
+:Returns: 0 on success; -1 on error
+
+This capability indicates that KVM supports APERF and MPERF MSR registers
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fd78bd44b2d6..14643f8af9c4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -824,6 +824,9 @@ struct kvm_vcpu_arch {
  
  	/* AMD MSRC001_0015 Hardware Configuration */

u64 msr_hwcr;
+
+   u64 v_mperf;
+   u64 v_aperf;
  };
  
  struct kvm_lpage_info {

@@ -889,6 +892,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT,/* created with KVM_CAP_SPLIT_IRQCHIP */
  };
  
+enum kvm_aperfmperf_mode {

+   KVM_APERFMPERF_NONE,
+   KVM_APERFMPERF_SOFT,  /* software emulate aperfmperf */
+   KVM_APERFMPERF_PT,/* pass-through aperfmperf to guest */
+};
+
  #define APICV_INHIBIT_REASON_DISABLE0
  #define APICV_INHIBIT_REASON_HYPERV 1
  #define APICV_INHIBIT_REASON_NESTED 2
@@ -986,6 +995,8 @@ struct kvm_arch {
  
  	struct kvm_pmu_event_filter *pmu_event_filter;

struct task_struct *nx_lpage_recovery_thread;
+
+   enum kvm_aperfmperf_mode aperfmperf_mode;
  };
  
  struct kvm_vm_stat {

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index cd708b0b460a..80f18b29a845 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -122,6 +122,16 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
   MSR_IA32_MISC_ENABLE_MWAIT);
}
  
+	best = kvm_find_cpuid_entry(vcpu, 6, 0);

+   if (best) {
+   if (guest_has_aperfmperf(vcpu->kvm)) {
+   if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
+   return -EINVAL;


kvm_vm_ioctl_enable_cap() ensures that guest_has_aperfmperf() always 
aligns with boot_cpu_has(X86_FEATURE_APERFMPERF). So above is unnecessary.



+   best->ecx |= 1;
+   } else {
+   best->ecx &= ~1;
+   }
+   }


you could do

bool guest_cpuid_aperfmperf = false;
if (best)
guest_cpuid_aperfmperf = !!(best->ecx & BIT(0));

if (guest_cpuid_aperfmperf != guest_has_aperfmperf(vcpu->kvm))
return -EINVAL;


In fact, I think we can do nothing here. Leave it as what userspace 
wants, just like how KVM treats other CPUID bits.


Paolo,

What's your point?


/* Note, maxphyaddr must be updated before tdp_level. */
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu);


[...]


@@ -4930,6 +4939,11 @@ int

Re: [RFC] dt-bindings: mailbox: add doorbell support to ARM MHU

2020-06-04 Thread Sudeep Holla

On Thu, Jun 04, 2020 at 10:15:55AM -0500, Jassi Brar wrote:
> On Thu, Jun 4, 2020 at 4:20 AM Sudeep Holla  wrote:
> >
> > On Wed, Jun 03, 2020 at 01:32:42PM -0500, Jassi Brar wrote:
> > > On Wed, Jun 3, 2020 at 1:04 PM Sudeep Holla  wrote:
> > > >
> > > > On Fri, May 29, 2020 at 09:37:58AM +0530, Viresh Kumar wrote:
> > > > > On 28-05-20, 13:20, Rob Herring wrote:
> > > > > > Whether Linux
> > > > > > requires serializing mailbox accesses is a separate issue. On that 
> > > > > > side,
> > > > > > it seems silly to not allow driving the h/w in the most efficient 
> > > > > > way
> > > > > > possible.
> > > > >
> > > > > That's exactly what we are trying to say. The hardware allows us to
> > > > > write all 32 bits in parallel, without any hardware issues, why
> > > > > shouldn't we do that ? The delay (which Sudeep will find out, he is
> > > > > facing issues with hardware access because of lockdown right now)
> > > >
> > > > OK, I was able to access the setup today. I couldn't reach a point
> > > > where I can do measurements as the system just became unusable with
> > > > one physical channel instead of 2 virtual channels as in my patches.
> > > >
> > > > My test was simple. Switch to schedutil and read sensors periodically
> > > > via sysfs.
> > > >
> > > >  arm-scmi firmware:scmi: message for 1 is not expected!
> > > >
> > > This sounds like you are not serialising requests on a shared channel.
> > > Can you please also share the patch?
> >
> > OK, I did try with a small patch initially and then realised we must hit
> > issue with mainline as is. Tried and the behaviour is exact same. All
> > I did is removed my patches and use bit[0] as the signal. It doesn't
> > matter as writes to the register are now serialised. Oh, the above
> > message comes when OS times out in advance while firmware continues to
> > process the old request and respond.
> >
> > The trace I sent gives much better view of what's going on.
> >
> BTW, you didn't even share 'good' vs 'bad' log for me to actually see
> if the api lacks.
>
> Let us look closely ...
>
>  >>bash-1019  [005]  1149.452340: scmi_xfer_begin:
> transfer_id=1537 msg_id=7 protocol_id=19 seq=0 poll=1
>  >>bash-1019  [005]  1149.452407: scmi_xfer_end:
> transfer_id=1537 msg_id=7 protocol_id=19 seq=0 status=0
> >
> This round trip took  67usecs.  (log shows some at even 3usecs)
> That includes mailbox api overhead, memcpy and the time taken by
> remote to respond.

This is a DVFS request, which the firmware acknowledges quickly and which
is expected to take at most 100us.

> So the api is definitely capable of much faster transfers than you require.
>

I am not complaining about that. The delay is mostly due to the load on
the mailbox, and the focus here is that parallelising helps.

> >> bash-1526  [000]  1149.472553: scmi_xfer_begin:  transfer_id=1538 
> >> msg_id=6 protocol_id=21 seq=0 poll=0
> >>  -0 [001]  1149.472733: scmi_xfer_begin:  
> >> transfer_id=1539 msg_id=7 protocol_id=19 seq=1 poll=1
> >
> Here another request is started before the first is finished.

Ah, the prints are from when the client made the request, not from when the
mailbox started it. So this just indicates the beginning of the transfer from
the client. I must have mentioned that earlier. Some requests time out before
being picked up by the mailbox if the previous request is not acknowledged
quickly. E.g. if a sensor command that may take up to 1ms has started, the
next 5-6 DVFS requests after it will time out.

> If you want this to work you have to serialise access to the single
> physical channel and/or run the remote firmware in asynchronous mode -
> that is, the firmware ack the bit as soon as it sees and starts
> working in the background, so that we return in ~3usec always, while
> the data returns whenever it is ready.

Yes, it does that for a few requests like DVFS, while it uses synchronous
mode for a few others. While I agree that ideally everything in asynchronous
mode is better, there may be reasons for such a design that I don't know of.
Also, the solution given is to use different bits as independent channels,
which the hardware allows. Anyway, it's the open source SCP project[1].
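
To make the two positions in this thread concrete, here is a minimal toy model in C. It is only a sketch: `struct doorbell` and the `db_*` helpers are hypothetical names invented for illustration, not the mailbox framework API or the SCP firmware interface. It contrasts treating the whole doorbell register as one channel with treating each bit as its own channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit doorbell register shared with firmware. */
struct doorbell {
	uint32_t set;	/* bits currently raised towards the remote */
};

/* Single-channel policy: any outstanding bit blocks every new request. */
static bool db_send_exclusive(struct doorbell *db, unsigned int bit)
{
	assert(bit < 32);
	if (db->set != 0)
		return false;	/* a previous request is still outstanding */
	db->set |= UINT32_C(1) << bit;
	return true;
}

/* Per-bit policy: a request only waits for its own bit to be clear. */
static bool db_send_per_bit(struct doorbell *db, unsigned int bit)
{
	uint32_t mask;

	assert(bit < 32);
	mask = UINT32_C(1) << bit;
	if (db->set & mask)
		return false;	/* this particular channel is busy */
	db->set |= mask;
	return true;
}

/* The remote acknowledges a request by clearing its bit. */
static void db_ack(struct doorbell *db, unsigned int bit)
{
	assert(bit < 32);
	db->set &= ~(UINT32_C(1) << bit);
}
```

In this model a slow request on one bit (say, a sensor read) makes `db_send_exclusive()` refuse a DVFS request on another bit until the ack arrives, whereas `db_send_per_bit()` accepts it immediately.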

--
Regards,
Sudeep

[1] https://github.com/ARM-software/SCP-firmware

[GIT PULL] first round of SCSI updates for the 5.6+ merge window

2020-06-04 Thread James Bottomley

This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host of
other minor updates.  There are no major core changes in this series
apart from a refactoring in scsi_lib.c.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc

The short changelog is:

Alex Dewar (2):
  scsi: aic7xxx: Remove unnecessary NULL checks before kfree
  scsi: aic7xxx: Use kzalloc() instead of kmalloc()+memset()

André Almeida (1):
  scsi: core: doc: Change function comments to kernel-doc style

Arun Easi (1):
  scsi: qla2xxx: Fix MPI failure AEN (8200) handling

Asutosh Das (4):
  scsi: ufs-qcom: Configure write booster type
  scsi: ufs: sysfs: Add sysfs entries for write booster
  scsi: ufs: Add write booster feature support
  scsi: ufs: full reinit upon resume if link was off

Bart Van Assche (20):
  scsi: qla2xxx: Remove an unused function
  scsi: qla2xxx: Fix endianness annotations in source files
  scsi: qla2xxx: Fix endianness annotations in header files
  scsi: qla2xxx: Use make_handle() instead of open-coding it
  scsi: qla2xxx: Cast explicitly to uint16_t / uint32_t
  scsi: qla2xxx: Change {RD,WRT}_REG_*() function names from upper case into lower case
  scsi: qla2xxx: Fix the code that reads from mailbox registers
  scsi: qla2xxx: Use register names instead of register offsets
  scsi: qla2xxx: Change two hardcoded constants into offsetof() / sizeof() expressions
  scsi: qla2xxx: Increase the size of struct qla_fcp_prio_cfg to FCP_PRIO_CFG_SIZE
  scsi: qla2xxx: Make a gap in struct qla2xxx_offld_chain explicit
  scsi: qla2xxx: Add more BUILD_BUG_ON() statements
  scsi: qla2xxx: Sort BUILD_BUG_ON() statements alphabetically
  scsi: qla2xxx: Simplify the functions for dumping firmware
  scsi: qla2xxx: Suppress two recently introduced compiler warnings
  scsi: qla2xxx: Fix spelling of a variable name
  scsi: ufs: Make ufshcd_wait_for_register() sleep instead of busy-waiting
  scsi: sr: Use {get,put}_unaligned_be*() instead of open-coding these functions
  scsi: qla2xxx: Use ARRAY_SIZE() instead of open-coding it
  scsi: qla2xxx: Split qla2x00_configure_local_loop()

Benjamin Block (8):
  scsi: zfcp: Move allocation of the shost object to after xconf- and xport-data
  scsi: zfcp: Fence early sysfs interfaces for accesses of shost objects
  scsi: zfcp: Fence adapter status propagation for common statuses
  scsi: zfcp: Move p-t-p port allocation to after xport data
  scsi: zfcp: Fence fc_host updates during link-down handling
  scsi: zfcp: Move fc_host updates during xport data handling into fenced function
  scsi: zfcp: Move shost updates during xconfig data handling into fenced function
  scsi: zfcp: Move shost modification after QDIO (re-)open into fenced function

Bob Liu (1):
  scsi: iscsi: Register sysfs for iscsi workqueue

Bodo Stroesser (6):
  scsi: target: tcmu: Userspace must not complete queued commands
  scsi: target: loopback: Fix READ with data and sensebytes
  scsi: target: tcmu: Make pgr_support and alua_support attributes writable
  scsi: target: Make transport_flags per device
  scsi: target: tcmu: Add attributes enforce_pr_isids and force_pr_aptpl
  scsi: target: Add missing emulate_pr attribute to passthrough backends

Chad Dupuis (2):
  scsi: qedf: Fix crash when MFW calls for protocol stats while function is still probing
  scsi: qedf: Add schedule recovery handler

Chandrakanth Patil (1):
  scsi: megaraid_sas: Update driver version to 07.714.04.00-rc1

Chen Tao (1):
  scsi: ibmvscsi: Make some functions static

ChenTao (1):
  scsi: ufs-mediatek: Make ufs_mtk_fixup_dev_quirks static

Christoph Hellwig (1):
  scsi: mpt3sas: Don't change the DMA coherent mask after allocations

Christophe JAILLET (1):
  scsi: aacraid: Fix error handling paths in aac_probe_one()

Colin Ian King (2):
  scsi: lpfc: Remove redundant initialization to variable rc
  scsi: qla2xxx: make 1-bit bit-fields unsigned int

Damien Le Moal (7):
  scsi: sd: Add zoned capabilities device attribute
  scsi: sd: Signal drive managed SMR disks
  scsi: scsi_debug: Disallow zone sizes that are not powers of 2
  scsi: scsi_debug: Implement ZBC host-aware emulation
  scsi: scsi_debug: Add zone_size_mb module parameter
  scsi: scsi_debug: Add zone_nr_conv module parameter
  scsi: scsi_debug: Add zone_max_open module parameter

Dan Carpenter (5):
  scsi: cxgb3i: Fix some leaks in init_act_open()
  scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
  scsi: aacraid: Fix an oops in error handling
  scsi: scsi_debug: Fix an error handling bug in sdeb_zbc_model_str()
  scsi: qedi: Check for buffer overflow in qedi_set_path()

Daniel Wagner (2):

Re: [PATCH v1] Bluetooth: hci_qca: Fix double free during SSR timeout

2020-06-04 Thread Abhishek Pandit-Subedi

Hi,

On Thu, Jun 4, 2020 at 6:59 AM Venkata Lakshmi Narayana Gubba
 wrote:
>
> Due to race conditions between qca_hw_error and qca_controller_memdump
> during SSR timeout,the same pointer is freed twice. Which results to
> double free error. Now a lock is acquired while SSR state moved to timeout.
suggestion: Change "which results to double free error" to "This
results in a double free."
suggestion: Change "while SSR state moved to timeout" to "when SSR
state is changed to timeout"

>
> Signed-off-by: Venkata Lakshmi Narayana Gubba 
> ---
>  drivers/bluetooth/hci_qca.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
> index 836949d..9110775 100644
> --- a/drivers/bluetooth/hci_qca.c
> +++ b/drivers/bluetooth/hci_qca.c
> @@ -983,8 +983,11 @@ static void qca_controller_memdump(struct work_struct 
> *work)
> while ((skb = skb_dequeue(&qca->rx_memdump_q))) {
>
> mutex_lock(&qca->hci_memdump_lock);
> -   /* Skip processing the received packets if timeout detected. 
> */
> -   if (qca->memdump_state == QCA_MEMDUMP_TIMEOUT) {
> +   /* Skip processing the received packets if timeout detected
> +* or memdump collection completed.
> +*/
> +   if (qca->memdump_state == QCA_MEMDUMP_TIMEOUT ||
> +   qca->memdump_state == QCA_MEMDUMP_COLLECTED) {
> mutex_unlock(&qca->hci_memdump_lock);
> return;
> }
> @@ -1485,7 +1488,7 @@ static void qca_hw_error(struct hci_dev *hdev, u8 code)
>  {
> struct hci_uart *hu = hci_get_drvdata(hdev);
> struct qca_data *qca = hu->priv;
> -   struct qca_memdump_data *qca_memdump = qca->qca_memdump;
> +   struct qca_memdump_data *qca_memdump = NULL;
> char *memdump_buf = NULL;
>
> set_bit(QCA_HW_ERROR_EVENT, &qca->flags);
> @@ -1509,9 +1512,10 @@ static void qca_hw_error(struct hci_dev *hdev, u8 code)
> qca_wait_for_dump_collection(hdev);
> }
>
> +   mutex_lock(&qca->hci_memdump_lock);
> if (qca->memdump_state != QCA_MEMDUMP_COLLECTED) {
> bt_dev_err(hu->hdev, "clearing allocated memory due to 
> memdump timeout");
> -   mutex_lock(&qca->hci_memdump_lock);
> +   qca_memdump = qca->qca_memdump;
> if (qca_memdump)
> memdump_buf = qca_memdump->memdump_buf_head;
> vfree(memdump_buf);

This section of code looks a bit unclear because it's only partially
in an if statement. Suggestion:
  if (qca->qca_memdump) {
          vfree(qca->qca_memdump->memdump_buf_head);
          kfree(qca->qca_memdump);
          qca->qca_memdump = NULL;
  }

> @@ -1520,8 +1524,13 @@ static void qca_hw_error(struct hci_dev *hdev, u8 code)
> qca->memdump_state = QCA_MEMDUMP_TIMEOUT;
> cancel_delayed_work(&qca->ctrl_memdump_timeout);
> skb_queue_purge(&qca->rx_memdump_q);
> -   mutex_unlock(&qca->hci_memdump_lock);
> +   }
> +   mutex_unlock(&qca->hci_memdump_lock);
> +
> +   if (qca->memdump_state == QCA_MEMDUMP_TIMEOUT ||
> +   qca->memdump_state == QCA_MEMDUMP_COLLECTED) {
> cancel_work_sync(&qca->ctrl_memdump_evt);
> +   skb_queue_purge(&qca->rx_memdump_q);
> }

Earlier in the function, you call qca_wait_for_dump_collection for
[Idle, Collecting] so the state should be either [Timeout, Collected]
at this branch. So, you can remove the `cancel_delayed_work` and
`skb_queue_purge` from above and just leave it only in the bottom
branch. Currently you're duplicating these calls unnecessarily.

I don't know if we discussed this in an earlier review but I noticed
that `qca_wait_for_dump_collection` doesn't actually pay attention to
the return value of `wait_on_bit_timeout`. I don't have context for
the order of calls anymore but is there a possibility for that timeout
to complete before `qca_memdump_timeout` is called? In that case, you
should probably set the state to timeout in
`qca_wait_for_dump_collection` as well.
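
To illustrate what I mean about the return value, here is a minimal user-space sketch. Everything in it is a stand-in: `struct dump_ctx`, the `MEMDUMP_*` states and the `wait_fn` callback are hypothetical names that only model the driver's state machine, and the callback plays the role of a bounded wait that returns nonzero when it times out. The real driver would also need to hold the memdump lock around the state update.

```c
#include <assert.h>
#include <stddef.h>

enum memdump_state {
	MEMDUMP_IDLE,
	MEMDUMP_COLLECTING,
	MEMDUMP_COLLECTED,
	MEMDUMP_TIMEOUT,
};

struct dump_ctx {
	enum memdump_state state;
};

/* Stand-in for a bounded wait: returns 0 on completion, nonzero on timeout. */
typedef int (*wait_fn)(struct dump_ctx *ctx);

static void wait_for_dump_collection(struct dump_ctx *ctx, wait_fn wait)
{
	assert(ctx != NULL && wait != NULL);

	/* Only wait while a dump may still be in progress. */
	if (ctx->state != MEMDUMP_IDLE && ctx->state != MEMDUMP_COLLECTING)
		return;

	/* Do not ignore the result: a failed wait means the dump timed out. */
	if (wait(ctx) != 0 && ctx->state != MEMDUMP_COLLECTED)
		ctx->state = MEMDUMP_TIMEOUT;
}

/* Two trivial wait implementations used to exercise both outcomes. */
static int wait_times_out(struct dump_ctx *ctx)
{
	(void)ctx;
	return -1;
}

static int wait_completes(struct dump_ctx *ctx)
{
	ctx->state = MEMDUMP_COLLECTED;
	return 0;
}
```

With this shape, the caller can rely on the state being either collected or timed out once the wait returns, regardless of whether the separate timeout work has run yet.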

>
> clear_bit(QCA_HW_ERROR_EVENT, &qca->flags);
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>

Re: [PATCH v5 5/7] blktrace: fix debugfs use after free

2020-06-04 Thread Bart Van Assche

On 2020-06-01 10:05, Luis Chamberlain wrote:
> diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
> index a55cbfd060f5..5b0310f38e11 100644
> --- a/kernel/trace/blktrace.c
> +++ b/kernel/trace/blktrace.c
> @@ -511,6 +511,11 @@ static int do_blk_trace_setup(struct request_queue *q, 
> char *name, dev_t dev,
>*/
>   if (bdev && bdev != bdev->bd_contains) {
>   dir = bdev->bd_part->debugfs_dir;
> + } else if (q->sg_debugfs_dir &&
> +strlen(buts->name) == strlen(q->sg_debugfs_dir->d_name.name)
> +&& strcmp(buts->name, q->sg_debugfs_dir->d_name.name) == 0) {
> + /* scsi-generic requires use of its own directory */
> + dir = q->sg_debugfs_dir;
>   } else {
>   /*
>* For queues that do not have a gendisk attached to them, that
> 

Please Cc Martin Petersen for patches that modify SCSI code.

The string comparison check looks fragile to me. Is the purpose of that
check perhaps to verify whether tracing is being activated through the
SCSI generic interface? If so, how about changing that test into
something like the following?

MAJOR(dev) == SCSI_GENERIC_MAJOR
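
As a rough user-space illustration of that test (a sketch only, not a proposed patch): the `sketch_*` helpers below are local stand-ins that mirror the kernel's in-kernel dev_t layout with 20 minor bits, and `SKETCH_SCSI_GENERIC_MAJOR` copies the value 21 that `SCSI_GENERIC_MAJOR` has in include/uapi/linux/major.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the in-kernel dev_t encoding (20 minor bits). */
#define SKETCH_MINORBITS		20
#define SKETCH_MINORMASK		((UINT32_C(1) << SKETCH_MINORBITS) - 1)
/* Same value as SCSI_GENERIC_MAJOR in include/uapi/linux/major.h. */
#define SKETCH_SCSI_GENERIC_MAJOR	21

/* Build a device number from a major/minor pair. */
static uint32_t sketch_mkdev(unsigned int major, unsigned int minor)
{
	assert(major < (1u << 12));
	assert(minor <= SKETCH_MINORMASK);
	return ((uint32_t)major << SKETCH_MINORBITS) | minor;
}

/* Extract the major number, as MAJOR(dev) does in the kernel. */
static unsigned int sketch_major(uint32_t dev)
{
	return dev >> SKETCH_MINORBITS;
}

/* The proposed check: identify sg devices by major number, not by name. */
static bool is_scsi_generic(uint32_t dev)
{
	return sketch_major(dev) == SKETCH_SCSI_GENERIC_MAJOR;
}
```

A check of this kind does not depend on how the debugfs directory happens to be named. As a side note, the strlen() comparison in the quoted hunk is redundant, since strcmp() returning 0 already implies equal lengths.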

Thanks,

Bart.

Re: [PATCH v2 1/2] PCI/ERR: Fix fatal error recovery for non-hotplug capable devices

2020-06-04 Thread Jay Vosburgh

sathyanarayanan.kuppusw...@linux.intel.com wrote:

>From: Kuppuswamy Sathyanarayanan 
>
>Fatal (DPC) error recovery is currently broken for non-hotplug
>capable devices. With current implementation, after successful
>fatal error recovery, non-hotplug capable device state won't be
>restored properly. You can find related issues in following links.
>
>https://lkml.org/lkml/2020/5/27/290
>https://lore.kernel.org/linux-pci/12115.1588207324@famine/
>https://lkml.org/lkml/2020/3/28/328
>
>Current fatal error recovery implementation relies on hotplug handler
>for detaching/re-enumerating the affected devices/drivers on DLLSC
>state changes. So when dealing with non-hotplug capable devices,
>recovery code does not restore the state of the affected devices
>correctly. Correct implementation should call report_slot_reset()
>function after resetting the link to restore the state of the
>device/driver.
>
>So use PCI_ERS_RESULT_NEED_RESET as error status for successful
>reset_link() operation and use PCI_ERS_RESULT_DISCONNECT for failure
>case. PCI_ERS_RESULT_NEED_RESET error state will ensure slot_reset()
>is called after reset link operation which will also fix the above
>mentioned issue.
>
>[original patch is from jay.vosbu...@canonical.com]
>[original patch link 
>https://lore.kernel.org/linux-pci/12115.1588207324@famine/]
>Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>Signed-off-by: Jay Vosburgh 
>Signed-off-by: Kuppuswamy Sathyanarayanan 
>

I've tested this patch set on one of our test machines, and it
resolves the issue.  I plan to test with other systems tomorrow.

-J
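
For anyone following along, the status handling described in the commit message can be modelled roughly as below. This is a simplified sketch under my own naming: the `ERS_*` values and both helpers are hypothetical stand-ins for the PCI_ERS_RESULT_* codes and the logic in pcie_do_recovery(), not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for a few PCI_ERS_RESULT_* codes. */
enum ers_result {
	ERS_RECOVERED,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
};

/*
 * Frozen (fatal) case: the result of the link reset alone decides the
 * status. Success maps to NEED_RESET so that the slot-reset step still
 * runs; anything else maps to DISCONNECT.
 */
static enum ers_result status_after_link_reset(enum ers_result reset_link_result)
{
	assert(reset_link_result <= ERS_DISCONNECT);
	if (reset_link_result != ERS_RECOVERED)
		return ERS_DISCONNECT;
	return ERS_NEED_RESET;
}

/* The slot-reset callbacks are only broadcast for NEED_RESET. */
static bool slot_reset_is_broadcast(enum ers_result status)
{
	return status == ERS_NEED_RESET;
}
```

The point of the change, as I read it, is that a successful link reset no longer skips the slot-reset broadcast, which is what lets drivers of non-hotplug devices restore their state.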

>---
> drivers/pci/pcie/err.c | 24 ++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>index 14bb8f54723e..5fe8561c7185 100644
>--- a/drivers/pci/pcie/err.c
>+++ b/drivers/pci/pcie/err.c
>@@ -165,8 +165,28 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   pci_dbg(dev, "broadcast error_detected message\n");
>   if (state == pci_channel_io_frozen) {
>   pci_walk_bus(bus, report_frozen_detected, &status);
>-  status = reset_link(dev);
>-  if (status != PCI_ERS_RESULT_RECOVERED) {
>+  /*
>+   * After resetting the link using reset_link() call, the
>+   * possible value of error status is either
>+   * PCI_ERS_RESULT_DISCONNECT (failure case) or
>+   * PCI_ERS_RESULT_NEED_RESET (success case).
>+   * So ignore the return value of report_error_detected()
>+   * call for fatal errors. Instead use
>+   * PCI_ERS_RESULT_NEED_RESET as initial status value.
>+   *
>+   * Ignoring the status return value of report_error_detected()
>+   * call will also help in case of EDR mode based error
>+   * recovery. In EDR mode AER and DPC Capabilities are owned by
>+   * firmware and hence report_error_detected() call will possibly
>+   * return PCI_ERS_RESULT_NO_AER_DRIVER. So if we don't ignore
>+   * the return value of report_error_detected() then
>+   * pcie_do_recovery() would report incorrect status after
>+   * successful recovery. Ignoring PCI_ERS_RESULT_NO_AER_DRIVER
>+   * in non EDR case should not have any functional impact.
>+   */
>+  status = PCI_ERS_RESULT_NEED_RESET;
>+  if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) {
>+  status = PCI_ERS_RESULT_DISCONNECT;
>   pci_warn(dev, "link reset failed\n");
>   goto failed;
>   }
>-- 
>2.17.1
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: linux-next: manual merge of the livepatching tree with the modules tree

2020-06-04 Thread Stephen Rothwell

Hi all,

On Fri, 8 May 2020 18:05:24 +1000 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the livepatching tree got a conflict in:
> 
>   kernel/module.c
> 
> between commits:
> 
>   db991af02f11 ("module: break nested ARCH_HAS_STRICT_MODULE_RWX and 
> STRICT_MODULE_RWX #ifdefs")
>   5c3a7db0c7ec ("module: Harden STRICT_MODULE_RWX")
> 
> from the modules tree and commit:
> 
>   e6eff4376e28 ("module: Make module_enable_ro() static again")
> 
> from the livepatching tree.
> 
> diff --cc kernel/module.c
> index c69291362676,a26343ea4d50..
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@@ -2055,29 -2023,20 +2042,30 @@@ static void module_enable_nx(const stru
>   frob_writable_data(&mod->init_layout, set_memory_nx);
>   }
>   
>  +static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
>  +   char *secstrings, struct module *mod)
>  +{
>  +const unsigned long shf_wx = SHF_WRITE|SHF_EXECINSTR;
>  +int i;
>  +
>  +for (i = 0; i < hdr->e_shnum; i++) {
>  +if ((sechdrs[i].sh_flags & shf_wx) == shf_wx)
>  +return -ENOEXEC;
>  +}
>  +
>  +return 0;
>  +}
>  +
>   #else /* !CONFIG_STRICT_MODULE_RWX */
>  +/* module_{enable,disable}_ro() stubs are in module.h */
>   static void module_enable_nx(const struct module *mod) { }
> + static void module_enable_ro(const struct module *mod, bool after_init) {}
>  -#endif /*  CONFIG_STRICT_MODULE_RWX */
>  -static void module_enable_x(const struct module *mod)
>  +static int module_enforce_rwx_sections(Elf_Ehdr *hdr, Elf_Shdr *sechdrs,
>  +   char *secstrings, struct module *mod)
>   {
>  -frob_text(&mod->core_layout, set_memory_x);
>  -frob_text(&mod->init_layout, set_memory_x);
>  +return 0;
>   }
>  -#else /* !CONFIG_ARCH_HAS_STRICT_MODULE_RWX */
>  -static void module_enable_nx(const struct module *mod) { }
>  -static void module_enable_x(const struct module *mod) { }
>  -#endif /* CONFIG_ARCH_HAS_STRICT_MODULE_RWX */
>  -
>  +#endif /*  CONFIG_STRICT_MODULE_RWX */
>   
>   #ifdef CONFIG_LIVEPATCH
>   /*

This is now a conflict between the modules tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell



Re: [PATCH][v6] KVM: X86: support APERF/MPERF registers

2020-06-04 Thread Li,Rongqing



> -----Original Message-----
> From: Xu, Like [mailto:like...@intel.com]
> Sent: June 5, 2020 10:32
> To: Li,Rongqing 
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org; x...@kernel.org;
> h...@zytor.com; b...@alien8.de; mi...@redhat.com; t...@linutronix.de;
> jmatt...@google.com; wanpen...@tencent.com; vkuzn...@redhat.com;
> sean.j.christopher...@intel.com; pbonz...@redhat.com; xiaoyao...@intel.com;
> wei.hua...@amd.com
> Subject: Re: [PATCH][v6] KVM: X86: support APERF/MPERF registers
> 
> Hi RongQing,
> 
> On 2020/6/5 9:44, Li RongQing wrote:
> > Guest kernel reports a fixed cpu frequency in /proc/cpuinfo, this is
> > confused to user when turbo is enable, and aperf/mperf can be used to
> > show current cpu frequency after 7d5905dc14a
> > "(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)"
> > so guest should support aperf/mperf capability
> >
> > This patch implements aperf/mperf by three mode: none, software
> > emulation, and pass-through
> >
> > None: default mode, guest does not support aperf/mperf
> s/None/Note
> >
> > Software emulation: the period of aperf/mperf in guest mode are
> > accumulated as emulated value
> >
> > Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because that
> > hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed no
> > over-commit.
> The flag "KVM_HINTS_REALTIME 0" (in the Documentation/virt/kvm/cpuid.rst)
> is claimed as "guest checks this feature bit to determine that vCPUs are never
> preempted for an unlimited time allowing optimizations".
> 
> I couldn't see its relationship with "1:1 vCPU: pCPU binding".
> The patch doesn't check this flag as well for your pass-through purpose.
> 
> Thanks,
> Like Xu


I think it is user space's job to tie HINT_REALTIME to mperf pass-through;
KVM just does what user space wants.

This also gives user space another option: a guest can have aperf/mperf
pass-through without HINT_REALTIME and get a coarse CPU frequency at no
performance cost, if it can tolerate an occasionally wrong frequency.
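
For reference, this is roughly how a frequency is derived from the two counters. It is a minimal sketch with hypothetical names (`struct perf_sample`, `effective_khz()`), assuming MPERF ticks at a fixed reference rate and APERF at the actual rate, so the ratio of the two deltas scales the base frequency.

```c
#include <assert.h>
#include <stdint.h>

/* One reading of both counters, taken at the same point in time. */
struct perf_sample {
	uint64_t aperf;	/* ticks at the actual clock rate */
	uint64_t mperf;	/* ticks at the fixed reference rate */
};

/*
 * Effective frequency over the interval between two samples:
 *   base_khz * delta(APERF) / delta(MPERF)
 * Falls back to base_khz if MPERF did not advance. The multiplication
 * can overflow for very long intervals; this sketch assumes short ones.
 */
static uint64_t effective_khz(uint64_t base_khz, struct perf_sample prev,
			      struct perf_sample cur)
{
	uint64_t aperf_delta = cur.aperf - prev.aperf;
	uint64_t mperf_delta = cur.mperf - prev.mperf;

	assert(base_khz > 0);
	if (mperf_delta == 0)
		return base_khz;
	return base_khz * aperf_delta / mperf_delta;
}
```

If the two samples do not describe the same physical CPU over the whole interval, the deltas stop being comparable, which is where the occasional wrong frequency mentioned above would come from.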


-Li

Re: [PATCH] scsi: megaraid_sas: fix kdump kernel boot hung caused by JBOD

2020-06-04 Thread Kai Liu


On 2020/06/05 Fri 01:05, Chandrakanth Patil wrote:


Hi Kai Liu,

Gen3 (Invader) and Gen3.5 (Ventura/Aero) generations of controllers are
affected.


Hi Chandrakanth,

My card is not one of these but it's also problematic:

# lspci -nn|grep 3408
02:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID Tri-Mode SAS3408 
[1000:0017] (rev 01)

According to megaraid_sas.h it's Tomcat:

#define PCI_DEVICE_ID_LSI_TOMCAT0x0017

According to the product information on broadcom.com, the card model is
9440-8i. So I tried to upgrade to the latest firmware version,
51.13.0-3223, but I got this error:


# ./storcli64 /c0 download file=9440-8i_nopad.rom
Download Completed.
Flashing image to adapter...
CLI Version = 007.1316.. Mar 12, 2020
Operating system = Linux 5.3.18-0.g6748ac9-default
Controller = 0
Status = Failure
Description = image corrupted

I tried a few more versions from the Broadcom website; they all failed with
the same "image corrupted" error.


Here is the controller information:

# ./storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.1316.. Mar 12, 2020
Operating system = Linux 5.3.18-0.g6748ac9-default
Controller = 0
Status = Success
Description = None

Product Name = SAS3408
Serial Number = 033FAT10K8000236
SAS Address =  57c1cf15516f4000
PCI Address = 00:02:00:00
System Time = 06/05/2020 12:36:59
Mfg. Date = 00/00/00
Controller Time = 06/05/2020 04:36:58
FW Package Build = 50.6.3-0109
BIOS Version = 7.06.02.2_0x07060502
FW Version = 5.060.01-2262
Driver Name = megaraid_sas
Driver Version = 07.713.01.00-rc1
Vendor Id = 0x1000
Device Id = 0x17
SubVendor Id = 0x19E5
SubDevice Id = 0xD213
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 2
Device Number = 0
Function Number = 0
Domain ID = 0
Drive Groups = 3


Thanks,
Kai Liu

Re: [PATCH] Bluetooth: Allow suspend even when preparation has failed

2020-06-04 Thread Abhishek Pandit-Subedi

Sent a v2 with proper fixes and reported-by tags.

Thanks
Abhishek

On Thu, Jun 4, 2020 at 3:46 AM Rafael J. Wysocki  wrote:
>
> On Wed, Jun 3, 2020 at 10:22 PM Abhishek Pandit-Subedi
>  wrote:
> >
> > It is preferable to allow suspend even when Bluetooth has problems
> > preparing for sleep. When Bluetooth fails to finish preparing for
> > suspend, log the error and allow the suspend notifier to continue
> > instead.
> >
> > To also make it clearer why suspend failed, change bt_dev_dbg to
> > bt_dev_err when handling the suspend timeout.
> >
> > Signed-off-by: Abhishek Pandit-Subedi 
>
> Thanks for the patch, it looks reasonable to me.
>
> It would be good to add a Fixes tag to it to indicate that it works
> around an issue introduced by an earlier commit.
>
> Len, Todd, would it be possible to test this one on the affected machines?
>
> > ---
> > To verify this is properly working, I added an additional change to
> > hci_suspend_wait_event to always return -16. This validates that suspend
> > continues even when an error has occurred during the suspend
> > preparation.
> >
> > Example on Chromebook:
> > [   55.834524] PM: Syncing filesystems ... done.
> > [   55.841930] PM: Preparing system for sleep (s2idle)
> > [   55.940492] Bluetooth: hci_core.c:hci_suspend_notifier() hci0: Suspend 
> > notifier action (3) failed: -16
> > [   55.940497] Freezing user space processes ... (elapsed 0.001 seconds) 
> > done.
> > [   55.941692] OOM killer disabled.
> > [   55.941693] Freezing remaining freezable tasks ... (elapsed 0.000 
> > seconds) done.
> > [   55.942632] PM: Suspending system (s2idle)
> >
> > I ran this through a suspend_stress_test in the following scenarios:
> > * Peer classic device connected: 50+ suspends
> > * No devices connected: 100 suspends
> > * With the above test case returning -EBUSY: 50 suspends
> >
> > I also ran this through our automated testing for suspend and wake on
> > BT from suspend continues to work.
> >
> >
> >  net/bluetooth/hci_core.c | 17 ++---
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index dbe2d79f233fba..54da48441423e0 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -3289,10 +3289,10 @@ static int hci_suspend_wait_event(struct hci_dev 
> > *hdev)
> >  WAKE_COND, SUSPEND_NOTIFIER_TIMEOUT);
> >
> > if (ret == 0) {
> > -   bt_dev_dbg(hdev, "Timed out waiting for suspend");
> > +   bt_dev_err(hdev, "Timed out waiting for suspend events");
> > for (i = 0; i < __SUSPEND_NUM_TASKS; ++i) {
> > if (test_bit(i, hdev->suspend_tasks))
> > -   bt_dev_dbg(hdev, "Bit %d is set", i);
> > +   bt_dev_err(hdev, "Suspend timeout bit: %d", 
> > i);
> > clear_bit(i, hdev->suspend_tasks);
> > }
> >
> > @@ -3360,12 +3360,15 @@ static int hci_suspend_notifier(struct 
> > notifier_block *nb, unsigned long action,
> > ret = hci_change_suspend_state(hdev, BT_RUNNING);
> > }
> >
> > -   /* If suspend failed, restore it to running */
> > -   if (ret && action == PM_SUSPEND_PREPARE)
> > -   hci_change_suspend_state(hdev, BT_RUNNING);
> > -
> >  done:
> > -   return ret ? notifier_from_errno(-EBUSY) : NOTIFY_STOP;
> > +   /* We always allow suspend even if suspend preparation failed and
> > +* attempt to recover in resume.
> > +*/
> > +   if (ret)
> > +   bt_dev_err(hdev, "Suspend notifier action (%x) failed: %d",
> > +  action, ret);
> > +
> > +   return NOTIFY_STOP;
> >  }
> >
> >  /* Alloc HCI device */
> > --
> > 2.27.0.rc2.251.g90737beb825-goog
> >

[PATCH v2] Bluetooth: Allow suspend even when preparation has failed

2020-06-04 Thread Abhishek Pandit-Subedi

It is preferable to allow suspend even when Bluetooth has problems
preparing for sleep. When Bluetooth fails to finish preparing for
suspend, log the error and allow the suspend notifier to continue
instead.

To also make it clearer why suspend failed, change bt_dev_dbg to
bt_dev_err when handling the suspend timeout.

Fixes: dd522a7429b07e ("Bluetooth: Handle LE devices during suspend")
Reported-by: Len Brown 
Signed-off-by: Abhishek Pandit-Subedi 
---
To verify this is properly working, I added an additional change to
hci_suspend_wait_event to always return -16. This validates that suspend
continues even when an error has occurred during the suspend
preparation.

Example on Chromebook:
[   55.834524] PM: Syncing filesystems ... done.
[   55.841930] PM: Preparing system for sleep (s2idle)
[   55.940492] Bluetooth: hci_core.c:hci_suspend_notifier() hci0: Suspend 
notifier action (3) failed: -16
[   55.940497] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   55.941692] OOM killer disabled.
[   55.941693] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) 
done.
[   55.942632] PM: Suspending system (s2idle)

I ran this through a suspend_stress_test in the following scenarios:
* Peer classic device connected: 50+ suspends
* No devices connected: 100 suspends
* With the above test case returning -EBUSY: 50 suspends

I also ran this through our automated testing for suspend and wake on
BT from suspend continues to work.
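
For readers less familiar with notifier return values, the following user-space sketch shows why the old return path aborted suspend and the new one does not. The `NOTIFY_*` constants and the two errno helpers are local copies written to match include/linux/notifier.h as I understand it; `suspend_notifier_old()` and `suspend_notifier_new()` are hypothetical reductions of the function's final return statement only.

```c
#include <assert.h>
#include <errno.h>

/* Local copies of notifier return values (see include/linux/notifier.h). */
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_STOP		(NOTIFY_OK | NOTIFY_STOP_MASK)

/* Encode a negative errno into a notifier return value. */
static int notifier_from_errno(int err)
{
	assert(err <= 0);
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Before the patch: any failure is reported to the PM core as -EBUSY. */
static int suspend_notifier_old(int ret)
{
	return ret ? notifier_from_errno(-EBUSY) : NOTIFY_STOP;
}

/* After the patch: the failure is only logged; suspend is never vetoed. */
static int suspend_notifier_new(int ret)
{
	(void)ret;
	return NOTIFY_STOP;
}
```

Decoding the old return value yields -EBUSY, which is what stopped the suspend; the new one always decodes to 0.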


Changes in v2:
- Added fixes and reported-by tags

 net/bluetooth/hci_core.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index dbe2d79f233fba..54da48441423e0 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3289,10 +3289,10 @@ static int hci_suspend_wait_event(struct hci_dev *hdev)
 WAKE_COND, SUSPEND_NOTIFIER_TIMEOUT);
 
if (ret == 0) {
-   bt_dev_dbg(hdev, "Timed out waiting for suspend");
+   bt_dev_err(hdev, "Timed out waiting for suspend events");
for (i = 0; i < __SUSPEND_NUM_TASKS; ++i) {
if (test_bit(i, hdev->suspend_tasks))
-   bt_dev_dbg(hdev, "Bit %d is set", i);
+   bt_dev_err(hdev, "Suspend timeout bit: %d", i);
clear_bit(i, hdev->suspend_tasks);
}
 
@@ -3360,12 +3360,15 @@ static int hci_suspend_notifier(struct notifier_block 
*nb, unsigned long action,
ret = hci_change_suspend_state(hdev, BT_RUNNING);
}
 
-   /* If suspend failed, restore it to running */
-   if (ret && action == PM_SUSPEND_PREPARE)
-   hci_change_suspend_state(hdev, BT_RUNNING);
-
 done:
-   return ret ? notifier_from_errno(-EBUSY) : NOTIFY_STOP;
+   /* We always allow suspend even if suspend preparation failed and
+* attempt to recover in resume.
+*/
+   if (ret)
+   bt_dev_err(hdev, "Suspend notifier action (%x) failed: %d",
+  action, ret);
+
+   return NOTIFY_STOP;
 }
 
 /* Alloc HCI device */
-- 
2.27.0.278.ge193c7cf3a9-goog

[PATCH] f2fs: add F2FS_IOC_TRIM_FILE ioctl

2020-06-04 Thread Daeho Jeong

From: Daeho Jeong 

Added a new ioctl to send discard commands to whole data area of
a regular file for security reason.

Signed-off-by: Daeho Jeong 
---
 fs/f2fs/f2fs.h |   1 +
 fs/f2fs/file.c | 129 +
 2 files changed, 130 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c812fb8e2d9c..9ae81d0fefa0 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -434,6 +434,7 @@ static inline bool __has_cursum_space(struct f2fs_journal 
*journal,
_IOR(F2FS_IOCTL_MAGIC, 18, __u64)
 #define F2FS_IOC_RESERVE_COMPRESS_BLOCKS   \
_IOR(F2FS_IOCTL_MAGIC, 19, __u64)
+#define F2FS_IOC_TRIM_FILE _IO(F2FS_IOCTL_MAGIC, 20)
 
 #define F2FS_IOC_GET_VOLUME_NAME   FS_IOC_GETFSLABEL
 #define F2FS_IOC_SET_VOLUME_NAME   FS_IOC_SETFSLABEL
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index dfa1ac2d751a..58507bb5649c 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3749,6 +3749,132 @@ static int f2fs_reserve_compress_blocks(struct file 
*filp, unsigned long arg)
return ret;
 }
 
+static int f2fs_trim_file(struct file *filp)
+{
+   struct inode *inode = file_inode(filp);
+   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+   struct address_space *mapping = inode->i_mapping;
+   struct bio *bio = NULL;
+   struct block_device *prev_bdev = NULL;
+   loff_t file_size;
+   pgoff_t index, pg_start = 0, pg_end;
+   block_t prev_block = 0, len = 0;
+   int ret = 0;
+
+   if (!f2fs_hw_support_discard(sbi))
+   return -EOPNOTSUPP;
+
+   if (!S_ISREG(inode->i_mode) || f2fs_is_atomic_file(inode) ||
+   f2fs_compressed_file(inode))
+   return -EINVAL;
+
+   if (f2fs_readonly(sbi->sb))
+   return -EROFS;
+
+   ret = mnt_want_write_file(filp);
+   if (ret)
+   return ret;
+
+   inode_lock(inode);
+
+   file_size = i_size_read(inode);
+   if (!file_size)
+   goto err;
+   pg_end = (pgoff_t)round_up(file_size, PAGE_SIZE) >> PAGE_SHIFT;
+
+   ret = f2fs_convert_inline_inode(inode);
+   if (ret)
+   goto err;
+
+   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+   down_write(&F2FS_I(inode)->i_mmap_sem);
+
+   ret = filemap_write_and_wait(mapping);
+   if (ret)
+   goto out;
+
+   truncate_inode_pages(mapping, 0);
+
+   for (index = pg_start; index < pg_end;) {
+   struct dnode_of_data dn;
+   unsigned int end_offset;
+
+   set_new_dnode(&dn, inode, NULL, NULL, 0);
+   ret = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE);
+   if (ret)
+   goto out;
+
+   end_offset = ADDRS_PER_PAGE(dn.node_page, inode);
+   if (pg_end < end_offset + index)
+   end_offset = pg_end - index;
+
+   for (; dn.ofs_in_node < end_offset;
+   dn.ofs_in_node++, index++) {
+   struct block_device *cur_bdev;
+   block_t blkaddr = f2fs_data_blkaddr(&dn);
+
+   if (__is_valid_data_blkaddr(blkaddr)) {
+   if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode),
+   blkaddr, DATA_GENERIC_ENHANCE)) {
+   ret = -EFSCORRUPTED;
+   goto out;
+   }
+   } else
+   continue;
+
+   cur_bdev = f2fs_target_device(sbi, blkaddr, NULL);
+   if (f2fs_is_multi_device(sbi)) {
+   int i = f2fs_target_device_index(sbi, blkaddr);
+
+   blkaddr -= FDEV(i).start_blk;
+   }
+
+   if (len) {
+   if (prev_bdev == cur_bdev &&
+   blkaddr == prev_block + len) {
+   len++;
+   } else {
+   ret = __blkdev_issue_discard(prev_bdev,
+   SECTOR_FROM_BLOCK(prev_block),
+   SECTOR_FROM_BLOCK(len),
+   GFP_NOFS, 0, &bio);
+   if (ret)
+   goto out;
+
+   len = 0;
+   }
+   }
+
+   if (!len) {
+   prev_bdev = cur_bdev;
+   prev_block = blkaddr;
+   len = 1;
+   }
+   }
+
+   f2fs_put_dnode(&dn);
+

Re: general protection fault in kobject_get (2)

2020-06-04 Thread Eric Biggers

On Wed, May 20, 2020 at 07:56:41AM +0200, Greg KH wrote:
> On Tue, May 19, 2020 at 09:53:16PM -0700, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:d00f26b6 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> > git tree:   net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1316343c10
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=26d0bd769afe1a2c
> > dashboard link: https://syzkaller.appspot.com/bug?extid=407fd358a932bbf639c6
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > 
> > Unfortunately, I don't have any reproducer for this crash yet.
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+407fd358a932bbf63...@syzkaller.appspotmail.com
> > 
> > general protection fault, probably for non-canonical address 
> > 0xdc13:  [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0098-0x009f]
> > CPU: 1 PID: 16682 Comm: syz-executor.3 Not tainted 5.7.0-rc4-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > Google 01/01/2011
> > RIP: 0010:kobject_get+0x30/0x150 lib/kobject.c:640
> > Code: 53 e8 d4 7e c6 fd 4d 85 e4 0f 84 a2 00 00 00 e8 c6 7e c6 fd 49 8d 7c 
> > 24 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 
> > 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e7 00 00 00
> > RSP: 0018:c9000772f240 EFLAGS: 00010203
> > RAX: dc00 RBX: 85acfca0 RCX: c9000fc67000
> > RDX: 0013 RSI: 83acadfa RDI: 009c
> > RBP: 0060 R08: 8880a8dfa4c0 R09: ed100a03f403
> > R10: 8880501fa017 R11: ed100a03f402 R12: 0060
> > R13: c9000772f3c0 R14: 88805d1ec4e8 R15: 88805d1ec580
> > FS:  7f1ebed26700() GS:8880ae70() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 004d88f0 CR3: a86c4000 CR4: 001406e0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
> > Call Trace:
> >  get_device+0x20/0x30 drivers/base/core.c:2620
> >  __ib_get_client_nl_info+0x1d4/0x2a0 drivers/infiniband/core/device.c:1863
> >  ib_get_client_nl_info+0x30/0x180 drivers/infiniband/core/device.c:1883
> >  nldev_get_chardev+0x52b/0xa40 drivers/infiniband/core/nldev.c:1625
> >  rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:195 [inline]
> >  rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
> >  rdma_nl_rcv+0x586/0x900 drivers/infiniband/core/netlink.c:259
> >  netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
> >  netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
> >  netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
> >  sock_sendmsg_nosec net/socket.c:652 [inline]
> >  sock_sendmsg+0xcf/0x120 net/socket.c:672
> >  sys_sendmsg+0x6e6/0x810 net/socket.c:2352
> >  ___sys_sendmsg+0x100/0x170 net/socket.c:2406
> >  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
> >  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
> >  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > RIP: 0033:0x45c829
> > Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 
> > 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff 
> > ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:7f1ebed25c78 EFLAGS: 0246 ORIG_RAX: 002e
> > RAX: ffda RBX: 004ff720 RCX: 0045c829
> > RDX:  RSI: 2200 RDI: 0003
> > RBP: 0078bf00 R08:  R09: 
> > R10:  R11: 0246 R12: 
> > R13: 09ad R14: 004d5f10 R15: 7f1ebed266d4
> > Modules linked in:
> > ---[ end trace 239938a6c4c3c99f ]---
> > RIP: 0010:kobject_get+0x30/0x150 lib/kobject.c:640
> > Code: 53 e8 d4 7e c6 fd 4d 85 e4 0f 84 a2 00 00 00 e8 c6 7e c6 fd 49 8d 7c 
> > 24 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 
> > 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e7 00 00 00
> > RSP: 0018:c9000772f240 EFLAGS: 00010203
> > RAX: dc00 RBX: 85acfca0 RCX: c9000fc67000
> > RDX: 0013 RSI: 83acadfa RDI: 009c
> > RBP: 0060 R08: 8880a8dfa4c0 R09: ed100a03f403
> > R10: 8880501fa017 R11: ed100a03f402 R12: 0060
> > R13: c9000772f3c0 R14: 88805d1ec4e8 R15: 88805d1ec580
> > FS:  7f1ebed26700() GS:8880ae70() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 0073fad4 CR3: a86c4000 CR4: 001406e0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
> 
> Looks like an

Re: memory leak in do_eventfd

2020-06-04 Thread Eric Biggers

[+Cc kvm mailing list]

On Wed, May 20, 2020 at 06:12:17PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:5a9ffb95 Merge tag '5.7-rc5-smb3-fixes' of git://git.samba..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10b72a0210
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f8295ae5b3f8268d
> dashboard link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=17585b7610
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12500a0210
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+f196caa45793d6374...@syzkaller.appspotmail.com
> 
> BUG: memory leak
> unreferenced object 0x888117169ac0 (size 64):
>   comm "syz-executor012", pid 6609, jiffies 4294942172 (age 13.720s)
>   hex dump (first 32 bytes):
> 01 00 00 00 ff ff ff ff 00 00 00 00 00 c9 ff ff  
> d0 9a 16 17 81 88 ff ff d0 9a 16 17 81 88 ff ff  
>   backtrace:
> [<351bb234>] kmalloc include/linux/slab.h:555 [inline]
> [<351bb234>] do_eventfd+0x35/0xf0 fs/eventfd.c:418
> [] __do_sys_eventfd fs/eventfd.c:443 [inline]
> [] __se_sys_eventfd fs/eventfd.c:441 [inline]
> [] __x64_sys_eventfd+0x14/0x20 fs/eventfd.c:441
> [<86d6f989>] do_syscall_64+0x6e/0x220 arch/x86/entry/common.c:295
> [<6c5bcb63>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> BUG: memory leak
> unreferenced object 0x888117169100 (size 64):
>   comm "syz-executor012", pid 6609, jiffies 4294942172 (age 13.720s)
>   hex dump (first 32 bytes):
> e8 99 dd 00 00 c9 ff ff e8 99 dd 00 00 c9 ff ff  
> 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00  ... 
>   backtrace:
> [<436d2955>] kmalloc include/linux/slab.h:555 [inline]
> [<436d2955>] kzalloc include/linux/slab.h:669 [inline]
> [<436d2955>] kvm_assign_ioeventfd_idx+0x4f/0x270 
> arch/x86/kvm/../../../virt/kvm/eventfd.c:798
> [] kvm_assign_ioeventfd 
> arch/x86/kvm/../../../virt/kvm/eventfd.c:934 [inline]
> [] kvm_ioeventfd+0xbb/0x194 
> arch/x86/kvm/../../../virt/kvm/eventfd.c:961
> [] kvm_vm_ioctl+0x1e6/0x1030 
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:3670
> [<5da94937>] vfs_ioctl fs/ioctl.c:47 [inline]
> [<5da94937>] ksys_ioctl+0xa6/0xd0 fs/ioctl.c:771
> [] __do_sys_ioctl fs/ioctl.c:780 [inline]
> [] __se_sys_ioctl fs/ioctl.c:778 [inline]
> [] __x64_sys_ioctl+0x1a/0x20 fs/ioctl.c:778
> [<86d6f989>] do_syscall_64+0x6e/0x220 arch/x86/entry/common.c:295
> [<6c5bcb63>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches

Re: KASAN: use-after-free Read in mousedev_cleanup

2020-06-04 Thread Eric Biggers

On Sat, May 23, 2020 at 10:04:14AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:c11d28ab Add linux-next specific files for 20200522
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=119c3f0610
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6dbdea4159fb66
> dashboard link: https://syzkaller.appspot.com/bug?extid=29b33f3f410e564731f1
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=135ffa9a10
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1076acba10
> 
> The bug was bisected to:
> 
> commit 4ef12f7198023c09ad6d25b652bd8748c965c7fa
> Author: Heikki Krogerus 
> Date:   Wed May 13 15:18:40 2020 +
> 
> kobject: Make sure the parent does not get released before its children
> 

Commit was reverted.

#syz invalid

Re: [PATCH 0/2] proc: use subset option to hide some top-level procfs entries

2020-06-04 Thread Eric W. Biederman

Alexey Gladkov  writes:

> On Thu, Jun 04, 2020 at 03:33:25PM -0500, Eric W. Biederman wrote:
>> Alexey Gladkov  writes:
>> 
>> > Greetings!
>> >
>> > Preface
>> > ---
>> > This patch set can be applied over:
>> >
>> > git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
>> > d35bec8a5788
>> 
>> I am not going to seriously look at this for merging until after the
>> merge window closes. 
>
> OK. I'll wait.

That will mean your patches can be based on -rc1.

>> Have you thought about the possibility of relaxing the permission checks
>> to mount proc such that we don't need to verify there is an existing
>> mount of proc?  With just the subset pids I think this is feasible.  It
>> might not be worth it at this point, but it is definitely worth asking
>> the question.  One of the benefits early proponents of the idea of a
>> subset of proc touted was that they would not be as restricted as they
>> are with today's proc.
>
> I'm not sure I follow.
>
> What do you mean by the possibility of relaxing the permission checks to
> mount proc?
>
> Do you suggest to allow a user to mount procfs with hidepid=2,subset=pid
> options? If so then this is an interesting idea.

The key part would be subset=pid.  You would still need to be root in
your user namespace, and mount namespace.  You would not need to have a
separate copy of proc with nothing hidden already mounted.
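
For reference, a minimal sketch of what such a mount looks like from C, assuming a kernel that supports the `subset=pid` option. `proc_mount_opts()` and `mount_pid_only_proc()` are hypothetical helper names; only `mount(2)` and the option strings come from the discussion. The call still needs root in the caller's user and mount namespaces, so it is expected to fail for an unprivileged process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mount.h>

/*
 * Build the procfs option string, e.g. "hidepid=2,subset=pid".
 * Returns 0 on success, -1 if buf is too small.
 */
static int proc_mount_opts(char *buf, size_t size, int hidepid, int subset_pid)
{
    int n;

    assert(buf && size);
    n = snprintf(buf, size, "hidepid=%d%s", hidepid,
                 subset_pid ? ",subset=pid" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Mount a pid-only procfs at target. Returns 0 on success and -1 on
 * failure (errno is set by mount(2) in the latter case).
 */
static int mount_pid_only_proc(const char *target)
{
    char opts[64];

    if (proc_mount_opts(opts, sizeof(opts), 2, 1) < 0)
        return -1;
    return mount("proc", target, "proc",
                 MS_NOSUID | MS_NODEV | MS_NOEXEC, opts);
}
```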

>> I ask because this has a bearing on the other options you are playing
>> with.
>
> I can not agree with this because I do not touch on other options.
> The hidepid and subset=pid options have no relation to the visibility of regular
> files. On the other hand, in procfs there is absolutely no way to restrict
> access other than selinux.

Untrue.  At a practical level the user namespace greatly restricts
access to proc because many of the non-process files are limited to
global root only.

>> Do we want to find a way to have the benefit of relaxed permission
>> checks while still including a few more files?
>
> In fact, I see no problem allowing the user to mount procfs with the
> hidepid=2,subset=pid options.
>
> We can make subset=self, which would allow not only the pids subset but also
> other symlinks that lead to self (/proc/net, /proc/mounts) and, if we ever
> add virtualization to them, meminfo, cpuinfo etc.
>
>> > Overview
>> > 
>> > Directories and files can be created and deleted by dynamically loaded 
>> > modules.
>> > Not all of these files are virtualized and safe inside the container.
>> >
>> > However, subset=pid is not enough because many containers want to have
>> > /proc/meminfo, /proc/cpuinfo, etc. We need a way to limit the visibility of
>> > files per procfs mountpoint.
>> 
>> Is it desirable to have meminfo and cpuinfo as they are today or do
>> people want them to reflect the ``container'' context, so that
>> applications like the JVM don't allocate too many cpus, don't try
>> to consume too much memory, and don't run on nodes that cgroups
>> currently make unavailable?
>
> Of course, it would be better if these files took into account the
> limitations of cgroups or some kind of ``containerized'' context.
>
>> Are there any users or planned users of this functionality yet?
>
> I know that java uses meminfo for sure.
>
> The purpose of this patch is to isolate the container from unwanted files
> in procfs.

If what we want is the ability not to use the original but to have
a modified version of these files, we probably want empty files that
serve as mount points.

Or possibly a version of these files that takes into account
restrictions.  In either event we need to do the research through real
programs and real kernel options to see what is our best option for
exporting the limitations that programs have and deciding on the long
term API for that.

If we research things and we decide the best way to let java know of
its limitations is to change /proc/meminfo, that needs to be a change
that always applies to meminfo and is not controlled by options.

>> I am concerned that you might be adding functionality that no one will
>> ever use that will just add code to the kernel that no one cares about,
>> that will then accumulate bugs.  Having had to work through a few of
>> those cases to make each mount of proc have its own super block I am
>> not a great fan of adding another one.
>>
>> If the runc, lxc and other container runtime folks can productively use
>> such an option to do useful things and they are sensible things to do I
>> don't have any fundamental objection.  But I do want to be certain this
>> is a feature that is going to be used.
>
> Ok, just an example of how docker or runc (actually almost all golang-based
> container systems) try to block access to something in procfs:
>
> $ docker run -it --rm busybox
> # mount |grep /proc
> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> proc on /proc/bus type proc (ro,relatime)
> proc on /proc/fs type proc (ro,relatime)
> proc on /proc/irq type

Re: KASAN: use-after-free Read in put_device

2020-06-04 Thread Eric Biggers

On Sat, May 23, 2020 at 10:04:14AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:c11d28ab Add linux-next specific files for 20200522
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=114e4f0610
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6dbdea4159fb66
> dashboard link: https://syzkaller.appspot.com/bug?extid=60f9ee7f99afe29ef9fa
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=10ba16e210
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16caefe610
> 
> The bug was bisected to:
> 
> commit 4ef12f7198023c09ad6d25b652bd8748c965c7fa
> Author: Heikki Krogerus 
> Date:   Wed May 13 15:18:40 2020 +
> 
> kobject: Make sure the parent does not get released before its children
> 

Commit was reverted.

#syz invalid

Re: possible deadlock in media_devnode_release

2020-06-04 Thread Eric Biggers

On Sat, May 23, 2020 at 10:38:50AM -0700, Randy Dunlap wrote:
> On 5/23/20 10:04 AM, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:c11d28ab Add linux-next specific files for 20200522
> > git tree:   linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1733017210
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6dbdea4159fb66
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e3c234427cd464510547
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=122eacba10
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12fffa9a10
> > 
> > The bug was bisected to:
> > 
> > commit 4ef12f7198023c09ad6d25b652bd8748c965c7fa
> > Author: Heikki Krogerus 
> > Date:   Wed May 13 15:18:40 2020 +
> > 
> > kobject: Make sure the parent does not get released before its children
> 
> Hi,
> 
> Greg just sent a revert for this patch:
> https://lore.kernel.org/lkml/20200523152922.ga224...@kroah.com/
> 
> so all 3 of these reports should be cleared up soon.

Commit was reverted, so invalidating this bug report:

#syz invalid

Re: KASAN: use-after-free Read in joydev_cleanup

2020-06-04 Thread Eric Biggers

On Sun, May 24, 2020 at 03:24:12AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:c11d28ab Add linux-next specific files for 20200522
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1587269a10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6dbdea4159fb66
> dashboard link: https://syzkaller.appspot.com/bug?extid=833ac95f0a2451d63a9f
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1114d62610
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14c4da9a10
> 
> The bug was bisected to:
> 
> commit 4ef12f7198023c09ad6d25b652bd8748c965c7fa
> Author: Heikki Krogerus 
> Date:   Wed May 13 15:18:40 2020 +
> 
> kobject: Make sure the parent does not get released before its children
> 

Commit was reverted.

#syz invalid

Re: KASAN: use-after-free Read in evdev_cleanup

2020-06-04 Thread Eric Biggers

On Sun, May 24, 2020 at 01:22:44PM +0200, Greg KH wrote:
> On Sun, May 24, 2020 at 03:24:12AM -0700, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:c11d28ab Add linux-next specific files for 20200522
> > git tree:   linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14f5444110
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6dbdea4159fb66
> > dashboard link: https://syzkaller.appspot.com/bug?extid=20458a5eab138777efc9
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=133e254a10
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15dbf01610
> > 
> > The bug was bisected to:
> > 
> > commit 4ef12f7198023c09ad6d25b652bd8748c965c7fa
> > Author: Heikki Krogerus 
> > Date:   Wed May 13 15:18:40 2020 +
> > 
> > kobject: Make sure the parent does not get released before its children
> 
> All issues that point to this commit can be now marked invalid as it has
> been reverted in Linus's tree.
> 

This is the way to invalidate a syzbot bug report:

#syz invalid

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: kworker/u4:LINE/6740

2020-06-04 Thread Eric Biggers

On Tue, Jun 02, 2020 at 05:20:16AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:0e21d462 Add linux-next specific files for 20200602
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f3dcca10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ecc1aef35f550ee3
> dashboard link: https://syzkaller.appspot.com/bug?extid=8ea916061cc749544c8f
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+8ea916061cc749544...@syzkaller.appspotmail.com
> 
> BUG: using smp_processor_id() in preemptible [] code: 
> kworker/u4:6/6740
> caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
> CPU: 1 PID: 6740 Comm: kworker/u4:6 Not tainted 5.7.0-next-20200602-syzkaller 
> #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Workqueue: writeback wb_workfn (flush-8:0)
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
>  ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
>  ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
>  ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
>  mpage_map_one_extent fs/ext4/inode.c:2377 [inline]
>  mpage_map_and_submit_extent fs/ext4/inode.c:2430 [inline]
>  ext4_writepages+0x1ab5/0x3400 fs/ext4/inode.c:2782
>  do_writepages+0xfa/0x2a0 mm/page-writeback.c:2338
>  __writeback_single_inode+0x12a/0x13d0 fs/fs-writeback.c:1453
>  writeback_sb_inodes+0x515/0xdc0 fs/fs-writeback.c:1717
>  __writeback_inodes_wb+0xc3/0x250 fs/fs-writeback.c:1786
>  wb_writeback+0x8db/0xd50 fs/fs-writeback.c:1895
>  wb_check_old_data_flush fs/fs-writeback.c:1997 [inline]
>  wb_do_writeback fs/fs-writeback.c:2050 [inline]
>  wb_workfn+0xab3/0x1090 fs/fs-writeback.c:2079
>  process_one_work+0x965/0x1690 kernel/workqueue.c:2269
>  worker_thread+0x96/0xe10 kernel/workqueue.c:2415
>  kthread+0x3b5/0x4a0 kernel/kthread.c:291
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

#syz dup: linux-next test error: BUG: using smp_processor_id() in preemptible 
[ADDR] code: syz-fuzzer/6792

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: systemd-rfkill/6731

2020-06-04 Thread Eric Biggers

On Tue, Jun 02, 2020 at 05:20:16AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:0e21d462 Add linux-next specific files for 20200602
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=102c59ce10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ecc1aef35f550ee3
> dashboard link: https://syzkaller.appspot.com/bug?extid=7f2b4a7d4281e8c2aad0
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+7f2b4a7d4281e8c2a...@syzkaller.appspotmail.com
> 
> BUG: using smp_processor_id() in preemptible [] code: 
> systemd-rfkill/6731
> caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
> CPU: 0 PID: 6731 Comm: systemd-rfkill Not tainted 
> 5.7.0-next-20200602-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
>  ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
>  ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
>  ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
>  ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
>  ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
>  ext4_append+0x153/0x360 fs/ext4/namei.c:67
>  ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
>  ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
>  vfs_mkdir+0x419/0x690 fs/namei.c:3632
>  do_mkdirat+0x21e/0x280 fs/namei.c:3655
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7fe0d32c9687
> Code: Bad RIP value.
> RSP: 002b:7fffd5e80488 EFLAGS: 0246 ORIG_RAX: 0053
> RAX: ffda RBX: 55fab378a985 RCX: 7fe0d32c9687
> RDX: 7fffd5e80350 RSI: 01ed RDI: 55fab378a985
> RBP: 7fe0d32c9680 R08: 0100 R09: 
> R10: 55fab378a980 R11: 0246 R12: 01ed
> R13: 7fffd5e80610 R14:  R15: 
> 

#syz dup: linux-next test error: BUG: using smp_processor_id() in preemptible 
[ADDR] code: syz-fuzzer/6792

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: syz-fuzzer/6927

2020-06-04 Thread Eric Biggers

On Tue, Jun 02, 2020 at 04:20:17AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:0e21d462 Add linux-next specific files for 20200602
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1597f2fe10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ecc1aef35f550ee3
> dashboard link: https://syzkaller.appspot.com/bug?extid=cd8a20b91d68ef113b45
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+cd8a20b91d68ef113...@syzkaller.appspotmail.com
> 
> BUG: using smp_processor_id() in preemptible [] code: syz-fuzzer/6927
> caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
> CPU: 1 PID: 6927 Comm: syz-fuzzer Not tainted 5.7.0-next-20200602-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
>  ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
>  ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
>  ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
>  ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
>  ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
>  ext4_append+0x153/0x360 fs/ext4/namei.c:67
>  ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
>  ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
>  vfs_mkdir+0x419/0x690 fs/namei.c:3632
>  do_mkdirat+0x21e/0x280 fs/namei.c:3655
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x4b02a0
> Code: Bad RIP value.
> RSP: 002b:00ccf4b8 EFLAGS: 0212 ORIG_RAX: 0102
> RAX: ffda RBX: 00c2c000 RCX: 004b02a0
> RDX: 01c0 RSI: 00c000116be0 RDI: ff9c
> RBP: 00ccf510 R08:  R09: 
> R10:  R11: 0212 R12: 
> R13: 0060 R14: 005f R15: 0100
> 
> 

#syz dup: linux-next test error: BUG: using smp_processor_id() in preemptible 
[ADDR] code: syz-fuzzer/6792

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: systemd-rfkill/6726

2020-06-04 Thread Eric Biggers

On Tue, Jun 02, 2020 at 05:19:16AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:0e21d462 Add linux-next specific files for 20200602
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13a7ffe210
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ecc1aef35f550ee3
> dashboard link: https://syzkaller.appspot.com/bug?extid=94f7894cc5600cc07094
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+94f7894cc5600cc07...@syzkaller.appspotmail.com
> 
> BUG: using smp_processor_id() in preemptible [] code: 
> systemd-rfkill/6726
> caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
> CPU: 0 PID: 6726 Comm: systemd-rfkill Not tainted 
> 5.7.0-next-20200602-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
>  ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
>  ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
>  ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
>  ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
>  ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
>  ext4_append+0x153/0x360 fs/ext4/namei.c:67
>  ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
>  ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
>  vfs_mkdir+0x419/0x690 fs/namei.c:3632
>  do_mkdirat+0x21e/0x280 fs/namei.c:3655
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7fa49b78d687
> Code: Bad RIP value.
> RSP: 002b:7ffde44382d8 EFLAGS: 0246 ORIG_RAX: 0053
> RAX: ffda RBX: 5647ba692985 RCX: 7fa49b78d687
> RDX: 7ffde44381a0 RSI: 01ed RDI: 5647ba692985
> RBP: 7fa49b78d680 R08: 0100 R09: 
> R10: 5647ba692980 R11: 0246 R12: 01ed
> R13: 7ffde4438460 R14:  R15: 

#syz dup: linux-next test error: BUG: using smp_processor_id() in preemptible 
[ADDR] code: syz-fuzzer/6792

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: systemd-rfkill/6728

2020-06-04 Thread Eric Biggers

On Thu, Jun 04, 2020 at 09:02:20PM -0700, Eric Biggers wrote:
> Introduced by:
> 
> commit 42f56b7a4a7db127a9d281da584152dc3d525d25
> Author: Ritesh Harjani 
> Date:   Wed May 20 12:10:34 2020 +0530
> 
> ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC 
> handling
> 

Hmm, syzbot reported this several times already.  Marking it as a duplicate
of the report where the discussion happened:

#syz dup: linux-next test error: BUG: using smp_processor_id() in preemptible 
[ADDR] code: syz-fuzzer/6792

- Eric

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: kworker/u4:LINE/46

2020-06-04 Thread Eric Biggers

On Tue, Jun 02, 2020 at 04:20:16AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:0e21d462 Add linux-next specific files for 20200602
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=17fee51610
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ecc1aef35f550ee3
> dashboard link: https://syzkaller.appspot.com/bug?extid=4d28f1825b8fb92fa383
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4d28f1825b8fb92fa...@syzkaller.appspotmail.com
> 
> BUG: using smp_processor_id() in preemptible [] code: kworker/u4:3/46
> caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
> CPU: 1 PID: 46 Comm: kworker/u4:3 Not tainted 5.7.0-next-20200602-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Workqueue: writeback wb_workfn (flush-8:0)
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
>  ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
>  ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
>  ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
>  mpage_map_one_extent fs/ext4/inode.c:2377 [inline]
>  mpage_map_and_submit_extent fs/ext4/inode.c:2430 [inline]
>  ext4_writepages+0x1ab5/0x3400 fs/ext4/inode.c:2782
>  do_writepages+0xfa/0x2a0 mm/page-writeback.c:2338
>  __writeback_single_inode+0x12a/0x13d0 fs/fs-writeback.c:1453
>  writeback_sb_inodes+0x515/0xdc0 fs/fs-writeback.c:1717
>  __writeback_inodes_wb+0xc3/0x250 fs/fs-writeback.c:1786
>  wb_writeback+0x8db/0xd50 fs/fs-writeback.c:1895
>  wb_check_old_data_flush fs/fs-writeback.c:1997 [inline]
>  wb_do_writeback fs/fs-writeback.c:2050 [inline]
>  wb_workfn+0xab3/0x1090 fs/fs-writeback.c:2079
>  process_one_work+0x965/0x1690 kernel/workqueue.c:2269
>  worker_thread+0x96/0xe10 kernel/workqueue.c:2415
>  kthread+0x3b5/0x4a0 kernel/kthread.c:291
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
> tipc: TX() has been purged, node left!

#syz dup: linux-next test error: BUG: using smp_processor_id() in preemptible 
[ADDR] code: syz-fuzzer/6792

Re: [PATCH] pwm: Add missing "CONFIG_" prefix

2020-06-04 Thread Joe Perches

On Thu, 2020-06-04 at 14:52 -0700, Kees Cook wrote:
> On Wed, Jun 03, 2020 at 04:04:31PM -0700, Joe Perches wrote:
> > On Wed, 2020-06-03 at 15:40 -0700, Kees Cook wrote:
> > > The IS_ENABLED() use was missing the CONFIG_ prefix which would have
> > > led to skipping this code.
> > > 
> > > Fixes: 3ad1f3a33286 ("pwm: Implement some checks for lowlevel drivers")
> > > Signed-off-by: Kees Cook 
> > > ---
> > >  drivers/pwm/core.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
> > > index 9973c442b455..6b3cbc0490c6 100644
> > > --- a/drivers/pwm/core.c
> > > +++ b/drivers/pwm/core.c
> > > @@ -121,7 +121,7 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
> > >   pwm->chip->ops->get_state(pwm->chip, pwm, &pwm->state);
> > >   trace_pwm_get(pwm, &pwm->state);
> > >  
> > > - if (IS_ENABLED(PWM_DEBUG))
> > > + if (IS_ENABLED(CONFIG_PWM_DEBUG))
> > >   pwm->last = pwm->state;
> > >   }
> > >  
> > > -- 
> > > 2.25.1
> > > 
> > 
> > more odd uses (mostly in comments)
> > 
> > $ git grep -P -oh '\bIS_ENABLED\s*\(\s*\w+\s*\)'| \
> >   sed -r 's/\s+//g'| \
> >   grep -v '(CONFIG_' | \
> >   sort | uniq -c | sort -rn
> >   7 IS_ENABLED(DEBUG)
> >   4 IS_ENABLED(DRM_I915_SELFTEST)
> >   4 IS_ENABLED(cfg)
> >   2 IS_ENABLED(opt_name)
> >   2 IS_ENABLED(DEBUG_PRINT_TRIE_GRAPHVIZ)
> >   2 IS_ENABLED(config)
> >   2 IS_ENABLED(cond)
> >   2 IS_ENABLED(__BIG_ENDIAN)
> >   1 IS_ENABLED(x)
> >   1 IS_ENABLED(STRICT_KERNEL_RWX)
> >   1 IS_ENABLED(PWM_DEBUG)
> >   1 IS_ENABLED(option)
> >   1 IS_ENABLED(ETHTOOL_NETLINK)
> >   1 IS_ENABLED(DEBUG_RANDOM_TRIE)
> >   1 IS_ENABLED(DEBUG_CHACHA20POLY1305_SLOW_CHUNK_TEST)
> > 
> > STRICT_KERNEL_RWX is misused here in ppc
> > 
> > ---
> > 
> > Fix pr_warn without newline too.
> > 
> >  arch/powerpc/mm/book3s64/hash_utils.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> > index 51e3c15f7aff..dd60c5f2b991 100644
> > --- a/arch/powerpc/mm/book3s64/hash_utils.c
> > +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> > @@ -660,11 +660,10 @@ static void __init htab_init_page_sizes(void)
> >  * Pick a size for the linear mapping. Currently, we only
> >  * support 16M, 1M and 4K which is the default
> >  */
> > -   if (IS_ENABLED(STRICT_KERNEL_RWX) &&
> > +   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) &&
> > (unsigned long)_stext % 0x100) {
> > if (mmu_psize_defs[MMU_PAGE_16M].shift)
> > -   pr_warn("Kernel not 16M aligned, "
> > -   "disabling 16M linear map alignment");
> > +   pr_warn("Kernel not 16M aligned, disabling 16M linear map alignment\n");
> > aligned = false;
> > }
> 
> Joe, I was going to send all of the fixes for these issues, but your
> patch doesn't have a SoB. Shall I add one for the above patch?

Sure, if you want, or submit it yourself.

My feeling about these types of changes is that the maintainers
of the subsystems, in this case ppc, should manage this
themselves and shouldn't require anyone else to actually
bother to send real patches.
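
For anyone wondering why the unprefixed form compiles at all, here is a
minimal userspace sketch of the IS_ENABLED() preprocessor trick. It is
modeled on include/linux/kconfig.h but simplified (the kernel combines the
two tests with an __or() helper instead of ||). The CONFIG_PWM_DEBUG define
stands in for what Kconfig would generate, and with_prefix() and
without_prefix() are hypothetical helper names used only for illustration.

```c
#include <assert.h>

/* A defined-to-1 symbol pastes into this placeholder, which expands to "0,". */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val

#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
/* Defined symbol: args are (0, 1, 0) -> 1.  Otherwise: (junk 1, 0) -> 0. */
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)

#define IS_BUILTIN(option)	__is_defined(option)
#define IS_MODULE(option)	__is_defined(option##_MODULE)
#define IS_ENABLED(option)	(IS_BUILTIN(option) || IS_MODULE(option))

/* Assumption: stands in for the autoconf.h line generated for CONFIG_PWM_DEBUG=y. */
#define CONFIG_PWM_DEBUG 1

/* Correct spelling: the symbol is defined to 1, so this evaluates to 1. */
int with_prefix(void)
{
	return IS_ENABLED(CONFIG_PWM_DEBUG);
}

/* Missing CONFIG_ prefix: PWM_DEBUG is never defined, so this is silently 0. */
int without_prefix(void)
{
	return IS_ENABLED(PWM_DEBUG);
}
```

Because an undefined symbol is a valid argument, the compiler has no way to
flag the typo, which is why a grep like the one above is needed to find
these.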

Re: BUG: using smp_processor_id() in preemptible code in debug_smp_processor_id

2020-06-04 Thread Eric Biggers

On Thu, Jun 04, 2020 at 07:42:18AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    065fcfd4 selftests: net: ip_defrag: ignore EPERM
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15c3e51610
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d89141553e61b775
> dashboard link: https://syzkaller.appspot.com/bug?extid=9e0b179ae55eaf7a307a
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=131b5cf210
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=176dfcf210
> 
> The bug was bisected to:
> 
> commit e42671084361302141a09284fde9bbc14fdd16bf
> Author: Manivannan Sadhasivam 
> Date:   Thu May 7 12:53:06 2020 +
> 
> net: qrtr: Do not depend on ARCH_QCOM
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1295eb9110
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=1195eb9110
> console output: https://syzkaller.appspot.com/x/log.txt?x=1695eb9110
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+9e0b179ae55eaf7a3...@syzkaller.appspotmail.com
> Fixes: e42671084361 ("net: qrtr: Do not depend on ARCH_QCOM")
> 
> RDX:  RSI: 2100 RDI: 0004
> RBP: 006cb018 R08: 0001 R09: 004002c8
> R10:  R11: 0246 R12: 00401e90
> R13: 00401f20 R14:  R15: 
> BUG: using smp_processor_id() in preemptible [] code: 
> syz-executor013/7182
> caller is radix_tree_node_alloc.constprop.0+0x200/0x330 lib/radix-tree.c:264
> CPU: 0 PID: 7182 Comm: syz-executor013 Not tainted 5.7.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x188/0x20d lib/dump_stack.c:118
>  check_preemption_disabled lib/smp_processor_id.c:47 [inline]
>  debug_smp_processor_id.cold+0x88/0x9b lib/smp_processor_id.c:57
>  radix_tree_node_alloc.constprop.0+0x200/0x330 lib/radix-tree.c:264
>  radix_tree_extend+0x234/0x4a0 lib/radix-tree.c:426
>  idr_get_free+0x60c/0x8e0 lib/radix-tree.c:1494
>  idr_alloc_u32+0x170/0x2d0 lib/idr.c:46
>  idr_alloc+0xc2/0x130 lib/idr.c:87
>  qrtr_port_assign net/qrtr/qrtr.c:703 [inline]
>  __qrtr_bind.isra.0+0x12e/0x5c0 net/qrtr/qrtr.c:756
>  qrtr_autobind net/qrtr/qrtr.c:787 [inline]
>  qrtr_autobind+0xaf/0xf0 net/qrtr/qrtr.c:775
>  qrtr_sendmsg+0x1d6/0x770 net/qrtr/qrtr.c:895
>  sock_sendmsg_nosec net/socket.c:652 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:672
>  sys_sendmsg+0x6e6/0x810 net/socket.c:2352
>  ___sys_sendmsg+0x100/0x170 net/socket.c:2406
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
>  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x4405a9
> Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 5b 14 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7ffe905331b8 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 004a1bd8 RCX: 004405a9
> RDX:  RSI: 

#syz dup: BUG: using smp_processor_id() in preemptible code in radix_tree_node_alloc

See discussion at https://lkml.kernel.org/lkml/a363b205a74ca...@google.com/T/#u

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: systemd-rfkill/6728

2020-06-04 Thread Eric Biggers

Introduced by:

commit 42f56b7a4a7db127a9d281da584152dc3d525d25
Author: Ritesh Harjani 
Date:   Wed May 20 12:10:34 2020 +0530

ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling

On Thu, Jun 04, 2020 at 07:02:18PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    0e21d462 Add linux-next specific files for 20200602
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1046166110
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ecc1aef35f550ee3
> dashboard link: https://syzkaller.appspot.com/bug?extid=aed048f49c59eb997737
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+aed048f49c59eb997...@syzkaller.appspotmail.com
> 
> BUG: using smp_processor_id() in preemptible [] code: 
> systemd-rfkill/6728
> caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
> CPU: 1 PID: 6728 Comm: systemd-rfkill Not tainted 
> 5.7.0-next-20200602-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
>  ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
>  ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
>  ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
>  ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
>  ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
>  ext4_append+0x153/0x360 fs/ext4/namei.c:67
>  ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
>  ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
>  vfs_mkdir+0x419/0x690 fs/namei.c:3632
>  do_mkdirat+0x21e/0x280 fs/namei.c:3655
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f9ffaa79687
> Code: Bad RIP value.
> RSP: 002b:7ffeb3197c38 EFLAGS: 0246 ORIG_RAX: 0053
> RAX: ffda RBX: 55c2e6155985 RCX: 7f9ffaa79687
> RDX: 7ffeb3197b00 RSI: 01ed RDI: 55c2e6155985
> RBP: 7f9ffaa79680 R08: 0100 R09: 
> R10: 55c2e6155980 R11: 0246 R12: 01ed
> R13: 7ffeb3197dc0 R14:  R15: 
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>

Re: [PATCH v6 0/4] kdb: Improve console handling

2020-06-04 Thread Sergey Senozhatsky

On (20/06/04 15:31), Sumit Garg wrote:
> 
>  drivers/tty/serial/kgdb_nmi.c |  2 +-
>  drivers/tty/serial/kgdboc.c   | 32 +--
>  drivers/usb/early/ehci-dbgp.c |  3 +-
>  include/linux/kgdb.h  |  5 ++-
>  kernel/debug/kdb/kdb_io.c | 72 
> ++-
>  5 files changed, 64 insertions(+), 50 deletions(-)

Reviewed-by: Sergey Senozhatsky 

-ss

Re: BUG: using smp_processor_id() in preemptible code in radix_tree_node_alloc

2020-06-04 Thread Eric Biggers

[+Cc Matthew Wilcox]

Possibly a bug in lib/radix-tree.c?  this_cpu_ptr() in radix_tree_node_alloc()
can be reached without a prior preempt_disable().  Or is the caller of
idr_alloc() doing something wrong?
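
To make the question concrete, here is a toy userspace model of what the
debug check complains about. None of this is kernel code: the model_*
functions and both *_percpu_access() helpers are hypothetical stand-ins, and
the real check has additional exemptions (for example when interrupts are
disabled). It only sketches the rule that a per-CPU access must sit inside a
preemption-disabled region.

```c
#include <assert.h>

static int model_preempt_count;	/* > 0 means preemption is disabled */
static int model_splats;	/* how many times the check complained */

void model_preempt_disable(void)
{
	model_preempt_count++;
}

void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Stand-in for the debug check: complain if preemption is still enabled. */
int model_smp_processor_id(void)
{
	if (model_preempt_count == 0)
		model_splats++;
	return 0;	/* single-CPU model */
}

/* Like the reported path: per-CPU access with preemption enabled. */
int unsafe_percpu_access(void)
{
	model_smp_processor_id();
	return model_splats;
}

/* The expected pattern: bracket the access with disable/enable. */
int safe_percpu_access(void)
{
	model_preempt_disable();
	model_smp_processor_id();
	model_preempt_enable();
	return model_splats;
}
```

Whether the bracketing belongs in the radix tree code or in the idr_alloc()
caller is exactly the open question here; the sketch does not answer it.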

On Thu, Jun 04, 2020 at 07:02:18PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    acf25aa6 Merge tag 'Smack-for-5.8' of git://github.com/csc..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=13d6307a10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5263d9b5bce03c67
> dashboard link: https://syzkaller.appspot.com/bug?extid=3eec59e770685e3dc879
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15bd4c1e10
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1520c9de10
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+3eec59e770685e3dc...@syzkaller.appspotmail.com
> 
> RAX: ffda RBX: 7ffdf01d56d0 RCX: 004406c9
> RDX:  RSI: 2240 RDI: 0003
> RBP: 0005 R08: 0001 R09: 0031
> R10:  R11: 0246 R12: 00401f50
> R13: 00401fe0 R14:  R15: 
> BUG: using smp_processor_id() in preemptible [] code: 
> syz-executor036/6796
> caller is radix_tree_node_alloc.constprop.0+0x200/0x330 lib/radix-tree.c:262
> CPU: 0 PID: 6796 Comm: syz-executor036 Not tainted 5.7.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x188/0x20d lib/dump_stack.c:118
>  check_preemption_disabled lib/smp_processor_id.c:47 [inline]
>  debug_smp_processor_id.cold+0x88/0x9b lib/smp_processor_id.c:57
>  radix_tree_node_alloc.constprop.0+0x200/0x330 lib/radix-tree.c:262
>  radix_tree_extend+0x256/0x4e0 lib/radix-tree.c:424
>  idr_get_free+0x60c/0x8e0 lib/radix-tree.c:1492
>  idr_alloc_u32+0x170/0x2d0 lib/idr.c:46
>  idr_alloc+0xc2/0x130 lib/idr.c:87
>  qrtr_port_assign net/qrtr/qrtr.c:703 [inline]
>  __qrtr_bind.isra.0+0x12e/0x5c0 net/qrtr/qrtr.c:756
>  qrtr_autobind net/qrtr/qrtr.c:787 [inline]
>  qrtr_autobind+0xaf/0xf0 net/qrtr/qrtr.c:775
>  qrtr_sendmsg+0x1d6/0x770 net/qrtr/qrtr.c:895
>  sock_sendmsg_nosec net/socket.c:652 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:672
>  sys_sendmsg+0x6e6/0x810 net/socket.c:2352
>  ___sys_sendmsg+0x100/0x170 net/socket.c:2406
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
>  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x4406c9
> Code: 25 02 00 85 c0 b8 00 00 00 00 48 0f 44 c3 5b c3 90 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 
> 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7ffdf01d56c8 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 7ffdf01d56d0 RCX: 004406c9
> RDX:  RSI: 2240 RDI: 0003
> RBP: 0005 R08: 0001 R09: 0031
> R10:  R11: 0246 R12: 00401f50
> R13: 00401fe0 R14:  R15: 
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
>

[PATCH] usb: gadget: function: printer: fix use-after-free in __lock_acquire

2020-06-04 Thread qiang.zhang

From: Zqiang 

Fix this by increasing the object reference count.

BUG: KASAN: use-after-free in __lock_acquire+0x3fd4/0x4180
kernel/locking/lockdep.c:3831
Read of size 8 at addr 8880683b0018 by task syz-executor.0/3377

CPU: 1 PID: 3377 Comm: syz-executor.0 Not tainted 5.6.11 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:641
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
 __lock_acquire+0x3fd4/0x4180 kernel/locking/lockdep.c:3831
 lock_acquire+0x127/0x350 kernel/locking/lockdep.c:4488
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
 printer_ioctl+0x4a/0x110 drivers/usb/gadget/function/f_printer.c:723
 vfs_ioctl fs/ioctl.c:47 [inline]
 ksys_ioctl+0xfb/0x130 fs/ioctl.c:763
 __do_sys_ioctl fs/ioctl.c:772 [inline]
 __se_sys_ioctl fs/ioctl.c:770 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:770
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4531a9
Code: ed 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 bb 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7fd14ad72c78 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 0073bfa8 RCX: 004531a9
RDX: fff9 RSI: 009e RDI: 0003
RBP: 0003 R08:  R09: 
R10:  R11: 0246 R12: 004bbd61
R13: 004d0a98 R14: 7fd14ad736d4 R15: 

Allocated by task 2393:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc.constprop.3+0xa7/0xd0 mm/kasan/common.c:515
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
 kmem_cache_alloc_trace+0xfa/0x2d0 mm/slub.c:2813
 kmalloc include/linux/slab.h:555 [inline]
 kzalloc include/linux/slab.h:669 [inline]
 gprinter_alloc+0xa1/0x870 drivers/usb/gadget/function/f_printer.c:1416
 usb_get_function+0x58/0xc0 drivers/usb/gadget/functions.c:61
 config_usb_cfg_link+0x1ed/0x3e0 drivers/usb/gadget/configfs.c:444
 configfs_symlink+0x527/0x11d0 fs/configfs/symlink.c:202
 vfs_symlink+0x33d/0x5b0 fs/namei.c:4201
 do_symlinkat+0x11b/0x1d0 fs/namei.c:4228
 __do_sys_symlinkat fs/namei.c:4242 [inline]
 __se_sys_symlinkat fs/namei.c:4239 [inline]
 __x64_sys_symlinkat+0x73/0xb0 fs/namei.c:4239
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3368:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:337 [inline]
 __kasan_slab_free+0x135/0x190 mm/kasan/common.c:476
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
 slab_free_hook mm/slub.c:1444 [inline]
 slab_free_freelist_hook mm/slub.c:1477 [inline]
 slab_free mm/slub.c:3034 [inline]
 kfree+0xf7/0x410 mm/slub.c:3995
 gprinter_free+0x49/0xd0 drivers/usb/gadget/function/f_printer.c:1353
 usb_put_function+0x38/0x50 drivers/usb/gadget/functions.c:87
 config_usb_cfg_unlink+0x2db/0x3b0 drivers/usb/gadget/configfs.c:485
 configfs_unlink+0x3b9/0x7f0 fs/configfs/symlink.c:250
 vfs_unlink+0x287/0x570 fs/namei.c:4073
 do_unlinkat+0x4f9/0x620 fs/namei.c:4137
 __do_sys_unlink fs/namei.c:4184 [inline]
 __se_sys_unlink fs/namei.c:4182 [inline]
 __x64_sys_unlink+0x42/0x50 fs/namei.c:4182
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at 8880683b
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
 1024-byte region [8880683b, 8880683b0400)
The buggy address belongs to the page:
page:ea0001a0ec00 refcount:1 mapcount:0 mapping:88806c00e300
index:0x8880683b1800 compound_mapcount: 0
flags: 0x1010200(slab|head)
raw: 01010200  00060001 88806c00e300
raw: 8880683b1800 801a 0001 
page dumped because: kasan: bad access detected

Reported-by: Kyungtae Kim 
Signed-off-by: Zqiang 
---
 drivers/usb/gadget/function/f_printer.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/f_printer.c b/drivers/usb/gadget/function/f_printer.c
index 9c7ed2539ff7..8ed1295d7e35 100644
--- a/drivers/usb/gadget/function/f_printer.c
+++ b/drivers/usb/gadget/function/f_printer.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -64,7 +65,7 @@ struct printer_dev {
struct usb_gadget   *gadget;
s8

Re: [PATCH RFC 03/13] vhost: batching fetches

2020-06-04 Thread Jason Wang




On 2020/6/4 4:59 PM, Michael S. Tsirkin wrote:

On Wed, Jun 03, 2020 at 03:27:39PM +0800, Jason Wang wrote:

On 2020/6/2 9:06 PM, Michael S. Tsirkin wrote:

With this patch applied, new and old code perform identically.

Lots of extra optimizations are now possible, e.g.
we can fetch multiple heads with copy_from/to_user now.
We can get rid of maintaining the log array.  Etc etc.

Signed-off-by: Michael S. Tsirkin
Signed-off-by: Eugenio Pérez
Link: https://lore.kernel.org/r/20200401183118.8334-4-epere...@redhat.com
Signed-off-by: Michael S. Tsirkin
---
   drivers/vhost/test.c  |  2 +-
   drivers/vhost/vhost.c | 47 ++-
   drivers/vhost/vhost.h |  5 -
   3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 9a3a09005e03..02806d6f84ef 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,7 +119,7 @@ static int vhost_test_open(struct inode *inode, struct file *f)
dev = &n->dev;
vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64,
   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, NULL);
f->private_data = n;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8f9a07282625..aca2a5b0d078 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -299,6 +299,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
   {
vq->num = 1;
vq->ndescs = 0;
+   vq->first_desc = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -367,6 +368,11 @@ static int vhost_worker(void *data)
return 0;
   }
+static int vhost_vq_num_batch_descs(struct vhost_virtqueue *vq)
+{
+   return vq->max_descs - UIO_MAXIOV;
+}

1 descriptor does not mean 1 iov, e.g. userspace may pass several
1-byte-long memory regions for us to translate.


Yes but I don't see the relevance. This tells us how many descriptors to
batch, not how many IOVs.



Yes, but questions are:

- this introduces another obstacle to supporting queue sizes larger than 1K
- if we support a 1K queue size, does it mean we need to cache 1K
descriptors, which seems like a lot of pressure on the cache


Thanks

Re: [PATCH V2] mm/vmstat: Add events for THP migration without split

2020-06-04 Thread Anshuman Khandual




On 06/04/2020 10:19 PM, Zi Yan wrote:
> On 4 Jun 2020, at 12:36, Matthew Wilcox wrote:
> 
>> On Thu, Jun 04, 2020 at 09:51:10AM -0400, Zi Yan wrote:
>>> On 4 Jun 2020, at 7:34, Matthew Wilcox wrote:
>>>> On Thu, Jun 04, 2020 at 09:30:45AM +0530, Anshuman Khandual wrote:
>>>>> +Quantifying Migration
>>>>> +=
>>>>> +Following events can be used to quantify page migration.
>>>>> +
>>>>> +- PGMIGRATE_SUCCESS
>>>>> +- PGMIGRATE_FAIL
>>>>> +- THP_MIGRATION_SUCCESS
>>>>> +- THP_MIGRATION_FAILURE
>>>>> +
>>>>> +THP_MIGRATION_FAILURE in particular represents an event when a THP could not be
>>>>> +migrated as a single entity following an allocation failure and ended up getting
>>>>> +split into constituent normal pages before being retried. This event, along with
>>>>> +PGMIGRATE_SUCCESS and PGMIGRATE_FAIL will help in quantifying and analyzing THP
>>>>> +migration events including both success and failure cases.
>>>>
>>>> First, I'd suggest running this paragraph through 'fmt'.  That way you
>>>> don't have to care about line lengths.
>>>>
>>>> Second, this paragraph doesn't really explain what I need to know to
>>>> understand the meaning of these numbers.  When Linux attempts to migrate
>>>> a THP, one of three things can happen:
>>>>
>>>>  - It is migrated as a single THP
>>>>  - It is migrated, but had to be split
>>>>  - Migration fails
>>>>
>>>> How do I turn these four numbers into an understanding of how often each
>>>> of those three situations happen?  And why do we need four numbers to
>>>> report three situations?
>>>>
>>>> Or is there something else that can happen?  If so, I'd like that explained
>>>> here too ;-)
>>>
>>> PGMIGRATE_SUCCESS and PGMIGRATE_FAIL record a combination of different events,
>>> so it is not easy to interpret them. Let me try to explain them.
>>
>> Thanks!  Very helpful explanation.
>>
>>> 1. migrating only base pages: PGMIGRATE_SUCCESS and PGMIGRATE_FAIL just mean
>>> these base pages are migrated and fail to migrate respectively.
>>> THP_MIGRATION_SUCCESS and THP_MIGRATION_FAILURE should be 0 in this case.
>>> Simple.
>>>
>>> 2. migrating only THPs:
>>> - PGMIGRATE_SUCCESS means THPs that are migrated and base pages
>>> (from the split of THPs) that are migrated,
>>>
>>> - PGMIGRATE_FAIL means THPs that fail to migrate and base pages that
>>> fail to migrate.
>>>
>>> - THP_MIGRATION_SUCCESS means THPs that are migrated.
>>>
>>> - THP_MIGRATION_FAILURE means THPs that are split.
>>>
>>> So PGMIGRATE_SUCCESS - THP_MIGRATION_SUCCESS means the number of migrated
>>> base pages, which are from the split of THPs.
>>
>> Are you sure about that?  If I split a THP and each of those subpages
>> migrates, won't I then see PGMIGRATE_SUCCESS increase by 512?
> 
> That is what I mean. I guess my words did not work. I should have used 
> subpages.
> 
>>
>>> When it comes to analyzing failed migrations, PGMIGRATE_FAIL - THP_MIGRATION_FAILURE
>>> means the number of pages that failed to migrate, but we cannot tell how many
>>> are base pages and how many are THPs.
>>>
>>> 3. migrating base pages and THP:
>>>
>>> The math should be very similar to the second case, except that
>>> a) from PGMIGRATE_SUCCESS - THP_MIGRATION_SUCCESS, we cannot tell how many
>>> pages began as base pages and how many began as THPs but became base pages
>>> after a split;
>>> b) in PGMIGRATE_FAIL - THP_MIGRATION_FAILURE, an additional case, base pages
>>> that began as base pages and failed to migrate, is mixed into the number, so
>>> we cannot tell the three cases apart.
>>
>> So why don't we just expose PGMIGRATE_SPLIT?  That would be defined as
>> the number of times we succeeded in migrating a THP but had to split it
>> to succeed.
> 
> It might need extra code to get that number. Currently, the subpages from split
> THPs are appended to the end of the original page list, so we might need a separate
> page list for these subpages to count PGMIGRATE_SPLIT. Also what if some of the
> subpages fail to migrate? Do we increase PGMIGRATE_SPLIT or not?

Thanks Zi, for such a detailed explanation. Ideally, we should separate THP
migration from base page migration in terms of statistics. PGMIGRATE_SUCCESS
and PGMIGRATE_FAIL should continue to track statistics when migration starts
with base pages. But for THP, we should track the following events.

1. THP_MIGRATION_SUCCESS        - THP migration is successful, without split
2. THP_MIGRATION_FAILURE        - THP could neither be migrated, nor be split
3. THP_MIGRATION_SPLIT_SUCCESS  - THP got split and all sub pages migrated
4. THP_MIGRATION_SPLIT_FAILURE  - THP got split but not all sub pages could be migrated
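
As a rough illustration of how the four proposed events partition the
outcomes, here is a small userspace sketch. The event names come from the
list above; classify_thp_migration() and its parameters are hypothetical and
do not correspond to anything in mm/migrate.c.

```c
#include <assert.h>
#include <stdbool.h>

/* The four events proposed above, one per THP migration attempt. */
enum thp_migration_event {
	THP_MIGRATION_SUCCESS,		/* migrated as a whole, no split */
	THP_MIGRATION_FAILURE,		/* neither migrated nor split */
	THP_MIGRATION_SPLIT_SUCCESS,	/* split, all subpages migrated */
	THP_MIGRATION_SPLIT_FAILURE,	/* split, some subpages not migrated */
};

/*
 * Hypothetical helper: map the outcome of one THP migration attempt to
 * exactly one event.  nr_subpages and nr_migrated only matter after a split.
 */
enum thp_migration_event classify_thp_migration(bool migrated_whole,
						bool was_split,
						unsigned int nr_subpages,
						unsigned int nr_migrated)
{
	if (migrated_whole)
		return THP_MIGRATION_SUCCESS;
	if (!was_split)
		return THP_MIGRATION_FAILURE;
	if (nr_migrated == nr_subpages)
		return THP_MIGRATION_SPLIT_SUCCESS;
	return THP_MIGRATION_SPLIT_FAILURE;
}
```

For example, a THP that is split into 512 subpages of which 510 migrate would
be counted as THP_MIGRATION_SPLIT_FAILURE under this scheme.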

THP_MIGRATION_SPLIT_FAILURE could either increment once for a single THP or
by the number of subpages that did not get migrated after the split. As you mentioned,
this will need some extra code in the core migration. Nonetheless, if these
new events look

[PATCH 1/1] riscv: Select ARCH_SUPPORTS_ATOMIC_RMW by default

2020-06-04 Thread Chenxi Mao

Select ARCH_SUPPORTS_ATOMIC_RMW by default to enable osqlocks.

PS2: Add signed off info.

Signed-off-by: Chenxi Mao 
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a31e1a41913a..cbdc605d20d9 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -68,6 +68,7 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select HAVE_COPY_THREAD_TLS
select HAVE_ARCH_KASAN if MMU && 64BIT
+   select ARCH_SUPPORTS_ATOMIC_RMW
 
 config ARCH_MMAP_RND_BITS_MIN
default 18 if 64BIT
-- 
2.25.1

[PATCH] can: xilinx_can: handle failure cases of pm_runtime_get_sync

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/net/can/xilinx_can.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
index c1dbab8c896d..748ff70f6a7b 100644
--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -1391,7 +1391,7 @@ static int xcan_open(struct net_device *ndev)
if (ret < 0) {
netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
   __func__, ret);
-   return ret;
+   goto err;
}
 
ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags,
@@ -1475,6 +1475,7 @@ static int xcan_get_berr_counter(const struct net_device *ndev,
if (ret < 0) {
netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
   __func__, ret);
+   pm_runtime_put(priv->dev);
return ret;
}
 
@@ -1789,7 +1790,7 @@ static int xcan_probe(struct platform_device *pdev)
if (ret < 0) {
netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
   __func__, ret);
-   goto err_pmdisable;
+   goto err_disableclks;
}
 
if (priv->read_reg(priv, XCAN_SR_OFFSET) != XCAN_SR_CONFIG_MASK) {
@@ -1824,7 +1825,6 @@ static int xcan_probe(struct platform_device *pdev)
 
 err_disableclks:
pm_runtime_put(priv->dev);
-err_pmdisable:
pm_runtime_disable(&pdev->dev);
 err_free:
free_candev(ndev);
-- 
2.17.1
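
The imbalance described in the commit message can be sketched with a small
userspace model. Everything here is a stand-in: struct model_dev and the
model_pm_runtime_*() functions are hypothetical and only mimic the one
property the patch relies on, namely that the "get" bumps the usage counter
even when it returns an error.

```c
#include <assert.h>
#include <errno.h>

/* Toy device: a usage counter plus a switch that makes resume fail. */
struct model_dev {
	int usage_count;
	int resume_fails;
};

/* Like the call being modeled: the counter goes up even on failure. */
int model_pm_runtime_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

void model_pm_runtime_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Pre-patch pattern: bail out on error without dropping the reference. */
int leaky_op(struct model_dev *dev)
{
	int ret = model_pm_runtime_get_sync(dev);

	if (ret < 0)
		return ret;	/* usage_count stays elevated */
	model_pm_runtime_put(dev);
	return 0;
}

/* Patched pattern: drop the reference on the error path too. */
int balanced_op(struct model_dev *dev)
{
	int ret = model_pm_runtime_get_sync(dev);

	if (ret < 0) {
		model_pm_runtime_put(dev);
		return ret;
	}
	model_pm_runtime_put(dev);
	return 0;
}
```

With resume_fails set, leaky_op() leaves usage_count at 1 while
balanced_op() returns it to 0, which is the difference the patch is after.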

Re: [GIT PULL] Devicetree updates for v5.8

2020-06-04 Thread pr-tracker-bot

The pull request you sent on Thu, 4 Jun 2020 16:04:59 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git 
> tags/devicetree-for-5.8

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/571d54ed91c0fae174d933683c0c2e11c84843d9

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

Re: [GIT PULL] RISC-V Patches for the 5.8 Merge Window, Part 1

2020-06-04 Thread pr-tracker-bot

The pull request you sent on Thu, 04 Jun 2020 11:57:25 -0700 (PDT):

> git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git 
> tags/riscv-for-linus-5.8-mw0

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/435faf5c218a47fd6258187f62d9bb1009717896

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

[PATCH] wlcore: mesh: handle failure case of pm_runtime_get_sync

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/net/wireless/ti/wlcore/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c
index f140f7d7f553..c7e4f5a80b9e 100644
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -3662,8 +3662,10 @@ void wlcore_regdomain_config(struct wl1271 *wl)
goto out;
 
ret = pm_runtime_get_sync(wl->dev);
-   if (ret < 0)
+   if (ret < 0) {
+   pm_runtime_put_autosuspend(wl->dev);
goto out;
+   }
 
ret = wlcore_cmd_regdomain_config_locked(wl);
if (ret < 0) {
-- 
2.17.1

[PATCH] PCI: rcar: handle the failure case of pm_runtime_get_sync

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/pci/controller/pcie-rcar.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/pci/controller/pcie-rcar.c b/drivers/pci/controller/pcie-rcar.c
index 759c6542c5c8..6b4181c0710e 100644
--- a/drivers/pci/controller/pcie-rcar.c
+++ b/drivers/pci/controller/pcie-rcar.c
@@ -1137,7 +1137,7 @@ static int rcar_pcie_probe(struct platform_device *pdev)
err = pm_runtime_get_sync(pcie->dev);
if (err < 0) {
dev_err(pcie->dev, "pm_runtime_get_sync failed\n");
-   goto err_pm_disable;
+   goto err_pm_put;
}
 
err = rcar_pcie_get_resources(pcie);
@@ -1208,8 +1208,6 @@ static int rcar_pcie_probe(struct platform_device *pdev)
 
 err_pm_put:
pm_runtime_put(dev);
-
-err_pm_disable:
pm_runtime_disable(dev);
pci_free_resource_list(&pcie->resources);
 
-- 
2.17.1

[PATCH] Staging: comedi: Added blank lines to fix coding style issue

2020-06-04 Thread Divyansh Kamboj

Fixed a coding style issue by adding a blank line after declarations

Signed-off-by: Divyansh Kamboj 
---
 drivers/staging/comedi/comedi_fops.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index e85a99b68f31..3f70e5dfac39 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -2169,6 +2169,7 @@ static long comedi_unlocked_ioctl(struct file *file, unsigned int cmd,
break;
case COMEDI_CHANINFO: {
struct comedi_chaninfo it;
+
if (copy_from_user(&it, (void __user *)arg, sizeof(it)))
rc = -EFAULT;
else
@@ -2177,6 +2178,7 @@ static long comedi_unlocked_ioctl(struct file *file, unsigned int cmd,
}
case COMEDI_RANGEINFO: {
struct comedi_rangeinfo it;
+
if (copy_from_user(&it, (void __user *)arg, sizeof(it)))
rc = -EFAULT;
else
@@ -2249,6 +2251,7 @@ static long comedi_unlocked_ioctl(struct file *file, unsigned int cmd,
}
case COMEDI_INSN: {
struct comedi_insn insn;
+
if (copy_from_user(&insn, (void __user *)arg, sizeof(insn)))
rc = -EFAULT;
else
-- 
2.26.2

Re: [PATCH v2 5/6] block: nr_sects_write(): Disable preemption on seqcount write

2020-06-04 Thread Jens Axboe

On 6/3/20 8:49 AM, Ahmed S. Darwish wrote:
> For optimized block readers not holding a mutex, the "number of sectors"
> 64-bit value is protected from tearing on 32-bit architectures by a
> sequence counter.
> 
> Disable preemption before entering that sequence counter's write side
> critical section. Otherwise, the read side can preempt the write side
> section and spin for the entire scheduler tick. If the reader belongs to
> a real-time scheduling class, it can spin forever and the kernel will
> livelock.

Applied, thanks.
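
For readers following along, the livelock described in the quoted text can
be sketched with a single-threaded userspace model of a sequence counter.
This is not the kernel's seqcount API: the model_* names are hypothetical,
there are no memory barriers, and "preemption" is simulated by simply not
calling model_write_end(). The reader is bounded by max_spins only so that
the model terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Toy partition: a sequence counter guarding a 64-bit sector count. */
struct model_part {
	unsigned int sequence;	/* odd while a write is in progress */
	uint64_t nr_sects;
};

void model_write_begin(struct model_part *p)
{
	p->sequence++;		/* now odd: readers must retry */
}

void model_write_end(struct model_part *p)
{
	p->sequence++;		/* even again: value is stable */
}

/*
 * Reader: retry while a write is in progress or the counter changed
 * under us.  Returns 0 and stores the value on success, -1 if the
 * writer never finished within max_spins attempts.
 */
int model_read_nr_sects(const struct model_part *p, uint64_t *out,
			int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		unsigned int start = p->sequence;
		uint64_t val;

		if (start & 1)
			continue;	/* writer active: spin */
		val = p->nr_sects;
		if (p->sequence == start) {
			*out = val;
			return 0;
		}
	}
	return -1;
}
```

If the writer is preempted between model_write_begin() and
model_write_end() by a reader that cannot itself be preempted, the reader
spins on the odd sequence value without ever letting the writer finish,
which is the situation the patch avoids by disabling preemption around the
write side.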

-- 
Jens Axboe

Re: stress-ng --hrtimers hangs system

2020-06-04 Thread Paul E. McKenney

On Fri, Jun 05, 2020 at 04:47:51AM +0300, Vladimir Oltean wrote:
> Hi,
> 
> I was testing stress-ng on an ARM64 box and I found that it can be killed 
> instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I get 
> this rcu_preempt splat after 21 seconds,
> letting me know that the grace-periods kernel thread could not run:
> 
> # stress-ng --hrtimers 1
> stress-ng: info:  [320] defaulting to a 86400 second (1 day, 0.00 secs) run 
> per stressor
> stress-ng: info:  [320] dispatching hogs: 1 hrtimers
> stress-ng: info:  [320] cache allocate: using defaults, can't determine cache 
> details from sysfs
> [   85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   85.833656]  (detected by 1, t=6502 jiffies, g=1789, q=12)
> [   85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 
> (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [   85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 
> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [   85.861859] rcu: Unless rcu_preempt kthread gets sufficient CPU time, 
> OOM is now expected behavior.
> [   85.871025] rcu: RCU grace-period kthread stack dump:
> [   85.876094] rcu_preempt R  running task010  2 
> 0x0028
> [   85.883173] Call trace:
> [   85.885636]  __switch_to+0xf8/0x148
> [   85.889137]  __schedule+0x2d8/0x808
> [   85.892636]  schedule+0x48/0x100
> [   85.895875]  schedule_timeout+0x1c8/0x420
> [   85.899900]  rcu_gp_kthread+0x738/0x1b78
> [   85.903836]  kthread+0x158/0x168
> [   85.907075]  ret_from_fork+0x10/0x18
> [   93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 
> stuck for 33s!
> [   93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck 
> for 33s!
> [   93.299105] Showing busy workqueues and worker pools:
> [   93.304189] workqueue events: flags=0x0
> [   93.308116]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [   93.315031] pending: vmstat_shepherd
> [   93.318990] workqueue events_unbound: flags=0x2
> [   93.323577]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
> [   93.330309] pending: flush_to_ldisc
> [   93.334161]
> [   93.334164] ==
> [   93.334165] WARNING: possible circular locking dependency detected
> [   93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
> [   93.334168] --
> [   93.334169] stress-ng-hrtim/326 is trying to acquire lock:
> [   93.334171] b236f6aee1a0 (console_owner){-.-.}-{0:0}, at: 
> console_unlock+0x21c/0x678
> [   93.334176]
> [   93.334177] but task is already holding lock:
> [   93.334178] 00207ac0d018 (>lock/1){-.-.}-{2:2}, at: 
> show_workqueue_state+0x288/0x3a8
> [   93.334184]
> [   93.334186] which lock already depends on the new lock.
> [   93.334187]
> [   93.334188]
> [   93.334189] the existing dependency chain (in reverse order) is:
> [   93.334190]
> [   93.334191] -> #2 (>lock/1){-.-.}-{2:2}:
> [   93.334197]_raw_spin_lock+0x5c/0x78
> [   93.334198]__queue_work+0x124/0x7c8
> [   93.334199]queue_work_on+0xd0/0xf0
> [   93.334200]tty_flip_buffer_push+0x3c/0x48
> [   93.334202]serial8250_rx_chars+0x74/0x88
> [   93.334203]fsl8250_handle_irq+0x15c/0x1a0
> [   93.334204]serial8250_interrupt+0x70/0xb8
> [   93.334206]__handle_irq_event_percpu+0xe0/0x478
> [   93.334207]handle_irq_event_percpu+0x40/0x98
> [   93.334208]handle_irq_event+0x4c/0xd0
> [   93.334209]handle_fasteoi_irq+0xb4/0x158
> [   93.334211]generic_handle_irq+0x3c/0x58
> [   93.334212]__handle_domain_irq+0x68/0xc0
> [   93.334213]gic_handle_irq+0x6c/0x160
> [   93.334214]el1_irq+0xbc/0x180
> [   93.334216]cpuidle_enter_state+0xb4/0x4f8
> [   93.334217]cpuidle_enter+0x3c/0x50
> [   93.334218]call_cpuidle+0x44/0x78
> [   93.334219]do_idle+0x228/0x2c8
> [   93.334220]cpu_startup_entry+0x2c/0x48
> [   93.334222]rest_init+0x1ac/0x280
> [   93.334223]arch_call_rest_init+0x14/0x1c
> [   93.334224]start_kernel+0x4ec/0x524
> [   93.334225]
> [   93.334226] -> #1 (>lock#2){-.-.}-{2:2}:
> [   93.334232]_raw_spin_lock_irqsave+0x78/0xa0
> [   93.334233]serial8250_console_write+0x1f4/0x348
> [   93.334234]univ8250_console_write+0x44/0x58
> [   93.334235]console_unlock+0x480/0x678
> [   93.334237]vprintk_emit+0x188/0x370
> [   93.334238]vprintk_default+0x48/0x58
> [   93.334239]vprintk_func+0xf0/0x238
> [   93.334240]printk+0x74/0x94
> [   93.334241]register_console+0x1a0/0x300
> [   93.334243]uart_add_one_port+0x4a0/0x4e0
> [   93.334244]

[PATCH] PCI: dwc: pci-dra7xx: handle failure case of pm_runtime_get_sync

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/pci/controller/dwc/pci-dra7xx.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index 3b0e58f2de58..83986f5f2be7 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -932,7 +932,7 @@ static int __init dra7xx_pcie_probe(struct platform_device 
*pdev)
ret = pm_runtime_get_sync(dev);
if (ret < 0) {
dev_err(dev, "pm_runtime_get_sync failed\n");
-   goto err_get_sync;
+   goto err_gpio;
}
 
reset = devm_gpiod_get_optional(dev, NULL, GPIOD_OUT_HIGH);
@@ -1001,8 +1001,6 @@ static int __init dra7xx_pcie_probe(struct 
platform_device *pdev)
 
 err_gpio:
pm_runtime_put(dev);
-
-err_get_sync:
pm_runtime_disable(dev);
dra7xx_pcie_disable_phy(dra7xx);
 
-- 
2.17.1
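
For readers following this series, the imbalance described in the commit message can be modelled in plain userspace C. This is only a toy sketch, not kernel code: `toy_dev`, `toy_get_sync`, `toy_put` and the two probe functions are hypothetical stand-ins. The only behaviour they mimic is the one stated above, that the usage counter is incremented even when the resume attempt fails.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a device with a runtime-PM usage counter (hypothetical). */
struct toy_dev {
	int usage_count;
	int resume_fails;	/* non-zero: simulate a failing resume */
};

/* Mimics the get_sync contract: the count is bumped even if resume fails. */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

/* Drops one usage reference. */
static void toy_put(struct toy_dev *dev)
{
	dev->usage_count--;
}

/* Error path without a put: the reference taken above is leaked. */
int probe_leaky(struct toy_dev *dev)
{
	int ret = toy_get_sync(dev);

	if (ret < 0)
		return ret;
	toy_put(dev);
	return 0;
}

/* Error path with a put: the counter is balanced on every exit. */
int probe_balanced(struct toy_dev *dev)
{
	int ret = toy_get_sync(dev);

	if (ret < 0) {
		toy_put(dev);
		return ret;
	}
	toy_put(dev);
	return 0;
}
```

With a device whose resume fails, `probe_leaky()` leaves `usage_count` at 1 while `probe_balanced()` returns it to 0.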

[PATCH] block: Fix use-after-free in blkdev_get()

2020-06-04 Thread Jason Yan

In blkdev_get() we call __blkdev_get() to do some internal jobs and if
there are any errors in __blkdev_get(), bdput() is called, which
means we have released the refcount of the bdev (actually the refcount of
the bdev inode). This means we cannot access bdev after that point. But
bdev is actually still accessed in blkdev_get() after calling
__blkdev_get(). This may lead to a use-after-free if the refcount is the
last one we released in __blkdev_get(). Let's take a look at the
following scenario:

  CPU0CPU1CPU2
blkdev_open blkdev_open   Remove disk
  bd_acquire
  blkdev_get
__blkdev_get  del_gendisk
bdev_unhash_inode
  bd_acquire  bdev_get_gendisk
bd_forget   failed because of unhashed
  bdput
  bdput (the last one)
bdev_evict_inode

access bdev => use after free
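
One common way to close this kind of window is for the outer function to pin the object with its own reference before calling a helper that may drop one on failure. The following is a userspace model of that idea only. `toy_bdev`, `toy_grab`, `toy_put`, `toy_inner_get` and `toy_outer_get` are hypothetical names, and this is not claimed to be what the patch itself does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcounted object standing in for a block device (hypothetical). */
struct toy_bdev {
	int refcount;
	bool freed;	/* set instead of calling free() so the model can be inspected */
	int holders;
};

static void toy_grab(struct toy_bdev *b)
{
	b->refcount++;
}

/* Drops a reference; the object counts as gone when the count hits zero. */
static void toy_put(struct toy_bdev *b)
{
	if (--b->refcount == 0)
		b->freed = true;
}

/* Inner helper: drops the caller's reference when it fails. */
static int toy_inner_get(struct toy_bdev *b, bool fail)
{
	if (fail) {
		toy_put(b);
		return -1;
	}
	return 0;
}

/*
 * Outer function: takes an extra reference so that 'b' stays valid for the
 * work done after the inner call, even if that call dropped a reference.
 * Returns true if 'b' was still alive when it was touched after the call.
 */
bool toy_outer_get(struct toy_bdev *b, bool fail)
{
	bool alive_when_touched;

	toy_grab(b);				/* pin across the call */
	(void)toy_inner_get(b, fail);
	alive_when_touched = !b->freed;		/* safe: the pin is still held */
	b->holders = 0;				/* the "access bdev" step */
	toy_put(b);				/* drop the pin last */
	return alive_when_touched;
}
```

In the failing case the object is released only by the final `toy_put()`, after the last access, so the access itself never sees a freed object.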

[  459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0
[  459.351190] Read of size 8 at addr 88806c815a80 by task 
syz-executor.0/20132
[  459.352347]
[  459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2
[  459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1ubuntu1 04/01/2014
[  459.354947] Call Trace:
[  459.355337]  dump_stack+0x111/0x19e
[  459.355879]  ? __lock_acquire+0x24c1/0x31b0
[  459.356523]  print_address_description+0x60/0x223
[  459.357248]  ? __lock_acquire+0x24c1/0x31b0
[  459.357887]  kasan_report.cold+0xae/0x2d8
[  459.358503]  __lock_acquire+0x24c1/0x31b0
[  459.359120]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.359784]  ? lockdep_hardirqs_on+0x37b/0x580
[  459.360465]  ? _raw_spin_unlock_irq+0x24/0x40
[  459.361123]  ? finish_task_switch+0x125/0x600
[  459.361812]  ? finish_task_switch+0xee/0x600
[  459.362471]  ? mark_held_locks+0xf0/0xf0
[  459.363108]  ? __schedule+0x96f/0x21d0
[  459.363716]  lock_acquire+0x111/0x320
[  459.364285]  ? blkdev_get+0xce/0xbe0
[  459.364846]  ? blkdev_get+0xce/0xbe0
[  459.365390]  __mutex_lock+0xf9/0x12a0
[  459.365948]  ? blkdev_get+0xce/0xbe0
[  459.366493]  ? bdev_evict_inode+0x1f0/0x1f0
[  459.367130]  ? blkdev_get+0xce/0xbe0
[  459.367678]  ? destroy_inode+0xbc/0x110
[  459.368261]  ? mutex_trylock+0x1a0/0x1a0
[  459.368867]  ? __blkdev_get+0x3e6/0x1280
[  459.369463]  ? bdev_disk_changed+0x1d0/0x1d0
[  459.370114]  ? blkdev_get+0xce/0xbe0
[  459.370656]  blkdev_get+0xce/0xbe0
[  459.371178]  ? find_held_lock+0x2c/0x110
[  459.371774]  ? __blkdev_get+0x1280/0x1280
[  459.372383]  ? lock_downgrade+0x680/0x680
[  459.373002]  ? lock_acquire+0x111/0x320
[  459.373587]  ? bd_acquire+0x21/0x2c0
[  459.374134]  ? do_raw_spin_unlock+0x4f/0x250
[  459.374780]  blkdev_open+0x202/0x290
[  459.375325]  do_dentry_open+0x49e/0x1050
[  459.375924]  ? blkdev_get_by_dev+0x70/0x70
[  459.376543]  ? __x64_sys_fchdir+0x1f0/0x1f0
[  459.377192]  ? inode_permission+0xbe/0x3a0
[  459.377818]  path_openat+0x148c/0x3f50
[  459.378392]  ? kmem_cache_alloc+0xd5/0x280
[  459.379016]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.379802]  ? path_lookupat.isra.0+0x900/0x900
[  459.380489]  ? __lock_is_held+0xad/0x140
[  459.381093]  do_filp_open+0x1a1/0x280
[  459.381654]  ? may_open_dev+0xf0/0xf0
[  459.382214]  ? find_held_lock+0x2c/0x110
[  459.382816]  ? lock_downgrade+0x680/0x680
[  459.383425]  ? __lock_is_held+0xad/0x140
[  459.384024]  ? do_raw_spin_unlock+0x4f/0x250
[  459.384668]  ? _raw_spin_unlock+0x1f/0x30
[  459.385280]  ? __alloc_fd+0x448/0x560
[  459.385841]  do_sys_open+0x3c3/0x500
[  459.386386]  ? filp_open+0x70/0x70
[  459.386911]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  459.387610]  ? trace_hardirqs_off_caller+0x55/0x1c0
[  459.388342]  ? do_syscall_64+0x1a/0x520
[  459.388930]  do_syscall_64+0xc3/0x520
[  459.389490]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  459.390248] RIP: 0033:0x416211
[  459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83
04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f
   05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d
  01
[  459.393483] RSP: 002b:7fe45dfe9a60 EFLAGS: 0293 ORIG_RAX: 
0002
[  459.394610] RAX: ffda RBX: 7fe45dfea6d4 RCX: 00416211
[  459.395678] RDX: 7fe45dfe9b0a RSI: 0002 RDI: 7fe45dfe9b00
[  459.396758] RBP: 0076bf20 R08:  R09: 000a
[  459.397930] R10: 0075 R11: 0293 R12: 
[  459.399022] R13: 0bd9 R14: 004cdb80 R15: 0076bf2c
[  459.400168]
[  459.400430] Allocated by task 20132:
[  459.401038]  kasan_kmalloc+0xbf/0xe0
[  459.401652]  kmem_cache_alloc+0xd5/0x280
[  459.402330]  bdev_alloc_inode+0x18/0x40
[  459.402970]  alloc_inode+0x5f/0x180
[  459.403510]  iget5_locked+0x57/0xd0
[  459.404095]  bdget+0x94/0x4e0
[

[PATCH] PCI: qcom: handle pm_runtime_get_sync failure case

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/pci/controller/dwc/pcie-qcom.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c 
b/drivers/pci/controller/dwc/pcie-qcom.c
index 138e1a2d21cc..48c434e6e915 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1339,10 +1339,8 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 
pm_runtime_enable(dev);
ret = pm_runtime_get_sync(dev);
-   if (ret < 0) {
-   pm_runtime_disable(dev);
-   return ret;
-   }
+   if (ret < 0)
+   goto err_pm_runtime_put;
 
pci->dev = dev;
pci->ops = _pcie_ops;
-- 
2.17.1

Re: [PATCH] sata_rcar: handle pm_runtime_get_sync failure cases

2020-06-04 Thread Jens Axboe

On 6/4/20 9:06 PM, Navid Emamdoost wrote:
> Calling pm_runtime_get_sync increments the counter even in case of
> failure, causing incorrect ref count. Call pm_runtime_put if
> pm_runtime_get_sync fails.

Applied, thanks.

-- 
Jens Axboe

Re: [PATCH] loop: Fix wrong masking of status flags

2020-06-04 Thread Jens Axboe

On 6/4/20 2:25 PM, Martijn Coenen wrote:
> In faf1d25440d6, loop_set_status() now assigns lo_status directly from
> the passed in lo_flags, but then fixes it up by masking out flags that
> can't be set by LOOP_SET_STATUS; unfortunately the mask was negated.
> 
> Re-ran all ltp ioctl_loop tests, and they all passed.
> 
> Pass run of the previously failing one:
> 
> tst_test.c:1247: INFO: Timeout per run is 0h 05m 00s
> tst_device.c:88: INFO: Found free device 0 '/dev/loop0'
> ioctl_loop01.c:49: PASS: /sys/block/loop0/loop/partscan = 0
> ioctl_loop01.c:50: PASS: /sys/block/loop0/loop/autoclear = 0
> ioctl_loop01.c:51: PASS: /sys/block/loop0/loop/backing_file =
> '/tmp/ZRJ6H4/test.img'
> ioctl_loop01.c:65: PASS: get expected lo_flag 12
> ioctl_loop01.c:67: PASS: /sys/block/loop0/loop/partscan = 1
> ioctl_loop01.c:68: PASS: /sys/block/loop0/loop/autoclear = 1
> ioctl_loop01.c:77: PASS: access /dev/loop0p1 succeeds
> ioctl_loop01.c:83: PASS: access /sys/block/loop0/loop0p1 succeeds
> 
> Summary:
> passed   8
> failed   0
> skipped  0
> warnings 0

Applied, thanks.

-- 
Jens Axboe
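
To illustrate the "mask was negated" remark in the quoted commit message, here is a small standalone sketch. The flag names and values are hypothetical and chosen only so that the two settable bits add up to 12, the value shown in the test log above. It is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_FLAG_READ_ONLY	0x1u
#define TOY_FLAG_AUTOCLEAR	0x4u
#define TOY_FLAG_PARTSCAN	0x8u

/* Flags a caller is allowed to set through the (toy) status call. */
#define TOY_SETTABLE_FLAGS	(TOY_FLAG_AUTOCLEAR | TOY_FLAG_PARTSCAN)

/* Intended behaviour: keep only the bits that are in the settable mask. */
uint32_t toy_mask_ok(uint32_t requested)
{
	return requested & TOY_SETTABLE_FLAGS;
}

/* Negated mask: keeps exactly the bits that should have been dropped. */
uint32_t toy_mask_negated(uint32_t requested)
{
	return requested & ~TOY_SETTABLE_FLAGS;
}
```

For a request of 0xD (read-only, autoclear and partscan), the intended mask yields 0xC, while the negated one yields 0x1.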

[PATCH] PCI: tegra: handle failure case of pm_runtime_get_sync

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/pci/controller/pci-tegra.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/pci-tegra.c 
b/drivers/pci/controller/pci-tegra.c
index 3e64ba6a36a8..3d4b448fd8df 100644
--- a/drivers/pci/controller/pci-tegra.c
+++ b/drivers/pci/controller/pci-tegra.c
@@ -2712,6 +2712,7 @@ static int tegra_pcie_probe(struct platform_device *pdev)
err = pm_runtime_get_sync(pcie->dev);
if (err < 0) {
dev_err(dev, "fail to enable pcie controller: %d\n", err);
+   pm_runtime_put_sync(pcie->dev);
goto teardown_msi;
}
 
-- 
2.17.1

Re: [PATCH] iomap: Handle I/O errors gracefully in page_mkwrite

2020-06-04 Thread Dave Chinner

On Thu, Jun 04, 2020 at 07:24:51PM -0700, Matthew Wilcox wrote:
> On Fri, Jun 05, 2020 at 10:31:59AM +1000, Dave Chinner wrote:
> > On Thu, Jun 04, 2020 at 04:50:50PM -0700, Matthew Wilcox wrote:
> > > > Sure, but that's not really what I was asking: why isn't this
> > > > !uptodate state caught before the page fault code calls
> > > > ->page_mkwrite? The page fault code has a reference to the page,
> > > > after all, and in a couple of paths it even has the page locked.
> > > 
> > > If there's already a PTE present, then the page fault code doesn't
> > > check the uptodate bit.  Here's the path I'm looking at:
> > > 
> > > do_wp_page()
> > >  -> vm_normal_page()
> > >  -> wp_page_shared()
> > >  -> do_page_mkwrite()
> > > 
> > > I don't see anything in there that checked Uptodate.
> > 
> > Yup, exactly the code I was looking at when I asked this question.
> > The kernel has invalidated the contents of a page, yet we still have
> > it mapped into userspace as containing valid contents, and we don't
> > check it at all when userspace generates a protection fault on the
> > page?
> 
> Right.  The iomap error path only clears PageUptodate.  It doesn't go
> to the effort of unmapping the page from userspace, so userspace has a
> read-only view of a !Uptodate page.

Hmmm - did you miss the ->discard_page() callout just before we call
ClearPageUptodate() on error in iomap_writepage_map()? That results
in XFS calling iomap_invalidatepage() on the page, which 

/me sighs as he realises that ->invalidatepage doesn't actually
invalidate page mappings but only clears the page dirty state and
releases filesystem references to the page.

Yay. We leave -invalidated page cache pages- mapped into userspace,
and page faults on those pages don't catch access to invalidated
pages.

Geez, we really suck at this whole software thing, don't we?

It's not clear to me that we can actually unmap those pages safely
in a race free manner from this code - can we actually do that from
the page writeback path?

> > > I think the iomap code is the only filesystem which clears PageUptodate
> > > on errors. 
> > 
> > I don't think you looked very hard. A quick scan shows at least
> > btrfs, f2fs, hostfs, jffs2, reiserfs, vboxfs and anything using the
> > iomap path will call ClearPageUptodate() on a write IO error.
> 
> I'll give you btrfs and jffs2, but I don't think it's true for f2fs.
> The only other filesystem using the iomap buffered IO paths today
> is zonefs, afaik.

gfs2 as well.

-Dave.
-- 
Dave Chinner
da...@fromorbit.com

[PATCH] sata_rcar: handle pm_runtime_get_sync failure cases

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/ata/sata_rcar.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/sata_rcar.c b/drivers/ata/sata_rcar.c
index 980aacdbcf3b..141ac600b64c 100644
--- a/drivers/ata/sata_rcar.c
+++ b/drivers/ata/sata_rcar.c
@@ -907,7 +907,7 @@ static int sata_rcar_probe(struct platform_device *pdev)
pm_runtime_enable(dev);
ret = pm_runtime_get_sync(dev);
if (ret < 0)
-   goto err_pm_disable;
+   goto err_pm_put;
 
host = ata_host_alloc(dev, 1);
if (!host) {
@@ -937,7 +937,6 @@ static int sata_rcar_probe(struct platform_device *pdev)
 
 err_pm_put:
pm_runtime_put(dev);
-err_pm_disable:
pm_runtime_disable(dev);
return ret;
 }
@@ -991,8 +990,10 @@ static int sata_rcar_resume(struct device *dev)
int ret;
 
ret = pm_runtime_get_sync(dev);
-   if (ret < 0)
+   if (ret < 0) {
+   pm_runtime_put(dev);
return ret;
+   }
 
if (priv->type == RCAR_GEN3_SATA) {
sata_rcar_init_module(priv);
@@ -1017,8 +1018,10 @@ static int sata_rcar_restore(struct device *dev)
int ret;
 
ret = pm_runtime_get_sync(dev);
-   if (ret < 0)
+   if (ret < 0) {
+   pm_runtime_put(dev);
return ret;
+   }
 
sata_rcar_setup_port(host);
 
-- 
2.17.1

Re: [PATCH] checkpatch: Avoid missing typo suggestions

2020-06-04 Thread Kees Cook

On Thu, Jun 04, 2020 at 06:02:05PM -0700, Joe Perches wrote:
> Huh?  Did you test this?

I didn't, no. I was going off my earlier discoveries about how the
"msdos" thing got parsed weird. My apologies!

> $ ./scripts/checkpatch.pl --strict -f test_spell.c 
> CHECK: 'Cambridg' may be misspelled - perhaps 'Cambridge'?
> #4: FILE: test_spell.c:4:
> + * Cambridg

Thanks for sorting this (and me) out! :)

Reviewed-by: Kees Cook 

-- 
Kees Cook

[PATCH 1/1] docs: dev-tools: coccinelle: underlines

2020-06-04 Thread Heinrich Schuchardt

Underline lengths should match the lengths of headings to avoid build
warnings with Sphinx.

Signed-off-by: Heinrich Schuchardt 
---
 Documentation/dev-tools/coccinelle.rst | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/dev-tools/coccinelle.rst 
b/Documentation/dev-tools/coccinelle.rst
index 00a3409b0c28..70274c3f5f5a 100644
--- a/Documentation/dev-tools/coccinelle.rst
+++ b/Documentation/dev-tools/coccinelle.rst
@@ -14,7 +14,7 @@ many uses in kernel development, including the application of 
complex,
 tree-wide patches and detection of problematic programming patterns.

 Getting Coccinelle

+--

 The semantic patches included in the kernel use features and options
 which are provided by Coccinelle version 1.0.0-rc11 and above.
@@ -56,7 +56,7 @@ found at:
 https://github.com/coccinelle/coccinelle/blob/master/install.txt

 Supplemental documentation

+--

 For supplemental documentation refer to the wiki:

@@ -128,7 +128,7 @@ To enable verbose messages set the V= variable, for 
example::
make coccicheck MODE=report V=1

 Coccinelle parallelization

+--

 By default, coccicheck tries to run as parallel as possible. To change
 the parallelism, set the J= variable. For example, to run across 4 CPUs::
@@ -333,7 +333,7 @@ as an example if requiring at least Coccinelle >= 1.0.5::
// Requires: 1.0.5

 Proposing new semantic patches

+--

 New semantic patches can be proposed and submitted by kernel
 developers. For sake of clarity, they should be organized in the
--
2.26.2

[PATCH] sdhci: tegra: Add comment for PADCALIB and PAD_CONTROL NVQUIRKS

2020-06-04 Thread Sowjanya Komatineni

This patch adds comments about NVQUIRKS HAS_PADCALIB and NEEDS_PAD_CONTROL.

Signed-off-by: Sowjanya Komatineni 
---
 drivers/mmc/host/sdhci-tegra.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
index 3a372ab..0a3f9d0 100644
--- a/drivers/mmc/host/sdhci-tegra.c
+++ b/drivers/mmc/host/sdhci-tegra.c
@@ -96,7 +96,16 @@
 #define NVQUIRK_ENABLE_SDR50   BIT(3)
 #define NVQUIRK_ENABLE_SDR104  BIT(4)
 #define NVQUIRK_ENABLE_DDR50   BIT(5)
+/*
+ * HAS_PADCALIB NVQUIRK is for SoC's supporting auto calibration of pads
+ * drive strength.
+ */
 #define NVQUIRK_HAS_PADCALIB   BIT(6)
+/*
+ * NEEDS_PAD_CONTROL NVQUIRK is for SoC's having separate 3V3 and 1V8 pads.
+ * 3V3/1V8 pad selection happens through pinctrl state selection depending
+ * on the signaling mode.
+ */
 #define NVQUIRK_NEEDS_PAD_CONTROL  BIT(7)
 #define NVQUIRK_DIS_CARD_CLK_CONFIG_TAPBIT(8)
 #define NVQUIRK_CQHCI_DCMD_R1B_CMD_TIMING  BIT(9)
-- 
2.7.4

[PATCH] gpio: arizona: put pm_runtime in case of failure

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count if pm_runtime_put is not called in
error handling paths. Call pm_runtime_put if pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/gpio/gpio-arizona.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-arizona.c b/drivers/gpio/gpio-arizona.c
index 7520a13b4c7c..5bda38e0780f 100644
--- a/drivers/gpio/gpio-arizona.c
+++ b/drivers/gpio/gpio-arizona.c
@@ -64,6 +64,7 @@ static int arizona_gpio_get(struct gpio_chip *chip, unsigned 
offset)
ret = pm_runtime_get_sync(chip->parent);
if (ret < 0) {
dev_err(chip->parent, "Failed to resume: %d\n", ret);
+   pm_runtime_put_autosuspend(chip->parent);
return ret;
}
 
@@ -72,12 +73,15 @@ static int arizona_gpio_get(struct gpio_chip *chip, 
unsigned offset)
if (ret < 0) {
dev_err(chip->parent, "Failed to drop cache: %d\n",
ret);
+   pm_runtime_put_autosuspend(chip->parent);
return ret;
}
 
ret = regmap_read(arizona->regmap, reg, );
-   if (ret < 0)
+   if (ret < 0) {
+   pm_runtime_put_autosuspend(chip->parent);
return ret;
+   }
 
pm_runtime_mark_last_busy(chip->parent);
pm_runtime_put_autosuspend(chip->parent);
-- 
2.17.1
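
An alternative shape for the same kind of fix, sketched in userspace C: route every failure after the get through one label so that the put is written once. This is a sketch under stated assumptions, not a proposed replacement for the patch. `toy_chip`, `toy_step` and `toy_gpio_get` are hypothetical names, and the two steps stand in for the cache drop and the register read.

```c
#include <assert.h>

/* Toy usage counter standing in for the runtime-PM reference (hypothetical). */
struct toy_chip {
	int usage_count;
};

/* Simulated step: returns 0 on success or the given negative error code. */
static int toy_step(int err)
{
	return err;
}

/*
 * Takes a reference, runs two steps, and releases the reference on every
 * path through a single exit label.
 */
int toy_gpio_get(struct toy_chip *chip, int err_a, int err_b)
{
	int ret;

	chip->usage_count++;		/* the "get" */

	ret = toy_step(err_a);
	if (ret < 0)
		goto out_put;

	ret = toy_step(err_b);
	if (ret < 0)
		goto out_put;

	ret = 1;			/* pretend this is the value that was read */

out_put:
	chip->usage_count--;		/* the single "put" */
	return ret;
}
```

The trade-off is that the success path and the error paths share the same release call, which may or may not suit a driver that wants different put variants on each path.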

Re: [exec] 166d03c9ec: ltp.execveat02.fail

2020-06-04 Thread Kees Cook

On Mon, May 25, 2020 at 05:14:20PM +0800, kernel test robot wrote:
> execveat02.c:64: PASS: execveat() fails as expected: EBADF (9)
> execveat02.c:64: PASS: execveat() fails as expected: EINVAL (22)
> execveat02.c:61: FAIL: execveat() fails unexpectedly, expected: ELOOP: EACCES 
> (13)
> execveat02.c:64: PASS: execveat() fails as expected: ENOTDIR (20)

tl;dr: I think this test is correct, and I think I see a way to improve
the offending patch series to do the right thing.

Okay, the LTP is checking for ELOOP on trying to exec a symlink:

...
 *3) execveat() fails and returns ELOOP if the file identified by dirfd and
 *   pathname is a symbolic link and flag includes AT_SYMLINK_NOFOLLOW.
...
#define TESTDIR "testdir"
#define TEST_APP "execveat_errno"
...
#define TEST_SYMLINK "execveat_symlink"
...
#define TEST_ERL_SYMLINK TESTDIR"/"TEST_SYMLINK
...
sprintf(app_sym_path, "%s/%s", cur_dir_path, TEST_ERL_SYMLINK);
...
SAFE_SYMLINK(TEST_REL_APP, TEST_ERL_SYMLINK);

fd = SAFE_OPEN(TEST_REL_APP, O_PATH);
...
static struct tcase {
int *fd;
char *pathname;
int flag;
int exp_err;
} tcases[] = {
...
{&fd, app_sym_path, AT_SYMLINK_NOFOLLOW, ELOOP},
...
};
...
TEST(execveat(*tc->fd, tc->pathname, argv, environ, tc->flag));

This is testing the exec _of_ a symlink under AT_SYMLINK_NOFOLLOW.

The execve(2) manpage says:

   ELOOP  Too many symbolic links were encountered in resolving
  pathname or  the  name  of  a script or ELF interpreter.

   ELOOP  The maximum recursion limit was reached during recursive
  script interpretation (see "Interpreter scripts", above).
  Before Linux 3.8, the error produced for this case was ENOEXEC.

Which actually doesn't mention this case. open(2) says:

   ELOOP  Too many symbolic links were encountered in resolving pathname.

   ELOOP  pathname was a symbolic link, and flags specified O_NOFOLLOW
  but not O_PATH.

(but O_NOFOLLOW is limited to file creation. linkat(2) lists the AT_*
flags, and applied to openat, this seems to track: attempting to
execat where the final element is a symlink should fail with ELOOP,
though the manpage does warn that this makes it indistinguishable from
symlink loops -- the first item listed in the execve manpage for
ELOOP...)

Regardless, this does seem to be the "correct" result, as opening for
exec or opening just normally should really get the same error code.
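
The open(2) behaviour quoted above can be checked from userspace. The sketch below assumes a Linux system with a writable /tmp. `errno_for_nofollow_open` is a hypothetical helper name. It exercises plain open() with O_NOFOLLOW, not execveat().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Creates a symlink in a fresh temporary directory, opens it with
 * O_NOFOLLOW, and returns the errno from open() (0 if open succeeded,
 * -1 if the setup itself failed).
 */
int errno_for_nofollow_open(void)
{
	char dir[] = "/tmp/nofollow-XXXXXX";
	char link_path[64];
	int fd, err = 0;

	if (!mkdtemp(dir))
		return -1;
	snprintf(link_path, sizeof(link_path), "%s/link", dir);
	if (symlink("/", link_path) < 0) {
		rmdir(dir);
		return -1;
	}

	/* The final path component is a symlink, so this should not succeed. */
	fd = open(link_path, O_RDONLY | O_NOFOLLOW);
	if (fd < 0)
		err = errno;
	else
		close(fd);

	unlink(link_path);
	rmdir(dir);
	return err;
}
```

On Linux the returned value is expected to be ELOOP, per the open(2) text above. Other systems may report a different error for the same situation.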

The call path for execve looks like this:

do_open_execat()
struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
.acc_mode = MAY_READ | MAY_EXEC,
...
do_filp_open(dfd, filename, open_flags)
path_openat(nameidata, open_flags, flags)
file = alloc_empty_file(open_flags, current_cred());
open_last_lookups(nd, file, open_flags)
step_into(nd, ...)
/* stop on symlink without LOOKUP_FOLLOW */
do_open(nameidata, file, open_flags)
/* new location of FMODE_EXEC vs S_ISREG() test */
may_open(path, acc_mode, open_flag)
/* test for S_IFLNK */
inode_permission(inode, MAY_OPEN | acc_mode)
security_inode_permission(inode, acc_mode)
vfs_open(path, file)
do_dentry_open(file, path->dentry->d_inode, open)
/* old location of FMODE_EXEC vs S_ISREG() test */
security_file_open(f)
open()

The step_into() is what kicks back out without LOOKUP_FOLLOW, so we're
left holding a symlink (S_IFLNK inode). In do_open(), there is a set of
checks via may_open() which checks for S_IFLNK and rejects it:

switch (inode->i_mode & S_IFMT) {
case S_IFLNK:
return -ELOOP;

So that's the case LTP was testing for.

The patch in -next ("exec: relocate S_ISREG() check")[1], moves the regular
file requirement up before may_open(), for all the reasons mentioned in
the commit log (and the next patch[2]).

When I was originally trying to determine the best place for where the
checks should live, may_open() really did seem like the right place, but I
recognized that it was examining path characteristics (which was good) but
it didn't have the file, and that seemed to be an intentional separation.

What is needed in may_open() would be the "how was this file opened?"
piece of information: file->f_mode & FMODE_EXEC. However, in looking at
this again now, I wonder if it might be possible to use the MAY_EXEC
from the acc_mode? It seems the old check (in do_dentry_open()) had no
access to the acc_mode, so it was forced to use the FMODE_EXEC signal
instead.

(I actually think this remains a bit of a design problem: path-based LSMs,
which see

[PATCH] usb: gadget: function: printer: fix use-after-free in __lock_acquire

2020-06-04 Thread qiang.zhang

From: Zqiang 

Fix this by increasing the object reference count.

BUG: KASAN: use-after-free in __lock_acquire+0x3fd4/0x4180
kernel/locking/lockdep.c:3831
Read of size 8 at addr 8880683b0018 by task syz-executor.0/3377

CPU: 1 PID: 3377 Comm: syz-executor.0 Not tainted 5.6.11 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:641
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
 __lock_acquire+0x3fd4/0x4180 kernel/locking/lockdep.c:3831
 lock_acquire+0x127/0x350 kernel/locking/lockdep.c:4488
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x35/0x50 kernel/locking/spinlock.c:159
 printer_ioctl+0x4a/0x110 drivers/usb/gadget/function/f_printer.c:723
 vfs_ioctl fs/ioctl.c:47 [inline]
 ksys_ioctl+0xfb/0x130 fs/ioctl.c:763
 __do_sys_ioctl fs/ioctl.c:772 [inline]
 __se_sys_ioctl fs/ioctl.c:770 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:770
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4531a9
Code: ed 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 bb 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7fd14ad72c78 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 0073bfa8 RCX: 004531a9
RDX: fff9 RSI: 009e RDI: 0003
RBP: 0003 R08:  R09: 
R10:  R11: 0246 R12: 004bbd61
R13: 004d0a98 R14: 7fd14ad736d4 R15: 

Allocated by task 2393:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc.constprop.3+0xa7/0xd0 mm/kasan/common.c:515
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
 kmem_cache_alloc_trace+0xfa/0x2d0 mm/slub.c:2813
 kmalloc include/linux/slab.h:555 [inline]
 kzalloc include/linux/slab.h:669 [inline]
 gprinter_alloc+0xa1/0x870 drivers/usb/gadget/function/f_printer.c:1416
 usb_get_function+0x58/0xc0 drivers/usb/gadget/functions.c:61
 config_usb_cfg_link+0x1ed/0x3e0 drivers/usb/gadget/configfs.c:444
 configfs_symlink+0x527/0x11d0 fs/configfs/symlink.c:202
 vfs_symlink+0x33d/0x5b0 fs/namei.c:4201
 do_symlinkat+0x11b/0x1d0 fs/namei.c:4228
 __do_sys_symlinkat fs/namei.c:4242 [inline]
 __se_sys_symlinkat fs/namei.c:4239 [inline]
 __x64_sys_symlinkat+0x73/0xb0 fs/namei.c:4239
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3368:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:337 [inline]
 __kasan_slab_free+0x135/0x190 mm/kasan/common.c:476
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
 slab_free_hook mm/slub.c:1444 [inline]
 slab_free_freelist_hook mm/slub.c:1477 [inline]
 slab_free mm/slub.c:3034 [inline]
 kfree+0xf7/0x410 mm/slub.c:3995
 gprinter_free+0x49/0xd0 drivers/usb/gadget/function/f_printer.c:1353
 usb_put_function+0x38/0x50 drivers/usb/gadget/functions.c:87
 config_usb_cfg_unlink+0x2db/0x3b0 drivers/usb/gadget/configfs.c:485
 configfs_unlink+0x3b9/0x7f0 fs/configfs/symlink.c:250
 vfs_unlink+0x287/0x570 fs/namei.c:4073
 do_unlinkat+0x4f9/0x620 fs/namei.c:4137
 __do_sys_unlink fs/namei.c:4184 [inline]
 __se_sys_unlink fs/namei.c:4182 [inline]
 __x64_sys_unlink+0x42/0x50 fs/namei.c:4182
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at 8880683b
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
 1024-byte region [8880683b, 8880683b0400)
The buggy address belongs to the page:
page:ea0001a0ec00 refcount:1 mapcount:0 mapping:88806c00e300
index:0x8880683b1800 compound_mapcount: 0
flags: 0x1010200(slab|head)
raw: 01010200  00060001 88806c00e300
raw: 8880683b1800 801a 0001 
page dumped because: kasan: bad access detected

Reported-by: Kyungtae Kim 
Signed-off-by: Zqiang 
---
 drivers/usb/gadget/function/f_printer.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/f_printer.c 
b/drivers/usb/gadget/function/f_printer.c
index 9c7ed2539ff7..8ed1295d7e35 100644
--- a/drivers/usb/gadget/function/f_printer.c
+++ b/drivers/usb/gadget/function/f_printer.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -64,7 +65,7 @@ struct printer_dev {
struct usb_gadget   *gadget;
s8

Re: [PATCH v4 7/7] mtd: spi-nor: macronix: Add Octal 8D-8D-8D supports for Macronix mx25uw51245g

2020-06-04 Thread masonccyang



> > > > 
> > > > +#define MXIC_CR2_DUMMY_SET_ADDR 0x300
> > > > +
> > > > +/* Fixup the dummy cycles to device and setup octa_dtr_enable() */
> > > > +static void mx25uw51245g_post_sfdp_fixups(struct spi_nor *nor)
> > > > +{
> > > > +   struct spi_nor_flash_parameter *params = nor->params;
> > > > +   int ret;
> > > > +   u8 rdc, wdc;
> > > > +
> > > > +   ret = spi_nor_read_cr2(nor, MXIC_CR2_DUMMY_SET_ADDR, );
> > > > +   if (ret)
> > > > +  return;
> > > > +
> > > > +   /* Refer to dummy cycle and frequency table(MHz) */
> > > > +   switch (params->dummy_cycles) {
> > > > +   case 10:   /* 10 dummy cycles for 104 MHz */
> > > > +  wdc = 5;
> > > > +  break;
> > > > +   case 12:   /* 12 dummy cycles for 133 MHz */
> > > > +  wdc = 4;
> > > > +  break;
> > > > +   case 16:   /* 16 dummy cycles for 166 MHz */
> > > > +  wdc = 2;
> > > > +  break;
> > > > +   case 18:   /* 18 dummy cycles for 173 MHz */
> > > > +  wdc = 1;
> > > > +  break;
> > > > +   case 20:   /* 20 dummy cycles for 200 MHz */
> > > > +   default:
> > > > +  wdc = 0;
> > > > +   }
> > > 
> > > I don't get the point of this. You already know the fastest the
> > > mx25uw51245g flash can run at. Why not just use the maximum dummy
> > > cycles? SPI NOR doesn't know the speed the controller is running at
> > > so the best it can do is use the maximum dummy cycles possible so it
> > > never falls short. Sure, it will be _slightly_ less performance, but
> > > we will be sure to read the correct data, which is much much more
> > > important.
> >
> > In general, 200MHz needs 20 dummy cycles but some powerful device may
> > only need 18 dummy cycles or less.
>
> Yes, but do different mx25uw51245g chips have different dummy cycle
> requirements? Shouldn't all the chips with the same ID have same
> performance?
>

Same chip ID but different grade,
i.e., commercial or industrial grade. 

thanks & best regards,
Mason
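
For reference, the mapping in the quoted switch can also be written as a lookup table. This is a sketch under stated assumptions, not part of the patch. The pairs are copied from the quoted code, while `dc_entry`, `dc_table` and `dc_field_for` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pairs taken from the quoted switch: dummy cycles -> register field. */
struct dc_entry {
	uint8_t cycles;
	uint8_t field;
};

static const struct dc_entry dc_table[] = {
	{ 10, 5 }, { 12, 4 }, { 16, 2 }, { 18, 1 }, { 20, 0 },
};

/*
 * Returns the register field for 'cycles'. Unknown values fall back to the
 * entry for 20 cycles (field 0), matching the default branch of the switch.
 */
uint8_t dc_field_for(uint8_t cycles)
{
	size_t i;

	for (i = 0; i < sizeof(dc_table) / sizeof(dc_table[0]); i++) {
		if (dc_table[i].cycles == cycles)
			return dc_table[i].field;
	}
	return 0;
}
```

Falling back to the largest dummy-cycle setting for anything unrecognised is the conservative choice argued for earlier in the thread.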

CONFIDENTIALITY NOTE:

This e-mail and any attachments may contain confidential information 
and/or personal data, which is protected by applicable laws. Please be 
reminded that duplication, disclosure, distribution, or use of this e-mail 
(and/or its attachments) or any part thereof is prohibited. If you receive 
this e-mail in error, please notify us immediately and delete this mail as 
well as its attachment(s) from your system. In addition, please be 
informed that collection, processing, and/or use of personal data is 
prohibited unless expressly permitted by personal data protection laws. 
Thank you for your attention and cooperation.

Macronix International Co., Ltd.

=
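
The switch statement quoted in this thread can also be read as a small lookup table. What follows is only a minimal sketch, assuming the dummy-cycle/WDC pairs visible in the quoted excerpt (earlier cases of the switch, if any, are not shown there and are not covered). The names `dummy_to_wdc` and `wdc_for_dummy_cycles` are hypothetical and do not come from the patch. Unknown values fall back to the most conservative setting (20 dummy cycles, wdc = 0), which mirrors the `default` case and the reviewer's point about never falling short.

```c
#include <stddef.h>

/* Hypothetical table mirroring the cases visible in the quoted switch. */
struct dummy_wdc {
	unsigned int dummy_cycles;
	unsigned int wdc;
};

static const struct dummy_wdc dummy_to_wdc[] = {
	{ 10, 5 },	/* 10 dummy cycles for 104 MHz */
	{ 12, 4 },	/* 12 dummy cycles for 133 MHz */
	{ 16, 2 },	/* 16 dummy cycles for 166 MHz */
	{ 18, 1 },	/* 18 dummy cycles for 173 MHz */
	{ 20, 0 },	/* 20 dummy cycles for 200 MHz */
};

/*
 * Return the WDC value for a dummy-cycle count. Anything not in the
 * table gets wdc = 0 (20 dummy cycles), like the default case.
 */
static unsigned int wdc_for_dummy_cycles(unsigned int dummy_cycles)
{
	size_t i;

	for (i = 0; i < sizeof(dummy_to_wdc) / sizeof(dummy_to_wdc[0]); i++) {
		if (dummy_to_wdc[i].dummy_cycles == dummy_cycles)
			return dummy_to_wdc[i].wdc;
	}
	return 0;
}
```

With this shape, "always use the maximum dummy cycles" amounts to always returning 0, and a per-grade override would be a second table rather than another switch.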






[PATCH] gpio: arizona: handle pm_runtime_get_sync failure case

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/gpio/gpio-arizona.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpio/gpio-arizona.c b/drivers/gpio/gpio-arizona.c
index 5640efe5e750..7520a13b4c7c 100644
--- a/drivers/gpio/gpio-arizona.c
+++ b/drivers/gpio/gpio-arizona.c
@@ -106,6 +106,7 @@ static int arizona_gpio_direction_out(struct gpio_chip *chip,
ret = pm_runtime_get_sync(chip->parent);
if (ret < 0) {
dev_err(chip->parent, "Failed to resume: %d\n", ret);
+   pm_runtime_put(chip->parent);
return ret;
}
}
-- 
2.17.1
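
The reference-count imbalance described in this commit message can be modelled in userspace. This is a sketch only: the `mock_*` names and `struct mock_device` are hypothetical stand-ins rather than the kernel's runtime PM API, and the one behaviour they copy is the one stated above, namely that the counter is incremented even when the resume fails.

```c
#include <errno.h>

/* Userspace model only; not the kernel's struct device. */
struct mock_device {
	int usage_count;	/* models the runtime PM usage counter */
	int resume_fails;	/* non-zero: simulate a failed resume */
};

/* The counter goes up whether or not the resume succeeds. */
static int mock_pm_runtime_get_sync(struct mock_device *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

static void mock_pm_runtime_put(struct mock_device *dev)
{
	dev->usage_count--;
}

/* Error path without the fix: the reference taken above is leaked. */
static int request_unbalanced(struct mock_device *dev)
{
	int ret = mock_pm_runtime_get_sync(dev);

	if (ret < 0)
		return ret;
	return 0;
}

/* Error path with the fix: drop the reference before returning. */
static int request_balanced(struct mock_device *dev)
{
	int ret = mock_pm_runtime_get_sync(dev);

	if (ret < 0) {
		mock_pm_runtime_put(dev);
		return ret;
	}
	return 0;
}
```

After a failed call, `request_unbalanced()` leaves `usage_count` at 1 while `request_balanced()` leaves it at 0, which is the difference the one-line patch is meant to make.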

[PATCH] gpio: rcar: handle pm_runtime_get_sync failure case

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/gpio/gpio-rcar.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-rcar.c b/drivers/gpio/gpio-rcar.c
index 7284473c9fe3..eac1582c70da 100644
--- a/drivers/gpio/gpio-rcar.c
+++ b/drivers/gpio/gpio-rcar.c
@@ -250,8 +250,10 @@ static int gpio_rcar_request(struct gpio_chip *chip, unsigned offset)
int error;
 
error = pm_runtime_get_sync(p->dev);
-   if (error < 0)
+   if (error < 0) {
+   pm_runtime_put(p->dev);
return error;
+   }
 
error = pinctrl_gpio_request(chip->base + offset);
if (error)
-- 
2.17.1

[PATCH] mfd: arizona: handle pm_runtime_get_sync failure case

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put_sync if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/mfd/arizona-core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/mfd/arizona-core.c b/drivers/mfd/arizona-core.c
index f73cf76d1373..5b3191b6534a 100644
--- a/drivers/mfd/arizona-core.c
+++ b/drivers/mfd/arizona-core.c
@@ -46,8 +46,10 @@ int arizona_clk32k_enable(struct arizona *arizona)
switch (arizona->pdata.clk32k_src) {
case ARIZONA_32KZ_MCLK1:
ret = pm_runtime_get_sync(arizona->dev);
-   if (ret != 0)
+   if (ret != 0) {
+   pm_runtime_put_sync(arizona->dev);
goto err_ref;
+   }
ret = clk_prepare_enable(arizona->mclk[ARIZONA_MCLK1]);
if (ret != 0) {
pm_runtime_put_sync(arizona->dev);
-- 
2.17.1

[PATCH] io: pressure: zpa2326: handle pm_runtime_get_sync failure

2020-06-04 Thread Navid Emamdoost

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost 
---
 drivers/iio/pressure/zpa2326.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/pressure/zpa2326.c b/drivers/iio/pressure/zpa2326.c
index 99dfe33ee402..245f2e2d412b 100644
--- a/drivers/iio/pressure/zpa2326.c
+++ b/drivers/iio/pressure/zpa2326.c
@@ -664,8 +664,10 @@ static int zpa2326_resume(const struct iio_dev *indio_dev)
int err;
 
err = pm_runtime_get_sync(indio_dev->dev.parent);
-   if (err < 0)
+   if (err < 0) {
+   pm_runtime_put(indio_dev->dev.parent);
return err;
+   }
 
if (err > 0) {
/*
-- 
2.17.1

RE: [PATCH v8 00/13] add ecspi ERR009165 for i.mx6/7 soc family

2020-06-04 Thread Robin Gong

On 2020/06/03 Matthias Schiffer  wrote:
> On Wed, 2020-06-03 at 09:50 +, Robin Gong wrote:
> > On 2020/06/03 Matthias Schiffer 
> > wrote:
> >  > On Thu, 2020-05-21 at 04:34 +0800, Robin Gong wrote:
> > > > There is ecspi ERR009165 on i.mx6/7 soc family, which cause FIFO
> > > > transfer to be send twice in DMA mode. Please get more information
> > > > from:
> > > >
> > > > https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf
> > > >
> > > > The workaround is
> > > > adding new sdma ram script which works in XCH  mode as PIO inside
> > > > sdma instead of SMC mode, meanwhile, 'TX_THRESHOLD' should be 0.
> > > > The issue should be exist on all legacy i.mx6/7 soc family before
> > > > i.mx6ul.
> > > > NXP fix this design issue from i.mx6ul, so newer chips including
> > > > i.mx6ul/ 6ull/6sll do not need this workaroud anymore. All other
> > > > i.mx6/7/8 chips still need this workaroud. This patch set add new
> > > > 'fsl,imx6ul-ecspi'
> > > > for ecspi driver and 'ecspi_fixed' in sdma driver to choose if
> > > > need errata or not.
> > > > The first two reverted patches should be the same issue, though,
> > > > it seems 'fixed' by changing to other shp script. Hope Sean or
> > > > Sascha could have the chance to test this patch set if could fix
> > > > their issues.
> > > > Besides, enable sdma support for i.mx8mm/8mq and fix ecspi1 not
> > > > work on i.mx8mm because the event id is zero.
> > > >
> > > > PS:
> > > >Please get sdma firmware from below linux-firmware and copy it
> > > > to your local rootfs /lib/firmware/imx/sdma.
> > >
> > >
> > > Hello Robin,
> > >
> > > we have tried out this series, and there seems to be an issue with
> > > the PIO fallback. We are testing on an i.MX6Q board, and our kernel
> > > is a mostly-unmodified 5.4, on which we backported all SDMA patches
> > > from next-20200602 (imx-sdma.c is identical to next-20200602
> > > version), and then applied this whole series.
> > >
> > > We build the SDMA driver as a kernel module, which is loaded by
> > > udev, so the root filesystem is ready and the SDMA firmware can be
> > > loaded. The behaviour we're seeing is the following:
> > >
> > > 1. As long as the SDMA driver is not loaded, initializing spi_imx
> > >    will be deferred
> > > 2. imx_sdma is loaded. The SDMA firmware is not yet loaded at this
> > >    point
> > > 3. spi_imx is initialized and an SPI-NOR flash is probed. To load
> > >    the BFPT, the driver will attempt to use DMA; this will fail with
> > >    EINVAL as long as the SDMA firmware is not ready, so the fallback
> > >    to PIO happens
> > > (4. SDMA firmware is ready, subsequent SPI transfers use DMA)
> > >
> > > The problem happens in step 3: Whenever the driver falls back to
> > > PIO, the received data is corrupt. The behaviour is specific to the
> > > fallback: When I disable DMA completely via spi_imx.use_dma, or
> > > when the timing is lucky and the SDMA firmware gets loaded before
> > > the flash is probed, no corruption can be observed.
> >
> > Thanks Matthias, would you like post log?
> >
> 
> I have attached the following log files:
> - pio.log: DMA disabled via module parameter
> - dma.log: "lucky" timing, SDMA firmware loaded before SPI-NOR probe
> - fallback.log: DMA->PIO fallback
> 
> The logs include some additional log messages:
> - Return value of spi_imx_dma_transfer() before PIO fallback
> - SPI-NOR SFPT dump
> 
> It can be seen that the BFPT data is identical in pio.log and dma.log,
> and differs almost completely in fallback.log. The corrupted data seems
> to be random, or uninitialized memory; it differs with every boot.
Would you please have a try with the attached patch? Thanks.


0006-spi-imx-add-dma_sync_sg_for_device-after-fallback-fr.patch

Re: [GIT PULL v2] x86/mm changes for v5.8

2020-06-04 Thread Linus Torvalds

On Thu, Jun 4, 2020 at 10:29 AM Ingo Molnar  wrote:
>
> Yeah, sure - here's the updated pull request for the rest:
>
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-2020-06-04
>
># HEAD: bd1de2a7aace4d1d312fb1be264b8fafdb706208 x86/tlb/uv: Add a forward 
> declaration for struct flush_tlb_info

Nope, that still points to

 0fcfdf55db9e1ecf85edd6aa8d0bc78a448cb96a Documentation: Add L1D
flushing Documentation

although it looks like the 'x86/mm' _branch_ does point to that commit
bd1de2a7aace.

You did something odd where you created a new tag, but used the old branch. Hmm?

Linus

[PATCH] hfs: fix null-ptr-deref in hfs_find_init()

2020-06-04 Thread Yang Yingliang

There is a null-ptr-deref in hfs_find_init():

[  107.092729] hfs: continuing without an alternate MDB
[  107.097632] general protection fault, probably for non-canonical address 
0xdc08:  [#1] SMP KASAN PTI
[  107.104679] KASAN: null-ptr-deref in range 
[0x0040-0x0047]
[  107.109100] CPU: 0 PID: 379 Comm: hfs_inject Not tainted 
5.7.0-rc7-1-g24627f5f2973 #897
[  107.114142] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  107.121095] RIP: 0010:hfs_find_init+0x72/0x170
[  107.123609] Code: c1 ea 03 80 3c 02 00 0f 85 e6 00 00 00 4c 8d 65 40 48 c7 
43 18 00 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <0f> b6 04 
02 84 c0 74 08 3c 03 0f 8e a5 00 00 00 8b 45 40 be c0 0c
[  107.134660] RSP: 0018:88810291f3f8 EFLAGS: 00010202
[  107.137897] RAX: dc00 RBX: 88810291f468 RCX: 1110175cdf05
[  107.141874] RDX: 0008 RSI: 88810291f468 RDI: 88810291f480
[  107.145844] RBP:  R08:  R09: ed1020381013
[  107.149431] R10: 88810291f500 R11: ed1020381012 R12: 0040
[  107.152315] R13:  R14: 888101c0814a R15: 88810291f468
[  107.155464] FS:  009ea880() GS:88810c60() 
knlGS:
[  107.159795] CS:  0010 DS:  ES:  CR0: 80050033
[  107.162987] CR2: 5605a19dd284 CR3: 000103a0c006 CR4: 00020ef0
[  107.15] Call Trace:
[  107.167969]  ? find_held_lock+0x33/0x1c0
[  107.169972]  hfs_ext_read_extent+0x16b/0xb00
[  107.172092]  ? create_page_buffers+0x14e/0x1b0
[  107.174303]  ? hfs_free_extents+0x280/0x280
[  107.176437]  ? lock_downgrade+0x730/0x730
[  107.178272]  hfs_get_block+0x496/0x8a0
[  107.179972]  block_read_full_page+0x241/0x8d0
[  107.181971]  ? hfs_extend_file+0xae0/0xae0
[  107.183814]  ? end_buffer_async_read_io+0x10/0x10
[  107.185954]  ? add_to_page_cache_lru+0x13f/0x1f0
[  107.188006]  ? add_to_page_cache_locked+0x10/0x10
[  107.190175]  do_read_cache_page+0xc6a/0x1180
[  107.192096]  ? generic_file_read_iter+0x4c0/0x4c0
[  107.194234]  ? hfs_btree_open+0x408/0x1000
[  107.196068]  ? lock_downgrade+0x730/0x730
[  107.197926]  ? wake_bit_function+0x180/0x180
[  107.199845]  ? lockdep_init_map_waits+0x267/0x7c0
[  107.201895]  hfs_btree_open+0x455/0x1000
[  107.203479]  hfs_mdb_get+0x122c/0x1ae8
[  107.205065]  ? hfs_mdb_put+0x350/0x350
[  107.206590]  ? queue_work_node+0x260/0x260
[  107.208309]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  107.210227]  ? lockdep_init_map_waits+0x267/0x7c0
[  107.212144]  ? lockdep_init_map_waits+0x267/0x7c0
[  107.213979]  hfs_fill_super+0x9ba/0x1280
[  107.215444]  ? bdev_name.isra.9+0xf1/0x2b0
[  107.217028]  ? hfs_remount+0x190/0x190
[  107.218428]  ? pointer+0x5da/0x710
[  107.219745]  ? file_dentry_name+0xf0/0xf0
[  107.221262]  ? mount_bdev+0xd1/0x330
[  107.222592]  ? vsnprintf+0x7bd/0x1250
[  107.224007]  ? pointer+0x710/0x710
[  107.225332]  ? down_write+0xe5/0x160
[  107.226698]  ? hfs_remount+0x190/0x190
[  107.228120]  ? snprintf+0x91/0xc0
[  107.229388]  ? vsprintf+0x10/0x10
[  107.230628]  ? sget+0x3af/0x4a0
[  107.231848]  ? hfs_remount+0x190/0x190
[  107.233300]  mount_bdev+0x26e/0x330
[  107.234611]  ? hfs_statfs+0x540/0x540
[  107.236015]  legacy_get_tree+0x101/0x1f0
[  107.237431]  ? security_capable+0x58/0x90
[  107.238832]  vfs_get_tree+0x89/0x2d0
[  107.240082]  ? ns_capable_common+0x5c/0xd0
[  107.241521]  do_mount+0xd8a/0x1720
[  107.242727]  ? lock_downgrade+0x730/0x730
[  107.244116]  ? copy_mount_string+0x20/0x20
[  107.245557]  ? _copy_from_user+0xbe/0x100
[  107.246967]  ? memdup_user+0x47/0x70
[  107.248212]  __x64_sys_mount+0x162/0x1b0
[  107.249537]  do_syscall_64+0xa5/0x4f0
[  107.250742]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  107.252369] RIP: 0033:0x44e8ea
[  107.253360] Code: 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 
0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
[  107.259240] RSP: 002b:7ffd910e4c28 EFLAGS: 0207 ORIG_RAX: 
00a5
[  107.261668] RAX: ffda RBX: 00400400 RCX: 0044e8ea
[  107.263920] RDX: 0049321e RSI: 00493222 RDI: 7ffd910e4d00
[  107.266177] RBP: 7ffd910e5d10 R08:  R09: 000a
[  107.268451] R10: 0001 R11: 0207 R12: 00401c40
[  107.270721] R13:  R14: 006ba018 R15: 
[  107.273025] Modules linked in:
[  107.274029] Dumping ftrace buffer:
[  107.275121](ftrace buffer empty)
[  107.276370] ---[ end trace c5e0b9d684f3570e ]---

We need to check the tree pointer in hfs_find_init().

https://lore.kernel.org/linux-fsdevel/20180419024358.ga5...@bombadil.infradead.org/
https://marc.info/?l=linux-fsdevel&m=152406881024567&w=2
References: CVE-2018-12928
Signed-off-by:
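
The patch body is not included in this archive entry, so the exact change is unknown. The following is only a sketch of the kind of guard the commit message describes, written as a userspace model. `struct mock_btree`, `struct mock_find_data`, and `mock_find_init` are hypothetical names, and returning -EINVAL for a missing tree is an assumption, not something the message states.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the HFS B-tree and search state. */
struct mock_btree {
	unsigned int max_key_len;
};

struct mock_find_data {
	struct mock_btree *tree;
	unsigned int key_len;
};

/*
 * Refuse to initialise a search on a missing tree instead of
 * dereferencing it. The error code is an assumption.
 */
static int mock_find_init(struct mock_btree *tree, struct mock_find_data *fd)
{
	if (!tree)
		return -EINVAL;

	fd->tree = tree;
	fd->key_len = tree->max_key_len;	/* safe: tree is non-NULL here */
	return 0;
}
```

Callers then have to propagate the error, which in the trace above would mean the mount fails cleanly instead of faulting.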

Re: [PATCH v3 2/6] PCI: uniphier: Add misc interrupt handler to invoke PME and AER

2020-06-04 Thread Kunihiko Hayashi


Hi Marc,

On 2020/06/04 19:11, Marc Zyngier wrote:

On 2020-06-04 10:43, Kunihiko Hayashi wrote:

[...]


-static void uniphier_pcie_irq_handler(struct irq_desc *desc)
+static void uniphier_pcie_misc_isr(struct pcie_port *pp)
 {
-    struct pcie_port *pp = irq_desc_get_handler_data(desc);
 struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
 struct uniphier_pcie_priv *priv = to_uniphier_pcie(pci);
-    struct irq_chip *chip = irq_desc_get_chip(desc);
-    unsigned long reg;
-    u32 val, bit, virq;
+    u32 val, virq;

-    /* INT for debug */
 val = readl(priv->base + PCL_RCV_INT);

 if (val & PCL_CFG_BW_MGT_STATUS)
 dev_dbg(pci->dev, "Link Bandwidth Management Event\n");
+
 if (val & PCL_CFG_LINK_AUTO_BW_STATUS)
 dev_dbg(pci->dev, "Link Autonomous Bandwidth Event\n");
-    if (val & PCL_CFG_AER_RC_ERR_MSI_STATUS)
-    dev_dbg(pci->dev, "Root Error\n");
-    if (val & PCL_CFG_PME_MSI_STATUS)
-    dev_dbg(pci->dev, "PME Interrupt\n");
+
+    if (pci_msi_enabled()) {


This checks whether the kernel supports MSIs. Not that they are
enabled in your controller. Is that really what you want to do?


The below two status bits are valid when the interrupt for MSI is asserted.
That is, pci_msi_enabled() is wrong.

I'll modify the function to check the two bits only if this function is
called from MSI handler.




+    if (val & PCL_CFG_AER_RC_ERR_MSI_STATUS) {
+    dev_dbg(pci->dev, "Root Error Status\n");
+    virq = irq_linear_revmap(pp->irq_domain, 0);
+    generic_handle_irq(virq);
+    }
+
+    if (val & PCL_CFG_PME_MSI_STATUS) {
+    dev_dbg(pci->dev, "PME Interrupt\n");
+    virq = irq_linear_revmap(pp->irq_domain, 0);
+    generic_handle_irq(virq);
+    }


These two cases do the exact same thing, calling the same interrupt.
What is the point of dealing with them independently?


Both PME and AER are asserted from MSI-0, and each handler checks its own
status bit in the PCIe register (aer_irq() in pcie/aer.c and pcie_pme_irq()
in pcie/pme.c).
So I think this handler calls generic_handle_irq() for the same MSI-0.


So what is wrong with

     if (val & (PCL_CFG_AER_RC_ERR_MSI_STATUS |
    PCL_CFG_PME_MSI_STATUS)) {
     // handle interrupt
     }

?


No problem.
I'll rewrite it in the same way as yours in handling interrupts.


If you have two handlers for the same interrupt, this is a shared
interrupt and each handler will be called in turn.

Yes, MSI-0 is shared with PME and AER, and it will be like that.

Thank you,

---
Best Regards
Kunihiko Hayashi
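
The difference between the posted handler and the combined check suggested in review can be shown with a small userspace model. This is a sketch under assumptions: the `MOCK_*` bit positions are made up for illustration (the driver defines the real ones), and a counter stands in for each generic_handle_irq() call on MSI-0.

```c
#include <stdint.h>

/* Hypothetical bit positions; the real values live in the driver. */
#define MOCK_AER_RC_ERR_MSI_STATUS	(1u << 0)
#define MOCK_PME_MSI_STATUS		(1u << 1)

/* As posted: one dispatch per status bit, both to the same MSI-0. */
static unsigned int dispatches_separate(uint32_t val)
{
	unsigned int calls = 0;

	if (val & MOCK_AER_RC_ERR_MSI_STATUS)
		calls++;	/* stands in for generic_handle_irq(virq) */
	if (val & MOCK_PME_MSI_STATUS)
		calls++;	/* same virq a second time */
	return calls;
}

/* As suggested in review: one dispatch if either bit is set. */
static unsigned int dispatches_combined(uint32_t val)
{
	if (val & (MOCK_AER_RC_ERR_MSI_STATUS | MOCK_PME_MSI_STATUS))
		return 1;
	return 0;
}
```

When both bits are set, the separate form dispatches MSI-0 twice and the combined form once. Since MSI-0 is shared and each handler checks its own status bit, a single dispatch is enough.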

Re: [PATCH v12 1/2] dt-bindings: drm/bridge: anx7625: MIPI to DP transmitter DT schema

2020-06-04 Thread Xin Ji

On Thu, Jun 04, 2020 at 04:24:02PM -0600, Rob Herring wrote:
> On Thu, 04 Jun 2020 15:56:36 +0800, Xin Ji wrote:
> > anx7625: MIPI to DP transmitter DT schema
> > 
> > Signed-off-by: Xin Ji 
> > ---
> >  .../bindings/display/bridge/analogix,anx7625.yaml  | 95 
> > ++
> >  1 file changed, 95 insertions(+)
> >  create mode 100644 
> > Documentation/devicetree/bindings/display/bridge/analogix,anx7625.yaml
> > 
> 
> 
> Please add Acked-by/Reviewed-by tags when posting new versions. However,
> there's no need to repost patches *only* to add the tags. The upstream
> maintainer will do that for acks received on the version they apply.
> 
> If a tag was not added on purpose, please state why and what changed.

Hi Rob Herring, thanks for your comment. I'll add tags in the next
versions.

Thanks,
Xin

Re: [PATCH 0/2] overlayfs: C/R enhancements

2020-06-04 Thread Amir Goldstein

On Fri, Jun 5, 2020 at 12:34 AM Alexander Mikhalitsyn
 wrote:
>
> Hello,
>
> >But overlayfs won't accept these "output only" options as input args,
> which is a problem.
>
> Will it be problematic if we simply ignore "lowerdir_mnt_id" and 
> "upperdir_mnt_id" options in ovl_parse_opt()?
>

That would solve this small problem.

> >Wouldn't it be better for C/R to implement mount options
> that overlayfs can parse and pass it mntid and fhandle instead
> of paths?
>
> Problem is that we need to know on C/R "dump stage" which mounts are used on 
> lower layers and upper layer. Most likely I don't understand something but I 
> can't catch how "mount-time" options will help us.

As you already know from inotify/fanotify C/R fhandle is timeless, so
there would be no distinction between mount time and dump time.
About mnt_id, your patches will cause the original mount-time mounts to be busy.
That is a problem as well.

I think you should describe the use case in more detail.
Is your goal to C/R any overlayfs mount that the process has open
files on? visible to process?
For NFS export, we use the persistent descriptor {uuid;fhandle}
(a.k.a. struct ovl_fh) to encode
an underlying layer object.

CRIU can look for an existing mount of a filesystem with that uuid at restore
stage (or even mount this filesystem) and use open_by_handle_at() to open a
path to the layer.
After mounting overlay, that mount to underlying fs can even be discarded.

And if this works for you, you don't have to export the layers ovl_fh in
/proc/mounts, you can export them in numerous other ways.
One way, off the top of my head: getxattr on the overlay root dir.
"trusted.overlay" xattr is anyway a reserved prefix, so "trusted.overlay.layers"
for example could work.

Thanks,
Amir.

Re: [PATCH v12 2/2] drm/bridge: anx7625: Add anx7625 MIPI DSI/DPI to DP

2020-06-04 Thread Xin Ji

On Thu, Jun 04, 2020 at 11:08:05AM +0300, Laurent Pinchart wrote:
> Hello Xin,
> 
> Thank you for the patch.
> 
> On Thu, Jun 04, 2020 at 03:58:05PM +0800, Xin Ji wrote:
> > The ANX7625 is an ultra-low power 4K Mobile HD Transmitter designed
> > for portable device. It converts MIPI DSI/DPI to DisplayPort 1.3 4K.
> > 
> > Signed-off-by: Xin Ji 
> > ---
> >  drivers/gpu/drm/bridge/analogix/Kconfig   |9 +
> >  drivers/gpu/drm/bridge/analogix/Makefile  |1 +
> >  drivers/gpu/drm/bridge/analogix/anx7625.c | 1961 
> > +
> >  drivers/gpu/drm/bridge/analogix/anx7625.h |  397 ++
> >  4 files changed, 2368 insertions(+)
> >  create mode 100644 drivers/gpu/drm/bridge/analogix/anx7625.c
> >  create mode 100644 drivers/gpu/drm/bridge/analogix/anx7625.h
> > 
> > diff --git a/drivers/gpu/drm/bridge/analogix/Kconfig 
> > b/drivers/gpu/drm/bridge/analogix/Kconfig
> > index e1fa7d8..024ea2a 100644
> > --- a/drivers/gpu/drm/bridge/analogix/Kconfig
> > +++ b/drivers/gpu/drm/bridge/analogix/Kconfig
> > @@ -25,3 +25,12 @@ config DRM_ANALOGIX_ANX78XX
> >  config DRM_ANALOGIX_DP
> > tristate
> > depends on DRM
> > +
> > +config DRM_ANALOGIX_ANX7625
> > +   tristate "Analogix Anx7625 MIPI to DP interface support"
> > +   depends on DRM
> > +   depends on OF
> > +   help
> > + ANX7625 is an ultra-low power 4K mobile HD transmitter
> > + designed for portable devices. It converts MIPI/DPI to
> > + DisplayPort1.3 4K.
> > diff --git a/drivers/gpu/drm/bridge/analogix/Makefile 
> > b/drivers/gpu/drm/bridge/analogix/Makefile
> > index 97669b3..44da392 100644
> > --- a/drivers/gpu/drm/bridge/analogix/Makefile
> > +++ b/drivers/gpu/drm/bridge/analogix/Makefile
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0-only
> >  analogix_dp-objs := analogix_dp_core.o analogix_dp_reg.o 
> > analogix-i2c-dptx.o
> >  obj-$(CONFIG_DRM_ANALOGIX_ANX6345) += analogix-anx6345.o
> > +obj-$(CONFIG_DRM_ANALOGIX_ANX7625) += anx7625.o
> >  obj-$(CONFIG_DRM_ANALOGIX_ANX78XX) += analogix-anx78xx.o
> >  obj-$(CONFIG_DRM_ANALOGIX_DP) += analogix_dp.o
> > diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c 
> > b/drivers/gpu/drm/bridge/analogix/anx7625.c
> > new file mode 100644
> > index 000..f1cc6bb
> > --- /dev/null
> > +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
> > @@ -0,0 +1,1961 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright(c) 2020, Analogix Semiconductor. All rights reserved.
> > + *
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#include "anx7625.h"
> > +
> > +/*
> > + * There is a sync issue while access I2C register between AP(CPU) and
> > + * internal firmware(OCM), to avoid the race condition, AP should access
> > + * the reserved slave address before slave address occurs changes.
> > + */
> > +static int i2c_access_workaround(struct anx7625_data *ctx,
> > +struct i2c_client *client)
> > +{
> > +   u8 offset;
> > +   struct device *dev = >dev;
> > +   int ret;
> > +
> > +   if (client == ctx->last_client)
> > +   return 0;
> > +
> > +   ctx->last_client = client;
> > +
> > +   if (client == ctx->i2c.tcpc_client)
> > +   offset = RSVD_00_ADDR;
> > +   else if (client == ctx->i2c.tx_p0_client)
> > +   offset = RSVD_D1_ADDR;
> > +   else if (client == ctx->i2c.tx_p1_client)
> > +   offset = RSVD_60_ADDR;
> > +   else if (client == ctx->i2c.rx_p0_client)
> > +   offset = RSVD_39_ADDR;
> > +   else if (client == ctx->i2c.rx_p1_client)
> > +   offset = RSVD_7F_ADDR;
> > +   else
> > +   offset = RSVD_00_ADDR;
> > +
> > +   ret = i2c_smbus_write_byte_data(client, offset, 0x00);
> > +   if (ret < 0)
> > +   DRM_DEV_ERROR(dev,
> > + "fail to access i2c id=%x\n:%x",
> > + client->addr, offset);
> > +
> > +   return ret;
> > +}
> > +
> > +static int anx7625_reg_read(struct anx7625_data *ctx,
> > +   struct i2c_client *client, u8 reg_addr)
> > +{
> > +   int ret;
> > +   struct device *dev = >dev;
> > +
> > +   i2c_access_workaround(ctx, client);
> > +
> > +   ret = i2c_smbus_read_byte_data(client, reg_addr);
> > +   if (ret < 0)
> > +   DRM_DEV_ERROR(dev, "read i2c fail id=%x:%x\n",
> > + client->addr, reg_addr);
> > +
> > +   return ret;
> > +}
> > +
> > +static int anx7625_reg_block_read(struct anx7625_data *ctx,
> > + struct i2c_client *client,
> > + u8 reg_addr, u8 len, u8 *buf)
> > +{
> > +   int ret;
> > +   struct device *dev = >dev;
> > +
> > +

Re: [PATCH][v6] KVM: X86: support APERF/MPERF registers

2020-06-04 Thread Xu, Like


Hi RongQing,

On 2020/6/5 9:44, Li RongQing wrote:

Guest kernel reports a fixed cpu frequency in /proc/cpuinfo,
this is confused to user when turbo is enable, and aperf/mperf
can be used to show current cpu frequency after 7d5905dc14a
"(x86 / CPU: Always show current CPU frequency in /proc/cpuinfo)"
so guest should support aperf/mperf capability

This patch implements aperf/mperf by three mode: none, software
emulation, and pass-through

None: default mode, guest does not support aperf/mperf

s/None/Note


Software emulation: the period of aperf/mperf in guest mode are
accumulated as emulated value

Pass-though: it is only suitable for KVM_HINTS_REALTIME, Because
that hint guarantees we have a 1:1 vCPU:CPU binding and guaranteed
no over-commit.

The flag "KVM_HINTS_REALTIME 0" (in the Documentation/virt/kvm/cpuid.rst)
is claimed as "guest checks this feature bit to determine that vCPUs are never
preempted for an unlimited time allowing optimizations".

I couldn't see its relationship with "1:1 vCPU: pCPU binding".
The patch doesn't check this flag as well for your pass-through purpose.

Thanks,
Like Xu


And a per-VM capability is added to configure aperfmperf mode

Signed-off-by: Li RongQing 
Signed-off-by: Chai Wen 
Signed-off-by: Jia Lina 
---
diff v5:
return error if guest is configured with mperf/aperf, but host cpu has not

diff v4:
fix maybe-uninitialized warning

diff v3:
fix interception of MSR_IA32_MPERF/APERF in svm

diff v2:
support aperfmperf pass though
move common codes to kvm_get_msr_common

diff v1:
1. support AMD, but not test
2. support per-vm capability to enable


  Documentation/virt/kvm/api.rst  | 10 ++
  arch/x86/include/asm/kvm_host.h | 11 +++
  arch/x86/kvm/cpuid.c| 15 ++-
  arch/x86/kvm/svm/svm.c  |  8 
  arch/x86/kvm/vmx/vmx.c  |  6 ++
  arch/x86/kvm/x86.c  | 42 +
  arch/x86/kvm/x86.h  | 15 +++
  include/uapi/linux/kvm.h|  1 +
  8 files changed, 107 insertions(+), 1 deletion(-)
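
For context on why a guest wants these registers: the current frequency is conventionally derived from the ratio of the APERF and MPERF deltas over an interval, scaled by the base frequency. The sketch below assumes that conventional interpretation. `effective_khz` and its parameters are hypothetical names and are not taken from this patch.

```c
#include <stdint.h>

/*
 * Estimate the average frequency over an interval:
 *   base_khz * (delta APERF / delta MPERF)
 * MPERF advances at the base rate and APERF at the actual rate, so the
 * ratio is above 1 under turbo and below 1 when throttled.
 * Note: the multiplication can overflow for very large deltas; real
 * code would need to guard against that.
 */
static uint64_t effective_khz(uint64_t base_khz, uint64_t aperf_delta,
			      uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return 0;	/* no samples yet; avoid dividing by zero */

	return base_khz * aperf_delta / mperf_delta;
}
```

A guest that sees neither register has only `base_khz` to report, which is the fixed value in /proc/cpuinfo that the commit message complains about.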

Re: Reply: Re: [RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink

2020-06-04 Thread Ruan Shiyang





On 2020/6/5 9:30 AM, Dave Chinner wrote:

On Thu, Jun 04, 2020 at 07:51:07AM -0700, Darrick J. Wong wrote:

On Thu, Jun 04, 2020 at 03:37:42PM +0800, Ruan Shiyang wrote:



On 2020/4/28 2:43 PM, Dave Chinner wrote:

On Tue, Apr 28, 2020 at 06:09:47AM +, Ruan, Shiyang wrote:


On 2020/4/27 20:28:36, "Matthew Wilcox" wrote:


On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:

   This patchset is a try to resolve the shared 'page cache' problem for
   fsdax.

   In order to track multiple mappings and indexes on one page, I
   introduced a dax-rmap rb-tree to manage the relationship.  A dax entry
   will be associated more than once if is shared.  At the second time we
   associate this entry, we create this rb-tree and store its root in
   page->private(not used in fsdax).  Insert (->mapping, ->index) when
   dax_associate_entry() and delete it when dax_disassociate_entry().


Do we really want to track all of this on a per-page basis?  I would
have thought a per-extent basis was more useful.  Essentially, create
a new address_space for each shared extent.  Per page just seems like
a huge overhead.


Per-extent tracking is a nice idea for me.  I haven't thought of it
yet...

But the extent info is maintained by filesystem.  I think we need a way
to obtain this info from FS when associating a page.  May be a bit
complicated.  Let me think about it...


That's why I want the -user of this association- to do a filesystem
callout instead of keeping it's own naive tracking infrastructure.
The filesystem can do an efficient, on-demand reverse mapping lookup
from it's own extent tracking infrastructure, and there's zero
runtime overhead when there are no errors present.


Hi Dave,

I ran into some difficulties when trying to implement the per-extent rmap
tracking.  So, I re-read your comments and found that I was misunderstanding
what you described here.

I think what you mean is: we don't need the in-memory dax-rmap tracking now.
Just ask the FS for the owner's information that associate with one page
when memory-failure.  So, the per-page (even per-extent) dax-rmap is
needless in this case.  Is this right?


Right.  XFS already has its own rmap tree.


*nod*


Based on this, we only need to store the extent information of a fsdax page
in its ->mapping (by searching from FS).  Then obtain the owners of this
page (also by searching from FS) when memory-failure or other rmap case
occurs.


I don't even think you need that much.  All you need is the "physical"
offset of that page within the pmem device (e.g. 'this is the 307th 4k
page == offset 1257472 since the start of /dev/pmem0') and xfs can look
up the owner of that range of physical storage and deal with it as
needed.


Right. If we have the dax device associated with the page that had
the failure, then we can determine the offset of the page into the
block device address space and that's all we need to find the owner
of the page in the filesystem.

Note that there may actually be no owner - the page that had the
fault might land in free space, in which case we can simply zero
the page and clear the error.


OK.  Thanks for pointing out.




So, a fsdax page is no longer associated with a specific file, but with a
FS(or the pmem device).  I think it's easier to understand and implement.


Effectively, yes. But we shouldn't need to actually associate the
page with anything at the filesystem level because it is already
associated with a DAX device at a lower level via a dev_pagemap.
The hardware page fault already runs through this code
memory_failure_dev_pagemap() before it gets to the DAX code, so
really all we need to do is have that function pass us the page, offset
into the device and, say, the struct dax_device associated with that
page so we can get to the filesystem superblock we can then use for
rmap lookups on...



OK.  I was just thinking about how can I execute the FS rmap search from 
the memory-failure.  Thanks again for pointing out. :)



--
Thanks,
Ruan Shiyang.


Cheers,

Dave.
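
The "physical offset" arithmetic from the example in this thread (page 307 of 4k pages, offset 1257472) is a single multiplication. A minimal sketch, assuming a 4096-byte page and a page index counted from zero; `MOCK_PAGE_SIZE` and `pmem_page_offset` are hypothetical names.

```c
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096u	/* assumption: 4k pages, as in the example */

/* Byte offset of a page from the start of the pmem device. */
static uint64_t pmem_page_offset(uint64_t page_index)
{
	return page_index * MOCK_PAGE_SIZE;
}
```

That offset is the only key the filesystem needs for its own reverse-mapping lookup, which is why no per-page or per-extent tracking has to be kept in memory.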

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-04 Thread John Donnelly




On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:

On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:

Hi All,

On Wed, Jun 3, 2020 at 9:03 PM John Donnelly 
wrote:



On Jun 3, 2020, at 8:20 AM, chenzhou  wrote:

Hi,


On 2020/6/3 19:47, Prabhakar Kushwaha wrote:

Hi Chen,

On Tue, Jun 2, 2020 at 8:12 PM John Donnelly 
wrote:


On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
prabhakar.p...@gmail.com> wrote:

On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
john.p.donne...@oracle.com> wrote:

Hi .  See below !


On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma 
wrote:

Hi John,

On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
john.p.donne...@oracle.com> wrote:

Hi,


On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:

Hi Chen,

On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
chenzho...@huawei.com> wrote:

This patch series enable reserving crashkernel above 4G in
arm64.

There are following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G,
which will fail
when there is no enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve
crashkernel above 4G,
in this case, if swiotlb or DMA buffers are required,
crash dump kernel
will boot failure because there is no low memory available
for allocation.


We are getting "warn_alloc" [1] warning during boot of kdump
kernel
with bootargs as [2] of primary kernel.
This error observed on ThunderX2  ARM64 platform.

It is observed with latest upstream tag (v5.7-rc3) with this patch set
and
https://lists.infradead.org/pipermail/kexec/2020-May/025128.html

Also **without** this patch-set
"https://www.spinics.net/lists/arm-kernel/msg806882.html"

This issue comes whenever crashkernel memory is reserved
after 0xc000_.
More details discussed earlier in


https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
   without

any
solution

This patch-set is expected to solve similar kind of issue.
i.e. low memory is only targeted for DMA, swiotlb; So above
mentioned
observation should be considered/fixed. .

--pk

[1]
[   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
[   30.367696] NET: Registered protocol family 16
[   30.369973] swapper/0: page allocation failure: order:6, mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
[   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
[   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
[   30.369984] Call trace:
[   30.369989]  dump_backtrace+0x0/0x1f8
[   30.369991]  show_stack+0x20/0x30
[   30.369997]  dump_stack+0xc0/0x10c
[   30.370001]  warn_alloc+0x10c/0x178
[   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
[   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
[   30.370008]  alloc_page_interleave+0x24/0x98
[   30.370011]  alloc_pages_current+0xe4/0x108
[   30.370017]  dma_atomic_pool_init+0x44/0x1a4
[   30.370020]  do_one_initcall+0x54/0x228
[   30.370027]  kernel_init_freeable+0x228/0x2cc
[   30.370031]  kernel_init+0x1c/0x110
[   30.370034]  ret_from_fork+0x10/0x18
[   30.370036] Mem-Info:
[   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
[   30.370064]  active_file:0 inactive_file:0 isolated_file:0
[   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
[   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
[   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
[   30.370064]  free:1537719 free_pcp:219 free_cma:0
[   30.370070] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   30.370073] Node 1 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:128kB managed:0kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   30.370084] lowmem_reserve[]: 0 250 6063 6063
[   30.370090] Node 0 DMA32 free:256000kB min:408kB low:664kB high:920kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:269700kB managed:256000kB

Re: [RFC PATCH v4 04/10] vfio/pci: let vfio_pci know number of vendor regions and vendor irqs

2020-06-04 Thread Yan Zhao

On Thu, Jun 04, 2020 at 05:25:15PM +0200, Cornelia Huck wrote:
> On Sun, 17 May 2020 22:49:44 -0400
> Yan Zhao  wrote:
> 
> > This allows a simpler VFIO_DEVICE_GET_INFO ioctl in vendor driver
> > 
> > Cc: Kevin Tian 
> > Signed-off-by: Yan Zhao 
> > ---
> >  drivers/vfio/pci/vfio_pci.c | 23 +--
> >  drivers/vfio/pci/vfio_pci_private.h |  2 ++
> >  include/linux/vfio.h|  3 +++
> >  3 files changed, 26 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 290b7ab55ecf..30137c1c5308 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -105,6 +105,24 @@ void *vfio_pci_vendor_data(void *device_data)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_pci_vendor_data);
> >  
> > +int vfio_pci_set_vendor_regions(void *device_data, int num_vendor_regions)
> > +{
> > +   struct vfio_pci_device *vdev = device_data;
> > +
> > +   vdev->num_vendor_regions = num_vendor_regions;
> 
> Do we need any kind of sanity check here, in case this is called with a
> bogus value?
>
You are right. It at least needs to be >= 0.
Maybe "unsigned int" is a more appropriate type for num_vendor_regions.
We don't need to check its max value, as QEMU would check it.

> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_pci_set_vendor_regions);
> > +
> > +
> > +int vfio_pci_set_vendor_irqs(void *device_data, int num_vendor_irqs)
> > +{
> > +   struct vfio_pci_device *vdev = device_data;
> > +
> > +   vdev->num_vendor_irqs = num_vendor_irqs;
> 
> Here as well.
Yes, I will change the type to "unsigned int".
Thank you for kindly reviewing :)

Yan

> 
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_pci_set_vendor_irqs);
> >  /*
> >   * Our VGA arbiter participation is limited since we don't know anything
> >   * about the device itself.  However, if the device is the only VGA device
> 
> (...)
>
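
The exchange above settles on switching the count to "unsigned int" so that a
negative value cannot be passed in. A minimal userspace sketch of the two
variants follows; the struct and function names (demo_vdev,
demo_set_vendor_regions_signed, demo_set_vendor_regions) are hypothetical
stand-ins for illustration, not the vfio-pci code from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct demo_vdev {
	unsigned int num_vendor_regions;
};

/* With a signed parameter, the setter has to reject negative counts itself. */
static int demo_set_vendor_regions_signed(struct demo_vdev *vdev, int n)
{
	assert(vdev);
	if (n < 0)
		return -EINVAL;
	vdev->num_vendor_regions = (unsigned int)n;
	return 0;
}

/*
 * With an unsigned parameter, a negative count is not representable,
 * so the setter needs no lower-bound check.
 */
static int demo_set_vendor_regions(struct demo_vdev *vdev, unsigned int n)
{
	assert(vdev);
	vdev->num_vendor_regions = n;
	return 0;
}
```

Either variant leaves an upper bound unchecked, which matches the reply's
position that the maximum is validated elsewhere.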

Re: [PATCH] iomap: Handle I/O errors gracefully in page_mkwrite

2020-06-04 Thread Matthew Wilcox

On Fri, Jun 05, 2020 at 10:31:59AM +1000, Dave Chinner wrote:
> On Thu, Jun 04, 2020 at 04:50:50PM -0700, Matthew Wilcox wrote:
> > > Sure, but that's not really what I was asking: why isn't this
> > > !uptodate state caught before the page fault code calls
> > > ->page_mkwrite? The page fault code has a reference to the page,
> > > after all, and in a couple of paths it even has the page locked.
> > 
> > If there's already a PTE present, then the page fault code doesn't
> > check the uptodate bit.  Here's the path I'm looking at:
> > 
> > do_wp_page()
> >  -> vm_normal_page()
> >  -> wp_page_shared()
> >  -> do_page_mkwrite()
> > 
> > I don't see anything in there that checked Uptodate.
> 
> Yup, exactly the code I was looking at when I asked this question.
> The kernel has invalidated the contents of a page, yet we still have
> it mapped into userspace as containing valid contents, and we don't
> check it at all when userspace generates a protection fault on the
> page?

Right.  The iomap error path only clears PageUptodate.  It doesn't go
to the effort of unmapping the page from userspace, so userspace has a
read-only view of a !Uptodate page.

> > I think the iomap code is the only filesystem which clears PageUptodate
> > on errors. 
> 
> I don't think you looked very hard. A quick scan shows at least
> btrfs, f2fs, hostfs, jffs2, reiserfs, vboxfs and anything using the
> iomap path will call ClearPageUptodate() on a write IO error.

I'll give you btrfs and jffs2, but I don't think it's true for f2fs.
The only other filesystem using the iomap buffered IO paths today
is zonefs, afaik.

[PATCH] docs: deprecated.rst: Add zero-length and one-element arrays

2020-06-04 Thread Gustavo A. R. Silva

Add zero-length and one-element arrays to the list.

While I continue replacing zero-length and one-element arrays with
flexible-array members, I need a reference to point people to, so
they don't introduce more instances of such arrays. And while here,
add a note to the "open-coded arithmetic in allocator arguments"
section, on the use of struct_size() and the arrays-to-deprecate
mentioned here.

Signed-off-by: Gustavo A. R. Silva 
---
 Documentation/process/deprecated.rst | 84 
 1 file changed, 84 insertions(+)

diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
index 652e2aa02a66c..622c8ac72a615 100644
--- a/Documentation/process/deprecated.rst
+++ b/Documentation/process/deprecated.rst
@@ -85,6 +85,12 @@ Instead, use the helper::
 
header = kzalloc(struct_size(header, item, count), GFP_KERNEL);
 
+NOTE: If you are using struct_size() on a structure containing a zero-length
+or a one-element array as a trailing array member, stop using such arrays
+and switch to `flexible arrays
+`_
+instead.
+
 See array_size(), array3_size(), and struct_size(),
 for more details as well as the related check_add_overflow() and
 check_mul_overflow() family of functions.
@@ -200,3 +206,81 @@ All switch/case blocks must end in one of:
 * continue;
 * goto ;
 * return [expression];
+
+Zero-length and one-element arrays
+----------------------------------
+Old code in the kernel uses the zero-length and one-element array extensions
+to the C90 standard, but the `preferred mechanism to declare variable-length
+types
+`_
+such as these ones is a `flexible array member`, introduced in C99::
+
+struct something {
+int length;
+char data[];
+};
+
+struct something *instance;
+
+instance = kmalloc(struct_size(instance, data, size), GFP_KERNEL);
+instance->length = size;
+memcpy(instance->data, source, size);
+
+By making use of the mechanism above, we get a compiler error in case the
+flexible array does not occur last in the structure, which helps to prevent
+some kind of `undefined behavior `_ bugs from being inadvertently introduced to
+the codebase.
+
+It is also important to notice that zero-length and one-element arrays pose
+confusion for things like sizeof() and `CONFIG_FORTIFY_SOURCE`. For instance,
+there is no mechanism that warns us that the following application of the
+sizeof() operator to a zero-length array always results in zero::
+
+struct something {
+int length;
+char data[0];
+};
+
+struct something *instance;
+
+instance = kmalloc(struct_size(instance, data, size), GFP_KERNEL);
+instance->length = size;
+memcpy(instance->data, source, size);
+...
+size = sizeof(instance->data);
+
+At the last line of code above, ``size`` turns out to be zero --when one might have
+thought differently. Here are a couple examples of this issue `link 1
+`_,
+`link 2
+`_.
+On the other hand, `flexible array members have incomplete type, and so the sizeof()
+operator may not be applied `_,
+so any  misuse of such  operator will be immediately noticed at build time.
+
+
+With respect to one-element arrays, one has to be acutely aware that `such arrays
+occupy at least as much space as a single object of the type
+`_,
+hence they contribute to the size of the enclosing structure. This is prone
+to error every time people want to calculate the total size of dynamic memory
+to allocate for a structure containing an array of this kind as a member::
+
+struct something {
+int length;
+char data[1];
+};
+
+struct something *instance;
+
+instance = kmalloc(struct_size(instance, data, size - 1), GFP_KERNEL);
+instance->length = size;
+memcpy(instance->data, source, size);
+
+In the example above, we had to remember to calculate ``size - 1`` when using
+the struct_size() helper, otherwise we would have --unintentionally-- allocated
+memory for one too many ``data`` objects. The cleanest and least error-prone way
+to implement this is through the use of a `flexible array member`, which is
+exemplified at the `beginning
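
The size arithmetic described in the patch text can be tried out in userspace.
The sketch below is an illustration only: struct flex, struct one_elem and the
helper names are made up here, and flex_alloc_size() is a plain stand-in for
struct_size() without the overflow checking the kernel helper provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Preferred form: C99 flexible array member, adds no element to sizeof. */
struct flex {
	int length;
	char data[];
};

/* Deprecated form: one-element trailing array, always adds one element. */
struct one_elem {
	int length;
	char data[1];
};

/* Userspace stand-in for struct_size(p, data, n); no overflow checks. */
static size_t flex_alloc_size(size_t n)
{
	return sizeof(struct flex) + n * sizeof(char);
}

/* The one-element variant needs the error-prone "n - 1" adjustment. */
static size_t one_elem_alloc_size(size_t n)
{
	assert(n >= 1);
	return sizeof(struct one_elem) + (n - 1) * sizeof(char);
}

/* Allocate a struct flex holding a copy of n bytes from src. */
static struct flex *flex_new(const char *src, size_t n)
{
	struct flex *p = malloc(flex_alloc_size(n));

	if (!p)
		return NULL;
	p->length = (int)n;
	memcpy(p->data, src, n);
	return p;
}
```

Writing sizeof(p->data) for a struct flex pointer should be rejected by the
compiler because the member has incomplete type, which is the build-time
diagnostic the patch text refers to.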

Re: [PATCH 1/2] proc: use subset option to hide some top-level procfs entries

2020-06-04 Thread kernel test robot

Hi Alexey,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20200604]
[cannot apply to kees/for-next/pstore linus/master linux/master v5.7 v5.7-rc7 v5.7-rc6 v5.7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Alexey-Gladkov/proc-use-subset-option-to-hide-some-top-level-procfs-entries/20200605-040818
base:d4899e5542c15062cc55cac0ca99025bb64edc61
config: arm64-randconfig-c003-20200604 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


coccinelle warnings: (new ones prefixed by >>)

>> fs/proc/generic.c:204:9-10: WARNING: return of 0/1 in function 'is_pde_visible' with return type bool

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

[RFC PATCH 5/5] scsi: ufs: Prepare HPB read for cached sub-region

2020-06-04 Thread Daejun Park

This patch changes the read I/O to the HPB read I/O.

If the logical address of the read I/O belongs to an active sub-region, the
HPB driver modifies the read I/O command to HPB read. It modifies the UPIU
command of UFS instead of modifying the existing SCSI command.

In HPB version 1.0, the maximum read I/O size that can be converted to
HPB read is 4KB.

The dirty map of the active sub-region prevents an incorrect HPB read that
would use a stale physical page number after a previous write I/O has
updated it.

Signed-off-by: Daejun Park 
---
 drivers/scsi/ufs/ufshpb.c | 249 ++
 1 file changed, 249 insertions(+)

diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index f1aa8e7b5ce0..b3e488ef8675 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -46,6 +46,35 @@ static struct ufshpb_driver ufshpb_drv;
 
 static int ufshpb_create_sysfs(struct ufs_hba *hba, struct ufshpb_lu *hpb);
 
+static inline int ufshpb_is_valid_srgn(struct ufshpb_region *rgn,
+struct ufshpb_subregion *srgn)
+{
+   return rgn->rgn_state != HPB_RGN_INACTIVE &&
+   srgn->srgn_state == HPB_SRGN_CLEAN;
+}
+
+static inline bool ufshpb_is_read_cmd(struct scsi_cmnd *cmd)
+{
+   if (cmd->cmnd[0] == READ_10 || cmd->cmnd[0] == READ_16)
+   return true;
+
+   return false;
+}
+
+static inline bool ufshpb_is_write_discard_cmd(struct scsi_cmnd *cmd)
+{
+   if (cmd->cmnd[0] == WRITE_10 || cmd->cmnd[0] == WRITE_16 ||
+   cmd->cmnd[0] == UNMAP)
+   return true;
+
+   return false;
+}
+
+static inline bool ufshpb_is_support_chunk(int transfer_len)
+{
+   return transfer_len <= HPB_MULTI_CHUNK_HIGH;
+}
+
 static inline bool ufshpb_is_general_lun(int lun)
 {
return lun < UFS_UPIU_MAX_UNIT_NUM_ID;
@@ -137,6 +166,225 @@ static inline void ufshpb_lu_put(struct ufshpb_lu *hpb)
put_device(&hpb->hpb_lu_dev);
 }
 
+static inline u32 ufshpb_get_lpn(struct scsi_cmnd *cmnd)
+{
+   return blk_rq_pos(cmnd->request) >>
+   (ilog2(cmnd->device->sector_size) - 9);
+}
+
+static inline unsigned int ufshpb_get_len(struct scsi_cmnd *cmnd)
+{
+   return blk_rq_sectors(cmnd->request) >>
+   (ilog2(cmnd->device->sector_size) - 9);
+}
+
+static void ufshpb_set_ppn_dirty(struct ufshpb_lu *hpb, int rgn_idx,
+int srgn_idx, int srgn_offset, int cnt)
+{
+   struct ufshpb_region *rgn;
+   struct ufshpb_subregion *srgn;
+   int set_bit_len;
+   int bitmap_len = hpb->entries_per_srgn;
+
+next_srgn:
+   rgn = hpb->rgn_tbl + rgn_idx;
+   srgn = rgn->srgn_tbl + srgn_idx;
+
+   if ((srgn_offset + cnt) > bitmap_len)
+   set_bit_len = bitmap_len - srgn_offset;
+   else
+   set_bit_len = cnt;
+
+   if (rgn->rgn_state != HPB_RGN_INACTIVE)
+   bitmap_set(srgn->mctx->ppn_dirty, srgn_offset, set_bit_len);
+
+   srgn_offset = 0;
+   if (++srgn_idx == hpb->srgns_per_rgn) {
+   srgn_idx = 0;
+   rgn_idx++;
+   }
+
+   cnt -= set_bit_len;
+   if (cnt > 0)
+   goto next_srgn;
+
+   WARN_ON(cnt < 0);
+}
+
+static bool ufshpb_test_ppn_dirty(struct ufshpb_lu *hpb, int rgn_idx,
+  int srgn_idx, int srgn_offset, int cnt)
+{
+   struct ufshpb_region *rgn;
+   struct ufshpb_subregion *srgn;
+   int bitmap_len = hpb->entries_per_srgn;
+   int i, bit_len;
+
+next_srgn:
+   rgn = hpb->rgn_tbl + rgn_idx;
+   srgn = rgn->srgn_tbl + srgn_idx;
+
+   if (!ufshpb_is_valid_srgn(rgn, srgn))
+   return true;
+
+   /*
+* If the region state is active, mctx must be allocated.
+* In this case, check whether the region is evicted or
+* mctx allocation failed.
+*/
+   WARN_ON(!srgn->mctx);
+
+   if ((srgn_offset + cnt) > bitmap_len)
+   bit_len = bitmap_len - srgn_offset;
+   else
+   bit_len = cnt;
+
+   for (i = 0; i < bit_len; i++) {
+   if (test_bit(srgn_offset + i, srgn->mctx->ppn_dirty))
+   return true;
+   }
+
+   srgn_offset = 0;
+   if (++srgn_idx == hpb->srgns_per_rgn) {
+   srgn_idx = 0;
+   rgn_idx++;
+   }
+
+   cnt -= bit_len;
+   if (cnt > 0)
+   goto next_srgn;
+
+   return false;
+}
+
+static u64 ufshpb_get_ppn(struct ufshpb_lu *hpb,
+ struct ufshpb_map_ctx *mctx, int pos, int *error)
+{
+   u64 *ppn_table;
+   struct page *page;
+   int index, offset;
+
+   index = pos / (PAGE_SIZE / HPB_ENTRY_SIZE);
+   offset = pos % (PAGE_SIZE / HPB_ENTRY_SIZE);
+
+   page = mctx->m_page[index];
+   if (unlikely(!page)) {
+   *error = -ENOMEM;
+   dev_err(&hpb->hpb_lu_dev,
+   "error. cannot find page in mctx\n");
+   return
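
The walk used by ufshpb_set_ppn_dirty() and ufshpb_test_ppn_dirty() above
splits a range of entries into pieces that each stay inside one sub-region
bitmap. A standalone sketch of that splitting is given below, written as a
loop rather than a backward goto; struct demo_geometry, struct demo_chunk and
demo_split_range() are hypothetical names for illustration and are not part
of the driver.

```c
#include <assert.h>

/* Hypothetical geometry; in the driver these values come from the device. */
struct demo_geometry {
	int entries_per_srgn;	/* bits in one sub-region bitmap */
	int srgns_per_rgn;	/* sub-regions in one region */
};

/* One contiguous piece of a range that stays inside a single sub-region. */
struct demo_chunk {
	int rgn_idx;
	int srgn_idx;
	int offset;
	int len;
};

/*
 * Split cnt entries starting at srgn_offset in (rgn_idx, srgn_idx) into
 * per-sub-region chunks. Returns the number of chunks written to out,
 * or -1 if more than max_chunks would be needed.
 */
static int demo_split_range(const struct demo_geometry *geo,
			    int rgn_idx, int srgn_idx, int srgn_offset,
			    int cnt, struct demo_chunk *out, int max_chunks)
{
	int n = 0;

	assert(geo->entries_per_srgn > 0 && geo->srgns_per_rgn > 0);
	assert(srgn_offset >= 0 && srgn_offset < geo->entries_per_srgn);

	while (cnt > 0) {
		/* Entries left in the current sub-region, capped at cnt. */
		int len = geo->entries_per_srgn - srgn_offset;

		if (len > cnt)
			len = cnt;
		if (n == max_chunks)
			return -1;

		out[n].rgn_idx = rgn_idx;
		out[n].srgn_idx = srgn_idx;
		out[n].offset = srgn_offset;
		out[n].len = len;
		n++;

		/* Later chunks always start at the beginning of a sub-region. */
		srgn_offset = 0;
		if (++srgn_idx == geo->srgns_per_rgn) {
			srgn_idx = 0;
			rgn_idx++;
		}
		cnt -= len;
	}

	return n;
}
```

Because each chunk length is capped at the remaining count, the count cannot
go negative in this form, so the sketch needs no equivalent of the
WARN_ON(cnt < 0) check.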

[PATCH] proc: fix boolreturn.cocci warnings

2020-06-04 Thread kernel test robot

From: kernel test robot 

fs/proc/generic.c:204:9-10: WARNING: return of 0/1 in function 'is_pde_visible' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: Alexey Gladkov 
Signed-off-by: kernel test robot 
---

url:
https://github.com/0day-ci/linux/commits/Alexey-Gladkov/proc-use-subset-option-to-hide-some-top-level-procfs-entries/20200605-040818
base:d4899e5542c15062cc55cac0ca99025bb64edc61

 generic.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -201,7 +201,7 @@ bool is_pde_visible(struct proc_fs_info
int i;
 
if (!fs_info->whitelist || pde == &proc_root)
-   return 1;
+   return true;
 
read_lock(&proc_subdir_lock);
 
@@ -218,7 +218,7 @@ bool is_pde_visible(struct proc_fs_info
 
if (ent == pde) {
read_unlock(&proc_subdir_lock);
-   return 1;
+   return true;
}
 
if (!S_ISDIR(ent->mode))
@@ -228,14 +228,14 @@ bool is_pde_visible(struct proc_fs_info
while (de != &proc_root) {
if (ent == de) {
read_unlock(&proc_subdir_lock);
-   return 1;
+   return true;
}
de = de->parent;
}
}
 
read_unlock(&proc_subdir_lock);
-   return 0;
+   return false;
 }
 
 static DEFINE_IDA(proc_inum_ida);

Re: linux-next: build failure after merge of the sound-current tree

2020-06-04 Thread Macpaul Lin

On Fri, 2020-06-05 at 08:43 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the sound-current tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> /home/sfr/next/next/sound/usb/card.c: In function 'snd_usb_autoresume':
> /home/sfr/next/next/sound/usb/card.c:841:29: error: expected ';' before ')' token
>   841 |atomic_dec(&chip->active))
>   | ^
>   | ;
> 
> Caused by commit
> 
>   3398e5c7b038 ("ALSA: usb-audio: Manage auto-pm of all bundled interfaces")
> 
> I have reverted that commit for today.
> 

Sorry, I tested it by backporting the patch to an older kernel version
(4.14).
After checking the latest patch again, there is indeed a typo here.

Thanks
Macpaul Lin

Re: [RFC PATCH v4 02/10] vfio/pci: macros to generate module_init and module_exit for vendor modules

2020-06-04 Thread Yan Zhao

On Thu, Jun 04, 2020 at 05:01:06PM +0200, Cornelia Huck wrote:
> On Sun, 17 May 2020 22:45:10 -0400
> Yan Zhao  wrote:
> 
> > vendor modules call macro module_vfio_pci_register_vendor_handler to
> > generate module_init and module_exit.
> > It is necessary to ensure that vendor modules always call
> > vfio_pci_register_vendor_driver() on driver loading and
> > vfio_pci_unregister_vendor_driver on driver unloading,
> > because
> > (1) at compiling time, there's only a dependency of vendor modules on
> > vfio_pci.
> > (2) at runtime,
> > - vendor modules add refs of vfio_pci on a successful calling of
> >   vfio_pci_register_vendor_driver() and deref of vfio_pci on a
> >   successful calling of vfio_pci_unregister_vendor_driver().
> > - vfio_pci only adds refs of vendor module on a successful probe of vendor
> >   driver.
> >   vfio_pci derefs vendor module when unbinding from a device.
> > 
> > So, after vfio_pci is unbound from a device, the vendor module to that
> > device is free to get unloaded. However, if that vendor module does not
> > call vfio_pci_unregister_vendor_driver() in its module_exit, vfio_pci may
> > hold a stale pointer to vendor module.
> > 
> > Cc: Kevin Tian 
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Yan Zhao 
> > ---
> >  include/linux/vfio.h | 27 +++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 3e53deb012b6..f3746608c2d9 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -223,4 +223,31 @@ struct vfio_pci_vendor_driver_ops {
> >  };
> >  int __vfio_pci_register_vendor_driver(struct vfio_pci_vendor_driver_ops *ops);
> >  void vfio_pci_unregister_vendor_driver(struct vfio_device_ops *device_ops);
> > +
> > +#define vfio_pci_register_vendor_driver(__name, __probe, __remove, \
> > +   __device_ops)   \
> > +static struct vfio_pci_vendor_driver_ops  __ops ## _node = {   \
> > +   .owner  = THIS_MODULE,  \
> > +   .name   = __name,   \
> > +   .probe  = __probe,  \
> > +   .remove = __remove, \
> > +   .device_ops = __device_ops, \
> > +}; \
> > +__vfio_pci_register_vendor_driver(&__ops ## _node)
> > +
> > +#define module_vfio_pci_register_vendor_handler(name, probe, remove,   \
> > +   device_ops) \
> > +static int __init device_ops ## _module_init(void) \
> > +{  \
> > +   vfio_pci_register_vendor_driver(name, probe, remove,\
> > +   device_ops);\
> 
> What if this function fails (e.g. with -ENOMEM)?
>
Right. I need to return the error in that case.

Thanks for pointing it out!

Yan

> > +   return 0;   \
> > +}; \
> > +static void __exit device_ops ## _module_exit(void)    \
> > +{  \
> > +   vfio_pci_unregister_vendor_driver(device_ops);  \
> > +}; \
> > +module_init(device_ops ## _module_init);   \
> > +module_exit(device_ops ## _module_exit)
> > +
> >  #endif /* VFIO_H */
>
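
The review point above is that the generated init function returns 0 even
when registration fails. A small userspace sketch of the difference follows;
demo_register_vendor_driver() is a hypothetical stand-in that simply returns
a simulated result, not the real vfio-pci registration call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the registration call: 0 or a negative errno. */
static int demo_register_vendor_driver(int simulated_result)
{
	assert(simulated_result <= 0);
	return simulated_result;
}

/* Mirrors the reviewed macro: the registration result is dropped. */
static int demo_init_ignoring_errors(int simulated_result)
{
	demo_register_vendor_driver(simulated_result);
	return 0;
}

/* Variant that hands a registration failure back to the caller. */
static int demo_init_propagating_errors(int simulated_result)
{
	return demo_register_vendor_driver(simulated_result);
}
```

With the first form a failed registration such as -ENOMEM would still look
like a successful module load; the second form lets the caller see the
failure.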

Re: [RFC PATCH v4 07/10] vfio/pci: introduce a new irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION

2020-06-04 Thread Yan Zhao

On Wed, Jun 03, 2020 at 10:10:58PM -0600, Alex Williamson wrote:
> On Wed, 3 Jun 2020 22:42:28 -0400
> Yan Zhao  wrote:
> 
> > On Wed, Jun 03, 2020 at 05:04:52PM -0600, Alex Williamson wrote:
> > > On Tue, 2 Jun 2020 21:40:58 -0400
> > > Yan Zhao  wrote:
> > >   
> > > > On Tue, Jun 02, 2020 at 01:34:35PM -0600, Alex Williamson wrote:  
> > > > > I'm not at all happy with this.  Why do we need to hide the migration
> > > > > sparse mmap from the user until migration time?  What if instead we
> > > > > introduced a new VFIO_REGION_INFO_CAP_SPARSE_MMAP_SAVING capability
> > > > > where the existing capability is the normal runtime sparse setup and
> > > > > the user is required to use this new one prior to enabled device_state
> > > > > with _SAVING.  The vendor driver could then simply track mmap vmas to
> > > > > the region and refuse to change device_state if there are outstanding
> > > > > mmaps conflicting with the _SAVING sparse mmap layout.  No new IRQs
> > > > > required, no new irqfds, an incremental change to the protocol,
> > > > > backwards compatible to the extent that a vendor driver requiring this
> > > > > will automatically fail migration.
> > > > > 
> > > > right. looks we need to use this approach to solve the problem.
> > > > thanks for your guide.
> > > > so I'll abandon the current remap irq way for dirty tracking during live
> > > > migration.
> > > > but anyway, it demos how to customize irq_types in vendor drivers.
> > > > then, what do you think about patches 1-5?  
> > > 
> > > In broad strokes, I don't think we've found the right solution yet.  I
> > > really question whether it's supportable to parcel out vfio-pci like
> > > this and I don't know how I'd support unraveling whether we have a bug
> > > in vfio-pci, the vendor driver, or how the vendor driver is making use
> > > of vfio-pci.
> > >
> > > Let me also ask, why does any of this need to be in the kernel?  We
> > > spend 5 patches slicing up vfio-pci so that we can register a vendor
> > > driver and have that vendor driver call into vfio-pci as it sees fit.
> > > We have two patches creating device specific interrupts and a BAR
> > > remapping scheme that we've decided we don't need.  That brings us to
> > > the actual i40e vendor driver, where the first patch is simply making
> > > the vendor driver work like vfio-pci already does, the second patch is
> > > handling the migration region, and the third patch is implementing the
> > > BAR remapping IRQ that we decided we don't need.  It's difficult to
> > > actually find the small bit of code that's required to support
> > > migration outside of just dealing with the protocol we've defined to
> > > expose this from the kernel.  So why are we trying to do this in the
> > > kernel?  We have quirk support in QEMU, we can easily flip
> > > MemoryRegions on and off, etc.  What access to the device outside of
> > > what vfio-pci provides to the user, and therefore QEMU, is necessary to
> > > implement this migration support for i40e VFs?  Is this just an
> > > exercise in making use of the migration interface?  Thanks,
> > >   
> > hi Alex
> > 
> > There was a description of intention of this series in RFC v1
> > (https://www.spinics.net/lists/kernel/msg3337337.html).
> > sorry, I didn't include it in starting from RFC v2.
> > 
> > "
> > The reason why we don't choose the way of writing mdev parent driver is
> > that
> 
> I didn't mention an mdev approach, I'm asking what are we accomplishing
> by doing this in the kernel at all versus exposing the device as normal
> through vfio-pci and providing the migration support in QEMU.  Are you
> actually leveraging having some sort of access to the PF in supporting
> migration of the VF?  Is vfio-pci masking the device in a way that
> prevents migrating the state from QEMU?
>
Yes, communication with the PF is required. VF state is managed by the PF and
is queried from the PF when the VF is stopped.

Migration support in QEMU seems suitable only for devices whose dirty
pages and device state are available by reading/writing device MMIOs, which
is not the case for most devices.

> > (1) VFs are almost all the time directly passthroughed. Directly binding
> > to vfio-pci can make most of the code shared/reused. If we write a
> > vendor specific mdev parent driver, most of the code (like passthrough
> > style of rw/mmap) still needs to be copied from vfio-pci driver, which is
> > actually a duplicated and tedious work.
> > (2) For features like dynamically trap/untrap pci bars, if they are in
> > vfio-pci, they can be available to most people without repeated code
> > copying and re-testing.
> > (3) with a 1:1 mdev driver which passes through VFs most of the time, people
> > have to decide whether to bind VFs to vfio-pci or mdev parent driver before
> > it runs into a real migration need. However, if vfio-pci is bound
> > initially, they have no chance to do live migration when there's a need
> > later.
> > "
> > particularly, there're some devices (like

Re: Reply: Re: [RFC PATCH 0/8] dax: Add a dax-rmap tree to support reflink

2020-06-04 Thread Ruan Shiyang





On 2020/6/4 10:51 PM, Darrick J. Wong wrote:

On Thu, Jun 04, 2020 at 03:37:42PM +0800, Ruan Shiyang wrote:



On 2020/4/28 2:43 PM, Dave Chinner wrote:

On Tue, Apr 28, 2020 at 06:09:47AM +, Ruan, Shiyang wrote:


On 2020/4/27 20:28:36, "Matthew Wilcox" wrote:


On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:

   This patchset is a try to resolve the shared 'page cache' problem for
   fsdax.

   In order to track multiple mappings and indexes on one page, I
   introduced a dax-rmap rb-tree to manage the relationship.  A dax entry
   will be associated more than once if is shared.  At the second time we
   associate this entry, we create this rb-tree and store its root in
   page->private(not used in fsdax).  Insert (->mapping, ->index) when
   dax_associate_entry() and delete it when dax_disassociate_entry().


Do we really want to track all of this on a per-page basis?  I would
have thought a per-extent basis was more useful.  Essentially, create
a new address_space for each shared extent.  Per page just seems like
a huge overhead.


Per-extent tracking is a nice idea for me.  I haven't thought of it
yet...

But the extent info is maintained by filesystem.  I think we need a way
to obtain this info from FS when associating a page.  May be a bit
complicated.  Let me think about it...


That's why I want the -user of this association- to do a filesystem
callout instead of keeping its own naive tracking infrastructure.
The filesystem can do an efficient, on-demand reverse mapping lookup
from its own extent tracking infrastructure, and there's zero
runtime overhead when there are no errors present.


Hi Dave,

I ran into some difficulties when trying to implement the per-extent rmap
tracking.  So, I re-read your comments and found that I was misunderstanding
what you described here.

I think what you mean is: we don't need the in-memory dax-rmap tracking now.
Just ask the FS for the owner information associated with a page
when a memory failure occurs.  So, the per-page (even per-extent) dax-rmap is
unnecessary in this case.  Is this right?


Right.  XFS already has its own rmap tree.


Based on this, we only need to store the extent information of a fsdax page
in its ->mapping (by searching from FS).  Then obtain the owners of this
page (also by searching from FS) when memory-failure or other rmap case
occurs.


I don't even think you need that much.  All you need is the "physical"
offset of that page within the pmem device (e.g. 'this is the 307th 4k
page == offset 1257472 since the start of /dev/pmem0') and xfs can look
up the owner of that range of physical storage and deal with it as
needed.


Yes, I think so.




So, a fsdax page is no longer associated with a specific file, but with a
FS(or the pmem device).  I think it's easier to understand and implement.


Yes.  I also suspect this will be necessary to support reflink...

--D


OK, Thank you very much.


--
Thanks,
Ruan Shiyang.





--
Thanks,
Ruan Shiyang.


At the moment, this "dax association" is used to "report" a storage
media error directly to userspace. I say "report" because what it
does is kill userspace processes dead. The storage media error
actually needs to be reported to the owner of the storage media,
which in the case of FS-DAX is the filesystem.

That way the filesystem can then look up all the owners of that bad
media range (i.e. the filesystem block it corresponds to) and take
appropriate action. e.g.

- if it falls in filesystem metadata, shutdown the filesystem
for each mapping/index tuple the filesystem finds for the given
LBA address that the media error occurred.

Right now if the media error is in filesystem metadata, the
filesystem isn't even told about it. The filesystem can't even shut
down - the error is just dropped on the floor and it won't be until
the filesystem next tries to reference that metadata that we notice
there is an issue.

Cheers,

Dave.
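
The "307th 4k page == offset 1257472" remark earlier in this thread is plain
shift arithmetic. A tiny sketch, assuming 4 KiB pages and a zero-based page
index; the macro and helper names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* 4 KiB pages, as in the quoted example */

/* Byte offset of a page from the start of the device. */
static uint64_t demo_page_to_offset(uint64_t page_index)
{
	/* Guard against the shift overflowing 64 bits. */
	assert(page_index <= (UINT64_MAX >> DEMO_PAGE_SHIFT));
	return page_index << DEMO_PAGE_SHIFT;
}

/* Index of the page that contains a given byte offset. */
static uint64_t demo_offset_to_page(uint64_t byte_offset)
{
	return byte_offset >> DEMO_PAGE_SHIFT;
}
```

An offset of this kind is all the filesystem would need in order to look up
the owners of the affected range, as suggested in the thread.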

[PATCH V3 0/3] firmware: imx: fix/update cm4 and add resource check

2020-06-04 Thread peng . fan

From: Peng Fan 

V3:
 Fix comments
 Add R-b tag

V2:
 Add R-b tag
 Drop patch 3/4 from V1
 Add comments and update Copyright for patch 2/3
 keep code consistency in Patch 3/3

V1:
 https://patchwork.kernel.org/cover/11505045/

Fix cm40 power domain, update to add more cm4 resources
Add resource owner check, to not register if not owned by Linux.

Peng Fan (3):
  firmware: imx: scu-pd: fix cm40 power domain
  firmware: imx: add resource management api
  firmware: imx: scu-pd: add more cm4 resources

 drivers/firmware/imx/Makefile   |  2 +-
 drivers/firmware/imx/rm.c   | 45 
 drivers/firmware/imx/scu-pd.c   | 14 ++--
 include/linux/firmware/imx/sci.h|  1 +
 include/linux/firmware/imx/svc/rm.h | 69 +
 5 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100644 drivers/firmware/imx/rm.c
 create mode 100644 include/linux/firmware/imx/svc/rm.h

-- 
2.16.4

[RFC PATCH 4/5] scsi: ufs: L2P map management for HPB read

2020-06-04 Thread Daejun Park

This is a patch for managing L2P map in HPB module.

The HPB divides logical addresses into several regions. A region consists
of several sub-regions. The sub-region is a basic unit where L2P mapping is
managed. The driver loads L2P mapping data of each sub-region. The loaded
sub-region is called active-state. The HPB driver unloads L2P mapping data
in units of regions. The unloaded region is called inactive-state.
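
The region/sub-region bookkeeping described above can be sketched in plain
C. This is a minimal illustration, not the driver's data structure: the
sizes (SRGNS_PER_RGN, ENTRIES_PER_SRGN, NR_RGNS) and all hpb_* helper names
are assumptions chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry, chosen only for illustration. */
#define NR_RGNS			8
#define SRGNS_PER_RGN		4
#define ENTRIES_PER_SRGN	1024	/* logical blocks per sub-region */

/* One flag per sub-region: true while its L2P data is loaded. */
struct hpb_map {
	bool srgn_active[NR_RGNS][SRGNS_PER_RGN];
};

static unsigned int lpn_to_rgn(uint64_t lpn)
{
	return (unsigned int)(lpn / (ENTRIES_PER_SRGN * SRGNS_PER_RGN));
}

static unsigned int lpn_to_srgn(uint64_t lpn)
{
	return (unsigned int)((lpn / ENTRIES_PER_SRGN) % SRGNS_PER_RGN);
}

/* Loading is done per sub-region. */
static void hpb_activate_srgn(struct hpb_map *m, unsigned int rgn,
			      unsigned int srgn)
{
	m->srgn_active[rgn][srgn] = true;
}

/* Unloading is done per region: every sub-region in it is dropped. */
static void hpb_inactivate_rgn(struct hpb_map *m, unsigned int rgn)
{
	unsigned int i;

	for (i = 0; i < SRGNS_PER_RGN; i++)
		m->srgn_active[rgn][i] = false;
}

/* True if the L2P entry for this logical block is currently loaded. */
static bool hpb_lpn_is_mapped(const struct hpb_map *m, uint64_t lpn)
{
	unsigned int rgn = lpn_to_rgn(lpn);

	if (rgn >= NR_RGNS)
		return false;
	return m->srgn_active[rgn][lpn_to_srgn(lpn)];
}
```

The asymmetry is the point: activation touches one sub-region, while
inactivation clears a whole region at once.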

Sub-region/region candidates to be loaded and unloaded are delivered from
the UFS device. The UFS device delivers the sub-regions recommended for
activation and the regions recommended for inactivation to the driver
using sense data.
The HPB module performs L2P mapping management on the host through the
delivered information.

A pinned region is a pre-set region on the UFS device that is always
active-state and

The data structure for map data request and L2P map uses mempool API,
minimizing allocation overhead while avoiding static allocation.
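
The idea behind a mempool (a small reserve of pre-allocated elements that
backs up the normal allocator) can be sketched in userspace C. This is not
the kernel mempool API: struct mini_pool, the mini_pool_* functions and the
fail_alloc test hook are hypothetical, and unlike the kernel version the
sketch returns NULL instead of waiting when the reserve is empty.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN_NR 2	/* assumed reserve size */

struct mini_pool {
	void *reserve[POOL_MIN_NR];	/* pre-allocated fallback elements */
	int curr_nr;			/* elements currently in reserve */
	size_t elem_size;
	bool fail_alloc;		/* test hook: simulate malloc failure */
};

static bool mini_pool_init(struct mini_pool *p, size_t elem_size)
{
	p->curr_nr = 0;
	p->elem_size = elem_size;
	p->fail_alloc = false;

	/* Fill the reserve up front so later allocations can fall back. */
	while (p->curr_nr < POOL_MIN_NR) {
		void *e = malloc(elem_size);

		if (!e) {
			while (p->curr_nr > 0)
				free(p->reserve[--p->curr_nr]);
			return false;
		}
		p->reserve[p->curr_nr++] = e;
	}
	return true;
}

static void *mini_pool_alloc(struct mini_pool *p)
{
	/* Try the normal allocator first, then fall back to the reserve. */
	void *e = p->fail_alloc ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->curr_nr > 0)
		return p->reserve[--p->curr_nr];
	return NULL;
}

static void mini_pool_free(struct mini_pool *p, void *e)
{
	if (!e)
		return;
	/* Refill the reserve before giving memory back to the allocator. */
	if (p->curr_nr < POOL_MIN_NR)
		p->reserve[p->curr_nr++] = e;
	else
		free(e);
}

static void mini_pool_destroy(struct mini_pool *p)
{
	while (p->curr_nr > 0)
		free(p->reserve[--p->curr_nr]);
}
```

The reserve is sized at init time, which is how this approach avoids a
fixed static array sized for the worst case.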

Signed-off-by: Daejun Park 
---
 drivers/scsi/ufs/ufshpb.c | 1005 -
 drivers/scsi/ufs/ufshpb.h |   72 +++
 2 files changed, 1073 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index cb0ad4d16d0f..f1aa8e7b5ce0 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -46,6 +46,63 @@ static struct ufshpb_driver ufshpb_drv;
 
 static int ufshpb_create_sysfs(struct ufs_hba *hba, struct ufshpb_lu *hpb);
 
+static inline bool ufshpb_is_general_lun(int lun)
+{
+   return lun < UFS_UPIU_MAX_UNIT_NUM_ID;
+}
+
+static inline bool
+ufshpb_is_pinned_region(struct ufshpb_lu *hpb, int rgn_idx)
+{
+   if (hpb->lu_pinned_end != PINNED_NOT_SET &&
+   rgn_idx >= hpb->lu_pinned_start &&
+   rgn_idx <= hpb->lu_pinned_end)
+   return true;
+
+   return false;
+}
+
+static bool ufshpb_is_empty_rsp_lists(struct ufshpb_lu *hpb)
+{
+   bool ret = true;
+   unsigned long flags;
+
+   spin_lock_irqsave(&hpb->rsp_list_lock, flags);
+   if (!list_empty(&hpb->lh_inact_rgn) || !list_empty(&hpb->lh_act_srgn))
+   ret = false;
+   spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
+
+   return ret;
+}
+
+static inline int ufshpb_may_field_valid(struct ufs_hba *hba,
+struct ufshcd_lrb *lrbp,
+struct ufshpb_rsp_field *rsp_field)
+{
+   if (be16_to_cpu(rsp_field->sense_data_len) != DEV_SENSE_SEG_LEN ||
+   rsp_field->desc_type != DEV_DES_TYPE ||
+   rsp_field->additional_len != DEV_ADDITIONAL_LEN ||
+   rsp_field->hpb_type == HPB_RSP_NONE ||
+   rsp_field->active_rgn_cnt > MAX_ACTIVE_NUM ||
+   rsp_field->inactive_rgn_cnt > MAX_INACTIVE_NUM ||
+   (!rsp_field->active_rgn_cnt && !rsp_field->inactive_rgn_cnt))
+   return -EINVAL;
+
+   if (!ufshpb_is_general_lun(lrbp->lun)) {
+   dev_warn(hba->dev, "ufshpb: lun(%d) not supported\n",
+lrbp->lun);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+
+static inline struct ufshpb_lu *ufshpb_get_hpb_data(struct scsi_cmnd *cmd)
+{
+   return cmd->device->hostdata;
+}
+
 static inline int ufshpb_get_state(struct ufshpb_lu *hpb)
 {
    return atomic_read(&hpb->hpb_state);
@@ -80,6 +137,789 @@ static inline void ufshpb_lu_put(struct ufshpb_lu *hpb)
    put_device(&hpb->hpb_lu_dev);
 }
 
+static struct ufshpb_req *ufshpb_get_map_req(struct ufshpb_lu *hpb,
+struct ufshpb_subregion *srgn)
+{
+   struct ufshpb_req *map_req;
+   struct request *req;
+   struct bio *bio;
+
+   map_req = kmem_cache_alloc(hpb->map_req_cache, GFP_KERNEL);
+   if (!map_req)
+   return NULL;
+
+   req = blk_get_request(hpb->sdev_ufs_lu->request_queue,
+ REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
+   if (IS_ERR(req))
+   goto free_map_req;
+
+   bio = bio_alloc(GFP_KERNEL, hpb->pages_per_srgn);
+   if (!bio) {
+   blk_put_request(req);
+   goto free_map_req;
+   }
+
+   map_req->hpb = hpb;
+   map_req->req = req;
+   map_req->bio = bio;
+
+   map_req->rgn_idx = srgn->rgn_idx;
+   map_req->srgn_idx = srgn->srgn_idx;
+   map_req->mctx = srgn->mctx;
+   map_req->lun = hpb->lun;
+
+   return map_req;
+free_map_req:
+   kmem_cache_free(hpb->map_req_cache, map_req);
+   return NULL;
+}
+
+static inline void ufshpb_put_map_req(struct ufshpb_lu *hpb,
+ struct ufshpb_req *map_req)
+{
+   bio_put(map_req->bio);
+   blk_put_request(map_req->req);
+   kmem_cache_free(hpb->map_req_cache, map_req);
+}
+
+
+static inline int ufshpb_clear_dirty_bitmap(struct ufshpb_lu *hpb,
+struct ufshpb_subregion *srgn)
+{
+   WARN_ON(!srgn->mctx);
+   bitmap_zero(srgn->mctx->ppn_dirty,
