Re: [RFC PATCH] interconnect: Replace of_icc_get() with icc_get() and reduce DT binding

2019-09-24 Thread Bjorn Andersson
On Tue 24 Sep 22:41 PDT 2019, Stephen Boyd wrote:

> I don't see any users of icc_get() in the kernel today, and adding them
> doesn't make sense. That's because adding calls to that function in a
> driver will make the driver SoC specific given that the arguments are
> some sort of source and destination numbers that would typically be
> listed in DT or come from platform data so they can match a global
> numberspace of interconnect numbers. It would be better to follow the
> approach of other kernel frameworks where the API is the same no matter
> how the platform is described (i.e. platform data, DT, ACPI, etc.) and
> swizzle the result in the framework to match whatever the device is by
> checking for a DT node pointer or a fwnode pointer, etc. Therefore,
> install icc_get() as the de facto API and make drivers use that instead
> of of_icc_get() which implies the driver is DT specific when it doesn't
> need to be.
> 

+1 on this part!

> The DT binding could also be simplified somewhat. Currently a path needs
> to be specified in DT for each and every use case that is possible for a
> device to want. Typically the path is to memory, which looks to be
> reserved in the binding with the "dma-mem" named path, but sometimes
> the path is from a device to the CPU or more generically from a device
> to another device which could be a CPU, cache, DMA master, or another
> device if some sort of DMA to DMA scenario is happening. Let's remove
> the pair part of the binding so that we just list out a device's
> possible endpoints on the bus or busses that it's connected to.
> 
> If the kernel wants to figure out what the path is to memory or the CPU
> or a cache or something else it should be able to do that by finding the
> node for the "destination" endpoint, extracting that node's
> "interconnects" property, and deriving the path in software. For
> example, we shouldn't need to write out each use case path by path in DT
> for each endpoint node that wants to set a bandwidth to memory. We
> should just be able to indicate what endpoint(s) a device sits on based
> on the interconnect provider in the system and then walk the various
> interconnects to find the path from that source endpoint to the
> destination endpoint.
> 

But doesn't this imply that the other end of the path is always some
specific node, e.g. DDR? With a single node how would you describe
CPU->LLCC or GPU->OCIMEM?

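To make the concern concrete, here is a purely hypothetical consumer node (the provider phandles and specifier macros below are invented for illustration, not taken from any real binding). With the current pair-based binding the two destinations are distinct paths; with a single source endpoint, nothing distinguishes them:

```dts
/* Hypothetical example -- provider names and specifiers are illustrative */

/* Today: each use case is an explicit (source, destination) pair */
gpu@5000000 {
	interconnects = <&gfx_noc MASTER_GFX &mem_noc SLAVE_DDR>,
			<&gfx_noc MASTER_GFX &sys_noc SLAVE_OCIMEM>;
	interconnect-names = "gfx-mem", "gfx-ocimem";
};

/* Proposed: only the source endpoint -- the GPU->DDR and GPU->OCIMEM
 * destinations would have to be recovered some other way */
gpu@5000000 {
	interconnects = <&gfx_noc MASTER_GFX>;
};
```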
> Obviously this patch doesn't compile but I'm sending it out to start
> this discussion so we don't get stuck on the binding or the kernel APIs
> for a long time. It looks like we should be OK in terms of backwards
> compatibility because we can just ignore the second element in an old
> binding, but maybe we'll want to describe paths in different directions
> (e.g. the path from the CPU to the SD controller may be different than
> the path the SD controller takes to the CPU) and that may require
> extending interconnect-names to indicate what direction/sort of path it
> is. I'm basically thinking about master vs. slave ports in AXI land.
> 
> Cc: Maxime Ripard 
> Cc: 
> Cc: Rob Herring 
> Cc: 
> Cc: Bjorn Andersson 
> Cc: Evan Green 
> Cc: David Dai 
> Signed-off-by: Stephen Boyd 
> ---
>  .../bindings/interconnect/interconnect.txt| 19 ---
>  include/linux/interconnect.h  | 13 ++---
>  2 files changed, 6 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/interconnect/interconnect.txt b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> index 6f5d23a605b7..f8979186b8a7 100644
> --- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
> +++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> @@ -11,7 +11,7 @@ The interconnect provider binding is intended to represent the interconnect
>  controllers in the system. Each provider registers a set of interconnect
>  nodes, which expose the interconnect related capabilities of the interconnect
>  to consumer drivers. These capabilities can be throughput, latency, priority
> -etc. The consumer drivers set constraints on interconnect path (or endpoints)
> +etc. The consumer drivers set constraints on interconnect paths (or endpoints)
>  depending on the use case. Interconnect providers can also be interconnect
>  consumers, such as in the case where two network-on-chip fabrics interface
>  directly.
> @@ -42,23 +42,12 @@ multiple paths from different providers depending on use case and the
>  components it has to interact with.
>  
>  Required properties:
> -interconnects : Pairs of phandles and interconnect provider specifier to denote
> -		 the edge source and destination ports of the interconnect path.
> -
> -Optional properties:
> -interconnect-names : List of interconnect path name strings sorted in the same
> -		     order as the interconnects property. Consumers drivers will use
> -		     interconnect-names to match interconnect paths with interconnect
> -		     specifier pairs.

Re: latest git kernel (v5.3-11506-gf7c3bf8fa7e5) does not compile

2019-09-24 Thread Masahiro Yamada
Hi Anatoly,


On Sun, Sep 22, 2019 at 9:14 PM Anatoly Pugachev  wrote:

> > Thanks for the report, and apology for the breakage.
> >
> > Please check this patch.
> > https://lore.kernel.org/patchwork/patch/1130469/
> >
> > I hope it will fix the build error.
>
>
> It does. Thanks Masahiro!

Thanks for testing!

Could you please give your Tested-by in the reply to my patch?

With Tested-by from the reporter,
I think the maintainer will be able to pick up the patch
more confidently.

Thanks.



-- 
Best Regards
Masahiro Yamada


[PATCH] KVM: LAPIC: Loosen fluctuation filter for auto-tuning lapic_timer_advance_ns

2019-09-24 Thread Wanpeng Li
From: Wanpeng Li 

A 5000 guest-cycles delta is easy to encounter on a desktop, so the
per-vCPU lapic_timer_advance_ns always stays at its 1000ns initial
value. Let's loosen the fluctuation filter a bit so that the auto-tune
logic can make some progress.

Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3a3a685..258407e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -67,7 +67,7 @@
 
 static bool lapic_timer_advance_dynamic __read_mostly;
 #define LAPIC_TIMER_ADVANCE_ADJUST_MIN 100
-#define LAPIC_TIMER_ADVANCE_ADJUST_MAX 5000
+#define LAPIC_TIMER_ADVANCE_ADJUST_MAX 10000
 #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
@@ -1504,7 +1504,7 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
timer_advance_ns += ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP;
}
 
-   if (unlikely(timer_advance_ns > LAPIC_TIMER_ADVANCE_ADJUST_MAX))
+   if (unlikely(timer_advance_ns > LAPIC_TIMER_ADVANCE_ADJUST_MAX/2))
timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 }
-- 
2.7.4



[RFC PATCH] interconnect: Replace of_icc_get() with icc_get() and reduce DT binding

2019-09-24 Thread Stephen Boyd
I don't see any users of icc_get() in the kernel today, and adding them
doesn't make sense. That's because adding calls to that function in a
driver will make the driver SoC specific given that the arguments are
some sort of source and destination numbers that would typically be
listed in DT or come from platform data so they can match a global
numberspace of interconnect numbers. It would be better to follow the
approach of other kernel frameworks where the API is the same no matter
how the platform is described (i.e. platform data, DT, ACPI, etc.) and
swizzle the result in the framework to match whatever the device is by
checking for a DT node pointer or a fwnode pointer, etc. Therefore,
install icc_get() as the de facto API and make drivers use that instead
of of_icc_get() which implies the driver is DT specific when it doesn't
need to be.

The DT binding could also be simplified somewhat. Currently a path needs
to be specified in DT for each and every use case that is possible for a
device to want. Typically the path is to memory, which looks to be
reserved in the binding with the "dma-mem" named path, but sometimes
the path is from a device to the CPU or more generically from a device
to another device which could be a CPU, cache, DMA master, or another
device if some sort of DMA to DMA scenario is happening. Let's remove
the pair part of the binding so that we just list out a device's
possible endpoints on the bus or busses that it's connected to.

If the kernel wants to figure out what the path is to memory or the CPU
or a cache or something else it should be able to do that by finding the
node for the "destination" endpoint, extracting that node's
"interconnects" property, and deriving the path in software. For
example, we shouldn't need to write out each use case path by path in DT
for each endpoint node that wants to set a bandwidth to memory. We
should just be able to indicate what endpoint(s) a device sits on based
on the interconnect provider in the system and then walk the various
interconnects to find the path from that source endpoint to the
destination endpoint.

Obviously this patch doesn't compile but I'm sending it out to start
this discussion so we don't get stuck on the binding or the kernel APIs
for a long time. It looks like we should be OK in terms of backwards
compatibility because we can just ignore the second element in an old
binding, but maybe we'll want to describe paths in different directions
(e.g. the path from the CPU to the SD controller may be different than
the path the SD controller takes to the CPU) and that may require
extending interconnect-names to indicate what direction/sort of path it
is. I'm basically thinking about master vs. slave ports in AXI land.

Cc: Maxime Ripard 
Cc: 
Cc: Rob Herring 
Cc: 
Cc: Bjorn Andersson 
Cc: Evan Green 
Cc: David Dai 
Signed-off-by: Stephen Boyd 
---
 .../bindings/interconnect/interconnect.txt| 19 ---
 include/linux/interconnect.h  | 13 ++---
 2 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/Documentation/devicetree/bindings/interconnect/interconnect.txt b/Documentation/devicetree/bindings/interconnect/interconnect.txt
index 6f5d23a605b7..f8979186b8a7 100644
--- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
+++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
@@ -11,7 +11,7 @@ The interconnect provider binding is intended to represent the interconnect
 controllers in the system. Each provider registers a set of interconnect
 nodes, which expose the interconnect related capabilities of the interconnect
 to consumer drivers. These capabilities can be throughput, latency, priority
-etc. The consumer drivers set constraints on interconnect path (or endpoints)
+etc. The consumer drivers set constraints on interconnect paths (or endpoints)
 depending on the use case. Interconnect providers can also be interconnect
 consumers, such as in the case where two network-on-chip fabrics interface
 directly.
@@ -42,23 +42,12 @@ multiple paths from different providers depending on use case and the
 components it has to interact with.
 
 Required properties:
-interconnects : Pairs of phandles and interconnect provider specifier to denote
-   the edge source and destination ports of the interconnect path.
-
-Optional properties:
-interconnect-names : List of interconnect path name strings sorted in the same
-		     order as the interconnects property. Consumers drivers will use
-		     interconnect-names to match interconnect paths with interconnect
-specifier pairs.
-
- Reserved interconnect names:
-* dma-mem: Path from the device to the main memory of
-   the system
+interconnects : phandle and interconnect provider specifier to denote
+   the edge source for this node.
 
 Example:
 

Re: [PATCH xfstests v2] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Amir Goldstein
On Wed, Sep 25, 2019 at 6:27 AM Zhihao Cheng  wrote:
>
> There are indeed many '-b' options in xfstests. I only confirmed the
> overlay test line; I need to reconfirm the other '-b' test options later.
>

FWIW, I eyeballed blockdev related overlayfs common code bits
and all I found out of order was:

@@ -3100,7 +3100,7 @@ _require_scratch_shutdown()
 	# SCRATCH_DEV, in this case OVL_BASE_SCRATCH_DEV
 	# will be null, so check OVL_BASE_SCRATCH_DEV before
 	# running shutdown to avoid shutting down base fs accidently.
-	_notrun "$SCRATCH_DEV is not a block device"
+	_notrun "this test requires a valid \$OVL_BASE_SCRATCH_DEV as ovl base fs"
 	else
 		src/godown -f $OVL_BASE_SCRATCH_MNT 2>&1 \
 			|| _notrun "Underlying filesystem does not support shutdown"


Zhihao,

That's all I meant in the nit.
The v1 commit message was perfectly fine, there was no need to change it at all.

Thanks,
Amir.


Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces

2019-09-24 Thread Peter Xu
On Mon, Sep 23, 2019 at 08:24:52PM +0800, Lu Baolu wrote:
> This adds functions to manipulate first level page tables
> which could be used by a scalable mode capable IOMMU unit.
> 
> intel_mmmap_range(domain, addr, end, phys_addr, prot)
>  - Map an iova range of [addr, end) to the physical memory
>started at @phys_addr with the @prot permissions.
> 
> intel_mmunmap_range(domain, addr, end)
>  - Tear down the map of an iova range [addr, end). A page
>list will be returned which will be freed after iotlb
>flushing.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Cc: Yi Sun 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/Makefile |   2 +-
>  drivers/iommu/intel-pgtable.c  | 342 +
>  include/linux/intel-iommu.h|  24 +-
>  include/trace/events/intel_iommu.h |  60 +
>  4 files changed, 426 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/iommu/intel-pgtable.c
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 4f405f926e73..dc550e14cc58 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -17,7 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
> -obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
> +obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o intel-pgtable.o
>  obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
>  obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
>  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
> diff --git a/drivers/iommu/intel-pgtable.c b/drivers/iommu/intel-pgtable.c
> new file mode 100644
> index ..8e95978cd381
> --- /dev/null
> +++ b/drivers/iommu/intel-pgtable.c
> @@ -0,0 +1,342 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/**
> + * intel-pgtable.c - Intel IOMMU page table manipulation library

Could this be a bit misleading?  Normally I use "IOMMU page table" to
refer to the 2nd level page table only, and I've always understood it
as "the new IOMMU will understand the MMU page table as the 1st
level".  At least mention "IOMMU 1st level page table"?

> + *
> + * Copyright (C) 2019 Intel Corporation
> + *
> + * Author: Lu Baolu 
> + */
> +
> +#define pr_fmt(fmt) "DMAR: " fmt
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_X86
> +/*
> + * mmmap: Map a range of IO virtual address to physical addresses.

"... to physical addresses using MMU page table"?

Might be clearer?

> + */
> +#define pgtable_populate(domain, nm) \
> +do { \
> + void *__new = alloc_pgtable_page(domain->nid);  \
> + if (!__new) \
> + return -ENOMEM; \
> + smp_wmb();  \

Could I ask what's this wmb used for?

> + spin_lock(&(domain)->page_table_lock);  \

Is this intended to lock here instead of taking the lock during the
whole page table walk?  Is it safe?

Taking the example where nm==PTE: when we reach here how do we
guarantee that the PMD page that has this PTE is still valid?

> + if (nm ## _present(*nm)) {  \
> + free_pgtable_page(__new);   \
> + } else {\
> + set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));\

It seems to me that PV could trap calls to set_pte().  Then these
could also be trapped by e.g. Xen?  Are these traps needed?  Is there
side effect?  I'm totally not familiar with this, but just ask aloud...

> + domain_flush_cache(domain, nm, sizeof(nm##_t)); \
> + }   \
> + spin_unlock(&(domain)->page_table_lock);\
> +} while(0);
> +
> +static int
> +mmmap_pte_range(struct dmar_domain *domain, pmd_t *pmd, unsigned long addr,
> + unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
> +{
> + pte_t *pte, *first_pte;
> + u64 pfn;
> +
> + pfn = phys_addr >> PAGE_SHIFT;
> + if (unlikely(pmd_none(*pmd)))
> + pgtable_populate(domain, pmd);
> +
> + first_pte = pte = pte_offset_kernel(pmd, addr);
> +
> + do {
> + set_pte(pte, pfn_pte(pfn, prot));
> + pfn++;
> + } while (pte++, addr += PAGE_SIZE, addr != end);
> +
> + domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
> +
> + return 0;
> +}
> +
> +static int
> +mmmap_pmd_range(struct dmar_domain *domain, pud_t *pud, unsigned long addr,
> + unsigned long end, phys_addr_t 

Re: [PATCH 1/2] platform: goldfish: Allow goldfish virtual platform drivers for RISCV

2019-09-24 Thread Anup Patel
On Wed, Sep 25, 2019 at 10:37 AM Greg Kroah-Hartman
 wrote:
>
> On Wed, Sep 25, 2019 at 10:30:00AM +0530, Anup Patel wrote:
> > On Wed, Sep 25, 2019 at 10:13 AM Greg Kroah-Hartman
> >  wrote:
> > >
> > > On Wed, Sep 25, 2019 at 04:30:03AM +, Anup Patel wrote:
> > > > We will be using some of the Goldfish virtual platform devices (such
> > > > as RTC) on QEMU RISC-V virt machine so this patch enables goldfish
> > > > kconfig option for RISC-V architecture.
> > > >
> > > > Signed-off-by: Anup Patel 
> > > > ---
> > > >  drivers/platform/goldfish/Kconfig | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/platform/goldfish/Kconfig 
> > > > b/drivers/platform/goldfish/Kconfig
> > > > index 77b35df3a801..0ba825030ffe 100644
> > > > --- a/drivers/platform/goldfish/Kconfig
> > > > +++ b/drivers/platform/goldfish/Kconfig
> > > > @@ -1,7 +1,7 @@
> > > >  # SPDX-License-Identifier: GPL-2.0-only
> > > >  menuconfig GOLDFISH
> > > >   bool "Platform support for Goldfish virtual devices"
> > > > - depends on X86_32 || X86_64 || ARM || ARM64 || MIPS
> > > > + depends on X86_32 || X86_64 || ARM || ARM64 || MIPS || RISCV
> > >
> > > Why does this depend on any of these?  Can't we just have:
> >
> > Maybe the Goldfish drivers were only compile-tested/tried on these architectures.
>
> True, but that does not mean a driver should only have a specific list
> of arches.  This should only be needed if you _know_ it doesn't work on
> a specific arch, not the other way around.

No problem, I will drop the list of architectures from the 'depends on' line.

>
> > > >   depends on HAS_IOMEM
> > >
> > > And that's it?
> >
> > I think it should be just "depends on HAS_IOMEM && HAS_DMA" just like
> > VirtIO MMIO. Agree?
>
> No idea, but if that's what is needed for building, then sure :)

The Goldfish framebuffer can do DMA access, so I will add a dependency
on HAS_DMA.

Refer to:
https://android.googlesource.com/platform/external/qemu/+/master/docs/GOLDFISH-VIRTUAL-HARDWARE.TXT

I will send v2 as per the above.
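
Presumably the resulting entry would look something like this (a sketch only, assuming v2 changes nothing beyond the 'depends on' lines):

```
# SPDX-License-Identifier: GPL-2.0-only
menuconfig GOLDFISH
	bool "Platform support for Goldfish virtual devices"
	depends on HAS_IOMEM && HAS_DMA
```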

Regards,
Anup


[PATCH v3] arm64: use generic free_initrd_mem()

2019-09-24 Thread Mike Rapoport
From: Mike Rapoport 

arm64 calls memblock_free() for the initrd area in its implementation of
free_initrd_mem(), but this call has no actual effect that late in the boot
process. By the time initrd is freed, all the reserved memory is managed by
the page allocator and the memblock.reserved is unused, so the only purpose
of the memblock_free() call is to keep track of initrd memory for debugging
and accounting.

Without the memblock_free() call the only difference between arm64 and the
generic versions of free_initrd_mem() is the memory poisoning.

Move the memblock_free() call to the generic code, enable it there
for architectures that define ARCH_KEEP_MEMBLOCK, and use the generic
implementation of free_initrd_mem() on arm64.

Signed-off-by: Mike Rapoport 
---

v3:
* fix powerpc build

v2: 
* add memblock_free() to the generic free_initrd_mem()
* rebase on the current upstream


 arch/arm64/mm/init.c | 12 
 init/initramfs.c |  5 +
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 45c00a5..87a0e3b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -580,18 +580,6 @@ void free_initmem(void)
unmap_kernel_range((u64)__init_begin, (u64)(__init_end - __init_begin));
 }
 
-#ifdef CONFIG_BLK_DEV_INITRD
-void __init free_initrd_mem(unsigned long start, unsigned long end)
-{
-   unsigned long aligned_start, aligned_end;
-
-   aligned_start = __virt_to_phys(start) & PAGE_MASK;
-   aligned_end = PAGE_ALIGN(__virt_to_phys(end));
-   memblock_free(aligned_start, aligned_end - aligned_start);
-   free_reserved_area((void *)start, (void *)end, 0, "initrd");
-}
-#endif
-
 /*
  * Dump out memory limit information on panic.
  */
diff --git a/init/initramfs.c b/init/initramfs.c
index c47dad0..3d61e13 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static ssize_t __init xwrite(int fd, const char *p, size_t count)
 {
@@ -531,6 +532,10 @@ void __weak free_initrd_mem(unsigned long start, unsigned long end)
 {
free_reserved_area((void *)start, (void *)end, POISON_FREE_INITMEM,
"initrd");
+
+#ifdef CONFIG_ARCH_KEEP_MEMBLOCK
+   memblock_free(__pa(start), end - start);
+#endif
 }
 
 #ifdef CONFIG_KEXEC_CORE
-- 
2.7.4



INFO: trying to register non-static key in finish_writeback_work

2019-09-24 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:b41dae06 Merge tag 'xfs-5.4-merge-7' of git://git.kernel.o..
git tree:   net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17d19a7e60
kernel config:  https://syzkaller.appspot.com/x/.config?x=dfcf592db22b9132
dashboard link: https://syzkaller.appspot.com/bug?extid=21875b598ddcdc309b28
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15fcf1a160

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+21875b598ddcdc309...@syzkaller.appspotmail.com

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 2603 Comm: kworker/u4:4 Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: writeback wb_workfn (flush-8:0)
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 assign_lock_key kernel/locking/lockdep.c:881 [inline]
 register_lock_class+0x179e/0x1850 kernel/locking/lockdep.c:1190
 __lock_acquire+0xf4/0x4e70 kernel/locking/lockdep.c:3837
 lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4487
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
 __wake_up_common_lock+0xc8/0x150 kernel/sched/wait.c:122
 __wake_up+0xe/0x10 kernel/sched/wait.c:142
 finish_writeback_work.isra.0+0xf6/0x120 fs/fs-writeback.c:168
 wb_do_writeback fs/fs-writeback.c:2030 [inline]
 wb_workfn+0x34f/0x11e0 fs/fs-writeback.c:2070
 process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] PREEMPT SMP KASAN
CPU: 0 PID: 2603 Comm: kworker/u4:4 Not tainted 5.3.0+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: writeback wb_workfn (flush-8:0)
RIP: 0010:__wake_up_common+0xdf/0x610 kernel/sched/wait.c:86
Code: 05 00 00 4c 8b 43 38 49 83 e8 18 49 8d 78 18 48 39 7d d0 0f 84 64 02  
00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 <80> 3c 01 00 0f  
85 0b 05 00 00 49 8b 40 18 89 55 b0 31 db 49 bc 00

RSP: 0018:8880a1dc7a90 EFLAGS: 00010046
RAX: dc00 RBX: 888079642000 RCX: 
RDX:  RSI: 1138d60e RDI: 
RBP: 8880a1dc7ae8 R08: ffe8 R09: 8880a1dc7b38
R10: ed10143b8f4b R11: 0003 R12: 
R13: 0286 R14:  R15: 0003
FS:  () GS:8880ae80() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 001b2e620020 CR3: a3f3e000 CR4: 001406f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 __wake_up_common_lock+0xea/0x150 kernel/sched/wait.c:123
 __wake_up+0xe/0x10 kernel/sched/wait.c:142
 finish_writeback_work.isra.0+0xf6/0x120 fs/fs-writeback.c:168
 wb_do_writeback fs/fs-writeback.c:2030 [inline]
 wb_workfn+0x34f/0x11e0 fs/fs-writeback.c:2070
 process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x361/0x430 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace a54dff274d7cf269 ]---
RIP: 0010:__wake_up_common+0xdf/0x610 kernel/sched/wait.c:86
Code: 05 00 00 4c 8b 43 38 49 83 e8 18 49 8d 78 18 48 39 7d d0 0f 84 64 02  
00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 <80> 3c 01 00 0f  
85 0b 05 00 00 49 8b 40 18 89 55 b0 31 db 49 bc 00

RSP: 0018:8880a1dc7a90 EFLAGS: 00010046
RAX: dc00 RBX: 888079642000 RCX: 
RDX:  RSI: 1138d60e RDI: 
RBP: 8880a1dc7ae8 R08: ffe8 R09: 8880a1dc7b38
R10: ed10143b8f4b R11: 0003 R12: 
R13: 0286 R14:  R15: 0003
FS:  () GS:8880ae80() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 001b2e620020 CR3: a3f3e000 CR4: 001406f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Re: [PATCH 1/2] platform: goldfish: Allow goldfish virtual platform drivers for RISCV

2019-09-24 Thread Greg Kroah-Hartman
On Wed, Sep 25, 2019 at 10:30:00AM +0530, Anup Patel wrote:
> On Wed, Sep 25, 2019 at 10:13 AM Greg Kroah-Hartman
>  wrote:
> >
> > On Wed, Sep 25, 2019 at 04:30:03AM +, Anup Patel wrote:
> > > We will be using some of the Goldfish virtual platform devices (such
> > > as RTC) on QEMU RISC-V virt machine so this patch enables goldfish
> > > kconfig option for RISC-V architecture.
> > >
> > > Signed-off-by: Anup Patel 
> > > ---
> > >  drivers/platform/goldfish/Kconfig | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/platform/goldfish/Kconfig 
> > > b/drivers/platform/goldfish/Kconfig
> > > index 77b35df3a801..0ba825030ffe 100644
> > > --- a/drivers/platform/goldfish/Kconfig
> > > +++ b/drivers/platform/goldfish/Kconfig
> > > @@ -1,7 +1,7 @@
> > >  # SPDX-License-Identifier: GPL-2.0-only
> > >  menuconfig GOLDFISH
> > >   bool "Platform support for Goldfish virtual devices"
> > > - depends on X86_32 || X86_64 || ARM || ARM64 || MIPS
> > > + depends on X86_32 || X86_64 || ARM || ARM64 || MIPS || RISCV
> >
> > Why does this depend on any of these?  Can't we just have:
> 
> Maybe the Goldfish drivers were only compile-tested/tried on these architectures.

True, but that does not mean a driver should only have a specific list
of arches.  This should only be needed if you _know_ it doesn't work on
a specific arch, not the other way around.

> > >   depends on HAS_IOMEM
> >
> > And that's it?
> 
> I think it should be just "depends on HAS_IOMEM && HAS_DMA" just like
> VirtIO MMIO. Agree?

No idea, but if that's what is needed for building, then sure :)

thanks,

greg k-h


Re: [PATCH 1/2] platform: goldfish: Allow goldfish virtual platform drivers for RISCV

2019-09-24 Thread Anup Patel
On Wed, Sep 25, 2019 at 10:13 AM Greg Kroah-Hartman
 wrote:
>
> On Wed, Sep 25, 2019 at 04:30:03AM +, Anup Patel wrote:
> > We will be using some of the Goldfish virtual platform devices (such
> > as RTC) on QEMU RISC-V virt machine so this patch enables goldfish
> > kconfig option for RISC-V architecture.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >  drivers/platform/goldfish/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/platform/goldfish/Kconfig 
> > b/drivers/platform/goldfish/Kconfig
> > index 77b35df3a801..0ba825030ffe 100644
> > --- a/drivers/platform/goldfish/Kconfig
> > +++ b/drivers/platform/goldfish/Kconfig
> > @@ -1,7 +1,7 @@
> >  # SPDX-License-Identifier: GPL-2.0-only
> >  menuconfig GOLDFISH
> >   bool "Platform support for Goldfish virtual devices"
> > - depends on X86_32 || X86_64 || ARM || ARM64 || MIPS
> > + depends on X86_32 || X86_64 || ARM || ARM64 || MIPS || RISCV
>
> Why does this depend on any of these?  Can't we just have:

Maybe the Goldfish drivers were only compile-tested/tried on these architectures.

>
> >   depends on HAS_IOMEM
>
> And that's it?

I think it should be just "depends on HAS_IOMEM && HAS_DMA" just like
VirtIO MMIO. Agree?

Regards,
Anup


RE: [RFC PATCH 3/4] iommu/vt-d: Map/unmap domain with mmmap/mmunmap

2019-09-24 Thread Tian, Kevin
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Monday, September 23, 2019 8:25 PM
> 
> If a dmar domain has DOMAIN_FLAG_FIRST_LEVEL_TRANS bit set
> in its flags, IOMMU will use the first level page table for
> translation. Hence, we need to map or unmap addresses in the
> first level page table.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Cc: Yi Sun 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c | 94 -
> 
>  1 file changed, 82 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 9cfe8098d993..103480016010 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -168,6 +168,11 @@ static inline unsigned long virt_to_dma_pfn(void *p)
>   return page_to_dma_pfn(virt_to_page(p));
>  }
> 
> +static inline unsigned long dma_pfn_to_addr(unsigned long pfn)
> +{
> + return pfn << VTD_PAGE_SHIFT;
> +}
> +
>  /* global iommu list, set NULL for ignored DMAR units */
>  static struct intel_iommu **g_iommus;
> 
> @@ -307,6 +312,9 @@ static int hw_pass_through = 1;
>   */
> #define DOMAIN_FLAG_LOSE_CHILDREN	BIT(1)
> 
> +/* Domain uses first level translation for DMA remapping. */
> +#define DOMAIN_FLAG_FIRST_LEVEL_TRANS	BIT(2)
> +
>  #define for_each_domain_iommu(idx, domain)   \
>   for (idx = 0; idx < g_num_of_iommus; idx++) \
>   if (domain->iommu_refcnt[idx])
> @@ -552,6 +560,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
>   return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
>  }
> 
> +static inline int domain_type_is_flt(struct dmar_domain *domain)
> +{
> + return domain->flags & DOMAIN_FLAG_FIRST_LEVEL_TRANS;
> +}
> +
>  static inline int domain_pfn_supported(struct dmar_domain *domain,
>  unsigned long pfn)
>  {
> @@ -1147,8 +1160,15 @@ static struct page *domain_unmap(struct dmar_domain *domain,
>   BUG_ON(start_pfn > last_pfn);
> 
>   /* we don't need lock here; nobody else touches the iova range */
> -	freelist = dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
> -				       domain->pgd, 0, start_pfn, last_pfn, NULL);
> +	if (domain_type_is_flt(domain))
> +		freelist = intel_mmunmap_range(domain,
> +					       dma_pfn_to_addr(start_pfn),
> +					       dma_pfn_to_addr(last_pfn + 1));
> +	else
> +		freelist = dma_pte_clear_level(domain,
> +					       agaw_to_level(domain->agaw),
> +					       domain->pgd, 0, start_pfn,
> +					       last_pfn, NULL);

what about providing a unified interface at the caller side, and
differentiating the level within the interface?

> 
>   /* free pgd */
> 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
> @@ -2213,9 +2233,10 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
>   return level;
>  }
> 
> -static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
> - struct scatterlist *sg, unsigned long phys_pfn,
> - unsigned long nr_pages, int prot)
> +static int
> +__domain_mapping_dma(struct dmar_domain *domain, unsigned long iov_pfn,
> +  struct scatterlist *sg, unsigned long phys_pfn,
> +  unsigned long nr_pages, int prot)
>  {
>   struct dma_pte *first_pte = NULL, *pte = NULL;
>   phys_addr_t uninitialized_var(pteval);
> @@ -2223,13 +2244,6 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>   unsigned int largepage_lvl = 0;
>   unsigned long lvl_pages = 0;
> 
> - BUG_ON(!domain_pfn_supported(domain, iov_pfn + nr_pages - 1));
> -
> - if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
> - return -EINVAL;
> -
> - prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
> -
>   if (!sg) {
>   sg_res = nr_pages;
>   pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | prot;
> @@ -2328,6 +2342,62 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>   return 0;
>  }
> 
> +static int
> +__domain_mapping_mm(struct dmar_domain *domain, unsigned long iov_pfn,
> + struct scatterlist *sg, unsigned long phys_pfn,
> + unsigned long nr_pages, int prot)
> +{
> + int ret = 0;
> +
> + if (!sg)
> + return intel_mmmap_range(domain, dma_pfn_to_addr(iov_pfn),
> +  dma_pfn_to_addr(iov_pfn + nr_pages),
> +  dma_pfn_to_addr(phys_pfn), prot);
> +
> + while (nr_pages > 0) {
> + unsigned long 

Re: [PATCH 1/2] platform: goldfish: Allow goldfish virtual platform drivers for RISCV

2019-09-24 Thread Greg Kroah-Hartman
On Wed, Sep 25, 2019 at 04:30:03AM +, Anup Patel wrote:
> We will be using some of the Goldfish virtual platform devices (such
> as RTC) on the QEMU RISC-V virt machine, so this patch enables the
> goldfish kconfig option for the RISC-V architecture.
> 
> Signed-off-by: Anup Patel 
> ---
>  drivers/platform/goldfish/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/goldfish/Kconfig b/drivers/platform/goldfish/Kconfig
> index 77b35df3a801..0ba825030ffe 100644
> --- a/drivers/platform/goldfish/Kconfig
> +++ b/drivers/platform/goldfish/Kconfig
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  menuconfig GOLDFISH
>   bool "Platform support for Goldfish virtual devices"
> - depends on X86_32 || X86_64 || ARM || ARM64 || MIPS
> + depends on X86_32 || X86_64 || ARM || ARM64 || MIPS || RISCV

Why does this depend on any of these?  Can't we just have:

>   depends on HAS_IOMEM

And that's it?

thanks,

greg k-h
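
For reference, the entry with Greg's suggestion applied would look something
like this sketch (untested, assuming no goldfish driver actually needs
anything arch-specific beyond MMIO):

```
menuconfig GOLDFISH
	bool "Platform support for Goldfish virtual devices"
	depends on HAS_IOMEM
	help
	  Say Y here to get to see options for the Goldfish virtual platform.
```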


Re: [PATCH 1/5] linux/kernel.h: Add sizeof_member macro

2019-09-24 Thread Bharadiya,Pankaj
On Tue, Sep 24, 2019 at 09:22:10AM -0700, Kees Cook wrote:
> On Tue, Sep 24, 2019 at 04:28:35PM +0530, Pankaj Bharadiya wrote:
> > At present we have 3 different macros to calculate the size of a
> > member of a struct:
> >   - SIZEOF_FIELD
> >   - FIELD_SIZEOF
> >   - sizeof_field
> > 
> > To bring uniformity in entire kernel source tree let's add
> > sizeof_member macro.
> > 
> > Replace all occurrences of above 3 macro's with sizeof_member in
> > future patches.
> > 
> > Signed-off-by: Pankaj Bharadiya 
> > ---
> >  include/linux/kernel.h | 9 +
> >  1 file changed, 9 insertions(+)
> 
> Since stddef.h ends up needing this macro, and kernel.h includes
> stddef.h, why not put this macro in stddef.h instead? Then the
> open-coded version of it in stddef (your last patch) can use
> sizeof_member()?
> 

If I understood correctly, Andrew suggested adding such macros in kernel.h:
https://www.openwall.com/lists/kernel-hardening/2019/06/11/5

Moreover, other similar macros (like typeof_member & ARRAY_SIZE)
are defined in kernel.h.
But as you pointed out, it looks like stddef.h is the right place for this macro.

> Otherwise, yes, looks good. (Though I might re-order the patches so the
> last patch is the tree-wide swap -- then you don't need the exclusions,
> I think?)
>

I went through your tree. 
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=kspp/sizeof_member/full
Thank you for reordering the patches.

Thanks,
Pankaj

> -Kees
> 
> > 
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 4fa360a13c1e..0b80d8bb3978 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -79,6 +79,15 @@
> >   */
> >  #define round_down(x, y) ((x) & ~__round_mask(x, y))
> >  
> > +/**
> > + * sizeof_member - get the size of a struct's member
> > + * @T: the target struct
> > + * @m: the target struct's member
> > + * Return: the size of @m in the struct definition without having a
> > + * declared instance of @T.
> > + */
> > +#define sizeof_member(T, m) (sizeof(((T *)0)->m))
> > +
> >  /**
> >   * FIELD_SIZEOF - get the size of a struct's field
> >   * @t: the target struct
> > -- 
> > 2.17.1
> > 
> 
> -- 
> Kees Cook


[PATCH 0/2] Enable Goldfish RTC for RISC-V

2019-09-24 Thread Anup Patel
We will be using the Goldfish RTC device for real date-time on the QEMU RISC-V virt
machine, so this series:
1. Allows GOLDFISH kconfig option to be enabled for RISC-V
2. Enables GOLDFISH RTC driver in RISC-V defconfigs

This series can be found in goldfish_rtc_v1 branch at:
https://github.com/avpatel/linux.git

For the QEMU patches adding Goldfish RTC to the virt machine, refer to:
https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg05465.html

Anup Patel (2):
  platform: goldfish: Allow goldfish virtual platform drivers for RISCV
  RISC-V: defconfig: Enable Goldfish RTC driver

 arch/riscv/configs/defconfig  | 3 +++
 arch/riscv/configs/rv32_defconfig | 3 +++
 drivers/platform/goldfish/Kconfig | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

--
2.17.1


[PATCH 2/2] RISC-V: defconfig: Enable Goldfish RTC driver

2019-09-24 Thread Anup Patel
We have the Goldfish RTC device available on the QEMU RISC-V virt machine,
hence enable the required driver in the RV32 and RV64 defconfigs.

Signed-off-by: Anup Patel 
---
 arch/riscv/configs/defconfig  | 3 +++
 arch/riscv/configs/rv32_defconfig | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 3efff552a261..57b4f67b0c0b 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -73,7 +73,10 @@ CONFIG_USB_STORAGE=y
 CONFIG_USB_UAS=y
 CONFIG_MMC=y
 CONFIG_MMC_SPI=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_GOLDFISH=y
 CONFIG_VIRTIO_MMIO=y
+CONFIG_GOLDFISH=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
 CONFIG_AUTOFS4_FS=y
diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig
index 7da93e494445..50716c1395aa 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -69,7 +69,10 @@ CONFIG_USB_OHCI_HCD=y
 CONFIG_USB_OHCI_HCD_PLATFORM=y
 CONFIG_USB_STORAGE=y
 CONFIG_USB_UAS=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_GOLDFISH=y
 CONFIG_VIRTIO_MMIO=y
+CONFIG_GOLDFISH=y
 CONFIG_SIFIVE_PLIC=y
 CONFIG_EXT4_FS=y
 CONFIG_EXT4_FS_POSIX_ACL=y
-- 
2.17.1



[PATCH 1/2] platform: goldfish: Allow goldfish virtual platform drivers for RISCV

2019-09-24 Thread Anup Patel
We will be using some of the Goldfish virtual platform devices (such
as RTC) on the QEMU RISC-V virt machine, so this patch enables the
goldfish kconfig option for the RISC-V architecture.

Signed-off-by: Anup Patel 
---
 drivers/platform/goldfish/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/goldfish/Kconfig b/drivers/platform/goldfish/Kconfig
index 77b35df3a801..0ba825030ffe 100644
--- a/drivers/platform/goldfish/Kconfig
+++ b/drivers/platform/goldfish/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 menuconfig GOLDFISH
bool "Platform support for Goldfish virtual devices"
-   depends on X86_32 || X86_64 || ARM || ARM64 || MIPS
+   depends on X86_32 || X86_64 || ARM || ARM64 || MIPS || RISCV
depends on HAS_IOMEM
help
  Say Y here to get to see options for the Goldfish virtual platform.
-- 
2.17.1



RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Biwen Li
> > >
> > > > > > > > >
> > > > > > > > > The 'fsl,ippdexpcr-alt-addr' property is used to handle
> > > > > > > > > an errata
> > > > > > > > > A-008646 on LS1021A
> > > > > > > > >
> > > > > > > > > Signed-off-by: Biwen Li 
> > > > > > > > > ---
> > > > > > > > > Change in v3:
> > > > > > > > >   - rename property name
> > > > > > > > > fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> > > > > > > > >
> > > > > > > > > Change in v2:
> > > > > > > > >   - update desc of the property 'fsl,rcpm-scfg'
> > > > > > > > >
> > > > > > > > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > > > > > > > > ++
> > > > > > > > >  1 file changed, 14 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git
> > > > > > > > > a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > > index 5a33619d881d..157dcf6da17c 100644
> > > > > > > > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > > @@ -34,6 +34,11 @@ Chassis Version		Example Chips
> > > > > > > > >  Optional properties:
> > > > > > > > >   - little-endian : RCPM register block is Little Endian.
> > > > > > > > > Without it
> > > > RCPM
> > > > > > > > > will be Big Endian (default case).
> > > > > > > > > + - fsl,ippdexpcr-alt-addr : Must add the property for
> > > > > > > > > + SoC LS1021A,
> > > > > > > >
> > > > > > > > You probably should mention this is related to a hardware
> > > > > > > > issue on LS1021a and only needed on LS1021a.
> > > > > > > Okay, got it, thanks, I will add this in v4.
> > > > > > > >
> > > > > > > > > +   Must include n + 1 entries (n =
> > > > > > > > > + #fsl,rcpm-wakeup-cells, such
> > as:
> > > > > > > > > +   #fsl,rcpm-wakeup-cells equal to 2, then must include
> > > > > > > > > + 2
> > > > > > > > > + +
> > > > > > > > > + 1
> > > > entries).
> > > > > > > >
> > > > > > > > #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR
> > > > > > > > registers on an
> > > > SoC.
> > > > > > > > However you are defining an offset to scfg registers here.
> > > > > > > > Why these two are related?  The length here should
> > > > > > > > actually be related to the #address-cells of the soc/.
> > > > > > > > But since this is only needed for LS1021, you can
> > > > > > > just make it 3.
> > > > > > > I need to set the value of IPPDEXPCR registers from the ftm_alarm0
> > > > > > > device node (fsl,rcpm-wakeup = < 0x0 0x2000>;
> > > > > > > 0x0 is a value for IPPDEXPCR0, 0x2000 is a value for
> > > > > > > IPPDEXPCR1).
> > > > > > > But because of the hardware issue on LS1021A, I need to store
> > > > > > > the value of the IPPDEXPCR registers at an alt address. So I
> > > > > > > define an offset into the scfg registers, then the RCPM driver gets
> > > > > > > an absolute address from the offset, and the RCPM driver writes the
> > > > > > > value of the IPPDEXPCR registers to these absolute
> > > > > > > addresses (backing up the value of the IPPDEXPCR
> > > > > > > registers).
> > > > > >
> > > > > > I understand what you are trying to do.  The problem is that
> > > > > > the new fsl,ippdexpcr-alt-addr property contains a phandle and an
> offset.
> > > > > > The size of it shouldn't be related to #fsl,rcpm-wakeup-cells.
> > > > > Maybe something like this: fsl,ippdexpcr-alt-addr = < 0x51c>; /*
> > > > > SCFG_SPARECR8 */
> > > >
> > > > No.  The #address-cell for the soc/ is 2, so the offset to scfg
> > > > should be 0x0 0x51c.  The total size should be 3, but it shouldn't
> > > > be coming from #fsl,rcpm-wakeup-cells like you mentioned in the
> binding.
> > > Oh, I got it. You want fsl,ippdexpcr-alt-addr to be relative to
> > > #address-cells instead of #fsl,rcpm-wakeup-cells.
> >
> > Yes.
> I got an example from drivers/pci/controller/dwc/pci-layerscape.c
> and arch/arm/boot/dts/ls1021a.dtsi as follows:
> fsl,pcie-scfg = < 0>, 0 is an index
> 
> In my fsl,ippdexpcr-alt-addr = < 0x0 0x51c>, it means that 0x0 is an alt
> offset address for IPPDEXPCR0 and 0x51c is an alt offset address for
> IPPDEXPCR1, instead of 0x0 and 0x51c composing an alt address of
> SCFG_SPARECR8.
Maybe I need to write it as:
fsl,ippdexpcr-alt-addr = < 0x0 0x0 0x0 0x51c>;
where the first two 0x0 cells compose an alt offset address for IPPDEXPCR0,
and the last 0x0 and 0x51c compose an alt address for IPPDEXPCR1.

Best Regards,
Biwen Li 
> >
> > Regards,
> > Leo
> > > >
> > > > > >
> > > > > > > >
> > > > > > > > > +   The first entry must be a link to the SCFG device node.
> > > > > > > > > +   The non-first entry must be offset of registers of SCFG.
> > > > > > > > >
> > > > > > > > >  Example:
> > > > > > > > >  The RCPM node for T4240:
> > > > > > > > > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > > > > > > > >   #fsl,rcpm-wakeup-cells = <2>;
> > > > > > > > >   };
> > > > > > > > >
> > > > > > > > > +The RCPM node for LS1021A:
> > > > > > > > > + rcpm: rcpm@1ee2140 {
> > > > > > > > > +

Re: [PATCH v2] PCI: dwc: Add support to add GEN3 related equalization quirks

2019-09-24 Thread Vidya Sagar

On 9/24/2019 5:41 PM, Pankaj Dubey wrote:




-Original Message-
From: Vidya Sagar 
Sent: Tuesday, September 24, 2019 4:57 PM
To: Pankaj Dubey ; 'Gustavo Pimentel'
; 'Andrew Murray'

Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
jingooh...@gmail.com; lorenzo.pieral...@arm.com; bhelg...@google.com;
'Anvesh Salveru' 
Subject: Re: [PATCH v2] PCI: dwc: Add support to add GEN3 related equalization
quirks

On 9/24/2019 2:58 PM, Pankaj Dubey wrote:




-Original Message-
From: Vidya Sagar 
Sent: Thursday, September 19, 2019 4:54 PM
Subject: Re: [PATCH v2] PCI: dwc: Add support to add GEN3 related
equalization quirks

On 9/16/2019 6:22 PM, Gustavo Pimentel wrote:

On Mon, Sep 16, 2019 at 13:24:1, Andrew Murray



wrote:


On Mon, Sep 16, 2019 at 04:36:33PM +0530, Pankaj Dubey wrote:




-Original Message-
From: Andrew Murray 
Sent: Monday, September 16, 2019 3:46 PM
To: Pankaj Dubey 
Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
lorenzo.pieral...@arm.com; bhelg...@google.com; Anvesh Salveru

Subject: Re: [PATCH v2] PCI: dwc: Add support to add GEN3 related

equalization

quirks

On Fri, Sep 13, 2019 at 04:09:50PM +0530, Pankaj Dubey wrote:

From: Anvesh Salveru 

In some platforms, PCIe PHY may have issues which will prevent
linkup to happen in GEN3 or higher speed. In case equalization
fails, link will fallback to GEN1.

DesignWare controller gives flexibility to disable GEN3
equalization completely or only phase 2 and 3 of equalization.

This patch enables the DesignWare driver to disable the PCIe
GEN3 equalization by enabling one of the following quirks:
- DWC_EQUALIZATION_DISABLE: To disable GEN3 equalization all
phases

I don't think Gen-3 equalization can be skipped altogether.
PCIe Spec Rev 4.0 Ver 1.0 in Section-4.2.3 has the following statement.

"All the Lanes that are associated with the LTSSM (i.e., those Lanes
that are currently operational or may be operational in the future
due to Link
Upconfigure) must participate in the Equalization procedure"

and in Section-4.2.6.4.2.1.1 it says
"Note: A transition to Recovery.RcvrLock might be used in the case
where the Downstream Port determines that Phase 2 and Phase 3 are not
needed based on the platform and channel characteristics."

Based on the above statements, I think it is Ok to skip only Phases
2&3 of equalization but not 0&1.
I even checked with our hardware engineers and it seems
DWC_EQUALIZATION_DISABLE is present only for debugging purposes in
hardware simulations and shouldn't be used on real silicon.



In the DesignWare manual we don't see any comment that this feature is for
debugging purposes only.
Agreed, and as I mentioned, even I got to know about it offline.


Even if it is meant for debugging purposes, if for some reason in an SoC Gen3/4
linkup is failing due to equalization, and if disabling equalization is helping,
then IMO it is OK to do it.
Well, I don't have specific reservations to not have it. We can use this as a
fallback option.


Just to re-confirm, we tested one of the NVMe devices on a Jetson AGX Xavier RC
with equalization disabled. We do see that linkup works well in GEN3. As we have
added this feature as a platform quirk, only platforms that require this
feature can enable it.



Curious to know... did you do it because the link didn't come up with
equalization enabled, or just as an experiment?



We did this, just as an experiment.

Ok. Thanks for the clarification.

Reviewed-by: Vidya Sagar 




Snippet of lspci (from Jetson AGX Xavier RC) is given below, showing
EQ is completely disabled and GEN3 linkup
-
0005:01:00.0 Non-Volatile memory controller: Lite-On Technology

Corporation Device 21f1 (rev 01) (prog-if 02 [NVM Express])

  Subsystem: Marvell Technology Group Ltd. Device 1093
   
  LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency 
L0s

<512ns, L1 <64us

  ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
  LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
  LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ 
DLActive-

BWMgmt- ABWMgmt-

  DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+,

OBFF Via message

  DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,

OBFF Disabled

  LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
   Transmit Margin: Normal Operating Range,

EnterModifiedCompliance- ComplianceSOS-

   Compliance De-emphasis: -6dB
  LnkSta2: Current De-emphasis Level: -6dB, 
EqualizationComplete-,

EqualizationPhase1-

   EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
-

- Vidya Sagar



- 

[PATCH] ASoC: Intel: Skylake: prevent memory leak in snd_skl_parse_uuids

2019-09-24 Thread Navid Emamdoost
In snd_skl_parse_uuids, if the allocation for module->instance_id fails, the
allocated memory for module should be released.

Signed-off-by: Navid Emamdoost 
---
 sound/soc/intel/skylake/skl-sst-utils.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/intel/skylake/skl-sst-utils.c b/sound/soc/intel/skylake/skl-sst-utils.c
index d43cbf4a71ef..d4db64d72b2c 100644
--- a/sound/soc/intel/skylake/skl-sst-utils.c
+++ b/sound/soc/intel/skylake/skl-sst-utils.c
@@ -299,6 +299,7 @@ int snd_skl_parse_uuids(struct sst_dsp *ctx, const struct firmware *fw,
module->instance_id = devm_kzalloc(ctx->dev, size, GFP_KERNEL);
if (!module->instance_id) {
ret = -ENOMEM;
+   kfree(module);
goto free_uuid_list;
}
 
-- 
2.17.1



Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-24 Thread Jason Wang



On 2019/9/24 12:31 AM, Matt Cover wrote:

I think it's safer to just drop the packet instead of trying to
work around it.


This patch aside, dropping the packet here seems like the wrong choice.
Loading a prog at this hookpoint "configures" steering. The action of
configuring steering should not result in dropped packets.

Suboptimal delivery is generally preferable to no delivery. Leaving the
behavior as-is (i.e. relying on netdev_cap_txqueue()) or making any
return which doesn't fit in a u16 simply use queue 0 would be highly
preferable to dropping the packet.


Thanks



It leaves a choice for the steering eBPF program to drop packets that it 
can't classify. But considering we already have the socket filter, it's 
probably not a big problem since we can drop packets there.


Thanks



[PATCH v4 2/5] Powerpc/Watchpoint: Don't ignore extraneous exceptions blindly

2019-09-24 Thread Ravi Bangoria
On Powerpc, watchpoint match range is double-word granular. On a
watchpoint hit, DAR is set to the first byte of overlap between
actual access and watched range. And thus it's quite possible that
DAR does not point inside user specified range. Ex, say user creates
a watchpoint with address range 0x1004 to 0x1007. So hw would be
configured to watch from 0x1000 to 0x1007. If there is a 4 byte
access from 0x1002 to 0x1005, DAR will point to 0x1002 and thus
interrupt handler considers it as extraneous, but it's actually not,
because part of the access belongs to what user has asked.

Instead of blindly ignoring the exception, get actual address range
by analysing an instruction, and ignore only if actual range does
not overlap with user specified range.

Note: The behaviour is unchanged for 8xx.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/kernel/hw_breakpoint.c | 52 +
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 5a2d8c306c40..c04a345e2cc2 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -179,33 +179,49 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs)
tsk->thread.last_hit_ubp = NULL;
 }
 
-static bool is_larx_stcx_instr(struct pt_regs *regs, unsigned int instr)
+static bool dar_within_range(unsigned long dar, struct arch_hw_breakpoint *info)
 {
-   int ret, type;
-   struct instruction_op op;
+   return ((info->address <= dar) && (dar - info->address < info->len));
+}
 
-   ret = analyse_instr(, regs, instr);
-   type = GETTYPE(op.type);
-   return (!ret && (type == LARX || type == STCX));
+static bool
+dar_range_overlaps(unsigned long dar, int size, struct arch_hw_breakpoint *info)
+{
+   return ((dar <= info->address + info->len - 1) &&
+   (dar + size - 1 >= info->address));
 }
 
 /*
  * Handle debug exception notifications.
  */
 static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp,
-unsigned long addr)
+struct arch_hw_breakpoint *info)
 {
unsigned int instr = 0;
+   int ret, type, size;
+   struct instruction_op op;
+   unsigned long addr = info->address;
 
if (__get_user_inatomic(instr, (unsigned int *)regs->nip))
goto fail;
 
-   if (is_larx_stcx_instr(regs, instr)) {
+   ret = analyse_instr(, regs, instr);
+   type = GETTYPE(op.type);
+   size = GETSIZE(op.type);
+
+   if (!ret && (type == LARX || type == STCX)) {
	printk_ratelimited("Breakpoint hit on instruction that can't be emulated."
			   " Breakpoint at 0x%lx will be disabled.\n", addr);
goto disable;
}
 
+   /*
+* If it's extraneous event, we still need to emulate/single-
+* step the instruction, but we don't generate an event.
+*/
+   if (size && !dar_range_overlaps(regs->dar, size, info))
+   info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+
/* Do not emulate user-space instructions, instead single-step them */
if (user_mode(regs)) {
current->thread.last_hit_ubp = bp;
@@ -237,7 +253,6 @@ int hw_breakpoint_handler(struct die_args *args)
struct perf_event *bp;
struct pt_regs *regs = args->regs;
struct arch_hw_breakpoint *info;
-   unsigned long dar = regs->dar;
 
/* Disable breakpoints during exception handling */
hw_breakpoint_disable();
@@ -269,19 +284,14 @@ int hw_breakpoint_handler(struct die_args *args)
goto out;
}
 
-   /*
-* Verify if dar lies within the address range occupied by the symbol
-* being watched to filter extraneous exceptions.  If it doesn't,
-* we still need to single-step the instruction, but we don't
-* generate an event.
-*/
info->type &= ~HW_BRK_TYPE_EXTRANEOUS_IRQ;
-   if (!((bp->attr.bp_addr <= dar) &&
- (dar - bp->attr.bp_addr < bp->attr.bp_len)))
-   info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
-
-   if (!IS_ENABLED(CONFIG_PPC_8xx) && !stepping_handler(regs, bp, info->address))
-   goto out;
+   if (IS_ENABLED(CONFIG_PPC_8xx)) {
+   if (!dar_within_range(regs->dar, info))
+   info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
+   } else {
+   if (!stepping_handler(regs, bp, info))
+   goto out;
+   }
 
/*
 * As a policy, the callback is invoked in a 'trigger-after-execute'
-- 
2.21.0



[PATCH v4 4/5] Powerpc/Watchpoint: Add dar outside test in perf-hwbreak.c selftest

2019-09-24 Thread Ravi Bangoria
So far we used to ignore the exception if DAR pointed outside of the
user-specified range. Now we ignore it only if the actual load/store
range does not overlap with the user-specified range. Include
selftests for the same:

  # ./tools/testing/selftests/powerpc/ptrace/perf-hwbreak
  ...
  TESTED: No overlap
  TESTED: Partial overlap
  TESTED: Partial overlap
  TESTED: No overlap
  TESTED: Full overlap
  success: perf_hwbreak

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 111 +-
 1 file changed, 110 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
index 200337daec42..389c545675c6 100644
--- a/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/perf-hwbreak.c
@@ -148,6 +148,113 @@ static int runtestsingle(int readwriteflag, int exclude_user, int arraytest)
return 0;
 }
 
+static int runtest_dar_outside(void)
+{
+   volatile char target[8];
+   volatile __u16 temp16;
+   volatile __u64 temp64;
+   struct perf_event_attr attr;
+   int break_fd;
+   unsigned long long breaks;
+   int fail = 0;
+   size_t res;
+
+   /* setup counters */
+   memset(, 0, sizeof(attr));
+   attr.disabled = 1;
+   attr.type = PERF_TYPE_BREAKPOINT;
+   attr.exclude_kernel = 1;
+   attr.exclude_hv = 1;
+   attr.exclude_guest = 1;
+   attr.bp_type = HW_BREAKPOINT_RW;
+   /* watch middle half of target array */
+   attr.bp_addr = (__u64)(target + 2);
+   attr.bp_len = 4;
+   break_fd = sys_perf_event_open(, 0, -1, -1, 0);
+   if (break_fd < 0) {
+   perror("sys_perf_event_open");
+   exit(1);
+   }
+
+   /* Shouldn't hit. */
+   ioctl(break_fd, PERF_EVENT_IOC_RESET);
+   ioctl(break_fd, PERF_EVENT_IOC_ENABLE);
+   temp16 = *((__u16 *)target);
+   *((__u16 *)target) = temp16;
+   ioctl(break_fd, PERF_EVENT_IOC_DISABLE);
+   res = read(break_fd, , sizeof(unsigned long long));
+   assert(res == sizeof(unsigned long long));
+   if (breaks == 0) {
+   printf("TESTED: No overlap\n");
+   } else {
+   printf("FAILED: No overlap: %lld != 0\n", breaks);
+   fail = 1;
+   }
+
+   /* Hit */
+   ioctl(break_fd, PERF_EVENT_IOC_RESET);
+   ioctl(break_fd, PERF_EVENT_IOC_ENABLE);
+   temp16 = *((__u16 *)(target + 1));
+   *((__u16 *)(target + 1)) = temp16;
+   ioctl(break_fd, PERF_EVENT_IOC_DISABLE);
+   res = read(break_fd, , sizeof(unsigned long long));
+   assert(res == sizeof(unsigned long long));
+   if (breaks == 2) {
+   printf("TESTED: Partial overlap\n");
+   } else {
+   printf("FAILED: Partial overlap: %lld != 2\n", breaks);
+   fail = 1;
+   }
+
+   /* Hit */
+   ioctl(break_fd, PERF_EVENT_IOC_RESET);
+   ioctl(break_fd, PERF_EVENT_IOC_ENABLE);
+   temp16 = *((__u16 *)(target + 5));
+   *((__u16 *)(target + 5)) = temp16;
+   ioctl(break_fd, PERF_EVENT_IOC_DISABLE);
+   res = read(break_fd, , sizeof(unsigned long long));
+   assert(res == sizeof(unsigned long long));
+   if (breaks == 2) {
+   printf("TESTED: Partial overlap\n");
+   } else {
+   printf("FAILED: Partial overlap: %lld != 2\n", breaks);
+   fail = 1;
+   }
+
+   /* Shouldn't Hit */
+   ioctl(break_fd, PERF_EVENT_IOC_RESET);
+   ioctl(break_fd, PERF_EVENT_IOC_ENABLE);
+   temp16 = *((__u16 *)(target + 6));
+   *((__u16 *)(target + 6)) = temp16;
+   ioctl(break_fd, PERF_EVENT_IOC_DISABLE);
+   res = read(break_fd, , sizeof(unsigned long long));
+   assert(res == sizeof(unsigned long long));
+   if (breaks == 0) {
+   printf("TESTED: No overlap\n");
+   } else {
+   printf("FAILED: No overlap: %lld != 0\n", breaks);
+   fail = 1;
+   }
+
+   /* Hit */
+   ioctl(break_fd, PERF_EVENT_IOC_RESET);
+   ioctl(break_fd, PERF_EVENT_IOC_ENABLE);
+   temp64 = *((__u64 *)target);
+   *((__u64 *)target) = temp64;
+   ioctl(break_fd, PERF_EVENT_IOC_DISABLE);
+   res = read(break_fd, , sizeof(unsigned long long));
+   assert(res == sizeof(unsigned long long));
+   if (breaks == 2) {
+   printf("TESTED: Full overlap\n");
+   } else {
+   printf("FAILED: Full overlap: %lld != 2\n", breaks);
+   fail = 1;
+   }
+
+   close(break_fd);
+   return fail;
+}
+
 static int runtest(void)
 {
int rwflag;
@@ -172,7 +279,9 @@ static int runtest(void)
return ret;
}
}
-   return 0;
+
+   ret = runtest_dar_outside();
+   return ret;
 }
 
 
-- 
2.21.0



[PATCH v4 5/5] Powerpc/Watchpoint: Support for 8xx in ptrace-hwbreak.c selftest

2019-09-24 Thread Ravi Bangoria
On the 8xx, signals are generated after executing the instruction.
So no need to manually single-step on 8xx.

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 26 ++-
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index 654131591fca..58505277346d 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -22,6 +22,11 @@
 #include 
 #include "ptrace.h"
 
+#define SPRN_PVR   0x11F
#define PVR_8xx		0x0050
+
+bool is_8xx;
+
 /*
  * Use volatile on all global var so that compiler doesn't
  * optimise their load/stores. Otherwise selftest can fail.
@@ -205,13 +210,15 @@ static void check_success(pid_t child_pid, const char *name, const char *type,
 
printf("%s, %s, len: %d: Ok\n", name, type, len);
 
-   /*
-* For ptrace registered watchpoint, signal is generated
-* before executing load/store. Singlestep the instruction
-* and then continue the test.
-*/
-   ptrace(PTRACE_SINGLESTEP, child_pid, NULL, 0);
-   wait(NULL);
+   if (!is_8xx) {
+   /*
+* For ptrace registered watchpoint, signal is generated
+* before executing load/store. Singlestep the instruction
+* and then continue the test.
+*/
+   ptrace(PTRACE_SINGLESTEP, child_pid, NULL, 0);
+   wait(NULL);
+   }
 }
 
 static void ptrace_set_debugreg(pid_t child_pid, unsigned long wp_addr)
@@ -489,5 +496,10 @@ static int ptrace_hwbreak(void)
 
 int main(int argc, char **argv, char **envp)
 {
+   int pvr = 0;
+   asm __volatile__ ("mfspr %0,%1" : "=r"(pvr) : "i"(SPRN_PVR));
+   if (pvr == PVR_8xx)
+   is_8xx = true;
+
return test_harness(ptrace_hwbreak, "ptrace-hwbreak");
 }
-- 
2.21.0



RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Biwen Li
> >
> > > > > > > >
> > > > > > > > The 'fsl,ippdexpcr-alt-addr' property is used to handle an
> > > > > > > > errata
> > > > > > > > A-008646 on LS1021A
> > > > > > > >
> > > > > > > > Signed-off-by: Biwen Li 
> > > > > > > > ---
> > > > > > > > Change in v3:
> > > > > > > > - rename property name
> > > > > > > >   fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> > > > > > > >
> > > > > > > > Change in v2:
> > > > > > > > - update desc of the property 'fsl,rcpm-scfg'
> > > > > > > >
> > > > > > > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > > > > > > > ++
> > > > > > > >  1 file changed, 14 insertions(+)
> > > > > > > >
> > > > > > > > diff --git
> > > > > > > > a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > index 5a33619d881d..157dcf6da17c 100644
> > > > > > > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > > @@ -34,6 +34,11 @@ Chassis Version		Example Chips
> > > > > > > >  Optional properties:
> > > > > > > >   - little-endian : RCPM register block is Little Endian.
> > > > > > > > Without it
> > > RCPM
> > > > > > > > will be Big Endian (default case).
> > > > > > > > + - fsl,ippdexpcr-alt-addr : Must add the property for SoC
> > > > > > > > + LS1021A,
> > > > > > >
> > > > > > > You probably should mention this is related to a hardware
> > > > > > > issue on LS1021a and only needed on LS1021a.
> > > > > > Okay, got it, thanks, I will add this in v4.
> > > > > > >
> > > > > > > > +   Must include n + 1 entries (n = #fsl,rcpm-wakeup-cells, such as:
> > > > > > > > +   #fsl,rcpm-wakeup-cells equal to 2, then must include 2 + 1 entries).
> > > > > > >
> > > > > > > #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR registers
> > > > > > > on an
> > > SoC.
> > > > > > > However you are defining an offset to scfg registers here.
> > > > > > > Why these two are related?  The length here should actually
> > > > > > > be related to the #address-cells of the soc/.  But since
> > > > > > > this is only needed for LS1021, you can
> > > > > > just make it 3.
> > > > > > I need to set the value of IPPDEXPCR registers from the ftm_alarm0
> > > > > > device node (fsl,rcpm-wakeup = <&rcpm 0x0 0x2000>;
> > > > > > 0x0 is a value for IPPDEXPCR0, 0x2000 is a value for IPPDEXPCR1).
> > > > > > But because of the hardware issue on LS1021A, I need to store the
> > > > > > value of the IPPDEXPCR registers at an alternate address. So I am
> > > > > > defining an offset into the scfg registers; the RCPM driver then
> > > > > > computes an absolute address from the offset and writes the value
> > > > > > of the IPPDEXPCR registers to these absolute addresses (backing up
> > > > > > the value of the IPPDEXPCR registers).
> > > > >
> > > > > I understand what you are trying to do.  The problem is that the
> > > > > new fsl,ippdexpcr-alt-addr property contains a phandle and an offset.
> > > > > The size of it shouldn't be related to #fsl,rcpm-wakeup-cells.
> > > > Maybe you would like this: fsl,ippdexpcr-alt-addr = <&scfg 0x51c>; /* SCFG_SPARECR8 */
> > >
> > > No.  The #address-cell for the soc/ is 2, so the offset to scfg
> > > should be 0x0 0x51c.  The total size should be 3, but it shouldn't
> > > be coming from #fsl,rcpm-wakeup-cells like you mentioned in the binding.
> > Oh, I got it. You want fsl,ippdexpcr-alt-addr to be sized by
> > #address-cells instead of #fsl,rcpm-wakeup-cells.
> 
> Yes.
I got an example from drivers/pci/controller/dwc/pci-layerscape.c
and arch/arm/boot/dts/ls1021a.dtsi as follows:
fsl,pcie-scfg = <&scfg 0>, where 0 is an index.

In my fsl,ippdexpcr-alt-addr = <&scfg 0x0 0x51c>,
it means that 0x0 is an alternate offset address for IPPDEXPCR0 and
0x51c is an alternate offset address for IPPDEXPCR1, instead of 0x0 and
0x51c together composing one alternate address for SCFG_SPARECR8.
> 
> Regards,
> Leo
> > >
> > > > >
> > > > > > >
> > > > > > > > +   The first entry must be a link to the SCFG device node.
> > > > > > > > +   The non-first entry must be offset of registers of SCFG.
> > > > > > > >
> > > > > > > >  Example:
> > > > > > > >  The RCPM node for T4240:
> > > > > > > > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > > > > > > > #fsl,rcpm-wakeup-cells = <2>;
> > > > > > > > };
> > > > > > > >
> > > > > > > > +The RCPM node for LS1021A:
> > > > > > > > +   rcpm: rcpm@1ee2140 {
> > > > > > > > +   compatible = "fsl,ls1021a-rcpm", 
> > > > > > > > "fsl,qoriq-rcpm-
> > > > 2.1+";
> > > > > > > > +   reg = <0x0 0x1ee2140 0x0 0x8>;
> > > > > > > > +   #fsl,rcpm-wakeup-cells = <2>;
> > > > > > > > +   fsl,ippdexpcr-alt-addr = <&scfg 0x0 0x51c>; /* SCFG_SPARECR8 */
> > > > > > > > +   };
> > > > > > > > +
> > > > > > > > +
> > > > > > > >  * Freescale RCPM Wakeup Source Device Tree Bindings
> 

[PATCH v4 0/5] Powerpc/Watchpoint: Few important fixes

2019-09-24 Thread Ravi Bangoria
v3: https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-July/193339.html

v3->v4:
 - Instead of considering the exception as extraneous when the DAR is
   outside of the user-specified range, analyse the instruction and check
   for overlap between the user-specified range and the actual load/store
   range.
 - Add selftest for the same in perf-hwbreak.c
 - Make ptrace-hwbreak.c selftest more strict by checking address in
   check_success.
 - Support for 8xx in ptrace-hwbreak.c selftest (Build tested only)
 - Rebase to powerpc/next

@Christophe, can you please check patch 5? I've only build-tested it
with ep88xc_defconfig.

Ravi Bangoria (5):
  Powerpc/Watchpoint: Fix length calculation for unaligned target
  Powerpc/Watchpoint: Don't ignore extraneous exceptions blindly
  Powerpc/Watchpoint: Rewrite ptrace-hwbreak.c selftest
  Powerpc/Watchpoint: Add dar outside test in perf-hwbreak.c selftest
  Powerpc/Watchpoint: Support for 8xx in ptrace-hwbreak.c selftest

 arch/powerpc/include/asm/debug.h  |   1 +
 arch/powerpc/include/asm/hw_breakpoint.h  |   9 +-
 arch/powerpc/kernel/dawr.c|   6 +-
 arch/powerpc/kernel/hw_breakpoint.c   |  76 ++-
 arch/powerpc/kernel/process.c |  46 ++
 arch/powerpc/kernel/ptrace.c  |  37 +-
 arch/powerpc/xmon/xmon.c  |   3 +-
 .../selftests/powerpc/ptrace/perf-hwbreak.c   | 111 +++-
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 579 +++---
 9 files changed, 595 insertions(+), 273 deletions(-)

-- 
2.21.0



[PATCH v4 3/5] Powerpc/Watchpoint: Rewrite ptrace-hwbreak.c selftest

2019-09-24 Thread Ravi Bangoria
The ptrace-hwbreak.c selftest is logically broken. On powerpc, when a
watchpoint is created with ptrace, signals are generated before the
instruction executes, and the user has to manually single-step over
the instruction with the watchpoint disabled. The selftest never does
this, and thus keeps getting the signal at the same instruction. If we
fix that, the selftest fails anyway, because the logical connection
between the tracer (parent) and tracee (child) is also broken. Rewrite
the selftest and add new tests for unaligned access.

With patch:
  $ ./tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak
  test: ptrace-hwbreak
  tags: git_version:powerpc-5.3-4-224-g218b868240c7-dirty
  PTRACE_SET_DEBUGREG, WO, len: 1: Ok
  PTRACE_SET_DEBUGREG, WO, len: 2: Ok
  PTRACE_SET_DEBUGREG, WO, len: 4: Ok
  PTRACE_SET_DEBUGREG, WO, len: 8: Ok
  PTRACE_SET_DEBUGREG, RO, len: 1: Ok
  PTRACE_SET_DEBUGREG, RO, len: 2: Ok
  PTRACE_SET_DEBUGREG, RO, len: 4: Ok
  PTRACE_SET_DEBUGREG, RO, len: 8: Ok
  PTRACE_SET_DEBUGREG, RW, len: 1: Ok
  PTRACE_SET_DEBUGREG, RW, len: 2: Ok
  PTRACE_SET_DEBUGREG, RW, len: 4: Ok
  PTRACE_SET_DEBUGREG, RW, len: 8: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RW, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, WO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, RW, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, DAR OUTSIDE, RW, len: 6: Ok
  PPC_PTRACE_SETHWDEBUG, DAWR_MAX_LEN, RW, len: 512: Ok
  success: ptrace-hwbreak

Signed-off-by: Ravi Bangoria 
---
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 571 +++---
 1 file changed, 361 insertions(+), 210 deletions(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index 3066d310f32b..654131591fca 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -22,318 +22,469 @@
 #include 
 #include "ptrace.h"
 
-/* Breakpoint access modes */
-enum {
-   BP_X = 1,
-   BP_RW = 2,
-   BP_W = 4,
-};
-
-static pid_t child_pid;
-static struct ppc_debug_info dbginfo;
-
-static void get_dbginfo(void)
-{
-   int ret;
-
-   ret = ptrace(PPC_PTRACE_GETHWDBGINFO, child_pid, NULL, &dbginfo);
-   if (ret) {
-   perror("Can't get breakpoint info\n");
-   exit(-1);
-   }
-}
-
-static bool hwbreak_present(void)
-{
-   return (dbginfo.num_data_bps != 0);
-}
+/*
+ * Use volatile on all global var so that compiler doesn't
+ * optimise their load/stores. Otherwise selftest can fail.
+ */
+static volatile __u64 glvar;
 
-static bool dawr_present(void)
-{
-   return !!(dbginfo.features & PPC_DEBUG_FEATURE_DATA_BP_DAWR);
-}
+#define DAWR_MAX_LEN 512
+static volatile __u8 big_var[DAWR_MAX_LEN] __attribute__((aligned(512)));
 
-static void set_breakpoint_addr(void *addr)
-{
-   int ret;
+#define A_LEN 6
+#define B_LEN 6
+struct gstruct {
+   __u8 a[A_LEN]; /* double word aligned */
+   __u8 b[B_LEN]; /* double word unaligned */
+};
+static volatile struct gstruct gstruct __attribute__((aligned(512)));
 
-   ret = ptrace(PTRACE_SET_DEBUGREG, child_pid, 0, addr);
-   if (ret) {
-   perror("Can't set breakpoint addr\n");
-   exit(-1);
-   }
-}
 
-static int set_hwbreakpoint_addr(void *addr, int range)
+static void get_dbginfo(pid_t child_pid, struct ppc_debug_info *dbginfo)
 {
-   int ret;
-
-   struct ppc_hw_breakpoint info;
-
-   info.version = 1;
-   info.trigger_type = PPC_BREAKPOINT_TRIGGER_RW;
-   info.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
-   if (range > 0)
-   info.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
-   info.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
-   info.addr = (__u64)addr;
-   info.addr2 = (__u64)addr + range;
-   info.condition_value = 0;
-
-   ret = ptrace(PPC_PTRACE_SETHWDEBUG, child_pid, 0, &info);
-   if (ret < 0) {
-   perror("Can't set breakpoint\n");
+   if (ptrace(PPC_PTRACE_GETHWDBGINFO, child_pid, NULL, dbginfo)) {
+   perror("Can't get breakpoint info");
exit(-1);
}
-   return ret;
 }
 
-static int del_hwbreakpoint_addr(int watchpoint_handle)
+static bool dawr_present(struct ppc_debug_info *dbginfo)
 {
-   int ret;
-
-   ret = ptrace(PPC_PTRACE_DELHWDEBUG, child_pid, 0, watchpoint_handle);
-   if (ret < 0) {
-   perror("Can't delete hw breakpoint\n");
-   exit(-1);
-   }
-   return ret;
+   return !!(dbginfo->features & PPC_DEBUG_FEATURE_DATA_BP_DAWR);
 }
 

[PATCH v4 1/5] Powerpc/Watchpoint: Fix length calculation for unaligned target

2019-09-24 Thread Ravi Bangoria
The watchpoint match range is always doubleword (8 bytes) aligned on
powerpc. If the given range crosses a doubleword boundary, we need to
increase the length so that the next doubleword also gets covered. E.g.:

          address      len = 6 bytes
              |===========|
   | | | | | | | | | | | | | | | | |
   |---------------|---------------|
   <----8 bytes----><----8 bytes--->

In such a case, the current code configures the hw as:
  start_addr = address & ~HW_BREAKPOINT_ALIGN
  len = 8 bytes

And thus a read/write in the last 4 bytes of the given range is
ignored. Fix this by including the next doubleword in the length.
Plus, fix the ptrace code, which is messing up address/len.

Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/debug.h |  1 +
 arch/powerpc/include/asm/hw_breakpoint.h |  9 +++--
 arch/powerpc/kernel/dawr.c   |  6 ++--
 arch/powerpc/kernel/hw_breakpoint.c  | 24 +++--
 arch/powerpc/kernel/process.c| 46 
 arch/powerpc/kernel/ptrace.c | 37 ++-
 arch/powerpc/xmon/xmon.c |  3 +-
 7 files changed, 83 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index 7756026b95ca..9c1b4aaa374b 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) 
{ return 0; }
 static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
 #endif
 
+int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw);
 void __set_breakpoint(struct arch_hw_breakpoint *brk);
 bool ppc_breakpoint_available(void);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 67e2da195eae..27ac6f5d2891 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -14,6 +14,7 @@ struct arch_hw_breakpoint {
unsigned long   address;
u16 type;
u16 len; /* length of the target data symbol */
+   u16 hw_len; /* length programmed in hw */
 };
 
 /* Note: Don't change the the first 6 bits below as they are in the same order
@@ -33,6 +34,11 @@ struct arch_hw_breakpoint {
 #define HW_BRK_TYPE_PRIV_ALL   (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \
 HW_BRK_TYPE_HYP)
 
+#define HW_BREAKPOINT_ALIGN 0x7
+
+#define DABR_MAX_LEN   8
+#define DAWR_MAX_LEN   512
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include 
 #include 
@@ -44,8 +50,6 @@ struct pmu;
 struct perf_sample_data;
 struct task_struct;
 
-#define HW_BREAKPOINT_ALIGN 0x7
-
 extern int hw_breakpoint_slots(int type);
 extern int arch_bp_generic_fields(int type, int *gen_bp_type);
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
@@ -70,6 +74,7 @@ static inline void hw_breakpoint_disable(void)
brk.address = 0;
brk.type = 0;
brk.len = 0;
+   brk.hw_len = 0;
if (ppc_breakpoint_available())
	__set_breakpoint(&brk);
 }
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
index 5f66b95b6858..8531623aa9b2 100644
--- a/arch/powerpc/kernel/dawr.c
+++ b/arch/powerpc/kernel/dawr.c
@@ -30,10 +30,10 @@ int set_dawr(struct arch_hw_breakpoint *brk)
 * DAWR length is stored in field MDR bits 48:53.  Matches range in
 * doublewords (64 bits) biased by -1 eg. 0b000000=1DW and
 * 0b111111=64DW.
-* brk->len is in bytes.
+* brk->hw_len is in bytes.
 * This aligns up to double word size, shifts and does the bias.
 */
-   mrd = ((brk->len + 7) >> 3) - 1;
+   mrd = ((brk->hw_len + 7) >> 3) - 1;
dawrx |= (mrd & 0x3f) << (63 - 53);
 
if (ppc_md.set_dawr)
@@ -54,7 +54,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
const char __user *user_buf,
size_t count, loff_t *ppos)
 {
-   struct arch_hw_breakpoint null_brk = {0, 0, 0};
+   struct arch_hw_breakpoint null_brk = {0, 0, 0, 0};
size_t rc;
 
	/* Send error to user if the hypervisor won't allow us to write DAWR */
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 1007ec36b4cb..5a2d8c306c40 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -133,9 +133,9 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
 const struct perf_event_attr *attr,
 struct arch_hw_breakpoint *hw)
 {
-   int ret = -EINVAL, length_max;
+   int ret = -EINVAL;
 
-   if (!bp)
+   if (!bp || !attr->bp_len)
return ret;
 
hw->type = HW_BRK_TYPE_TRANSLATE;
@@ -155,26 +155,10 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
hw->address = 

Re: [PATCH] vhost: It's better to use size_t for the 3rd parameter of vhost_exceeds_weight()

2019-09-24 Thread Jason Wang



On 2019/9/23 5:12 PM, wangxu (AE) wrote:

Hi Michael

Thanks for your fast reply.

As the following code shows, the 2nd branch of iov_iter_advance() does not
check whether i->count < size; when this happens, i->count -= size may cause
len to exceed INT_MAX, and then total_len to exceed INT_MAX.

handle_tx_copy() ->
get_tx_bufs(..., , ...) ->
init_iov_iter() ->
iov_iter_advance(iter, ...) // has 3 branches:
pipe_advance()  // has checked the size:
                // if (unlikely(i->count < size)) size = i->count;
iov_iter_is_discard() ...   // no check.



Yes, but I don't think we use ITER_DISCARD.

Thanks



iterate_and_advance()   // has checked:
                        // if (unlikely(i->count < n)) n = i->count;
return iov_iter_count(iter);

-Original Message-
From: Michael S. Tsirkin [mailto:m...@redhat.com]
Sent: Monday, September 23, 2019 4:07 PM
To: wangxu (AE) 
Cc: jasow...@redhat.com; k...@vger.kernel.org; 
virtualizat...@lists.linux-foundation.org; net...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] vhost: It's better to use size_t for the 3rd parameter of 
vhost_exceeds_weight()

On Mon, Sep 23, 2019 at 03:46:41PM +0800, wangxu wrote:

From: Wang Xu 

Callers of vhost_exceeds_weight(..., total_len) in drivers/vhost/net.c
usually pass a size_t total_len, which may be affected by rx/tx packets.

Signed-off-by: Wang Xu 


Puts a bit more pressure on the register file ...
why do we care? Is there some way that it can exceed INT_MAX?


---
  drivers/vhost/vhost.c | 4 ++--
  drivers/vhost/vhost.h | 7 ---
  2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 36ca2cf..159223a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -412,7 +412,7 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
 }
  
  bool vhost_exceeds_weight(struct vhost_virtqueue *vq,

- int pkts, int total_len)
+ int pkts, size_t total_len)
  {
struct vhost_dev *dev = vq->dev;
  
@@ -454,7 +454,7 @@ static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
  
  void vhost_dev_init(struct vhost_dev *dev,

struct vhost_virtqueue **vqs, int nvqs,
-   int iov_limit, int weight, int byte_weight)
+   int iov_limit, int weight, size_t byte_weight)
  {
struct vhost_virtqueue *vq;
int i;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index e9ed272..8d80389d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -172,12 +172,13 @@ struct vhost_dev {
wait_queue_head_t wait;
int iov_limit;
int weight;
-   int byte_weight;
+   size_t byte_weight;
  };
  


This just costs extra memory, and value is never large, so I don't think this 
matters.


-bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
+bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, size_t total_len);
  void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
-   int nvqs, int iov_limit, int weight, int byte_weight);
+   int nvqs, int iov_limit, int weight, size_t byte_weight);
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
--
1.8.5.6


Re: [PATCH] soc: qcom: socinfo: add missing soc_id sysfs entry

2019-09-24 Thread Jeffrey Hugo
On Mon, Sep 16, 2019 at 3:44 PM Stephen Boyd  wrote:
>
> Quoting Srinivas Kandagatla (2019-09-12 02:10:19)
> > looks like SoC ID is not exported to sysfs for some reason.
> > This patch adds it!
> >
> > This is mostly used by userspace libraries like SNPE.
>
> What is SNPE?

Snapdragon Neural Processing Engine.  Pronounced "snap-e".  It's
basically the framework someone goes through to run a neural network
on a Qualcomm mobile SoC.  SNPE can utilize various hardware resources
such as the applications CPU, GPU, and dedicated compute resources
such as an NSP, if available.  It's been around for over a year, and
much more information can be found with a simple search, since
SNPE is pretty much a unique search term currently.


Re: [PATCH] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()

2019-09-24 Thread Anshuman Khandual



On 09/25/2019 08:43 AM, Andrew Morton wrote:
> On Mon, 23 Sep 2019 11:16:38 +0530 Anshuman Khandual 
>  wrote:
> 
>>
>>
>> On 09/16/2019 11:17 AM, Anshuman Khandual wrote:
>>> In add_memory_resource() the memory range to be hot added first gets into
>>> the memblock via memblock_add() before arch_add_memory() is called on it.
>>> Reverse sequence should be followed during memory hot removal which already
>>> is being followed in add_memory_resource() error path. This now ensures
>>> required re-order between memblock_[free|remove]() and arch_remove_memory()
>>> during memory hot-remove.
>>>
>>> Cc: Andrew Morton 
>>> Cc: Oscar Salvador 
>>> Cc: Michal Hocko 
>>> Cc: David Hildenbrand 
>>> Cc: Pavel Tatashin 
>>> Cc: Dan Williams 
>>> Signed-off-by: Anshuman Khandual 
>>> ---
>>> Original patch https://lkml.org/lkml/2019/9/3/327
>>>
>>> Memory hot remove now works on arm64 without this because of a recent
>>> commit 60bb462fc7ad ("drivers/base/node.c: simplify
>>> unregister_memory_block_under_nodes()").
>>>
>>> David mentioned that re-ordering should still make sense for consistency
>>> purpose (removing stuff in the reverse order they were added). This patch
>>> is now detached from arm64 hot-remove series.
>>>
>>> https://lkml.org/lkml/2019/9/3/326
>>
>> ...
>>
>> Hello Andrew,
>>
>> Any feedbacks on this, does it look okay ?
>>
> 
> Well.  I'd parked this for 5.4-rc1 processing because it looked like a
> cleanup.

This does not fix a serious problem. It just removes an inconsistency while
freeing resources during memory hot remove, which for now does not pose a
real problem.

> 
> But way down below the ^---$ line I see "Memory hot remove now works
> on arm64".  Am I correct in believing that 60bb462fc7ad broke arm64 mem
> hot remove?  And that this patch fixes a serious regression?  If so,

No. [Proposed] arm64 memory hot remove series does not anymore depend on
this particular patch because 60bb462fc7ad has already solved the problem.

> that should have been right there in the patch title and changelog!

V2 (https://patchwork.kernel.org/patch/11159939/) of this patch makes it
very clear in its commit message.

- Anshuman


Re: [PATCH] virtio_mmio: remove redundant dev_err message

2019-09-24 Thread Jason Wang



On 2019/9/24 3:21 PM, Ding Xiang wrote:

platform_get_irq already contains an error message,



Is this message contained in all possible error paths? If not, it's 
probably better to keep it as is.


Thanks



so remove
the redundant dev_err message

Signed-off-by: Ding Xiang 
---
  drivers/virtio/virtio_mmio.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index e09edb5..c4b9f25 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -466,10 +466,8 @@ static int vm_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
int irq = platform_get_irq(vm_dev->pdev, 0);
int i, err, queue_idx = 0;
  
-	if (irq < 0) {
-		dev_err(&vdev->dev, "Cannot get IRQ resource\n");
+	if (irq < 0)
 		return irq;
-	}
  
 	err = request_irq(irq, vm_interrupt, IRQF_SHARED,
 			dev_name(&vdev->dev), vm_dev);


RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Leo Li



> -Original Message-
> From: Biwen Li
> Sent: Tuesday, September 24, 2019 10:47 PM
> To: Leo Li ; shawn...@kernel.org;
> robh...@kernel.org; mark.rutl...@arm.com; Ran Wang
> 
> Cc: linuxppc-...@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; devicet...@vger.kernel.org
> Subject: RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-
> addr' property
> 
> > > > > > >
> > > > > > > The 'fsl,ippdexpcr-alt-addr' property is used to handle an
> > > > > > > errata
> > > > > > > A-008646 on LS1021A
> > > > > > >
> > > > > > > Signed-off-by: Biwen Li 
> > > > > > > ---
> > > > > > > Change in v3:
> > > > > > >   - rename property name
> > > > > > > fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> > > > > > >
> > > > > > > Change in v2:
> > > > > > >   - update desc of the property 'fsl,rcpm-scfg'
> > > > > > >
> > > > > > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > > > > > > ++
> > > > > > >  1 file changed, 14 insertions(+)
> > > > > > >
> > > > > > > diff --git
> > > > > > > a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > index 5a33619d881d..157dcf6da17c 100644
> > > > > > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > > @@ -34,6 +34,11 @@ Chassis VersionExample
> Chips
> > > > > > >  Optional properties:
> > > > > > >   - little-endian : RCPM register block is Little Endian.
> > > > > > > Without it
> > RCPM
> > > > > > > will be Big Endian (default case).
> > > > > > > + - fsl,ippdexpcr-alt-addr : Must add the property for SoC
> > > > > > > + LS1021A,
> > > > > >
> > > > > > You probably should mention this is related to a hardware
> > > > > > issue on LS1021a and only needed on LS1021a.
> > > > > Okay, got it, thanks, I will add this in v4.
> > > > > >
> > > > > > > +   Must include n + 1 entries (n = #fsl,rcpm-wakeup-cells, such as:
> > > > > > > +   #fsl,rcpm-wakeup-cells equal to 2, then must include 2 + 1 entries).
> > > > > >
> > > > > > #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR registers on
> > > > > > an
> > SoC.
> > > > > > However you are defining an offset to scfg registers here.
> > > > > > Why these two are related?  The length here should actually be
> > > > > > related to the #address-cells of the soc/.  But since this is
> > > > > > only needed for LS1021, you can
> > > > > just make it 3.
> > > > > I need to set the value of IPPDEXPCR registers from the ftm_alarm0
> > > > > device node (fsl,rcpm-wakeup = <&rcpm 0x0 0x2000>;
> > > > > 0x0 is a value for IPPDEXPCR0, 0x2000 is a value for IPPDEXPCR1).
> > > > > But because of the hardware issue on LS1021A, I need to store the
> > > > > value of the IPPDEXPCR registers at an alternate address. So I am
> > > > > defining an offset into the scfg registers; the RCPM driver then
> > > > > computes an absolute address from the offset and writes the value
> > > > > of the IPPDEXPCR registers to these absolute addresses (backing up
> > > > > the value of the IPPDEXPCR registers).
> > > >
> > > > I understand what you are trying to do.  The problem is that the
> > > > new fsl,ippdexpcr-alt-addr property contains a phandle and an offset.
> > > > The size of it shouldn't be related to #fsl,rcpm-wakeup-cells.
> > > Maybe you would like this: fsl,ippdexpcr-alt-addr = <&scfg 0x51c>; /* SCFG_SPARECR8 */
> >
> > No.  The #address-cell for the soc/ is 2, so the offset to scfg should
> > be 0x0 0x51c.  The total size should be 3, but it shouldn't be coming
> > from #fsl,rcpm-wakeup-cells like you mentioned in the binding.
> Oh, I got it. You want fsl,ippdexpcr-alt-addr to be sized by
> #address-cells instead of #fsl,rcpm-wakeup-cells.

Yes.

Regards,
Leo
> >
> > > >
> > > > > >
> > > > > > > +   The first entry must be a link to the SCFG device node.
> > > > > > > +   The non-first entry must be offset of registers of SCFG.
> > > > > > >
> > > > > > >  Example:
> > > > > > >  The RCPM node for T4240:
> > > > > > > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > > > > > >   #fsl,rcpm-wakeup-cells = <2>;
> > > > > > >   };
> > > > > > >
> > > > > > > +The RCPM node for LS1021A:
> > > > > > > + rcpm: rcpm@1ee2140 {
> > > > > > > + compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-
> > > 2.1+";
> > > > > > > + reg = <0x0 0x1ee2140 0x0 0x8>;
> > > > > > > + #fsl,rcpm-wakeup-cells = <2>;
> > > > > > > + fsl,ippdexpcr-alt-addr = <&scfg 0x0 0x51c>; /* SCFG_SPARECR8 */
> > > > > > > + };
> > > > > > > +
> > > > > > > +
> > > > > > >  * Freescale RCPM Wakeup Source Device Tree Bindings
> > > > > > >  ---
> > > > > > >  Required fsl,rcpm-wakeup property should be added to a
> > > > > > > device node if the device
> > > > > > > --
> > > > > > > 2.17.1



RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Biwen Li
> > > > > >
> > > > > > The 'fsl,ippdexpcr-alt-addr' property is used to handle an
> > > > > > errata
> > > > > > A-008646 on LS1021A
> > > > > >
> > > > > > Signed-off-by: Biwen Li 
> > > > > > ---
> > > > > > Change in v3:
> > > > > > - rename property name
> > > > > >   fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> > > > > >
> > > > > > Change in v2:
> > > > > > - update desc of the property 'fsl,rcpm-scfg'
> > > > > >
> > > > > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > > > > > ++
> > > > > >  1 file changed, 14 insertions(+)
> > > > > >
> > > > > > diff --git
> > > > > > a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > index 5a33619d881d..157dcf6da17c 100644
> > > > > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > > @@ -34,6 +34,11 @@ Chassis Version  Example Chips
> > > > > >  Optional properties:
> > > > > >   - little-endian : RCPM register block is Little Endian. Without it
> RCPM
> > > > > > will be Big Endian (default case).
> > > > > > + - fsl,ippdexpcr-alt-addr : Must add the property for SoC
> > > > > > + LS1021A,
> > > > >
> > > > > You probably should mention this is related to a hardware issue
> > > > > on LS1021a and only needed on LS1021a.
> > > > Okay, got it, thanks, I will add this in v4.
> > > > >
> > > > > > +   Must include n + 1 entries (n = #fsl,rcpm-wakeup-cells, such as:
> > > > > > +   #fsl,rcpm-wakeup-cells equal to 2, then must include 2 + 1 entries).
> > > > >
> > > > > #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR registers on an
> SoC.
> > > > > However you are defining an offset to scfg registers here.  Why
> > > > > these two are related?  The length here should actually be
> > > > > related to the #address-cells of the soc/.  But since this is
> > > > > only needed for LS1021, you can
> > > > just make it 3.
> > > > I need to set the value of IPPDEXPCR registers from the ftm_alarm0
> > > > device node (fsl,rcpm-wakeup = <&rcpm 0x0 0x2000>;
> > > > 0x0 is a value for IPPDEXPCR0, 0x2000 is a value for IPPDEXPCR1).
> > > > But because of the hardware issue on LS1021A, I need to store the
> > > > value of the IPPDEXPCR registers at an alternate address. So I am
> > > > defining an offset into the scfg registers; the RCPM driver then
> > > > computes an absolute address from the offset and writes the value
> > > > of the IPPDEXPCR registers to these absolute addresses (backing up
> > > > the value of the IPPDEXPCR registers).
> > >
> > > I understand what you are trying to do.  The problem is that the new
> > > fsl,ippdexpcr-alt-addr property contains a phandle and an offset.
> > > The size of it shouldn't be related to #fsl,rcpm-wakeup-cells.
> > Maybe you would like this: fsl,ippdexpcr-alt-addr = <&scfg 0x51c>; /* SCFG_SPARECR8 */
> 
> No.  The #address-cell for the soc/ is 2, so the offset to scfg should be 0x0
> 0x51c.  The total size should be 3, but it shouldn't be coming from
> #fsl,rcpm-wakeup-cells like you mentioned in the binding.
Oh, I got it. You want fsl,ippdexpcr-alt-addr to be sized by
#address-cells instead of #fsl,rcpm-wakeup-cells.
> 
> > >
> > > > >
> > > > > > +   The first entry must be a link to the SCFG device node.
> > > > > > +   The non-first entry must be offset of registers of SCFG.
> > > > > >
> > > > > >  Example:
> > > > > >  The RCPM node for T4240:
> > > > > > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > > > > > #fsl,rcpm-wakeup-cells = <2>;
> > > > > > };
> > > > > >
> > > > > > +The RCPM node for LS1021A:
> > > > > > +   rcpm: rcpm@1ee2140 {
> > > > > > +   compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-
> > 2.1+";
> > > > > > +   reg = <0x0 0x1ee2140 0x0 0x8>;
> > > > > > +   #fsl,rcpm-wakeup-cells = <2>;
> > > > > > +   fsl,ippdexpcr-alt-addr = <&scfg 0x0 0x51c>; /* SCFG_SPARECR8 */
> > > > > > +   };
> > > > > > +
> > > > > > +
> > > > > >  * Freescale RCPM Wakeup Source Device Tree Bindings
> > > > > >  ---
> > > > > >  Required fsl,rcpm-wakeup property should be added to a device
> > > > > > node if the device
> > > > > > --
> > > > > > 2.17.1



Re: [PATCH v4 05/10] mm: Return faster for non-fatal signals in user mode faults

2019-09-24 Thread Peter Xu
On Tue, Sep 24, 2019 at 08:45:18AM -0700, Matthew Wilcox wrote:
> On Tue, Sep 24, 2019 at 11:19:08AM +0800, Peter Xu wrote:
> > On Mon, Sep 23, 2019 at 07:54:47PM -0700, Matthew Wilcox wrote:
> > > On Tue, Sep 24, 2019 at 10:47:21AM +0800, Peter Xu wrote:
> > > > On Mon, Sep 23, 2019 at 11:03:49AM -0700, Linus Torvalds wrote:
> > > > > On Sun, Sep 22, 2019 at 9:26 PM Peter Xu  wrote:
> > > > > >
> > > > > > This patch is a preparation of removing that special path by 
> > > > > > allowing
> > > > > > the page fault to return even faster if we were interrupted by a
> > > > > > non-fatal signal during a user-mode page fault handling routine.
> > > > > 
> > > > > So I really wish saome other vm person would also review these things,
> > > > > but looking over this series once more, this is the patch I probably
> > > > > like the least.
> > > > > 
> > > > > And the reason I like it the least is that I have a hard time
> > > > > explaining to myself what the code does and why, and why it's so full
> > > > > of this pattern:
> > > > > 
> > > > > > -   if ((fault & VM_FAULT_RETRY) && 
> > > > > > fatal_signal_pending(current))
> > > > > > +   if ((fault & VM_FAULT_RETRY) &&
> > > > > > +   fault_should_check_signal(user_mode(regs)))
> > > > > > return;
> > > > > 
> > > > > which isn't all that pretty.
> > > > > 
> > > > > Why isn't this just
> > > > > 
> > > > >   static bool fault_signal_pending(unsigned int fault_flags, struct
> > > > > pt_regs *regs)
> > > > >   {
> > > > > return (fault_flags & VM_FAULT_RETRY) &&
> > > > > (fatal_signal_pending(current) ||
> > > > >  (user_mode(regs) && signal_pending(current)));
> > > > >   }
> > > > > 
> > > > > and then most of the users would be something like
> > > > > 
> > > > > if (fault_signal_pending(fault, regs))
> > > > > return;
> > > > > 
> > > > > and the exceptions could do their own thing.
> > > > > 
> > > > > Now the code is prettier and more understandable, I feel.
> > > > > 
> > > > > And if something doesn't follow this pattern, maybe it either _should_
> > > > > follow that pattern or it should just not use the helper but explain
> > > > > why it has an unusual pattern.
> > > 
> > > > +++ b/arch/alpha/mm/fault.c
> > > > @@ -150,7 +150,7 @@ do_page_fault(unsigned long address, unsigned long 
> > > > mmcsr,
> > > >the fault.  */
> > > > fault = handle_mm_fault(vma, address, flags);
> > > >  
> > > > -   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
> > > > +   if (fault_signal_pending(fault, regs))
> > > > return;
> > > >  
> > > > if (unlikely(fault & VM_FAULT_ERROR)) {
> > > 
> > > > +++ b/arch/arm/mm/fault.c
> > > > @@ -301,6 +301,11 @@ do_page_fault(unsigned long addr, unsigned int 
> > > > fsr, struct pt_regs *regs)
> > > > return 0;
> > > > }
> > > >  
> > > > +   /* Fast path to handle user mode signals */
> > > > +   if ((fault & VM_FAULT_RETRY) && user_mode(regs) &&
> > > > +   signal_pending(current))
> > > > +   return 0;
> > > 
> > > But _why_ are they different?  This is a good opportunity to make more
> > > code the same between architectures.
> > 
> > (Thanks for joining the discussion)
> > 
> > I'd like to do these - my only worry is that I can't really test them
> > well simply because I don't have all the hardwares.  For now the
> > changes are mostly straightforward so I'm relatively confident (not to
> > mention the code needs proper reviews too, and of course I would
> > appreciate much if anyone wants to smoke test it).  If I change it in
> > a drastic way, I won't be that confident without some tests at least
> > on multiple archs (not to mention that even smoke testing across major
> > archs will be a huge amount of work...).  So IMHO those might be more
> > suitable as follow-up for per-arch developers if we can at least reach
> > a consensus on the whole idea of this patchset.
> 
> I think the way to do this is to introduce fault_signal_pending(),
> converting the architectures to it that match that pattern.  Then one
> patch per architecture to convert the ones which use a different pattern
> to the same pattern.

Fair enough.  I can start with a fault_signal_pending() that only keeps
the SIGKILL handling just like before, then convert all the archs, with
the last patch touching only fault_signal_pending() to cover non-fatal
signals.
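The staged plan above can be sketched in userspace C: first a helper that preserves today's fatal-only semantics so each architecture converts mechanically, then one final patch widening the helper per Linus's suggestion. The types and predicates below are illustrative stand-ins, not the kernel API:

```c
#include <stdbool.h>

/* Illustrative stand-ins for kernel state; not the real kernel API. */
#define VM_FAULT_RETRY 0x0400

struct pt_regs { int user; };

static bool fatal_pending;
static bool nonfatal_pending;

static bool user_mode(struct pt_regs *regs) { return regs->user != 0; }
static bool fatal_signal_pending(void) { return fatal_pending; }
static bool signal_pending(void) { return fatal_pending || nonfatal_pending; }

/* Step 1: a helper that keeps today's behaviour (fatal signals only),
 * so every architecture can be switched over without functional change. */
static bool fault_signal_pending_v1(unsigned int fault, struct pt_regs *regs)
{
	(void)regs;
	return (fault & VM_FAULT_RETRY) && fatal_signal_pending();
}

/* Final step: one patch widens the helper to also return early for
 * non-fatal signals taken in user mode, as in Linus's sketch. */
static bool fault_signal_pending_final(unsigned int fault, struct pt_regs *regs)
{
	return (fault & VM_FAULT_RETRY) &&
	       (fatal_signal_pending() ||
		(user_mode(regs) && signal_pending()));
}
```

Only the last patch changes observable behaviour; everything before it is a mechanical conversion, which is what makes the per-arch review tractable.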

> 
> Oh, and while you're looking at the callers of handle_mm_fault(), a
> lot of them don't check conditions in the right order.  x86, at least,
> handles FAULT_RETRY before handling FAULT_ERROR, which is clearly wrong.
> 
> Kirill and I recently discussed it here:
> https://lore.kernel.org/linux-mm/20190911152338.gqqgxrmqycodfocb@box/T/

Hmm sure.  These sound very reasonable.

I must admit that I am not brave enough to keep growing this patchset
on my own.  The condition I'm facing right now is 

Re: [PATCH V3 0/2] mm/debug: Add tests for architecture exported page table helpers

2019-09-24 Thread Anshuman Khandual



On 09/24/2019 06:01 PM, Mike Rapoport wrote:
> On Tue, Sep 24, 2019 at 02:51:01PM +0300, Kirill A. Shutemov wrote:
>> On Fri, Sep 20, 2019 at 12:03:21PM +0530, Anshuman Khandual wrote:
>>> This series adds a test validation for architecture exported page table
>>> helpers. Patch in the series adds basic transformation tests at various
>>> levels of the page table. Before that it exports gigantic page allocation
>>> function from HugeTLB.
>>>
>>> This test was originally suggested by Catalin during arm64 THP migration
>>> RFC discussion earlier. Going forward it can include more specific tests
>>> with respect to various generic MM functions like THP, HugeTLB etc and
>>> platform specific tests.
>>>
>>> https://lore.kernel.org/linux-mm/20190628102003.ga56...@arrakis.emea.arm.com/
>>>
>>> Testing:
>>>
>>> Successfully build and boot tested on both arm64 and x86 platforms without
>>> any test failing. Only build tested on some other platforms. Build failed
>>> on some platforms (known) in pud_clear_tests() as there were no available
>>> __pgd() definitions.
>>>
>>> - ARM32
>>> - IA64
>>
>> Hm. Grep shows __pgd() definitions for both of them. Is it for specific
>> config?
>  
> For ARM32 it's defined only for 3-lelel page tables, i.e with LPAE on.
> For IA64 it's defined for !STRICT_MM_TYPECHECKS which is even not a config
> option, but a define in arch/ia64/include/asm/page.h

Right. So where do we go from here? We will need help from platform folks to
fix this unless it's trivial. I did propose this on the last thread (v2),
wondering if it would be a better idea to restrict DEBUG_ARCH_PGTABLE_TEST to
architectures which have fixed all pending issues, whether build or run time.
Though enabling all platforms where the test at least builds might make more
sense, we might have to just exclude arm32 and ia64 for now. Run time problems
can then be fixed later, platform by platform. Any thoughts?

BTW the test is known to run successfully on arm64, x86 and ppc32 platforms.
Gerald has been trying to get it working on s390. In the meantime, if there are
other volunteers to test this on ppc64, sparc, riscv, mips, m68k etc.
platforms, it would be really helpful.

- Anshuman


RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Leo Li



> -Original Message-
> From: Biwen Li
> Sent: Tuesday, September 24, 2019 10:30 PM
> To: Leo Li ; shawn...@kernel.org;
> robh...@kernel.org; mark.rutl...@arm.com; Ran Wang
> 
> Cc: linuxppc-...@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; devicet...@vger.kernel.org
> Subject: RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-
> addr' property
> 
> > > > >
> > > > > The 'fsl,ippdexpcr-alt-addr' property is used to handle an
> > > > > errata
> > > > > A-008646 on LS1021A
> > > > >
> > > > > Signed-off-by: Biwen Li 
> > > > > ---
> > > > > Change in v3:
> > > > >   - rename property name
> > > > > fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> > > > >
> > > > > Change in v2:
> > > > >   - update desc of the property 'fsl,rcpm-scfg'
> > > > >
> > > > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > > > > ++
> > > > >  1 file changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > index 5a33619d881d..157dcf6da17c 100644
> > > > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > > @@ -34,6 +34,11 @@ Chassis VersionExample Chips
> > > > >  Optional properties:
> > > > >   - little-endian : RCPM register block is Little Endian. Without it 
> > > > > RCPM
> > > > > will be Big Endian (default case).
> > > > > + - fsl,ippdexpcr-alt-addr : Must add the property for SoC
> > > > > + LS1021A,
> > > >
> > > > You probably should mention this is related to a hardware issue on
> > > > LS1021a and only needed on LS1021a.
> > > Okay, got it, thanks, I will add this in v4.
> > > >
> > > > > +   Must include n + 1 entries (n = #fsl,rcpm-wakeup-cells, such as:
> > > > > +   #fsl,rcpm-wakeup-cells equal to 2, then must include 2 + 1 
> > > > > entries).
> > > >
> > > > #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR registers on an SoC.
> > > > However you are defining an offset to scfg registers here.  Why
> > > > these two are related?  The length here should actually be related
> > > > to the #address-cells of the soc/.  But since this is only needed
> > > > for LS1021, you can
> > > just make it 3.
> > > I need to set the value of the IPPDEXPCR registers from the ftm_alarm0
> > > device node (fsl,rcpm-wakeup = < 0x0 0x2000>;
> > > 0x0 is a value for IPPDEXPCR0, 0x2000 is a value for IPPDEXPCR1).
> > > But because of the hardware issue on LS1021A, I need to store the value
> > > of the IPPDEXPCR registers at an alternate address. So I define an offset
> > > into the scfg registers; the RCPM driver then derives an absolute address
> > > from that offset and writes the value of the IPPDEXPCR registers to these
> > > absolute addresses (backing up the IPPDEXPCR registers).
> >
> > I understand what you are trying to do.  The problem is that the new
> > fsl,ippdexpcr-alt-addr property contains a phandle and an offset.  The
> > size of it shouldn't be related to #fsl,rcpm-wakeup-cells.
> Maybe something like this: fsl,ippdexpcr-alt-addr = < 0x51c>; /*
> SCFG_SPARECR8 */

No.  The #address-cells for the soc/ node is 2, so the offset to scfg should be
0x0 0x51c.  The total size should be 3, but it shouldn't be coming from
#fsl,rcpm-wakeup-cells as you specified in the binding.
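With that correction applied, the LS1021A example would carry a phandle plus a two-cell offset (three cells total, independent of #fsl,rcpm-wakeup-cells). A sketch, with an assumed `scfg` label for the SCFG node:

```dts
rcpm: rcpm@1ee2140 {
	compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+";
	reg = <0x0 0x1ee2140 0x0 0x8>;
	#fsl,rcpm-wakeup-cells = <2>;
	/* phandle to SCFG + two address cells (soc/ has #address-cells = <2>),
	 * pointing at SCFG_SPARECR8 as the IPPDEXPCR backup location */
	fsl,ippdexpcr-alt-addr = <&scfg 0x0 0x51c>;
};
```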

> >
> > > >
> > > > > +   The first entry must be a link to the SCFG device node.
> > > > > +   The non-first entry must be offset of registers of SCFG.
> > > > >
> > > > >  Example:
> > > > >  The RCPM node for T4240:
> > > > > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > > > >   #fsl,rcpm-wakeup-cells = <2>;
> > > > >   };
> > > > >
> > > > > +The RCPM node for LS1021A:
> > > > > + rcpm: rcpm@1ee2140 {
> > > > > + compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-
> 2.1+";
> > > > > + reg = <0x0 0x1ee2140 0x0 0x8>;
> > > > > + #fsl,rcpm-wakeup-cells = <2>;
> > > > > + fsl,ippdexpcr-alt-addr = < 0x0 0x51c>; /*
> > > > > SCFG_SPARECR8 */
> > > > > + };
> > > > > +
> > > > > +
> > > > >  * Freescale RCPM Wakeup Source Device Tree Bindings
> > > > >  ---
> > > > >  Required fsl,rcpm-wakeup property should be added to a device
> > > > > node if the device
> > > > > --
> > > > > 2.17.1



Re: [PATCH V4 4/4] ASoC: fsl_asrc: Fix error with S24_3LE format bitstream in i.MX8

2019-09-24 Thread S.j. Wang
Hi

> On Tue, Sep 24, 2019 at 06:52:35PM +0800, Shengjiu Wang wrote:
> > There is error "aplay: pcm_write:2023: write error: Input/output error"
> > on i.MX8QM/i.MX8QXP platform for S24_3LE format.
> >
> > In i.MX8QM/i.MX8QXP, the DMA is EDMA, which don't support 24bit
> > sample, but we didn't add any constraint, that cause issues.
> >
> > So we need to query the caps of dma, then update the hw parameters
> > according to the caps.
> >
> > Signed-off-by: Shengjiu Wang 
> > ---
> >  sound/soc/fsl/fsl_asrc.c |  4 +--
> >  sound/soc/fsl/fsl_asrc.h |  3 ++
> >  sound/soc/fsl/fsl_asrc_dma.c | 59
> > +++-
> >  3 files changed, 56 insertions(+), 10 deletions(-)
> >
> > @@ -270,12 +268,17 @@ static int fsl_asrc_dma_hw_free(struct
> > snd_pcm_substream *substream)
> >
> >  static int fsl_asrc_dma_startup(struct snd_pcm_substream *substream)
> > {
> > + bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
> >   struct snd_soc_pcm_runtime *rtd = substream->private_data;
> >   struct snd_pcm_runtime *runtime = substream->runtime;
> >   struct snd_soc_component *component =
> snd_soc_rtdcom_lookup(rtd,
> > DRV_NAME);
> > + struct snd_dmaengine_dai_dma_data *dma_data;
> >   struct device *dev = component->dev;
> >   struct fsl_asrc *asrc_priv = dev_get_drvdata(dev);
> >   struct fsl_asrc_pair *pair;
> > + struct dma_chan *tmp_chan = NULL;
> > + u8 dir = tx ? OUT : IN;
> > + int ret = 0;
> >
> >   pair = kzalloc(sizeof(struct fsl_asrc_pair), GFP_KERNEL);
> 
> Sorry, I didn't catch it previously. We would need to release this memory
> also for all error-out paths, as the code doesn't have any error-out routine,
> prior to applying this change.
> 
> >   if (!pair)
> > @@ -285,11 +288,51 @@ static int fsl_asrc_dma_startup(struct
> > snd_pcm_substream *substream)
> 
> > + /* Request a dummy pair, which will be released later.
> > +  * Request pair function needs channel num as input, for this
> > +  * dummy pair, we just request "1" channel temporary.
> > +  */
> 
> "temporary" => "temporarily"
> 
> > + ret = fsl_asrc_request_pair(1, pair);
> > + if (ret < 0) {
> > + dev_err(dev, "failed to request asrc pair\n");
> > + return ret;
> > + }
> > +
> > + /* Request a dummy dma channel, which will be release later. */
> 
> "release" => "released"

Ok, will update them.

Best regards
Wang shengjiu
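The review comment above is that once `pair` is allocated with kzalloc(), the new request/startup steps added by this patch introduce error paths that must free it before returning. A minimal userspace sketch of that unwind (names are illustrative, not the real fsl_asrc symbols):

```c
#include <stdlib.h>

#define EINVAL 22

static int pair_freed;

/* Stand-in for fsl_asrc_request_pair(); fails on demand. */
static int request_pair(int fail) { return fail ? -EINVAL : 0; }

/* Sketch: every error exit after the allocation releases 'pair'. */
static int dma_startup(int fail_request)
{
	void *pair = calloc(1, 32);
	int ret;

	if (!pair)
		return -12;	/* -ENOMEM */

	/* Request a dummy pair, to be released again later. */
	ret = request_pair(fail_request);
	if (ret < 0)
		goto err_free;	/* must not leak 'pair' */

	/* ... request dummy DMA channel, query caps, release them ... */
	free(pair);		/* the driver keeps 'pair' in its runtime
				 * private data; freed here only to keep
				 * this sketch self-contained */
	pair_freed++;
	return 0;

err_free:
	free(pair);
	pair_freed++;
	return ret;
}
```

The same pattern extends to the later dummy-channel request: each failure point jumps to a label that unwinds everything acquired before it.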


RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Biwen Li
> > > >
> > > > The 'fsl,ippdexpcr-alt-addr' property is used to handle an errata
> > > > A-008646 on LS1021A
> > > >
> > > > Signed-off-by: Biwen Li 
> > > > ---
> > > > Change in v3:
> > > > - rename property name
> > > >   fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> > > >
> > > > Change in v2:
> > > > - update desc of the property 'fsl,rcpm-scfg'
> > > >
> > > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > > > ++
> > > >  1 file changed, 14 insertions(+)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > index 5a33619d881d..157dcf6da17c 100644
> > > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > > @@ -34,6 +34,11 @@ Chassis Version  Example Chips
> > > >  Optional properties:
> > > >   - little-endian : RCPM register block is Little Endian. Without it 
> > > > RCPM
> > > > will be Big Endian (default case).
> > > > + - fsl,ippdexpcr-alt-addr : Must add the property for SoC
> > > > + LS1021A,
> > >
> > > You probably should mention this is related to a hardware issue on
> > > LS1021a and only needed on LS1021a.
> > Okay, got it, thanks, I will add this in v4.
> > >
> > > > +   Must include n + 1 entries (n = #fsl,rcpm-wakeup-cells, such as:
> > > > +   #fsl,rcpm-wakeup-cells equal to 2, then must include 2 + 1 entries).
> > >
> > > #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR registers on an SoC.
> > > However you are defining an offset to scfg registers here.  Why
> > > these two are related?  The length here should actually be related
> > > to the #address-cells of the soc/.  But since this is only needed
> > > for LS1021, you can
> > just make it 3.
> > I need to set the value of the IPPDEXPCR registers from the ftm_alarm0
> > device node (fsl,rcpm-wakeup = < 0x0 0x2000>;
> > 0x0 is a value for IPPDEXPCR0, 0x2000 is a value for IPPDEXPCR1).
> > But because of the hardware issue on LS1021A, I need to store the value
> > of the IPPDEXPCR registers at an alternate address. So I define an offset
> > into the scfg registers; the RCPM driver then derives an absolute address
> > from that offset and writes the value of the IPPDEXPCR registers to these
> > absolute addresses (backing up the IPPDEXPCR registers).
> 
> I understand what you are trying to do.  The problem is that the new
> fsl,ippdexpcr-alt-addr property contains a phandle and an offset.  The size
> of it shouldn't be related to #fsl,rcpm-wakeup-cells.
Maybe something like this: fsl,ippdexpcr-alt-addr = < 0x51c>; /* SCFG_SPARECR8 */
> 
> > >
> > > > +   The first entry must be a link to the SCFG device node.
> > > > +   The non-first entry must be offset of registers of SCFG.
> > > >
> > > >  Example:
> > > >  The RCPM node for T4240:
> > > > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > > > #fsl,rcpm-wakeup-cells = <2>;
> > > > };
> > > >
> > > > +The RCPM node for LS1021A:
> > > > +   rcpm: rcpm@1ee2140 {
> > > > +   compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+";
> > > > +   reg = <0x0 0x1ee2140 0x0 0x8>;
> > > > +   #fsl,rcpm-wakeup-cells = <2>;
> > > > +   fsl,ippdexpcr-alt-addr = < 0x0 0x51c>; /*
> > > > SCFG_SPARECR8 */
> > > > +   };
> > > > +
> > > > +
> > > >  * Freescale RCPM Wakeup Source Device Tree Bindings
> > > >  ---
> > > >  Required fsl,rcpm-wakeup property should be added to a device
> > > > node if the device
> > > > --
> > > > 2.17.1



Re: [PATCH xfstests v2] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Zhihao Cheng
There are indeed many '-b' checks in xfstests. I have only confirmed the ones
on the overlay test path; I need to recheck the other '-b' sites later.

在 2019/9/25 11:17, Darrick J. Wong 写道:
> On Tue, Sep 24, 2019 at 08:05:50PM -0700, Darrick J. Wong wrote:
>> On Wed, Sep 25, 2019 at 09:54:08AM +0800, Zhihao Cheng wrote:
>>> There is a message in _supported_fs():
>>> _notrun "not suitable for this filesystem type: $FSTYP"
>>> for when overlay usecases are executed on a chararcter device based base
>>
>> You can do that?
>>
>> What does that even look like?
> 
> OH, ubifs.  Ok.
> 
> /me wonders if there are more places in xfstests with test -b that needs
> fixing...
> 
> --D
> 
>> --D
>>
>>> fs. _overay_config_override() detects that the current base fs partition
>>> is not a block device, and FSTYP won't be overwritten as 'overlay' before
>>> executing usecases which results in all overlay usecases become 'notrun'.
>>> In addition, all generic usecases are based on base fs rather than overlay.
>>>
>>> We want to rewrite FSTYP to 'overlay' before running the usecases. To do
>>> this, we need to add additional character device judgments for TEST_DEV
>>> and SCRATCH_DEV in _overay_config_override().
>>>
>>> Signed-off-by: Zhihao Cheng 
>>> ---
>>>  common/config | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/common/config b/common/config
>>> index 4c86a49..a22acdb 100644
>>> --- a/common/config
>>> +++ b/common/config
>>> @@ -550,7 +550,7 @@ _overlay_config_override()
>>> #the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
>>> #of the configured base fs and SCRATCH/TEST_DEV vars are set to the
>>> #overlayfs base and mount dirs inside base fs mount.
>>> -   [ -b "$TEST_DEV" ] || return 0
>>> +   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
>>>  
>>> # Config file may specify base fs type, but we obay -overlay flag
>>> [ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
>>> @@ -570,7 +570,7 @@ _overlay_config_override()
>>> export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
>>> export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
>>>  
>>> -   [ -b "$SCRATCH_DEV" ] || return 0
>>> +   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
>>>  
>>> # Store original base fs vars
>>> export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
>>> -- 
>>> 2.7.4
>>>
> 
> .
> 
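The guard that both hunks of the patch implement boils down to one predicate, accept block or character devices, which can be exercised standalone (a sketch; xfstests keeps the inline `[ -b ] || [ -c ]` form):

```shell
# Accept block devices (loop devices, partitions) and character
# devices (e.g. UBI volumes like /dev/ubi0_1) as overlay base devices.
dev_ok() {
	[ -b "$1" ] || [ -c "$1" ]
}

dev_ok /dev/null && echo "char device accepted"
```

With this predicate in place of the plain `-b` test, FSTYP gets overridden to 'overlay' for UBIFS base devices and the overlay cases run instead of being skipped as 'notrun'.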



RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Leo Li



> -Original Message-
> From: Biwen Li
> Sent: Tuesday, September 24, 2019 10:13 PM
> To: Leo Li ; shawn...@kernel.org;
> robh...@kernel.org; mark.rutl...@arm.com; Ran Wang
> 
> Cc: linuxppc-...@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; devicet...@vger.kernel.org
> Subject: RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-
> addr' property
> 
> > >
> > > The 'fsl,ippdexpcr-alt-addr' property is used to handle an errata
> > > A-008646 on LS1021A
> > >
> > > Signed-off-by: Biwen Li 
> > > ---
> > > Change in v3:
> > >   - rename property name
> > > fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> > >
> > > Change in v2:
> > >   - update desc of the property 'fsl,rcpm-scfg'
> > >
> > >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > > ++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > index 5a33619d881d..157dcf6da17c 100644
> > > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > > @@ -34,6 +34,11 @@ Chassis VersionExample Chips
> > >  Optional properties:
> > >   - little-endian : RCPM register block is Little Endian. Without it RCPM
> > > will be Big Endian (default case).
> > > + - fsl,ippdexpcr-alt-addr : Must add the property for SoC LS1021A,
> >
> > You probably should mention this is related to a hardware issue on
> > LS1021a and only needed on LS1021a.
> Okay, got it, thanks, I will add this in v4.
> >
> > > +   Must include n + 1 entries (n = #fsl,rcpm-wakeup-cells, such as:
> > > +   #fsl,rcpm-wakeup-cells equal to 2, then must include 2 + 1 entries).
> >
> > #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR registers on an SoC.
> > However you are defining an offset to scfg registers here.  Why these
> > two are related?  The length here should actually be related to the
> > #address-cells of the soc/.  But since this is only needed for LS1021, you 
> > can
> just make it 3.
> I need to set the value of the IPPDEXPCR registers from the ftm_alarm0
> device node (fsl,rcpm-wakeup = < 0x0 0x2000>;
> 0x0 is a value for IPPDEXPCR0, 0x2000 is a value for IPPDEXPCR1).
> But because of the hardware issue on LS1021A, I need to store the value of
> the IPPDEXPCR registers at an alternate address. So I define an offset into
> the scfg registers; the RCPM driver then derives an absolute address from
> that offset and writes the value of the IPPDEXPCR registers to these
> absolute addresses (backing up the IPPDEXPCR registers).

I understand what you are trying to do.  The problem is that the new
fsl,ippdexpcr-alt-addr property contains a phandle and an offset.  The size of
it shouldn't be related to #fsl,rcpm-wakeup-cells.

> >
> > > +   The first entry must be a link to the SCFG device node.
> > > +   The non-first entry must be offset of registers of SCFG.
> > >
> > >  Example:
> > >  The RCPM node for T4240:
> > > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > >   #fsl,rcpm-wakeup-cells = <2>;
> > >   };
> > >
> > > +The RCPM node for LS1021A:
> > > + rcpm: rcpm@1ee2140 {
> > > + compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+";
> > > + reg = <0x0 0x1ee2140 0x0 0x8>;
> > > + #fsl,rcpm-wakeup-cells = <2>;
> > > + fsl,ippdexpcr-alt-addr = < 0x0 0x51c>; /*
> > > SCFG_SPARECR8 */
> > > + };
> > > +
> > > +
> > >  * Freescale RCPM Wakeup Source Device Tree Bindings
> > >  ---
> > >  Required fsl,rcpm-wakeup property should be added to a device node
> > > if the device
> > > --
> > > 2.17.1



[PATCH] net/mlx5: prevent memory leak in mlx5_fpga_conn_create_cq

2019-09-24 Thread Navid Emamdoost
In mlx5_fpga_conn_create_cq(), if mlx5_vector2eqn() fails, the allocated
memory should be released.

Signed-off-by: Navid Emamdoost 
---
 drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index 4c50efe4e7f1..61021133029e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -464,8 +464,10 @@ static int mlx5_fpga_conn_create_cq(struct mlx5_fpga_conn 
*conn, int cq_size)
}
 
err = mlx5_vector2eqn(mdev, smp_processor_id(), , );
-   if (err)
+   if (err) {
+   kvfree(in);
goto err_cqwq;
+   }
 
cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
MLX5_SET(cqc, cqc, log_cq_size, ilog2(cq_size));
-- 
2.17.1
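The fix releases `in` inline before taking the existing `err_cqwq` exit. The shape of the bug and the fix can be sketched in plain C (illustrative stand-ins, not the mlx5 API):

```c
#include <stdlib.h>

static int allocs, frees;

static void *xalloc(size_t n) { allocs++; return malloc(n); }
static void xfree(void *p)    { frees++;  free(p); }

/* Sketch of the fixed error path: once 'in' is allocated, every exit,
 * including the mlx5_vector2eqn() failure path, must release it. */
static int create_cq(int fail_vector2eqn)
{
	void *in = xalloc(64);

	if (!in)
		return -1;

	if (fail_vector2eqn) {	/* err = mlx5_vector2eqn(...) failing */
		xfree(in);	/* the kvfree(in) the patch adds */
		return -1;	/* then: goto err_cqwq */
	}

	/* ... MLX5_SET(...), mlx5_core_create_cq(...), etc. ... */
	xfree(in);		/* normal-path release */
	return 0;
}
```

An alternative would be a dedicated `err_in:` label that falls through to `err_cqwq`, which scales better if more failure points are added between the allocation and the normal-path kvfree().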



Re: [PATCH xfstests] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Zhihao Cheng
Oh, you are right, I misunderstood it. Thanks for the reminder.

在 2019/9/25 11:15, Eryu Guan 写道:
> On Tue, Sep 24, 2019 at 10:19:38PM +0800, Zhihao Cheng wrote:
>> As far as I know, _require_scratch_shutdown() is called after 
>> _overay_config_override(), at this moment, FSTYP equals to base fs. 
>> According the implementation of _require_scratch_shutdown:
>> 3090 _require_scratch_shutdown()
>> 3091 {
>> 3092 [ -x src/godown ] || _notrun "src/godown executable not found"
>> 3093
>> 3094 _scratch_mkfs > /dev/null 2>&1 || _notrun "_scratch_mkfs failed on 
>> $SCRATCH_DEV"
>> 3095 _scratch_mount
>> 3096
>> 3097 if [ $FSTYP = "overlay" ]; then 
>># FSTYP = base fs
>> 3098 if [ -z $OVL_BASE_SCRATCH_DEV ]; then
>> 3099 # In lagacy overlay usage, it may specify directory as
>> 3100 # SCRATCH_DEV, in this case OVL_BASE_SCRATCH_DEV
>> 3101 # will be null, so check OVL_BASE_SCRATCH_DEV before
>> 3102 # running shutdown to avoid shutting down base fs 
>> accidently.
>> 3103 _notrun "$SCRATCH_DEV is not a block device"
>> 3104 else
>> 3105 src/godown -f $OVL_BASE_SCRATCH_MNT 2>&1 \
>> 3106 || _notrun "Underlying filesystem does not support shutdown"
>> 3107 fi
>> 3108 else
>> 3109 src/godown -f $SCRATCH_MNT 2>&1 \
>> 3110 || _notrun "$FSTYP does not support shutdown"   
>># Executes this path
>> 3111 fi
>> 3112
>> 3113 _scratch_unmount
>> 3114 }
>> So, we can't get output: _notrun "$SCRATCH_DEV is not a block device". 
>> Instead, the verbose should like:
>>   after _overlay_config_override FSTYP=ubifs# Additional print message
>>   FSTYP -- ubifs
>>   PLATFORM  -- Linux/x86_64
>>   MKFS_OPTIONS  -- /dev/ubi0_1
>>   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
>>
>>   generic/042[not run] ubifs does not support shutdown
>>
>> But I'll consider describing error more concisely in v2.
>>
>> 在 2019/9/24 20:33, Amir Goldstein 写道:
>>> On Tue, Sep 24, 2019 at 12:34 PM Zhihao Cheng  
>>> wrote:

 When running overlay tests using character devices as base fs partitions,
 all overlay usecase results become 'notrun'. Function
 '_overay_config_override' (common/config) detects that the current base
 fs partition is not a block device and will set FSTYP to base fs. The
 overlay usecase will check the current FSTYP, and if it is not 'overlay'
 or 'generic', it will skip the execution.

 For example, using UBIFS as base fs skips all overlay usecases:

   FSTYP -- ubifs   # FSTYP should be overridden as 'overlay'
   MKFS_OPTIONS  -- /dev/ubi0_1 # Character device
   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch

   overlay/001   [not run] not suitable for this filesystem type: ubifs
   overlay/002   [not run] not suitable for this filesystem type: ubifs
   overlay/003   [not run] not suitable for this filesystem type: ubifs
   ...

 When checking that the base fs partition is a block/character device,
 FSTYP is overwritten as 'overlay'. This patch allows the base fs
 partition to be a character device that can also execute overlay
 usecases (such as ubifs).

 Signed-off-by: Zhihao Cheng 
 ---
  common/config | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/common/config b/common/config
 index 4c86a49..a22acdb 100644
 --- a/common/config
 +++ b/common/config
 @@ -550,7 +550,7 @@ _overlay_config_override()
 #the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the 
 values
 #of the configured base fs and SCRATCH/TEST_DEV vars are set 
 to the
 #overlayfs base and mount dirs inside base fs mount.
 -   [ -b "$TEST_DEV" ] || return 0
 +   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0

 # Config file may specify base fs type, but we obay -overlay flag
 [ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
 @@ -570,7 +570,7 @@ _overlay_config_override()
 export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
 export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"

 -   [ -b "$SCRATCH_DEV" ] || return 0
 +   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0

 # Store original base fs vars
 export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
 --
 2.7.4

>>>
>>> Looks fine.
>>>
>>> One nit: there is a message in _require_scratch_shutdown():
>>> _notrun "$SCRATCH_DEV is not a block device"
>>> for when $OVL_BASE_SCRATCH_DEV is not defined.
>>>
>>> Could probably use a better describing error anyway.
> 
> I think what Amir suggested is that, as you add char device support to
> overlay base device, the message in _require_scratch_shutdown() 

Re: [PATCH xfstests v2] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Darrick J. Wong
On Tue, Sep 24, 2019 at 08:05:50PM -0700, Darrick J. Wong wrote:
> On Wed, Sep 25, 2019 at 09:54:08AM +0800, Zhihao Cheng wrote:
> > There is a message in _supported_fs():
> > _notrun "not suitable for this filesystem type: $FSTYP"
> > for when overlay usecases are executed on a chararcter device based base
> 
> You can do that?
> 
> What does that even look like?

OH, ubifs.  Ok.

/me wonders if there are more places in xfstests with test -b that need
fixing...

--D

> --D
> 
> > fs. _overay_config_override() detects that the current base fs partition
> > is not a block device, and FSTYP won't be overwritten as 'overlay' before
> > executing usecases which results in all overlay usecases become 'notrun'.
> > In addition, all generic usecases are based on base fs rather than overlay.
> > 
> > We want to rewrite FSTYP to 'overlay' before running the usecases. To do
> > this, we need to add additional character device judgments for TEST_DEV
> > and SCRATCH_DEV in _overay_config_override().
> > 
> > Signed-off-by: Zhihao Cheng 
> > ---
> >  common/config | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/common/config b/common/config
> > index 4c86a49..a22acdb 100644
> > --- a/common/config
> > +++ b/common/config
> > @@ -550,7 +550,7 @@ _overlay_config_override()
> > #the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
> > #of the configured base fs and SCRATCH/TEST_DEV vars are set to the
> > #overlayfs base and mount dirs inside base fs mount.
> > -   [ -b "$TEST_DEV" ] || return 0
> > +   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
> >  
> > # Config file may specify base fs type, but we obay -overlay flag
> > [ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> > @@ -570,7 +570,7 @@ _overlay_config_override()
> > export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
> > export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
> >  
> > -   [ -b "$SCRATCH_DEV" ] || return 0
> > +   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
> >  
> > # Store original base fs vars
> > export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> > -- 
> > 2.7.4
> > 


Re: [PATCH xfstests] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Eryu Guan
On Tue, Sep 24, 2019 at 10:19:38PM +0800, Zhihao Cheng wrote:
> As far as I know, _require_scratch_shutdown() is called after 
> _overay_config_override(), at this moment, FSTYP equals to base fs. According 
> the implementation of _require_scratch_shutdown:
> 3090 _require_scratch_shutdown()
> 3091 {
> 3092 [ -x src/godown ] || _notrun "src/godown executable not found"
> 3093
> 3094 _scratch_mkfs > /dev/null 2>&1 || _notrun "_scratch_mkfs failed on 
> $SCRATCH_DEV"
> 3095 _scratch_mount
> 3096
> 3097 if [ $FSTYP = "overlay" ]; then  
>   # FSTYP = base fs
> 3098 if [ -z $OVL_BASE_SCRATCH_DEV ]; then
> 3099 # In lagacy overlay usage, it may specify directory as
> 3100 # SCRATCH_DEV, in this case OVL_BASE_SCRATCH_DEV
> 3101 # will be null, so check OVL_BASE_SCRATCH_DEV before
> 3102 # running shutdown to avoid shutting down base fs accidently.
> 3103 _notrun "$SCRATCH_DEV is not a block device"
> 3104 else
> 3105 src/godown -f $OVL_BASE_SCRATCH_MNT 2>&1 \
> 3106 || _notrun "Underlying filesystem does not support shutdown"
> 3107 fi
> 3108 else
> 3109 src/godown -f $SCRATCH_MNT 2>&1 \
> 3110 || _notrun "$FSTYP does not support shutdown"
>   # Executes this path
> 3111 fi
> 3112
> 3113 _scratch_unmount
> 3114 }
> So, we can't get output: _notrun "$SCRATCH_DEV is not a block device". 
> Instead, the verbose should like:
>   after _overlay_config_override FSTYP=ubifs# Additional print message
>   FSTYP -- ubifs
>   PLATFORM  -- Linux/x86_64
>   MKFS_OPTIONS  -- /dev/ubi0_1
>   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> 
>   generic/042 [not run] ubifs does not support shutdown
> 
> But I'll consider describing error more concisely in v2.
> 
> 在 2019/9/24 20:33, Amir Goldstein 写道:
> > On Tue, Sep 24, 2019 at 12:34 PM Zhihao Cheng  
> > wrote:
> >>
> >> When running overlay tests using character devices as base fs partitions,
> >> all overlay usecase results become 'notrun'. Function
> >> '_overlay_config_override' (common/config) detects that the current base
> >> fs partition is not a block device and will set FSTYP to base fs. The
> >> overlay usecase will check the current FSTYP, and if it is not 'overlay'
> >> or 'generic', it will skip the execution.
> >>
> >> For example, using UBIFS as base fs skips all overlay usecases:
> >>
> >>   FSTYP -- ubifs   # FSTYP should be overridden as 'overlay'
> >>   MKFS_OPTIONS  -- /dev/ubi0_1 # Character device
> >>   MOUNT_OPTIONS -- -t ubifs /dev/ubi0_1 /tmp/scratch
> >>
> >>   overlay/001   [not run] not suitable for this filesystem type: ubifs
> >>   overlay/002   [not run] not suitable for this filesystem type: ubifs
> >>   overlay/003   [not run] not suitable for this filesystem type: ubifs
> >>   ...
> >>
> >> When checking that the base fs partition is a block/character device,
> >> FSTYP is overwritten as 'overlay'. This patch allows the base fs
> >> partition to be a character device that can also execute overlay
> >> usecases (such as ubifs).
> >>
> >> Signed-off-by: Zhihao Cheng 
> >> ---
> >>  common/config | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/common/config b/common/config
> >> index 4c86a49..a22acdb 100644
> >> --- a/common/config
> >> +++ b/common/config
> >> @@ -550,7 +550,7 @@ _overlay_config_override()
> >> #the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the 
> >> values
> >> #of the configured base fs and SCRATCH/TEST_DEV vars are set 
> >> to the
> >> #overlayfs base and mount dirs inside base fs mount.
> >> -   [ -b "$TEST_DEV" ] || return 0
> >> +   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
> >>
> >> # Config file may specify base fs type, but we obay -overlay flag
> >> [ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> >> @@ -570,7 +570,7 @@ _overlay_config_override()
> >> export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
> >> export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
> >>
> >> -   [ -b "$SCRATCH_DEV" ] || return 0
> >> +   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
> >>
> >> # Store original base fs vars
> >> export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> >> --
> >> 2.7.4
> >>
> > 
> > Looks fine.
> > 
> > One nit: there is a message in _require_scratch_shutdown():
> > _notrun "$SCRATCH_DEV is not a block device"
> > for when $OVL_BASE_SCRATCH_DEV is not defined.
> > 
> > Could probably use a better describing error anyway.

I think what Amir suggested is that, as you add char device support to
overlay base device, the message in _require_scratch_shutdown() should
be updated accordingly, not the commit log.

Thanks,
Eryu


Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"

2019-09-24 Thread Wanpeng Li
On Wed, 11 Sep 2019 at 21:04, Paolo Bonzini  wrote:
>
> On 11/09/19 06:25, Waiman Long wrote:
> > On 9/10/19 6:56 AM, Wanpeng Li wrote:
> >> On Mon, 9 Sep 2019 at 18:56, Waiman Long  wrote:
> >>> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>  From: Wanpeng Li 
> 
>  This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait 
>  if
>  vCPU is preempted), we found great regression caused by this commit.
> 
>  Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 
>  vCPUs.
>  The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800
>  records/s with this commit.
> 
>    Host   Guestscore
> 
>  vanilla + w/o kvm optimizes vanilla   1700-1800 records/s
>  vanilla + w/o kvm optimizes vanilla + revert  13000-14000 
>  records/s
>  vanilla + w/ kvm optimizes  vanilla   4500-5000 records/s
>  vanilla + w/ kvm optimizes  vanilla + revert  14000-15500 
>  records/s
> 
>  Exiting from the aggressive wait-early mechanism can result in premature
>  yields and incur extra scheduling latency in over-subscription scenarios.
> 
>  kvm optimizes:
>  [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>  [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock 
>  waiter preemption)
> 
>  Tested-by: loobin...@tencent.com
>  Cc: Peter Zijlstra 
>  Cc: Thomas Gleixner 
>  Cc: Ingo Molnar 
>  Cc: Waiman Long 
>  Cc: Paolo Bonzini 
>  Cc: Radim Krčmář 
>  Cc: loobin...@tencent.com
>  Cc: sta...@vger.kernel.org
>  Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is 
>  preempted)
>  Signed-off-by: Wanpeng Li 
>  ---
>   kernel/locking/qspinlock_paravirt.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
>  diff --git a/kernel/locking/qspinlock_paravirt.h 
>  b/kernel/locking/qspinlock_paravirt.h
>  index 89bab07..e84d21a 100644
>  --- a/kernel/locking/qspinlock_paravirt.h
>  +++ b/kernel/locking/qspinlock_paravirt.h
>  @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>    if ((loop & PV_PREV_CHECK_MASK) != 0)
>    return false;
> 
>  - return READ_ONCE(prev->state) != vcpu_running || 
>  vcpu_is_preempted(prev->cpu);
>  + return READ_ONCE(prev->state) != vcpu_running;
>   }
> 
>   /*
> >>> There are several possibilities for this performance regression:
> >>>
> >>> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
> >>> cacheline contention issue depending on how that callback is implemented.
> >>>
> >>> 2) KVM may set the preempt flag for a short period whenever a vmexit
> >>> happens even if a vmenter is executed shortly after. In this case, we
> >>> may want to use a more durable vcpu suspend flag that indicates the vcpu
> >>> won't get a real vcpu back for a longer period of time.
> >>>
> >>> Perhaps you can add a lock event counter to count the number of
> >>> wait_early events caused by vcpu_is_preempted() being true to see if it
> >>> really cause a lot more wait_early than without the vcpu_is_preempted()
> >>> call.
> >> pv_wait_again:1:179
> >> pv_wait_early:1:189429
> >> pv_wait_head:1:263
> >> pv_wait_node:1:189429
> >> pv_vcpu_is_preempted:1:45588
> >> =sleep 5
> >> pv_wait_again:1:181
> >> pv_wait_early:1:202574
> >> pv_wait_head:1:267
> >> pv_wait_node:1:202590
> >> pv_vcpu_is_preempted:1:46336
> >>
> >> The sampling period is 5s, 6% of wait_early events caused by
> >> vcpu_is_preempted() being true.
> >
> > 6% isn't that high. However, when one vCPU voluntarily releases its
> > vCPU, all the subsequent waiters in the queue will do the same. It is
> > a cascading effect. Perhaps we wait early too aggressively with the
> > original patch.
> >
> > I also looked up the email chain of the original commit. The patch
> > submitter did not provide any performance data to support this change.
> > The patch just looked reasonable at the time, so there was no
> > objection. Given that we now have hard evidence that this was not a
> > good idea, I think we should revert it.
> >
> > Reviewed-by: Waiman Long 
> >
> > Thanks,
> > Longman
> >
>
> Queued, thanks.

Didn't see it in yesterday's updated kvm/queue. :)

Wanpeng


RE: [v3,3/3] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr-alt-addr' property

2019-09-24 Thread Biwen Li
> >
> > The 'fsl,ippdexpcr-alt-addr' property is used to handle an errata
> > A-008646 on LS1021A
> >
> > Signed-off-by: Biwen Li 
> > ---
> > Change in v3:
> > - rename property name
> >   fsl,rcpm-scfg -> fsl,ippdexpcr-alt-addr
> >
> > Change in v2:
> > - update desc of the property 'fsl,rcpm-scfg'
> >
> >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 14
> > ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > index 5a33619d881d..157dcf6da17c 100644
> > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > @@ -34,6 +34,11 @@ Chassis Version  Example Chips
> >  Optional properties:
> >   - little-endian : RCPM register block is Little Endian. Without it RCPM
> > will be Big Endian (default case).
> > + - fsl,ippdexpcr-alt-addr : Must add the property for SoC LS1021A,
> 
> You probably should mention this is related to a hardware issue on LS1021a
> and only needed on LS1021a.
Okay, got it, thanks, I will add this in v4.
> 
> > +   Must include n + 1 entries (n = #fsl,rcpm-wakeup-cells; e.g. if
> > +   #fsl,rcpm-wakeup-cells equals 2, there must be 2 + 1 entries).
> 
> #fsl,rcpm-wakeup-cells is the number of IPPDEXPCR registers on an SoC.
> However you are defining an offset to scfg registers here.  Why these two
> are related?  The length here should actually be related to the #address-cells
> of the soc/.  But since this is only needed for LS1021, you can just make it 
> 3.
I need to set the value of the IPPDEXPCR registers from the ftm_alarm0 device
node (fsl,rcpm-wakeup = <&rcpm 0x0 0x2000>; 0x0 is the value for IPPDEXPCR0,
0x2000 is the value for IPPDEXPCR1). But because of the hardware issue on
LS1021A, I need to store the value of the IPPDEXPCR registers at an alternate
address. So I define an offset into the SCFG registers; the RCPM driver then
derives an absolute address from that offset and writes the value of the
IPPDEXPCR registers to it (backing up the IPPDEXPCR register values).
> 
> > +   The first entry must be a link to the SCFG device node.
> > +   The non-first entry must be offset of registers of SCFG.
> >
> >  Example:
> >  The RCPM node for T4240:
> > @@ -43,6 +48,15 @@ The RCPM node for T4240:
> > #fsl,rcpm-wakeup-cells = <2>;
> > };
> >
> > +The RCPM node for LS1021A:
> > +   rcpm: rcpm@1ee2140 {
> > +   compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+";
> > +   reg = <0x0 0x1ee2140 0x0 0x8>;
> > +   #fsl,rcpm-wakeup-cells = <2>;
> > +   fsl,ippdexpcr-alt-addr = <&scfg 0x0 0x51c>; /* SCFG_SPARECR8 */
> > +   };
> > +
> > +
> >  * Freescale RCPM Wakeup Source Device Tree Bindings
> >  ---
> >  Required fsl,rcpm-wakeup property should be added to a device node if
> > the device
> > --
> > 2.17.1



Re: [PATCH] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()

2019-09-24 Thread Andrew Morton
On Mon, 23 Sep 2019 11:16:38 +0530 Anshuman Khandual 
 wrote:

> 
> 
> On 09/16/2019 11:17 AM, Anshuman Khandual wrote:
> > In add_memory_resource() the memory range to be hot added first gets into
> > the memblock via memblock_add() before arch_add_memory() is called on it.
> > Reverse sequence should be followed during memory hot removal which already
> > is being followed in add_memory_resource() error path. This now ensures
> > required re-order between memblock_[free|remove]() and arch_remove_memory()
> > during memory hot-remove.
> > 
> > Cc: Andrew Morton 
> > Cc: Oscar Salvador 
> > Cc: Michal Hocko 
> > Cc: David Hildenbrand 
> > Cc: Pavel Tatashin 
> > Cc: Dan Williams 
> > Signed-off-by: Anshuman Khandual 
> > ---
> > Original patch https://lkml.org/lkml/2019/9/3/327
> > 
> > Memory hot remove now works on arm64 without this because of a recent
> > commit, 60bb462fc7ad ("drivers/base/node.c: simplify
> > unregister_memory_block_under_nodes()").
> > 
> > David mentioned that re-ordering should still make sense for consistency
> > purpose (removing stuff in the reverse order they were added). This patch
> > is now detached from arm64 hot-remove series.
> > 
> > https://lkml.org/lkml/2019/9/3/326
>
> ...
>
> Hello Andrew,
> 
> Any feedbacks on this, does it look okay ?
> 

Well.  I'd parked this for 5.4-rc1 processing because it looked like a
cleanup.

But way down below the ^---$ line I see "Memory hot remove now works
on arm64".  Am I correct in believing that 60bb462fc7ad broke arm64 memory
hot remove?  And that this patch fixes a serious regression?  If so,
that should have been right there in the patch title and changelog!




Re: [PATCH xfstests v2] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Darrick J. Wong
On Wed, Sep 25, 2019 at 09:54:08AM +0800, Zhihao Cheng wrote:
> There is a message in _supported_fs():
> _notrun "not suitable for this filesystem type: $FSTYP"
> for when overlay usecases are executed on a character device based base

You can do that?

What does that even look like?

--D

> fs. _overlay_config_override() detects that the current base fs partition
> is not a block device, so FSTYP won't be overwritten as 'overlay' before
> executing usecases, which results in all overlay usecases becoming 'notrun'.
> In addition, all generic usecases are based on base fs rather than overlay.
> 
> We want to rewrite FSTYP to 'overlay' before running the usecases. To do
> this, we need to add additional character device checks for TEST_DEV
> and SCRATCH_DEV in _overlay_config_override().
> 
> Signed-off-by: Zhihao Cheng 
> ---
>  common/config | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/common/config b/common/config
> index 4c86a49..a22acdb 100644
> --- a/common/config
> +++ b/common/config
> @@ -550,7 +550,7 @@ _overlay_config_override()
>   #the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
>   #of the configured base fs and SCRATCH/TEST_DEV vars are set to the
>   #overlayfs base and mount dirs inside base fs mount.
> - [ -b "$TEST_DEV" ] || return 0
> + [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
>  
>   # Config file may specify base fs type, but we obay -overlay flag
>   [ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> @@ -570,7 +570,7 @@ _overlay_config_override()
>   export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
>   export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
>  
> - [ -b "$SCRATCH_DEV" ] || return 0
> + [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
>  
>   # Store original base fs vars
>   export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> -- 
> 2.7.4
> 


RE: [PATCH v9 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-24 Thread Justin He (Arm Technology China)
Hi Matthew and Kirill
I didn't add your previous r-b and a-b tags since I refactored cow_user_page()
and changed the ptl range in v9. Please have a review, thanks.


--
Cheers,
Justin (Jia He)



> -Original Message-
> From: Jia He 
> Sent: 2019年9月25日 10:59
> To: Catalin Marinas ; Will Deacon
> ; Mark Rutland ; James Morse
> ; Marc Zyngier ; Matthew
> Wilcox ; Kirill A. Shutemov
> ; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux...@kvack.org; Suzuki Poulose
> 
> Cc: Punit Agrawal ; Anshuman Khandual
> ; Alex Van Brunt
> ; Robin Murphy ;
> Thomas Gleixner ; Andrew Morton  foundation.org>; Jérôme Glisse ; Ralph Campbell
> ; hejia...@gmail.com; Kaly Xin (Arm Technology
> China) ; nd ; Justin He (Arm
> Technology China) 
> Subject: [PATCH v9 3/3] mm: fix double page fault on arm64 if PTE_AF is
> cleared
> 
> When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest,
> there
> will be a double page fault in __copy_from_user_inatomic of
> cow_user_page.
> 
> Below call trace is from arm64 do_page_fault for debugging purpose
> [  110.016195] Call trace:
> [  110.016826]  do_page_fault+0x5a4/0x690
> [  110.017812]  do_mem_abort+0x50/0xb0
> [  110.018726]  el1_da+0x20/0xc4
> [  110.019492]  __arch_copy_from_user+0x180/0x280
> [  110.020646]  do_wp_page+0xb0/0x860
> [  110.021517]  __handle_mm_fault+0x994/0x1338
> [  110.022606]  handle_mm_fault+0xe8/0x180
> [  110.023584]  do_page_fault+0x240/0x690
> [  110.024535]  do_mem_abort+0x50/0xb0
> [  110.025423]  el0_da+0x20/0x24
> 
> The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
> [9b007000] pgd=00023d4f8003, pud=00023da9b003,
> pmd=00023d4b3003, pte=36298607bd3
> 
> As told by Catalin: "On arm64 without hardware Access Flag, copying from
> user will fail because the pte is old and cannot be marked young. So we
> always end up with zeroed page after fork() + CoW for pfn mappings. we
> don't always have a hardware-managed access flag on arm64."
> 
> This patch fixes it by calling pte_mkyoung(). Also, the parameters are
> changed because vmf should be passed to cow_user_page().
> 
> Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns an error
> in case there can be some obscure use case. (by Kirill)
> 
> [1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork
> 
> Signed-off-by: Jia He 
> Reported-by: Yibo Cai 
> ---
>  mm/memory.c | 99
> +
>  1 file changed, 84 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e2bb51b6242e..a0a381b36ff2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
>   2;
>  #endif
> 
> +#ifndef arch_faults_on_old_pte
> +static inline bool arch_faults_on_old_pte(void)
> +{
> + return false;
> +}
> +#endif
> +
>  static int __init disable_randmaps(char *s)
>  {
>   randomize_va_space = 0;
> @@ -2140,32 +2147,82 @@ static inline int pte_unmap_same(struct
> mm_struct *mm, pmd_t *pmd,
>   return same;
>  }
> 
> -static inline void cow_user_page(struct page *dst, struct page *src,
> unsigned long va, struct vm_area_struct *vma)
> +static inline bool cow_user_page(struct page *dst, struct page *src,
> +  struct vm_fault *vmf)
>  {
> + bool ret;
> + void *kaddr;
> + void __user *uaddr;
> + bool force_mkyoung;
> + struct vm_area_struct *vma = vmf->vma;
> + struct mm_struct *mm = vma->vm_mm;
> + unsigned long addr = vmf->address;
> +
>   debug_dma_assert_idle(src);
> 
> + if (likely(src)) {
> + copy_user_highpage(dst, src, addr, vma);
> + return true;
> + }
> +
>   /*
>* If the source page was a PFN mapping, we don't have
>* a "struct page" for it. We do a best-effort copy by
>* just copying from the original user address. If that
>* fails, we just zero-fill it. Live with it.
>*/
> - if (unlikely(!src)) {
> - void *kaddr = kmap_atomic(dst);
> - void __user *uaddr = (void __user *)(va & PAGE_MASK);
> + kaddr = kmap_atomic(dst);
> + uaddr = (void __user *)(addr & PAGE_MASK);
> +
> + /*
> +  * On architectures with software "accessed" bits, we would
> +  * take a double page fault, so mark it accessed here.
> +  */
> + force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
> + if (force_mkyoung) {
> + pte_t entry;
> +
> + vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
> + if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> + /*
> +  * Other thread has already handled the fault
> +  * and we don't need to do anything. If it's
> +  * not the case, the fault will be triggered
> +  * again on the same address.
> +

[PATCH v9 2/3] arm64: mm: implement arch_faults_on_old_pte() on arm64

2019-09-24 Thread Jia He
On arm64 without hardware Access Flag, copying from user will fail because
the pte is old and cannot be marked young. So we always end up with a zeroed
page after fork() + CoW for pfn mappings, since we don't always have a
hardware-managed access flag on arm64.

Hence implement arch_faults_on_old_pte() on arm64 to indicate that accessing
an old pte might cause a page fault.

Signed-off-by: Jia He 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e09760ece844..2b035befb66d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -868,6 +868,20 @@ static inline void update_mmu_cache(struct vm_area_struct 
*vma,
 #define phys_to_ttbr(addr) (addr)
 #endif
 
+/*
+ * On arm64 without hardware Access Flag, copying from user will fail because
+ * the pte is old and cannot be marked young. So we always end up with zeroed
+ * page after fork() + CoW for pfn mappings. We don't always have a
+ * hardware-managed access flag on arm64.
+ */
+static inline bool arch_faults_on_old_pte(void)
+{
+   WARN_ON(preemptible());
+
+   return !cpu_has_hw_af();
+}
+#define arch_faults_on_old_pte arch_faults_on_old_pte
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
-- 
2.17.1



[PATCH v9 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared

2019-09-24 Thread Jia He
When we tested pmdk unit test [1] vmmalloc_fork TEST1 in arm64 guest, there
will be a double page fault in __copy_from_user_inatomic of cow_user_page.

Below call trace is from arm64 do_page_fault for debugging purpose
[  110.016195] Call trace:
[  110.016826]  do_page_fault+0x5a4/0x690
[  110.017812]  do_mem_abort+0x50/0xb0
[  110.018726]  el1_da+0x20/0xc4
[  110.019492]  __arch_copy_from_user+0x180/0x280
[  110.020646]  do_wp_page+0xb0/0x860
[  110.021517]  __handle_mm_fault+0x994/0x1338
[  110.022606]  handle_mm_fault+0xe8/0x180
[  110.023584]  do_page_fault+0x240/0x690
[  110.024535]  do_mem_abort+0x50/0xb0
[  110.025423]  el0_da+0x20/0x24

The pte info before __copy_from_user_inatomic is (PTE_AF is cleared):
[9b007000] pgd=00023d4f8003, pud=00023da9b003, 
pmd=00023d4b3003, pte=36298607bd3

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

This patch fixes it by calling pte_mkyoung(). Also, the parameters are
changed because vmf should be passed to cow_user_page().

Add a WARN_ON_ONCE when __copy_from_user_inatomic() returns an error
in case there can be some obscure use case. (by Kirill)

[1] https://github.com/pmem/pmdk/tree/master/src/test/vmmalloc_fork

Signed-off-by: Jia He 
Reported-by: Yibo Cai 
---
 mm/memory.c | 99 +
 1 file changed, 84 insertions(+), 15 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..a0a381b36ff2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,13 @@ int randomize_va_space __read_mostly =
2;
 #endif
 
+#ifndef arch_faults_on_old_pte
+static inline bool arch_faults_on_old_pte(void)
+{
+   return false;
+}
+#endif
+
 static int __init disable_randmaps(char *s)
 {
randomize_va_space = 0;
@@ -2140,32 +2147,82 @@ static inline int pte_unmap_same(struct mm_struct *mm, 
pmd_t *pmd,
return same;
 }
 
-static inline void cow_user_page(struct page *dst, struct page *src, unsigned 
long va, struct vm_area_struct *vma)
+static inline bool cow_user_page(struct page *dst, struct page *src,
+struct vm_fault *vmf)
 {
+   bool ret;
+   void *kaddr;
+   void __user *uaddr;
+   bool force_mkyoung;
+   struct vm_area_struct *vma = vmf->vma;
+   struct mm_struct *mm = vma->vm_mm;
+   unsigned long addr = vmf->address;
+
debug_dma_assert_idle(src);
 
+   if (likely(src)) {
+   copy_user_highpage(dst, src, addr, vma);
+   return true;
+   }
+
/*
 * If the source page was a PFN mapping, we don't have
 * a "struct page" for it. We do a best-effort copy by
 * just copying from the original user address. If that
 * fails, we just zero-fill it. Live with it.
 */
-   if (unlikely(!src)) {
-   void *kaddr = kmap_atomic(dst);
-   void __user *uaddr = (void __user *)(va & PAGE_MASK);
+   kaddr = kmap_atomic(dst);
+   uaddr = (void __user *)(addr & PAGE_MASK);
+
+   /*
+* On architectures with software "accessed" bits, we would
+* take a double page fault, so mark it accessed here.
+*/
+   force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
+   if (force_mkyoung) {
+   pte_t entry;
+
+   vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+   if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+   /*
+* Other thread has already handled the fault
+* and we don't need to do anything. If it's
+* not the case, the fault will be triggered
+* again on the same address.
+*/
+   ret = false;
+   goto pte_unlock;
+   }
+
+   entry = pte_mkyoung(vmf->orig_pte);
+   if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
+   update_mmu_cache(vma, addr, vmf->pte);
+   }
 
+   /*
+* This really shouldn't fail, because the page is there
+* in the page tables. But it might just be unreadable,
+* in which case we just give up and fill the result with
+* zeroes.
+*/
+   if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
/*
-* This really shouldn't fail, because the page is there
-* in the page tables. But it might just be unreadable,
-* in which case we just give up and fill the result with
-* zeroes.
+* Give a warn in case there can be some obscure
+* use-case
 

[PATCH v9 1/3] arm64: cpufeature: introduce helper cpu_has_hw_af()

2019-09-24 Thread Jia He
We unconditionally set the HW_AFDBM capability and only enable it on
CPUs which really have the feature. But sometimes we need to know
whether this CPU has the capability of hardware AF. So decouple AF from
DBM with a new helper, cpu_has_hw_af().

Signed-off-by: Jia He 
Suggested-by: Suzuki Poulose 
Reported-by: kbuild test robot 
Reviewed-by: Catalin Marinas 
---
 arch/arm64/include/asm/cpufeature.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index c96ffa4722d3..c2e3abd39faa 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -667,6 +667,16 @@ static inline u32 id_aa64mmfr0_parange_to_phys_shift(int 
parange)
default: return CONFIG_ARM64_PA_BITS;
}
 }
+
+/* Check whether hardware update of the Access flag is supported */
+static inline bool cpu_has_hw_af(void)
+{
+   if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
+   return read_cpuid(ID_AA64MMFR1_EL1) & 0xf;
+
+   return false;
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif
-- 
2.17.1



[PATCH v9 0/3] fix double page fault on arm64

2019-09-24 Thread Jia He
When we tested pmdk unit test vmmalloc_fork TEST1 in arm64 guest, there
will be a double page fault in __copy_from_user_inatomic of cow_user_page.

As told by Catalin: "On arm64 without hardware Access Flag, copying from
user will fail because the pte is old and cannot be marked young. So we
always end up with zeroed page after fork() + CoW for pfn mappings. we
don't always have a hardware-managed access flag on arm64."

Changes
v9: refactor cow_user_page for indention optimization (Catalin)
hold the ptl longer (Catalin)
v8: change cow_user_page's return type (Matthew)
v7: s/pte_spinlock/pte_offset_map_lock (Kirill)
v6: fix error case of returning with spinlock taken (Catalin)
move kmap_atomic to avoid handling kunmap_atomic
v5: handle the case correctly when !pte_same
fix kbuild test failed
v4: introduce cpu_has_hw_af (Suzuki)
bail out if !pte_same (Kirill)
v3: add vmf->ptl lock/unlock (Kirill A. Shutemov)
add arch_faults_on_old_pte (Matthew, Catalin)
v2: remove FAULT_FLAG_WRITE when setting pte access flag (Catalin)

Jia He (3):
  arm64: cpufeature: introduce helper cpu_has_hw_af()
  arm64: mm: implement arch_faults_on_old_pte() on arm64
  mm: fix double page fault on arm64 if PTE_AF is cleared

 arch/arm64/include/asm/cpufeature.h | 10 +++
 arch/arm64/include/asm/pgtable.h| 14 
 mm/memory.c | 99 -
 3 files changed, 108 insertions(+), 15 deletions(-)

-- 
2.17.1



[PATCH V2] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()

2019-09-24 Thread Anshuman Khandual
Currently, during the memory hot add procedure, memory gets into memblock
before arch_add_memory() is called on it, which creates its linear mapping.

add_memory_resource() {
..
memblock_add_node()
..
arch_add_memory()
..
}

But during the memory hot remove procedure, removal from memblock happens
first, before its linear mapping gets torn down with arch_remove_memory(),
which is not consistent. Resource removal should happen in the reverse order
in which resources were added. However this does not pose any problem for
now, unless there is an assumption regarding the linear mapping. One example
was a subtle failure on the arm64 platform [1], though this has now found a
different solution.

try_remove_memory() {
..
memblock_free()
memblock_remove()
..
arch_remove_memory()
..
}

This changes the sequence of resource removal including memblock and linear
mapping tear down during memory hot remove which will now be the reverse
order in which they were added during memory hot add. The changed removal
order looks like the following.

try_remove_memory() {
..
arch_remove_memory()
..
memblock_free()
memblock_remove()
..
}

[1] https://patchwork.kernel.org/patch/11127623/

Cc: Andrew Morton 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: David Hildenbrand 
Cc: Pavel Tatashin 
Cc: Dan Williams 
Signed-off-by: Anshuman Khandual 
---
Changes in V2:

- Changed the commit message as per Michal and David 

Changed in V1: https://patchwork.kernel.org/patch/11146361/

Original patch https://lkml.org/lkml/2019/9/3/327

Memory hot remove now works on arm64 without this because of a recent
commit, 60bb462fc7ad ("drivers/base/node.c: simplify
unregister_memory_block_under_nodes()").

David mentioned that re-ordering should still make sense for consistency
purpose (removing stuff in the reverse order they were added). This patch
is now detached from arm64 hot-remove series.

https://lkml.org/lkml/2019/9/3/326

 mm/memory_hotplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 49f7bf91c25a..4f7d426a84d0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1763,13 +1763,13 @@ static int __ref try_remove_memory(int nid, u64 start, 
u64 size)
 
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");
-   memblock_free(start, size);
-   memblock_remove(start, size);
 
/* remove memory block devices before removing memory */
remove_memory_block_devices(start, size);
 
arch_remove_memory(nid, start, size, NULL);
+   memblock_free(start, size);
+   memblock_remove(start, size);
__release_memory_resource(start, size);
 
try_offline_node(nid);
-- 
2.20.1



Re: [RFC PATCH v3 00/16] Core scheduling v3

2019-09-24 Thread Aubrey Li
On Sat, Sep 7, 2019 at 2:30 AM Tim Chen  wrote:
> +static inline s64 core_sched_imbalance_delta(int src_cpu, int dst_cpu,
> +   int src_sibling, int dst_sibling,
> +   struct task_group *tg, u64 task_load)
> +{
> +   struct sched_entity *se, *se_sibling, *dst_se, *dst_se_sibling;
> +   s64 excess, deficit, old_mismatch, new_mismatch;
> +
> +   if (src_cpu == dst_cpu)
> +   return -1;
> +
> +   /* XXX SMT4 will require additional logic */
> +
> +   se = tg->se[src_cpu];
> +   se_sibling = tg->se[src_sibling];
> +
> +   excess = se->avg.load_avg - se_sibling->avg.load_avg;
> +   if (src_sibling == dst_cpu) {
> +   old_mismatch = abs(excess);
> +   new_mismatch = abs(excess - 2*task_load);
> +   return old_mismatch - new_mismatch;
> +   }
> +
> +   dst_se = tg->se[dst_cpu];
> +   dst_se_sibling = tg->se[dst_sibling];
> +   deficit = dst_se->avg.load_avg - dst_se_sibling->avg.load_avg;
> +
> +   old_mismatch = abs(excess) + abs(deficit);
> +   new_mismatch = abs(excess - (s64) task_load) +
> +  abs(deficit + (s64) task_load);

If I understood correctly, these formulas assume that the task being
moved to the destination matches the destination's core cookie. So if
the task does not match dst's core cookie and still has to stay in the
runqueue, the formula is no longer correct.

>  /**
>   * update_sg_lb_stats - Update sched_group's statistics for load balancing.
>   * @env: The load balancing environment.
> @@ -8345,7 +8492,8 @@ static inline void update_sg_lb_stats(struct lb_env 
> *env,
> else
> load = source_load(i, load_idx);
>
> -   sgs->group_load += load;

Why is this load update line removed?

> +   core_sched_imbalance_scan(sgs, i, env->dst_cpu);
> +
> sgs->group_util += cpu_util(i);
> sgs->sum_nr_running += rq->cfs.h_nr_running;
>

Thanks,
-Aubrey


RE: [PATCH] pwm: pwm-imx27: Use 'dev' instead of dereferencing it repeatedly

2019-09-24 Thread Anson Huang
Hi, David

> Subject: RE: [PATCH] pwm: pwm-imx27: Use 'dev' instead of dereferencing it
> repeatedly
> 
> From: Anson Huang
> > Sent: 24 September 2019 11:03
> > Hi, David
> >
> > > Subject: RE: [PATCH] pwm: pwm-imx27: Use 'dev' instead of
> > > dereferencing it repeatedly
> > >
> > > From: Anson Huang
> > > > Sent: 24 September 2019 10:00
> > > > Add helper variable dev = &pdev->dev to simplify the code.
> > > >
> ...
> > > >  static int pwm_imx27_probe(struct platform_device *pdev)  {
> > > > +   struct device *dev = &pdev->dev;
> > > > struct pwm_imx27_chip *imx;
> > > >
> > > > -   imx = devm_kzalloc(&pdev->dev, sizeof(*imx), GFP_KERNEL);
> > > > +   imx = devm_kzalloc(dev, sizeof(*imx), GFP_KERNEL);
> ...
> > > Hopefully the compiler will optimise this back, otherwise you've
> > > added another local variable which may cause spilling to the stack.
> > > For a setup function it probably doesn't matter, but in general it
> > > might have a small negative performance impact.
> > >
> > > In any case this doesn't shorten any lines enough to remove
> > > line-wrap and using &pdev->dev is really one less variable to
> > > mentally track when reading the code.
> >
> > Do we know which compiler will optimize this? I saw many patches
> > doing this to avoid a lot of dereferences. I understand it does
> > NOT save lines, but my intention is to avoid dereferences, which might
> > save some instructions.
> >
> > I thought saving instructions was more important. So are there now
> > different opinions about doing this?
> 
> Remember &pdev->dev is just 'pdev + constant'.
> Assuming 'pdev' is held in a callee saved register (which you want it to be)
> then to access dev->foo the compiler can remember the constant and use an
> offset from 'pdev' instead of an extra 'dev' variable.
> On most modern ABI the first function call arguments are passed in registers.
> So an add  instruction (probably lea) can be used to add the constant offset
> at the same time as the value is moved into the argument register.
> 
> However your extra variable could easily get spilled out to the stack.
> So you get an extra memory read rather than (at most) an extra 'add'
> instruction.
> 
> Even if pdev->dev were a pointer, repeatedly reading it from pdev->dev
> could easily generate better code than having an extra variable that would
> mean the value was repeatedly read from the stack.

Thanks for the detailed explanation about it; please ignore these patches.

Thanks,
Anson


RE: [EXT] Re: [V4 2/2] dmaengine: fsl-dpaa2-qdma: Add NXP dpaa2 qDMA controller driver for Layerscape SoCs

2019-09-24 Thread Peng Ma
Hi Vinod,

>-Original Message-
>From: Vinod Koul 
>Sent: 2019年9月25日 3:35
>To: Peng Ma 
>Cc: dan.j.willi...@intel.com; Leo Li ;
>linux-kernel@vger.kernel.org; dmaeng...@vger.kernel.org
>Subject: Re: [EXT] Re: [V4 2/2] dmaengine: fsl-dpaa2-qdma: Add NXP dpaa2
>qDMA controller driver for Layerscape SoCs
>
>Caution: EXT Email
>
>Hey Peng,
>
>On 11-09-19, 02:01, Peng Ma wrote:
>> Hi Vinod,
>>
>> I sent this patch series (V5) on June 25, 2019, and I haven't received
>> any comments yet. Their current state is "Not Applicable", so please let
>> me know what I need to do next.
>> Thanks very much for your comments.
>>
>> Patch link:
>> https://patchwork.kernel.org/patch/11015035/
>> https://patchwork.kernel.org/patch/11015033/
>
>I'm sorry, this looks to have been missed by me, and my script updated the
>status.
>
>Can you please resend it after rc1 is out and I will review it and do the
>needful
[Peng Ma] Got it. By the way, when will rc1 be out?

Best Regards,
Peng
>
>--
>~Vinod


[PATCH 0/1] iio: add driver for Bosch BMA400 accelerometer

2019-09-24 Thread Dan Robertson
Add an IIO driver for the Bosch BMA400 3-axis ultra-low-power accelerometer.
The initial implementation of the driver adds read support for the
acceleration and temperature data registers. The driver also has support for
reading and writing to the output data rate, oversampling ratio, and scale
configuration registers.

Comments and feedback are very much welcomed :)

Cheers,

 - Dan

Dan Robertson (1):
  iio: (bma400) add driver for the BMA400

 drivers/iio/accel/Kconfig   |  19 +
 drivers/iio/accel/Makefile  |   2 +
 drivers/iio/accel/bma400.h  |  74 +++
 drivers/iio/accel/bma400_core.c | 862 
 drivers/iio/accel/bma400_i2c.c  |  54 ++
 5 files changed, 1011 insertions(+)
 create mode 100644 drivers/iio/accel/bma400.h
 create mode 100644 drivers/iio/accel/bma400_core.c
 create mode 100644 drivers/iio/accel/bma400_i2c.c





[PATCH 1/1] iio: (bma400) add driver for the BMA400

2019-09-24 Thread Dan Robertson
Add an IIO driver for the Bosch BMA400 3-axis ultra-low-power accelerometer.
The driver supports reading from the acceleration and temperature
registers. The driver also supports reading and configuring the output data
rate, oversampling ratio, and scale.

Signed-off-by: Dan Robertson 
---
 drivers/iio/accel/Kconfig   |  19 +
 drivers/iio/accel/Makefile  |   2 +
 drivers/iio/accel/bma400.h  |  74 +++
 drivers/iio/accel/bma400_core.c | 862 
 drivers/iio/accel/bma400_i2c.c  |  54 ++
 5 files changed, 1011 insertions(+)
 create mode 100644 drivers/iio/accel/bma400.h
 create mode 100644 drivers/iio/accel/bma400_core.c
 create mode 100644 drivers/iio/accel/bma400_i2c.c

diff --git a/drivers/iio/accel/Kconfig b/drivers/iio/accel/Kconfig
index 9b9656ce37e6..cca6727e037e 100644
--- a/drivers/iio/accel/Kconfig
+++ b/drivers/iio/accel/Kconfig
@@ -112,6 +112,25 @@ config BMA220
  To compile this driver as a module, choose M here: the
  module will be called bma220_spi.
 
+config BMA400
+   tristate "Bosch BMA400 3-Axis Accelerometer Driver"
+   depends on I2C
+   select REGMAP
+   select BMA400_I2C if (I2C)
+   help
+ Say Y here if you want to build a driver for the Bosch BMA400
+ triaxial acceleration sensor.
+
+ To compile this driver as a module, choose M here: the
+ module will be called bma400_core and you will also get
+ bma400_i2c for I2C.
+
+config BMA400_I2C
+   tristate
+   depends on BMA400
+   depends on I2C
+   select REGMAP_I2C
+
 config BMC150_ACCEL
tristate "Bosch BMC150 Accelerometer Driver"
select IIO_BUFFER
diff --git a/drivers/iio/accel/Makefile b/drivers/iio/accel/Makefile
index 56bd0215e0d4..3a051cf37f40 100644
--- a/drivers/iio/accel/Makefile
+++ b/drivers/iio/accel/Makefile
@@ -14,6 +14,8 @@ obj-$(CONFIG_ADXL372_I2C) += adxl372_i2c.o
 obj-$(CONFIG_ADXL372_SPI) += adxl372_spi.o
 obj-$(CONFIG_BMA180) += bma180.o
 obj-$(CONFIG_BMA220) += bma220_spi.o
+obj-$(CONFIG_BMA400) += bma400_core.o
+obj-$(CONFIG_BMA400_I2C) += bma400_i2c.o
 obj-$(CONFIG_BMC150_ACCEL) += bmc150-accel-core.o
 obj-$(CONFIG_BMC150_ACCEL_I2C) += bmc150-accel-i2c.o
 obj-$(CONFIG_BMC150_ACCEL_SPI) += bmc150-accel-spi.o
diff --git a/drivers/iio/accel/bma400.h b/drivers/iio/accel/bma400.h
new file mode 100644
index ..7fa92bc457f6
--- /dev/null
+++ b/drivers/iio/accel/bma400.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * bma400.h - Register constants and other forward declarations
+ *needed by the bma400 sources.
+ *
+ * Copyright 2019 Dan Robertson 
+ *
+ */
+
+#include 
+
+/*
+ * Read-Only Registers
+ */
+
+/* Status and ID registers */
+#define BMA400_CHIP_ID_REG  0x00
+#define BMA400_ERR_REG  0x02
+#define BMA400_STATUS_REG   0x03
+
+/* Acceleration registers */
+#define BMA400_X_AXIS_LSB_REG   0x04
+#define BMA400_X_AXIS_MSB_REG   0x05
+#define BMA400_Y_AXIS_LSB_REG   0x06
+#define BMA400_Y_AXIS_MSB_REG   0x07
+#define BMA400_Z_AXIS_LSB_REG   0x08
+#define BMA400_Z_AXIS_MSB_REG   0x09
+
+/* Sensor time registers */
+#define BMA400_SENSOR_TIME0 0x0a
+#define BMA400_SENSOR_TIME1 0x0b
+#define BMA400_SENSOR_TIME2 0x0c
+
+/* Event and interrupt registers */
+#define BMA400_EVENT_REG0x0d
+#define BMA400_INT_STAT0_REG0x0e
+#define BMA400_INT_STAT1_REG0x0f
+#define BMA400_INT_STAT2_REG0x10
+
+/* Temperature register */
+#define BMA400_TEMP_DATA_REG0x11
+
+/* FIFO length and data registers */
+#define BMA400_FIFO_LENGTH0_REG 0x12
+#define BMA400_FIFO_LENGTH1_REG 0x13
+#define BMA400_FIFO_DATA_REG0x14
+
+/* Step count registers */
+#define BMA400_STEP_CNT0_REG0x15
+#define BMA400_STEP_CNT1_REG0x16
+#define BMA400_STEP_CNT3_REG0x17
+#define BMA400_STEP_STAT_REG0x18
+
+/*
+ * Read-write configuration registers
+ */
+#define BMA400_ACC_CONFIG0_REG  0x19
+#define BMA400_ACC_CONFIG1_REG  0x1a
+#define BMA400_ACC_CONFIG2_REG  0x1b
+#define BMA400_CMD_REG  0x7e
+
+/* Chip ID of BMA 400 devices found in the chip ID register. */
+#define BMA400_ID_REG_VAL   0x90
+
+/* The softreset command */
+#define BMA400_SOFTRESET_CMD0xb6
+
+extern const struct regmap_config bma400_regmap_config;
+
+int bma400_probe(struct device *dev,
+struct regmap *regmap,
+const char *name);
+
+int bma400_remove(struct device *dev);
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
new file mode 100644
index ..55fe2f220c30
--- /dev/null
+++ b/drivers/iio/accel/bma400_core.c
@@ -0,0 +1,862 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * bma400-core.c - Core IIO driver for Bosch BMA400 triaxial acceleration
+ * sensor. Used by bma400-i2c.
+ *
+ * Copyright 2019 Dan Robertson 
+ *
+ * TODO:
+ *  - Support for power management
+ *  - Support events and interrupts
+ *  - Create a channel for the step count

[PATCH V9 0/2] mailbox: arm: introduce smc triggered mailbox

2019-09-24 Thread Peng Fan
From: Peng Fan 

V9:
 - Add Florian's R-b tag in patch 1/2
 - Mark arm,func-id as a required property per Andre's comments in patch 1/2.
 - Make invoke_smc_mbox_fn a private entry in a channel per Florian's
   comments in patch 2/2
 - Include linux/types.h in arm-smccc-mbox.h in patch 2/2
 - Drop function_id from arm_smccc_mbox_cmd since func-id is from DT
   in patch 2/2.

   Andre,
I have marked arm,func-id as a required property and dropped function_id
from the client; please see whether you are happy with the patchset.
I hope we can finalize it and get the patches landed.

   Thanks,
   Peng.

V8:
Add missed arm-smccc-mbox.h

V7:
Typo fix
#mbox-cells changed to 0
Add a new header file arm-smccc-mbox.h
Use ARM_SMCCC_IS_64

Andre,
  The function_id is still kept in arm_smccc_mbox_cmd, because arm,func-id
property is optional, so clients could pass function_id to mbox driver.

V6:
Switch to a per-channel mbox controller
Drop arm,num-chans, transports, method
Add arm,hvc-mbox compatible
Fix smc/hvc args, drop client id and use correct type.
https://patchwork.kernel.org/cover/11146641/

V5:
yaml fix
https://patchwork.kernel.org/cover/7741/

V4:
yaml fix for num-chans in patch 1/2.
https://patchwork.kernel.org/cover/6521/

V3:
Drop interrupt
Introduce transports for mem/reg usage
Add chan-id for mem usage
Convert to yaml format
https://patchwork.kernel.org/cover/11043541/

V2:
This is a modified version from Andre Przywara's patch series
https://lore.kernel.org/patchwork/cover/812997/.
The modification are mostly:
Introduce arm,num-chans
Introduce arm_smccc_mbox_cmd
txdone_poll and txdone_irq are both set to false
arm,func-ids are kept, but as an optional property.
Reword SCPI to SCMI, because I am trying SCMI over SMC, not SCPI.
Introduce interrupts notification.

[1] is a draft implementation of i.MX8MM SCMI ATF implementation that
use smc as mailbox, power/clk is included, but only part of clk has been
implemented to work with hardware, power domain only supports get name
for now.

The traditional Linux mailbox mechanism uses some kind of dedicated hardware
IP to signal a condition to some other processing unit, typically a dedicated
management processor.
This mailbox feature is used for instance by the SCMI protocol to signal a
request for some action to be taken by the management processor.
However, some SoCs do not have a dedicated management core to provide
those services. In order to service TEE, and to avoid Linux shutting
down power and clocks used by TEE, the firmware needs to handle power
and clocks; the firmware here is ARM Trusted Firmware, which can also
run the SCMI service.

The existing SCMI implementation uses a rather flexible shared memory
region to communicate commands and their parameters, but it still requires
a mailbox to actually trigger the action.

This patch series provides a Linux mailbox compatible service which uses
smc calls to invoke firmware code, for instance taking care of SCMI requests.
The actual requests are still communicated using the standard SCMI way of
shared memory regions, but the dedicated mailbox hardware IP is replaced
by this new driver.

This simple driver uses the architected SMC calling convention to trigger
firmware services, and also allows using "HVC" calls to call into hypervisors
or firmware layers running in the EL2 exception level.

Patch 1 contains the device tree binding documentation, patch 2 introduces
the actual mailbox driver.

Please note that this driver just provides a generic mailbox mechanism,
It could support synchronous TX/RX, or synchronous TX with asynchronous
RX. And while providing SCMI services was the reason for this exercise,
this driver is in no way bound to this use case, but can be used generically
where the OS wants to signal a mailbox condition to firmware or a
hypervisor.
Also the driver is in no way meant to replace any existing firmware
interface, but actually to complement existing interfaces.

[1] https://github.com/MrVan/arm-trusted-firmware/tree/scmi


Peng Fan (2):
  dt-bindings: mailbox: add binding doc for the ARM SMC/HVC mailbox
  mailbox: introduce ARM SMC based mailbox

 .../devicetree/bindings/mailbox/arm-smc.yaml   |  96 
 drivers/mailbox/Kconfig|   7 +
 drivers/mailbox/Makefile   |   2 +
 drivers/mailbox/arm-smc-mailbox.c  | 167 +
 include/linux/mailbox/arm-smccc-mbox.h |  20 +++
 5 files changed, 292 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mailbox/arm-smc.yaml
 create mode 100644 drivers/mailbox/arm-smc-mailbox.c
 create mode 100644 include/linux/mailbox/arm-smccc-mbox.h

-- 
2.16.4



[PATCH V9 1/2] dt-bindings: mailbox: add binding doc for the ARM SMC/HVC mailbox

2019-09-24 Thread Peng Fan
From: Peng Fan 

The ARM SMC/HVC mailbox binding describes a firmware interface to trigger
actions in software layers running in the EL2 or EL3 exception levels.
The term "ARM" here relates to the SMC instruction as part of the ARM
instruction set, not as a standard endorsed by ARM Ltd.

Signed-off-by: Peng Fan 
Reviewed-by: Florian Fainelli 
---
 .../devicetree/bindings/mailbox/arm-smc.yaml   | 96 ++
 1 file changed, 96 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mailbox/arm-smc.yaml

diff --git a/Documentation/devicetree/bindings/mailbox/arm-smc.yaml 
b/Documentation/devicetree/bindings/mailbox/arm-smc.yaml
new file mode 100644
index ..b061954d1678
--- /dev/null
+++ b/Documentation/devicetree/bindings/mailbox/arm-smc.yaml
@@ -0,0 +1,96 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mailbox/arm-smc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: ARM SMC Mailbox Interface
+
+maintainers:
+  - Peng Fan 
+
+description: |
+  This mailbox uses the ARM smc (secure monitor call) or hvc (hypervisor
+  call) instruction to trigger a mailbox-connected activity in firmware,
+  executing on the very same core as the caller. The value of r0/w0/x0
+  the firmware returns after the smc call is delivered as a received
+  message to the mailbox framework, so synchronous communication can be
+  established. The exact meaning of the action the mailbox triggers as
+  well as the return value is defined by their users and is not subject
+  to this binding.
+
+  One example use case of this mailbox is the SCMI interface, which uses
+  shared memory to transfer commands and parameters, and a mailbox to
+  trigger a function call. This allows SoCs without a separate management
+  processor (or when such a processor is not available or used) to use
+  this standardized interface anyway.
+
+  This binding describes no hardware, but establishes a firmware interface.
+  Upon receiving an SMC using the described SMC function identifier, the
+  firmware is expected to trigger some mailbox connected functionality.
+  The communication follows the ARM SMC calling convention.
+  Firmware expects an SMC function identifier in r0 or w0. The supported
+  identifiers are passed from consumers, or listed in the arm,func-id
+  property as described below. The firmware can return one value in
+  the first SMC result register; it is expected to be an error value,
+  which shall be propagated to the mailbox client.
+
+  Any core which supports the SMC or HVC instruction can be used, as long
+  as a firmware component running in EL3 or EL2 is handling these calls.
+
+properties:
+  compatible:
+oneOf:
+  - description:
+  For implementations using ARM SMC instruction.
+const: arm,smc-mbox
+
+  - description:
+  For implementations using ARM HVC instruction.
+const: arm,hvc-mbox
+
+  "#mbox-cells":
+const: 0
+
+  arm,func-id:
+description: |
+  A single 32-bit value specifying the function ID used by the mailbox.
+  The function ID follows the ARM SMC calling convention standard.
+$ref: /schemas/types.yaml#/definitions/uint32
+
+required:
+  - compatible
+  - "#mbox-cells"
+  - arm,func-id
+
+examples:
+  - |
+sram@93f000 {
+  compatible = "mmio-sram";
+  reg = <0x0 0x93f000 0x0 0x1000>;
+  #address-cells = <1>;
+  #size-cells = <1>;
+  ranges = <0x0 0x93f000 0x1000>;
+
+  cpu_scp_lpri: scp-shmem@0 {
+compatible = "arm,scmi-shmem";
+reg = <0x0 0x200>;
+  };
+};
+
+smc_tx_mbox: tx_mbox {
+  #mbox-cells = <0>;
+  compatible = "arm,smc-mbox";
+  arm,func-id = <0xc2fe>;
+};
+
+firmware {
+  scmi {
+compatible = "arm,scmi";
+mboxes = <&smc_tx_mbox>;
+mbox-names = "tx";
+shmem = <&cpu_scp_lpri>;
+  };
+};
+
+...
-- 
2.16.4



[PATCH V9 2/2] mailbox: introduce ARM SMC based mailbox

2019-09-24 Thread Peng Fan
From: Peng Fan 

This mailbox driver implements a mailbox which signals transmitted data
via an ARM smc (secure monitor call) instruction. The mailbox receiver
is implemented in firmware and can synchronously return data when it
returns execution to the non-secure world again.
An asynchronous receive path is not implemented.
This allows the usage of a mailbox to trigger firmware actions on SoCs
which either don't have a separate management processor or on which such
a core is not available. A user of this mailbox could be the SCP
interface.

Modified from Andre Przywara's v2 patch
https://lore.kernel.org/patchwork/patch/812999/

Cc: Andre Przywara 
Signed-off-by: Peng Fan 
---
 drivers/mailbox/Kconfig|   7 ++
 drivers/mailbox/Makefile   |   2 +
 drivers/mailbox/arm-smc-mailbox.c  | 167 +
 include/linux/mailbox/arm-smccc-mbox.h |  20 
 4 files changed, 196 insertions(+)
 create mode 100644 drivers/mailbox/arm-smc-mailbox.c
 create mode 100644 include/linux/mailbox/arm-smccc-mbox.h

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index ab4eb750bbdd..7707ee26251a 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -16,6 +16,13 @@ config ARM_MHU
  The controller has 3 mailbox channels, the last of which can be
  used in Secure mode only.
 
+config ARM_SMC_MBOX
+   tristate "Generic ARM smc mailbox"
+   depends on OF && HAVE_ARM_SMCCC
+   help
+ Generic mailbox driver which uses ARM smc calls to call into
+ firmware for triggering mailboxes.
+
 config IMX_MBOX
tristate "i.MX Mailbox"
depends on ARCH_MXC || COMPILE_TEST
diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
index c22fad6f696b..93918a84c91b 100644
--- a/drivers/mailbox/Makefile
+++ b/drivers/mailbox/Makefile
@@ -7,6 +7,8 @@ obj-$(CONFIG_MAILBOX_TEST)  += mailbox-test.o
 
 obj-$(CONFIG_ARM_MHU)  += arm_mhu.o
 
+obj-$(CONFIG_ARM_SMC_MBOX) += arm-smc-mailbox.o
+
 obj-$(CONFIG_IMX_MBOX) += imx-mailbox.o
 
 obj-$(CONFIG_ARMADA_37XX_RWTM_MBOX)+= armada-37xx-rwtm-mailbox.o
diff --git a/drivers/mailbox/arm-smc-mailbox.c 
b/drivers/mailbox/arm-smc-mailbox.c
new file mode 100644
index ..6f0b5fd6ad1b
--- /dev/null
+++ b/drivers/mailbox/arm-smc-mailbox.c
@@ -0,0 +1,167 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2016,2017 ARM Ltd.
+ * Copyright 2019 NXP
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+typedef unsigned long (smc_mbox_fn)(unsigned int, unsigned long,
+   unsigned long, unsigned long,
+   unsigned long, unsigned long,
+   unsigned long);
+
+struct arm_smc_chan_data {
+   unsigned int function_id;
+   smc_mbox_fn *invoke_smc_mbox_fn;
+};
+
+static int arm_smc_send_data(struct mbox_chan *link, void *data)
+{
+   struct arm_smc_chan_data *chan_data = link->con_priv;
+   struct arm_smccc_mbox_cmd *cmd = data;
+   unsigned long ret;
+
+   if (ARM_SMCCC_IS_64(chan_data->function_id)) {
+   ret = chan_data->invoke_smc_mbox_fn(chan_data->function_id,
+   cmd->args_smccc64[0],
+   cmd->args_smccc64[1],
+   cmd->args_smccc64[2],
+   cmd->args_smccc64[3],
+   cmd->args_smccc64[4],
+   cmd->args_smccc64[5]);
+   } else {
+   ret = chan_data->invoke_smc_mbox_fn(chan_data->function_id,
+   cmd->args_smccc32[0],
+   cmd->args_smccc32[1],
+   cmd->args_smccc32[2],
+   cmd->args_smccc32[3],
+   cmd->args_smccc32[4],
+   cmd->args_smccc32[5]);
+   }
+
+   mbox_chan_received_data(link, (void *)ret);
+
+   return 0;
+}
+
+static unsigned long __invoke_fn_hvc(unsigned int function_id,
+unsigned long arg0, unsigned long arg1,
+unsigned long arg2, unsigned long arg3,
+unsigned long arg4, unsigned long arg5)
+{
+   struct arm_smccc_res res;
+
+   arm_smccc_hvc(function_id, arg0, arg1, arg2, arg3, arg4,
+ arg5, 0, &res);
+   return res.a0;
+}
+
+static unsigned long __invoke_fn_smc(unsigned int function_id,
+unsigned long arg0, unsigned long arg1,
+unsigned long arg2, unsigned long 

[PATCH v1 2/2] perf stat: Support topdown with --all-kernel/--all-user

2019-09-24 Thread Jin Yao
When perf stat --topdown is enabled, the internal event list is expanded to:
"{topdown-total-slots,topdown-slots-retired,topdown-recovery-bubbles,topdown-fetch-bubbles,topdown-slots-issued}".

With this patch,

1. When --all-user is enabled, it's expanded to:
"{topdown-total-slots:u,topdown-slots-retired:u,topdown-recovery-bubbles:u,topdown-fetch-bubbles:u,topdown-slots-issued:u}"

2. When --all-kernel is enabled, it's expanded to:
"{topdown-total-slots:k,topdown-slots-retired:k,topdown-recovery-bubbles:k,topdown-fetch-bubbles:k,topdown-slots-issued:k}"

3. Both are enabled, it's expanded to:
"{topdown-total-slots:k,topdown-slots-retired:k,topdown-recovery-bubbles:k,topdown-fetch-bubbles:k,topdown-slots-issued:k},{topdown-total-slots:u,topdown-slots-retired:u,topdown-recovery-bubbles:u,topdown-fetch-bubbles:u,topdown-slots-issued:u}"

This patch creates new topdown stat types (STAT_TOPDOWN_XXX_K /
STAT_TOPDOWN_XXX_U) and saves the event counting value to the
type-related entry in the runtime_stat rblist.

For example,

 root@kbl:~# perf stat -a --topdown --all-kernel -- sleep 1

 Performance counter stats for 'system wide':

              retiring:k  bad speculation:k  frontend bound:k  backend bound:k
S0-D0-C0   2        7.6%               1.8%             40.5%            50.0%
S0-D0-C1   2       15.4%               3.4%             14.4%            66.8%
S0-D0-C2   2       15.8%               5.1%             26.9%            52.2%
S0-D0-C3   2        5.7%               5.7%             46.2%            42.4%

   1.000771709 seconds time elapsed

 root@kbl:~# perf stat -a --topdown --all-user -- sleep 1

 Performance counter stats for 'system wide':

              retiring:u  bad speculation:u  frontend bound:u  backend bound:u
S0-D0-C0   2        0.5%               0.0%              0.0%            99.4%
S0-D0-C1   2        5.7%               5.8%             77.7%            10.7%
S0-D0-C2   2       15.5%              20.5%             35.8%            28.2%
S0-D0-C3   2       14.1%               0.5%              1.5%            83.9%

   1.000773028 seconds time elapsed

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-stat.c |  37 +++-
 tools/perf/util/stat-shadow.c | 167 +-
 tools/perf/util/stat.h|  12 +++
 3 files changed, 171 insertions(+), 45 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7f4d22b00d04..b766293b9a15 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1436,7 +1436,8 @@ static int add_default_attributes(void)
 
if (topdown_run) {
char *str = NULL;
-   bool warn = false;
+   bool warn = false, append_uk = false;
+   struct strbuf new_str;
 
if (stat_config.aggr_mode != AGGR_GLOBAL &&
stat_config.aggr_mode != AGGR_CORE) {
@@ -1457,6 +1458,21 @@ static int add_default_attributes(void)
return -1;
}
if (topdown_attrs[0] && str) {
+   int ret;
+
+   if (stat_config.all_kernel || stat_config.all_user) {
+   ret = append_modifier(&new_str, str,
+   stat_config.all_kernel,
+   stat_config.all_user);
+   if (ret)
+   return ret;
+
+   free(str);
+   str = strbuf_detach(&new_str, NULL);
+   strbuf_release(&new_str);
+   append_uk = true;
+   }
+
if (warn)
arch_topdown_group_warn();
err = parse_events(evsel_list, str, &errinfo);
@@ -1468,6 +1484,25 @@ static int add_default_attributes(void)
free(str);
return -1;
}
+
+   if (append_uk) {
+   struct evsel *evsel;
+   char *p;
+
+   evlist__for_each_entry(evsel_list, evsel) {
+   /*
+* We appended the modifiers ":u"/":k"
+* to evsel->name. Since the events have
+* been parsed, remove the appended
+* modifiers from event name here.
+*/
+   if (evsel->name) {
+ 

[PATCH v1 0/2] perf stat: Support --all-kernel and --all-user

2019-09-24 Thread Jin Yao
This patch series supports the new options "--all-kernel" and "--all-user"
in perf-stat.

For example,

root@kbl:~# perf stat -e cycles,instructions --all-kernel --all-user -a -- sleep 1

 Performance counter stats for 'system wide':

19,156,665  cycles:k
 7,265,342  instructions:k#0.38  insn per cycle
 4,511,186,293  cycles:u
   121,881,436  instructions:u#0.03  insn per cycle

   1.001153540 seconds time elapsed


 root@kbl:~# perf stat -a --topdown --all-kernel -- sleep 1

 Performance counter stats for 'system wide':

              retiring:k  bad speculation:k  frontend bound:k  backend bound:k
S0-D0-C0   2        7.6%               1.8%             40.5%            50.0%
S0-D0-C1   2       15.4%               3.4%             14.4%            66.8%
S0-D0-C2   2       15.8%               5.1%             26.9%            52.2%
S0-D0-C3   2        5.7%               5.7%             46.2%            42.4%

   1.000771709 seconds time elapsed

More detailed information is in the patch descriptions.

Jin Yao (2):
  perf stat: Support --all-kernel and --all-user options
  perf stat: Support topdown with --all-kernel/--all-user

 tools/perf/Documentation/perf-record.txt |   3 +-
 tools/perf/Documentation/perf-stat.txt   |   7 +
 tools/perf/builtin-stat.c| 200 ++-
 tools/perf/util/stat-shadow.c| 167 ++-
 tools/perf/util/stat.h   |  23 +++
 5 files changed, 353 insertions(+), 47 deletions(-)

-- 
2.17.1



[PATCH v1 1/2] perf stat: Support --all-kernel and --all-user options

2019-09-24 Thread Jin Yao
perf record has supported --all-kernel / --all-user to configure all used
events to run in kernel space or in user space. But perf stat doesn't support
that. It would be useful to support these options so that we can collect
metrics for e.g. user space only without having to type ":u" in the events
manually.

Also it would be useful if --all-user / --all-kernel could be specified
together, and the tool would automatically add two copies of the events,
so that we get a break down between user and kernel.

Since we need to support specifying both --all-user and --all-kernel together,
we can't do as what the perf record does for supporting --all-user /
--all-kernel (it sets attr->exclude_kernel and attr->exclude_user).

This patch uses another solution which appends the modifiers ":u"/":k" to
event string.

For example,
perf stat -e cycles,instructions --all-user --all-kernel

It's automatically expanded to:
perf stat -e cycles:k,instructions:k,cycles:u,instructions:u

More examples,

 root@kbl:~# perf stat -e cycles --all-kernel --all-user -a -- sleep 1

 Performance counter stats for 'system wide':

20,884,637  cycles:k
 4,511,494,722  cycles:u

   1.000891147 seconds time elapsed

 root@kbl:~# perf stat -e cycles,instructions --all-kernel --all-user -a -- sleep 1

 Performance counter stats for 'system wide':

19,156,665  cycles:k
 7,265,342  instructions:k#0.38  insn per cycle
 4,511,186,293  cycles:u
   121,881,436  instructions:u#0.03  insn per cycle

   1.001153540 seconds time elapsed

 root@kbl:~#  perf stat -e "{cycles,instructions}" --all-kernel --all-user -a -- sleep 1

 Performance counter stats for 'system wide':

16,230,472  cycles:k
 5,357,549  instructions:k#0.33  insn per cycle
 4,510,695,030  cycles:u
   122,097,780  instructions:u#0.03  insn per cycle

   1.000933419 seconds time elapsed

 root@kbl:~# perf stat -e "{cycles,instructions},{cache-misses,cache-references}" --all-kernel --all-user -a -- sleep 1

 Performance counter stats for 'system wide':

    111,688,302  cycles:k                                            (74.81%)
     24,322,238  instructions:k      #    0.22  insn per cycle       (74.81%)
      1,115,414  cache-misses:k      #   21.292 % of all cache refs  (75.02%)
      5,238,665  cache-references:k                                  (75.02%)
  4,506,792,681  cycles:u                                            (75.22%)
    124,199,635  instructions:u      #    0.03  insn per cycle       (75.22%)
     43,846,543  cache-misses:u      #   62.616 % of all cache refs  (74.97%)
     70,024,231  cache-references:u                                  (74.97%)

   1.001186804 seconds time elapsed

Note:

1. This patch can only configure the specified events (-e xxx) to run
   in user space or in kernel space. Supporting other options, such as
   --topdown, needs follow-up patches.

2. perf-record already supports --all-kernel and --all-user, but they
   can't be combined. We should keep the behavior consistent among all
   perf subtools, so if this patch is accepted, I will post follow-up
   patches to give the other subtools the same behavior.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-record.txt |   3 +-
 tools/perf/Documentation/perf-stat.txt   |   7 +
 tools/perf/builtin-stat.c| 163 ++-
 tools/perf/util/stat.h   |  11 ++
 4 files changed, 182 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index c6f9f31b6039..739e29905184 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -495,7 +495,8 @@ Produce compressed trace using specified level n (default: 
1 - fastest compressi
 Configure all used events to run in kernel space.
 
 --all-user::
-Configure all used events to run in user space.
+Configure all used events to run in user space. The --all-kernel
+and --all-user can't be combined.
 
 --kernel-callchains::
 Collect callchains only from kernel space. I.e. this option sets
diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 930c51c01201..3f630e2f4144 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -323,6 +323,13 @@ The output is SMI cycles%, equals to (aperf - unhalted 
core cycles) / aperf
 
 Users who wants to get the actual value can apply --no-metric-only.
 
+--all-kernel::
+Configure all specified events to run in kernel space.
+
+--all-user::
+Configure all specified events to run in user space. The --all-kernel
+and 

Re: efi-pstore: Crash logs not written

2019-09-24 Thread Kees Cook
On Thu, Sep 12, 2019 at 02:51:53PM +0200, Paul Menzel wrote:
> On a Dell OptiPlex 5040 with Linux 5.3-rc8, I’ll try to get
> efi-pstore working.
> 
> ```
> $ lsmod | grep efi
> efi_pstore 16384  0
> pstore 28672  1 efi_pstore
> efivarfs   16384  1
> $ dmesg | grep pstore
> [ 2569.826541] pstore: Using crash dump compression: deflate
> [ 2569.826542] pstore: Registered efi as persistent store backend
> ```
> 
> Triggering a crash with `echo c | sudo tee /proc/sysrq-trigger`,
> there is nothing in `/sys/fs/pstore` after the reboot.
> 
> Please note that there is also a crash kernel configured, so
> there are actually two reboots, but that should not conflict,
> right?

As long as the crash kernel doesn't delete all the EFI stored variables,
I would expect them to survive.

> Hints on how to debug that would be appreciated. Please find the
> Linux messages attached.

Things seem correct, though I've only done EFI testing under QEMU...
do other things, like BUG (rather than a full panic()) get recorded?
(Maybe try with the lkdtm module?) There was another recent issue with
the "c" sysrq[1] but that seemed to be Xen-specific.

[1] 
https://lore.kernel.org/lkml/be41da82-3adc-4ab1-e4f9-5fdf11ac4...@oracle.com/
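For the record, the lkdtm module mentioned above is driven through debugfs; a typical invocation looks like the following (this assumes CONFIG_LKDTM and a mounted debugfs, and note that the BUG crash point intentionally takes the machine down, just like the sysrq trigger):

```shell
# Load the module if it is not built in.
modprobe lkdtm

# Reading DIRECT lists the available crash points.
cat /sys/kernel/debug/provoke-crash/DIRECT

# Trigger a BUG() -- this intentionally crashes the kernel, so only
# run it on a test machine with pstore configured.
echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
```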

-- 
Kees Cook


Re: [PATCH] x86/mm: fix function name typo in pmd_read_atomic() comment

2019-09-24 Thread Wei Yang
To be honest, I have a question on how this works.

As the comment says, we need to call pmd_read_atomic before using
pte_offset_map_lock to avoid data corruption.

For example, in function swapin_walk_pmd_entry:

pmd_none_or_trans_huge_or_clear_bad(pmd)
pmd_read_atomic(pmd)  ---   1
pte_offset_map_lock(mm, pmd, ...) ---   2

At point 1, we are assured the content is intact, while at point 2 we
read the pmd again to calculate the pte address. How do we ensure the
content is intact this second time? Is it because
pmd_none_or_trans_huge_or_clear_bad() ensures the pte is stable, so that
the content won't be changed?

Thanks for your time in advance.

-- 
Wei Yang
Help you, Help me


[PATCH RESEND v4] fs/epoll: Remove unnecessary wakeups of nested epoll that in ET mode

2019-09-24 Thread hev
From: Heiher 

Take the case where we have:

t0
 | (ew)
e0
 | (et)
e1
 | (lt)
s0

t0: thread 0
e0: epoll fd 0
e1: epoll fd 1
s0: socket fd 0
ew: epoll_wait
et: edge-trigger
lt: level-trigger

We only need to wakeup nested epoll fds if something has been queued to the
overflow list, since the ep_poll() traverses the rdllist during recursive poll
and thus events on the overflow list may not be visible yet.

Test code:
 #include <unistd.h>
 #include <sys/epoll.h>
 #include <sys/socket.h>

 int main(int argc, char *argv[])
 {
int sfd[2];
int efd[2];
struct epoll_event e;

if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) < 0)
goto out;

efd[0] = epoll_create(1);
if (efd[0] < 0)
goto out;

efd[1] = epoll_create(1);
if (efd[1] < 0)
goto out;

e.events = EPOLLIN;
if (epoll_ctl(efd[1], EPOLL_CTL_ADD, sfd[0], &e) < 0)
goto out;

e.events = EPOLLIN | EPOLLET;
if (epoll_ctl(efd[0], EPOLL_CTL_ADD, efd[1], &e) < 0)
goto out;

if (write(sfd[1], "w", 1) != 1)
goto out;

if (epoll_wait(efd[0], &e, 1, 0) != 1)
goto out;

if (epoll_wait(efd[0], &e, 1, 0) != 0)
goto out;

close(efd[0]);
close(efd[1]);
close(sfd[0]);
close(sfd[1]);

return 0;

 out:
return -1;
 }

More tests:
 https://github.com/heiher/epoll-wakeup

Cc: Al Viro 
Cc: Andrew Morton 
Cc: Davide Libenzi 
Cc: Davidlohr Bueso 
Cc: Dominik Brodowski 
Cc: Eric Wong 
Cc: Jason Baron 
Cc: Linus Torvalds 
Cc: Roman Penyaev 
Cc: Sridhar Samudrala 
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: hev 
---
 fs/eventpoll.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index c4159bcc05d9..a0c07f6653c6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -704,12 +704,21 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
res = (*sproc)(ep, &txlist, priv);
 
write_lock_irq(&ep->lock);
+   nepi = READ_ONCE(ep->ovflist);
+   /*
+* We only need to wakeup nested epoll fds if something has been queued
+* to the overflow list, since the ep_poll() traverses the rdllist
+* during recursive poll and thus events on the overflow list may not be
+* visible yet.
+*/
+   if (nepi != NULL)
+   pwake++;
/*
 * During the time we spent inside the "sproc" callback, some
 * other events might have been queued by the poll callback.
 * We re-insert them inside the main ready-list here.
 */
-   for (nepi = READ_ONCE(ep->ovflist); (epi = nepi) != NULL;
+   for (; (epi = nepi) != NULL;
 nepi = epi->next, epi->next = EP_UNACTIVE_PTR) {
/*
 * We need to check if the item is already in the list.
@@ -755,7 +764,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
mutex_unlock(&ep->mtx);
 
/* We have to call this outside the lock */
-   if (pwake)
+   if (pwake == 2)
ep_poll_safewake(&ep->poll_wait);
 
return res;
-- 
2.23.0



Re: [PATCH RESEND v2] fs/epoll: Remove unnecessary wakeups of nested epoll that in ET mode

2019-09-24 Thread Heiher
Hi,

On Tue, Sep 24, 2019 at 11:19 PM Jason Baron  wrote:
>
>
>
> On 9/24/19 10:06 AM, Heiher wrote:
> > Hi,
> >
> > On Mon, Sep 23, 2019 at 11:34 PM Jason Baron  wrote:
> >>
> >>
> >>
> >> On 9/20/19 12:00 PM, Jason Baron wrote:
> >>> On 9/19/19 5:24 AM, hev wrote:
>  From: Heiher 
> 
>  Take the case where we have:
> 
>  t0
>   | (ew)
>  e0
>   | (et)
>  e1
>   | (lt)
>  s0
> 
>  t0: thread 0
>  e0: epoll fd 0
>  e1: epoll fd 1
>  s0: socket fd 0
>  ew: epoll_wait
>  et: edge-trigger
>  lt: level-trigger
> 
>  When s0 fires an event, e1 catches the event, and then e0 catches an
>  event from e1. After this, a thread t0 does epoll_wait() many times
>  on e0; it should only get one event in total, because e1 is added to
>  e0 in edge-triggered mode.
> 
>  This patch only allows the wakeup(&ep->poll_wait) in ep_scan_ready_list
>  under two conditions:
> 
>   1. depth == 0.
>
>
> What is the point of this condition again? I was thinking we only need
> to do #2.
>
>   2. An event has been added to ep->ovflist during processing.
> 
>  Test code:
>   #include <unistd.h>
>   #include <sys/epoll.h>
>   #include <sys/socket.h>
> 
>   int main(int argc, char *argv[])
>   {
>   int sfd[2];
>   int efd[2];
>   struct epoll_event e;
> 
>   if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) < 0)
>   goto out;
> 
>   efd[0] = epoll_create(1);
>   if (efd[0] < 0)
>   goto out;
> 
>   efd[1] = epoll_create(1);
>   if (efd[1] < 0)
>   goto out;
> 
>   e.events = EPOLLIN;
>   if (epoll_ctl(efd[1], EPOLL_CTL_ADD, sfd[0], &e) < 0)
>   goto out;
> 
>   e.events = EPOLLIN | EPOLLET;
>   if (epoll_ctl(efd[0], EPOLL_CTL_ADD, efd[1], &e) < 0)
>   goto out;
> 
>   if (write(sfd[1], "w", 1) != 1)
>   goto out;
> 
>   if (epoll_wait(efd[0], &e, 1, 0) != 1)
>   goto out;
> 
>   if (epoll_wait(efd[0], &e, 1, 0) != 0)
>   goto out;
> 
>   close(efd[0]);
>   close(efd[1]);
>   close(sfd[0]);
>   close(sfd[1]);
> 
>   return 0;
> 
>   out:
>   return -1;
>   }
> 
>  More tests:
>   https://github.com/heiher/epoll-wakeup
> 
>  Cc: Al Viro 
>  Cc: Andrew Morton 
>  Cc: Davide Libenzi 
>  Cc: Davidlohr Bueso 
>  Cc: Dominik Brodowski 
>  Cc: Eric Wong 
>  Cc: Jason Baron 
>  Cc: Linus Torvalds 
>  Cc: Roman Penyaev 
>  Cc: Sridhar Samudrala 
>  Cc: linux-kernel@vger.kernel.org
>  Cc: linux-fsde...@vger.kernel.org
>  Signed-off-by: hev 
>  ---
>   fs/eventpoll.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
>  diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>  index c4159bcc05d9..fa71468dbd51 100644
>  --- a/fs/eventpoll.c
>  +++ b/fs/eventpoll.c
>  @@ -685,6 +685,9 @@ static __poll_t ep_scan_ready_list(struct eventpoll 
>  *ep,
>   if (!ep_locked)
>   mutex_lock_nested(&ep->mtx, depth);
> 
>  +if (!depth || list_empty_careful(&ep->rdllist))
>  +pwake++;
>  +
>
> This is the check I'm wondering why it's needed?

You are right. This is not needed. Initially, I wanted to keep the
original behavior of depth 0 for direct poll() in multi-threaded use.

>
> Thanks,
>
>
> -Jason
>


-- 
Best regards!
Hev
https://hev.cc


Re: [PATCH v2] devfreq: Add tracepoint for frequency changes

2019-09-24 Thread Chanwoo Choi
On 19. 9. 25. 오전 4:37, Matthias Kaehlcke wrote:
> On Fri, Sep 20, 2019 at 10:15:57AM +0900, Chanwoo Choi wrote:
>> Hi,
> 
> sorry for the delayed response, you message got buried in my
> mailbox.
> 
>> On 19. 9. 20. 오전 2:44, Matthias Kaehlcke wrote:
>>> Add a tracepoint for frequency changes of devfreq devices and
>>> use it.
>>>
>>> Signed-off-by: Matthias Kaehlcke 
>>> ---
>>> (sending v2 without much delay wrt v1, since the change in devfreq
>>>  probably isn't controversial, and I'll be offline a few days)
>>>
>>> Changes in v2:
>>> - included trace_devfreq_frequency_enabled() in the condition
>>>   to avoid unnecessary evaluation when the trace point is
>>>   disabled
>>> ---
>>>  drivers/devfreq/devfreq.c  |  3 +++
>>>  include/trace/events/devfreq.h | 18 ++
>>>  2 files changed, 21 insertions(+)
>>>
>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>>> index ab22bf8a12d6..e9f04dcafb01 100644
>>> --- a/drivers/devfreq/devfreq.c
>>> +++ b/drivers/devfreq/devfreq.c
>>> @@ -317,6 +317,9 @@ static int devfreq_set_target(struct devfreq *devfreq, 
>>> unsigned long new_freq,
>>>  
>>> devfreq->previous_freq = new_freq;
>>>  
>>> +   if (trace_devfreq_frequency_enabled() && new_freq != cur_freq)
>>> +   trace_devfreq_frequency(devfreq, new_freq);
>>
>> You can change as following without 'new_freq' variable
>> because devfreq->previous_freq is the new frequency. 
>>  trace_devfreq_frequency(devfreq);
> 
> In general that sounds good.
> 
> devfreq essentially uses df->previous_freq as df->cur_freq, I think
> most code using it would be clearer if we renamed it accordingly.
> I'll send a separate patch for this.

Actually, depending on which point in time 'df->previous_freq' refers
to, either 'previous_freq' or 'cur_freq' could be the proper name.
But the comment of 'struct devfreq' describes it as the configured
value, as follows:
 * @previous_freq:  previously configured frequency value.
I think it is not a big problem to keep the name.

> 
>>> +
>>> if (devfreq->suspend_freq)
>>> devfreq->resume_freq = cur_freq;
>>>  
>>> diff --git a/include/trace/events/devfreq.h b/include/trace/events/devfreq.h
>>> index cf5b8772175d..a62d32fe3c33 100644
>>> --- a/include/trace/events/devfreq.h
>>> +++ b/include/trace/events/devfreq.h
>>> @@ -8,6 +8,24 @@
>>>  #include 
>>>  #include 
>>>  
>>> +TRACE_EVENT(devfreq_frequency,
>>> +   TP_PROTO(struct devfreq *devfreq, unsigned long freq),
>>
>> 'unsigned long freq' parameter is not necessary.
>>
>>> +
>>> +   TP_ARGS(devfreq, freq),
>>> +
>>> +   TP_STRUCT__entry(
>>> +   __string(dev_name, dev_name(>dev))
>>> +   __field(unsigned long, freq)
>>> +   ),
>>> +
>>> +   TP_fast_assign(
>>> +   __assign_str(dev_name, dev_name(>dev));
>>> +   __entry->freq = freq;
>>
>> Initialize the new frequency with 'devfreq->previous_freq' as following:
>>
>>  __entry->freq = devfreq->previous_freq;
>>
>>> +   ),
>>> +
>>> +   TP_printk("dev_name=%s freq=%lu", __get_str(dev_name), __entry->freq)
>>> +);
>>> +
>>>  TRACE_EVENT(devfreq_monitor,
>>> TP_PROTO(struct devfreq *devfreq),
>>>  
>>>
>>
>>
>> -- 
>> Best Regards,
>> Chanwoo Choi
>> Samsung Electronics
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
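For reference, once a tracepoint like the one above lands, it can be exercised from tracefs. The following is a sketch assuming root privileges and a tracefs mount at /sys/kernel/tracing (older systems use /sys/kernel/debug/tracing); the event name follows the TRACE_EVENT definition in the patch:

```shell
# Enable the devfreq frequency-change tracepoint.
echo 1 > /sys/kernel/tracing/events/devfreq/devfreq_frequency/enable

# Watch frequency changes as they happen; lines of the form
# "dev_name=... freq=..." appear whenever a devfreq device retargets.
cat /sys/kernel/tracing/trace_pipe
```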


Re: For review: pidfd_send_signal(2) manual page

2019-09-24 Thread Jann Horn
On Mon, Sep 23, 2019 at 1:26 PM Florian Weimer  wrote:
> * Michael Kerrisk:
> >The  pidfd_send_signal()  system call allows the avoidance of race
> >conditions that occur when using traditional interfaces  (such  as
> >kill(2)) to signal a process.  The problem is that the traditional
> >interfaces specify the target process via a process ID (PID), with
> >the  result  that the sender may accidentally send a signal to the
> >wrong process if the originally intended target process has termi‐
> >nated  and its PID has been recycled for another process.  By con‐
> >trast, a PID file descriptor is a stable reference to  a  specific
> >process;  if  that  process  terminates,  then the file descriptor
> >ceases to be  valid  and  the  caller  of  pidfd_send_signal()  is
> >informed of this fact via an ESRCH error.
>
> It would be nice to explain somewhere how you can avoid the race using
> a PID descriptor.  Is there anything else besides CLONE_PIDFD?

My favorite example here is that you could implement "killall" without
PID reuse races. With /proc/$pid file descriptors, you could do it
like this (rough pseudocode with missing error handling and resource
leaks and such):

for each pid {
  procfs_pid_fd = open("/proc/"+pid);
  if (procfs_pid_fd == -1) continue;
  comm_fd = openat(procfs_pid_fd, "comm");
  if (comm_fd == -1) continue;
  char buf[1000];
  int n = read(comm_fd, buf, sizeof(buf)-1);
  buf[n] = 0;
  if (strcmp(buf, expected_comm) == 0) {
pidfd_send_signal(procfs_pid_fd, SIGKILL, NULL, 0);
  }
}

If you want to avoid using a procfs fd for this, I think you can still
do it, the dance just gets more complicated:

for each pid {
  procfs_pid_fd = open("/proc/"+pid);
  if (procfs_pid_fd == -1) continue;
  pid_fd = pidfd_open(pid, 0);
  if (pid_fd == -1) continue;
  /* at this point procfs_pid_fd and pid_fd may refer to different processes */
  comm_fd = openat(procfs_pid_fd, "comm");
  if (comm_fd == -1) continue;
  /* at this point we know that procfs_pid_fd and pid_fd refer to the
same struct pid, because otherwise the procfs_pid_fd must point to a
directory that throws -ESRCH for everything */
  char buf[1000];
  int n = read(comm_fd, buf, sizeof(buf)-1);
  buf[n] = 0;
  if (strcmp(buf, expected_comm) == 0) {
pidfd_send_signal(pid_fd, SIGKILL, NULL, 0);
  }
}

But I don't think anyone is actually interested in using pidfds for
this kind of usecase right now.


[PATCH xfstests v2] overlay: Enable character device to be the base fs partition

2019-09-24 Thread Zhihao Cheng
There is a message in _supported_fs():
_notrun "not suitable for this filesystem type: $FSTYP"
for when overlay usecases are executed on a character device based base
fs. _overlay_config_override() detects that the current base fs partition
is not a block device, and FSTYP won't be overwritten as 'overlay' before
executing usecases, which results in all overlay usecases becoming
'notrun'. In addition, all generic usecases are based on base fs rather
than overlay.

We want to rewrite FSTYP to 'overlay' before running the usecases. To do
this, we need to add additional character device judgments for TEST_DEV
and SCRATCH_DEV in _overlay_config_override().

Signed-off-by: Zhihao Cheng 
---
 common/config | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/common/config b/common/config
index 4c86a49..a22acdb 100644
--- a/common/config
+++ b/common/config
@@ -550,7 +550,7 @@ _overlay_config_override()
#the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
#of the configured base fs and SCRATCH/TEST_DEV vars are set to the
#overlayfs base and mount dirs inside base fs mount.
-   [ -b "$TEST_DEV" ] || return 0
+   [ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || return 0
 
# Config file may specify base fs type, but we obay -overlay flag
[ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
@@ -570,7 +570,7 @@ _overlay_config_override()
export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
 
-   [ -b "$SCRATCH_DEV" ] || return 0
+   [ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || return 0
 
# Store original base fs vars
export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
-- 
2.7.4



[PATCH] x86/mm: fix function name typo in pmd_read_atomic() comment

2019-09-24 Thread Wei Yang
The function involved should be pte_offset_map_lock and we never have
function pmd_offset_map_lock defined.

Fixes: 26c191788f18 ("mm: pmd_read_atomic: fix 32bit PAE pmd walk vs
pmd_populate SMP race condition")

Signed-off-by: Wei Yang 
---

Hope my understanding is correct.

---
 arch/x86/include/asm/pgtable-3level.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level.h 
b/arch/x86/include/asm/pgtable-3level.h
index e3633795fb22..45e6099fe6b7 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -44,10 +44,10 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
  * pmd_populate rightfully does a set_64bit, but if we're reading the
  * pmd_t with a "*pmdp" on the mincore side, a SMP race can happen
  * because gcc will not read the 64bit of the pmd atomically. To fix
- * this all places running pmd_offset_map_lock() while holding the
+ * this all places running pte_offset_map_lock() while holding the
  * mmap_sem in read mode, shall read the pmdp pointer using this
  * function to know if the pmd is null nor not, and in turn to know if
- * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
+ * they can run pte_offset_map_lock or pmd_trans_huge or other pmd
  * operations.
  *
  * Without THP if the mmap_sem is hold for reading, the pmd can only
-- 
2.17.1



Re: linux-next: Tree for Sep 16 (kernel/sched/core.c)

2019-09-24 Thread Randy Dunlap
On 9/18/19 3:03 AM, Patrick Bellasi wrote:
> 
> On Wed, Sep 18, 2019 at 07:05:53 +0100, Ingo Molnar wrote...
> 
>> * Randy Dunlap  wrote:
>>
>>> On 9/17/19 6:38 AM, Patrick Bellasi wrote:
>>>>
>>>> On Tue, Sep 17, 2019 at 08:52:42 +0100, Ingo Molnar wrote...
>>>>
>>>>> * Randy Dunlap  wrote:
>>>>>
>>>>>> On 9/16/19 3:38 PM, Mark Brown wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Changes since 20190915:
>>>>>>>
>>>>>>
>>>>>> on x86_64:
>>>>>>
>>>>>> when CONFIG_CGROUPS is not set:
>>>>
>>>> Hi Randy,
>>>> thanks for the report.
>>>>
>>>>>>   CC  kernel/sched/core.o
>>>>>> ../kernel/sched/core.c: In function ‘uclamp_update_active_tasks’:
>>>>>> ../kernel/sched/core.c:1081:23: error: storage size of ‘it’ isn’t known
>>>>>>   struct css_task_iter it;
>>>>>>^~
>>>>>>   CC  kernel/printk/printk_safe.o
>>>>>> ../kernel/sched/core.c:1084:2: error: implicit declaration of function 
>>>>>> ‘css_task_iter_start’; did you mean ‘__sg_page_iter_start’? 
>>>>>> [-Werror=implicit-function-declaration]
>>>>>>   css_task_iter_start(css, 0, &it);
>>>>>>   ^~~
>>>>>>   __sg_page_iter_start
>>>>>> ../kernel/sched/core.c:1085:14: error: implicit declaration of function 
>>>>>> ‘css_task_iter_next’; did you mean ‘__sg_page_iter_next’? 
>>>>>> [-Werror=implicit-function-declaration]
>>>>>>   while ((p = css_task_iter_next(&it))) {
>>>>>>   ^~
>>>>>>   __sg_page_iter_next
>>>>>> ../kernel/sched/core.c:1091:2: error: implicit declaration of function 
>>>>>> ‘css_task_iter_end’; did you mean ‘get_task_cred’? 
>>>>>> [-Werror=implicit-function-declaration]
>>>>>>   css_task_iter_end(&it);
>>>>>>   ^
>>>>>>   get_task_cred
>>>>>> ../kernel/sched/core.c:1081:23: warning: unused variable ‘it’ 
>>>>>> [-Wunused-variable]
>>>>>>   struct css_task_iter it;
>>>>>>^~
>>>>>>
>>>>>
>>>>> I cannot reproduce this build failue: I took Linus's latest which has all 
>>>>> the -next scheduler commits included (ad062195731b), and an x86-64 "make 
>>>>> defconfig" and a disabling of CONFIG_CGROUPS still results in a kernel 
>>>>> that builds fine.
>>>>
>>>> Same here Ingo, I cannot reproduce on arm64 and !CONFIG_CGROUPS and
>>>> testing on tip/sched/core.
>>>>
>>>> However, if you like, the following patch can make that code a
>>>> bit more "robust".
>>>>
>>>> Best,
>>>> Patrick
>>>>
>>>> ---8<---
>>>> From 7e17b7bb08dd8dfc57e01c2a7b6875439eb47cbe Mon Sep 17 00:00:00 2001
>>>> From: Patrick Bellasi 
>>>> Date: Tue, 17 Sep 2019 14:12:10 +0100
>>>> Subject: [PATCH 1/1] sched/core: uclamp: Fix compile error on 
>>>> !CONFIG_CGROUPS
>>>>
>>>> Randy reported a compiler error on x86_64 and !CONFIG_CGROUPS which is due
>>>> to uclamp_update_active_tasks() using the undefined css_task_iter().
>>>>
>>>> Since uclamp_update_active_tasks() is used only when cgroup support is
>>>> enabled, fix that by properly guarding that function at compile time.
>>>>
>>>> Signed-off-by: Patrick Bellasi 
>>>> Link: 
>>>> https://lore.kernel.org/lkml/1898d3c9-1997-17ce-a022-a5e28c8dc...@infradead.org/
>>>> Fixes: commit babbe170e05 ("sched/uclamp: Update CPU's refcount on TG's 
>>>> clamp changes")
>>>
>>> Acked-by: Randy Dunlap  # build-tested
>>>
>>> Thanks.
>>
>> Build failures like this one shouldn't depend on the compiler version - 
>> and it's still a mystery how and why this build bug triggered - we cannot 
>> apply the fix without knowing the answer to those questions.
> 
> Right, but it's also quite strange it's not triggering without the
> guarding above. The only definition of struct css_task_iter I can see is
> the one
> provided in:
> 
>include/linux/cgroup.h:50
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/cgroup.h?h=35f7a95266153b1cf0caca3aa9661cb721864527#n50
> 
> which is CONFIG_CGROUPS guarded.
> 
>> Can you reproduce the build bug with Linus's latest tree? If not, which 
>> part of -next triggers the build failure?
> 
> I tried again using this morning's Linus tree headed at:
> 
>   commit 35f7a9526615 ("Merge tag 'devprop-5.4-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm")
> 
> and compilation actually fails for me too.

and linux-next of 20190924 still fails also...


> Everything is fine in v5.3 with !CONFIG_CGROUPS and a git bisect
> between v5.3 and Linus master points to:
> 
>   commit babbe170e053c ("sched/uclamp: Update CPU's refcount on TG's clamp 
> changes")
> 
> So, I think it's really my fault not properly testing !CONFIG_CGROUP,
> which is enforced by default from CONFIG_SCHED_AUTOGROUP.
> 
> The patch above fixes the compilation error, hope this helps.
> 
> Cheers,
> Patrick


-- 
~Randy


[PATCH] rtlwifi: prevent memory leak in rtl_usb_probe

2019-09-24 Thread Navid Emamdoost
In rtl_usb_probe if allocation for usb_data fails the allocated hw
should be released. In addition the allocated rtlpriv->usb_data should
be released on error handling path.

Signed-off-by: Navid Emamdoost 
---
 drivers/net/wireless/realtek/rtlwifi/usb.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/usb.c 
b/drivers/net/wireless/realtek/rtlwifi/usb.c
index 4b59f3b46b28..348b0072cdd6 100644
--- a/drivers/net/wireless/realtek/rtlwifi/usb.c
+++ b/drivers/net/wireless/realtek/rtlwifi/usb.c
@@ -1021,8 +1021,10 @@ int rtl_usb_probe(struct usb_interface *intf,
rtlpriv->hw = hw;
rtlpriv->usb_data = kcalloc(RTL_USB_MAX_RX_COUNT, sizeof(u32),
GFP_KERNEL);
-   if (!rtlpriv->usb_data)
+   if (!rtlpriv->usb_data) {
+   ieee80211_free_hw(hw);
return -ENOMEM;
+   }
 
/* this spin lock must be initialized early */
spin_lock_init(&rtlpriv->locks.usb_lock);
@@ -1083,6 +1085,7 @@ int rtl_usb_probe(struct usb_interface *intf,
_rtl_usb_io_handler_release(hw);
usb_put_dev(udev);
complete(&rtlpriv->firmware_loading_complete);
+   kfree(rtlpriv->usb_data);
return -ENODEV;
 }
 EXPORT_SYMBOL(rtl_usb_probe);
-- 
2.17.1



[PATCH v8] perf diff: Report noisy for cycles diff

2019-09-24 Thread Jin Yao
This patch prints the stddev and hist for the cycles diff of a
program block. It can help us to understand whether the cycles diff
is noisy or not.

This patch is inspired by Andi Kleen's patch
https://lwn.net/Articles/600471/

We create new option '--cycles-hist'.

Example:

perf record -b ./div
perf record -b ./div
perf diff -c cycles

  # Baseline   [Program Block Range] Cycles 
Diff  Shared Object  Symbol
  #   
..  
.  
  #
  46.72% [div.c:40 -> div.c:40] 
   0  div[.] main
  46.72% [div.c:42 -> div.c:44] 
   0  div[.] main
  46.72% [div.c:42 -> div.c:39] 
   0  div[.] main
  20.54% [random_r.c:357 -> random_r.c:394] 
   1  libc-2.27.so   [.] __random_r
  20.54% [random_r.c:357 -> random_r.c:380] 
   0  libc-2.27.so   [.] __random_r
  20.54% [random_r.c:388 -> random_r.c:388] 
   0  libc-2.27.so   [.] __random_r
  20.54% [random_r.c:388 -> random_r.c:391] 
   0  libc-2.27.so   [.] __random_r
  17.04% [random.c:288 -> random.c:291] 
   0  libc-2.27.so   [.] __random
  17.04% [random.c:291 -> random.c:291] 
   0  libc-2.27.so   [.] __random
  17.04% [random.c:293 -> random.c:293] 
   0  libc-2.27.so   [.] __random
  17.04% [random.c:295 -> random.c:295] 
   0  libc-2.27.so   [.] __random
  17.04% [random.c:295 -> random.c:295] 
   0  libc-2.27.so   [.] __random
  17.04% [random.c:298 -> random.c:298] 
   0  libc-2.27.so   [.] __random
   8.40% [div.c:22 -> div.c:25] 
   0  div[.] compute_flag
   8.40% [div.c:27 -> div.c:28] 
   0  div[.] compute_flag
   5.14%   [rand.c:26 -> rand.c:27] 
   0  libc-2.27.so   [.] rand
   5.14%   [rand.c:28 -> rand.c:28] 
   0  libc-2.27.so   [.] rand
   2.15% [rand@plt+0 -> rand@plt+0] 
   0  div[.] rand@plt
   0.00%
  [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
   0.00%   [do_mmap+714 -> do_mmap+732] 
 -10  [kernel.kallsyms]  [k] do_mmap
   0.00%   [do_mmap+737 -> do_mmap+765] 
   1  [kernel.kallsyms]  [k] do_mmap
   0.00%   [do_mmap+262 -> do_mmap+299] 
   0  [kernel.kallsyms]  [k] do_mmap
   0.00% [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0] 
   7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
   0.00%   [native_sched_clock+0 -> native_sched_clock+119] 
  -1  [kernel.kallsyms]  [k] native_sched_clock
   0.00%[native_write_msr+0 -> native_write_msr+16] 
 -13  [kernel.kallsyms]  [k] native_write_msr

When we enable the option '--cycles-hist', the output is

perf diff -c cycles --cycles-hist

  # Baseline   [Program Block Range] Cycles 
Diffstddev/Hist  Shared Object  Symbol
  #   
..  
.  .  
  #
  46.72% [div.c:40 -> div.c:40] 
   0  ± 37.8% ▁█▁▁██▁█   div[.] main
  46.72% [div.c:42 -> div.c:44] 
   0  ± 49.4% ▁▁▂█   div[.] main
  46.72% [div.c:42 -> div.c:39] 
   0  ± 24.1% ▃█▂▄▁▃▂▁   div[.] main
  20.54% [random_r.c:357 -> random_r.c:394] 
   1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so   [.] __random_r
  20.54% [random_r.c:357 -> random_r.c:380] 
   0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so   [.] __random_r
  20.54% [random_r.c:388 -> random_r.c:388] 
   0 libc-2.27.so   [.] __random_r
  20.54% [random_r.c:388 -> random_r.c:391] 
   0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so   [.] __random_r
  17.04% 

[PATCH] iwlwifi: prevent memory leak

2019-09-24 Thread Navid Emamdoost
In alloc_sgtable if alloc_page fails, along with releasing previously
allocated pages, the allocated table should be released too.

Signed-off-by: Navid Emamdoost 
---
 drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c 
b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
index 5c8602de9168..87421807e040 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
@@ -646,6 +646,7 @@ static struct scatterlist *alloc_sgtable(int size)
if (new_page)
__free_page(new_page);
}
+   kfree(table);
return NULL;
}
alloc_size = min_t(int, size, PAGE_SIZE);
-- 
2.17.1



[PATCH 11/15] mm: Remove hpage_nr_pages

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

This function assumed that compound pages were necessarily PMD sized.
While that may be true for some users, it's not going to be true for
all users forever, so it's better to remove it and avoid the confusion
by just using compound_nr() or page_size().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 drivers/nvdimm/btt.c  |  4 +---
 drivers/nvdimm/pmem.c |  3 +--
 include/linux/huge_mm.h   |  8 
 include/linux/mm_inline.h |  6 +++---
 mm/filemap.c  |  2 +-
 mm/gup.c  |  2 +-
 mm/internal.h |  4 ++--
 mm/memcontrol.c   | 14 +++---
 mm/memory_hotplug.c   |  4 ++--
 mm/mempolicy.c|  2 +-
 mm/migrate.c  | 19 ++-
 mm/mlock.c|  9 -
 mm/page_io.c  |  4 ++--
 mm/page_vma_mapped.c  |  6 +++---
 mm/rmap.c |  8 
 mm/swap.c |  4 ++--
 mm/swap_state.c   |  4 ++--
 mm/swapfile.c |  2 +-
 mm/vmscan.c   |  4 ++--
 19 files changed, 49 insertions(+), 60 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index a8d56887ec88..2aac2bf10a37 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1488,10 +1488,8 @@ static int btt_rw_page(struct block_device *bdev, 
sector_t sector,
 {
struct btt *btt = bdev->bd_disk->private_data;
int rc;
-   unsigned int len;
 
-   len = hpage_nr_pages(page) * PAGE_SIZE;
-   rc = btt_do_bvec(btt, NULL, page, len, 0, op, sector);
+   rc = btt_do_bvec(btt, NULL, page, page_size(page), 0, op, sector);
if (rc == 0)
page_endio(page, op_is_write(op), 0);
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f9f76f6ba07b..778c73fd10d6 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -224,8 +224,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t 
sector,
struct pmem_device *pmem = bdev->bd_queue->queuedata;
blk_status_t rc;
 
-   rc = pmem_do_bvec(pmem, page, hpage_nr_pages(page) * PAGE_SIZE,
- 0, op, sector);
+   rc = pmem_do_bvec(pmem, page, page_size(page), 0, op, sector);
 
/*
 * The ->rw_page interface is subtle and tricky.  The core
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 779e83800a77..6018d31549c3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -226,12 +226,6 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
else
return NULL;
 }
-static inline int hpage_nr_pages(struct page *page)
-{
-   if (unlikely(PageTransHuge(page)))
-   return HPAGE_PMD_NR;
-   return 1;
-}
 
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
@@ -285,8 +279,6 @@ static inline struct list_head *page_deferred_list(struct 
page *page)
 #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; })
 #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })
 
-#define hpage_nr_pages(x) 1
-
 static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
return false;
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 6f2fef7b0784..3bd675ce6ba8 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -47,14 +47,14 @@ static __always_inline void update_lru_size(struct lruvec 
*lruvec,
 static __always_inline void add_page_to_lru_list(struct page *page,
struct lruvec *lruvec, enum lru_list lru)
 {
-   update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+   update_lru_size(lruvec, lru, page_zonenum(page), compound_nr(page));
list_add(&page->lru, &lruvec->lists[lru]);
 }
 
 static __always_inline void add_page_to_lru_list_tail(struct page *page,
struct lruvec *lruvec, enum lru_list lru)
 {
-   update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+   update_lru_size(lruvec, lru, page_zonenum(page), compound_nr(page));
list_add_tail(&page->lru, &lruvec->lists[lru]);
 }
 
@@ -62,7 +62,7 @@ static __always_inline void del_page_from_lru_list(struct 
page *page,
struct lruvec *lruvec, enum lru_list lru)
 {
list_del(&page->lru);
-   update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page));
+   update_lru_size(lruvec, lru, page_zonenum(page), -compound_nr(page));
 }
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index 8eca91547e40..b07ef9469861 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -196,7 +196,7 @@ static void unaccount_page_cache_page(struct address_space 
*mapping,
if (PageHuge(page))
return;
 
-   nr = hpage_nr_pages(page);
+   nr = compound_nr(page);
 
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
if (PageSwapBacked(page)) {

[PATCH 15/15] xfs: Use filemap_huge_fault

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/xfs/xfs_file.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index d952d5962e93..9445196f8056 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1156,6 +1156,8 @@ __xfs_filemap_fault(
} else {
if (write_fault)
ret = iomap_page_mkwrite(vmf, _iomap_ops);
+   else if (pe_size)
+   ret = filemap_huge_fault(vmf, pe_size);
else
ret = filemap_fault(vmf);
}
@@ -1181,9 +1183,6 @@ xfs_filemap_huge_fault(
struct vm_fault *vmf,
enum page_entry_sizepe_size)
 {
-   if (!IS_DAX(file_inode(vmf->vma->vm_file)))
-   return VM_FAULT_FALLBACK;
-
/* DAX can shortcut the normal fault path on write faults! */
return __xfs_filemap_fault(vmf, pe_size,
(vmf->flags & FAULT_FLAG_WRITE));
-- 
2.23.0



[PATCH 08/15] mm: Add __page_cache_alloc_order

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

This new function allows page cache pages to be allocated that are
larger than an order-0 page.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagemap.h | 14 +++---
 mm/filemap.c| 12 
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 103205494ea0..d610a49be571 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -208,14 +208,22 @@ static inline int page_cache_add_speculative(struct page 
*page, int count)
 }
 
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
-   return alloc_pages(gfp, 0);
+   if (order == 0)
+   return alloc_pages(gfp, 0);
+   return prep_transhuge_page(alloc_pages(gfp | __GFP_COMP, order));
 }
 #endif
 
+static inline struct page *__page_cache_alloc(gfp_t gfp)
+{
+   return __page_cache_alloc_order(gfp, 0);
+}
+
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
return __page_cache_alloc(mapping_gfp_mask(x));
diff --git a/mm/filemap.c b/mm/filemap.c
index 625ef3ef19f3..bab97addbb1d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -962,24 +962,28 @@ int add_to_page_cache_lru(struct page *page, struct 
address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
int n;
struct page *page;
 
+   if (order > 0)
+   gfp |= __GFP_COMP;
+
if (cpuset_do_page_mem_spread()) {
unsigned int cpuset_mems_cookie;
do {
cpuset_mems_cookie = read_mems_allowed_begin();
n = cpuset_mem_spread_node();
-   page = __alloc_pages_node(n, gfp, 0);
+   page = __alloc_pages_node(n, gfp, order);
+   prep_transhuge_page(page);
} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
return page;
}
-   return alloc_pages(gfp, 0);
+   return prep_transhuge_page(alloc_pages(gfp, order));
 }
-EXPORT_SYMBOL(__page_cache_alloc);
+EXPORT_SYMBOL(__page_cache_alloc_order);
 #endif
 
 /*
-- 
2.23.0
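The key property of __page_cache_alloc_order() above is that any order above
zero must request a compound page (__GFP_COMP). A minimal userspace sketch of
that flag arithmetic, with a made-up flag value (the real __GFP_COMP bit lives
in the kernel's gfp headers):

```c
#include <assert.h>

#define __GFP_COMP 0x4000u	/* illustrative bit value, not the kernel's */

/* Mirror the patch's rule: orders above zero need a compound page. */
static unsigned int page_cache_gfp(unsigned int gfp, unsigned int order)
{
	if (order > 0)
		gfp |= __GFP_COMP;
	return gfp;
}
```

Order-0 requests keep their flags untouched, so existing callers see no
behaviour change.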



[PATCH 07/15] mm: Make prep_transhuge_page tail-callable

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

By permitting NULL or order-0 pages as an argument, and returning the
argument, callers can write:

return prep_transhuge_page(alloc_pages(...));

instead of assigning the result to a temporary variable and conditionally
passing that to prep_transhuge_page().

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/huge_mm.h | 7 +--
 mm/huge_memory.c| 9 +++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 61c9ffd89b05..779e83800a77 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -153,7 +153,7 @@ extern unsigned long thp_get_unmapped_area(struct file 
*filp,
unsigned long addr, unsigned long len, unsigned long pgoff,
unsigned long flags);
 
-extern void prep_transhuge_page(struct page *page);
+extern struct page *prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
 bool can_split_huge_page(struct page *page, int *pextra_pins);
@@ -303,7 +303,10 @@ static inline bool transhuge_vma_suitable(struct 
vm_area_struct *vma,
return false;
 }
 
-static inline void prep_transhuge_page(struct page *page) {}
+static inline struct page *prep_transhuge_page(struct page *page)
+{
+   return page;
+}
 
 #define transparent_hugepage_flags 0UL
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 73fc517c08d2..cbe7d0619439 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -516,15 +516,20 @@ static inline struct deferred_split 
*get_deferred_split_queue(struct page *page)
 }
 #endif
 
-void prep_transhuge_page(struct page *page)
+struct page *prep_transhuge_page(struct page *page)
 {
+   if (!page || compound_order(page) == 0)
+   return page;
/*
-* we use page->mapping and page->indexlru in second tail page
+* we use page->mapping and page->index in second tail page
 * as list_head: assuming THP order >= 2
 */
+   BUG_ON(compound_order(page) == 1);
 
INIT_LIST_HEAD(page_deferred_list(page));
set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
+
+   return page;
 }
 
 static unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long 
len,
-- 
2.23.0
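The tail-call idiom the commit message describes can be modelled in plain C:
the preparation function passes NULL and order-0 arguments straight through
and otherwise returns its argument, so allocation and preparation compose in
one expression. A toy sketch (struct and names are hypothetical, not kernel
types):

```c
#include <assert.h>
#include <stddef.h>

struct toy_page {
	unsigned int order;
	int prepped;
};

/* Mirrors the new prep_transhuge_page() contract: NULL and order-0
 * pages pass through untouched; larger pages get prepared, and the
 * argument is always returned so callers can tail-call it. */
static struct toy_page *toy_prep(struct toy_page *page)
{
	if (!page || page->order == 0)
		return page;
	page->prepped = 1;
	return page;
}
```

This is what lets the later patches write
`return prep_transhuge_page(alloc_pages(...));` without a temporary.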



[PATCH 09/15] mm: Allow large pages to be added to the page cache

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

We return -EEXIST if there are any non-shadow entries in the page
cache in the range covered by the large page.  If there are multiple
shadow entries in the range, we set *shadowp to one of them (currently
the one at the highest index).  If that turns out to be the wrong
answer, we can implement something more complex.  This is mostly
modelled after the equivalent function in the shmem code.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/filemap.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bab97addbb1d..afe8f5d95810 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -855,6 +855,7 @@ static int __add_to_page_cache_locked(struct page *page,
int huge = PageHuge(page);
struct mem_cgroup *memcg;
int error;
+   unsigned int nr = 1;
void *old;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -866,31 +867,45 @@ static int __add_to_page_cache_locked(struct page *page,
			  gfp_mask, &memcg, false);
if (error)
return error;
+   xas_set_order(&xas, offset, compound_order(page));
+   nr = compound_nr(page);
}
 
-   get_page(page);
+   page_ref_add(page, nr);
page->mapping = mapping;
page->index = offset;
 
do {
+   unsigned long exceptional = 0;
+   unsigned int i = 0;
+
		xas_lock_irq(&xas);
-		old = xas_load(&xas);
-		if (old && !xa_is_value(old))
+		xas_for_each_conflict(&xas, old) {
+			if (!xa_is_value(old))
+				break;
+			exceptional++;
+			if (shadowp)
+				*shadowp = old;
+		}
+		if (old)
			xas_set_err(&xas, -EEXIST);
-		xas_store(&xas, page);
+		xas_create_range(&xas);
		if (xas_error(&xas))
goto unlock;
 
-   if (xa_is_value(old)) {
-   mapping->nrexceptional--;
-   if (shadowp)
-   *shadowp = old;
+next:
+		xas_store(&xas, page);
+		if (++i < nr) {
+			xas_next(&xas);
+   goto next;
}
-   mapping->nrpages++;
+   mapping->nrexceptional -= exceptional;
+   mapping->nrpages += nr;
 
/* hugetlb pages do not participate in page cache accounting */
if (!huge)
-   __inc_node_page_state(page, NR_FILE_PAGES);
+   __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES,
+   nr);
 unlock:
		xas_unlock_irq(&xas);
	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
@@ -907,7 +922,7 @@ static int __add_to_page_cache_locked(struct page *page,
/* Leave page->index set: truncation relies upon it */
if (!huge)
mem_cgroup_cancel_charge(page, memcg, false);
-   put_page(page);
+   page_ref_sub(page, nr);
	return xas_error(&xas);
 }
 ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
-- 
2.23.0
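The xas_set_order() call above rounds the index down so that a large page
occupies 2^order consecutive, naturally aligned slots, and nr entries are
stored and accounted. The index arithmetic in isolation (helper names are
illustrative, not kernel API):

```c
#include <assert.h>

/* First page-cache slot covered by an order-`order` page that
 * contains index `offset`: round down to the order's alignment. */
static unsigned long range_start(unsigned long offset, unsigned int order)
{
	return offset & ~((1UL << order) - 1);
}

/* Number of slots the page occupies; compound_nr() == 1 << order. */
static unsigned long range_len(unsigned int order)
{
	return 1UL << order;
}
```

For a PMD-sized page (order 9 with 4KiB base pages) at offset 517, the entry
spans slots 512..1023, which is also the range scanned for conflicts.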



[PATCH 05/15] xfs: Support large pages

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

Mostly this is just checking the page size of each page instead of
assuming PAGE_SIZE.  Clean up the logic in writepage a little.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/xfs/xfs_aops.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 102cfd8a97d6..1a26e9ca626b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -765,7 +765,7 @@ xfs_add_to_ioend(
	struct xfs_mount	*mp = ip->i_mount;
	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
	unsigned		len = i_blocksize(inode);
-	unsigned		poff = offset & (PAGE_SIZE - 1);
+	unsigned		poff = offset & (page_size(page) - 1);
	bool			merged, same_page = false;
	sector_t		sector;
 
@@ -843,7 +843,7 @@ xfs_aops_discard_page(
if (error && !XFS_FORCED_SHUTDOWN(mp))
xfs_alert(mp, "page discard unable to remove delalloc 
mapping.");
 out_invalidate:
-   xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
+   xfs_vm_invalidatepage(page, 0, page_size(page));
 }
 
 /*
@@ -984,8 +984,7 @@ xfs_do_writepage(
struct xfs_writepage_ctx *wpc = data;
	struct inode		*inode = page->mapping->host;
	loff_t			offset;
-	uint64_t		end_offset;
-	pgoff_t			end_index;
+	uint64_t		end_offset;
 
trace_xfs_writepage(inode, page, 0, 0);
 
@@ -1024,10 +1023,9 @@ xfs_do_writepage(
 * -^--|
 */
offset = i_size_read(inode);
-   end_index = offset >> PAGE_SHIFT;
-   if (page->index < end_index)
-   end_offset = (xfs_off_t)(page->index + 1) << PAGE_SHIFT;
-   else {
+   end_offset = file_offset_of_next_page(page);
+
+   if (end_offset > offset) {
/*
 * Check whether the page to write out is beyond or straddles
 * i_size or not.
@@ -1039,7 +1037,8 @@ xfs_do_writepage(
 * ||  Straddles |
 * -^---||
 */
-   unsigned offset_into_page = offset & (PAGE_SIZE - 1);
+   unsigned offset_into_page = offset_in_this_page(page, offset);
+   pgoff_t end_index = offset >> PAGE_SHIFT;
 
/*
 * Skip the page if it is fully outside i_size, e.g. due to a
@@ -1070,7 +1069,7 @@ xfs_do_writepage(
 * memory is zeroed when mapped, and writes to that region are
 * not written out to the file."
 */
-   zero_user_segment(page, offset_into_page, PAGE_SIZE);
+   zero_user_segment(page, offset_into_page, page_size(page));
 
/* Adjust the end_offset to the end of file */
end_offset = offset;
-- 
2.23.0
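The core change in this patch is replacing the `offset & (PAGE_SIZE - 1)`
masks with `offset & (page_size(page) - 1)`. The same power-of-two mask
arithmetic, sketched standalone (function name is hypothetical):

```c
#include <assert.h>

/* Offset of a file position within a (possibly compound) page whose
 * size is `psize` bytes; psize must be a power of two. */
static unsigned long offset_in_this_page(unsigned long long pos,
					 unsigned long psize)
{
	return (unsigned long)(pos & (psize - 1));
}
```

With a 4KiB page the mask keeps 12 bits of the position; with a 2MiB compound
page it keeps 21 bits, which is exactly why the fixed PAGE_SIZE masks had to go.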



[PATCH 10/15] mm: Allow find_get_page to be used for large pages

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

Add FGP_PMD to indicate that we're trying to find-or-create a page that
is at least PMD_ORDER in size.  The internal 'conflict' entry usage
is modelled after that in DAX, but the implementations are different
due to DAX using multi-order entries and the page cache using multiple
order-0 entries.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/pagemap.h | 13 ++
 mm/filemap.c| 99 +++--
 2 files changed, 99 insertions(+), 13 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d610a49be571..d6d97f9fb762 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -248,6 +248,19 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 #define FGP_NOFS   0x0010
 #define FGP_NOWAIT 0x0020
 #define FGP_FOR_MMAP   0x0040
+/*
+ * If you add more flags, increment FGP_ORDER_SHIFT (no further than 25).
+ * Do not insert flags above the FGP order bits.
+ */
+#define FGP_ORDER_SHIFT	7
+#define FGP_PMD		((PMD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+#define FGP_PUD		((PUD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define fgp_order(fgp) ((fgp) >> FGP_ORDER_SHIFT)
+#else
+#define fgp_order(fgp) 0
+#endif
 
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
int fgp_flags, gfp_t cache_gfp_mask);
diff --git a/mm/filemap.c b/mm/filemap.c
index afe8f5d95810..8eca91547e40 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1576,7 +1576,71 @@ struct page *find_get_entry(struct address_space 
*mapping, pgoff_t offset)
 
return page;
 }
-EXPORT_SYMBOL(find_get_entry);
+
+static bool pagecache_is_conflict(struct page *page)
+{
+   return page == XA_RETRY_ENTRY;
+}
+
+/**
+ * __find_get_page - Find and get a page cache entry.
+ * @mapping: The address_space to search.
+ * @offset: The page cache index.
+ * @order: The minimum order of the entry to return.
+ *
+ * Looks up the page cache entries at @mapping between @offset and
+ * @offset + 2^@order.  If there is a page cache page, it is returned with
+ * an increased refcount unless it is smaller than @order.
+ *
+ * If the slot holds a shadow entry of a previously evicted page, or a
+ * swap entry from shmem/tmpfs, it is returned.
+ *
+ * Return: the found page, a value indicating a conflicting page or %NULL if
+ * there are no pages in this range.
+ */
+static struct page *__find_get_page(struct address_space *mapping,
+   unsigned long offset, unsigned int order)
+{
+   XA_STATE(xas, &mapping->i_pages, offset);
+   struct page *page;
+
+   rcu_read_lock();
+repeat:
+   xas_reset(&xas);
+   page = xas_find(&xas, offset | ((1UL << order) - 1));
+   if (xas_retry(&xas, page))
+   goto repeat;
+   /*
+* A shadow entry of a recently evicted page, or a swap entry from
+* shmem/tmpfs.  Skip it; keep looking for pages.
+*/
+   if (xa_is_value(page))
+   goto repeat;
+   if (!page)
+   goto out;
+   if (compound_order(page) < order) {
+   page = XA_RETRY_ENTRY;
+   goto out;
+   }
+
+   if (!page_cache_get_speculative(page))
+   goto repeat;
+
+   /*
+* Has the page moved or been split?
+* This is part of the lockless pagecache protocol. See
+* include/linux/pagemap.h for details.
+*/
+   if (unlikely(page != xas_reload(&xas))) {
+   put_page(page);
+   goto repeat;
+   }
+   page = find_subpage(page, offset);
+out:
+   rcu_read_unlock();
+
+   return page;
+}
 
 /**
  * find_lock_entry - locate, pin and lock a page cache entry
@@ -1618,12 +1682,12 @@ EXPORT_SYMBOL(find_lock_entry);
  * pagecache_get_page - find and get a page reference
  * @mapping: the address_space to search
  * @offset: the page index
- * @fgp_flags: PCG flags
+ * @fgp_flags: FGP flags
  * @gfp_mask: gfp mask to use for the page cache data page allocation
  *
  * Looks up the page cache slot at @mapping & @offset.
  *
- * PCG flags modify how the page is returned.
+ * FGP flags modify how the page is returned.
  *
  * @fgp_flags can be:
  *
@@ -1636,6 +1700,10 @@ EXPORT_SYMBOL(find_lock_entry);
  * - FGP_FOR_MMAP: Similar to FGP_CREAT, only we want to allow the caller to do
  *   its own locking dance if the page is already in cache, or unlock the page
  *   before returning if we had to add the page to pagecache.
+ * - FGP_PMD: We're only interested in pages at PMD granularity.  If there
+ *   is no page here (and FGP_CREATE is set), we'll create one large enough.
+ *   If there is a smaller page in the cache that overlaps the PMD page, we
+ *   return %NULL and do not attempt to create a page.
  *
  * If FGP_LOCK or FGP_CREAT are specified then the function 

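The FGP encoding in this patch packs the requested allocation order into the
bits above FGP_ORDER_SHIFT, leaving the low seven bits for the boolean flags.
A userspace sketch of the same arithmetic, assuming the common x86-64 shifts
(PAGE_SHIFT 12, PMD_SHIFT 21 — illustrative constants, not derived here from
kernel headers):

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* illustrative: 4KiB base pages */
#define PMD_SHIFT	21	/* illustrative: 2MiB PMD */

/* Mirror of the binding added by the patch. */
#define FGP_ORDER_SHIFT	7
#define FGP_PMD		((PMD_SHIFT - PAGE_SHIFT) << FGP_ORDER_SHIFT)

/* Recover the order from a combined fgp_flags word. */
static unsigned int fgp_order(unsigned int fgp)
{
	return fgp >> FGP_ORDER_SHIFT;
}
```

Because the order occupies the high bits, it can be OR'ed together with flags
like FGP_CREAT or FGP_FOR_MMAP (bits 0-6) without interfering with them.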
[PATCH 02/15] fs: Introduce i_blocks_per_page

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

This helper is useful for both large pages in the page cache and for
supporting block size larger than page size.  Convert some example
users (we have a few different ways of writing this idiom).

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/iomap/buffered-io.c  |  4 ++--
 fs/jfs/jfs_metapage.c   |  2 +-
 fs/xfs/xfs_aops.c   |  8 
 include/linux/pagemap.h | 13 +
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e25901ae3ff4..0e76a4b6d98a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -24,7 +24,7 @@ iomap_page_create(struct inode *inode, struct page *page)
 {
struct iomap_page *iop = to_iomap_page(page);
 
-   if (iop || i_blocksize(inode) == PAGE_SIZE)
+   if (iop || i_blocks_per_page(inode, page) <= 1)
return iop;
 
iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
@@ -128,7 +128,7 @@ iomap_set_range_uptodate(struct page *page, unsigned off, 
unsigned len)
bool uptodate = true;
 
if (iop) {
-   for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+   for (i = 0; i < i_blocks_per_page(inode, page); i++) {
if (i >= first && i <= last)
set_bit(i, iop->uptodate);
else if (!test_bit(i, iop->uptodate))
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index a2f5338a5ea1..176580f54af9 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page 
*page)
struct inode *inode = page->mapping->host;
struct bio *bio = NULL;
int block_offset;
-   int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
+   int blocks_per_page = i_blocks_per_page(inode, page);
sector_t page_start;/* address of page in fs blocks */
sector_t pblock;
int xlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f16d5f196c6b..102cfd8a97d6 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -68,7 +68,7 @@ xfs_finish_page_writeback(
mapping_set_error(inode->i_mapping, -EIO);
}
 
-   ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+   ASSERT(iop || i_blocks_per_page(inode, bvec->bv_page) <= 1);
	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
 
	if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -839,7 +839,7 @@ xfs_aops_discard_page(
page, ip->i_ino, offset);
 
error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-   PAGE_SIZE / i_blocksize(inode));
+   i_blocks_per_page(inode, page));
if (error && !XFS_FORCED_SHUTDOWN(mp))
xfs_alert(mp, "page discard unable to remove delalloc 
mapping.");
 out_invalidate:
@@ -877,7 +877,7 @@ xfs_writepage_map(
	uint64_t		file_offset;	/* file offset of page */
int error = 0, count = 0, i;
 
-   ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+   ASSERT(iop || i_blocks_per_page(inode, page) <= 1);
	ASSERT(!iop || atomic_read(&iop->write_count) == 0);
 
/*
@@ -886,7 +886,7 @@ xfs_writepage_map(
 * one.
 */
for (i = 0, file_offset = page_offset(page);
-i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
+i < i_blocks_per_page(inode, page) && file_offset < end_offset;
 i++, file_offset += len) {
if (iop && !test_bit(i, iop->uptodate))
continue;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 37a4d9e32cd3..750770a2c685 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -636,4 +636,17 @@ static inline unsigned long dir_pages(struct inode *inode)
   PAGE_SHIFT;
 }
 
+/**
+ * i_blocks_per_page - How many blocks fit in this page.
+ * @inode: The inode which contains the blocks.
+ * @page: The (potentially large) page.
+ *
+ * Context: Any context.
+ * Return: The number of filesystem blocks covered by this page.
+ */
+static inline
+unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
+{
+   return page_size(page) >> inode->i_blkbits;
+}
 #endif /* _LINUX_PAGEMAP_H */
-- 
2.23.0
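The helper is a single shift: page_size(page) >> inode->i_blkbits. Modelled
in userspace with explicit sizes (parameter names are illustrative):

```c
#include <assert.h>

/* Model of i_blocks_per_page(): filesystem block size is 1 << blkbits. */
static unsigned int blocks_per_page(unsigned long page_size,
				    unsigned int blkbits)
{
	return (unsigned int)(page_size >> blkbits);
}
```

For 4KiB blocks, an order-0 page holds one block and a 2MiB compound page
holds 512, which is why the old `PAGE_SIZE / i_blocksize(inode)` and
`PAGE_SIZE >> inode->i_blkbits` idioms no longer suffice.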



[PATCH 04/15] iomap: Support large pages

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

Change iomap_page from a statically sized uptodate bitmap to a dynamically
allocated uptodate bitmap, allowing an arbitrarily large page.

The only remaining places where iomap assumes an order-0 page are for
files with inline data, where there's no sense in allocating a larger
page.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/iomap/buffered-io.c | 119 ++---
 include/linux/iomap.h  |   2 +-
 include/linux/mm.h |   2 +
 3 files changed, 80 insertions(+), 43 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0e76a4b6d98a..15d844a88439 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -23,14 +23,14 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
struct iomap_page *iop = to_iomap_page(page);
+   unsigned int n;
 
if (iop || i_blocks_per_page(inode, page) <= 1)
return iop;
 
-   iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-	atomic_set(&iop->read_count, 0);
-	atomic_set(&iop->write_count, 0);
-   bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+   n = BITS_TO_LONGS(i_blocks_per_page(inode, page));
+   iop = kmalloc(struct_size(iop, uptodate, n),
+   GFP_NOFS | __GFP_NOFAIL | __GFP_ZERO);
 
/*
 * migrate_page_move_mapping() assumes that pages with private data have
@@ -61,15 +61,16 @@ iomap_page_release(struct page *page)
  * Calculate the range inside the page that we actually need to read.
  */
 static void
-iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
+iomap_adjust_read_range(struct inode *inode, struct page *page,
loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
 {
+   struct iomap_page *iop = to_iomap_page(page);
loff_t orig_pos = *pos;
loff_t isize = i_size_read(inode);
unsigned block_bits = inode->i_blkbits;
unsigned block_size = (1 << block_bits);
-   unsigned poff = offset_in_page(*pos);
-   unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+   unsigned poff = offset_in_this_page(page, *pos);
+   unsigned plen = min_t(loff_t, page_size(page) - poff, length);
unsigned first = poff >> block_bits;
unsigned last = (poff + plen - 1) >> block_bits;
 
@@ -107,7 +108,8 @@ iomap_adjust_read_range(struct inode *inode, struct 
iomap_page *iop,
 * page cache for blocks that are entirely outside of i_size.
 */
if (orig_pos <= isize && orig_pos + length > isize) {
-   unsigned end = offset_in_page(isize - 1) >> block_bits;
+   unsigned end = offset_in_this_page(page, isize - 1) >>
+   block_bits;
 
if (first <= end && last > end)
plen -= (last - end) * block_size;
@@ -121,19 +123,16 @@ static void
 iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 {
struct iomap_page *iop = to_iomap_page(page);
-   struct inode *inode = page->mapping->host;
-   unsigned first = off >> inode->i_blkbits;
-   unsigned last = (off + len - 1) >> inode->i_blkbits;
-   unsigned int i;
bool uptodate = true;
 
if (iop) {
-   for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-   if (i >= first && i <= last)
-   set_bit(i, iop->uptodate);
-   else if (!test_bit(i, iop->uptodate))
-   uptodate = false;
-   }
+   struct inode *inode = page->mapping->host;
+   unsigned first = off >> inode->i_blkbits;
+   unsigned count = len >> inode->i_blkbits;
+
+   bitmap_set(iop->uptodate, first, count);
+   if (!bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
+   uptodate = false;
}
 
if (uptodate && !PageError(page))
@@ -194,6 +193,7 @@ iomap_read_inline_data(struct inode *inode, struct page 
*page,
return;
 
BUG_ON(page->index);
+   BUG_ON(PageCompound(page));
BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
addr = kmap_atomic(page);
@@ -203,6 +203,16 @@ iomap_read_inline_data(struct inode *inode, struct page 
*page,
SetPageUptodate(page);
 }
 
+/*
+ * Estimate the number of vectors we need based on the current page size;
+ * if we're wrong we'll end up doing an overly large allocation or needing
+ * to do a second allocation, neither of which is a big deal.
+ */
+static unsigned int iomap_nr_vecs(struct page *page, loff_t length)
+{
+   return (length + page_size(page) - 1) >> page_shift(page);
+}
+
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void 
*data,
struct iomap *iomap)
@@ -222,7 +232,7 @@ iomap_readpage_actor(struct inode 

[RFC 00/15] Large pages in the page-cache

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

Here's what I'm currently playing with.  I'm having trouble _testing_
it, but since akpm's patches were just merged into Linus' tree, I
thought this would be a good point to send out my current work tree.
Thanks to kbuild bot for finding a bunch of build problems ;-)

Matthew Wilcox (Oracle) (12):
  mm: Use vm_fault error code directly
  fs: Introduce i_blocks_per_page
  mm: Add file_offset_of_ helpers
  iomap: Support large pages
  xfs: Support large pages
  xfs: Pass a page to xfs_finish_page_writeback
  mm: Make prep_transhuge_page tail-callable
  mm: Add __page_cache_alloc_order
  mm: Allow large pages to be added to the page cache
  mm: Allow find_get_page to be used for large pages
  mm: Remove hpage_nr_pages
  xfs: Use filemap_huge_fault

William Kucharski (3):
  mm: Support removing arbitrary sized pages from mapping
  mm: Add a huge page fault handler for files
  mm: Align THP mappings for non-DAX

 drivers/net/ethernet/ibm/ibmveth.c |   2 -
 drivers/nvdimm/btt.c   |   4 +-
 drivers/nvdimm/pmem.c  |   3 +-
 fs/iomap/buffered-io.c | 121 +++
 fs/jfs/jfs_metapage.c  |   2 +-
 fs/xfs/xfs_aops.c  |  37 ++--
 fs/xfs/xfs_file.c  |   5 +-
 include/linux/huge_mm.h|  15 +-
 include/linux/iomap.h  |   2 +-
 include/linux/mm.h |  12 ++
 include/linux/mm_inline.h  |   6 +-
 include/linux/pagemap.h|  73 ++-
 mm/filemap.c   | 311 ++---
 mm/gup.c   |   2 +-
 mm/huge_memory.c   |  11 +-
 mm/internal.h  |   4 +-
 mm/memcontrol.c|  14 +-
 mm/memory_hotplug.c|   4 +-
 mm/mempolicy.c |   2 +-
 mm/migrate.c   |  19 +-
 mm/mlock.c |   9 +-
 mm/page_io.c   |   4 +-
 mm/page_vma_mapped.c   |   6 +-
 mm/rmap.c  |   8 +-
 mm/swap.c  |   4 +-
 mm/swap_state.c|   4 +-
 mm/swapfile.c  |   2 +-
 mm/vmscan.c|   9 +-
 28 files changed, 519 insertions(+), 176 deletions(-)

-- 
2.23.0



[PATCH 01/15] mm: Use vm_fault error code directly

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

Use VM_FAULT_OOM instead of indirecting through vmf_error(-ENOMEM).

Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1146fcfa3215..625ef3ef19f3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2533,7 +2533,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
if (!page) {
if (fpin)
goto out_retry;
-   return vmf_error(-ENOMEM);
+   return VM_FAULT_OOM;
}
}
 
-- 
2.23.0
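The replacement is safe because vmf_error() maps -ENOMEM to VM_FAULT_OOM and
everything else to VM_FAULT_SIGBUS, so for a literal -ENOMEM the constant is
equivalent. A sketch of that mapping (the VM_FAULT_* bit values here are
illustrative stand-ins, not the kernel's):

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the vm_fault_t bits. */
#define VM_FAULT_OOM	0x0001
#define VM_FAULT_SIGBUS	0x0002

/* Mirrors vmf_error()'s logic: only -ENOMEM becomes an OOM fault. */
static int toy_vmf_error(int err)
{
	if (err == -ENOMEM)
		return VM_FAULT_OOM;
	return VM_FAULT_SIGBUS;
}
```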



[PATCH 06/15] xfs: Pass a page to xfs_finish_page_writeback

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

The only part of the bvec we were accessing was the bv_page, so just
pass that instead of the whole bvec.

Signed-off-by: Matthew Wilcox (Oracle) 
---
 fs/xfs/xfs_aops.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1a26e9ca626b..edcb4797fcc2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -58,21 +58,21 @@ xfs_find_daxdev_for_inode(
 static void
 xfs_finish_page_writeback(
	struct inode		*inode,
-   struct bio_vec  *bvec,
+   struct page *page,
int error)
 {
-   struct iomap_page   *iop = to_iomap_page(bvec->bv_page);
+   struct iomap_page   *iop = to_iomap_page(page);
 
if (error) {
-   SetPageError(bvec->bv_page);
+   SetPageError(page);
mapping_set_error(inode->i_mapping, -EIO);
}
 
-   ASSERT(iop || i_blocks_per_page(inode, bvec->bv_page) <= 1);
+   ASSERT(iop || i_blocks_per_page(inode, page) <= 1);
	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
 
	if (!iop || atomic_dec_and_test(&iop->write_count))
-   end_page_writeback(bvec->bv_page);
+   end_page_writeback(page);
 }
 
 /*
@@ -106,7 +106,7 @@ xfs_destroy_ioend(
 
/* walk each page on bio, ending page IO on them */
bio_for_each_segment_all(bvec, bio, iter_all)
-   xfs_finish_page_writeback(inode, bvec, error);
+   xfs_finish_page_writeback(inode, bvec->bv_page, error);
bio_put(bio);
}
 
-- 
2.23.0



[PATCH 13/15] mm: Add a huge page fault handler for files

2019-09-24 Thread Matthew Wilcox
From: William Kucharski 

Add filemap_huge_fault() to attempt to satisfy page
faults on memory-mapped read-only text pages using THP when possible.

Signed-off-by: William Kucharski 
[rebased on top of mm prep patches -- Matthew]
Signed-off-by: Matthew Wilcox (Oracle) 
---
 include/linux/mm.h  |  10 +++
 include/linux/pagemap.h |   8 ++
 mm/filemap.c| 165 ++--
 3 files changed, 178 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 04bea9f9282c..623878f11eaf 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2414,6 +2414,16 @@ extern void truncate_inode_pages_final(struct 
address_space *);
 
 /* generic vm_area_ops exported for stackable file systems */
 extern vm_fault_t filemap_fault(struct vm_fault *vmf);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+extern vm_fault_t filemap_huge_fault(struct vm_fault *vmf,
+   enum page_entry_size pe_size);
+#else
+static inline vm_fault_t filemap_huge_fault(struct vm_fault *vmf,
+   enum page_entry_size pe_size)
+{
+   return VM_FAULT_FALLBACK;
+}
+#endif
 extern void filemap_map_pages(struct vm_fault *vmf,
pgoff_t start_pgoff, pgoff_t end_pgoff);
 extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d6d97f9fb762..ae09788f5345 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -354,6 +354,14 @@ static inline struct page *grab_cache_page_nowait(struct 
address_space *mapping,
mapping_gfp_mask(mapping));
 }
 
+/* This (head) page should be found at this offset in the page cache */
+static inline void page_cache_assert(struct page *page, pgoff_t offset)
+{
+   VM_BUG_ON_PAGE(PageTail(page), page);
+   VM_BUG_ON_PAGE(page->index != (offset & ~(compound_nr(page) - 1)),
+   page);
+}
+
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
if (PageHuge(page))
diff --git a/mm/filemap.c b/mm/filemap.c
index b07ef9469861..8017e905df7a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1590,7 +1590,8 @@ static bool pagecache_is_conflict(struct page *page)
  *
  * Looks up the page cache entries at @mapping between @offset and
  * @offset + 2^@order.  If there is a page cache page, it is returned with
- * an increased refcount unless it is smaller than @order.
+ * an increased refcount unless it is smaller than @order.  This function
+ * returns the head page, not a tail page.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
@@ -1601,7 +1602,7 @@ static bool pagecache_is_conflict(struct page *page)
 static struct page *__find_get_page(struct address_space *mapping,
unsigned long offset, unsigned int order)
 {
-   XA_STATE(xas, &mapping->i_pages, offset);
+   XA_STATE(xas, &mapping->i_pages, offset & ~((1UL << order) - 1));
struct page *page;
 
rcu_read_lock();
@@ -1635,7 +1636,6 @@ static struct page *__find_get_page(struct address_space 
*mapping,
put_page(page);
goto repeat;
}
-   page = find_subpage(page, offset);
 out:
rcu_read_unlock();
 
@@ -1741,11 +1741,12 @@ struct page *pagecache_get_page(struct address_space 
*mapping, pgoff_t offset,
put_page(page);
goto repeat;
}
-   VM_BUG_ON_PAGE(page->index != offset, page);
+   page_cache_assert(page, offset);
}
 
if (fgp_flags & FGP_ACCESSED)
mark_page_accessed(page);
+   page = find_subpage(page, offset);
 
 no_page:
if (!page && (fgp_flags & FGP_CREAT)) {
@@ -2638,7 +2639,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
put_page(page);
goto retry_find;
}
-   VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
+   page_cache_assert(page, offset);
 
/*
 * We have a locked page in the page cache, now we need to check
@@ -2711,6 +2712,160 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 }
 EXPORT_SYMBOL(filemap_fault);
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/**
+ * filemap_huge_fault - Read in file data for page fault handling.
+ * @vmf: struct vm_fault containing details of the fault.
+ * @pe_size: Page entry size.
+ *
+ * filemap_huge_fault() is invoked via the vma operations vector for a
+ * mapped memory region to read in file data during a page fault.
+ *
+ * The goto's are kind of ugly, but this streamlines the normal case of having
+ * it in the page cache, and handles the special cases reasonably without
+ * having a lot of duplicated code.
+ *
+ * vma->vm_mm->mmap_sem must be held on entry.
+ *
+ * If our return value has VM_FAULT_RETRY set, it's because the mmap_sem
+ * may be dropped before doing I/O or by lock_page_maybe_drop_mmap().
+ *
+ * If our return value 

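A fault handler like the one above can only install a PMD mapping when the
faulting file index falls inside a naturally aligned PMD-sized bucket of the
file. The bucket arithmetic, in isolation (helper name and HPAGE_PMD_NR value
are illustrative for 2MiB PMDs over 4KiB base pages):

```c
#include <assert.h>

#define HPAGE_PMD_NR	512	/* illustrative: 2MiB / 4KiB base pages */

/* First file index of the PMD-sized bucket containing `pgoff`. */
static unsigned long pmd_start_index(unsigned long pgoff)
{
	return pgoff & ~(HPAGE_PMD_NR - 1UL);
}
```

Index 513 belongs to the bucket starting at 512, while index 511 belongs to
the bucket starting at 0; the whole bucket must be mappable for the huge
fault to succeed rather than fall back.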
[PATCH 12/15] mm: Support removing arbitrary sized pages from mapping

2019-09-24 Thread Matthew Wilcox
From: William Kucharski 

__remove_mapping() assumes that pages can only be either base pages
or HPAGE_PMD_SIZE.  Ask the page what size it is.

Signed-off-by: William Kucharski 
Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/vmscan.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a7f9f379e523..9f44868e640b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -932,10 +932,7 @@ static int __remove_mapping(struct address_space *mapping, 
struct page *page,
 * Note that if SetPageDirty is always performed via set_page_dirty,
 * and thus under the i_pages lock, then this ordering is not required.
 */
-   if (unlikely(PageTransHuge(page)) && PageSwapCache(page))
-   refcount = 1 + HPAGE_PMD_NR;
-   else
-   refcount = 2;
+   refcount = 1 + compound_nr(page);
if (!page_ref_freeze(page, refcount))
goto cannot_free;
/* note: atomic_cmpxchg in page_ref_freeze provides the smp_rmb */
-- 
2.23.0
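The unified refcount expression works because compound_nr() is 1 for a base
page, reproducing the old constant 2, and 1 << order for a compound page,
reproducing the old 1 + HPAGE_PMD_NR. Checked in userspace:

```c
#include <assert.h>

/* The frozen refcount __remove_mapping() now expects: one reference
 * per subpage held by the page cache, plus the caller's reference.
 * compound_nr(page) == 1 << compound_order(page). */
static unsigned int expected_refcount(unsigned int order)
{
	return 1u + (1u << order);
}
```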



[PATCH 03/15] mm: Add file_offset_of_ helpers

2019-09-24 Thread Matthew Wilcox
From: "Matthew Wilcox (Oracle)" 

The page_offset function is badly named for people reading the functions
which call it.  The natural meaning of a function with this name would
be 'offset within a page', not 'page offset in bytes within a file'.
Dave Chinner suggests file_offset_of_page() as a replacement function
name and I'm also adding file_offset_of_next_page() as a helper for the
large page work.  Also add kernel-doc for these functions so they show
up in the kernel API book.

page_offset() is retained as a compatibility define for now.
---
 drivers/net/ethernet/ibm/ibmveth.c |  2 --
 include/linux/pagemap.h| 25 ++---
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index c5be4ebd8437..bf98aeaf9a45 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -978,8 +978,6 @@ static int ibmveth_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
return -EOPNOTSUPP;
 }
 
-#define page_offset(v) ((unsigned long)(v) & ((1 << 12) - 1))
-
 static int ibmveth_send(struct ibmveth_adapter *adapter,
union ibmveth_buf_desc *descs, unsigned long mss)
 {
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 750770a2c685..103205494ea0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -428,14 +428,33 @@ static inline pgoff_t page_to_pgoff(struct page *page)
return page_to_index(page);
 }
 
-/*
- * Return byte-offset into filesystem object for page.
+/**
+ * file_offset_of_page - File offset of this page.
+ * @page: Page cache page.
+ *
+ * Context: Any context.
+ * Return: The offset of the first byte of this page.
  */
-static inline loff_t page_offset(struct page *page)
+static inline loff_t file_offset_of_page(struct page *page)
 {
return ((loff_t)page->index) << PAGE_SHIFT;
 }
 
+/* Legacy; please convert callers */
+#define page_offset(page)  file_offset_of_page(page)
+
+/**
+ * file_offset_of_next_page - File offset of the next page.
+ * @page: Page cache page.
+ *
+ * Context: Any context.
+ * Return: The offset of the first byte after this page.
+ */
+static inline loff_t file_offset_of_next_page(struct page *page)
+{
+   return ((loff_t)page->index + compound_nr(page)) << PAGE_SHIFT;
+}
+
 static inline loff_t page_file_offset(struct page *page)
 {
return ((loff_t)page_index(page)) << PAGE_SHIFT;
-- 
2.23.0



[PATCH 14/15] mm: Align THP mappings for non-DAX

2019-09-24 Thread Matthew Wilcox
From: William Kucharski 

When we have the opportunity to use transparent huge pages to map a
file, we want to follow the same rules as DAX.

Signed-off-by: William Kucharski 
Signed-off-by: Matthew Wilcox (Oracle) 
---
 mm/huge_memory.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cbe7d0619439..670a1780bd2f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -563,8 +563,6 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 
if (addr)
goto out;
-   if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
-   goto out;
 
addr = __thp_get_unmapped_area(filp, len, off, flags, PMD_SIZE);
if (addr)
-- 
2.23.0



[PATCH] RISC-V: Clear load reservations while restoring hart contexts

2019-09-24 Thread Palmer Dabbelt
This is almost entirely a comment.  The bug is unlikely to manifest on
existing hardware because there is a timeout on load reservations, but
manifests on QEMU because there is no timeout.

Signed-off-by: Palmer Dabbelt 
---
 arch/riscv/include/asm/asm.h |  1 +
 arch/riscv/kernel/entry.S| 21 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index 5a02b7d50940..9c992a88d858 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -22,6 +22,7 @@
 
 #define REG_L  __REG_SEL(ld, lw)
 #define REG_S  __REG_SEL(sd, sw)
+#define REG_SC __REG_SEL(sc.d, sc.w)
 #define SZREG  __REG_SEL(8, 4)
 #define LGREG  __REG_SEL(3, 2)
 
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 74ccfd464071..9fbb256da55d 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -98,7 +98,26 @@ _save_context:
  */
.macro RESTORE_ALL
REG_L a0, PT_SSTATUS(sp)
-   REG_L a2, PT_SEPC(sp)
+   /*
+* The current load reservation is effectively part of the processor's
+* state, in the sense that load reservations cannot be shared between
+* different hart contexts.  We can't actually save and restore a load
+* reservation, so instead here we clear any existing reservation --
+* it's always legal for implementations to clear load reservations at
+* any point (as long as the forward progress guarantee is kept, but
+* we'll ignore that here).
+*
+* Dangling load reservations can be the result of taking a trap in the
+* middle of an LR/SC sequence, but can also be the result of a taken
+* forward branch around an SC -- which is how we implement CAS.  As a
+* result we need to clear reservations between the last CAS and the
+* jump back to the new context.  While it is unlikely the store
+* completes, implementations are allowed to expand reservations to be
+* arbitrarily large.
+*/
+   REG_L  a2, PT_SEPC(sp)
+   REG_SC x0, a2, PT_SEPC(sp)
+
csrw CSR_SSTATUS, a0
csrw CSR_SEPC, a2
 
-- 
2.21.0



Re: [PULL REQUEST] i2c for 5.4

2019-09-24 Thread pr-tracker-bot
The pull request you sent on Tue, 24 Sep 2019 21:31:03 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-5.4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/351c8a09b00b5c51c8f58b016fffe51f87e2d820

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] sound fixes for 5.4-rc1

2019-09-24 Thread pr-tracker-bot
The pull request you sent on Tue, 24 Sep 2019 14:07:39 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/sound-fix-5.4-rc1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/3cf7487c5de713b706ca2e1f66ec5f9b27fe265a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[GIT PULL] tpmdd fixes for Linux v5.4-rc1

2019-09-24 Thread Jarkko Sakkinen
Hi

These are bug fixes for bugs found after my v5.4 PR.

/Jarkko

The following changes since commit 4c07e2ddab5b6b57dbcb09aedbda1f484d5940cc:

  Merge tag 'mfd-next-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd (2019-09-23 19:37:49 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/jjs/linux-tpmdd.git tags/tpmdd-next-20190925

for you to fetch changes up to e13cd21ffd50a07b55dcc4d8c38cedf27f28eaa1:

  tpm: Wrap the buffer from the caller to tpm_buf in tpm_send() (2019-09-25 02:43:57 +0300)


tpmdd fixes for Linux v5.4


Denis Efremov (1):
  MAINTAINERS: keys: Update path to trusted.h

Jarkko Sakkinen (2):
  selftests/tpm2: Add the missing TEST_FILES assignment
  tpm: Wrap the buffer from the caller to tpm_buf in tpm_send()

Petr Vorel (1):
  selftests/tpm2: Add log and *.pyc to .gitignore

Roberto Sassu (1):
  KEYS: trusted: correctly initialize digests and fix locking issue

 MAINTAINERS   |  2 +-
 drivers/char/tpm/tpm-interface.c  | 23 +++
 security/keys/trusted.c   |  5 +
 tools/testing/selftests/.gitignore|  2 ++
 tools/testing/selftests/tpm2/Makefile |  1 +
 5 files changed, 20 insertions(+), 13 deletions(-)


Re: [PULL REQUEST] i2c for 5.4

2019-09-24 Thread Linus Torvalds
On Tue, Sep 24, 2019 at 12:31 PM Wolfram Sang  wrote:
>
> - new driver for ICY, an Amiga Zorro card :)

Christ. Will that thing _never_ die?

But the reason I'm actually replying is not to comment on the apparent
death-defying Amiga hardware scene, but to point out that you should
try to fix your email configuration:

> Bj??rn Ard?? (2):
>   i2c-eeprom_slave: Add support for more eeprom models
>   i2c: slave-eeprom: Add comment about address handling

This is all fine in the git repo, being proper utf-8 "Björn Ardö".

But your mutt setup doesn't seem to be using a proper utf-8 locale and
instead uses

  Content-Type: text/plain; charset=us-ascii
  Content-Disposition: inline

like it was the last century.

I don't know what the proper mutt incantation is to make it join the
modern world, but I'm sure one exists, and then your emails would get
names right too. Even if they are some funky Swedish ones with åäö.

(And no, don't use Latin1 - it may cover Swedish and German etc, but
you really want to go with proper utf-8 and be able to handle true
complex character sets, not just the Western European ones).

Linus


Re: [PATCH V3 09/10] interconnect: mediatek: Add mt8183 interconnect provider driver

2019-09-24 Thread Georgi Djakov
Hi Henry,

Please CC also the linux-pm@ list.

On 8/28/19 05:28, Henry Chen wrote:
> Introduce Mediatek MT8183 specific provider driver using the
> interconnect framework.
> 
> Signed-off-by: Henry Chen 
> ---
>  drivers/interconnect/Kconfig|   1 +
>  drivers/interconnect/Makefile   |   1 +
>  drivers/interconnect/mediatek/Kconfig   |  13 ++
>  drivers/interconnect/mediatek/Makefile  |   3 +
>  drivers/interconnect/mediatek/mtk-emi.c | 246 
> 
>  5 files changed, 264 insertions(+)
>  create mode 100644 drivers/interconnect/mediatek/Kconfig
>  create mode 100644 drivers/interconnect/mediatek/Makefile
>  create mode 100644 drivers/interconnect/mediatek/mtk-emi.c
> 
[..]
> +
> +#define MT8183_MAX_LINKS 6

Looks like 1 is enough. Sorry for missing this in my earlier review.

> +
> +/**
> + * struct mtk_icc_node - Mediatek specific interconnect nodes
> + * @name: the node name used in debugfs
> + * @ep: true if the node is an end point.
> + * @id: a unique node identifier
> + * @links: an array of nodes where we can go next while traversing
> + * @num_links: the total number of @links
> + * @buswidth: width of the interconnect between a node and the bus
> + * @sum_avg: current sum aggregate value of all avg bw kBps requests
> + * @max_peak: current max aggregate value of all peak bw kBps requests
> + */
> +struct mtk_icc_node {
> + unsigned char *name;
> + bool ep;
> + u16 id;
> + u16 links[MT8183_MAX_LINKS];
> + u16 num_links;
> + u16 buswidth;
> + u64 sum_avg;
> + u64 max_peak;
> +};
> +
> +struct mtk_icc_desc {
> + struct mtk_icc_node **nodes;
> + size_t num_nodes;
> +};
> +
> +#define DEFINE_MNODE(_name, _id, _buswidth, _ep, ...)\
> + static struct mtk_icc_node _name = {\
> + .name = #_name, \
> + .id = _id,  \
> + .buswidth = _buswidth,  \
> + .ep = _ep,  \
> + .num_links = ARRAY_SIZE(((int[]){ __VA_ARGS__ })),  \
> + .links = { __VA_ARGS__ },   \
> +}
> +
> +DEFINE_MNODE(ddr_emi, SLAVE_DDR_EMI, 1024, 1, 0);
> +DEFINE_MNODE(mcusys, MASTER_MCUSYS, 256, 0, SLAVE_DDR_EMI);
> +DEFINE_MNODE(gpu, MASTER_GPUSYS, 256, 0, SLAVE_DDR_EMI);
> +DEFINE_MNODE(mmsys, MASTER_MMSYS, 256, 0, SLAVE_DDR_EMI);
> +DEFINE_MNODE(mm_vpu, MASTER_MM_VPU, 128, 0, MASTER_MMSYS);
> +DEFINE_MNODE(mm_disp, MASTER_MM_DISP, 128, 0, MASTER_MMSYS);
> +DEFINE_MNODE(mm_vdec, MASTER_MM_VDEC, 128, 0, MASTER_MMSYS);
> +DEFINE_MNODE(mm_venc, MASTER_MM_VENC, 128, 0, MASTER_MMSYS);
> +DEFINE_MNODE(mm_cam, MASTER_MM_CAM, 128, 0, MASTER_MMSYS);
> +DEFINE_MNODE(mm_img, MASTER_MM_IMG, 128, 0, MASTER_MMSYS);
> +DEFINE_MNODE(mm_mdp, MASTER_MM_MDP, 128, 0, MASTER_MMSYS);
> +
[..]

> +static int emi_icc_aggregate(struct icc_node *node, u32 avg_bw,
> +   u32 peak_bw, u32 *agg_avg, u32 *agg_peak)
> +{

The prototype of this function has changed meanwhile, so you might want to update.

[..]
> +static int emi_icc_probe(struct platform_device *pdev)
> +{
> + int ret;
> + const struct mtk_icc_desc *desc;
> + struct icc_node *node;
> + struct icc_onecell_data *data;
> + struct icc_provider *provider;
> + struct mtk_icc_node **mnodes;
> + size_t num_nodes, i, j;
> +
> + desc = of_device_get_match_data(>dev);
> + if (!desc)
> + return -EINVAL;
> +
> + mnodes = desc->nodes;
> + num_nodes = desc->num_nodes;
> +
> + provider = devm_kzalloc(>dev, sizeof(*provider), GFP_KERNEL);
> + if (!provider)
> + return -ENOMEM;
> +
> + data = devm_kcalloc(>dev, num_nodes, sizeof(*node), GFP_KERNEL);
> + if (!data)
> + return -ENOMEM;
> +
> + provider->dev = >dev;
> + provider->set = emi_icc_set;
> + provider->aggregate = emi_icc_aggregate;
> + provider->xlate = of_icc_xlate_onecell;
> + INIT_LIST_HEAD(>nodes);
> + provider->data = data;
> +
> + ret = icc_provider_add(provider);
> + if (ret) {
> + dev_err(>dev, "error adding interconnect provider\n");
> + return ret;
> + }
> +
> + for (i = 0; i < num_nodes; i++) {
> + node = icc_node_create(mnodes[i]->id);
> + if (IS_ERR(node)) {
> + ret = PTR_ERR(node);
> + goto err;
> + }
> +
> + node->name = mnodes[i]->name;
> + node->data = mnodes[i];
> + icc_node_add(node, provider);
> +
> + dev_dbg(>dev, "registered node %s, num link: %d\n",
> + mnodes[i]->name, mnodes[i]->num_links);
> +
> + /* populate links */
> + for (j = 0; j < mnodes[i]->num_links; j++)
> + icc_link_create(node, mnodes[i]->links[j]);
> +
> + data->nodes[i] = node;
> + }
