[tip:master] BUILD SUCCESS 76dbb9a6a6b514e6881689f8a3e2dd6b51d2681b

2020-09-11 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git  master
branch HEAD: 76dbb9a6a6b514e6881689f8a3e2dd6b51d2681b  Merge branch 'perf/core'

elapsed time: 721m

configs tested: 130
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm      defconfig
arm64    allyesconfig
arm64    defconfig
arm      allyesconfig
arm      allmodconfig
powerpc  mpc8540_ads_defconfig
sh       r7780mp_defconfig
xtensa   iss_defconfig
sh       sh7757lcr_defconfig
mips     nlm_xlp_defconfig
arm      qcom_defconfig
arm      pleb_defconfig
sparc64  defconfig
arm      mini2440_defconfig
powerpc  amigaone_defconfig
xtensa   xip_kc705_defconfig
mips     db1xxx_defconfig
sh       sh7763rdp_defconfig
powerpc  ppc6xx_defconfig
sh       sh7770_generic_defconfig
powerpc  pasemi_defconfig
arm      at91_dt_defconfig
sh       alldefconfig
m68k     mvme16x_defconfig
powerpc  warp_defconfig
arm      efm32_defconfig
arm      zx_defconfig
mips     malta_defconfig
arc      nps_defconfig
alpha    allyesconfig
sh       se7750_defconfig
sh       rsk7203_defconfig
mips     tb0219_defconfig
mips     decstation_defconfig
arm      vexpress_defconfig
arm      eseries_pxa_defconfig
arm      mvebu_v5_defconfig
sh       sh2007_defconfig
mips     ci20_defconfig
arm      trizeps4_defconfig
powerpc  mpc5200_defconfig
sh       apsh4a3a_defconfig
ia64     generic_defconfig
mips     rbtx49xx_defconfig
arm      socfpga_defconfig
c6x      evmc6472_defconfig
powerpc  pmac32_defconfig
sh       se7722_defconfig
powerpc  mpc836x_rdk_defconfig
m68k     m5249evb_defconfig
riscv    rv32_defconfig
c6x      evmc6474_defconfig
sh       lboxre2_defconfig
ia64     allmodconfig
ia64     defconfig
ia64     allyesconfig
m68k     allmodconfig
m68k     defconfig
m68k     allyesconfig
nios2    defconfig
arc      allyesconfig
nds32    allnoconfig
c6x      allyesconfig
nds32    defconfig
nios2    allyesconfig
csky     defconfig
alpha    defconfig
xtensa   allyesconfig
h8300    allyesconfig
arc      defconfig
sh       allmodconfig
parisc   defconfig
s390     allyesconfig
parisc   allyesconfig
s390     defconfig
i386     allyesconfig
sparc    allyesconfig
sparc    defconfig
i386     defconfig
mips     allyesconfig
mips     allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc  allnoconfig
i386     randconfig-a004-20200911
i386     randconfig-a006-20200911
i386     randconfig-a001-20200911
i386     randconfig-a003-20200911
i386     randconfig-a002-20200911
i386     randconfig-a005-20200911
i386     randconfig-a004-20200912
i386     randconfig-a006-20200912
i386     randconfig-a003-20200912
i386     randconfig-a001-20200912
i386     randconfig-a002-20200912
i386     randconfig-a005-20200912
x86_64   randconfig-a014-20200911
x86_64   randconfig-a011-20200911
x86_64   randconfig-a012-20200911
x86_64   randconfig-a016-20200911
x86_64   randconfig-a015-20200911
x86_64

[PATCH] perf bench: Fix 2 memory sanitizer warnings

2020-09-11 Thread Ian Rogers
Memory sanitizer warns if a write is performed where the memory
being read for the write is uninitialized. Avoid this warning by
initializing the memory.

Signed-off-by: Ian Rogers 
---
 tools/perf/bench/sched-messaging.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/bench/sched-messaging.c 
b/tools/perf/bench/sched-messaging.c
index 71d830d7b923..cecce93ccc63 100644
--- a/tools/perf/bench/sched-messaging.c
+++ b/tools/perf/bench/sched-messaging.c
@@ -66,11 +66,10 @@ static void fdpair(int fds[2])
 /* Block until we're ready to go */
 static void ready(int ready_out, int wakefd)
 {
-   char dummy;
struct pollfd pollfd = { .fd = wakefd, .events = POLLIN };
 
/* Tell them we're ready. */
-   if (write(ready_out, &dummy, 1) != 1)
+   if (write(ready_out, "R", 1) != 1)
err(EXIT_FAILURE, "CLIENT: ready write");
 
/* Wait for "GO" signal */
@@ -85,6 +84,7 @@ static void *sender(struct sender_context *ctx)
unsigned int i, j;
 
ready(ctx->ready_out, ctx->wakefd);
+   memset(data, 'S', sizeof(data));
 
/* Now pump to every receiver. */
for (i = 0; i < nr_loops; i++) {
-- 
2.28.0.618.gf4bc123cb7-goog
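The pattern the patch fixes can be sketched outside perf (the helper name below is illustrative, not perf's): MemorySanitizer reports a use-of-uninitialized-value when write() reads an uninitialized stack byte, and writing a known byte avoids the report without changing behaviour, since the receiver only counts bytes.

```c
#include <err.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of the fixed ready() notification. The old code declared
 * 'char dummy;' and wrote it uninitialized, which MSan flags; the
 * fix writes a known literal byte instead. */
int notify_ready(int ready_out)
{
	/* Old, MSan-flagged form: char dummy; write(ready_out, &dummy, 1); */
	if (write(ready_out, "R", 1) != 1)	/* known byte: no warning */
		err(EXIT_FAILURE, "ready write");
	return 0;
}
```

The same reasoning applies to the sender buffer: memset()-ing `data` before pumping it to receivers gives MSan a defined value to track.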



Re: [PATCH v18 00/32] per memcg lru_lock: reviews

2020-09-11 Thread Hugh Dickins
On Thu, 10 Sep 2020, Alexander Duyck wrote:
> On Wed, Sep 9, 2020 at 5:32 PM Hugh Dickins  wrote:
> > On Wed, 9 Sep 2020, Alexander Duyck wrote:
> > > On Tue, Sep 8, 2020 at 4:41 PM Hugh Dickins  wrote:
> > > > [PATCH v18 28/32] mm/compaction: Drop locked from 
> > > > isolate_migratepages_block
> > > > Most of this consists of replacing "locked" by "lruvec", which is good:
> > > > but please fold those changes back into 20/32 (or would it be 17/32?
> > > > I've not yet looked into the relationship between those two), so we
> > > > can then see more clearly what change this 28/32 (will need renaming!)
> > > > actually makes, to use lruvec_holds_page_lru_lock(). That may be a
> > > > good change, but it's mixed up with the "locked"->"lruvec" at present,
> > > > and I think you could have just used lruvec for locked all along
> > > > (but of course there's a place where you'll need new_lruvec too).
> > >
> > > I am good with my patch being folded in. No need to keep it separate.
> >
> > Thanks.  Though it was only the "locked"->"lruvec" changes I was
> > suggesting to fold back, to minimize the diff, so that we could
> > see your use of lruvec_holds_page_lru_lock() more clearly - you
> > had not introduced that function at the stage of the earlier patches.
> >
> > But now that I stare at it again, using lruvec_holds_page_lru_lock()
> > there doesn't look like an advantage to me: when it decides no, the
> > same calculation is made all over again in mem_cgroup_page_lruvec(),
> > whereas the code before only had to calculate it once.
> >
> > So, the code before looks better to me: I wonder, do you think that
> > rcu_read_lock() is more expensive than I think it?  There can be
> > debug instrumentation that makes it heavier, but by itself it is
> > very cheap (by design) - not worth branching around.
> 
> Actually what I was more concerned with was the pointer chase that
> required the RCU lock. With this function we are able to compare a
> pair of pointers from the page and the lruvec and avoid the need for
> the RCU lock. The way the old code was working we had to crawl through
> the memcg to get to the lruvec before we could compare it to the one
> we currently hold. The general idea is to use the data we have instead
> of having to pull in some additional cache lines to perform the test.

When you say "With this function...", I think you are referring to
lruvec_holds_page_lru_lock().  Yes, I appreciate what you're doing
there, making calculations from known-stable data, and taking it no
further than the required comparison; and I think (I don't yet claim
to have reviewed 21/32) what you do with it in relock_page_lruvec*()
is an improvement over what we had there before.

But here I'm talking about using it in isolate_migratepages_block()
in 28/32: in this case, the code before evaluated the new lruvec,
compared against the old, and immediately used the new lruvec if
different; whereas using lruvec_holds_page_lru_lock() makes an
almost (I agree not entirely, and I haven't counted cachelines)
equivalent evaluation, but its results have to be thrown away when
it's false, then the new lruvec actually calculated and used.

The same "results thrown away" criticism can be made of
relock_page_lruvec*(), but what was done there before your rewrite
in v18 was no better: they both resort to lock_page_lruvec*(page),
working it all out again from page.  And I'm not suggesting that
be changed, not at this point anyway; but 28/32 looks to me
like a regression from what was done there before 28/32.
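The trade-off under discussion can be modeled with simplified stand-in types (all names here are illustrative, not the kernel's): a holds-check that only compares fields already in hand avoids the pointer chase, but when it returns false the caller must still derive the new lruvec, repeating part of the comparison work.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the structures being discussed. */
struct toy_memcg { int id; };
struct toy_lruvec { struct toy_memcg *memcg; int node; };
struct toy_page { struct toy_memcg *memcg; int node; };

/* Cheap check in the spirit of lruvec_holds_page_lru_lock():
 * compare data we already hold; no pointer chase, no RCU. */
static bool toy_lruvec_holds_page(const struct toy_lruvec *lv,
				  const struct toy_page *pg)
{
	return lv && lv->memcg == pg->memcg && lv->node == pg->node;
}

/* When the check fails, the caller still has to compute the page's
 * lruvec, repeating essentially the same field comparisons. */
static struct toy_lruvec *toy_page_lruvec(const struct toy_page *pg,
					  struct toy_lruvec *table, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].memcg == pg->memcg && table[i].node == pg->node)
			return &table[i];
	return NULL;
}
```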

> 
> > >
> > > > [PATCH v18 29/32] mm: Identify compound pages sooner in 
> > > > isolate_migratepages_block
> > > > NAK. I agree that isolate_migratepages_block() looks nicer this way, but
> > > > take a look at prep_new_page() in mm/page_alloc.c: post_alloc_hook() is
> > > > where set_page_refcounted() changes page->_refcount from 0 to 1, 
> > > > allowing
> > > > a racing get_page_unless_zero() to succeed; then later 
> > > > prep_compound_page()
> > > > is where PageHead and PageTails get set. So there's a small race window 
> > > > in
> > > > which this patch could deliver a compound page when it should not.
> > >
> > > So the main motivation for the patch was to avoid the case where we
> > > are having to reset the LRU flag.
> >
> > That would be satisfying.  Not necessary, but I agree satisfying.
> > Maybe depends also on your "skip" change, which I've not looked at yet?
> 
> My concern is that we have scenarios where isolate_migratepages_block
> could possibly prevent another thread from being able to isolate a page.
> I'm mostly concerned with us potentially creating something like an
> isolation leak if multiple threads are doing something like clearing
> and then resetting the LRU flag. In my mind if we clear the LRU flag
> we should be certain we are going to remove the page as otherwise
> another thread would have done it if it would have been allowed
> access.

I agree it's nicer not to TestClearPageLRU unnecessarily; 

RE: [PATCH] exfat: remove 'rwoffset' in exfat_inode_info

2020-09-11 Thread Sungjong Seo
> Remove 'rwoffset' from exfat_inode_info and replace it with the
> parameter (cpos) of exfat_readdir.
> Since rwoffset is referenced only by exfat_readdir, it does not need
> to be a member of exfat_inode_info.
> 
> Signed-off-by: Tetsuhiro Kohada 
> ---
>  fs/exfat/dir.c  | 16 ++--
>  fs/exfat/exfat_fs.h |  2 --
>  fs/exfat/file.c |  2 --
>  fs/exfat/inode.c|  3 ---
>  fs/exfat/super.c|  1 -
>  5 files changed, 6 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c index
> a9b13ae3f325..fa5bb72aa295 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
[snip]
> sector @@ -262,13 +260,11 @@ static int exfat_iterate(struct file *filp,
> struct dir_context *ctx)
>   goto end_of_dir;
>   }
> 
> - cpos = EXFAT_DEN_TO_B(ei->rwoffset);
> -
>   if (!nb->lfn[0])
>   goto end_of_dir;
> 
>   i_pos = ((loff_t)ei->start_clu << 32) |
> - ((ei->rwoffset - 1) & 0x);
> + (EXFAT_B_TO_DEN(cpos-1) & 0x);

Need to fix the above line to be:
((EXFAT_B_TO_DEN(cpos) - 1) & 0x);
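The distinction being pointed out can be checked with toy macros (the real EXFAT_B_TO_DEN and dentry size live in fs/exfat/exfat_fs.h; the names below are illustrative): converting the byte cursor to a dentry index and then subtracting one differs from subtracting one byte before converting whenever cpos is not dentry-aligned.

```c
/* An exFAT directory entry is 32 bytes, so byte<->dentry
 * conversion is a shift by 5 (illustrative macros). */
#define TOY_DENTRY_SIZE_BITS	5
#define TOY_B_TO_DEN(b)		((b) >> TOY_DENTRY_SIZE_BITS)

/* Index of the last dentry consumed at byte cursor cpos:
 * convert first, then subtract one dentry. */
static unsigned int toy_last_dentry(unsigned int cpos)
{
	return TOY_B_TO_DEN(cpos) - 1;
}
```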



Re: [PATCH next v2 0/3] soc: ti: k3: ringacc: add am65x sr2.0 support

2020-09-11 Thread santosh.shilim...@oracle.com



On 9/9/20 9:52 AM, santosh.shilim...@oracle.com wrote:

On 9/2/20 7:08 AM, Nishanth Menon wrote:

On 11:34-20200831, santosh.shilim...@oracle.com wrote:

On 8/29/20 11:41 AM, Grygorii Strashko wrote:

[..]



Santosh, in this series, may I suggest that the dtsi changes[1] be hosted
on my tree? else we are going to create a mix of rc1 and rc3 branches
which is going to be irritating, to say the least.

I will pick [1] the day after I see the patches 1 and 2 in linux-next 
tag.



Sure !!


Applied. Should show up in linux-next


Re: [PATCH next v2 0/3] soc: ti: k3: ringacc: add am65x sr2.0 support

2020-09-11 Thread santosh.shilim...@oracle.com

On 9/8/20 7:40 PM, santosh.shilim...@oracle.com wrote:



On 9/8/20 3:09 PM, Suman Anna wrote:

Hi Santosh,

On 8/31/20 1:34 PM, santosh.shilim...@oracle.com wrote:

On 8/29/20 11:41 AM, Grygorii Strashko wrote:

Hi Santosh,

I've rebased on top of linux-next and identified a merge conflict of patch 3
with commit 6da45875fa17 ("arm64: dts: k3-am65: Update the RM resource types")
in -next.

---
This series adds support for the TI AM65x SR2.0 SoC Ringacc, which has the
fixed errata i2023 "RINGACC, UDMA: RINGACC and UDMA Ring State
Interoperability Issue after Channel Teardown". This erratum is also fixed
in the J721E SoC. The SOC bus chipinfo data is used to identify the SoC and
configure the i2023 errata W/A.

These changes made the "ti,dma-ring-reset-quirk" DT property obsolete, so
it is removed.

Changes in v2:
   - no functional changes
   - rebased on top of linux-next
   - added ask from Rob Herring


Thanks. Can you please followup DT acks for PRUSS series so that I can
apply PRUSS + $subject series.


PRUSS dt binding is acked now, so can you pick up the PRUSS v2 series 
for 5.10

merge window.


Yes, I saw ack from Rob. Will try to get to this over coming weekend.


Applied. Should show up in linux-next


Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-11 Thread Srikar Dronamraju
* Michael Ellerman  [2020-09-11 21:55:23]:

> Srikar Dronamraju  writes:
> > Current code assumes that cpumask of cpus sharing a l2-cache mask will
> > always be a superset of cpu_sibling_mask.
> >
> > Lets stop that assumption. cpu_l2_cache_mask is a superset of
> > cpu_sibling_mask if and only if shared_caches is set.
> 
> I'm seeing oopses with this:
> 
> [0.117392][T1] smp: Bringing up secondary CPUs ...
> [0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
> [0.158265][T1] numa: Node 0 CPUs: 0
> [0.158520][T1] numa: Node 1 CPUs: 1
> [0.167453][T1] BUG: Unable to handle kernel data access on read at 
> 0x800041228298
> [0.167992][T1] Faulting instruction address: 0xc018c128
> [0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [0.169417][T1] Modules linked in:
> [0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #209
> [0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
> c004dce0
> [0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
> (5.9.0-rc2-00095-g7430ad5aa700)
> [0.170602][T1] MSR:  82009033   CR: 
> 4400  XER: 
> [0.170985][T1] CFAR: c018c288 IRQMASK: 0
> [0.170985][T1] GPR00:  c0007e343b10 
> c173e400 4000
> [0.170985][T1] GPR04:  0800 
> 0800 
> [0.170985][T1] GPR08:  c122c298 
> c0003fffc000 c0007fd05ce8
> [0.170985][T1] GPR12: c0007e0119f8 c193 
> 8ade 
> [0.170985][T1] GPR16: c0007e3c0640 0917 
> c0007e3c0658 0008
> [0.170985][T1] GPR20: c15d0bb8 8ade 
> c0f57400 c1817c28
> [0.170985][T1] GPR24: c176dc80 c0007e3c0890 
> c0007e3cfe00 
> [0.170985][T1] GPR28: c1772310 c0007e011900 
> c0007e3c0800 0001
> [0.172750][T1] NIP [c018c128] build_sched_domains+0x808/0x14b0
> [0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
> [0.173186][T1] Call Trace:
> [0.173484][T1] [c0007e343b10] [c018bfe8] 
> build_sched_domains+0x6c8/0x14b0 (unreliable)
> [0.173821][T1] [c0007e343c50] [c018dcdc] 
> sched_init_domains+0xec/0x130
> [0.174037][T1] [c0007e343ca0] [c10d59d8] 
> sched_init_smp+0x50/0xc4
> [0.174207][T1] [c0007e343cd0] [c10b45c4] 
> kernel_init_freeable+0x1b4/0x378
> [0.174378][T1] [c0007e343db0] [c00129fc] 
> kernel_init+0x24/0x158
> [0.174740][T1] [c0007e343e20] [c000d9d0] 
> ret_from_kernel_thread+0x5c/0x6c
> [0.175050][T1] Instruction dump:
> [0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
> 913e002c 41820034
> [0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> 
> f93e0080 7d404828 314a0001
> [0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
> [0.178512][T1]
> [1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x000b
> 
> That's qemu:
> 
> qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
>   -kernel build~/vmlinux \
>   -m 2G,slots=2,maxmem=4G \
>   -object memory-backend-ram,size=1G,id=m0 \
>   -object memory-backend-ram,size=1G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 2,sockets=2,maxcpus=2  \
> 

Thanks Michael for the report, for identifying the patch, and for giving an
easy reproducer; that made my task easy. (My only problem was that all my
PowerKVM hosts had an old compiler that refused to compile newer kernels.)

So in this setup, the CPU doesn't have an l2-cache, and in that scenario we
miss updating the l2-cache domain. The initial patch actually had this
exact code, but that was my mistake: I should have reassessed it before
making the changes suggested by Gautham.
Patch below. Do let me know if you want me to send the patch separately.

> 
> On mambo I get:
> 
> [0.005069][T1] smp: Bringing up secondary CPUs ...
> [0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
> [0.011682][T1] numa: Node 0 CPUs: 0-3
> [0.011709][T1] numa: Node 1 CPUs: 4-7
> [0.012015][T1] BUG: arch topology borken
> [0.012040][T1]  the SMT domain not a subset of the CACHE domain
> [0.012107][T1] BUG: Unable to handle kernel data access on read at 
> 0x8001012e7398
> [0.012142][T1] Faulting instruction address: 0xc01aa4f0
> [0.012174][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.012206][T1] LE PAGE_SIZE=64K MMU=Hash SMP 

Re: [IB/srpt] c804af2c1d: last_state.test.blktests.exit_code.143

2020-09-11 Thread Yi Zhang

Tested-by: Yi Zhang 


This patch fixes the issue I filed [1], which uses rdma_rxe for nvme-rdma
testing.


[1]
http://lists.infradead.org/pipermail/linux-nvme/2020-August/018988.html

Thanks
Yi

On 9/12/20 6:00 AM, Bart Van Assche wrote:

On 2020-09-08 19:01, Bart Van Assche wrote:

The above patch didn't compile, but the patch below does and makes the hang
disappear. So feel free to add the following to the patch below:

Tested-by: Bart Van Assche 

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c36b4d2b61e0..23ee65a9185f 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1285,6 +1285,8 @@ static void disable_device(struct ib_device *device)
remove_client_context(device, cid);
}

+   ib_cq_pool_destroy(device);
+
/* Pairs with refcount_set in enable_device */
ib_device_put(device);
wait_for_completion(&device->unreg_completion);
@@ -1328,6 +1330,8 @@ static int enable_device_and_get(struct ib_device *device)
goto out;
}

+   ib_cq_pool_init(device);
+
down_read(&clients_rwsem);
xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
ret = add_client_context(device, client);
@@ -1400,7 +1404,6 @@ int ib_register_device(struct ib_device *device, const 
char *name)
goto dev_cleanup;
}

-   ib_cq_pool_init(device);
ret = enable_device_and_get(device);
dev_set_uevent_suppress(&device->dev, false);
/* Mark for userspace that device is ready */
@@ -1455,7 +1458,6 @@ static void __ib_unregister_device(struct ib_device 
*ib_dev)
goto out;

disable_device(ib_dev);
-   ib_cq_pool_destroy(ib_dev);

/* Expedite removing unregistered pointers from the hash table */
free_netdevs(ib_dev);

Hi Jason,

Please let me know how you want to proceed with this patch.

Thanks,

Bart.





[PATCH] perf vendor events amd: remove trailing comma

2020-09-11 Thread Henry Burns
amdzen2/core.json had a trailing comma on the ex_ret_fus_brnch_inst
event. Since that goes against the JSON standard, let's remove it.

Signed-off-by: Henry Burns 
---
 tools/perf/pmu-events/arch/x86/amdzen2/core.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/arch/x86/amdzen2/core.json 
b/tools/perf/pmu-events/arch/x86/amdzen2/core.json
index de89e5a44ff1..4b75183da94a 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen2/core.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen2/core.json
@@ -125,6 +125,6 @@
   {
 "EventName": "ex_ret_fus_brnch_inst",
 "EventCode": "0x1d0",
-"BriefDescription": "Retired Fused Instructions. The number of fuse-branch instructions retired per cycle. The number of events logged per cycle can vary from 0-8.",
+"BriefDescription": "Retired Fused Instructions. The number of fuse-branch instructions retired per cycle. The number of events logged per cycle can vary from 0-8."
   }
 ]
-- 
2.25.1



[REGRESSION] Needless shutting down of oneshot timer in nohz mode

2020-09-11 Thread Steven Rostedt
Hi Thomas,

The VMware PhotonOS team is evaluating 4.19-rt compared to CentOS
3.10-rt (franken kernel from Red Hat). They found a regression between
the two kernels that was found to be introduced by:

 d25408756accb ("clockevents: Stop unused clockevent devices")

The issue shows up when running on a guest, and it causes a noticeable
wake-up latency in cyclictest. The 4.19-rt kernel executes two extra apic
instructions, causing two extra VMEXITs over the 3.10-rt kernel. I found
out the reason why, and this is true for vanilla 5.9-rc as well.

When running isolcpus with NOHZ_FULL, I see the following.

  tick_nohz_idle_stop_tick() {
hrtimer_start_range_ns() {
remove_hrtimer(timer)
/* no more timers on the base */
expires = KTIME_MAX;
tick_program_event() {
clock_switch_state(ONESHOT_STOPPED);
/* call to apic to shutdown timer */
}
}
[..]
hrtimer_reprogram(timer) {
tick_program_event() {
clock_switch_state(ONESHOT);
/* call to apic to enable timer again! */
}
}
 }


Thus, we are needlessly shutting down and restarting the apic every
time we call tick_nohz_stop_tick() if there is a timer still on the
queue.

I'm not exactly sure how to fix this. Is there a way we can hold off
disabling the clock here until we know that it isn't going to be
immediately enabled again?

-- Steve
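A toy model of the sequence above (illustrative names; not the kernel code) shows the bounce: programming KTIME_MAX switches the clockevent device to ONESHOT_STOPPED, and the immediate reprogram switches it back, costing two device writes (and hence two VMEXITs on a guest) per stop_tick.

```c
#include <limits.h>

#define TOY_KTIME_MAX LLONG_MAX

enum toy_state { TOY_ONESHOT, TOY_ONESHOT_STOPPED };

struct toy_clkevt {
	enum toy_state state;
	int dev_writes;		/* stand-in for apic accesses / VMEXITs */
};

/* Mimics tick_program_event(): KTIME_MAX shuts the device down,
 * anything else (re)arms it; each state switch touches the device. */
static void toy_program_event(struct toy_clkevt *ce, long long expires)
{
	enum toy_state want =
		(expires == TOY_KTIME_MAX) ? TOY_ONESHOT_STOPPED : TOY_ONESHOT;

	if (ce->state != want) {
		ce->state = want;
		ce->dev_writes++;
	}
}

/* The reported pattern: stop_tick finds no expiry on the base and
 * programs KTIME_MAX, then hrtimer_reprogram() immediately re-arms. */
static int toy_stop_tick_then_reprogram(struct toy_clkevt *ce)
{
	toy_program_event(ce, TOY_KTIME_MAX);	/* shutdown call */
	toy_program_event(ce, 1000);		/* re-enable right away */
	return ce->dev_writes;
}
```

The question in the mail amounts to deferring the first transition until it is known that no reprogram follows, which in this model would leave dev_writes at zero.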


[PATCH v3 6/6] iommu/vt-d: Cleanup after converting to dma-iommu ops

2020-09-11 Thread Lu Baolu
Some cleanups after converting the driver to use dma-iommu ops.
- Remove nobounce option;
- Cleanup and simplify the path in domain mapping.

Signed-off-by: Lu Baolu 
---
 .../admin-guide/kernel-parameters.txt |  5 --
 drivers/iommu/intel/iommu.c   | 90 ++-
 2 files changed, 28 insertions(+), 67 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a1068742a6df..0d11ef43d314 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1866,11 +1866,6 @@
Note that using this option lowers the security
provided by tboot because it makes the system
vulnerable to DMA attacks.
-   nobounce [Default off]
-   Disable bounce buffer for untrusted devices such as
-   the Thunderbolt devices. This will treat the untrusted
-   devices as the trusted ones, hence might expose security
-   risks of DMA attacks.
 
intel_idle.max_cstate=  [KNL,HW,ACPI,X86]
0   disables intel_idle and fall back on acpi_idle.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index adc231790e0a..fe2544c95013 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -355,7 +355,6 @@ static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
 static int iommu_identity_mapping;
-static int intel_no_bounce;
 static int iommu_skip_te_disable;
 
 #define IDENTMAP_GFX   2
@@ -457,9 +456,6 @@ static int __init intel_iommu_setup(char *str)
} else if (!strncmp(str, "tboot_noforce", 13)) {
pr_info("Intel-IOMMU: not forcing on after tboot. This 
could expose security risk for tboot\n");
intel_iommu_tboot_noforce = 1;
-   } else if (!strncmp(str, "nobounce", 8)) {
-   pr_info("Intel-IOMMU: No bounce buffer. This could 
expose security risks of DMA attacks\n");
-   intel_no_bounce = 1;
}
 
str += strcspn(str, ",");
@@ -2230,15 +2226,14 @@ static inline int hardware_largepage_caps(struct 
dmar_domain *domain,
return level;
 }
 
-static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
-   struct scatterlist *sg, unsigned long phys_pfn,
-   unsigned long nr_pages, int prot)
+static int
+__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
+unsigned long phys_pfn, unsigned long nr_pages, int prot)
 {
struct dma_pte *first_pte = NULL, *pte = NULL;
-   phys_addr_t pteval;
-   unsigned long sg_res = 0;
unsigned int largepage_lvl = 0;
unsigned long lvl_pages = 0;
+   phys_addr_t pteval;
u64 attr;
 
BUG_ON(!domain_pfn_supported(domain, iov_pfn + nr_pages - 1));
@@ -2250,26 +2245,14 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
if (domain_use_first_level(domain))
attr |= DMA_FL_PTE_PRESENT | DMA_FL_PTE_XD | DMA_FL_PTE_US;
 
-   if (!sg) {
-   sg_res = nr_pages;
-   pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
-   }
+   pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
 
while (nr_pages > 0) {
uint64_t tmp;
 
-   if (!sg_res) {
-   unsigned int pgoff = sg->offset & ~PAGE_MASK;
-
-   sg_res = aligned_nrpages(sg->offset, sg->length);
-   sg->dma_address = ((dma_addr_t)iov_pfn << 
VTD_PAGE_SHIFT) + pgoff;
-   sg->dma_length = sg->length;
-   pteval = (sg_phys(sg) - pgoff) | attr;
-   phys_pfn = pteval >> VTD_PAGE_SHIFT;
-   }
-
if (!pte) {
-   largepage_lvl = hardware_largepage_caps(domain, 
iov_pfn, phys_pfn, sg_res);
+   largepage_lvl = hardware_largepage_caps(domain, iov_pfn,
+   phys_pfn, nr_pages);
 
first_pte = pte = pfn_to_dma_pte(domain, iov_pfn, 
&largepage_lvl);
if (!pte)
@@ -2281,7 +2264,7 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
pteval |= DMA_PTE_LARGE_PAGE;
lvl_pages = lvl_to_nr_pages(largepage_lvl);
 
-   nr_superpages = sg_res / lvl_pages;
+   nr_superpages = nr_pages / lvl_pages;
end_pfn = iov_pfn + nr_superpages * lvl_pages - 
1;
 
/*
@@ -2315,48 +2298,45 @@ 

[PATCH v3 4/6] iommu: Add quirk for Intel graphic devices in map_sg

2020-09-11 Thread Lu Baolu
Combining the sg segments exposes a bug in the Intel i915 driver which
causes visual artifacts and the screen to freeze. This is most likely
because of how the i915 driver handles the returned list: it probably
doesn't respect the returned value specifying the number of elements in
the list, and instead depends on the previous behaviour of the Intel
IOMMU driver, which would return the same number of elements in the
output list as in the input list.

Signed-off-by: Tom Murphy 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/dma-iommu.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1a1da22e5a5e..fc19f1fb9413 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -880,6 +880,33 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
unsigned int cur_len = 0, max_len = dma_get_max_seg_size(dev);
int i, count = 0;
 
+   /*
+* The Intel graphic driver is used to assume that the returned
+* sg list is not combound. This blocks the efforts of converting
+* Intel IOMMU driver to dma-iommu api's. Add this quirk to make the
+* device driver work and should be removed once it's fixed in i915
+* driver.
+*/
+   if (IS_ENABLED(CONFIG_DRM_I915) && dev_is_pci(dev) &&
+   to_pci_dev(dev)->vendor == PCI_VENDOR_ID_INTEL &&
+   (to_pci_dev(dev)->class >> 16) == PCI_BASE_CLASS_DISPLAY) {
+   for_each_sg(sg, s, nents, i) {
+   unsigned int s_iova_off = sg_dma_address(s);
+   unsigned int s_length = sg_dma_len(s);
+   unsigned int s_iova_len = s->length;
+
+   s->offset += s_iova_off;
+   s->length = s_length;
+   sg_dma_address(s) = dma_addr + s_iova_off;
+   sg_dma_len(s) = s_length;
+   dma_addr += s_iova_len;
+
+   pr_info_once("sg combining disabled due to i915 
driver\n");
+   }
+
+   return nents;
+   }
+
for_each_sg(sg, s, nents, i) {
/* Restore this segment's original unaligned fields first */
unsigned int s_iova_off = sg_dma_address(s);
-- 
2.17.1
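The failure mode the quirk works around can be modeled minimally (names below are illustrative, not the DMA API's): if map_sg combines physically adjacent segments in place and returns a smaller count, a consumer that keeps iterating over the original input count reads entries that are no longer valid output segments.

```c
#include <stddef.h>

struct toy_seg {
	unsigned long dma_addr;
	unsigned int len;
};

/* Model of segment combining in a map_sg implementation: merge
 * adjacent entries in place and return the new element count. A
 * correct consumer must iterate only over the returned count. */
static int toy_map_sg_combine(struct toy_seg *sg, int nents)
{
	int out = 0;

	for (int i = 1; i < nents; i++) {
		if (sg[out].dma_addr + sg[out].len == sg[i].dma_addr)
			sg[out].len += sg[i].len;	/* combine */
		else
			sg[++out] = sg[i];
	}
	return nents ? out + 1 : 0;
}
```

The quirk sidesteps this by keeping a one-to-one input/output mapping for Intel display devices until i915 is fixed to honor the returned count.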



[PATCH v3 3/6] iommu: Allow the dma-iommu api to use bounce buffers

2020-09-11 Thread Lu Baolu
From: Tom Murphy 

Allow the dma-iommu api to use bounce buffers for untrusted devices.
This is a copy of the intel bounce buffer code.

Signed-off-by: Tom Murphy 
Co-developed-by: Lu Baolu 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/dma-iommu.c | 163 +++---
 1 file changed, 150 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d06411bd5e08..1a1da22e5a5e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -21,9 +21,11 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
 
 struct iommu_dma_msi_page {
struct list_headlist;
@@ -498,6 +500,31 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
 }
 
+static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
+   size_t size, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
+   phys_addr_t phys;
+
+   phys = iommu_iova_to_phys(domain, dma_addr);
+   if (WARN_ON(!phys))
+   return;
+
+   __iommu_dma_unmap(dev, dma_addr, size);
+
+   if (unlikely(is_swiotlb_buffer(phys)))
+   swiotlb_tbl_unmap_single(dev, phys, size,
+   iova_align(iovad, size), dir, attrs);
+}
+
+static bool dev_is_untrusted(struct device *dev)
+{
+   return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
+}
+
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
size_t size, int prot, u64 dma_mask)
 {
@@ -523,6 +550,55 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
return iova + iova_off;
 }
 
+static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
+   size_t org_size, dma_addr_t dma_mask, bool coherent,
+   enum dma_data_direction dir, unsigned long attrs)
+{
+   int prot = dma_info_to_prot(dir, coherent, attrs);
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
+   size_t aligned_size = org_size;
+   void *padding_start;
+   size_t padding_size;
+   dma_addr_t iova;
+
+   /*
+* If both the physical buffer start address and size are
+* page aligned, we don't need to use a bounce page.
+*/
+   if (IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
+   iova_offset(iovad, phys | org_size)) {
+   aligned_size = iova_align(iovad, org_size);
+   phys = swiotlb_tbl_map_single(dev,
+   __phys_to_dma(dev, io_tlb_start),
+   phys, org_size, aligned_size, dir, attrs);
+
+   if (phys == DMA_MAPPING_ERROR)
+   return DMA_MAPPING_ERROR;
+
+   /* Cleanup the padding area. */
+   padding_start = phys_to_virt(phys);
+   padding_size = aligned_size;
+
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_TO_DEVICE ||
+dir == DMA_BIDIRECTIONAL)) {
+   padding_start += org_size;
+   padding_size -= org_size;
+   }
+
+   memset(padding_start, 0, padding_size);
+   }
+
+   iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
+   if ((iova == DMA_MAPPING_ERROR) && is_swiotlb_buffer(phys))
+   swiotlb_tbl_unmap_single(dev, phys, org_size,
+   aligned_size, dir, attrs);
+
+   return iova;
+}
+
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -698,11 +774,15 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
 {
phys_addr_t phys;
 
-   if (dev_is_dma_coherent(dev))
+   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   arch_sync_dma_for_cpu(phys, size, dir);
+   if (!dev_is_dma_coherent(dev))
+   arch_sync_dma_for_cpu(phys, size, dir);
+
+   if (is_swiotlb_buffer(phys))
+   swiotlb_tbl_sync_single(dev, phys, size, dir, SYNC_FOR_CPU);
 }
 
 static void iommu_dma_sync_single_for_device(struct device *dev,
@@ -710,11 +790,15 @@ static void iommu_dma_sync_single_for_device(struct 
device *dev,
 {
phys_addr_t phys;
 
-   if (dev_is_dma_coherent(dev))
+   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);

[PATCH v3 5/6] iommu/vt-d: Convert intel iommu driver to the iommu ops

2020-09-11 Thread Lu Baolu
From: Tom Murphy 

Convert the intel iommu driver to the dma-iommu api. Remove the iova
handling and reserve region code from the intel iommu driver.

Signed-off-by: Tom Murphy 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/Kconfig |   1 +
 drivers/iommu/intel/iommu.c | 742 ++--
 2 files changed, 43 insertions(+), 700 deletions(-)

diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 5337ee1584b0..28a3d1596c76 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -13,6 +13,7 @@ config INTEL_IOMMU
select DMAR_TABLE
select SWIOTLB
select IOASID
+   select IOMMU_DMA
help
  DMA remapping (DMAR) devices support enables independent address
  translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 63ee30c689a7..adc231790e0a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -41,7 +42,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -382,9 +382,6 @@ struct device_domain_info *get_domain_info(struct device *dev)
 DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
-#define device_needs_bounce(d) (!intel_no_bounce && dev_is_pci(d) &&   \
-   to_pci_dev(d)->untrusted)
-
 /*
  * Iterate over elements in device_domain_list and call the specified
  * callback @fn against each element.
@@ -1242,13 +1239,6 @@ static void dma_free_pagelist(struct page *freelist)
}
 }
 
-static void iova_entry_free(unsigned long data)
-{
-   struct page *freelist = (struct page *)data;
-
-   dma_free_pagelist(freelist);
-}
-
 /* iommu handling */
 static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 {
@@ -1613,19 +1603,17 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu,
iommu_flush_write_buffer(iommu);
 }
 
-static void iommu_flush_iova(struct iova_domain *iovad)
+static void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
-   struct dmar_domain *domain;
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
int idx;
 
-   domain = container_of(iovad, struct dmar_domain, iovad);
-
-   for_each_domain_iommu(idx, domain) {
+   for_each_domain_iommu(idx, dmar_domain) {
struct intel_iommu *iommu = g_iommus[idx];
-   u16 did = domain->iommu_did[iommu->seq_id];
+   u16 did = dmar_domain->iommu_did[iommu->seq_id];
 
-   if (domain_use_first_level(domain))
-   domain_flush_piotlb(iommu, domain, 0, -1, 0);
+   if (domain_use_first_level(dmar_domain))
+   domain_flush_piotlb(iommu, dmar_domain, 0, -1, 0);
else
iommu->flush.flush_iotlb(iommu, did, 0, 0,
 DMA_TLB_DSI_FLUSH);
@@ -1907,48 +1895,6 @@ static int domain_detach_iommu(struct dmar_domain *domain,
return count;
 }
 
-static struct iova_domain reserved_iova_list;
-static struct lock_class_key reserved_rbtree_key;
-
-static int dmar_init_reserved_ranges(void)
-{
-   struct pci_dev *pdev = NULL;
-   struct iova *iova;
-   int i;
-
-   init_iova_domain(&reserved_iova_list, VTD_PAGE_SIZE, IOVA_START_PFN);
-
-   lockdep_set_class(&reserved_iova_list.iova_rbtree_lock,
-   &reserved_rbtree_key);
-
-   /* IOAPIC ranges shouldn't be accessed by DMA */
-   iova = reserve_iova(&reserved_iova_list, IOVA_PFN(IOAPIC_RANGE_START),
-   IOVA_PFN(IOAPIC_RANGE_END));
-   if (!iova) {
-   pr_err("Reserve IOAPIC range failed\n");
-   return -ENODEV;
-   }
-
-   /* Reserve all PCI MMIO to avoid peer-to-peer access */
-   for_each_pci_dev(pdev) {
-   struct resource *r;
-
-   for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-   r = &pdev->resource[i];
-   if (!r->flags || !(r->flags & IORESOURCE_MEM))
-   continue;
-   iova = reserve_iova(&reserved_iova_list,
-   IOVA_PFN(r->start),
-   IOVA_PFN(r->end));
-   if (!iova) {
-   pci_err(pdev, "Reserve iova for %pR failed\n", r);
-   return -ENODEV;
-   }
-   }
-   }
-   return 0;
-}
-
 static inline int guestwidth_to_adjustwidth(int gaw)
 {
int agaw;
@@ -1971,7 +1917,7 @@ static void domain_exit(struct dmar_domain *domain)
 
/* destroy iovas */
if (domain->domain.type == IOMMU_DOMAIN_DMA)
-   put_iova_domain(&domain->iovad);
+   iommu_put_dma_cookie(&domain->domain);
 
if 

[PATCH v3 1/6] iommu: Handle freelists when using deferred flushing in iommu drivers

2020-09-11 Thread Lu Baolu
From: Tom Murphy 

Allow iommu_unmap_fast() to return newly freed page-table pages and
pass the freelist to queue_iova() in the dma-iommu ops path.

This is useful for IOMMU drivers (in this case the Intel IOMMU driver)
which need to wait for the IOTLB to be flushed before newly
freed/unmapped page-table pages can be released. This way we can still
batch IOTLB flush operations and handle the freelists.
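The freelist handling described above can be sketched in userspace C; `struct fake_page` and both helpers below are illustrative stand-ins for the kernel's `struct page` chaining, not kernel API:

```c
#include <stdlib.h>

/* Freed page-table "pages" are chained through a next pointer and
 * released in one batch once the IOTLB flush has completed. */
struct fake_page {
	struct fake_page *freelist;	/* next page to free */
};

/* Prepend a page to the freelist, as the unmap path would. */
static struct fake_page *freelist_add(struct fake_page *head)
{
	struct fake_page *p = malloc(sizeof(*p));

	p->freelist = head;
	return p;
}

/* Drain the batch after the flush; returns the number of pages freed. */
static int freelist_drain(struct fake_page *freelist)
{
	int freed = 0;

	while (freelist) {
		struct fake_page *next = freelist->freelist;

		free(freelist);
		freelist = next;
		freed++;
	}
	return freed;
}
```

This mirrors `iommu_dma_entry_dtor()` in the patch, which walks `page->freelist` and frees each entry when the flush-queue entry expires.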

Signed-off-by: Tom Murphy 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/dma-iommu.c   | 30 ++--
 drivers/iommu/intel/iommu.c | 55 -
 include/linux/iommu.h   |  1 +
 3 files changed, 59 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 5141d49a046b..82c071b2d5c8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -50,6 +50,18 @@ struct iommu_dma_cookie {
struct iommu_domain *fq_domain;
 };
 
+static void iommu_dma_entry_dtor(unsigned long data)
+{
+   struct page *freelist = (struct page *)data;
+
+   while (freelist) {
+   unsigned long p = (unsigned long)page_address(freelist);
+
+   freelist = freelist->freelist;
+   free_page(p);
+   }
+}
+
 static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
 {
if (cookie->type == IOMMU_DMA_IOVA_COOKIE)
@@ -344,7 +356,8 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
if (!cookie->fq_domain && !iommu_domain_get_attr(domain,
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr) && attr) {
cookie->fq_domain = domain;
-   init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, NULL);
+   init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
+ iommu_dma_entry_dtor);
}
 
if (!dev)
@@ -438,7 +451,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 }
 
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
-   dma_addr_t iova, size_t size)
+   dma_addr_t iova, size_t size, struct page *freelist)
 {
struct iova_domain *iovad = &cookie->iovad;
 
@@ -447,7 +460,8 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
cookie->msi_iova -= size;
else if (cookie->fq_domain) /* non-strict mode */
queue_iova(iovad, iova_pfn(iovad, iova),
-   size >> iova_shift(iovad), 0);
+   size >> iova_shift(iovad),
+   (unsigned long)freelist);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
@@ -472,7 +486,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 
if (!cookie->fq_domain)
iommu_tlb_sync(domain, &iotlb_gather);
-   iommu_dma_free_iova(cookie, dma_addr, size);
+   iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
@@ -494,7 +508,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
return DMA_MAPPING_ERROR;
 
if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {
-   iommu_dma_free_iova(cookie, iova, size);
+   iommu_dma_free_iova(cookie, iova, size, NULL);
return DMA_MAPPING_ERROR;
}
return iova + iova_off;
@@ -649,7 +663,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
 out_free_sg:
sg_free_table(&sgt);
 out_free_iova:
-   iommu_dma_free_iova(cookie, iova, size);
+   iommu_dma_free_iova(cookie, iova, size, NULL);
 out_free_pages:
__iommu_dma_free_pages(pages, count);
return NULL;
@@ -900,7 +914,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
return __finalise_sg(dev, sg, nents, iova);
 
 out_free_iova:
-   iommu_dma_free_iova(cookie, iova, iova_len);
+   iommu_dma_free_iova(cookie, iova, iova_len, NULL);
 out_restore_sg:
__invalidate_sg(sg, nents);
return 0;
@@ -1194,7 +1208,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return msi_page;
 
 out_free_iova:
-   iommu_dma_free_iova(cookie, iova, size);
+   iommu_dma_free_iova(cookie, iova, size, NULL);
 out_free_page:
kfree(msi_page);
return NULL;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 87b17bac04c2..63ee30c689a7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1208,17 +1208,17 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
pages can only be freed after the IOTLB flush has been done. */
 static struct page *domain_unmap(struct dmar_domain *domain,
 

[PATCH v3 2/6] iommu: Add iommu_dma_free_cpu_cached_iovas()

2020-09-11 Thread Lu Baolu
From: Tom Murphy 

Add an iommu_dma_free_cpu_cached_iovas() function to allow drivers which
use the dma-iommu ops to free cached per-CPU IOVAs.

Signed-off-by: Tom Murphy 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/dma-iommu.c | 9 +
 include/linux/dma-iommu.h | 8 
 2 files changed, 17 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 82c071b2d5c8..d06411bd5e08 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -50,6 +50,15 @@ struct iommu_dma_cookie {
struct iommu_domain *fq_domain;
 };
 
+void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
+   struct iommu_domain *domain)
+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
+
+   free_cpu_cached_iovas(cpu, iovad);
+}
+
 static void iommu_dma_entry_dtor(unsigned long data)
 {
struct page *freelist = (struct page *)data;
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 2112f21f73d8..706b68d1359b 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -37,6 +37,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
+void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
+   struct iommu_domain *domain);
+
 #else /* CONFIG_IOMMU_DMA */
 
 struct iommu_domain;
@@ -78,5 +81,10 @@ static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
 }
 
+static inline void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
+   struct iommu_domain *domain)
+{
+}
+
 #endif /* CONFIG_IOMMU_DMA */
 #endif /* __DMA_IOMMU_H */
-- 
2.17.1



[PATCH v3 0/6] Convert the intel iommu driver to the dma-iommu api

2020-09-11 Thread Lu Baolu
Tom Murphy has done almost all of the work. His latest patch series was
posted here:

https://lore.kernel.org/linux-iommu/20200903201839.7327-1-murph...@tcd.ie/

Thanks a lot!

This series is a follow-up with the changes below:

1. Add a quirk for the i915 driver issue described in Tom's cover
letter.
2. Fix several bugs in patch "iommu: Allow the dma-iommu api to use
bounce buffers" to make the bounce buffer work for untrusted devices.
3. Several cleanups in iommu/vt-d driver after the conversion.

Please review and test.

Best regards,
baolu

Lu Baolu (2):
  iommu: Add quirk for Intel graphic devices in map_sg
  iommu/vt-d: Cleanup after converting to dma-iommu ops

Tom Murphy (4):
  iommu: Handle freelists when using deferred flushing in iommu drivers
  iommu: Add iommu_dma_free_cpu_cached_iovas()
  iommu: Allow the dma-iommu api to use bounce buffers
  iommu/vt-d: Convert intel iommu driver to the iommu ops

 .../admin-guide/kernel-parameters.txt |   5 -
 drivers/iommu/dma-iommu.c | 229 -
 drivers/iommu/intel/Kconfig   |   1 +
 drivers/iommu/intel/iommu.c   | 885 +++---
 include/linux/dma-iommu.h |   8 +
 include/linux/iommu.h |   1 +
 6 files changed, 323 insertions(+), 806 deletions(-)

-- 
2.17.1



Re: [PATCH v6 3/3] arm64: Add IMA kexec buffer to DTB

2020-09-11 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> Any existing FDT_PROP_IMA_KEXEC_BUFFER property in the device tree
> needs to be removed and its corresponding memory reservation in
> the currently running kernel needs to be freed.
>
> The address and size of the current kernel's IMA measurement log need
> to be added to the device tree's IMA kexec buffer node and memory for
> the buffer needs to be reserved for the log to be carried over to
> the next kernel on the kexec call.
>
> Remove any existing FDT_PROP_IMA_KEXEC_BUFFER property in the device
> tree and free the corresponding memory reservation in the currently
> running kernel. Add FDT_PROP_IMA_KEXEC_BUFFER property to the device
> tree and reserve the memory for storing the IMA log.
> Update CONFIG_KEXEC_FILE to select CONFIG_HAVE_IMA_KEXEC to indicate
> that the IMA measurement log information is present in the device tree
> for ARM64.
>
> Co-developed-by: Prakhar Srivastava 
> Signed-off-by: Prakhar Srivastava 
> Signed-off-by: Lakshmi Ramasubramanian 

Reviewed-by: Thiago Jung Bauermann 

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v2 7/8] mm/shmem: Return head page from find_lock_entry

2020-09-11 Thread Matthew Wilcox
On Thu, Sep 10, 2020 at 07:33:17PM +0100, Matthew Wilcox (Oracle) wrote:
> Convert shmem_getpage_gfp() (the only remaining caller of
> find_lock_entry()) to cope with a head page being returned instead of
> the subpage for the index.

This version was buggy.  Apparently I was too focused on running the test suite 
against XFS and neglected to run it against tmpfs, which crashed instantly.

Here's the patch I should have sent.

commit 7bfa655881da76f3386e6d4c07e38a165b4a6ca8
Author: Matthew Wilcox (Oracle) 
Date:   Sun Aug 2 07:22:34 2020 -0400

mm/shmem: Return head page from find_lock_entry

Convert shmem_getpage_gfp() (the only remaining caller of
find_lock_entry()) to cope with a head page being returned instead of
the subpage for the index.

Signed-off-by: Matthew Wilcox (Oracle) 

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 905a64030647..f374618b2c93 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -371,6 +371,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
mapping_gfp_mask(mapping));
 }
 
+/* Does this page contain this index? */
+static inline bool thp_contains(struct page *head, pgoff_t index)
+{
+   /* HugeTLBfs indexes the page cache in units of hpage_size */
+   if (PageHuge(head))
+   return head->index == index;
+   return page_index(head) == (index & ~(thp_nr_pages(head) - 1UL));
+}
+
 /*
  * Given the page we found in the page cache, return the page corresponding
  * to this index in the file
diff --git a/mm/filemap.c b/mm/filemap.c
index 2f134383b0ae..453535170b8d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1614,37 +1614,34 @@ struct page *find_get_entry(struct address_space *mapping, pgoff_t index)
 }
 
 /**
- * find_lock_entry - locate, pin and lock a page cache entry
- * @mapping: the address_space to search
- * @offset: the page cache index
+ * find_lock_entry - Locate and lock a page cache entry.
+ * @mapping: The address_space to search.
+ * @index: The page cache index.
  *
- * Looks up the page cache slot at @mapping & @offset.  If there is a
- * page cache page, it is returned locked and with an increased
- * refcount.
+ * Looks up the page at @mapping & @index.  If there is a page in the
+ * cache, the head page is returned locked and with an increased refcount.
  *
  * If the slot holds a shadow entry of a previously evicted page, or a
  * swap entry from shmem/tmpfs, it is returned.
  *
- * find_lock_entry() may sleep.
- *
- * Return: the found page or shadow entry, %NULL if nothing is found.
+ * Context: May sleep.
+ * Return: The head page or shadow entry, %NULL if nothing is found.
  */
-struct page *find_lock_entry(struct address_space *mapping, pgoff_t offset)
+struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
 {
struct page *page;
 
 repeat:
-   page = find_get_entry(mapping, offset);
+   page = find_get_entry(mapping, index);
if (page && !xa_is_value(page)) {
lock_page(page);
/* Has the page been truncated? */
-   if (unlikely(page_mapping(page) != mapping)) {
+   if (unlikely(page->mapping != mapping)) {
unlock_page(page);
put_page(page);
goto repeat;
}
-   page = find_subpage(page, offset);
-   VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page);
+   VM_BUG_ON_PAGE(!thp_contains(page, index), page);
}
return page;
 }
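The index check done by the new `thp_contains()` helper can be exercised with a userspace model (this sketch leaves out the hugetlbfs special case, and the function name is invented here):

```c
/* A head page at head_index covering nr_pages pages (a power of two)
 * contains @index iff rounding the index down to the THP boundary
 * lands on the head page's index. */
static int thp_contains_model(unsigned long head_index,
			      unsigned long nr_pages, unsigned long index)
{
	return head_index == (index & ~(nr_pages - 1UL));
}
```

For a 2MB THP (512 4KB pages) at index 0, indexes 0..511 are contained and index 512 belongs to the next head page.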
diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f3..58bc9e326d0d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1822,6 +1822,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
return error;
}
 
+   if (page)
+   hindex = page->index;
if (page && sgp == SGP_WRITE)
mark_page_accessed(page);
 
@@ -1832,11 +1834,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
unlock_page(page);
put_page(page);
page = NULL;
+   hindex = index;
}
-   if (page || sgp == SGP_READ) {
-   *pagep = page;
-   return 0;
-   }
+   if (page || sgp == SGP_READ)
+   goto out;
 
/*
 * Fast cache lookup did not find it:
@@ -1961,14 +1962,13 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 * it now, lest undo on failure cancel our earlier guarantee.
 */
if (sgp != SGP_WRITE && !PageUptodate(page)) {
-   struct page *head = compound_head(page);
int i;
 
-   for (i = 0; i < compound_nr(head); i++) {
-   clear_highpage(head + i);
-   flush_dcache_page(head + i);
+   for (i = 0; i < compound_nr(page); i++) {
+  

Re: [PATCH v6 1/3] powerpc: Refactor kexec functions to move arch independent code to IMA

2020-09-11 Thread Thiago Jung Bauermann


Hello Lakshmi,

The code itself is good; I have just some minor comments about it.

But the movement of the powerpc code unfortunately creates some issues.
More details below.

Lakshmi Ramasubramanian  writes:

> diff --git a/arch/powerpc/kexec/ima.c b/arch/powerpc/kexec/ima.c
> index 720e50e490b6..467647886064 100644
> --- a/arch/powerpc/kexec/ima.c
> +++ b/arch/powerpc/kexec/ima.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -28,105 +29,6 @@ static int get_addr_size_cells(int *addr_cells, int *size_cells)
>   return 0;
>  }
>  
> -static int do_get_kexec_buffer(const void *prop, int len, unsigned long *addr,
> -size_t *size)
> -{
> - int ret, addr_cells, size_cells;
> -
> - ret = get_addr_size_cells(&addr_cells, &size_cells);
> - if (ret)
> - return ret;
> -
> - if (len < 4 * (addr_cells + size_cells))
> - return -ENOENT;
> -
> - *addr = of_read_number(prop, addr_cells);
> - *size = of_read_number(prop + 4 * addr_cells, size_cells);
> -
> - return 0;
> -}
> -
> -/**
> - * ima_get_kexec_buffer - get IMA buffer from the previous kernel
> - * @addr:On successful return, set to point to the buffer contents.
> - * @size:On successful return, set to the buffer size.
> - *
> - * Return: 0 on success, negative errno on error.
> - */
> -int ima_get_kexec_buffer(void **addr, size_t *size)
> -{
> - int ret, len;
> - unsigned long tmp_addr;
> - size_t tmp_size;
> - const void *prop;
> -
> - prop = of_get_property(of_chosen, "linux,ima-kexec-buffer", &len);
> - if (!prop)
> - return -ENOENT;
> -
> - ret = do_get_kexec_buffer(prop, len, &tmp_addr, &tmp_size);
> - if (ret)
> - return ret;
> -
> - *addr = __va(tmp_addr);
> - *size = tmp_size;
> -
> - return 0;
> -}
> -
> -/**
> - * ima_free_kexec_buffer - free memory used by the IMA buffer
> - */
> -int ima_free_kexec_buffer(void)
> -{
> - int ret;
> - unsigned long addr;
> - size_t size;
> - struct property *prop;
> -
> - prop = of_find_property(of_chosen, "linux,ima-kexec-buffer", NULL);
> - if (!prop)
> - return -ENOENT;
> -
> - ret = do_get_kexec_buffer(prop->value, prop->length, &addr, &size);
> - if (ret)
> - return ret;
> -
> - ret = of_remove_property(of_chosen, prop);
> - if (ret)
> - return ret;
> -
> - return memblock_free(addr, size);
> -
> -}
> -
> -/**
> - * remove_ima_buffer - remove the IMA buffer property and reservation from @fdt
> - *
> - * The IMA measurement buffer is of no use to a subsequent kernel, so we 
> always
> - * remove it from the device tree.
> - */
> -void remove_ima_buffer(void *fdt, int chosen_node)
> -{
> - int ret, len;
> - unsigned long addr;
> - size_t size;
> - const void *prop;
> -
> - prop = fdt_getprop(fdt, chosen_node, "linux,ima-kexec-buffer", &len);
> - if (!prop)
> - return;
> -
> - ret = do_get_kexec_buffer(prop, len, &addr, &size);
> - fdt_delprop(fdt, chosen_node, "linux,ima-kexec-buffer");
> - if (ret)
> - return;
> -
> - ret = delete_fdt_mem_rsv(fdt, addr, size);
> - if (!ret)
> - pr_debug("Removed old IMA buffer reservation.\n");
> -}
> -

With this change, if CONFIG_IMA=y but CONFIG_IMA_KEXEC=n, the only
contents of this file will be static int get_addr_size_cells(), which
isn't useful (and will trigger a warning about the function being
unused). You should change the Makefile to build this file only when
CONFIG_IMA_KEXEC=y. Then you can also remove the #ifdef
CONFIG_IMA_KEXEC/#endif within it as well.
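A minimal sketch of the suggested Makefile change (assuming ima.o is currently listed unconditionally in arch/powerpc/kexec/Makefile; the exact surrounding rules may differ in-tree):

```make
# build ima.o only when the IMA kexec buffer support is enabled
obj-$(CONFIG_IMA_KEXEC) += ima.o
```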

>  #ifdef CONFIG_IMA_KEXEC
>  /**
>   * arch_ima_add_kexec_buffer - do arch-specific steps to add the IMA buffer
> @@ -179,7 +81,7 @@ int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node)
>   int ret, addr_cells, size_cells, entry_size;
>   u8 value[16];
>  
> - remove_ima_buffer(fdt, chosen_node);
> + ima_remove_kexec_buffer(fdt, chosen_node);
>   if (!image->arch.ima_buffer_size)
>   return 0;
>  
> @@ -201,7 +103,7 @@ int setup_ima_buffer(const struct kimage *image, void *fdt, int chosen_node)
>   if (ret)
>   return ret;
>  
> - ret = fdt_setprop(fdt, chosen_node, "linux,ima-kexec-buffer", value,
> + ret = fdt_setprop(fdt, chosen_node, FDT_PROP_IMA_KEXEC_BUFFER, value,
> entry_size);
>   if (ret < 0)
>   return -EINVAL;



> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 9e93bef52968..00a60dcc7075 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -223,8 +223,19 @@ extern int crash_exclude_mem_range(struct crash_mem *mem,
>  unsigned long long mend);
>  extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
>  void **addr, 

Re: [PATCH V2 5/5] DO NOT MERGE: iommu: disable list appending in dma-iommu

2020-09-11 Thread Lu Baolu

On 2020/9/9 15:06, Christoph Hellwig wrote:

On Wed, Sep 09, 2020 at 09:43:09AM +0800, Lu Baolu wrote:

+   /*
+* The Intel graphic device driver used to assume that the returned
+* sg list is not combined. This blocks the efforts of converting the

This adds pointless overly long lines.


+* Intel IOMMU driver to dma-iommu api's. Add this quirk to make the
+* device driver work and should be removed once it's fixed in i915
+* driver.
+*/
+   if (dev_is_pci(dev) &&
+   to_pci_dev(dev)->vendor == PCI_VENDOR_ID_INTEL &&
+   (to_pci_dev(dev)->class >> 16) == PCI_BASE_CLASS_DISPLAY) {
+   for_each_sg(sg, s, nents, i) {
+   unsigned int s_iova_off = sg_dma_address(s);
+   unsigned int s_length = sg_dma_len(s);
+   unsigned int s_iova_len = s->length;
+
+   s->offset += s_iova_off;
+   s->length = s_length;
+   sg_dma_address(s) = dma_addr + s_iova_off;
+   sg_dma_len(s) = s_length;
+   dma_addr += s_iova_len;
+   }
+
+   return nents;
+   }


This wants an IS_ENABLED() check.  And probably a pr_once reminding
of the workaround.



Will fix in the next version.

Best regards,
baolu
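The per-segment fixup in the quoted quirk can be modeled in userspace C; `struct seg` and its field names below are illustrative stand-ins for the scatterlist accessors, not the kernel API:

```c
/* Instead of handing the driver merged DMA segments, give each
 * scatterlist entry back its own DMA address and length, advancing by
 * the (possibly padded) IOVA length of each chunk. */
struct seg {
	unsigned int iova_off;		/* offset of the data in its IOVA chunk */
	unsigned int iova_len;		/* padded IOVA length of the chunk */
	unsigned long long dma_addr;	/* filled in by the fixup */
};

static void fixup_segments(struct seg *segs, int nents,
			   unsigned long long dma_addr)
{
	for (int i = 0; i < nents; i++) {
		segs[i].dma_addr = dma_addr + segs[i].iova_off;
		dma_addr += segs[i].iova_len;
	}
}
```

Each entry keeps a distinct address, which is what the i915 driver expects until its assumption about non-combined sg lists is fixed.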


Re: [PATCH v2 2/5] perf record: Prevent override of attr->sample_period for libpfm4 events

2020-09-11 Thread Ian Rogers
On Fri, Sep 11, 2020 at 3:34 PM Ian Rogers  wrote:
>
> On Fri, Sep 4, 2020 at 11:51 AM Arnaldo Carvalho de Melo
>  wrote:
> >
> > Em Fri, Sep 04, 2020 at 03:50:13PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Fri, Sep 04, 2020 at 03:48:03PM -0300, Arnaldo Carvalho de Melo 
> > > escreveu:
> > > > Em Fri, Sep 04, 2020 at 09:22:10AM -0700, Ian Rogers escreveu:
> > > > > On Fri, Sep 4, 2020 at 9:03 AM Jiri Olsa  wrote:
> > > > > > On Thu, Sep 03, 2020 at 10:41:14PM -0700, Ian Rogers wrote:
> > > > > > > On Wed, Jul 29, 2020 at 4:24 PM Ian Rogers  
> > > > > > > wrote:
> > > > > > > > On Tue, Jul 28, 2020 at 9:10 AM Jiri Olsa  
> > > > > > > > wrote:
> > > > > > > > > On Tue, Jul 28, 2020 at 05:59:46PM +0200, Jiri Olsa wrote:
> > > > > > > > > > On Tue, Jul 28, 2020 at 01:57:31AM -0700, Ian Rogers wrote:
> > > > > > > > > [jolsa@krava perf]$ sudo ./perf test 17 -v
> > > > > > > > > 17: Setup struct perf_event_attr  :
> > > >
> > > > > > > > > running './tests/attr/test-record-C0'
> > > > > > > > > expected sample_period=4000, got 3000
> > > > > > > > > FAILED './tests/attr/test-record-C0' - match failure
> > > >
> > > > > > > > I'm not able to reproduce this. Do you have a build 
> > > > > > > > configuration or
> > > > > > > > something else to look at? The test doesn't seem obviously 
> > > > > > > > connected
> > > > > > > > with this patch.
> > > >
> > > > > > > Jiri, any update? Thanks,
> > > >
> > > > > > sorry, I rebased and ran it again and it passes for me now,
> > > > > > so it got fixed along the way
> > > >
> > > > > No worries, thanks for the update! It'd be nice to land this and the
> > > > > other libpfm fixes.
> > > >
> > > > I applied it and it generated this regression:
> > > >
> > > > FAILED '/home/acme/libexec/perf-core/tests/attr/test-record-pfm-period' 
> > > > - match failure
> > > >
> > > > I'll look at the other patches that are pending in this regard to see
> > > > what needs to be squashed so that we don't break bisect.
> > >
> > > So, more context:
> > >
> > > running '/home/acme/libexec/perf-core/tests/attr/test-record-pfm-period'
> > > expected exclude_hv=0, got 1
> > > FAILED '/home/acme/libexec/perf-core/tests/attr/test-record-pfm-period' - 
> > > match failure
> > > test child finished with -1
> > >  end 
> > > Setup struct perf_event_attr: FAILED!
> > > [root@five ~]#
> > >
> > > Ian, can you take a look at this?
> >
> > Further tests I've performed:
> >
> > Committer testing:
> >
> > Not linking with libpfm:
> >
> >   # ldd ~/bin/perf | grep libpfm
> >   #
> >
> > Before:
> >
> >   # perf record -c 1 -e cycles/period=12345/,instructions sleep 
> > 0.0001
> >   [ perf record: Woken up 1 times to write data ]
> >   [ perf record: Captured and wrote 0.052 MB perf.data (258 samples) ]
> >   # perf evlist -v
> >   cycles/period=12345/: size: 120, { sample_period, sample_freq }: 
> > 12345, sample_type: IP|TID|TIME|ID, read_format: ID, disabled: 1, inherit: 
> > 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, 
> > exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
> >   instructions: size: 120, config: 0x1, { sample_period, sample_freq }: 
> > 1, sample_type: IP|TID|TIME|ID, read_format: ID, disabled: 1, inherit: 
> > 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
> >   #
> >
> > After:
> >
> >   #
> >   # perf record -c 1 -e cycles/period=12345/,instructions sleep 
> > 0.0001
> >   [ perf record: Woken up 1 times to write data ]
> >   [ perf record: Captured and wrote 0.053 MB perf.data (284 samples) ]
> >   # perf evlist -v
> >   cycles/period=12345/: size: 120, { sample_period, sample_freq }: 
> > 12345, sample_type: IP|TID|TIME|ID, read_format: ID, disabled: 1, inherit: 
> > 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, 
> > exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
> >   instructions: size: 120, config: 0x1, { sample_period, sample_freq }: 
> > 1, sample_type: IP|TID|TIME|ID, read_format: ID, disabled: 1, inherit: 
> > 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
> >   #
> >
> > Linking with libpfm:
> >
> >   # ldd ~/bin/perf | grep libpfm
> > libpfm.so.4 => /lib64/libpfm.so.4 (0x7f54c7d75000)
> >   #
> >
> >   # perf record -c 1 --pfm-events=cycles:period=7 sleep 1
> >   [ perf record: Woken up 1 times to write data ]
> >   [ perf record: Captured and wrote 0.043 MB perf.data (141 samples) ]
> >   # perf evlist -v
> >   cycles:period=7: size: 120, { sample_period, sample_freq }: 
> > 1, sample_type: IP|TID|TIME, read_format: ID, disabled: 1, inherit: 1, 
> > exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 
> > 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
> >   #
> >
> > After:
> >
> >   # perf record -c 1 

[PATCH v3 2/4] perf record: Prevent override of attr->sample_period for libpfm4 events

2020-09-11 Thread Ian Rogers
From: Stephane Eranian 

Before:
$ perf record -c 1 --pfm-events=cycles:period=7

Would yield a cycles event with period=1, instead of 7.

This was due to an ordering issue between libpfm4 parsing the event
string and perf record initializing the event.

This patch fixes the problem by preventing the override for events with
attr->sample_period != 0 by the time perf_evsel__config() is invoked.
This seems to have been the author's intent.

Signed-off-by: Stephane Eranian 
Reviewed-by: Ian Rogers 
Signed-off-by: Ian Rogers 
---
 tools/perf/util/evsel.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 3e985016da7e..459b51e90063 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -976,8 +976,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 * We default some events to have a default interval. But keep
 * it a weak assumption overridable by the user.
 */
-   if (!attr->sample_period || (opts->user_freq != UINT_MAX ||
-opts->user_interval != ULLONG_MAX)) {
+   if (!attr->sample_period) {
if (opts->freq) {
attr->freq  = 1;
attr->sample_freq   = opts->freq;
-- 
2.28.0.618.gf4bc123cb7-goog
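The weak-default behavior restored by this patch can be modeled with a small helper (the function name is invented for illustration; the kernel-side logic lives in evsel__config()):

```c
/* The tool's default sample period is only a weak default: it applies
 * only when the event did not already carry a period, e.g. one parsed
 * from a libpfm4 event string such as cycles:period=7. */
static unsigned long long effective_period(unsigned long long attr_period,
					   unsigned long long default_period)
{
	return attr_period ? attr_period : default_period;
}
```

With the fix, `--pfm-events=cycles:period=7` keeps its explicit period instead of being overwritten by the record options.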



[PATCH v3 3/4] perf record: Don't clear event's period if set by a term

2020-09-11 Thread Ian Rogers
If events in a group explicitly set a frequency or period with leader
sampling, don't disable the samples on those events.

Prior to 5.8:
perf record -e '{cycles/period=12345000/,instructions/period=6789000/}:S'
would clear the attributes and then apply the config terms. Commit
5f34278867b7 moved the leader-sampling configuration to after the config
terms are applied, which, in the example above, clears the instructions
event's period.
This change makes it so that sampling is only disabled on siblings whose
configuration terms don't set a frequency or period.

Fixes: 5f34278867b7 ("perf evlist: Move leader-sampling configuration")
Signed-off-by: Ian Rogers 
---
 tools/perf/util/record.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index a4cc11592f6b..ea9aa1d7cf50 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -2,6 +2,7 @@
 #include "debug.h"
 #include "evlist.h"
 #include "evsel.h"
+#include "evsel_config.h"
 #include "parse-events.h"
 #include 
 #include 
@@ -33,11 +34,24 @@ static struct evsel *evsel__read_sampler(struct evsel *evsel, struct evlist *evlist)
return leader;
 }
 
+static u64 evsel__config_term_mask(struct evsel *evsel)
+{
+   struct evsel_config_term *term;
+   struct list_head *config_terms = &evsel->config_terms;
+   u64 term_types = 0;
+
+   list_for_each_entry(term, config_terms, list) {
+   term_types |= 1 << term->type;
+   }
+   return term_types;
+}
+
 static void evsel__config_leader_sampling(struct evsel *evsel, struct evlist *evlist)
 {
struct perf_event_attr *attr = &evsel->core.attr;
struct evsel *leader = evsel->leader;
struct evsel *read_sampler;
+   u64 term_types, freq_mask;
 
if (!leader->sample_read)
return;
@@ -47,16 +61,20 @@ static void evsel__config_leader_sampling(struct evsel *evsel, struct evlist *evlist)
if (evsel == read_sampler)
return;
 
+   term_types = evsel__config_term_mask(evsel);
/*
-* Disable sampling for all group members other than the leader in
-* case the leader 'leads' the sampling, except when the leader is an
-* AUX area event, in which case the 2nd event in the group is the one
-* that 'leads' the sampling.
+* Disable sampling for all group members except those with explicit
+* config terms or the leader. In the case of an AUX area event, the 2nd
+* event in the group is the one that 'leads' the sampling.
 */
-   attr->freq   = 0;
-   attr->sample_freq= 0;
-   attr->sample_period  = 0;
-   attr->write_backward = 0;
+   freq_mask = (1 << EVSEL__CONFIG_TERM_FREQ) | (1 << EVSEL__CONFIG_TERM_PERIOD);
+   if ((term_types & freq_mask) == 0) {
+   attr->freq   = 0;
+   attr->sample_freq= 0;
+   attr->sample_period  = 0;
+   }
+   if ((term_types & (1 << EVSEL__CONFIG_TERM_OVERWRITE)) == 0)
+   attr->write_backward = 0;
 
/*
 * We don't get a sample for slave events, we make them when delivering
-- 
2.28.0.618.gf4bc123cb7-goog
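The term-mask test in the patch above boils down to a bitmask check, sketched here in userspace C (the enum values are illustrative; only the bitmask logic mirrors the patch):

```c
/* A sibling event keeps its own sampling setup when it carries an
 * explicit freq or period config term; other terms (e.g. overwrite)
 * don't protect the period. */
enum { TERM_PERIOD = 0, TERM_FREQ = 1, TERM_OVERWRITE = 2 };

static int keeps_own_period(unsigned long long term_types)
{
	unsigned long long freq_mask =
		(1ULL << TERM_FREQ) | (1ULL << TERM_PERIOD);

	return (term_types & freq_mask) != 0;
}
```

This is why, in the group example, instructions/period=6789000/ retains its period while an unadorned sibling still has sampling disabled.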



[PATCH v3 4/4] perf test: Leader sampling shouldn't clear sample period

2020-09-11 Thread Ian Rogers
Add a test that a sibling with leader sampling doesn't have its period
cleared.

Signed-off-by: Ian Rogers 
---
 tools/perf/tests/attr/README |  1 +
 tools/perf/tests/attr/test-record-group2 | 29 
 2 files changed, 30 insertions(+)
 create mode 100644 tools/perf/tests/attr/test-record-group2

diff --git a/tools/perf/tests/attr/README b/tools/perf/tests/attr/README
index 6cd408108595..a36f49fb4dbe 100644
--- a/tools/perf/tests/attr/README
+++ b/tools/perf/tests/attr/README
@@ -49,6 +49,7 @@ Following tests are defined (with perf commands):
   perf record --call-graph fp kill  (test-record-graph-fp)
   perf record --group -e cycles,instructions kill (test-record-group)
   perf record -e '{cycles,instructions}' kill   (test-record-group1)
  perf record -e '{cycles/period=1/,instructions/period=2/}:S' kill (test-record-group2)
   perf record -D kill   (test-record-no-delay)
   perf record -i kill   (test-record-no-inherit)
   perf record -n kill   (test-record-no-samples)
diff --git a/tools/perf/tests/attr/test-record-group2 b/tools/perf/tests/attr/test-record-group2
new file mode 100644
index ..6b9f8d182ce1
--- /dev/null
+++ b/tools/perf/tests/attr/test-record-group2
@@ -0,0 +1,29 @@
+[config]
+command = record
+args= --no-bpf-event -e '{cycles/period=1234000/,instructions/period=6789000/}:S' kill >/dev/null 2>&1
+ret = 1
+
+[event-1:base-record]
+fd=1
+group_fd=-1
+config=0|1
+sample_period=1234000
+sample_type=87
+read_format=12
+inherit=0
+freq=0
+
+[event-2:base-record]
+fd=2
+group_fd=1
+config=0|1
+sample_period=6789000
+sample_type=87
+read_format=12
+disabled=0
+inherit=0
+mmap=0
+comm=0
+freq=0
+enable_on_exec=0
+task=0
-- 
2.28.0.618.gf4bc123cb7-goog



[PATCH v3 1/4] perf record: Set PERF_RECORD_PERIOD if attr->freq is set.

2020-09-11 Thread Ian Rogers
From: David Sharp 

evsel__config() would only set PERF_RECORD_PERIOD if it had itself set
attr->freq from the perf record options; when attr->freq was set by
libpfm events, the bit would not get set. This changes evsel__config()
to check whether attr->freq is set, regardless of whether
evsel__config() changed attr->freq itself.

Signed-off-by: David Sharp 
Signed-off-by: Ian Rogers 
---
 tools/perf/util/evsel.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index fd865002cbbd..3e985016da7e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -979,13 +979,18 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
if (!attr->sample_period || (opts->user_freq != UINT_MAX ||
 opts->user_interval != ULLONG_MAX)) {
if (opts->freq) {
-   evsel__set_sample_bit(evsel, PERIOD);
attr->freq  = 1;
attr->sample_freq   = opts->freq;
} else {
attr->sample_period = opts->default_interval;
}
}
+   /*
+* If attr->freq was set (here or earlier), ask for period
+* to be sampled.
+*/
+   if (attr->freq)
+   evsel__set_sample_bit(evsel, PERIOD);
 
if (opts->no_samples)
attr->sample_freq = 0;
-- 
2.28.0.618.gf4bc123cb7-goog



[PATCH v3 0/4] Fixes for setting event freq/periods

2020-09-11 Thread Ian Rogers
Some fixes that address issues for regular and pfm4 events, with 2
additional perf_event_attr tests. Various authors; note that David
Sharp is no longer at Google.

v3: Moved a loop into a helper, following Adrian Hunter's suggestion.
v2: Corrected the commit message, following Athira Rajeev's suggestion.

David Sharp (1):
  perf record: Set PERF_RECORD_PERIOD if attr->freq is set.

Ian Rogers (2):
  perf record: Don't clear event's period if set by a term
  perf test: Leader sampling shouldn't clear sample period

Stephane Eranian (1):
  perf record: Prevent override of attr->sample_period for libpfm4
events

 tools/perf/tests/attr/README |  1 +
 tools/perf/tests/attr/test-record-group2 | 29 
 tools/perf/util/evsel.c  | 10 ---
 tools/perf/util/record.c | 34 ++--
 4 files changed, 63 insertions(+), 11 deletions(-)
 create mode 100644 tools/perf/tests/attr/test-record-group2

-- 
2.28.0.618.gf4bc123cb7-goog



Re: [PATCH V2 2/5] iommu: Add iommu_dma_free_cpu_cached_iovas function

2020-09-11 Thread Lu Baolu

On 2020/9/9 15:05, Christoph Hellwig wrote:

+static inline void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
+						   struct iommu_domain *domain)


This adds a crazy long line.  Which is rather pointless as other
bits of code in the patch use the more compact two tab indentations
for the prototype continuation lines anyway.



Okay. I will use two tabs instead.

Best regards,
baolu



Re: [PATCH V2 1/3] riscv: Fixup static_obj() fail

2020-09-11 Thread Guo Ren
It comes from mm/usercopy.c:
/* Is this address range in the kernel text area? */
static inline void check_kernel_text_object(const unsigned long ptr,
unsigned long n, bool to_user)
{
unsigned long textlow = (unsigned long)_stext;
unsigned long texthigh = (unsigned long)_etext;
unsigned long textlow_linear, texthigh_linear;

if (overlaps(ptr, n, textlow, texthigh))
usercopy_abort("kernel text", NULL, to_user, ptr - textlow, n);

The __init_text/data areas will be freed after bootup, so I think it should be:
-unsigned long textlow = (unsigned long)_stext;
+unsigned long textlow = (unsigned long)_text;

That means _stext should include the init text/data, and _text should cover only the non-freeable text.


On Sat, Sep 12, 2020 at 5:01 AM Aurelien Jarno  wrote:
>
> Hi,
>
> On 2020-06-27 13:57, guo...@kernel.org wrote:
> > From: Guo Ren 
> >
> > When enable LOCKDEP, static_obj() will cause error. Because some
> > __initdata static variables is before _stext:
> >
> > static int static_obj(const void *obj)
> > {
> > unsigned long start = (unsigned long) &_stext,
> >   end   = (unsigned long) &_end,
> >   addr  = (unsigned long) obj;
> >
> > /*
> >  * static variable?
> >  */
> > if ((addr >= start) && (addr < end))
> > return 1;
> >
> > [0.067192] INFO: trying to register non-static key.
> > [0.067325] the code is fine but needs lockdep annotation.
> > [0.067449] turning off the locking correctness validator.
> > [0.067718] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc7-dirty #44
> > [0.067945] Call Trace:
> > [0.068369] [] walk_stackframe+0x0/0xa4
> > [0.068506] [] show_stack+0x2a/0x34
> > [0.068631] [] dump_stack+0x94/0xca
> > [0.068757] [] register_lock_class+0x5b8/0x5bc
> > [0.068969] [] __lock_acquire+0x6c/0x1d5c
> > [0.069101] [] lock_acquire+0xae/0x312
> > [0.069228] [] _raw_spin_lock_irqsave+0x40/0x5a
> > [0.069357] [] complete+0x1e/0x50
> > [0.069479] [] rest_init+0x1b0/0x28a
> > [0.069660] [] 0xffe016a2
> > [0.069779] [] 0xffe01b84
> > [0.069953] [] 0xffe01092
> >
> > static __initdata DECLARE_COMPLETION(kthreadd_done);
> >
> > noinline void __ref rest_init(void)
> > {
> >   ...
> >   complete(&kthreadd_done);
> >
> > Signed-off-by: Guo Ren 
> > ---
> >  arch/riscv/kernel/vmlinux.lds.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/kernel/vmlinux.lds.S 
> > b/arch/riscv/kernel/vmlinux.lds.S
> > index e6f8016..f3586e3 100644
> > --- a/arch/riscv/kernel/vmlinux.lds.S
> > +++ b/arch/riscv/kernel/vmlinux.lds.S
> > @@ -22,6 +22,7 @@ SECTIONS
> >   /* Beginning of code and text segment */
> >   . = LOAD_OFFSET;
> >   _start = .;
> > + _stext = .;
> >   HEAD_TEXT_SECTION
> >   . = ALIGN(PAGE_SIZE);
> >
> > @@ -54,7 +55,6 @@ SECTIONS
> >   . = ALIGN(SECTION_ALIGN);
> >   .text : {
> >   _text = .;
> > - _stext = .;
> >   TEXT_TEXT
> >   SCHED_TEXT
> >   CPUIDLE_TEXT
>
>
> This patch has been backported to kernel 5.8.4. This causes the kernel
> to crash when trying to execute the init process:
>
> [3.484586] AppArmor: AppArmor sha1 policy hashing enabled
> [4.749835] Freeing unused kernel memory: 492K
> [4.752017] Run /init as init process
> [4.753571] usercopy: Kernel memory overwrite attempt detected to kernel 
> text (offset 507879, size 11)!
> [4.754838] [ cut here ]
> [4.755651] kernel BUG at mm/usercopy.c:99!
> [4.756445] Kernel BUG [#1]
> [4.756815] Modules linked in:
> [4.757542] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.8.0-1-riscv64 #1 
> Debian 5.8.7-1
> [4.758372] epc: ffe0003b5120 ra : ffe0003b5120 sp : 
> ffe07f783ca0
> [4.758960]  gp : ffe000cc7230 tp : ffe07f77cec0 t0 : 
> ffe000cdafc0
> [4.759772]  t1 : 0064 t2 :  s0 : 
> ffe07f783cf0
> [4.760534]  s1 : ffe00095d780 a0 : 005b a1 : 
> 0020
> [4.761309]  a2 : 0005 a3 :  a4 : 
> ffe000c1f340
> [4.761848]  a5 : ffe000c1f340 a6 :  a7 : 
> 0087
> [4.762684]  s2 : ffe000941848 s3 : 0007bfe7 s4 : 
> 000b
> [4.763500]  s5 :  s6 : ffe00091cc00 s7 : 
> f000
> [4.764376]  s8 : 003ff000 s9 : ffe0769f3200 s10: 
> 000b
> [4.765208]  s11: ffe07d548c40 t3 :  t4 : 
> 0001dcd0
> [4.766059]  t5 : ffe000cc8510 t6 : ffe000cd64aa
> [4.766712] status: 0120 badaddr:  cause: 
> 0003
> [4.768308] ---[ end trace 1f8e733e834d4c3e ]---
> [4.769129] Kernel 

Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands

2020-09-11 Thread Damien Le Moal
On 2020/09/12 8:07, Borislav Petkov wrote:
> On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
>> Enabling it, fixes the issue.
> 
> Btw, I just hit the below warn with 5.8, while booting with the above
> config option enabled. Looks familiar and I didn't trigger it with
> 5.9-rc4+ so you guys either fixed it or something changed in-between:
> 
> [5.124321] ata4.00: NCQ Send/Recv Log not supported
> [5.131484] ata4.00: configured for UDMA/133
> [5.135847] scsi 3:0:0:0: Direct-Access ATA  ST8000AS0022-1WL SN01 
> PQ: 0 ANSI: 5
> [5.143972] sd 3:0:0:0: Attached scsi generic sg1 type 0
> [5.144033] sd 3:0:0:0: [sdb] Host-aware zoned block device
> [5.177105] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 
> TB/7.28 TiB)
> [5.184880] sd 3:0:0:0: [sdb] 4096-byte physical blocks
> [5.190084] sd 3:0:0:0: [sdb] 29808 zones of 524288 logical blocks + 1 
> runt zone
> [5.197439] sd 3:0:0:0: [sdb] Write Protect is off
> [5.202220] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [5.207260] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, 
> doesn't support DPO or FUA
> [5.356631]  sdb: sdb1
> [5.359014] sdb: disabling host aware zoned block device support due to 
> partitions
> [5.389941] [ cut here ]
> [5.394557] WARNING: CPU: 8 PID: 164 at block/blk-settings.c:236 
> blk_queue_max_zone_append_sectors+0x12/0x40
> [5.404300] Modules linked in:
> [5.407365] CPU: 8 PID: 164 Comm: kworker/u32:6 Not tainted 5.8.0 #7
> [5.413682] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 
> GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
> [5.424191] Workqueue: events_unbound async_run_entry_fn
> [5.429482] RIP: 0010:blk_queue_max_zone_append_sectors+0x12/0x40
> [5.435543] Code: fe 0f 00 00 53 48 89 fb 0f 86 3d 07 00 00 48 89 b3 e0 03 
> 00 00 5b c3 90 0f 1f 44 00 00 8b 87 40 04 00 00 ff c8 83 f8 01 76 03 <0f> 0b 
> c3 8b 87 f8 03 00 00 39 87 f0 03 00 00 0f 46 87 f0 03 00 00
> [5.454099] RSP: 0018:c9697c60 EFLAGS: 00010282
> [5.459306] RAX:  RBX: 8887fa0a9400 RCX: 
> 
> [5.466390] RDX: 8887faf0d400 RSI: 0540 RDI: 
> 8887f0dde6c8
> [5.473474] RBP: 7471 R08: 001d1c40 R09: 
> 8887fee29ad0
> [5.480559] R10: 0001434bac00 R11: 00358275 R12: 
> 0008
> [5.487643] R13: 8887f0dde6c8 R14: 8887fa0a9738 R15: 
> 
> [5.494726] FS:  () GS:8887fee0() 
> knlGS:
> [5.502757] CS:  0010 DS:  ES:  CR0: 80050033
> [5.508474] CR2:  CR3: 02209000 CR4: 
> 003406e0
> [5.515558] Call Trace:
> [5.518026]  sd_zbc_read_zones+0x323/0x480
> [5.522122]  sd_revalidate_disk+0x122b/0x2000
> [5.526472]  ? __device_add_disk+0x2f7/0x4e0
> [5.530738]  sd_probe+0x347/0x44b
> [5.534058]  really_probe+0x2c4/0x3f0
> [5.537720]  driver_probe_device+0xe1/0x150
> [5.541902]  ? driver_allows_async_probing+0x50/0x50
> [5.546852]  bus_for_each_drv+0x6a/0xa0
> [5.550683]  __device_attach_async_helper+0x8c/0xd0
> [5.47]  async_run_entry_fn+0x4a/0x180
> [5.559636]  process_one_work+0x1a5/0x3a0
> [5.563637]  worker_thread+0x50/0x3a0
> [5.567300]  ? process_one_work+0x3a0/0x3a0
> [5.571480]  kthread+0x117/0x160
> [5.574715]  ? kthread_park+0x90/0x90
> [5.578377]  ret_from_fork+0x22/0x30
> [5.581960] ---[ end trace 94141003236730cf ]---
> [5.586578] sd 3:0:0:0: [sdb] Attached SCSI disk
> [6.186783] ata5: failed to resume link (SControl 0)
> [6.191818] ata5: SATA link down (SStatus 0 SControl 0)
> 

Can you try this:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 95018e650f2d..620539162ef1 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2968,8 +2968,13 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
} else {
sdkp->zoned = (buffer[8] >> 4) & 3;
if (sdkp->zoned == 1 && !disk_has_partitions(sdkp->disk)) {
+#ifdef CONFIG_BLK_DEV_ZONED
/* Host-aware */
q->limits.zoned = BLK_ZONED_HA;
+#else
+   /* Host-aware drive is treated as a regular disk */
+   q->limits.zoned = BLK_ZONED_NONE;
+#endif
} else {
/*
 * Treat drive-managed devices and host-aware devices
@@ -3404,12 +3409,12 @@ static int sd_probe(struct device *dev)
sdkp->first_scan = 1;
sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;

+   sd_revalidate_disk(gd);
+
error = sd_zbc_init_disk(sdkp);
if (error)
goto out_free_index;

-   sd_revalidate_disk(gd);
-
gd->flags = GENHD_FL_EXT_DEVT;
if (sdp->removable) {

Re: [PATCH v5 01/10] fs/ntfs3: Add headers and misc files

2020-09-11 Thread Joe Perches
On Fri, 2020-09-11 at 17:10 +0300, Konstantin Komarov wrote:
> This adds headers and misc files

The code may be ok, but its cosmetics are poor.

> diff --git a/fs/ntfs3/debug.h b/fs/ntfs3/debug.h
[]
> +#define QuadAlign(n) (((n) + 7u) & (~7u))
> +#define IsQuadAligned(n) (!((size_t)(n)&7u))
> +#define Quad2Align(n) (((n) + 15u) & (~15u))
> +#define IsQuad2Aligned(n) (!((size_t)(n)&15u))
> +#define Quad4Align(n) (((n) + 31u) & (~31u))
> +#define IsSizeTAligned(n) (!((size_t)(n) & (sizeof(size_t) - 1)))
> +#define DwordAlign(n) (((n) + 3u) & (~3u))
> +#define IsDwordAligned(n) (!((size_t)(n)&3u))
> +#define WordAlign(n) (((n) + 1u) & (~1u))
> +#define IsWordAligned(n) (!((size_t)(n)&1u))

All of these could use column alignment.
IsSizeTAligned could use the kernel's IS_ALIGNED.

#define QuadAlign(n)        (((n) + 7u) & (~7u))
#define IsQuadAligned(n)    (!((size_t)(n) & 7u))
#define Quad2Align(n)       (((n) + 15u) & (~15u))
#define IsQuad2Aligned(n)   (!((size_t)(n) & 15u))

Though all of these could also use two macros with
the same form.  Something like:

#define NTFS3_ALIGN(n, at)  (((n) + ((at) - 1)) & (~((at) - 1)))
#define NTFS3_IS_ALIGNED(n, at) (!((size_t)(n) & ((at) - 1)))

So all of these could be ordered by size and use the actual size:

#define WordAlign(n)        NTFS3_ALIGN(n, 2)
#define IsWordAligned(n)    NTFS3_IS_ALIGNED(n, 2)
#define DwordAlign(n)       NTFS3_ALIGN(n, 4)
#define IsDwordAligned(n)   NTFS3_IS_ALIGNED(n, 4)
#define QuadAlign(n)        NTFS3_ALIGN(n, 8)
#define IsQuadAligned(n)    NTFS3_IS_ALIGNED(n, 8)
#define Quad2Align(n)       NTFS3_ALIGN(n, 16)
#define IsQuad2Aligned(n)   NTFS3_IS_ALIGNED(n, 16)

#define IsSizeTAligned(n)   NTFS3_IS_ALIGNED(n, sizeof(size_t))


> +#ifdef CONFIG_PRINTK
> +__printf(2, 3) void ntfs_printk(const struct super_block *sb, const char *fmt,
> +				...);
> + ...);

Better would be

__printf(2, 3)
void ntfs_printk(const struct super_block *sb, const char *fmt, ...);

etc...

There's a lot of code that could be made more readable for a human.





Re: [BUGFIX PATCH] kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

2020-09-11 Thread Masami Hiramatsu
Hi Ingo,

Could you also pick this fix to fix the reproducible warning?

Thank you,

On Tue,  1 Sep 2020 00:12:07 +0900
Masami Hiramatsu  wrote:

> Commit 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at
> kprobe_ftrace_handler") fixed one bug but did not completely fix the
> issue. If we run the kprobe_module.tc test of ftracetest, the kernel
> shows a warning as below.
> 
> 
> # ./ftracetest test.d/kprobe/kprobe_module.tc
> === Ftrace unit tests ===
> [1] Kprobe dynamic event - probing module
> ...
> [   22.400215] [ cut here ]
> [   22.400962] Failed to disarm kprobe-ftrace at 
> trace_printk_irq_work+0x0/0x7e [trace_printk] (-2)
> [   22.402139] WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 
> __disarm_kprobe_ftrace.isra.0+0x7e/0xa0
> [   22.403358] Modules linked in: trace_printk(-)
> [   22.404028] CPU: 7 PID: 200 Comm: rmmod Not tainted 5.9.0-rc2+ #66
> [   22.404870] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.13.0-1ubuntu1 04/01/2014
> [   22.406139] RIP: 0010:__disarm_kprobe_ftrace.isra.0+0x7e/0xa0
> [   22.406947] Code: 30 8b 03 eb c9 80 3d e5 09 1f 01 00 75 dc 49 8b 34 24 89 
> c2 48 c7 c7 a0 c2 05 82 89 45 e4 c6 05 cc 09 1f 01 01 e8 a9 c7 f0 ff <0f> 0b 
> 8b 45 e4 eb b9 89 c6 48 c7 c7 70 c2 05 82 89 45 e4 e8 91 c7
> [   22.409544] RSP: 0018:c9237df0 EFLAGS: 00010286
> [   22.410385] RAX:  RBX: 83066024 RCX: 
> 
> [   22.411434] RDX: 0001 RSI: 810de8d3 RDI: 
> 810de8d3
> [   22.412687] RBP: c9237e10 R08: 0001 R09: 
> 0001
> [   22.413762] R10:  R11: 0001 R12: 
> 88807c478640
> [   22.414852] R13: 8235ebc0 R14: a00060c0 R15: 
> 
> [   22.415941] FS:  019d48c0() GS:88807d7c() 
> knlGS:
> [   22.417264] CS:  0010 DS:  ES:  CR0: 80050033
> [   22.418176] CR2: 005bb7e3 CR3: 78f7a000 CR4: 
> 06a0
> [   22.419309] Call Trace:
> [   22.419990]  kill_kprobe+0x94/0x160
> [   22.420652]  kprobes_module_callback+0x64/0x230
> [   22.421470]  notifier_call_chain+0x4f/0x70
> [   22.422184]  blocking_notifier_call_chain+0x49/0x70
> [   22.422979]  __x64_sys_delete_module+0x1ac/0x240
> [   22.423733]  do_syscall_64+0x38/0x50
> [   22.424366]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   22.425176] RIP: 0033:0x4bb81d
> [   22.425741] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 
> f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 
> 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
> [   22.428726] RSP: 002b:7ffc70fef008 EFLAGS: 0246 ORIG_RAX: 
> 00b0
> [   22.430169] RAX: ffda RBX: 019d48a0 RCX: 
> 004bb81d
> [   22.431375] RDX:  RSI: 0880 RDI: 
> 7ffc70fef028
> [   22.432543] RBP: 0880 R08:  R09: 
> 7ffc70fef320
> [   22.433692] R10: 00656300 R11: 0246 R12: 
> 7ffc70fef028
> [   22.434635] R13:  R14: 0002 R15: 
> 
> [   22.435682] irq event stamp: 1169
> [   22.436240] hardirqs last  enabled at (1179): [] 
> console_unlock+0x422/0x580
> [   22.437466] hardirqs last disabled at (1188): [] 
> console_unlock+0x7b/0x580
> [   22.438608] softirqs last  enabled at (866): [] 
> __do_softirq+0x38e/0x490
> [   22.439637] softirqs last disabled at (859): [] 
> asm_call_on_stack+0x12/0x20
> [   22.440690] ---[ end trace 1e7ce7e1e4567276 ]---
> [   22.472832] trace_kprobe: This probe might be able to register after 
> target module is loaded. Continue.
> 
> 
> This is because the kill_kprobe() calls disarm_kprobe_ftrace() even
> if the given probe is not enabled. In that case, ftrace_set_filter_ip()
> fails because the given probe point is not registered to ftrace.
> 
> Fix to check the given (going) probe is enabled before invoking
> disarm_kprobe_ftrace().
> 
> Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at 
> kprobe_ftrace_handler")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Masami Hiramatsu 
> ---
>  kernel/kprobes.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..d43b48ecdb4f 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -2159,9 +2159,10 @@ static void kill_kprobe(struct kprobe *p)
>  
>   /*
>* The module is going away. We should disarm the kprobe which
> -  * is using ftrace.
> +  * is using ftrace, because ftrace framework is still available at
> +  * MODULE_STATE_GOING notification.
>*/
> - if (kprobe_ftrace(p))
> + if (kprobe_ftrace(p) && !kprobe_disabled(p) && !kprobes_all_disarmed)
>   disarm_kprobe_ftrace(p);
>  }
>  
> 


-- 
Masami Hiramatsu 


Re: [PATCH v3 1/2] leds: is31fl319x: Add shutdown pin and generate a 5ms low pulse when startup

2020-09-11 Thread Grant Feng

Thanks for the info.

Best regards,

                                                Grant

On 2020-09-09 17:18, Pavel Machek wrote:

On Tue 2020-08-25 16:22:05, Grant Feng wrote:

Generate a 5ms low pulse on the shutdown pin at startup; the chip then
becomes more stable in a complex EM environment.

Thanks, I applied the series.

Best regards,
Pavel





Re: [PATCH 5.8 00/16] 5.8.9-rc1 review

2020-09-11 Thread Guenter Roeck
On Fri, Sep 11, 2020 at 02:47:17PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.8.9 release.
> There are 16 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +.
> Anything received after that time might be too late.
> 

Build results:
total: 154 pass: 154 fail: 0
Qemu test results:
total: 430 pass: 430 fail: 0

Tested-by: Guenter Roeck 

Guenter


Re: [PATCH 4.14 00/12] 4.14.198-rc1 review

2020-09-11 Thread Guenter Roeck
On Fri, Sep 11, 2020 at 02:46:54PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.198 release.
> There are 12 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +.
> Anything received after that time might be too late.
> 

Build results:
total: 171 pass: 171 fail: 0
Qemu test results:
total: 408 pass: 408 fail: 0

Oh, and I keep forgetting the formal:

Tested-by: Guenter Roeck 

Please consider that to be given for all test reports even
if not explicit.

Thanks,
Guenter


[PATCH net-next] drivers/net/wan/x25_asy: Remove an unnecessary x25_type_trans call

2020-09-11 Thread Xie He
x25_type_trans only needs to be called before we call netif_rx to pass
the skb to upper layers.

It does not need to be called before lapb_data_received. The LAPB module
does not need the fields that are set by calling it.

In the other two X.25 drivers - lapbether and hdlc_x25 - x25_type_trans
is only called before netif_rx, not before lapb_data_received.

Cc: Martin Schiller 
Signed-off-by: Xie He 
---
 drivers/net/wan/x25_asy.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wan/x25_asy.c b/drivers/net/wan/x25_asy.c
index 5a7cf8bf9d0d..ab56a5e6447a 100644
--- a/drivers/net/wan/x25_asy.c
+++ b/drivers/net/wan/x25_asy.c
@@ -202,8 +202,7 @@ static void x25_asy_bump(struct x25_asy *sl)
return;
}
skb_put_data(skb, sl->rbuff, count);
-   skb->protocol = x25_type_trans(skb, sl->dev);
-   err = lapb_data_received(skb->dev, skb);
+   err = lapb_data_received(sl->dev, skb);
if (err != LAPB_OK) {
kfree_skb(skb);
printk(KERN_DEBUG "x25_asy: data received err - %d\n", err);
-- 
2.25.1



Re: [PATCH 4.9 00/71] 4.9.236-rc1 review

2020-09-11 Thread Guenter Roeck
On Fri, Sep 11, 2020 at 02:45:44PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.236 release.
> There are 71 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +.
> Anything received after that time might be too late.
> 

Build results:
total: 171 pass: 171 fail: 0
Qemu test results:
total: 386 pass: 386 fail: 0

Guenter


Re: [PATCH 4.4 00/62] 4.4.236-rc1 review

2020-09-11 Thread Guenter Roeck
On Fri, Sep 11, 2020 at 02:45:43PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.236 release.
> There are 62 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sun, 13 Sep 2020 12:24:42 +.
> Anything received after that time might be too late.
> 

Build results:
total: 168 pass: 168 fail: 0
Qemu test results:
total: 332 pass: 332 fail: 0

Guenter


Re: [PATCH v18 00/32] per memcg lru_lock: reviews

2020-09-11 Thread Hugh Dickins
On Fri, 11 Sep 2020, Alex Shi wrote:
> On 2020/9/10 at 7:16 AM, Hugh Dickins wrote:
> > On Wed, 9 Sep 2020, Alex Shi wrote:
> >> On 2020/9/9 at 7:41 AM, Hugh Dickins wrote:
> >>>
> >>> [PATCH v18 05/32] mm/thp: remove code path which never got into
> >>> This is a good simplification, but I see no sign that you understand
> >>> why it's valid: it relies on lru_add_page_tail() being called while
> >>> head refcount is frozen to 0: we would not get this far if someone
> >>> else holds a reference to the THP - which they must hold if they have
> >>> isolated the page from its lru (and that's true before or after your
> >>> per-memcg changes - but even truer after those changes, since PageLRU
> >>> can then be flipped without lru_lock at any instant): please explain
> >>> something of this in the commit message.
> >>
> >> Is the following commit log better?
> >>
> >> split_huge_page() will never be called on a page which isn't on an lru
> >> list, so this code never got a chance to run, and should not run, to
> >> add tail pages to an lru list whose head page isn't there.
> >>
> >> Hugh Dickins mentioned:
> >> The path should never be called since lru_add_page_tail() being called
> >> while head refcount is frozen to 0: we would not get this far if 
> >> someone
> >> else holds a reference to the THP - which they must hold if they have
> >> isolated the page from its lru.
> >>
> >> Although the bug was never triggered, it'd better be removed for code
> >> correctness, with a warning added for unexpected calls.
> > 
> > Not much better, no.  split_huge_page() can easily be called for a page
> > which is not on the lru list at the time, 
> 
> Hi Hugh,
> 
> Thanks for comments!
> 
> There are some discussion on this point a couple of weeks ago,
> https://lkml.org/lkml/2020/7/9/760
> 
> Matthew Wilcox and Kirill have the following comments,
> > I don't understand how we get to split_huge_page() with a page that's
> > not on an LRU list.  Both anonymous and page cache pages should be on
> > an LRU list.  What am I missing?
> 
> Right, and it never got removed from the LRU during the split. The tail
> pages have to be added to the LRU because they are now separate from the
> head page.
> 
> -- 
>  Kirill A. Shutemov

Yes, those were among the mails that I read through before getting
down to review.  I was surprised by their not understanding, but
it was a bit late to reply to that thread.

Perhaps everybody had been focused on pages which have been and
naturally belong on an LRU list, rather than pages which are on
the LRU list at the instant that split_huge_page() is called.

There are a number of places where PageLRU gets cleared, and a
number of places where we del_page_from_lru_list(), I think you'll
agree: your patches touch all or most of them.  Let's think of a
common one, isolate_lru_pages() used by page reclaim, but the same
would apply to most of the others.

Then there a number of places where split_huge_page() is called:
I am having difficulty finding any of those which cannot race with
page reclaim, but shall we choose anon THP's deferred_split_scan(),
or shmem THP's shmem_punch_compound()?

What prevents either of those from calling split_huge_page() at
a time when isolate_lru_pages() has removed the page from LRU?

But there's no problem in this race, because anyone isolating the
page from LRU must hold their own reference to the page (to prevent
it from being freed independently), and the can_split_huge_page() or
page_ref_freeze() in split_huge_page_to_list() will detect that and
fail the split with -EBUSY (or else succeed and prevent new references
from being acquired).  So this case never reaches lru_add_page_tail().

> 
> > and I don't know what was the
> > bug which was never triggered.  
> 
> So the only path to the removed part should be a bug, like something here,
> https://lkml.org/lkml/2020/7/10/118
> or
> https://lkml.org/lkml/2020/7/10/972

Oh, the use of split_huge_page() in __iommu_dma_alloc_pages() is just
nonsense, I thought it had already been removed - perhaps some debate
over __GFP_COMP held it up.  Not something you need worry about in
this patchset.

> 
> > Stick with whatever text you end up with
> > for the combination of 05/32 and 18/32, and I'll rewrite it after.
> 
> I don't object to merging them into one; I just don't know how to
> describe the 2 patches clearly in one commit log. As for patch 18,
> TestClearPageLRU adds the possibility of incorrectly removing the lru
> bit during split; that's the reason for the code path rewrite and the
> WARN there.

I did not know that was why you were putting 18/32 in at that
point, it does not mention TestClearPageLRU at all.  But the fact
remains that it's a nice cleanup, contains a reassuring WARN if we
got it wrong (and I've suggested a WARN on the other branch too),
it was valid before your changes, and it's valid after your changes.
Please merge it back into the uglier 05/32, and again I'll rewrite
whatever comment you come up with if necessary.

> > 
> >>> 

Re: [RFC/RFT PATCH v2 3/5] riscv: Separate memory init from paging init

2020-09-11 Thread Greentime Hu
Atish Patra wrote on Sat, Sep 12, 2020 at 9:34 AM:
>
> Currently, we perform some memory init functions in paging init. But,
> that will be an issue for NUMA support where DT needs to be flattened
> before numa initialization and memblock_present can only be called
> after numa initialization.
>
> Move memory initialization related functions to a separate function.
>
> Signed-off-by: Atish Patra 
> ---
>  arch/riscv/include/asm/pgtable.h | 1 +
>  arch/riscv/kernel/setup.c| 1 +
>  arch/riscv/mm/init.c | 6 +-
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h 
> b/arch/riscv/include/asm/pgtable.h
> index eaea1f717010..515b42f98d34 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -466,6 +466,7 @@ static inline void __kernel_map_pages(struct page *page, 
> int numpages, int enabl
>  extern void *dtb_early_va;
>  void setup_bootmem(void);
>  void paging_init(void);
> +void misc_mem_init(void);
>
>  #define FIRST_USER_ADDRESS  0
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 2c6dd329312b..07fa6d13367e 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -78,6 +78,7 @@ void __init setup_arch(char **cmdline_p)
>  #else
> unflatten_device_tree();
>  #endif
> +   misc_mem_init();
>
>  #ifdef CONFIG_SWIOTLB
> swiotlb_init(1);
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 188281fc2816..8f31a5428ce4 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -568,8 +568,12 @@ static void __init resource_init(void)
>  void __init paging_init(void)
>  {
> setup_vm_final();
> -   sparse_init();
> setup_zero_page();
> +}
> +
> +void __init misc_mem_init(void)
> +{
> +   sparse_init();
> zone_sizes_init();
> resource_init();
>  }

Thank you, Atish.
Reviewed-by: Greentime Hu 


Re: [PATCH v7 5/9] RISC-V: Add PE/COFF header for EFI stub

2020-09-11 Thread Atish Patra
On Fri, Sep 11, 2020 at 6:09 AM Ard Biesheuvel  wrote:
>
> On Fri, 28 Aug 2020 at 20:20, Atish Patra  wrote:
> >
> > Linux kernel Image can appear as an EFI application With appropriate
> > PE/COFF header fields in the beginning of the Image header. An EFI
> > application loader can directly load a Linux kernel Image and an EFI
> > stub residing in kernel can boot Linux kernel directly.
> >
> > Add the necessary PE/COFF header.
> >
> > Signed-off-by: Atish Patra 
> > Link: https://lore.kernel.org/r/2020042106.9663-3-atish.pa...@wdc.com
> > [ardb: - use C prefix for c.li to ensure the expected opcode is emitted
> >- align all image sections according to PE/COFF section alignment ]
> > Signed-off-by: Ard Biesheuvel 
> > Reviewed-by: Anup Patel 
>
> Since you need to respin this anyway, one comment below on a thing
> that I spotted while revisiting these patches.
>
> > ---
> >  arch/riscv/include/asm/sections.h |  13 
> >  arch/riscv/kernel/Makefile|   4 ++
> >  arch/riscv/kernel/efi-header.S| 104 ++
> >  arch/riscv/kernel/head.S  |  16 +
> >  arch/riscv/kernel/image-vars.h|  51 +++
> >  arch/riscv/kernel/vmlinux.lds.S   |  22 ++-
> >  6 files changed, 208 insertions(+), 2 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/sections.h
> >  create mode 100644 arch/riscv/kernel/efi-header.S
> >  create mode 100644 arch/riscv/kernel/image-vars.h
> >
> > diff --git a/arch/riscv/include/asm/sections.h 
> > b/arch/riscv/include/asm/sections.h
> > new file mode 100644
> > index ..3a9971b1210f
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/sections.h
> > @@ -0,0 +1,13 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2020 Western Digital Corporation or its affiliates.
> > + */
> > +#ifndef __ASM_SECTIONS_H
> > +#define __ASM_SECTIONS_H
> > +
> > +#include 
> > +
> > +extern char _start[];
> > +extern char _start_kernel[];
> > +
> > +#endif /* __ASM_SECTIONS_H */
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index dc93710f0b2f..41e3895a3192 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -31,6 +31,10 @@ obj-y+= cacheinfo.o
> >  obj-y  += patch.o
> >  obj-$(CONFIG_MMU) += vdso.o vdso/
> >
> > +OBJCOPYFLAGS := --prefix-symbols=__efistub_
> > +$(obj)/%.stub.o: $(obj)/%.o FORCE
> > +   $(call if_changed,objcopy)
> > +
> >  obj-$(CONFIG_RISCV_M_MODE) += traps_misaligned.o
> >  obj-$(CONFIG_FPU)  += fpu.o
> >  obj-$(CONFIG_SMP)  += smpboot.o
> > diff --git a/arch/riscv/kernel/efi-header.S b/arch/riscv/kernel/efi-header.S
> > new file mode 100644
> > index ..822b4c9ff2bb
> > --- /dev/null
> > +++ b/arch/riscv/kernel/efi-header.S
> > @@ -0,0 +1,104 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2020 Western Digital Corporation or its affiliates.
> > + * Adapted from arch/arm64/kernel/efi-header.S
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +   .macro  __EFI_PE_HEADER
> > +   .long   PE_MAGIC
> > +coff_header:
> > +#ifdef CONFIG_64BIT
> > +   .short  IMAGE_FILE_MACHINE_RISCV64  // Machine
> > +#else
> > +   .short  IMAGE_FILE_MACHINE_RISCV32  // Machine
> > +#endif
> > +   .short  section_count   // NumberOfSections
> > +   .long   0   // TimeDateStamp
> > +   .long   0   // 
> > PointerToSymbolTable
> > +   .long   0   // NumberOfSymbols
> > +   .short  section_table - optional_header // 
> > SizeOfOptionalHeader
> > +   .short  IMAGE_FILE_DEBUG_STRIPPED | \
> > +   IMAGE_FILE_EXECUTABLE_IMAGE | \
> > +   IMAGE_FILE_LINE_NUMS_STRIPPED   // Characteristics
> > +
> > +optional_header:
> > +   .short  PE_OPT_MAGIC_PE32PLUS   // PE32+ format
>
> Are you sure both riscv32 and riscv64 use PE32+? IIUC, 32-bit
> architectures use PE32 not PE32+ (but I could be wrong)
>

Ahh yes. You are correct. Thanks for noticing it.
I just followed the U-Boot implementation [1]. I will update this in
the next revision and update the U-Boot code as well.

As per the specification, we also need to add a BaseOfData[2] entry for RV32.
Do any of the EFI application loaders actually use BaseOfData, or can
we set it to zero for RV32?

[1] 
https://gitlab.denx.de/u-boot/custodians/u-boot-amlogic/-/blob/u-boot-amlogic/arch/riscv/lib/crt0_riscv_efi.S#L51
[2] https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

> > +   .byte   0x02// 
> > MajorLinkerVersion
> > +   .byte   0x14// 
> > MinorLinkerVersion
> > +   .long   __pecoff_text_end - efi_header_end  // SizeOfCode
> > +   .long   

[PATCH] MIPS: Remove unused BOOT_MEM_INIT_RAM

2020-09-11 Thread Youling Tang
Commit a94e4f24ec83 ("MIPS: init: Drop boot_mem_map") left
the BOOT_MEM_INIT_RAM unused, remove it.

Signed-off-by: Youling Tang 
---
 arch/mips/include/asm/bootinfo.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
index 147c932..39196ae 100644
--- a/arch/mips/include/asm/bootinfo.h
+++ b/arch/mips/include/asm/bootinfo.h
@@ -91,7 +91,6 @@ extern unsigned long mips_machtype;
 #define BOOT_MEM_RAM   1
 #define BOOT_MEM_ROM_DATA  2
 #define BOOT_MEM_RESERVED  3
-#define BOOT_MEM_INIT_RAM  4
 #define BOOT_MEM_NOMAP 5
 
 extern void add_memory_region(phys_addr_t start, phys_addr_t size, long type);
-- 
2.1.0



[PATCH] MIPS: netlogic: Remove unused code

2020-09-11 Thread Youling Tang
Remove some unused code.

Signed-off-by: Youling Tang 
---
 arch/mips/include/asm/netlogic/psb-bootinfo.h | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/arch/mips/include/asm/netlogic/psb-bootinfo.h 
b/arch/mips/include/asm/netlogic/psb-bootinfo.h
index 6878307..272544b 100644
--- a/arch/mips/include/asm/netlogic/psb-bootinfo.h
+++ b/arch/mips/include/asm/netlogic/psb-bootinfo.h
@@ -77,21 +77,6 @@ struct psb_info {
uint64_t avail_mem_map;
 };
 
-enum {
-   NETLOGIC_IO_SPACE = 0x10,
-   PCIX_IO_SPACE,
-   PCIX_CFG_SPACE,
-   PCIX_MEMORY_SPACE,
-   HT_IO_SPACE,
-   HT_CFG_SPACE,
-   HT_MEMORY_SPACE,
-   SRAM_SPACE,
-   FLASH_CONTROLLER_SPACE
-};
-
-#define NLM_MAX_ARGS   64
-#define NLM_MAX_ENVS   32
-
 /* This is what netlboot passes and linux boot_mem_map is subtly different */
 #define NLM_BOOT_MEM_MAP_MAX   32
 struct nlm_boot_mem_map {
-- 
2.1.0



[RFC/RFT PATCH v2 3/5] riscv: Separate memory init from paging init

2020-09-11 Thread Atish Patra
Currently, we perform some memory init functions in paging init. That
will be an issue for NUMA support, where the DT needs to be flattened
before NUMA initialization, while memblock_present can only be called
after NUMA initialization.

Move the memory initialization related functions to a separate function.

Signed-off-by: Atish Patra 
---
 arch/riscv/include/asm/pgtable.h | 1 +
 arch/riscv/kernel/setup.c| 1 +
 arch/riscv/mm/init.c | 6 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index eaea1f717010..515b42f98d34 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -466,6 +466,7 @@ static inline void __kernel_map_pages(struct page *page, 
int numpages, int enabl
 extern void *dtb_early_va;
 void setup_bootmem(void);
 void paging_init(void);
+void misc_mem_init(void);
 
 #define FIRST_USER_ADDRESS  0
 
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 2c6dd329312b..07fa6d13367e 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -78,6 +78,7 @@ void __init setup_arch(char **cmdline_p)
 #else
unflatten_device_tree();
 #endif
+   misc_mem_init();
 
 #ifdef CONFIG_SWIOTLB
swiotlb_init(1);
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 188281fc2816..8f31a5428ce4 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -568,8 +568,12 @@ static void __init resource_init(void)
 void __init paging_init(void)
 {
setup_vm_final();
-   sparse_init();
setup_zero_page();
+}
+
+void __init misc_mem_init(void)
+{
+   sparse_init();
zone_sizes_init();
resource_init();
 }
-- 
2.24.0



[RFC/RFT PATCH v2 4/5] riscv: Add support pte_protnone and pmd_protnone if CONFIG_NUMA_BALANCING

2020-09-11 Thread Atish Patra
From: Greentime Hu 

These two functions are used to distinguish between PROT_NONE
protections and NUMA hinting fault protections.

Signed-off-by: Greentime Hu 
---
 arch/riscv/include/asm/pgtable.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 515b42f98d34..2751110675e6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -183,6 +183,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
return (unsigned long)pfn_to_virt(pmd_val(pmd) >> _PAGE_PFN_SHIFT);
 }
 
+static inline pte_t pmd_pte(pmd_t pmd)
+{
+   return __pte(pmd_val(pmd));
+}
+
 /* Yields the page frame number (PFN) of a page table entry */
 static inline unsigned long pte_pfn(pte_t pte)
 {
@@ -286,6 +291,21 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+/*
+ * See the comment in include/asm-generic/pgtable.h
+ */
+static inline int pte_protnone(pte_t pte)
+{
+   return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == 
_PAGE_PROT_NONE;
+}
+
+static inline int pmd_protnone(pmd_t pmd)
+{
+   return pte_protnone(pmd_pte(pmd));
+}
+#endif
+
 /* Modify page protection bits */
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-- 
2.24.0



Re: inconsistent lock state in sco_conn_del

2020-09-11 Thread syzbot
syzbot has found a reproducer for the following issue on:

HEAD commit:e8878ab8 Merge tag 'spi-fix-v5.9-rc4' of git://git.kernel...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1213075990
kernel config:  https://syzkaller.appspot.com/x/.config?x=c61610091f4ca8c4
dashboard link: https://syzkaller.appspot.com/bug?extid=65684128cd7c35bc66a1
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=121ef0fd90
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16c3a85390

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+65684128cd7c35bc6...@syzkaller.appspotmail.com


WARNING: inconsistent lock state
5.9.0-rc4-syzkaller #0 Not tainted

inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
syz-executor675/31233 [HC0[0]:SC0[0]:HE1:SE1] takes:
8880a75c50a0 (slock-AF_BLUETOOTH-BTPROTO_SCO){+.?.}-{2:2}, at: spin_lock 
include/linux/spinlock.h:354 [inline]
8880a75c50a0 (slock-AF_BLUETOOTH-BTPROTO_SCO){+.?.}-{2:2}, at: 
sco_conn_del+0x128/0x270 net/bluetooth/sco.c:176
{IN-SOFTIRQ-W} state was registered at:
  lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:354 [inline]
  sco_sock_timeout+0x24/0x140 net/bluetooth/sco.c:83
  call_timer_fn+0x1ac/0x760 kernel/time/timer.c:1413
  expire_timers kernel/time/timer.c:1458 [inline]
  __run_timers.part.0+0x67c/0xaa0 kernel/time/timer.c:1755
  __run_timers kernel/time/timer.c:1736 [inline]
  run_timer_softirq+0xae/0x1a0 kernel/time/timer.c:1768
  __do_softirq+0x1f7/0xa91 kernel/softirq.c:298
  asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
  __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
  run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
  do_softirq_own_stack+0x9d/0xd0 arch/x86/kernel/irq_64.c:77
  invoke_softirq kernel/softirq.c:393 [inline]
  __irq_exit_rcu kernel/softirq.c:423 [inline]
  irq_exit_rcu+0x235/0x280 kernel/softirq.c:435
  sysvec_apic_timer_interrupt+0x51/0xf0 arch/x86/kernel/apic/apic.c:1091
  asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581
  unwind_next_frame+0x139a/0x1f90 arch/x86/kernel/unwind_orc.c:607
  arch_stack_walk+0x81/0xf0 arch/x86/kernel/stacktrace.c:25
  stack_trace_save+0x8c/0xc0 kernel/stacktrace.c:123
  kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
  kasan_set_track mm/kasan/common.c:56 [inline]
  __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
  slab_post_alloc_hook mm/slab.h:518 [inline]
  slab_alloc mm/slab.c:3312 [inline]
  kmem_cache_alloc+0x13a/0x3a0 mm/slab.c:3482
  __d_alloc+0x2a/0x950 fs/dcache.c:1709
  d_alloc+0x4a/0x230 fs/dcache.c:1788
  d_alloc_parallel+0xe9/0x18e0 fs/dcache.c:2540
  lookup_open.isra.0+0x9ac/0x1350 fs/namei.c:3030
  open_last_lookups fs/namei.c:3177 [inline]
  path_openat+0x96d/0x2730 fs/namei.c:3365
  do_filp_open+0x17e/0x3c0 fs/namei.c:3395
  do_sys_openat2+0x16d/0x420 fs/open.c:1168
  do_sys_open fs/open.c:1184 [inline]
  __do_sys_open fs/open.c:1192 [inline]
  __se_sys_open fs/open.c:1188 [inline]
  __x64_sys_open+0x119/0x1c0 fs/open.c:1188
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
irq event stamp: 853
hardirqs last  enabled at (853): [] __raw_spin_unlock_irq 
include/linux/spinlock_api_smp.h:168 [inline]
hardirqs last  enabled at (853): [] 
_raw_spin_unlock_irq+0x1f/0x80 kernel/locking/spinlock.c:199
hardirqs last disabled at (852): [] __raw_spin_lock_irq 
include/linux/spinlock_api_smp.h:126 [inline]
hardirqs last disabled at (852): [] 
_raw_spin_lock_irq+0xa4/0xd0 kernel/locking/spinlock.c:167
softirqs last  enabled at (0): [] copy_process+0x1a99/0x6920 
kernel/fork.c:2018
softirqs last disabled at (0): [<>] 0x0

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(slock-AF_BLUETOOTH-BTPROTO_SCO);
  
lock(slock-AF_BLUETOOTH-BTPROTO_SCO);

 *** DEADLOCK ***

3 locks held by syz-executor675/31233:
 #0: 88809f104f40 (>req_lock){+.+.}-{3:3}, at: 
hci_dev_do_close+0xf5/0x1080 net/bluetooth/hci_core.c:1720
 #1: 88809f104078 (>lock){+.+.}-{3:3}, at: 
hci_dev_do_close+0x253/0x1080 net/bluetooth/hci_core.c:1757
 #2: 8a9188c8 (hci_cb_list_lock){+.+.}-{3:3}, at: hci_disconn_cfm 
include/net/bluetooth/hci_core.h:1435 [inline]
 #2: 8a9188c8 (hci_cb_list_lock){+.+.}-{3:3}, at: 
hci_conn_hash_flush+0xc7/0x220 net/bluetooth/hci_conn.c:1557

stack backtrace:
CPU: 1 PID: 31233 Comm: syz-executor675 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x198/0x1fd 

[RFC/RFT PATCH v2 2/5] arm64, numa: Change the numa init function name to be generic

2020-09-11 Thread Atish Patra
As we are now using the generic NUMA implementation, rename the init
function to indicate that it is the generic implementation.

Signed-off-by: Atish Patra 
---
 arch/arm64/kernel/acpi_numa.c | 13 -
 arch/arm64/mm/init.c  |  4 ++--
 drivers/base/arch_numa.c  | 29 ++---
 include/asm-generic/numa.h|  4 ++--
 4 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kernel/acpi_numa.c b/arch/arm64/kernel/acpi_numa.c
index 7ff800045434..96502ff92af5 100644
--- a/arch/arm64/kernel/acpi_numa.c
+++ b/arch/arm64/kernel/acpi_numa.c
@@ -117,16 +117,3 @@ void __init acpi_numa_gicc_affinity_init(struct 
acpi_srat_gicc_affinity *pa)
 
node_set(node, numa_nodes_parsed);
 }
-
-int __init arm64_acpi_numa_init(void)
-{
-   int ret;
-
-   ret = acpi_numa_init();
-   if (ret) {
-   pr_info("Failed to initialise from firmware\n");
-   return ret;
-   }
-
-   return srat_disabled() ? -EINVAL : 0;
-}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 481d22c32a2e..93b660229e1d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -418,10 +418,10 @@ void __init bootmem_init(void)
max_pfn = max_low_pfn = max;
min_low_pfn = min;
 
-   arm64_numa_init();
+   arch_numa_init();
 
/*
-* must be done after arm64_numa_init() which calls numa_init() to
+* must be done after arch_numa_init() which calls numa_init() to
 * initialize node_online_map that gets used in hugetlb_cma_reserve()
 * while allocating required CMA size across online nodes.
 */
diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
index 73f8b49d485c..a4039dcabd3e 100644
--- a/drivers/base/arch_numa.c
+++ b/drivers/base/arch_numa.c
@@ -13,7 +13,9 @@
 #include 
 #include 
 
+#ifdef CONFIG_ACPI_NUMA
 #include 
+#endif
 #include 
 
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
@@ -444,16 +446,37 @@ static int __init dummy_numa_init(void)
return 0;
 }
 
+#ifdef CONFIG_ACPI_NUMA
+int __init arch_acpi_numa_init(void)
+{
+   int ret;
+
+   ret = acpi_numa_init();
+   if (ret) {
+   pr_info("Failed to initialise from firmware\n");
+   return ret;
+   }
+
+   return srat_disabled() ? -EINVAL : 0;
+}
+#else
+int __init arch_acpi_numa_init(void)
+{
+   return -EOPNOTSUPP;
+}
+
+#endif
+
 /**
- * arm64_numa_init() - Initialize NUMA
+ * arch_numa_init() - Initialize NUMA
  *
  * Try each configured NUMA initialization method until one succeeds. The
  * last fallback is dummy single node config encomapssing whole memory.
  */
-void __init arm64_numa_init(void)
+void __init arch_numa_init(void)
 {
if (!numa_off) {
-   if (!acpi_disabled && !numa_init(arm64_acpi_numa_init))
+   if (!acpi_disabled && !numa_init(arch_acpi_numa_init))
return;
if (acpi_disabled && !numa_init(of_numa_init))
return;
diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h
index 2718d5a6ff03..e7962db4ba44 100644
--- a/include/asm-generic/numa.h
+++ b/include/asm-generic/numa.h
@@ -27,7 +27,7 @@ static inline const struct cpumask *cpumask_of_node(int node)
 }
 #endif
 
-void __init arm64_numa_init(void);
+void __init arch_numa_init(void);
 int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 void __init numa_set_distance(int from, int to, int distance);
 void __init numa_free_distance(void);
@@ -41,7 +41,7 @@ void numa_remove_cpu(unsigned int cpu);
 static inline void numa_store_cpu_info(unsigned int cpu) { }
 static inline void numa_add_cpu(unsigned int cpu) { }
 static inline void numa_remove_cpu(unsigned int cpu) { }
-static inline void arm64_numa_init(void) { }
+static inline void arch_numa_init(void) { }
 static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
 
 #endif /* CONFIG_NUMA */
-- 
2.24.0



[RFC/RFT PATCH v2 5/5] riscv: Add numa support for riscv64 platform

2020-09-11 Thread Atish Patra
Use the generic NUMA implementation to add NUMA support for RISC-V.
This is based on Greentime's patch [1], but modified to use the generic
NUMA implementation, with a few more fixes.

[1] https://lkml.org/lkml/2020/1/10/233

Co-developed-by: Greentime Hu 
Signed-off-by: Greentime Hu 
Signed-off-by: Atish Patra 
---
 arch/riscv/Kconfig  | 31 ++-
 arch/riscv/include/asm/mmzone.h | 13 +
 arch/riscv/include/asm/numa.h   |  8 
 arch/riscv/include/asm/pci.h| 14 ++
 arch/riscv/kernel/setup.c   | 10 --
 arch/riscv/kernel/smpboot.c | 12 +++-
 arch/riscv/mm/init.c|  4 +++-
 7 files changed, 87 insertions(+), 5 deletions(-)
 create mode 100644 arch/riscv/include/asm/mmzone.h
 create mode 100644 arch/riscv/include/asm/numa.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index df18372861d8..7beb6ddb6eb1 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -137,7 +137,7 @@ config PAGE_OFFSET
default 0xffe0 if 64BIT && MAXPHYSMEM_128GB
 
 config ARCH_FLATMEM_ENABLE
-   def_bool y
+   def_bool !NUMA
 
 config ARCH_SPARSEMEM_ENABLE
def_bool y
@@ -295,6 +295,35 @@ config TUNE_GENERIC
 
 endchoice
 
+# Common NUMA Features
+config NUMA
+   bool "NUMA Memory Allocation and Scheduler Support"
+   select GENERIC_ARCH_NUMA
+   select OF_NUMA
+   select ARCH_SUPPORTS_NUMA_BALANCING
+   help
+ Enable NUMA (Non-Uniform Memory Access) support.
+
+ The kernel will try to allocate memory used by a CPU on the
+ local memory of the CPU and add some more NUMA awareness to the 
kernel.
+
+config NODES_SHIFT
+   int "Maximum NUMA Nodes (as a power of 2)"
+   range 1 10
+   default "2"
+   depends on NEED_MULTIPLE_NODES
+   help
+ Specify the maximum number of NUMA Nodes available on the target
+ system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+   def_bool y
+   depends on NUMA
+
+config NEED_PER_CPU_EMBED_FIRST_CHUNK
+   def_bool y
+   depends on NUMA
+
 config RISCV_ISA_C
bool "Emit compressed instructions when building Linux"
default y
diff --git a/arch/riscv/include/asm/mmzone.h b/arch/riscv/include/asm/mmzone.h
new file mode 100644
index ..fa17e01d9ab2
--- /dev/null
+++ b/arch/riscv/include/asm/mmzone.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMZONE_H
+#define __ASM_MMZONE_H
+
+#ifdef CONFIG_NUMA
+
+#include 
+
+extern struct pglist_data *node_data[];
+#define NODE_DATA(nid) (node_data[(nid)])
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_MMZONE_H */
diff --git a/arch/riscv/include/asm/numa.h b/arch/riscv/include/asm/numa.h
new file mode 100644
index ..8c8cf4297cc3
--- /dev/null
+++ b/arch/riscv/include/asm/numa.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_NUMA_H
+#define __ASM_NUMA_H
+
+#include 
+#include 
+
+#endif /* __ASM_NUMA_H */
diff --git a/arch/riscv/include/asm/pci.h b/arch/riscv/include/asm/pci.h
index 1c473a1bd986..658e112c3ce7 100644
--- a/arch/riscv/include/asm/pci.h
+++ b/arch/riscv/include/asm/pci.h
@@ -32,6 +32,20 @@ static inline int pci_proc_domain(struct pci_bus *bus)
/* always show the domain in /proc */
return 1;
 }
+
+#ifdef CONFIG_NUMA
+
+static inline int pcibus_to_node(struct pci_bus *bus)
+{
+   return dev_to_node(>dev);
+}
+#ifndef cpumask_of_pcibus
+#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ?\
+cpu_all_mask : \
+cpumask_of_node(pcibus_to_node(bus)))
+#endif
+#endif /* CONFIG_NUMA */
+
 #endif  /* CONFIG_PCI */
 
 #endif  /* _ASM_RISCV_PCI_H */
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 07fa6d13367e..53a806a9cbaf 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -101,13 +101,19 @@ void __init setup_arch(char **cmdline_p)
 
 static int __init topology_init(void)
 {
-   int i;
+   int i, ret;
+
+   for_each_online_node(i)
+   register_one_node(i);
 
for_each_possible_cpu(i) {
struct cpu *cpu = _cpu(cpu_devices, i);
 
cpu->hotpluggable = cpu_has_hotplug(i);
-   register_cpu(cpu, i);
+   ret = register_cpu(cpu, i);
+   if (unlikely(ret))
+   pr_warn("Warning: %s: register_cpu %d failed (%d)\n",
+  __func__, i, ret);
}
 
return 0;
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 96167d55ed98..5e276c25646f 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -45,13 +46,18 @@ void __init smp_prepare_cpus(unsigned int 

[RFC/RFT PATCH v2 1/5] numa: Move numa implementation to common code

2020-09-11 Thread Atish Patra
The ARM64 NUMA implementation is generic enough that RISC-V can reuse it
with very minor cosmetic changes. This will help both ARM64 and RISC-V in
terms of maintenance and feature improvement.

Move the NUMA implementation code to a common directory so that both ISAs
can reuse it. This doesn't introduce any functional changes for ARM64.

Signed-off-by: Atish Patra 
---
 arch/arm64/Kconfig|  1 +
 arch/arm64/include/asm/numa.h | 45 +
 arch/arm64/mm/Makefile|  1 -
 drivers/base/Kconfig  |  6 +++
 drivers/base/Makefile |  1 +
 .../mm/numa.c => drivers/base/arch_numa.c |  0
 include/asm-generic/numa.h| 49 +++
 7 files changed, 58 insertions(+), 45 deletions(-)
 rename arch/arm64/mm/numa.c => drivers/base/arch_numa.c (100%)
 create mode 100644 include/asm-generic/numa.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d232837cbee..955a0cf75b16 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -960,6 +960,7 @@ config HOTPLUG_CPU
 # Common NUMA Features
 config NUMA
bool "NUMA Memory Allocation and Scheduler Support"
+   select GENERIC_ARCH_NUMA
select ACPI_NUMA if ACPI
select OF_NUMA
help
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 626ad01e83bf..8c8cf4297cc3 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -3,49 +3,6 @@
 #define __ASM_NUMA_H
 
 #include 
-
-#ifdef CONFIG_NUMA
-
-#define NR_NODE_MEMBLKS(MAX_NUMNODES * 2)
-
-int __node_distance(int from, int to);
-#define node_distance(a, b) __node_distance(a, b)
-
-extern nodemask_t numa_nodes_parsed __initdata;
-
-extern bool numa_off;
-
-/* Mappings between node number and cpus on that node. */
-extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
-void numa_clear_node(unsigned int cpu);
-
-#ifdef CONFIG_DEBUG_PER_CPU_MAPS
-const struct cpumask *cpumask_of_node(int node);
-#else
-/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
-static inline const struct cpumask *cpumask_of_node(int node)
-{
-   return node_to_cpumask_map[node];
-}
-#endif
-
-void __init arm64_numa_init(void);
-int __init numa_add_memblk(int nodeid, u64 start, u64 end);
-void __init numa_set_distance(int from, int to, int distance);
-void __init numa_free_distance(void);
-void __init early_map_cpu_to_node(unsigned int cpu, int nid);
-void numa_store_cpu_info(unsigned int cpu);
-void numa_add_cpu(unsigned int cpu);
-void numa_remove_cpu(unsigned int cpu);
-
-#else  /* CONFIG_NUMA */
-
-static inline void numa_store_cpu_info(unsigned int cpu) { }
-static inline void numa_add_cpu(unsigned int cpu) { }
-static inline void numa_remove_cpu(unsigned int cpu) { }
-static inline void arm64_numa_init(void) { }
-static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
-
-#endif /* CONFIG_NUMA */
+#include 
 
 #endif /* __ASM_NUMA_H */
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index d91030f0ffee..928c308b044b 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -6,7 +6,6 @@ obj-y   := dma-mapping.o extable.o 
fault.o init.o \
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_PTDUMP_CORE)  += dump.o
 obj-$(CONFIG_PTDUMP_DEBUGFS)   += ptdump_debugfs.o
-obj-$(CONFIG_NUMA) += numa.o
 obj-$(CONFIG_DEBUG_VIRTUAL)+= physaddr.o
 KASAN_SANITIZE_physaddr.o  += n
 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 8d7001712062..c5956c8845cc 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -210,4 +210,10 @@ config GENERIC_ARCH_TOPOLOGY
  appropriate scaling, sysfs interface for reading capacity values at
  runtime.
 
+config GENERIC_ARCH_NUMA
+   bool
+   help
+ Enable support for generic NUMA implementation. Currently, RISC-V
+ and ARM64 uses it.
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 157452080f3d..c3d02c644222 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_PINCTRL) += pinctrl.o
 obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o
 obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o
 obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
+obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
 
 obj-y  += test/
 
diff --git a/arch/arm64/mm/numa.c b/drivers/base/arch_numa.c
similarity index 100%
rename from arch/arm64/mm/numa.c
rename to drivers/base/arch_numa.c
diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h
new file mode 100644
index ..2718d5a6ff03
--- /dev/null
+++ b/include/asm-generic/numa.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_NUMA_H
+#define __ASM_GENERIC_NUMA_H
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS

[RFC/RFT PATCH v2 0/5] Unify NUMA implementation between ARM64 & RISC-V

2020-09-11 Thread Atish Patra
This series attempts to move the ARM64 NUMA implementation to common
code so that RISC-V can leverage it as well instead of reimplementing it.

RISC-V specific bits are based on initial work done by Greentime Hu [1] but
modified to reuse the common implementation to avoid duplication.

[1] https://lkml.org/lkml/2020/1/10/233

This series has been tested on QEMU with NUMA enabled for both RISC-V & ARM64.
It would be great if somebody could test it on NUMA-capable ARM64 hardware
platforms.
This patch series doesn't modify the maintainers list for the common code
(arch_numa), as I am not sure whether somebody from the ARM64 community or
Greg should take up the maintainership. Ganapatrao was the original author
of the arm64 version.
I would be happy to update that in the next revision once it is decided.

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 486 MB
node 0 free: 470 MB
node 1 cpus: 4 5 6 7
node 1 size: 424 MB
node 1 free: 408 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
# numactl -show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 
cpubind: 0 1 
nodebind: 0 1 
membind: 0 1 

For RISC-V, the following qemu series is a pre-requisite(already available in 
upstream)
to test the patches in Qemu and 2 socket OmniXtend FPGA.

https://patchwork.kernel.org/project/qemu-devel/list/?series=303313

The patches are also available at

https://github.com/atishp04/linux/tree/5.10_numa_unified_v2

There may be some minor conflicts with Mike's cleanup series [2] depending on
the order in which these two series are accepted. I can rebase on top of his
series if required.

[2] https://lkml.org/lkml/2020/8/18/754

Atish Patra (4):
numa: Move numa implementation to common code
arm64, numa: Change the numa init function name to be generic
riscv: Separate memory init from paging init
riscv: Add numa support for riscv64 platform

Greentime Hu (1):
riscv: Add support pte_protnone and pmd_protnone if
CONFIG_NUMA_BALANCING

arch/arm64/Kconfig|  1 +
arch/arm64/include/asm/numa.h | 45 +
arch/arm64/kernel/acpi_numa.c | 13 -
arch/arm64/mm/Makefile|  1 -
arch/arm64/mm/init.c  |  4 +-
arch/riscv/Kconfig| 31 +++-
arch/riscv/include/asm/mmzone.h   | 13 +
arch/riscv/include/asm/numa.h |  8 +++
arch/riscv/include/asm/pci.h  | 14 ++
arch/riscv/include/asm/pgtable.h  | 21 
arch/riscv/kernel/setup.c | 11 -
arch/riscv/kernel/smpboot.c   | 12 -
arch/riscv/mm/init.c  | 10 +++-
drivers/base/Kconfig  |  6 +++
drivers/base/Makefile |  1 +
.../mm/numa.c => drivers/base/arch_numa.c | 29 +--
include/asm-generic/numa.h| 49 +++
17 files changed, 200 insertions(+), 69 deletions(-)
create mode 100644 arch/riscv/include/asm/mmzone.h
create mode 100644 arch/riscv/include/asm/numa.h
rename arch/arm64/mm/numa.c => drivers/base/arch_numa.c (95%)
create mode 100644 include/asm-generic/numa.h

--
2.24.0



Re: [PATCH] i2c: do not acpi/of match device in i2c_device_probe()

2020-09-11 Thread Sergey Senozhatsky
On (20/08/26 23:49), Sergey Senozhatsky wrote:
> i2c, apparently, can match the same device twice - the first
> time in ->match bus hook (i2c_device_match()), and the second
> one in ->probe (i2c_device_probe()) bus hook.
> 
> To make things more complicated, the second matching does not
> do exactly the same checks as the first one. Namely, i2c_device_match()
> calls acpi_driver_match_device() which considers devices that
> provide of_match_table and performs of_compatible() matching for
> such devices. One important thing to note here is that ACPI
> of_compatible() matching (acpi_of_match_device()) is part of ACPI
> and does not depend on CONFIG_OF.
> 
> i2c_device_probe(), on the other hand, calls acpi_match_device()
> which does not perform of_compatible() matching, but instead
> i2c_device_probe() relies on CONFIG_OF API to perform of_match_table
> matching, IOW ->probe matching, unlike ->match matching, depends on
> CONFIG_OF. This can break i2c device probing on !CONFIG_OF systems
> if the device does not provide .id_table.
> 
>  i2c_device_probe()
>  ...
>if (!driver->id_table &&
>!i2c_acpi_match_device(dev->driver->acpi_match_table, client) &&
>!i2c_of_match_device(dev->driver->of_match_table, client)) {
>status = -ENODEV;
>goto put_sync_adapter;
>}
> 
> i2c_of_match_device() on !CONFIG_OF systems is always false, so we never
> perform of_match_table matching. i2c_acpi_match_device() does ACPI match
> only, no of_compatible() matching takes place, even though the device
> provides .of_match_table and ACPI is capable of matching such device.
> 
> It is not entirely clear why the device is matched again in bus
> ->probe after successful and proper matching in bus ->match. Let's
> remove ->probe matching.

Hi,

Gentle ping.

-ss


[PATCH v4 1/1] Input: atmel_mxt_ts - implement I2C retries

2020-09-11 Thread Jiada Wang
From: Nick Dyer 

Some maXTouch chips (e.g. mXT1386) will not respond to the first I2C request
when they are in a sleep state. The request must be retried after a delay to
allow the chip to wake up.

Signed-off-by: Nick Dyer 
[gdavis: Forward port and fix conflicts.]
Signed-off-by: George G. Davis 
[jiada: return exact errno when i2c_transfer & i2c_master_send fails
rename "retry" to "retried" and keep its order in length
set "ret" to correct errno before calling dev_err()
remove redundant conditional]
Signed-off-by: Jiada Wang 
Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
---
 drivers/input/touchscreen/atmel_mxt_ts.c | 38 
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c 
b/drivers/input/touchscreen/atmel_mxt_ts.c
index a2189739e30f..bad3ac58503d 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -196,6 +196,7 @@ enum t100_type {
 #define MXT_CRC_TIMEOUT		1000	/* msec */
 #define MXT_FW_RESET_TIME	3000	/* msec */
 #define MXT_FW_CHG_TIMEOUT	300	/* msec */
+#define MXT_WAKEUP_TIME		25	/* msec */
 
 /* Command to unlock bootloader */
 #define MXT_UNLOCK_CMD_MSB 0xaa
@@ -624,6 +625,7 @@ static int __mxt_read_reg(struct i2c_client *client,
   u16 reg, u16 len, void *val)
 {
struct i2c_msg xfer[2];
+   bool retried = false;
u8 buf[2];
int ret;
 
@@ -642,22 +644,28 @@ static int __mxt_read_reg(struct i2c_client *client,
xfer[1].len = len;
xfer[1].buf = val;
 
-   ret = i2c_transfer(client->adapter, xfer, 2);
-   if (ret == 2) {
-   ret = 0;
-   } else {
-   if (ret >= 0)
-   ret = -EIO;
+retry_read:
+   ret = i2c_transfer(client->adapter, xfer, ARRAY_SIZE(xfer));
+   if (ret != ARRAY_SIZE(xfer)) {
+   if (!retried) {
+   dev_dbg(&client->dev, "i2c retry\n");
+   msleep(MXT_WAKEUP_TIME);
+   retried = true;
+   goto retry_read;
+   }
+   ret = ret < 0 ? ret : -EIO;
 		dev_err(&client->dev, "%s: i2c transfer failed (%d)\n",
__func__, ret);
+   return ret;
}
 
-   return ret;
+   return 0;
 }
 
 static int __mxt_write_reg(struct i2c_client *client, u16 reg, u16 len,
   const void *val)
 {
+   bool retried = false;
u8 *buf;
size_t count;
int ret;
@@ -671,14 +679,20 @@ static int __mxt_write_reg(struct i2c_client *client, u16 reg, u16 len,
buf[1] = (reg >> 8) & 0xff;
 	memcpy(&buf[2], val, len);
 
+retry_write:
ret = i2c_master_send(client, buf, count);
-   if (ret == count) {
-   ret = 0;
-   } else {
-   if (ret >= 0)
-   ret = -EIO;
+   if (ret != count) {
+   if (!retried) {
+   dev_dbg(&client->dev, "i2c retry\n");
+   msleep(MXT_WAKEUP_TIME);
+   retried = true;
+   goto retry_write;
+   }
+   ret = ret < 0 ? ret : -EIO;
 		dev_err(&client->dev, "%s: i2c send failed (%d)\n",
__func__, ret);
+   } else {
+   ret = 0;
}
 
kfree(buf);
-- 
2.17.1
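Stripped of kernel context, the retry-once-after-delay pattern this patch introduces looks like the sketch below. The I2C transfer is stubbed out; fake_transfer, transfer_calls, and the -5/-121 errno stand-ins are illustrative only, not the driver's real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for i2c_transfer(): the first attempt fails because the
 * chip is asleep, the second succeeds. Illustrative only. */
static int transfer_calls;

static int fake_transfer(void)
{
	return (++transfer_calls == 1) ? -121 : 2; /* 2 msgs on success */
}

/* The retry-once shape from __mxt_read_reg(): one delayed retry,
 * then report the real errno (or an EIO stand-in for a short
 * transfer). */
static int read_with_retry(void)
{
	bool retried = false;
	int ret;

retry_read:
	ret = fake_transfer();
	if (ret != 2) {
		if (!retried) {
			/* msleep(MXT_WAKEUP_TIME) would sit here */
			retried = true;
			goto retry_read;
		}
		return ret < 0 ? ret : -5; /* -EIO stand-in */
	}
	return 0;
}
```

The single `retried` flag bounds the loop to exactly one extra attempt, which matches the mXT1386 wake-up behavior the commit message describes.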






[PATCH] core/entry: Report syscall correctly for trace and audit

2020-09-11 Thread Kees Cook
On v5.8 when doing seccomp syscall rewrites (e.g. getpid into getppid
as seen in the seccomp selftests), trace (and audit) correctly see the
rewritten syscall on entry and exit:

seccomp_bpf-1307  [000]  22974.874393: sys_enter: NR 110 (...
seccomp_bpf-1307  [000] .N.. 22974.874401: sys_exit: NR 110 = 1304

With mainline we see a mismatched enter and exit (the original syscall
is incorrectly visible on entry):

seccomp_bpf-1030  [000] 21.806766: sys_enter: NR 39 (...
seccomp_bpf-1030  [000] 21.806767: sys_exit: NR 110 = 1027

When ptrace or seccomp change the syscall, this needs to be visible to
trace and audit at that time as well. Update the syscall earlier so they
see the correct value.

Reported-by: Michael Ellerman 
Fixes: d88d59b64ca3 ("core/entry: Respect syscall number rewrites")
Cc: Thomas Gleixner 
Cc: Kyle Huey 
Cc: Andy Lutomirski 
Cc: Ingo Molnar 
Signed-off-by: Kees Cook 
---
 kernel/entry/common.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 18683598edbc..6fdb6105e6d6 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -60,13 +60,15 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
return ret;
}
 
+   /* Either of the above might have changed the syscall number */
+   syscall = syscall_get_nr(current, regs);
+
if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
trace_sys_enter(regs, syscall);
 
syscall_enter_audit(regs, syscall);
 
-   /* The above might have changed the syscall number */
-   return ret ? : syscall_get_nr(current, regs);
+   return ret ? : syscall;
 }
 
 static __always_inline long
-- 
2.25.1
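The ordering the patch establishes (re-read the syscall number after the tracer hooks have run, before tracepoints and audit observe it) can be shown with a toy model. All names below are illustrative stand-ins, not the kernel entry API.

```c
#include <assert.h>

/* Toy model of the ordering fix: a tracer hook may rewrite the
 * syscall number, so the entry path must re-read it before the
 * tracepoint and audit hooks observe it. */
static int regs_syscall_nr;	/* stands in for the saved register */
static int observed_at_enter;	/* what sys_enter/audit would log */

/* A seccomp-style rewrite: getpid (39) becomes getppid (110). */
static void toy_seccomp_hook(void)
{
	if (regs_syscall_nr == 39)
		regs_syscall_nr = 110;
}

static void toy_syscall_trace_enter(void)
{
	toy_seccomp_hook();		/* may change the number */
	int syscall = regs_syscall_nr;	/* re-read after the hooks */
	observed_at_enter = syscall;	/* tracepoint + audit see this */
}
```

Re-reading before the observation step makes entry and exit report the same (rewritten) number, matching the v5.8 trace shown above.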



Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs

2020-09-11 Thread Bjorn Helgaas
On Fri, Sep 11, 2020 at 04:16:03PM -0700, Sean V Kelley wrote:
> On 4 Sep 2020, at 19:23, Bjorn Helgaas wrote:
> > On Fri, Sep 04, 2020 at 10:18:30PM +, Kelley, Sean V wrote:
> > > Hi Bjorn,
> > > 
> > > Quick question below...
> > > 
> > > On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
> > > > Hi Bjorn,
> > > > 
> > > > On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
> > > > > On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
> > > > > > From: Qiuxu Zhuo 
> > > > > > 
> > > > > > When an RCEC device signals error(s) to a CPU core, the CPU core
> > > > > > needs to walk all the RCiEPs associated with that RCEC to check
> > > > > > errors. So add the function pcie_walk_rcec() to walk all RCiEPs
> > > > > > associated with the RCEC device.
> > > > > > 
> > > > > > Co-developed-by: Sean V Kelley 
> > > > > > Signed-off-by: Sean V Kelley 
> > > > > > Signed-off-by: Qiuxu Zhuo 
> > > > > > Reviewed-by: Jonathan Cameron 
> > > > > > ---
> > > > > >  drivers/pci/pci.h   |  4 +++
> > > > > >  drivers/pci/pcie/rcec.c | 76
> > > > > > +
> > > > > >  2 files changed, 80 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > > > index bd25e6047b54..8bd7528d6977 100644
> > > > > > --- a/drivers/pci/pci.h
> > > > > > +++ b/drivers/pci/pci.h
> > > > > > @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
> > > > > > pci_dev
> > > > > > *pdev) {}
> > > > > >  #ifdef CONFIG_PCIEPORTBUS
> > > > > >  void pci_rcec_init(struct pci_dev *dev);
> > > > > >  void pci_rcec_exit(struct pci_dev *dev);
> > > > > > +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
> > > > > > pci_dev
> > > > > > *, void *),
> > > > > > +   void *userdata);
> > > > > >  #else
> > > > > >  static inline void pci_rcec_init(struct pci_dev *dev) {}
> > > > > >  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> > > > > > +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
> > > > > > (*cb)(struct pci_dev *, void *),
> > > > > > + void *userdata) {}
> > > > > >  #endif
> > > > > > 
> > > > > >  #ifdef CONFIG_PCI_ATS
> > > > > > diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> > > > > > index 519ae086ff41..405f92fcdf7f 100644
> > > > > > --- a/drivers/pci/pcie/rcec.c
> > > > > > +++ b/drivers/pci/pcie/rcec.c
> > > > > > @@ -17,6 +17,82 @@
> > > > > > 
> > > > > >  #include "../pci.h"
> > > > > > 
> > > > > > +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
> > > > > > (*cb)(struct pci_dev *, void *),
> > > > > > +void *userdata, const unsigned long
> > > > > > bitmap)
> > > > > > +{
> > > > > > +   unsigned int devn, fn;
> > > > > > +   struct pci_dev *dev;
> > > > > > +   int retval;
> > > > > > +
> > > > > > +   for_each_set_bit(devn, &bitmap, 32) {
> > > > > > +   for (fn = 0; fn < 8; fn++) {
> > > > > > +   dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
> > > > > 
> > > > > Wow, this is a lot of churning to call pci_get_slot() 256 times per
> > > > > bus for the "associated bus numbers" case where we pass a bitmap of
> > > > > 0x.  They didn't really make it easy for software when they
> > > > > added the next/last bus number thing.
> > > > > 
> > > > > Just thinking out loud here.  What if we could set dev->rcec during
> > > > > enumeration, and then use that to build pcie_walk_rcec()?
> > > > 
> > > > I think I follow what you are doing.
> > > > 
> > > > As we enumerate an RCEC, use the time to discover RCiEPs and
> > > > associate
> > > > each RCiEP's dev->rcec. Although BIOS already set the bitmap for
> > > > this
> > > > specific RCEC, it's more efficient to simply discover the devices
> > > > through the bus walk and verify each one found against the bitmap.
> > > > 
> > > > Further, while we can be certain that an RCiEP found with a matching
> > > > device no. in a bitmap for an associated RCEC is correct, we cannot
> > > > be
> > > > certain that any RCiEP found on another bus range is correct unless
> > > > we
> > > > verify the bus is within that next/last bus range.
> > > > 
> > > > Finally, that's where find_rcec() callback for rcec_assoc_rciep()
> > > > does
> > > > double duty by also checking on the "on-a-separate-bus" case
> > > > captured
> > > > potentially by find_rcec() during an RCiEP's bus walk.
> > > > 
> > > > 
> > > > >   bool rcec_assoc_rciep(rcec, rciep)
> > > > >   {
> > > > > if (rcec->bus == rciep->bus)
> > > > >   return (rcec->bitmap contains rciep->devfn);
> > > > > 
> > > > > return (rcec->next/last contains rciep->bus);
> > > > >   }
> > > > > 
> > > > >   link_rcec(dev, data)
> > > > >   {
> > > > > struct pci_dev *rcec = data;
> > > > > 
> > > > > if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
> > > > >   dev->rcec = rcec;
> > > > >   }
> > > > > 
> > > > >   find_rcec(dev, data)
> > > > >   {
> > > > > struct pci_dev 

Re: [PATCH v2 1/1] Input: atmel_mxt_ts - implement I2C retries

2020-09-11 Thread Wang, Jiada

Hi Andy

Thanks for your comment

On 2020/09/06 3:02, Andy Shevchenko wrote:



On Thursday, September 3, 2020, Jiada Wang wrote:


From: Nick Dyer <nick.d...@itdev.co.uk>

Some maXTouch chips (eg mXT1386) will not respond on the first I2C
request
when they are in a sleep state. It must be retried after a delay for the
chip to wake up.

Signed-off-by: Nick Dyer <nick.d...@itdev.co.uk>
Acked-by: Yufeng Shen <mile...@chromium.org>


(cherry picked from ndyer/linux/for-upstream commit
63fd7a2cd03c3a572a5db39c52f4856819e1835d)


It’s noise for upstream.


sure, I will remove it


[gdavis: Forward port and fix conflicts.]
Signed-off-by: George G. Davis <george_da...@mentor.com>
[jiada: return exact errno when i2c_transfer & i2c_master_send fails]
Signed-off-by: Jiada Wang <jiada_w...@mentor.com>
---
  drivers/input/touchscreen/atmel_mxt_ts.c | 45 
  1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c b/drivers/input/touchscreen/atmel_mxt_ts.c
index a2189739e30f..5d4cb15d21dc 100644
--- a/drivers/input/touchscreen/atmel_mxt_ts.c
+++ b/drivers/input/touchscreen/atmel_mxt_ts.c
@@ -196,6 +196,7 @@ enum t100_type {
  #define MXT_CRC_TIMEOUT                1000    /* msec */
  #define MXT_FW_RESET_TIME      3000    /* msec */
  #define MXT_FW_CHG_TIMEOUT     300     /* msec */
+#define MXT_WAKEUP_TIME                25      /* msec */


Can we simply add a _MS unit suffix to the definition?
As Dmitry commented, I'd like to keep it as-is, probably a separate 
patch to update all these together.



  /* Command to unlock bootloader */
  #define MXT_UNLOCK_CMD_MSB     0xaa
@@ -626,6 +627,7 @@ static int __mxt_read_reg(struct i2c_client *client,
         struct i2c_msg xfer[2];
         u8 buf[2];
         int ret;
+       bool retry = false;


Keep this ordered by length.

I will move "retry" up.



         buf[0] = reg & 0xff;
         buf[1] = (reg >> 8) & 0xff;
@@ -642,17 +644,22 @@ static int __mxt_read_reg(struct i2c_client *client,
         xfer[1].len = len;
         xfer[1].buf = val;

-       ret = i2c_transfer(client->adapter, xfer, 2);
-       if (ret == 2) {
-               ret = 0;
-       } else {
-               if (ret >= 0)
-                       ret = -EIO;
-               dev_err(&client->dev, "%s: i2c transfer failed (%d)\n",
-                       __func__, ret);
+retry_read:
+       ret = i2c_transfer(client->adapter, xfer, ARRAY_SIZE(xfer));
+       if (ret != ARRAY_SIZE(xfer)) {
+               if (!retry) {


Why not positive conditional?

To me, it's not much different whether the conditional is positive or negative; can you elaborate on this?




+                       dev_dbg(&client->dev, "%s: i2c retry\n", __func__);


__func__ is redundant for dev_dbg().

+                       msleep(MXT_WAKEUP_TIME);
+                       retry = true;
+                       goto retry_read;

+               } else {


Redundant in either case of conditional. Allows to drop indentation level.

I will remove the redundant conditional

Thanks,
Jiada


+                       dev_err(&client->dev, "%s: i2c transfer failed (%d)\n",
+                               __func__, ret);
+                       return ret < 0 ? ret : -EIO;
+               }
         }

-       return ret;
+       return 0;
  }


Same comments apply below.

  static int __mxt_write_reg(struct i2c_client *client, u16 reg, u16
len,
@@ -661,6 +668,7 @@ static int __mxt_write_reg(struct i2c_client *client, u16 reg, u16 len,
         u8 *buf;
         size_t count;
         int ret;
+       bool retry = false;

         count = len + 2;
         buf = kmalloc(count, GFP_KERNEL);
@@ -671,14 +679,21 @@ static int __mxt_write_reg(struct i2c_client *client, u16 reg, u16 len,
         buf[1] = (reg >> 8) & 0xff;
          memcpy(&buf[2], val, len);

+retry_write:
         ret = i2c_master_send(client, buf, count);
-       if (ret == count) {
-               ret = 0;
+       if (ret != count) {
+               if (!retry) {
+                       dev_dbg(&client->dev, "%s: i2c retry\n", __func__);
+                       msleep(MXT_WAKEUP_TIME);
+                       retry = true;
+                       goto retry_write;
+               } else {
+                       dev_err(&client->dev, "%s: i2c send failed (%d)\n",
+                               __func__, ret);
+                       ret = ret < 0 ? ret : -EIO;
+               }
         } else {
-               if (ret >= 0)
-                       ret = -EIO;
-               

kernel panic: stack is corrupted in get_kernel_gp_address

2020-09-11 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:f4d51dff Linux 5.9-rc4
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14aa2d3e90
kernel config:  https://syzkaller.appspot.com/x/.config?x=a9075b36a6ae26c9
dashboard link: https://syzkaller.appspot.com/bug?extid=d6459d8f8984c0929e54
compiler:   gcc (GCC) 10.1.0-syz 20200507
userspace arch: i386
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=164270dd90

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13c7d9f990
final oops: https://syzkaller.appspot.com/x/report.txt?x=1027d9f990
console output: https://syzkaller.appspot.com/x/log.txt?x=17c7d9f990

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d6459d8f8984c0929...@syzkaller.appspotmail.com

Code: Bad RIP value.
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 
get_kernel_gp_address+0x1a0/0x1c0 arch/x86/kernel/traps.c:520
Kernel Offset: disabled


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


[PATCH net-next v2 4/7] net: ipa: manage endpoints separate from clock

2020-09-11 Thread Alex Elder
Currently, when (before) the last IPA clock reference is dropped,
all endpoints are suspended.  And whenever the first IPA clock
reference is taken, all endpoints are resumed (or started).

In most cases there's no need to start endpoints when the clock
starts.  So move the calls to ipa_endpoint_suspend() and
ipa_endpoint_resume() out of ipa_clock_put() and ipa_clock_get(),
respectively.  Instead, only suspend endpoints when handling a system
suspend, and only resume endpoints when handling a system resume.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_clock.c | 14 --
 drivers/net/ipa/ipa_main.c  |  8 
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ipa/ipa_clock.c b/drivers/net/ipa/ipa_clock.c
index b703866f2e20b..a2c0fde058199 100644
--- a/drivers/net/ipa/ipa_clock.c
+++ b/drivers/net/ipa/ipa_clock.c
@@ -200,9 +200,8 @@ bool ipa_clock_get_additional(struct ipa *ipa)
 
 /* Get an IPA clock reference.  If the reference count is non-zero, it is
  * incremented and return is immediate.  Otherwise it is checked again
- * under protection of the mutex, and if appropriate the clock (and
- * interconnects) are enabled suspended endpoints (if any) are resumed
- * before returning.
+ * under protection of the mutex, and if appropriate the IPA clock
+ * is enabled.
  *
  * Incrementing the reference count is intentionally deferred until
  * after the clock is running and endpoints are resumed.
@@ -229,17 +228,14 @@ void ipa_clock_get(struct ipa *ipa)
goto out_mutex_unlock;
}
 
-   ipa_endpoint_resume(ipa);
-
 	refcount_set(&clock->count, 1);
 
 out_mutex_unlock:
 	mutex_unlock(&clock->mutex);
 }
 
-/* Attempt to remove an IPA clock reference.  If this represents the last
- * reference, suspend endpoints and disable the clock (and interconnects)
- * under protection of a mutex.
+/* Attempt to remove an IPA clock reference.  If this represents the
+ * last reference, disable the IPA clock under protection of the mutex.
  */
 void ipa_clock_put(struct ipa *ipa)
 {
@@ -249,8 +245,6 @@ void ipa_clock_put(struct ipa *ipa)
 	if (!refcount_dec_and_mutex_lock(&clock->count, &clock->mutex))
return;
 
-   ipa_endpoint_suspend(ipa);
-
ipa_clock_disable(ipa);
 
 	mutex_unlock(&clock->mutex);
diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index cfdf60ded86ca..3b68b53c99015 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -913,11 +913,15 @@ static int ipa_remove(struct platform_device *pdev)
  * Return: Always returns zero
  *
  * Called by the PM framework when a system suspend operation is invoked.
+ * Suspends endpoints and releases the clock reference held to keep
+ * the IPA clock running until this point.
  */
 static int ipa_suspend(struct device *dev)
 {
struct ipa *ipa = dev_get_drvdata(dev);
 
+   ipa_endpoint_suspend(ipa);
+
ipa_clock_put(ipa);
if (!test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
dev_err(dev, "suspend: missing suspend clock reference\n");
@@ -932,6 +936,8 @@ static int ipa_suspend(struct device *dev)
  * Return: Always returns 0
  *
  * Called by the PM framework when a system resume operation is invoked.
+ * Takes an IPA clock reference to keep the clock running until suspend,
+ * and resumes endpoints.
  */
 static int ipa_resume(struct device *dev)
 {
@@ -945,6 +951,8 @@ static int ipa_resume(struct device *dev)
else
dev_err(dev, "resume: duplicate suspend clock reference\n");
 
+   ipa_endpoint_resume(ipa);
+
return 0;
 }
 
-- 
2.20.1



[PATCH net-next v2 6/7] net: ipa: enable wakeup on IPA interrupt

2020-09-11 Thread Alex Elder
Now that we handle wakeup interrupts properly, arrange for the IPA
interrupt to be treated as a wakeup interrupt.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa_interrupt.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ipa/ipa_interrupt.c b/drivers/net/ipa/ipa_interrupt.c
index 90353987c45fc..cc1ea28f7bc2e 100644
--- a/drivers/net/ipa/ipa_interrupt.c
+++ b/drivers/net/ipa/ipa_interrupt.c
@@ -237,8 +237,16 @@ struct ipa_interrupt *ipa_interrupt_setup(struct ipa *ipa)
goto err_kfree;
}
 
+   ret = enable_irq_wake(irq);
+   if (ret) {
+   dev_err(dev, "error %d enabling wakeup for \"ipa\" IRQ\n", ret);
+   goto err_free_irq;
+   }
+
return interrupt;
 
+err_free_irq:
+   free_irq(interrupt->irq, interrupt);
 err_kfree:
kfree(interrupt);
 
@@ -248,6 +256,12 @@ struct ipa_interrupt *ipa_interrupt_setup(struct ipa *ipa)
 /* Tear down the IPA interrupt framework */
 void ipa_interrupt_teardown(struct ipa_interrupt *interrupt)
 {
+   struct device *dev = &interrupt->ipa->pdev->dev;
+   int ret;
+
+   ret = disable_irq_wake(interrupt->irq);
+   if (ret)
+   dev_err(dev, "error %d disabling \"ipa\" IRQ wakeup\n", ret);
free_irq(interrupt->irq, interrupt);
kfree(interrupt);
 }
-- 
2.20.1



[PATCH net-next v2 0/7] net: ipa: wake up system on RX available

2020-09-11 Thread Alex Elder
This series arranges for the IPA driver to wake up a suspended
system if the IPA hardware has a packet to deliver to the AP.
Version 2 replaces the first patch from version 1 with three
patches, in response to David Miller's feedback.

Specifically:
  - The first patch now replaces an atomic_t field with a
refcount_t.  The affected field is not the one David
commented on, but this fix is consistent with what he
asked for.
  - The second patch replaces the atomic_t field David *did*
comment on with a single bit in a new bitmap field;
ultimately what's needed there is a Boolean flag anyway.
  - The third patch is renamed, but basically does the same
thing the first patch did in version 1.  It now operates
on a bit in a bitmap rather than on an atomic variable.

Currently, the GSI interrupt is set up to be a waking interrupt.
But the GSI interrupt won't actually fire for a stopped channel (or
a channel that underlies a suspended endpoint).  The fix involves
having the IPA rather than GSI interrupt wake up the AP.

The IPA hardware clock is managed by both the modem and the AP.
Even if the AP is in a fully-suspended state, the modem can clock
the IPA hardware, and can send a packet through IPA that is destined
for an endpoint on the AP.

When the IPA hardware finds a packet's destination is stopped or
suspended, it sends an *IPA interrupt* to the destination "execution
environment" (EE--in this case, the AP).  The desired behavior is
for the EE (even if suspended) to be able to handle the incoming
packet.

To do this, we arrange for the IPA interrupt to be a wakeup
interrupt.  And if the system is suspended when that interrupt
fires, we trigger a system resume operation.  While resuming the
system, the IPA driver starts all its channels (or for SDM845,
takes its endpoints out of suspend mode).

Whenever an RX channel is started, if it has a packet ready to be
consumed, the GSI interrupt will fire.  At this point the inbound
packet that caused this wakeup activity will be received.

The first three patches in the series were described above.  The
next three arrange for the IPA interrupt to wake up the system.
Finally, with this design, we no longer want the GSI interrupt to
wake a suspended system, so that is removed by the last patch.

-Alex


Alex Elder (7):
  net: ipa: use refcount_t for IPA clock reference count
  net: ipa: replace ipa->suspend_ref with a flag bit
  net: ipa: verify reference flag values
  net: ipa: manage endpoints separate from clock
  net: ipa: use device_init_wakeup()
  net: ipa: enable wakeup on IPA interrupt
  net: ipa: do not enable GSI interrupt for wakeup

 drivers/net/ipa/gsi.c   | 17 ++--
 drivers/net/ipa/gsi.h   |  1 -
 drivers/net/ipa/ipa.h   | 16 +--
 drivers/net/ipa/ipa_clock.c | 28 +---
 drivers/net/ipa/ipa_interrupt.c | 14 ++
 drivers/net/ipa/ipa_main.c  | 76 +++--
 6 files changed, 84 insertions(+), 68 deletions(-)

-- 
2.20.1



[PATCH net-next v2 7/7] net: ipa: do not enable GSI interrupt for wakeup

2020-09-11 Thread Alex Elder
We now trigger a system resume when we receive an IPA SUSPEND
interrupt.  We should *not* wake up on GSI interrupts.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/gsi.c | 17 -
 drivers/net/ipa/gsi.h |  1 -
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ipa/gsi.c b/drivers/net/ipa/gsi.c
index 0e63d35320aaf..cb75f7d540571 100644
--- a/drivers/net/ipa/gsi.c
+++ b/drivers/net/ipa/gsi.c
@@ -1987,31 +1987,26 @@ int gsi_init(struct gsi *gsi, struct platform_device *pdev, bool prefetch,
}
gsi->irq = irq;
 
-   ret = enable_irq_wake(gsi->irq);
-   if (ret)
-   dev_warn(dev, "error %d enabling gsi wake irq\n", ret);
-   gsi->irq_wake_enabled = !ret;
-
/* Get GSI memory range and map it */
res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "gsi");
if (!res) {
dev_err(dev, "DT error getting \"gsi\" memory property\n");
ret = -ENODEV;
-   goto err_disable_irq_wake;
+   goto err_free_irq;
}
 
size = resource_size(res);
if (res->start > U32_MAX || size > U32_MAX - res->start) {
dev_err(dev, "DT memory resource \"gsi\" out of range\n");
ret = -EINVAL;
-   goto err_disable_irq_wake;
+   goto err_free_irq;
}
 
gsi->virt = ioremap(res->start, size);
if (!gsi->virt) {
dev_err(dev, "unable to remap \"gsi\" memory\n");
ret = -ENOMEM;
-   goto err_disable_irq_wake;
+   goto err_free_irq;
}
 
ret = gsi_channel_init(gsi, prefetch, count, data, modem_alloc);
@@ -2025,9 +2020,7 @@ int gsi_init(struct gsi *gsi, struct platform_device *pdev, bool prefetch,
 
 err_iounmap:
iounmap(gsi->virt);
-err_disable_irq_wake:
-   if (gsi->irq_wake_enabled)
-   (void)disable_irq_wake(gsi->irq);
+err_free_irq:
free_irq(gsi->irq, gsi);
 
return ret;
@@ -2038,8 +2031,6 @@ void gsi_exit(struct gsi *gsi)
 {
 	mutex_destroy(&gsi->mutex);
gsi_channel_exit(gsi);
-   if (gsi->irq_wake_enabled)
-   (void)disable_irq_wake(gsi->irq);
free_irq(gsi->irq, gsi);
iounmap(gsi->virt);
 }
diff --git a/drivers/net/ipa/gsi.h b/drivers/net/ipa/gsi.h
index 061312773df09..3f9f29d531c43 100644
--- a/drivers/net/ipa/gsi.h
+++ b/drivers/net/ipa/gsi.h
@@ -150,7 +150,6 @@ struct gsi {
struct net_device dummy_dev;/* needed for NAPI */
void __iomem *virt;
u32 irq;
-   bool irq_wake_enabled;
u32 channel_count;
u32 evt_ring_count;
struct gsi_channel channel[GSI_CHANNEL_COUNT_MAX];
-- 
2.20.1



[PATCH net-next v2 5/7] net: ipa: use device_init_wakeup()

2020-09-11 Thread Alex Elder
The call to wakeup_source_register() in ipa_probe() does not do what
it was intended to do.  Call device_init_wakeup() in ipa_setup()
instead, to set the IPA device as wakeup-capable and to initially
enable wakeup capability.

When we receive a SUSPEND interrupt, call pm_wakeup_dev_event()
with a zero processing time, to simply call for a resume without
any other processing.  The ipa_resume() call will take care of
waking things up again, and will handle receiving the packet.

Signed-off-by: Alex Elder 
---
 drivers/net/ipa/ipa.h  |  2 --
 drivers/net/ipa/ipa_main.c | 43 --
 2 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ipa/ipa.h b/drivers/net/ipa/ipa.h
index e02fe979b645b..c688155ccf375 100644
--- a/drivers/net/ipa/ipa.h
+++ b/drivers/net/ipa/ipa.h
@@ -114,8 +114,6 @@ struct ipa {
void *zero_virt;
size_t zero_size;
 
-   struct wakeup_source *wakeup_source;
-
/* Bit masks indicating endpoint state */
u32 available;  /* supported by hardware */
u32 filter_map;
diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index 3b68b53c99015..5e714d9d2e5cb 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -75,18 +75,19 @@
  * @ipa:   IPA pointer
  * @irq_id:IPA interrupt type (unused)
  *
- * When in suspended state, the IPA can trigger a resume by sending a SUSPEND
- * IPA interrupt.
+ * If an RX endpoint is in suspend state, and the IPA has a packet
+ * destined for that endpoint, the IPA generates a SUSPEND interrupt
+ * to inform the AP that it should resume the endpoint.  If we get
+ * one of these interrupts we just resume everything.
  */
 static void ipa_suspend_handler(struct ipa *ipa, enum ipa_irq_id irq_id)
 {
-   /* Take a a single clock reference to prevent suspend.  All
-* endpoints will be resumed as a result.  This reference will
-* be dropped when we get a power management suspend request.
-* The first call activates the clock; ignore any others.
+   /* Just report the event, and let system resume handle the rest.
+* More than one endpoint could signal this; if so, ignore
+* all but the first.
 */
if (!test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
-   ipa_clock_get(ipa);
+   pm_wakeup_dev_event(&ipa->pdev->dev, 0, true);
 
/* Acknowledge/clear the suspend interrupt on all endpoints */
ipa_interrupt_suspend_clear_all(ipa->interrupt);
@@ -107,6 +108,7 @@ int ipa_setup(struct ipa *ipa)
 {
struct ipa_endpoint *exception_endpoint;
struct ipa_endpoint *command_endpoint;
+   struct device *dev = &ipa->pdev->dev;
int ret;
 
/* Setup for IPA v3.5.1 has some slight differences */
@@ -124,6 +126,10 @@ int ipa_setup(struct ipa *ipa)
 
ipa_uc_setup(ipa);
 
+   ret = device_init_wakeup(dev, true);
+   if (ret)
+   goto err_uc_teardown;
+
ipa_endpoint_setup(ipa);
 
/* We need to use the AP command TX endpoint to perform other
@@ -159,7 +165,7 @@ int ipa_setup(struct ipa *ipa)
 
ipa->setup_complete = true;
 
-   dev_info(&ipa->pdev->dev, "IPA driver setup completed successfully\n");
+   dev_info(dev, "IPA driver setup completed successfully\n");
 
return 0;
 
@@ -174,6 +180,8 @@ int ipa_setup(struct ipa *ipa)
ipa_endpoint_disable_one(command_endpoint);
 err_endpoint_teardown:
ipa_endpoint_teardown(ipa);
+   (void)device_init_wakeup(dev, false);
+err_uc_teardown:
ipa_uc_teardown(ipa);
ipa_interrupt_remove(ipa->interrupt, IPA_IRQ_TX_SUSPEND);
ipa_interrupt_teardown(ipa->interrupt);
@@ -201,6 +209,7 @@ static void ipa_teardown(struct ipa *ipa)
command_endpoint = ipa->name_map[IPA_ENDPOINT_AP_COMMAND_TX];
ipa_endpoint_disable_one(command_endpoint);
ipa_endpoint_teardown(ipa);
+   (void)device_init_wakeup(&ipa->pdev->dev, false);
ipa_uc_teardown(ipa);
ipa_interrupt_remove(ipa->interrupt, IPA_IRQ_TX_SUSPEND);
ipa_interrupt_teardown(ipa->interrupt);
@@ -715,7 +724,6 @@ static void ipa_validate_build(void)
  */
 static int ipa_probe(struct platform_device *pdev)
 {
-   struct wakeup_source *wakeup_source;
 	struct device *dev = &pdev->dev;
const struct ipa_data *data;
struct ipa_clock *clock;
@@ -764,19 +772,11 @@ static int ipa_probe(struct platform_device *pdev)
goto err_clock_exit;
}
 
-   /* Create a wakeup source. */
-   wakeup_source = wakeup_source_register(dev, "ipa");
-   if (!wakeup_source) {
-   /* The most likely reason for failure is memory exhaustion */
-   ret = -ENOMEM;
-   goto err_clock_exit;
-   }
-
/* Allocate and initialize the IPA structure */
ipa = kzalloc(sizeof(*ipa), GFP_KERNEL);
if (!ipa) {
ret = -ENOMEM;
-   

[PATCH net-next v2 3/7] net: ipa: verify reference flag values

2020-09-11 Thread Alex Elder
We take a single IPA clock reference to keep the clock running until
we get a system suspend operation, and maintain a flag indicating
whether that reference has been taken.  When a suspend request
arrives, we drop that reference and clear the flag.

In most places we simply set or clear the extra-reference flag.
Instead--primarily to catch coding errors--test the previous value
of the flag and report an error in the event the previous value is
unexpected.  And if the clock reference is already taken, don't take
another.

In a couple of cases it's pretty clear atomic access is not
necessary and an error should never be reported.  Report these
anyway, conveying our surprise with an added exclamation point.

Signed-off-by: Alex Elder 
---
v2: Updated to operate on a bitmap bit rather than an atomic_t.

 drivers/net/ipa/ipa_main.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index 409375b96eb8f..cfdf60ded86ca 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -83,6 +83,7 @@ static void ipa_suspend_handler(struct ipa *ipa, enum ipa_irq_id irq_id)
/* Take a a single clock reference to prevent suspend.  All
 * endpoints will be resumed as a result.  This reference will
 * be dropped when we get a power management suspend request.
+* The first call activates the clock; ignore any others.
 */
if (!test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
ipa_clock_get(ipa);
@@ -502,14 +503,17 @@ static void ipa_resource_deconfig(struct ipa *ipa)
  */
 static int ipa_config(struct ipa *ipa, const struct ipa_data *data)
 {
+   struct device *dev = &ipa->pdev->dev;
int ret;
 
/* Get a clock reference to allow initialization.  This reference
 * is held after initialization completes, and won't get dropped
 * unless/until a system suspend request arrives.
 */
-   __set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
-   ipa_clock_get(ipa);
+   if (!__test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
+   ipa_clock_get(ipa);
+   else
+   dev_err(dev, "suspend clock reference already taken!\n");
 
ipa_hardware_config(ipa);
 
@@ -544,7 +548,8 @@ static int ipa_config(struct ipa *ipa, const struct 
ipa_data *data)
 err_hardware_deconfig:
ipa_hardware_deconfig(ipa);
ipa_clock_put(ipa);
-   __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
+   if (!__test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
+   dev_err(dev, "suspend clock reference already dropped!\n");
 
return ret;
 }
@@ -562,7 +567,8 @@ static void ipa_deconfig(struct ipa *ipa)
ipa_endpoint_deconfig(ipa);
ipa_hardware_deconfig(ipa);
ipa_clock_put(ipa);
-   __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
+   if (!test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
+   dev_err(&ipa->pdev->dev, "no suspend clock reference\n");
 }
 
 static int ipa_firmware_load(struct device *dev)
@@ -913,7 +919,8 @@ static int ipa_suspend(struct device *dev)
struct ipa *ipa = dev_get_drvdata(dev);
 
ipa_clock_put(ipa);
-   __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
+   if (!test_and_clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
+   dev_err(dev, "suspend: missing suspend clock reference\n");
 
return 0;
 }
@@ -933,8 +940,10 @@ static int ipa_resume(struct device *dev)
/* This clock reference will keep the IPA out of suspend
 * until we get a power management suspend request.
 */
-   __set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
-   ipa_clock_get(ipa);
+   if (!test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
+   ipa_clock_get(ipa);
+   else
+   dev_err(dev, "resume: duplicate suspend clock reference\n");
 
return 0;
 }
-- 
2.20.1



[PATCH net-next v2 1/7] net: ipa: use refcount_t for IPA clock reference count

2020-09-11 Thread Alex Elder
Take advantage of the checking provided by refcount_t, rather than
using a plain atomic to represent the IPA clock reference count.

Note that we need to *set* the value to 1 in ipa_clock_get() rather
than incrementing it from 0 (because doing that is considered an
error for a refcount_t).

Signed-off-by: Alex Elder 
---
v2: This patch is new in version 2 of the series.

 drivers/net/ipa/ipa_clock.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ipa/ipa_clock.c b/drivers/net/ipa/ipa_clock.c
index 398f2e47043d8..b703866f2e20b 100644
--- a/drivers/net/ipa/ipa_clock.c
+++ b/drivers/net/ipa/ipa_clock.c
@@ -4,7 +4,7 @@
  * Copyright (C) 2018-2020 Linaro Ltd.
  */
 
-#include <linux/atomic.h>
+#include <linux/refcount.h>
 #include 
 #include 
 #include 
@@ -51,7 +51,7 @@
  * @config_path:   Configuration space interconnect
  */
 struct ipa_clock {
-   atomic_t count;
+   refcount_t count;
struct mutex mutex; /* protects clock enable/disable */
struct clk *core;
struct icc_path *memory_path;
@@ -195,7 +195,7 @@ static void ipa_clock_disable(struct ipa *ipa)
  */
 bool ipa_clock_get_additional(struct ipa *ipa)
 {
-   return !!atomic_inc_not_zero(&ipa->clock->count);
+   return refcount_inc_not_zero(&ipa->clock->count);
 }
 
 /* Get an IPA clock reference.  If the reference count is non-zero, it is
@@ -231,7 +231,7 @@ void ipa_clock_get(struct ipa *ipa)
 
ipa_endpoint_resume(ipa);
 
-   atomic_inc(&clock->count);
+   refcount_set(&clock->count, 1);
 
 out_mutex_unlock:
	mutex_unlock(&clock->mutex);
@@ -246,7 +246,7 @@ void ipa_clock_put(struct ipa *ipa)
struct ipa_clock *clock = ipa->clock;
 
/* If this is not the last reference there's nothing more to do */
-   if (!atomic_dec_and_mutex_lock(&clock->count, &clock->mutex))
+   if (!refcount_dec_and_mutex_lock(&clock->count, &clock->mutex))
return;
 
ipa_endpoint_suspend(ipa);
@@ -294,7 +294,7 @@ struct ipa_clock *ipa_clock_init(struct device *dev)
goto err_kfree;
 
	mutex_init(&clock->mutex);
-   atomic_set(&clock->count, 0);
+   refcount_set(&clock->count, 0);
 
return clock;
 
@@ -311,7 +311,7 @@ void ipa_clock_exit(struct ipa_clock *clock)
 {
struct clk *clk = clock->core;
 
-   WARN_ON(atomic_read(&clock->count) != 0);
+   WARN_ON(refcount_read(&clock->count) != 0);
	mutex_destroy(&clock->mutex);
ipa_interconnect_exit(clock);
kfree(clock);
-- 
2.20.1



[PATCH net-next v2 2/7] net: ipa: replace ipa->suspend_ref with a flag bit

2020-09-11 Thread Alex Elder
For suspend/resume, we currently take an extra clock reference to
prevent the IPA clock from being shut down until a power management
suspend request arrives.  An atomic field in the IPA structure is
used to indicate whether this reference has been taken.

Instead, introduce a new flags bitmap in the IPA structure, and use
a single bit in that bitmap rather than the atomic to indicate
whether we have taken the special IPA clock reference.

Signed-off-by: Alex Elder 
---
v2: New patch to use a bitmap bit rather than an atomic_t.

 drivers/net/ipa/ipa.h  | 14 --
 drivers/net/ipa/ipa_main.c | 14 +++---
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ipa/ipa.h b/drivers/net/ipa/ipa.h
index 407fee841a9a8..e02fe979b645b 100644
--- a/drivers/net/ipa/ipa.h
+++ b/drivers/net/ipa/ipa.h
@@ -27,15 +27,25 @@ struct ipa_clock;
 struct ipa_smp2p;
 struct ipa_interrupt;
 
+/**
+ * enum ipa_flag - IPA state flags
+ * @IPA_FLAG_CLOCK_HELD:   Whether IPA clock is held to prevent suspend
+ * @IPA_FLAG_COUNT:Number of defined IPA flags
+ */
+enum ipa_flag {
+   IPA_FLAG_CLOCK_HELD,
+   IPA_FLAG_COUNT, /* Last; not a flag */
+};
+
 /**
  * struct ipa - IPA information
  * @gsi:   Embedded GSI structure
+ * @flags: Boolean state flags
  * @version:   IPA hardware version
  * @pdev:  Platform device
  * @modem_rproc:   Remoteproc handle for modem subsystem
  * @smp2p: SMP2P information
  * @clock: IPA clocking information
- * @suspend_ref:   Whether clock reference preventing suspend taken
  * @table_addr:DMA address of filter/route table content
  * @table_virt:Virtual address of filter/route table content
  * @interrupt: IPA Interrupt information
@@ -70,6 +80,7 @@ struct ipa_interrupt;
  */
 struct ipa {
struct gsi gsi;
+   DECLARE_BITMAP(flags, IPA_FLAG_COUNT);
enum ipa_version version;
struct platform_device *pdev;
struct rproc *modem_rproc;
@@ -77,7 +88,6 @@ struct ipa {
void *notifier;
struct ipa_smp2p *smp2p;
struct ipa_clock *clock;
-   atomic_t suspend_ref;
 
dma_addr_t table_addr;
__le64 *table_virt;
diff --git a/drivers/net/ipa/ipa_main.c b/drivers/net/ipa/ipa_main.c
index 1fdfec41e4421..409375b96eb8f 100644
--- a/drivers/net/ipa/ipa_main.c
+++ b/drivers/net/ipa/ipa_main.c
@@ -84,7 +84,7 @@ static void ipa_suspend_handler(struct ipa *ipa, enum 
ipa_irq_id irq_id)
 * endpoints will be resumed as a result.  This reference will
 * be dropped when we get a power management suspend request.
 */
-   if (!atomic_xchg(&ipa->suspend_ref, 1))
+   if (!test_and_set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags))
ipa_clock_get(ipa);
 
/* Acknowledge/clear the suspend interrupt on all endpoints */
@@ -508,7 +508,7 @@ static int ipa_config(struct ipa *ipa, const struct 
ipa_data *data)
 * is held after initialization completes, and won't get dropped
 * unless/until a system suspend request arrives.
 */
-   atomic_set(&ipa->suspend_ref, 1);
+   __set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
ipa_clock_get(ipa);
 
ipa_hardware_config(ipa);
@@ -544,7 +544,7 @@ static int ipa_config(struct ipa *ipa, const struct 
ipa_data *data)
 err_hardware_deconfig:
ipa_hardware_deconfig(ipa);
ipa_clock_put(ipa);
-   atomic_set(&ipa->suspend_ref, 0);
+   __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
 
return ret;
 }
@@ -562,7 +562,7 @@ static void ipa_deconfig(struct ipa *ipa)
ipa_endpoint_deconfig(ipa);
ipa_hardware_deconfig(ipa);
ipa_clock_put(ipa);
-   atomic_set(&ipa->suspend_ref, 0);
+   __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
 }
 
 static int ipa_firmware_load(struct device *dev)
@@ -777,7 +777,7 @@ static int ipa_probe(struct platform_device *pdev)
dev_set_drvdata(dev, ipa);
ipa->modem_rproc = rproc;
ipa->clock = clock;
-   atomic_set(&ipa->suspend_ref, 0);
+   __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
ipa->wakeup_source = wakeup_source;
ipa->version = data->version;
 
@@ -913,7 +913,7 @@ static int ipa_suspend(struct device *dev)
struct ipa *ipa = dev_get_drvdata(dev);
 
ipa_clock_put(ipa);
-   atomic_set(&ipa->suspend_ref, 0);
+   __clear_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
 
return 0;
 }
@@ -933,7 +933,7 @@ static int ipa_resume(struct device *dev)
/* This clock reference will keep the IPA out of suspend
 * until we get a power management suspend request.
 */
-   atomic_set(&ipa->suspend_ref, 1);
+   __set_bit(IPA_FLAG_CLOCK_HELD, ipa->flags);
ipa_clock_get(ipa);
 
return 0;
-- 
2.20.1



Re: [PATCH] drm: xlnx: remove defined but not used 'scaling_factors_666'

2020-09-11 Thread Hyun Kwon
On Fri, Sep 11, 2020 at 09:27:08AM -0700, Hyun Kwon wrote:
> Hi Daniel,
> 
> On Fri, Sep 11, 2020 at 01:15:19AM -0700, Daniel Vetter wrote:
> > On Thu, Sep 10, 2020 at 11:14:18AM -0700, Hyun Kwon wrote:
> > > Hi Jason,
> > > 
> > > On Thu, Sep 10, 2020 at 07:06:30AM -0700, Jason Yan wrote:
> > > > This addresses the following gcc warning with "make W=1":
> > > > 
> > > > drivers/gpu/drm/xlnx/zynqmp_disp.c:245:18: warning:
> > > > ‘scaling_factors_666’ defined but not used [-Wunused-const-variable=]
> > > >   245 | static const u32 scaling_factors_666[] = {
> > > >   |  ^~~
> > > > 
> > > > Reported-by: Hulk Robot 
> > > > Signed-off-by: Jason Yan 
> > > 
> > > Reviewed-by: Hyun Kwon 
> > 
> > I think you're the maintainer, so please also push patches to
> > drm-misc-next. Otherwise they'll just get lost, or at least it's very
> > confusing when a maintainer reviews a patch but there's no indication what
> > will happen with the patch.
> 
> Right. I wanted to give it some time before pushing. I'll clearly state going
> forward.
> 

Pushed to drm-misc/drm-misc-next.

Thanks,
-hyun

> Thanks,
> -hyun
> 
> > -Daniel
> > 
> > > 
> > > Thanks!
> > > 
> > > -hyun
> > > 
> > > > ---
> > > >  drivers/gpu/drm/xlnx/zynqmp_disp.c | 6 --
> > > >  1 file changed, 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xlnx/zynqmp_disp.c 
> > > > b/drivers/gpu/drm/xlnx/zynqmp_disp.c
> > > > index a455cfc1bee5..98bd48f13fd1 100644
> > > > --- a/drivers/gpu/drm/xlnx/zynqmp_disp.c
> > > > +++ b/drivers/gpu/drm/xlnx/zynqmp_disp.c
> > > > @@ -242,12 +242,6 @@ static const u32 scaling_factors_565[] = {
> > > > ZYNQMP_DISP_AV_BUF_5BIT_SF,
> > > >  };
> > > >  
> > > > -static const u32 scaling_factors_666[] = {
> > > > -   ZYNQMP_DISP_AV_BUF_6BIT_SF,
> > > > -   ZYNQMP_DISP_AV_BUF_6BIT_SF,
> > > > -   ZYNQMP_DISP_AV_BUF_6BIT_SF,
> > > > -};
> > > > -
> > > >  static const u32 scaling_factors_888[] = {
> > > > ZYNQMP_DISP_AV_BUF_8BIT_SF,
> > > > ZYNQMP_DISP_AV_BUF_8BIT_SF,
> > > > -- 
> > > > 2.25.4
> > > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"

2020-09-11 Thread David Miller
From: Geert Uytterhoeven 
Date: Fri, 11 Sep 2020 08:32:55 +0200

> Hi David,
> 
> On Thu, Sep 10, 2020 at 9:20 PM David Miller  wrote:
>> From: Geert Uytterhoeven 
>> Date: Tue,  1 Sep 2020 17:02:37 +0200
>>
>> > This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
>> >
>> > Inami-san reported that this commit breaks bridge support in a Xen
>> > environment, and that reverting it fixes this.
>> >
>> > During system resume, bridge ports are no longer enabled, as that relies
>> > on the receipt of the NETDEV_CHANGE notification.  This notification is
>> > not sent, as netdev_state_change() is no longer called.
>> >
>> > Note that the condition this commit intended to fix never existed
>> > upstream, as the patch triggering it and referenced in the commit was
>> > never applied upstream.  Hence I can confirm s2ram on r8a73a4/ape6evm
>> > and sh73a0/kzm9g works fine before/after this revert.
>> >
>> > Reported-by: Gaku Inami 
>> > Signed-off-by: Geert Uytterhoeven 
>>
>> Maybe you cannot reproduce it, but the problem is there and it still
>> looks very real to me.
>>
>> netdev_state_change() does two things:
>>
>> 1) Emit the NETDEV_CHANGE notification
>>
>> 2) Emit an rtmsg_ifinfo() netlink message, which in turn tries to access
>>the device statistics via ->ndo_get_stats*().
>>
>> It is absolutely wrong to do #2 when netif_device_present() is false.
>>
>> So I cannot apply this patch as-is, sorry.
> 
> Thanks a lot for looking into this!
> 
> But doing #1 is still safe?  That is the part that calls into the bridge
> code.  So would moving the netif_device_present() check from
> linkwatch_do_dev() to netdev_state_change(), to prevent doing #2, be
> acceptable?

I have a better question.  Why is a software device like the bridge,
that wants to effectively exist and still receive netdev event
notifications, marking itself as not present?

That's seems like the real bug here.


Re: [PATCH v3 net-next] net: phy: mchp: Add support for LAN8814 QUAD PHY

2020-09-11 Thread David Miller
From: Divya Koppera 
Date: Fri, 11 Sep 2020 18:48:44 +0530

> LAN8814 is a low-power, quad-port triple-speed (10BASE-T/100BASETX/1000BASE-T)
> Ethernet physical layer transceiver (PHY). It supports transmission and
> reception of data on standard CAT-5, as well as CAT-5e and CAT-6, unshielded
> twisted pair (UTP) cables.
> 
> LAN8814 supports industry-standard QSGMII (Quad Serial Gigabit Media
> Independent Interface) and Q-USGMII (Quad Universal Serial Gigabit Media
> Independent Interface) providing chip-to-chip connection to four Gigabit
> Ethernet MACs using a single serialized link (differential pair) in each
> direction.
> 
> The LAN8814 SKU supports high-accuracy timestamping functions to
> support IEEE-1588 solutions using Microchip Ethernet switches, as well as
> customer solutions based on SoCs and FPGAs.
> 
> The LAN8804 SKU has same features as that of LAN8814 SKU except that it does
> not support 1588, SyncE, or Q-USGMII with PCH/MCH.
> 
> This adds support for 10BASE-T, 100BASE-TX, and 1000BASE-T,
> QSGMII link with the MAC.
> 
> Signed-off-by: Divya Koppera

Applied, thanks.


Re: [PATCH] net: ethernet: ti: cpsw_new: fix suspend/resume

2020-09-11 Thread David Miller
From: Grygorii Strashko 
Date: Thu, 10 Sep 2020 23:52:29 +0300

> Add missed suspend/resume callbacks to properly restore networking after
> suspend/resume cycle.
> 
> Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based 
> driver part 1 - dual-emac")
> Signed-off-by: Grygorii Strashko 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next v3 0/9] net: ethernet: ti: ale: add static configuration

2020-09-11 Thread David Miller
From: Grygorii Strashko 
Date: Thu, 10 Sep 2020 23:27:58 +0300

> Both existing and newly introduced CPSW ALE versions have differences in
> supported features and ALE table formats. This is especially true for the
> recent AM65x/J721E/J7200 and future AM64x SoCs, which support more
> features like: auto-aging, classifiers, link aggregation, additional HW
> filtering, etc.
> 
> The existing ALE configuration interface is not practical in terms of
> adding new features and requires consumers to program a lot of static
> parameters. Any attempt to add new features would cause endless adding
> and maintaining of different combinations of flags and options. Because the
> CPSW ALE configuration is static and fixed for an SoC (or set of SoCs), it is
> reasonable to add support for static ALE configurations inside the ALE module.
> 
> This series introduces a static ALE configuration table for different ALE
> variants and provides an option for consumers to select the required ALE
> configuration by providing an ALE const char *dev_id identifier (patch 2).
> All existing drivers have been switched to the new approach (patches 3-6).
> 
> After this, the ALE HW auto-aging feature can be enabled for the AM65x
> CPSW ALE variant (patch 7).
> 
> Finally, patches 8-9 introduce tables describing the ALE VLAN entry
> fields, as the ALE VLAN entries differ too much between different TI
> CPSW ALE versions, so handling them using flags, defines and get/set
> functions had become over-complicated.
 ...

Series applied, thank you.


Re: [RFC PATCH v9 0/3] Add introspect_access(2) (was O_MAYEXEC)

2020-09-11 Thread James Morris
On Thu, 10 Sep 2020, Matthew Wilcox wrote:

> On Thu, Sep 10, 2020 at 08:38:21PM +0200, Mickaël Salaün wrote:
> > There is also the use case of noexec mounts and file permissions. From
> > user space point of view, it doesn't matter which kernel component is in
> > charge of defining the policy. The syscall should then not be tied with
> > a verification/integrity/signature/appraisal vocabulary, but simply an
> > access control one.
> 
> permission()?
> 

The caller is not asking the kernel to grant permission, it's asking 
"SHOULD I access this file?"

The caller doesn't know, for example, if the script file it's about to 
execute has been signed, or if it's from a noexec mount. It's asking the 
kernel, which does know. (Note that this could also be extended to reading 
configuration files).

How about: should_faccessat ?

-- 
James Morris


Re: [PATCH net] net: ipa: fix u32_replace_bits by u32p_xxx version

2020-09-11 Thread David Miller
From: Vadym Kochan 
Date: Thu, 10 Sep 2020 18:41:52 +0300

> Looks like u32p_replace_bits() should be used instead of
> u32_replace_bits(), which does not modify the value but instead
> returns the modified version.
> 
> Fixes: 2b9feef2b6c2 ("soc: qcom: ipa: filter and routing tables")
> Signed-off-by: Vadym Kochan 
> Reviewed-by: Alex Elder 

Applied and queued up for -stable, thank you.


[ANNOUNCE] 4.9.235-rt153

2020-09-11 Thread Clark Williams
Hello RT-list!

I'm pleased to announce the 4.9.235-rt153 stable release.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.9-rt
  Head SHA1: 0e7258df4e13bd29c182837d9b642b2ad7868847

Or to build 4.9.235-rt153 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz

  https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.9.235.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.9/patch-4.9.235-rt153.patch.xz


You can also build from 4.9.234-rt152 by applying the incremental patch:

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.9/incr/patch-4.9.234-rt152-rt153.patch.xz

Enjoy!
Clark


Re: [PATCH net v1] hinic: fix rewaking txq after netif_tx_disable

2020-09-11 Thread David Miller
From: Luo bin 
Date: Thu, 10 Sep 2020 22:04:40 +0800

> When calling hinic_close in hinic_set_channels, all queues are
> stopped after netif_tx_disable, but some queues may be rewoken by
> mistake in free_tx_poll while the driver is handling a tx irq. If a
> queue is rewoken, the core may call hinic_xmit_frame to send a packet
> shortly after netif_tx_disable, which may result in accessing memory
> that has already been freed in hinic_close. So call napi_disable
> before netif_tx_disable in hinic_close to fix this bug.
> 
> Fixes: 2eed5a8b614b ("hinic: add set_channels ethtool_ops support")
> Signed-off-by: Luo bin 
> ---
> V0~V1:
> - call napi_disable before netif_tx_disable instead of judging whether
>   the netdev is in down state before waking txq in free_tx_poll to fix
>   this bug

Applied and queued up for -stable, thank you.


[PATCH] RISC-V: Consider sparse memory while removing unusable memory

2020-09-11 Thread Atish Patra
Currently, any usable memory area beyond page_offset is removed by summing
the memory sizes of all memblocks. That may not work for sparse memory,
as memory regions can be very far apart, resulting in the incorrect removal
of some usable memory.

Just use the start of the first memory block and the end of the last memory
block to compute the size of the total memory that can be used.

Signed-off-by: Atish Patra 
---
 arch/riscv/mm/init.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 787c75f751a5..188281fc2816 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -147,7 +147,6 @@ void __init setup_bootmem(void)
 {
struct memblock_region *reg;
phys_addr_t mem_size = 0;
-   phys_addr_t total_mem = 0;
phys_addr_t mem_start, end = 0;
phys_addr_t vmlinux_end = __pa_symbol(&_end);
phys_addr_t vmlinux_start = __pa_symbol(&_start);
@@ -155,18 +154,17 @@ void __init setup_bootmem(void)
/* Find the memory region containing the kernel */
for_each_memblock(memory, reg) {
end = reg->base + reg->size;
-   if (!total_mem)
+   if (!mem_start)
mem_start = reg->base;
if (reg->base <= vmlinux_start && vmlinux_end <= end)
BUG_ON(reg->size == 0);
-   total_mem = total_mem + reg->size;
}
 
/*
 * Remove memblock from the end of usable area to the
 * end of region
 */
-   mem_size = min(total_mem, (phys_addr_t)-PAGE_OFFSET);
+   mem_size = min(end - mem_start, (phys_addr_t)-PAGE_OFFSET);
if (mem_start + mem_size < end)
memblock_remove(mem_start + mem_size,
end - mem_start - mem_size);
-- 
2.24.0



[GIT PULL] seccomp fixes for v5.9-rc5

2020-09-11 Thread Kees Cook
Hi Linus,

Please pull these seccomp fixes for v5.9-rc5. This fixes a rare race
condition in seccomp when using TSYNC and USER_NOTIF together where a
memory allocation would not get freed (found by syzkaller, fixed by
Tycho). Additionally updates Tycho's MAINTAINERS and .mailmap entries
for his new address.

Thanks!

-Kees

The following changes since commit d012a7190fc1fd72ed48911e77ca97ba4521bccd:

  Linux 5.9-rc2 (2020-08-23 14:08:43 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
tags/seccomp-v5.9-rc5

for you to fetch changes up to e839317900e9f13c83d8711d684de88c625b307a:

  seccomp: don't leave dangling ->notif if file allocation fails (2020-09-08 
11:30:16 -0700)


seccomp fixes for v5.9-rc5

- Fix memory resource leak of user_notif under TSYNC race (Tycho Andersen)


Tycho Andersen (3):
  seccomp: don't leak memory when filter install races
  mailmap, MAINTAINERS: move to tycho.pizza
  seccomp: don't leave dangling ->notif if file allocation fails

 .mailmap |  1 +
 MAINTAINERS  |  2 +-
 kernel/seccomp.c | 24 ++--
 3 files changed, 20 insertions(+), 7 deletions(-)

-- 
Kees Cook


Re: [RFC PATCH v8 0/3] Add support for AT_INTERPRETED (was O_MAYEXEC)

2020-09-11 Thread James Morris
On Wed, 9 Sep 2020, Al Viro wrote:

> On Wed, Sep 09, 2020 at 09:19:11AM +0200, Mickaël Salaün wrote:
> > 
> > On 08/09/2020 20:50, Al Viro wrote:
> > > On Tue, Sep 08, 2020 at 09:59:53AM +0200, Mickaël Salaün wrote:
> > >> Hi,
> > >>
> > >> This height patch series rework the previous O_MAYEXEC series by not
> > >> adding a new flag to openat2(2) but to faccessat2(2) instead.  As
> > >> suggested, this enables to perform the access check on a file descriptor
> > >> instead of on a file path (while opening it).  This may require two
> > >> checks (one on open and then with faccessat2) but it is a more generic
> > >> approach [8].
> > > 
> > > Again, why is that folded into lookup/open/whatnot, rather than being
> > > an operation applied to a file (e.g. O_PATH one)?
> > > 
> > 
> > I don't understand your question. AT_INTERPRETED can and should be used
> > with AT_EMPTY_PATH. The two checks I wrote about was for IMA.
> 
> Once more, with feeling: don't hide that behind existing syscalls.
> If you want to tell LSM have a look at given fs object in a special
> way, *add* *a* *new* *system* *call* *for* *doing* *just* *that*.

It's not just for LSM, though, and it has identical semantics from the 
caller's POV as faccessat().



-- 
James Morris



Re: [PATCH net-next] octeontx2-af: Constify npc_kpu_profile_{action,cam}

2020-09-11 Thread David Miller
From: Rikard Falkeborn 
Date: Sat, 12 Sep 2020 00:00:15 +0200

> These are never modified, so constify them to allow the compiler to
> place them in read-only memory. This moves about 25kB to read-only
> memory as seen by the output of the size command.
> 
> Before:
>textdata bss dec hex filename
>  296203   654641248  362915   589a3 
> drivers/net/ethernet/marvell/octeontx2/af/octeontx2_af.ko
> 
> After:
>textdata bss dec hex filename
>  321003   406641248  362915   589a3 
> drivers/net/ethernet/marvell/octeontx2/af/octeontx2_af.ko
> 
> Signed-off-by: Rikard Falkeborn 

Applied, thank you.


Re: [REGRESSION] x86/entry: Tracer no longer has opportunity to change the syscall number at entry via orig_ax

2020-09-11 Thread Kees Cook
On Wed, Sep 09, 2020 at 11:53:42PM +1000, Michael Ellerman wrote:
> I can observe the difference between v5.8 and mainline, using the
> raw_syscall trace event and running the seccomp_bpf selftest which turns
> a getpid (39) into a getppid (110).
> 
> With v5.8 we see getppid on entry and exit:
> 
>  seccomp_bpf-1307  [000]  22974.874393: sys_enter: NR 110 
> (722c46e0, 40a350, 4, f7ab, 7fa6ee0d4010, 0)
>  seccomp_bpf-1307  [000] .N.. 22974.874401: sys_exit: NR 110 = 1304
> 
> Whereas on mainline we see an enter for getpid and an exit for getppid:
> 
>  seccomp_bpf-1030  [000] 21.806766: sys_enter: NR 39 
> (7ffe2f6d1ad0, 40a350, 7ffe2f6d1ad0, 0, 0, 407299)
>  seccomp_bpf-1030  [000] 21.806767: sys_exit: NR 110 = 1027

For my own notes, this is how I reproduced it:

# ./perf-$VER record -e raw_syscalls:sys_enter -e raw_syscalls:sys_exit &
# ./seccomp_bpf
# fg
ctrl-c
# ./perf-$VER script | grep seccomp_bpf | awk '{print $7}' | sort | uniq -c > 
$VER.log
*repeat*
# diff -u old.log new.log
...

(Is there an easier way to get those results?)

I will go see if I can figure out the best way to correct this.

-- 
Kees Cook


Re: [RESEND][RFC PATCH 0/6] Fork brute force attack mitigation (fbfam)

2020-09-11 Thread James Morris
On Thu, 10 Sep 2020, Kees Cook wrote:

> [kees: re-sending this series on behalf of John Wood 
>  also visible at https://github.com/johwood/linux fbfam]
> 
> From: John Wood 

Why are you resending this? The author of the code needs to be able to 
send and receive emails directly as part of development and maintenance.

-- 
James Morris




[ANNOUNCE] 4.14.197-rt95

2020-09-11 Thread Clark Williams
Hello RT-list!

I'm pleased to announce the 4.14.197-rt95 stable release.

In addition to the merge of the .196 and .197 stable release tags, this
release contains three RT specific fixes:

eba893980303 net: xfrm: fix compress vs decompress serialization
23d7ce6a6ca9 Bluetooth: Acquire sk_lock.slock without disabling interrupts
c0e17a81059e signal: Prevent double-free of user struct

Note the above sha1's are for the regular merge branch. The rebase branch
commits are:

59e53fae6d31 net: xfrm: fix compress vs decompress serialization
3eb9ffa69d28 Bluetooth: Acquire sk_lock.slock without disabling interrupts
89ac4fb20261 signal: Prevent double-free of user struct

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.14-rt
  Head SHA1: 4b77f0c11a53ef0ca870f3d7a05d3de62d3dfd0a

Or to build 4.14.197-rt95 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.14.tar.xz

  https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.14.197.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.14/patch-4.14.197-rt95.patch.xz


You can also build from 4.14.195-rt94 by applying the incremental patch:

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.14/incr/patch-4.14.195-rt94-rt95.patch.xz

Enjoy!
Clark


Re: [PATCH v2] zram: add restriction on dynamic zram device creation

2020-09-11 Thread Minchan Kim
Hi Yi,

On Fri, Sep 04, 2020 at 04:52:10PM +0800, Yi Wang wrote:
> From: zhanglin 
> 
> Add max_num_devices to limit dynamic zram device creation to prevent
> potential OOM.
> 
> Signed-off-by: zhanglin 
> Signed-off-by: Yi Wang 
> ---
> v1->v2:
> change hard-coded initial max_num_devices into configurable way.
> 
>  drivers/block/zram/Kconfig|  7 +++
>  drivers/block/zram/zram_drv.c | 28 +---
>  2 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> index fe7a4b7d30cf..54a369932417 100644
> --- a/drivers/block/zram/Kconfig
> +++ b/drivers/block/zram/Kconfig
> @@ -37,3 +37,10 @@ config ZRAM_MEMORY_TRACKING
> /sys/kernel/debug/zram/zramX/block_state.
>  
> See Documentation/admin-guide/blockdev/zram.rst for more information.
> +
> +config ZRAM_DEV_MAX_COUNT
> + int "Number of zram devices to be created"
> + depends on ZRAM
> + default 256
> + help
> +   This option specifies the maximum number of zram devices.
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 36d49159140f..d1022f3c04c4 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -43,8 +43,9 @@ static DEFINE_MUTEX(zram_index_mutex);
>  static int zram_major;
>  static const char *default_compressor = "lzo-rle";
>  
> -/* Module params (documentation at end) */
>  static unsigned int num_devices = 1;
> +/* Module params (documentation at end) */
> +static unsigned int max_num_devices = CONFIG_ZRAM_DEV_MAX_COUNT;
>  /*
>   * Pages that compress to sizes equals or greater than this are stored
>   * uncompressed in memory.
> @@ -2013,10 +2014,16 @@ static ssize_t hot_add_show(struct class *class,
>   struct class_attribute *attr,
>   char *buf)
>  {
> - int ret;
> + int ret = -ENOSPC;
>  
>   mutex_lock(&zram_index_mutex);
> + if (num_devices >= max_num_devices) {
>   mutex_unlock(&zram_index_mutex);
> + return ret;
> + }
>   ret = zram_add();

> + if (ret >= 0)
> + num_devices += 1;

Let's have the num_devices inc/dec in zram_add/zram_remove so any caller
doesn't need to take care of it.

>   mutex_unlock(&zram_index_mutex);
>  
>   if (ret < 0)
> @@ -2046,8 +2053,10 @@ static ssize_t hot_remove_store(struct class *class,
>   zram = idr_find(&zram_index_idr, dev_id);
>   if (zram) {
>   ret = zram_remove(zram);
> - if (!ret)
> + if (!ret) {
>   idr_remove(&zram_index_idr, dev_id);
> + num_devices -= 1;
> + }
>   } else {
>   ret = -ENODEV;
>   }
> @@ -2089,6 +2098,7 @@ static void destroy_devices(void)
>  static int __init zram_init(void)
>  {
>   int ret;
> + unsigned int i;
>  
>   ret = cpuhp_setup_state_multi(CPUHP_ZCOMP_PREPARE, "block/zram:prepare",
> zcomp_cpu_up_prepare, zcomp_cpu_dead);
> @@ -2111,13 +2121,17 @@ static int __init zram_init(void)
>   return -EBUSY;
>   }
>  
> - while (num_devices != 0) {
> + if (num_devices > max_num_devices) {
> + pr_err("Number of pre-created zram devices over limit\n");
> + goto out_error;
> + }
> +
> + for (i = 0; i < num_devices; i++) {
>   mutex_lock(&zram_index_mutex);
>   ret = zram_add();
>   mutex_unlock(&zram_index_mutex);
>   if (ret < 0)
>   goto out_error;
> - num_devices--;
>   }
>  
>   return 0;
> @@ -2135,8 +2149,8 @@ static void __exit zram_exit(void)
>  module_init(zram_init);
>  module_exit(zram_exit);
>  
> -module_param(num_devices, uint, 0);
> -MODULE_PARM_DESC(num_devices, "Number of pre-created zram devices");
> +module_param(max_num_devices, uint, 0);
> +MODULE_PARM_DESC(max_num_devices, "Max number of created zram devices");

How about this?
"Max number of zram devices to be created"

>  
>  MODULE_LICENSE("Dual BSD/GPL");
>  MODULE_AUTHOR("Nitin Gupta ");
> -- 
> 2.17.1
> 


Re: slab-out-of-bounds in iov_iter_revert()

2020-09-11 Thread Al Viro
On Fri, Sep 11, 2020 at 05:59:04PM -0400, Qian Cai wrote:
> Super easy to reproduce on today's mainline by just fuzzing for a few minutes
> on virtiofs (if it ever matters). Any thoughts?

Usually happens when ->direct_IO() fucks up and reports the wrong amount
of data written/read.  We had several bugs like that in the past - see
e.g. 85128b2be673 (fix nfs O_DIRECT advancing iov_iter too much).

Had there been any recent O_DIRECT-related patches on the filesystems
involved?


Re: [PATCH net-next] drivers/net/wan/x25_asy: Remove an unused flag "SLF_OUTWAIT"

2020-09-11 Thread Xie He
On Fri, Sep 11, 2020 at 2:44 PM David Miller  wrote:
>
> From: Xie He 
> Date: Thu, 10 Sep 2020 23:35:03 -0700
>
> > The "SLF_OUTWAIT" flag defined in x25_asy.h is not actually used.
> > It is only cleared at one place in x25_asy.c but is never read or set.
> > So we can remove it.
> >
> > Signed-off-by: Xie He 
>
> Applied, it looks like this code was based upon the slip.c code.

Oh! You are right! I can finally understand now why there are so many
things named "sl" in this file.


Re: [PATCH net-next] net/packet: Fix a comment about hard_header_len and add a warning for it

2020-09-11 Thread Xie He
On Fri, Sep 11, 2020 at 7:32 AM Willem de Bruijn
 wrote:
>
> From a quick scan, a few device types that might trigger this
>
> net/atm/clip.c
> drivers/net/wan/hdlc_fr.c
> drivers/net/appletalk/ipddp.c
> drivers/net/ppp/ppp_generic.c
> drivers/net/net_failover.c

I have recently fixed this problem in the "net" tree in hdlc_fr.c.

Glad to see the number of drivers that have this problem is not very big.



Re: [PATCH net-next] net/socket.c: Remove an unused header file

2020-09-11 Thread Xie He
On Fri, Sep 11, 2020 at 2:41 PM David Miller  wrote:
>
> From: Xie He 
> Date: Thu, 10 Sep 2020 23:07:20 -0700
>
> > This header file is not actually used in this file. Let's remove it.
>
> How did you test this assertion?  As Jakub showed, the
> dlci_ioctl_set() function needs to be declared because socket.c
> references it.
>
> All of your visual scanning of the code is wasted if you don't
> do something simple like an "allmodconfig" or "allyesconfig"
> build to test whether your change is correct or not.
>
> Don't leave that step for us, that's your responsibility.
>

OK. I'm sorry for this.


Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain

2020-09-11 Thread Li, Aubrey
On 2020/9/12 7:04, Li, Aubrey wrote:
> On 2020/9/12 0:28, Qais Yousef wrote:
>> On 09/10/20 13:42, Aubrey Li wrote:
>>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>>> enters idle, its corresponding bit in the idle cpumask will be set,
>>> and when the CPU exits idle, its bit will be cleared.
>>>
>>> When a task wakes up to select an idle cpu, scanning idle cpumask
>>> has lower cost than scanning all the cpus in the last level cache domain,
>>> especially when the system is heavily loaded.
>>>
>>> Signed-off-by: Aubrey Li 
>>> ---
>>>  include/linux/sched/topology.h | 13 +
>>>  kernel/sched/fair.c|  4 +++-
>>>  kernel/sched/topology.c|  2 +-
>>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>>> index fb11091129b3..43a641d26154 100644
>>> --- a/include/linux/sched/topology.h
>>> +++ b/include/linux/sched/topology.h
>>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>>> atomic_tref;
>>> atomic_tnr_busy_cpus;
>>> int has_idle_cores;
>>> +   /*
>>> +* Span of all idle CPUs in this domain.
>>> +*
>>> +* NOTE: this field is variable length. (Allocated dynamically
>>> +* by attaching extra space to the end of the structure,
>>> +* depending on how many CPUs the kernel has booted up with)
>>> +*/
>>> +   unsigned long   idle_cpus_span[];
>>
>> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?
> 
> I can use the existing free code. Is there a problem with this?
> 
>>
>> The patch looks useful. Did it help you with any particular workload? It'd be
>> good to expand on that in the commit message.
>>
> Odd, that was included in patch v1 0/1; did you receive it?

I found it here:

https://lkml.org/lkml/2020/9/11/645

> 
> Thanks,
> -Aubrey
> 



Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs

2020-09-11 Thread Sean V Kelley

On 4 Sep 2020, at 19:23, Bjorn Helgaas wrote:


On Fri, Sep 04, 2020 at 10:18:30PM +, Kelley, Sean V wrote:

Hi Bjorn,

Quick question below...

On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:

Hi Bjorn,

On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:

On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:

From: Qiuxu Zhuo 

When an RCEC device signals error(s) to a CPU core, the CPU core
needs to walk all the RCiEPs associated with that RCEC to check
errors. So add the function pcie_walk_rcec() to walk all RCiEPs
associated with the RCEC device.

Co-developed-by: Sean V Kelley 
Signed-off-by: Sean V Kelley 
Signed-off-by: Qiuxu Zhuo 
Reviewed-by: Jonathan Cameron 
---
 drivers/pci/pci.h   |  4 +++
 drivers/pci/pcie/rcec.c | 76
+
 2 files changed, 80 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index bd25e6047b54..8bd7528d6977 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
pci_dev
*pdev) {}
 #ifdef CONFIG_PCIEPORTBUS
 void pci_rcec_init(struct pci_dev *dev);
 void pci_rcec_exit(struct pci_dev *dev);
+void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
pci_dev
*, void *),
+   void *userdata);
 #else
 static inline void pci_rcec_init(struct pci_dev *dev) {}
 static inline void pci_rcec_exit(struct pci_dev *dev) {}
+static inline void pcie_walk_rcec(struct pci_dev *rcec, int
(*cb)(struct pci_dev *, void *),
+ void *userdata) {}
 #endif

 #ifdef CONFIG_PCI_ATS
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index 519ae086ff41..405f92fcdf7f 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -17,6 +17,82 @@

 #include "../pci.h"

+static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
(*cb)(struct pci_dev *, void *),
+void *userdata, const unsigned long
bitmap)
+{
+   unsigned int devn, fn;
+   struct pci_dev *dev;
+   int retval;
+
+   for_each_set_bit(devn, &bitmap, 32) {
+   for (fn = 0; fn < 8; fn++) {
+   dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));


Wow, this is a lot of churning to call pci_get_slot() 256 times per
bus for the "associated bus numbers" case where we pass a bitmap of
0x.  They didn't really make it easy for software when they
added the next/last bus number thing.

Just thinking out loud here.  What if we could set dev->rcec during
enumeration, and then use that to build pcie_walk_rcec()?


I think I follow what you are doing.

As we enumerate an RCEC, use the time to discover RCiEPs and
associate
each RCiEP's dev->rcec. Although BIOS already set the bitmap for 
this

specific RCEC, it's more efficient to simply discover the devices
through the bus walk and verify each one found against the bitmap.

Further, while we can be certain that an RCiEP found with a matching
device no. in a bitmap for an associated RCEC is correct, we cannot
be
certain that any RCiEP found on another bus range is correct unless
we
verify the bus is within that next/last bus range.

Finally, that's where find_rcec() callback for rcec_assoc_rciep()
does
double duty by also checking on the "on-a-separate-bus" case 
captured

potentially by find_rcec() during an RCiEP's bus walk.



  bool rcec_assoc_rciep(rcec, rciep)
  {
if (rcec->bus == rciep->bus)
  return (rcec->bitmap contains rciep->devfn);

return (rcec->next/last contains rciep->bus);
  }

  link_rcec(dev, data)
  {
struct pci_dev *rcec = data;

if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
  dev->rcec = rcec;
  }

  find_rcec(dev, data)
  {
struct pci_dev *rciep = data;

if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
  rciep->rcec = dev;
  }

  pci_setup_device
...


I just noticed your use of pci_setup_device(). Are you suggesting
moving the call to pci_rcec_init() out of pci_init_capabilities() and
move it into pci_setup_device()?  If so, would pci_rcec_exit() still
remain in pci_release_capabilities()?

I'm just wondering if it could just remain in 
pci_init_capabilities().


Yeah, I didn't mean in pci_setup_device() specifically, just somewhere
in the callchain of pci_setup_device().  But you're right, it probably
would make more sense in pci_init_capabilities(), so I *should* have
said pci_scan_single_device() to be a little less specific.

Bjorn


I’ve done some experimenting with this approach, and I think there may 
be a problem with just walking the busses during enumeration in 
pci_init_capabilities(). One problem is where one has an RCEC on a root 
bus: 6a(00.4) and an RCiEP on another root bus: 6b(00.0).  They will 
never find each other in this approach through a normal pci_bus_walk() 
call using their respective root_bus.



 +-[:6b]-+-00.0
 |   +-00.1
 |   +-00.2
 |   \-00.3
 +-[:6a]-+-00.0
 |   +-00.1
 |   +-00.2
 |

Yet another ethernet PHY LED control proposal

2020-09-11 Thread Marek Behun
Hello,

I have been thinking about another way to implement ABI for HW control
of ethernet PHY connected LEDs.

This proposal is inspired by the fact that for some time there is a
movement in the kernel to do transparent HW offloading of things (DSA
is an example of that).

So currently we have the `netdev` trigger. When this is enabled for a
LED, new files will appear in that LED's sysfs directory:
  - `device_name` where user is supposed to write interface name
  - `link` if set to 1, the LED will be ON if the interface is linked
  - `rx` if set to 1, the LED will blink on receive event
  - `tx` if set to 1, the LED will blink on transmit event
  - `interval` specifies duration of the LED blink
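
As a concrete illustration, the existing `netdev` trigger is configured from the shell like this (the LED name `led0` and interface `eth0` are placeholders; the sysfs layout is the standard one under `/sys/class/leds`):

```shell
# Attach the netdev trigger to a LED; the trigger files then appear
# in that LED's sysfs directory.
echo netdev > /sys/class/leds/led0/trigger

echo eth0 > /sys/class/leds/led0/device_name  # interface to follow
echo 1    > /sys/class/leds/led0/link         # ON while link is up
echo 1    > /sys/class/leds/led0/rx           # blink on receive
echo 1    > /sys/class/leds/led0/tx           # blink on transmit
echo 100  > /sys/class/leds/led0/interval     # blink duration in ms
```

Under the proposal above, a PHY driver could transparently offload exactly this combination of settings when the hardware supports it.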

Now what is interesting is that almost all combinations of link/rx/tx
settings are offloadable to a Marvell PHY! (Not to all LEDs, though...)

So what if we abandoned the idea of a `hw` trigger, and instead just
allowed a LED trigger to be offloadable, if that specific LED supports
it?

For the HW mode for different speeds we can just expand the `link` sysfs
file ABI, so that if the user writes a specific speed to this file, instead
of just "1", the LED will be on if the interface is linked at that
specific speed. Or maybe another sysfs file could be used for a "light on
N mbps" setting...

Afterwards we can figure out other possible modes.

What do you think?

Marek


Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands

2020-09-11 Thread Borislav Petkov
On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
> Enabling it, fixes the issue.

Btw, I just hit the below warn with 5.8, while booting with the above
config option enabled. Looks familiar and I didn't trigger it with
5.9-rc4+ so you guys either fixed it or something changed in-between:

[5.124321] ata4.00: NCQ Send/Recv Log not supported
[5.131484] ata4.00: configured for UDMA/133
[5.135847] scsi 3:0:0:0: Direct-Access ATA  ST8000AS0022-1WL SN01 
PQ: 0 ANSI: 5
[5.143972] sd 3:0:0:0: Attached scsi generic sg1 type 0
[5.144033] sd 3:0:0:0: [sdb] Host-aware zoned block device
[5.177105] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 
TB/7.28 TiB)
[5.184880] sd 3:0:0:0: [sdb] 4096-byte physical blocks
[5.190084] sd 3:0:0:0: [sdb] 29808 zones of 524288 logical blocks + 1 runt 
zone
[5.197439] sd 3:0:0:0: [sdb] Write Protect is off
[5.202220] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[5.207260] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[5.356631]  sdb: sdb1
[5.359014] sdb: disabling host aware zoned block device support due to 
partitions
[5.389941] [ cut here ]
[5.394557] WARNING: CPU: 8 PID: 164 at block/blk-settings.c:236 
blk_queue_max_zone_append_sectors+0x12/0x40
[5.404300] Modules linked in:
[5.407365] CPU: 8 PID: 164 Comm: kworker/u32:6 Not tainted 5.8.0 #7
[5.413682] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 
GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
[5.424191] Workqueue: events_unbound async_run_entry_fn
[5.429482] RIP: 0010:blk_queue_max_zone_append_sectors+0x12/0x40
[5.435543] Code: fe 0f 00 00 53 48 89 fb 0f 86 3d 07 00 00 48 89 b3 e0 03 
00 00 5b c3 90 0f 1f 44 00 00 8b 87 40 04 00 00 ff c8 83 f8 01 76 03 <0f> 0b c3 
8b 87 f8 03 00 00 39 87 f0 03 00 00 0f 46 87 f0 03 00 00
[5.454099] RSP: 0018:c9697c60 EFLAGS: 00010282
[5.459306] RAX:  RBX: 8887fa0a9400 RCX: 
[5.466390] RDX: 8887faf0d400 RSI: 0540 RDI: 8887f0dde6c8
[5.473474] RBP: 7471 R08: 001d1c40 R09: 8887fee29ad0
[5.480559] R10: 0001434bac00 R11: 00358275 R12: 0008
[5.487643] R13: 8887f0dde6c8 R14: 8887fa0a9738 R15: 
[5.494726] FS:  () GS:8887fee0() 
knlGS:
[5.502757] CS:  0010 DS:  ES:  CR0: 80050033
[5.508474] CR2:  CR3: 02209000 CR4: 003406e0
[5.515558] Call Trace:
[5.518026]  sd_zbc_read_zones+0x323/0x480
[5.522122]  sd_revalidate_disk+0x122b/0x2000
[5.526472]  ? __device_add_disk+0x2f7/0x4e0
[5.530738]  sd_probe+0x347/0x44b
[5.534058]  really_probe+0x2c4/0x3f0
[5.537720]  driver_probe_device+0xe1/0x150
[5.541902]  ? driver_allows_async_probing+0x50/0x50
[5.546852]  bus_for_each_drv+0x6a/0xa0
[5.550683]  __device_attach_async_helper+0x8c/0xd0
[5.47]  async_run_entry_fn+0x4a/0x180
[5.559636]  process_one_work+0x1a5/0x3a0
[5.563637]  worker_thread+0x50/0x3a0
[5.567300]  ? process_one_work+0x3a0/0x3a0
[5.571480]  kthread+0x117/0x160
[5.574715]  ? kthread_park+0x90/0x90
[5.578377]  ret_from_fork+0x22/0x30
[5.581960] ---[ end trace 94141003236730cf ]---
[5.586578] sd 3:0:0:0: [sdb] Attached SCSI disk
[6.186783] ata5: failed to resume link (SControl 0)
[6.191818] ata5: SATA link down (SStatus 0 SControl 0)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH] ide/macide: Convert Mac IDE driver to platform driver

2020-09-11 Thread Finn Thain
On Fri, 11 Sep 2020, Geert Uytterhoeven wrote:

> 
> Sorry, I completely missed that the Baboon case registers a 
> "pata_platform" device instead of a "macide" device.  Hence please 
> ignore my comments related to that.  Sorry for the noise.
> 

No problem. That misunderstanding got me thinking about implications 
stemming from my patch that I may have overlooked. Here's what I found.

1) Your presumption that the old macide driver would keep supporting all 
variants does make sense, as that would delay disruption for as long as 
possible (i.e. for as long as the IDE subsystem remains).

However, if my patch does not get merged until 2021, that disruption would 
not arrive earlier than promised by commit 7ad19a99ad431 ("ide: officially 
deprecated the legacy IDE driver").

2) My patch omitted a mac_defconfig update that would enable an 
alternative driver for the Baboon case. I will remedy this in v2.

3) It turns out that the Debian/m68k kernel config has 
CONFIG_BLK_DEV_PLATFORM=m. This will work fine with this patch. (I assume 
that Debian developers will replace CONFIG_BLK_DEV_PLATFORM with 
CONFIG_PATA_PLATFORM prior to the removal of the IDE subsystem next year.)


Re: [RFC PATCH v1 1/1] sched/fair: select idle cpu from idle cpumask in sched domain

2020-09-11 Thread Li, Aubrey
On 2020/9/12 0:28, Qais Yousef wrote:
> On 09/10/20 13:42, Aubrey Li wrote:
>> Added idle cpumask to track idle cpus in sched domain. When a CPU
>> enters idle, its corresponding bit in the idle cpumask will be set,
>> and when the CPU exits idle, its bit will be cleared.
>>
>> When a task wakes up to select an idle cpu, scanning idle cpumask
>> has lower cost than scanning all the cpus in the last level cache domain,
>> especially when the system is heavily loaded.
>>
>> Signed-off-by: Aubrey Li 
>> ---
>>  include/linux/sched/topology.h | 13 +
>>  kernel/sched/fair.c|  4 +++-
>>  kernel/sched/topology.c|  2 +-
>>  3 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
>> index fb11091129b3..43a641d26154 100644
>> --- a/include/linux/sched/topology.h
>> +++ b/include/linux/sched/topology.h
>> @@ -65,8 +65,21 @@ struct sched_domain_shared {
>>  atomic_tref;
>>  atomic_tnr_busy_cpus;
>>  int has_idle_cores;
>> +/*
>> + * Span of all idle CPUs in this domain.
>> + *
>> + * NOTE: this field is variable length. (Allocated dynamically
>> + * by attaching extra space to the end of the structure,
>> + * depending on how many CPUs the kernel has booted up with)
>> + */
>> +unsigned long   idle_cpus_span[];
> 
> Can't you use cpumask_var_t and zalloc_cpumask_var() instead?

I can use the existing free code. Is there a problem with this?

> 
> The patch looks useful. Did it help you with any particular workload? It'd be
> good to expand on that in the commit message.
> 
Odd, that was included in patch v1 0/1; did you receive it?

Thanks,
-Aubrey


Re: [PATCH 01/12] dt-bindings: power: Add bindings for the Mediatek SCPSYS power domains controller

2020-09-11 Thread Rob Herring
On Thu, Sep 10, 2020 at 07:28:15PM +0200, Enric Balletbo i Serra wrote:
> The System Control Processor System (SCPSYS) has several power management
> related tasks in the system. Add the bindings to define the power
> domains for the SCPSYS power controller.
> 
> Co-developed-by: Matthias Brugger 
> Signed-off-by: Matthias Brugger 
> Signed-off-by: Enric Balletbo i Serra 
> ---
> Dear Rob,
> 
> I am aware that this binding is not ready, but I preferred to send it because I'm
> kind of blocked. Compiling this binding triggers the following error:
> 
> mediatek,power-controller.example.dt.yaml: syscon@10006000: mfg_async@7:
> '#address-cells', '#size-cells', 'mfg_2d@8'
> do not match any of the regexes: 'pinctrl-[0-9]+'
> 
> This happens when a definition of a power-domain (parent) contains
> another power-domain (child), like the example. I am not sure how to
> specify this in the yaml and deal with this, so any clue is welcome.

You just have to keep nesting schemas all the way down. Define a 
grandchild node under the child node and then all of its properties.

> 
> Thanks,
>   Enric
> 
>  .../power/mediatek,power-controller.yaml  | 171 ++
>  1 file changed, 171 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/power/mediatek,power-controller.yaml
> 
> diff --git 
> a/Documentation/devicetree/bindings/power/mediatek,power-controller.yaml 
> b/Documentation/devicetree/bindings/power/mediatek,power-controller.yaml
> new file mode 100644
> index ..8be9244ad160
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/power/mediatek,power-controller.yaml
> @@ -0,0 +1,171 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/power/mediatek,power-controller.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Mediatek Power Domains Controller
> +
> +maintainers:
> +  - Weiyi Lu 
> +  - Matthias Brugger 
> +
> +description: |
> +  Mediatek processors include support for multiple power domains which can be
> +  powered up/down by software based on different application scenes to save 
> power.
> +
> +  IP cores belonging to a power domain should contain a 'power-domains'
> +  property that is a phandle for SCPSYS node representing the domain.
> +
> +properties:
> +  $nodename:
> +pattern: "^syscon@[0-9a-f]+$"
> +
> +  compatible:
> +items:
> +  - enum:
> +- mediatek,mt8173-power-controller
> +  - const: syscon
> +
> +  reg:
> +maxItems: 1
> +
> +patternProperties:
> +  "^.*@[0-9]$":

Node names should be generic:

power-domain@

> +type: object
> +description: |
> +  Represents the power domains within the power controller node as 
> documented
> +  in Documentation/devicetree/bindings/power/power-domain.yaml.
> +
> +properties:
> +  reg:
> +description: |
> +  Power domain index. Valid values are defined in:
> +  "include/dt-bindings/power/mt8173-power.h" - for MT8173 type 
> power domain.
> +maxItems: 1
> +
> +  '#power-domain-cells':
> +description:
> +  Documented by the generic PM Domain bindings in
> +  Documentation/devicetree/bindings/power/power-domain.yaml.

No need to redefine a common property. This should define valid values 
for it.

> +
> +  clocks:
> +description: |
> +  A number of phandles to clocks that need to be enabled during 
> domain
> +  power-up sequencing.

No need to redefine 'clocks'. You need to define how many, what each one 
is, and the order.

> +
> +  clock-names:
> +description: |
> +  List of names of clocks, in order to match the power-up sequencing
> +  for each power domain we need to group the clocks by name. BASIC
> +  clocks need to be enabled before enabling the corresponding power
> +  domain, and should not have a '-' in their name (i.e mm, mfg, 
> venc).
> +  SUBSYS clocks need to be enabled before releasing the bus 
> protection,
> +  and should contain a '-' in their name (i.e mm-0, isp-0, cam-0).
> +
> +  In order to follow properly the power-up sequencing, the clocks 
> must
> +  be specified by order, adding first the BASIC clocks followed by 
> the
> +  SUBSYS clocks.

You need to define the names.

> +
> +  mediatek,infracfg:
> +$ref: /schemas/types.yaml#definitions/phandle
> +description: phandle to the device containing the INFRACFG register 
> range.
> +
> +  mediatek,smi:
> +$ref: /schemas/types.yaml#definitions/phandle
> +description: phandle to the device containing the SMI register range.
> +
> +required:
> +  - reg
> +  - '#power-domain-cells'
> +
> +additionalProperties: false
> +
> +required:
> +  - compatible
> +  - reg
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +#include 
> +#include 
> 

Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

2020-09-11 Thread Yu-cheng Yu
On Wed, 2020-09-09 at 16:29 -0700, Dave Hansen wrote:
> On 9/9/20 4:25 PM, Yu, Yu-cheng wrote:
> > On 9/9/2020 4:11 PM, Dave Hansen wrote:
> > > On 9/9/20 4:07 PM, Yu, Yu-cheng wrote:
> > > > What if a writable mapping is passed to madvise(MADV_SHSTK)?  Should
> > > > that be rejected?
> > > 
> > > It doesn't matter to me.  Even if it's readable, it _stops_ being even
> > > directly readable after it's a shadow stack, right?  I don't think
> > > writes are special in any way.  If anything, we *want* it to be writable
> > > because that indicates that it can be written to, and we will want to
> > > write to it soon.
> > > 
> > But in a PROT_WRITE mapping, all the pte's have _PAGE_BIT_RW set.  To
> > change them to shadow stack, we need to clear that bit from the pte's.
> > That will be like mprotect_fixup()/change_protection_range().
> 
> The page table hardware bits don't matter.  The user-visible protection
> effects matter.
> 
> For instance, we have PROT_EXEC, which *CLEARS* a hardware NX PTE bit.
> The PROT_ permissions are independent of the hardware.
> 
> I don't think the interface should be influenced at *all* by what whacko
> PTE bit combinations we have to set to get the behavior.

Here are the changes if we take the mprotect(PROT_SHSTK) approach.
Any comments/suggestions?

---
 arch/x86/include/uapi/asm/mman.h | 26 +-
 mm/mprotect.c| 11 +++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index d4a8d0424bfb..024f006fcfe8 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -4,6 +4,8 @@
 
 #define MAP_32BIT  0x40/* only give out 32bit addresses */
 
+#define PROT_SHSTK 0x10/* shadow stack pages */
+
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 /*
  * Take the 4 protection key bits out of the vma->vm_flags
@@ -19,13 +21,35 @@
((vm_flags) & VM_PKEY_BIT2 ? _PAGE_PKEY_BIT2 : 0) | \
((vm_flags) & VM_PKEY_BIT3 ? _PAGE_PKEY_BIT3 : 0))
 
-#define arch_calc_vm_prot_bits(prot, key) (\
+#define pkey_vm_prot_bits(prot, key) ( \
((key) & 0x1 ? VM_PKEY_BIT0 : 0) |  \
((key) & 0x2 ? VM_PKEY_BIT1 : 0) |  \
((key) & 0x4 ? VM_PKEY_BIT2 : 0) |  \
((key) & 0x8 ? VM_PKEY_BIT3 : 0))
+#else
+#define pkey_vm_prot_bits(prot, key) 0
 #endif
 
+#define shstk_vm_prot_bits(prot) ( \
+   (static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK)) ? \
+   VM_SHSTK : 0)
+
+#define arch_calc_vm_prot_bits(prot, key) \
+   (pkey_vm_prot_bits(prot, key) | shstk_vm_prot_bits(prot))
+
 #include 
 
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+{
+   unsigned long supported = PROT_READ | PROT_EXEC | PROT_SEM;
+
+   if (static_cpu_has(X86_FEATURE_SHSTK) && (prot & PROT_SHSTK))
+   supported |= PROT_SHSTK;
+   else
+   supported |= PROT_WRITE;
+
+   return (prot & ~supported) == 0;
+}
+#define arch_validate_prot arch_validate_prot
+
 #endif /* _ASM_X86_MMAN_H */
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a8edbcb3af99..520bd8caa005 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -571,6 +571,17 @@ static int do_mprotect_pkey(unsigned long start, size_t
len,
goto out;
}
}
+
+   /*
+* Only anonymous mapping is suitable for shadow stack.
+*/
+   if (prot & PROT_SHSTK) {
+   if (vma->vm_file) {
+   error = -EINVAL;
+   goto out;
+   }
+   }
+
if (start > vma->vm_start)
prev = vma;
 
-- 



Re: [PATCH v3 2/2] dm-crypt: collect data and submit to DM to measure

2020-09-11 Thread Tushar Sugandhi

Thanks for taking a look at this patch Milan.
Appreciate it.

Sorry for responding late. I was on vacation last week.

My responses below.

On 2020-08-31 3:54 a.m., Milan Broz wrote:

On 28/08/2020 22:27, Tushar Sugandhi wrote:

Currently, dm-crypt does not take advantage of IMA measuring
capabilities, and ultimately the benefits of remote attestation.

Measure various dm-crypt constructs by calling various device-mapper
functions - dm_ima_*() that use IMA measuring capabilities. Implement
ima_measure_dm_crypt_data() to measure various dm-crypt constructs.

Ensure that ima_measure_dm_crypt_data() is non intrusive, i.e. failures
in this function and the call-stack below should not affect the core
functionality of dm-crypt.


Just my opinion, but I really do not like to add every relevant DM table option
as a hardcoded value here. (But maybe it is necessary, dunno).

Why you cannot measure the whole device-mapper table as a string?
(output of STATUSTYPE_TABLE).
But it has own problems, like table can contain key etc.


Correct. That’s one of the reasons why we don’t want to measure
the whole device mapper table. We don’t want the keys to leave the
device – even for IMA measurements/attestation.

Here is the list of problems, that I can think of, with measuring the 
whole device mapper table:


1. The table may not output all the important attributes that we
   want to measure, for all the targets we are interested in. (e.g.
   for crypt it doesn’t output many of the integrity/bio attributes
   from crypt_config)

2. The targets may output sensitive information to the table that
   should not leave the device, even in case of measurement/
   attestation.
   (crypt has keys, other targets may have similar sensitive
   information)

3. The order of entries in the table maybe different on each
   activation, and not all the entries will be present on each
   activation. The entries may come and go, based on policy
   changes and/or devices getting added/removed from the system.
   This would generate at least 2^n different hashes (where n is the
   number of entries in the table). And attestation server will have
   to keep track of all these permutations. With each entry measured
   separately, the possible hashes that the attestation server needs
   to keep track of will grow linearly with the number of
   entries, and not exponentially.

4. With bulk measuring of entire table, we are not giving sys-admins
   any choice on which targets to measure, and which not to measure.
   With the current approach – they have this choice (using the IMA
   policy I introduced in the patch -
   https://patchwork.kernel.org/patch/11742027/
   and
   https://patchwork.kernel.org/patch/11742035/

5. The table would contain the targets that we don’t care about for
   IMA measurements. Measuring the whole table will simply pollute
   the data unnecessarily.


Anyway with the above, the whole measurement can reside in DM core (I hope).


Could you please provide more info – where in DM core? Currently, I have
implemented the functionality to be generic enough to measure any dm
target (not just dm-crypt). It is present in the new module dm-ima.c
which only gets applied when CONFIG_IMA=y. See patch 1/2 of this series.
or https://patchwork.kernel.org/patch/11743713/


But there are some problems - we can activate device with optional flags
(for example allow_discards) - should this be IMA measurement?

allow_discards translates to ti->num_discard_bios. And I am already 
measuring it. But thanks for validating the need to measure it.


static int crypt_ctr_optional(...)
{

if (!strcasecmp(opt_string, "allow_discards"))
ti->num_discard_bios = 1;

...

static void ima_measure_dm_crypt_data(...)
{

r = ima_append_num_values(ti, "ti_num_discard_bios",
  ti->num_discard_bios);


And what about device size (you already measure offset)?

OK. I will measure the device size. I will try to find where the device 
size attribute is. But if you could point me to it, that would really help.



IMO it depends on situation (policy).

Do you mean I should conditionally measure the attributes based on some 
policy?

Wouldn’t that be too much granularity?

Or do you mean something else?

I have already introduced a new IMA policy in the patches below, that
can help admins choose which DM targets to measure. I believe further
granularity is not needed, and may pose security risks.
https://patchwork.kernel.org/patch/11742027/
and
https://patchwork.kernel.org/patch/11742035/


It is quite often that we add performance flags later (like these no_workqueue 
in 5.9).
Some of them should be measured because there is possible security/data 
integrity impact.

The optional parameters like no_read_workqueue, no_write_workqueue, 
same_cpu_crypt, submit_from_crypt_cpus, are part of 

Re: [RESEND][PATCH v6] arm64: dts: qcom: Add support for Xiaomi Poco F1 (Beryllium)

2020-09-11 Thread Konrad Dybcio
I'm not a maintainer, but I reviewed this earlier, so I guess it's
only appropriate:

Reviewed-by: Konrad Dybcio 


Looking forward to future patches for this device! :D

Konrad


Re: [PATCH 2/2] media: dt-bindings: media: i2c: Add bindings for ADDI9036

2020-09-11 Thread Rob Herring
On Thu, 10 Sep 2020 19:24:07 +0300, Bogdan Togorean wrote:
> Add YAML device tree bindings for Analog Devices Inc. ADDI9036 CCD TOF
> front-end.
> 
> Signed-off-by: Bogdan Togorean 
> ---
>  .../bindings/media/i2c/adi,addi9036.yaml  | 72 +++
>  1 file changed, 72 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/media/i2c/adi,addi9036.yaml
> 


My bot found errors running 'make dt_binding_check' on your patch:

/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/media/i2c/adi,addi9036.example.dt.yaml:
 addi9036_tof@64: 'reg' does not match any of the regexes: 'pinctrl-[0-9]+'
From schema: 
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/media/i2c/adi,addi9036.yaml


See https://patchwork.ozlabs.org/patch/1361583

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure dt-schema is up to date:

pip3 install git+https://github.com/devicetree-org/dt-schema.git@master 
--upgrade

Please check and re-submit.



Re: [PATCH v9 0/4] Introduce the for_each_set_clump macro

2020-09-11 Thread William Breathitt Gray
On Thu, Jul 16, 2020 at 02:49:35PM +0200, Linus Walleij wrote:
> Hi Syed,
> 
> sorry for taking so long. I was on vacation and a bit snowed
> under by work.
> 
> On Sat, Jun 27, 2020 at 10:10 AM Syed Nayyar Waris  
> wrote:
> 
> > Since this patchset primarily affects GPIO drivers, would you like
> > to pick it up through your GPIO tree?
> 
> I have applied the patches to an immutable branch and pushed
> to kernelorg for testing (autobuilders will play with it I hope).
> 
> If all works fine I will merge this into my devel branch for v5.9.
> 
> It would be desirable if Andrew gave his explicit ACK on it too.
> 
> Yours,
> Linus Walleij

Hi Linus,

What's the name of the branch with these patches on kernelorg; I'm
having trouble finding it?

Btw, I'm CCing Andrew as well here because I notice him missing from the
CC list earlier for this patchset.

Thanks,

William Breathitt Gray


signature.asc
Description: PGP signature


Re: [PATCH v2 1/4] dt bindings: remoteproc: Add bindings for MT8183 APU

2020-09-11 Thread Rob Herring
On Thu, 10 Sep 2020 15:01:45 +0200, Alexandre Bailon wrote:
> This adds dt bindings for the APU present in the MT8183.
> 
> Signed-off-by: Alexandre Bailon 
> ---
>  .../bindings/remoteproc/mtk,apu.yaml  | 107 ++
>  1 file changed, 107 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/remoteproc/mtk,apu.yaml
> 

Reviewed-by: Rob Herring 

