LTS: proc: fix lookup in /proc/net subdirectories after setns(2)

2021-01-07 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi Greg,

Can you cherry-pick these to 4.19.y & 5.4.y:

commit e06689bf57017ac022ccf0f2a5071f760821ce0f
Author: Alexey Dobriyan 
Date:   Wed Dec 4 16:49:59 2019 -0800

proc: change ->nlink under proc_subdir_lock

commit c6c75deda81344c3a95d1d1f606d5cee109e5d54
Author: Alexey Dobriyan 
Date:   Tue Dec 15 20:42:39 2020 -0800

proc: fix lookup in /proc/net subdirectories after setns(2)


-Tommi



/proc/net/sctp/snmp, setns, proc: revalidate misc dentries

2020-12-01 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hello,

Bisected problems with setns() and /proc/net/sctp/snmp to this:

commit 1da4d377f943fe4194ffb9fb9c26cc58fad4dd24
Author: Alexey Dobriyan 
Date:   Fri Apr 13 15:35:42 2018 -0700

proc: revalidate misc dentries

Reproduces for example with Fedora 5.9.10-100.fc32.x86_64, so 1fde6f21d90f
("proc: fix /proc/net/* after setns(2)") does not seem to cover
/proc/net/sctp/snmp


The attached reproducer does an open+read+close of /proc/net/sctp/snmp before
and after the setns() syscall. The second open+read+close of
/proc/net/sctp/snmp incorrectly produces results for the default namespace,
not the target namespace.


Example, create netns and do some sctp:

# ./iperf-netns
+ modprobe sctp
+ ip netns add test
+ ip netns exec test ip link set lo up
+ ip netns exec test iperf3 -s -1
---
Server listening on 5201
---
+ ip netns exec test iperf3 -c 127.0.0.1 --sctp --bitrate 50M --time 4
Connecting to host 127.0.0.1, port 5201
Accepted connection from 127.0.0.1, port 50696
[  5] local 127.0.0.1 port 54735 connected to 127.0.0.1 port 5201
[  5] local 127.0.0.1 port 5201 connected to 127.0.0.1 port 54735
[ ID] Interval   Transfer Bitrate
[ ID] Interval   Transfer Bitrate
[  5]   0.00-1.00   sec  6.00 MBytes  50.3 Mbits/sec  
[  5]   0.00-1.00   sec  6.00 MBytes  50.3 Mbits/sec  
[  5]   1.00-2.00   sec  5.94 MBytes  49.8 Mbits/sec  
[  5]   1.00-2.00   sec  5.94 MBytes  49.8 Mbits/sec  
[  5]   2.00-3.00   sec  6.00 MBytes  50.3 Mbits/sec  
[  5]   2.00-3.00   sec  6.00 MBytes  50.3 Mbits/sec  
[  5]   3.00-4.00   sec  5.94 MBytes  49.8 Mbits/sec  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bitrate
[  5]   0.00-4.00   sec  23.9 MBytes  50.1
Mbits/sec  receiver
[  5]   3.00-4.00   sec  5.94 MBytes  49.8 Mbits/sec  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bitrate
[  5]   0.00-4.00   sec  23.9 MBytes  50.1 Mbits/sec
[  5]   0.00-4.00   sec  23.9 MBytes  50.1

iperf Done.
+ cat /proc/net/sctp/snmp
SctpCurrEstab       0
SctpActiveEstabs    0
SctpPassiveEstabs   0
SctpAborteds        0
SctpShutdowns       0
SctpOutOfBlues      0
SctpChecksumErrors  0
[...]
+ ip netns exec test cat /proc/net/sctp/snmp
SctpCurrEstab       0
SctpActiveEstabs    2
SctpPassiveEstabs   2
SctpAborteds        0
SctpShutdowns       4
SctpOutOfBlues      0
SctpChecksumErrors  0
SctpOutCtrlChunks   1544
SctpOutOrderChunks  1530
[...]
+ wait


But now we see all zeroes in /proc/net/sctp/snmp with the reproducer:

$ gcc repro.c -o repro  
 
# ./repro
/proc/net/sctp/snmp [pid: 175998]
SctpCurrEstab       0
SctpActiveEstabs    0
SctpPassiveEstabs   0
SctpAborteds        0
SctpShutdowns       0
[...]

setns(/run/netns/test) ...
/proc/net/sctp/snmp [pid: 175998]
SctpCurrEstab       0
SctpActiveEstabs    0
SctpPassiveEstabs   0
SctpAborteds        0
SctpShutdowns       0
SctpOutOfBlues      0
[...]


-Tommi
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

void slurp(const char *fn)
{
	char buf[8192];
	ssize_t r;
	int fd;

	printf("%s [pid: %d]\n", fn, getpid()); fflush(stdout);

	fd = open(fn, O_RDONLY);
	if (fd < 0) { perror("open"); exit(1); }

	r = read(fd, buf, sizeof(buf)-1);
	if (r < 0) { perror("read"); exit(1); }
	buf[r] = 0;
	puts(buf); fflush(stdout);

	if (close(fd) < 0) { perror("close"); exit(1); }
}

void newnet(const char *ns)
{
	int fd;
	fd = open(ns, O_RDONLY);
	if (fd < 0) { perror("open"); exit(1); }
	if (setns(fd, CLONE_NEWNET) < 0) { perror("setns"); exit(1); }
	if (close(fd) < 0) { perror("close"); exit(1); }
}

int main(int argc, char **argv)
{
	const char *ns = "/run/netns/test";
	const char *fn = "/proc/net/sctp/snmp";
	int d = 1;

	// Optional args: /run/netns/... /proc/net/... n
	if (argc >= 2) ns = argv[1];
	if (argc >= 3) fn = argv[2];
	if (argc >= 4 && argv[3][0] == 'n') d = 0;

	if (d) slurp(fn);
	printf("setns(%s) ...\n", ns); fflush(stdout);
	newnet(ns);
	slurp(fn);
}


iperf-netns
Description: iperf-netns


Re: [PATCH] selftests: intel_pstate: ftime() is deprecated

2020-10-28 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Tue, 2020-10-27 at 14:08 -0600, Shuah Khan wrote:
> 
> > @@ -73,8 +80,8 @@ int main(int argc, char **argv) {
> > aperf = new_aperf-old_aperf;
> > mperf = new_mperf-old_mperf;
> >   
> > -   start = before.time*1000 + before.millitm;
> > -   finish = after.time*1000 + after.millitm;
> > +   start = before.tv_sec*1000 + before.tv_nsec/1000000L;
> > +   finish = after.tv_sec*1000 + after.tv_nsec/1000000L;
> 
> Why not use timespec dNSEC_PER_MSEC define from  include/vdso/time64.h?

Hi,

If the define were available in the UAPI headers, it would certainly make
sense to use it. But I would not mess with the kernel-internal headers here.

-Tommi
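
The millisecond conversion under discussion can be sketched in user space with locally defined constants, on the assumption that the kernel-internal define is not reachable from the selftest. The names SKETCH_NSEC_PER_MSEC, SKETCH_MSEC_PER_SEC and timespec_to_ms() are illustrative only and are not taken from the patch:

```c
#include <time.h>

/* Local stand-ins (assumption): the kernel-internal constants are not
 * exported through the UAPI headers, so the sketch defines its own. */
#define SKETCH_NSEC_PER_MSEC 1000000L
#define SKETCH_MSEC_PER_SEC  1000L

/* Convert a struct timespec to whole milliseconds, truncating the
 * sub-millisecond remainder. */
static long long timespec_to_ms(const struct timespec *ts)
{
	return (long long)ts->tv_sec * SKETCH_MSEC_PER_SEC +
	       ts->tv_nsec / SKETCH_NSEC_PER_MSEC;
}
```

A caller would fill two struct timespec values with clock_gettime() and subtract the two timespec_to_ms() results to get an elapsed time in milliseconds.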



LTS couple perf test and perf top fixes

2020-10-09 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi Greg, Sasha,

Can you pick this to 5.4:

commit dbd660e6b2884b864d2642d930a163d3bcebe4be
Author: Tommi Rantala 
Date:   Thu Apr 23 14:53:40 2020 +0300

perf test session topology: Fix data path


And this to 5.4 and older LTS trees too:

commit 29b4f5f188571c112713c35cc87eefb46efee612
Author: Tommi Rantala 
Date:   Thu Mar 5 10:37:12 2020 +0200

perf top: Fix stdio interface input handling with glibc 2.28+


Thanks!
-Tommi



Re: [PATCH 4.14 038/190] KVM: x86: only do L1TF workaround on affected processors

2020-06-26 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Wed, 2020-06-24 at 10:15 -0400, Sasha Levin wrote:
> On Wed, Jun 24, 2020 at 12:00:59PM +0000, Rantala, Tommi T. (Nokia -
> FI/Espoo) wrote:
> > On Fri, 2020-06-19 at 16:31 +0200, Greg Kroah-Hartman wrote:
> > > From: Paolo Bonzini 
> > > 
> > > [ Upstream commit d43e2675e96fc6ae1a633b6a69d296394448cc32 ]
> > > 
> > > KVM stores the gfn in MMIO SPTEs as a caching optimization.
> > 
> > Any ideas what's missing in 4.14 ?
> 
> I think that this was because we're missing 6129ed877d40 ("KVM: x86/mmu:
> Set mmio_value to '0' if reserved #PF can't be generated"). I've queued
> it up (along with a few other related commits) and a new -rc cycle
> should be underway for those.

Sorry, I still see it with 4.14.186:

[2.355140] ------------[ cut here ]------------
[2.355872] WARNING: CPU: 0 PID: 849 at arch/x86/kvm/mmu.c:284
kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.357723] Modules linked in: kvm_intel(+) kvm irqbypass bfq
sch_fq_codel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper
ata_piix dm_mirror dm_region_hash dm_log dm_mod dax autofs4
[2.359639] CPU: 0 PID: 849 Comm: systemd-udevd Not tainted 4.14.186 #2
[2.360309] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-2.fc32 04/01/2014
[2.361177] task: 8a3d19429dc0 task.stack: b2558460c000
[2.361775] RIP: 0010:kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.362390] RSP: 0018:b2558460fc58 EFLAGS: 00010206
[2.362901] RAX:  RBX: c0179000 RCX:
ff45
[2.363617] RDX: 0028 RSI: 00080001 RDI:
00080001
[2.364329] RBP: c00c5951 R08:  R09:
3fff
[2.365021] R10: b255841592b8 R11: fffe R12:
5bc0
[2.365717] R13: c017a780 R14: b2558460fea0 R15:
0001
[2.366437] FS:  7fc6fcab6c40() GS:8a3d1ea0()
knlGS:
[2.367270] CS:  0010 DS:  ES:  CR0: 80050033
[2.367824] CR2: 564de775f840 CR3: 000818efc001 CR4:
001606f0
[2.368535] Call Trace:
[2.368809]  kvm_mmu_module_init+0x15f/0x240 [kvm]
[2.369323]  kvm_arch_init+0x5e/0x100 [kvm]
[2.369750]  kvm_init+0x1c/0x2b0 [kvm]
[2.370155]  ? free_pcppages_bulk+0x22d/0x4b0
[2.370591]  ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.371113]  vmx_init+0x21/0x6af [kvm_intel]
[2.371596]  ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.372118]  do_one_initcall+0x3e/0xf4
[2.372501]  ? kmem_cache_alloc_trace+0xef/0x190
[2.372964]  do_init_module+0x5c/0x1f0
[2.373383]  load_module+0x1f31/0x2620
[2.373765]  ? SYSC_finit_module+0x95/0xb0
[2.374205]  SYSC_finit_module+0x95/0xb0
[2.374601]  do_syscall_64+0x74/0x190
[2.374974]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[2.375500] RIP: 0033:0x7fc6fd3801bd
[2.375853] RSP: 002b:7ffd768187f8 EFLAGS: 0246 ORIG_RAX:
0139
[2.376593] RAX: ffda RBX: 564539d9ab50 RCX:
7fc6fd3801bd
[2.377305] RDX:  RSI: 7fc6fcfc784d RDI:
000e
[2.377981] RBP: 0002 R08:  R09:
0007
[2.378693] R10: 000e R11: 0246 R12:
7fc6fcfc784d
[2.379401] R13:  R14: 564539d7a530 R15:
564539d9ab50
[2.380104] Code: 59 25 06 00 75 25 48 b8 00 00 00 00 00 00 00 40 48 09
c6 48 09 c7 48 89 35 68 25 06 00 48 89 3d 69 25 06 00 c3 0f 0b 0f 0b eb d2
<0f> 0b eb d7 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 
[2.381905] ---[ end trace 5f757335c2eac657 ]---


Re: [PATCH 4.14 038/190] KVM: x86: only do L1TF workaround on affected processors

2020-06-24 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Fri, 2020-06-19 at 16:31 +0200, Greg Kroah-Hartman wrote:
> From: Paolo Bonzini 
> 
> [ Upstream commit d43e2675e96fc6ae1a633b6a69d296394448cc32 ]
> 
> KVM stores the gfn in MMIO SPTEs as a caching optimization.  These are
> split
> in two parts, as in "[high 1 low]", to thwart any attempt to use these
> bits
> in an L1TF attack.  This works as long as there are 5 free bits between
> MAXPHYADDR and bit 50 (inclusive), leaving bit 51 free so that the MMIO
> access triggers a reserved-bit-set page fault.

Hi, I'm now seeing this warning in VM bootup with 4.14.y

Not seen with 4.19.129 and 5.4.47 that also included this commit.

Any ideas what's missing in 4.14 ?

[2.294049] ------------[ cut here ]------------
[2.294621] WARNING: CPU: 43 PID: 856 at arch/x86/kvm/mmu.c:279
kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.295583] Modules linked in: kvm_intel(+) kvm irqbypass bfq
sch_fq_codel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper
ata_piix dm_mirror dm_region_hash dm_log dm_mod dax autofs4
[2.297269] CPU: 43 PID: 856 Comm: systemd-udevd Not tainted 4.14.185 #1
[2.297920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-2.fc32 04/01/2014
[2.298782] task: 9b2350b19dc0 task.stack: a86344604000
[2.299390] RIP: 0010:kvm_mmu_set_mmio_spte_mask+0x4e/0x60 [kvm]
[2.299987] RSP: 0018:a86344607c78 EFLAGS: 00010206
[2.300522] RAX:  RBX: c0457000 RCX:

[2.301239] RDX: 0001 RSI: 00080001 RDI:
00080001
[2.301935] RBP: c03bd951 R08: 9b235f4e33a0 R09:
9b2355f57258
[2.302646] R10: 0164 R11:  R12:

[2.303356] R13: c0458780 R14: a86344607ea0 R15:
0001
[2.304069] FS:  7f3e95dedc40() GS:9b235f4c()
knlGS:
[2.304852] CS:  0010 DS:  ES:  CR0: 80050033
[2.305425] CR2: 55bd35ff10d0 CR3: 00081026a004 CR4:
001606e0
[2.306137] Call Trace:
[2.306414]  kvm_arch_init+0x90/0x130 [kvm]
[2.306852]  kvm_init+0x1c/0x2b0 [kvm]
[2.307258]  ? __slab_free+0x13a/0x2e0
[2.307649]  ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.308178]  vmx_init+0x21/0x6af [kvm_intel]
[2.308604]  ? hardware_setup+0x4ab/0x4ab [kvm_intel]
[2.309132]  do_one_initcall+0x3e/0xf4
[2.309512]  ? kmem_cache_alloc_trace+0xef/0x190
[2.309985]  do_init_module+0x5c/0x1f0
[2.310386]  load_module+0x1f31/0x2620
[2.310769]  ? SYSC_finit_module+0x95/0xb0
[2.311202]  SYSC_finit_module+0x95/0xb0
[2.311600]  do_syscall_64+0x74/0x190
[2.311980]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[2.312496] RIP: 0033:0x7f3e966b71bd
[2.312860] RSP: 002b:7ffe0db584c8 EFLAGS: 0246 ORIG_RAX:
0139
[2.313606] RAX: ffda RBX: 55bd36027b10 RCX:
7f3e966b71bd
[2.314314] RDX:  RSI: 7f3e962fe84d RDI:
000f
[2.315017] RBP: 0002 R08:  R09:
0007
[2.315719] R10: 000f R11: 0246 R12:
7f3e962fe84d
[2.316420] R13:  R14: 55bd3602f400 R15:
55bd36027b10
[2.317130] Code: 29 25 06 00 75 25 48 b8 00 00 00 00 00 00 00 40 48 09
c6 48 09 c7 48 89 35 38 25 06 00 48 89 3d 39 25 06 00 c3 0f 0b 0f 0b eb d2
<0f> 0b eb d7 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 
[2.318933] ---[ end trace d933315308434918 ]---


$ head /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 63
model name  : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz



> The bit positions however were computed wrongly for AMD processors that
> have
> encryption support.  In this case, x86_phys_bits is reduced (for example
> from 48 to 43, to account for the C bit at position 47 and four bits used
> internally to store the SEV ASID and other stuff) while x86_cache_bits in
> would remain set to 48, and _all_ bits between the reduced MAXPHYADDR
> and bit 51 are set.  Then low_phys_bits would also cover some of the
> bits that are set in the shadow_mmio_value, terribly confusing the gfn
> caching mechanism.
> 
> To fix this, avoid splitting gfns as long as the processor does not have
> the L1TF bug (which includes all AMD processors).  When there is no
> splitting, low_phys_bits can be set to the reduced MAXPHYADDR removing
> the overlap.  This fixes "npt=0" operation on EPYC processors.
> 
> Thanks to Maxim Levitsky for bisecting this bug.
> 
> Cc: sta...@vger.kernel.org
> Fixes: 52918ed5fcf0 ("KVM: SVM: Override default MMIO mask if memory
> encryption is enabled")
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: Sasha Levin 
> ---
>  arch/x86/kvm/mmu.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d8878266553c..7220ab210dcf 100644
> 

rseq selftests param_test.c gettid build failure

2019-09-12 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi Mathieu,

I'm getting rseq selftest build failure with glibc 2.30, which added
gettid():

param_test.c:18:21: error: static declaration of 'gettid' follows non-
static declaration
   18 | static inline pid_t gettid(void)
  | ^~
In file included from /usr/include/unistd.h:1170,
 from param_test.c:11:
/usr/include/bits/unistd_ext.h:34:16: note: previous declaration of
'gettid' was here
   34 | extern __pid_t gettid (void) __THROW;
  |^~

BR,
Tommi
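
One possible way around the clash, sketched here on the assumption that the test only needs the calling thread's ID: give the local wrapper a name that glibc does not declare. The name selftest_gettid() is hypothetical and is not taken from the rseq selftests:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical name, chosen so it cannot collide with the gettid()
 * declaration that glibc 2.30 added to <unistd.h>. */
static inline pid_t selftest_gettid(void)
{
	/* Issue the system call directly, so the helper behaves the same
	 * whether or not libc ships its own wrapper. */
	return (pid_t)syscall(SYS_gettid);
}
```

In a single-threaded process the value equals getpid(), which makes the helper easy to sanity-check.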



nfs4 server stops responding

2019-08-19 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hello,

I have two VMs, exporting some directories in one VM:
# cat /etc/exports
/mnt 192.168.1.0/24(ro,fsid=0,no_subtree_check,sync)
/mnt/export
192.168.1.0/24(rw,no_root_squash,sync,no_wdelay,no_subtree_check)
[...]

And NFS mounting in the second VM:
# grep nfs /proc/mounts 
server:/export /mnt/export nfs4
rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,
acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,nordirplus,
proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.11,
local_lock=none,addr=192.168.1.10 0 0
[...]

If I keep some file descriptor open for several minutes in the second VM,
for example by running this:
# sleep 10m >/mnt/export/test

The result is that the NFS mount stops responding: the sleep process
never finishes but is stuck "forever" in (killable) D state, and any I/O
attempts from other processes in /mnt/export never finish.
It's always reproducible with this sleep command.
To recover the mountpoint I need to reboot the second VM.

Kernel version is 5.3.0-rc4 in both VMs.
Also reproducible with 4.14.x and 4.19.x

# ps aux|grep sleep
root  2524  0.0  0.0   5900   688 pts/0    D    14:04   0:00 sleep 5m

# grep -C100 nfs /proc/*/stack
/proc/2524/stack:[<0>] nfs4_do_close+0x87d/0xb20 [nfsv4]
/proc/2524/stack:[<0>] __put_nfs_open_context+0x297/0x4f0 [nfs]
/proc/2524/stack:[<0>] nfs_file_release+0xbe/0xf0 [nfs]
/proc/2524/stack-[<0>] __fput+0x1df/0x690
/proc/2524/stack-[<0>] task_work_run+0x123/0x1b0
/proc/2524/stack-[<0>] exit_to_usermode_loop+0x121/0x140
/proc/2524/stack-[<0>] do_syscall_64+0x2d1/0x370
/proc/2524/stack-[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
--
/proc/561/stack-[<0>] __rpc_execute+0x692/0xb10 [sunrpc]
/proc/561/stack-[<0>] rpc_run_task+0x45f/0x5d0 [sunrpc]
/proc/561/stack:[<0>] nfs4_call_sync_sequence+0x12a/0x210 [nfsv4]
/proc/561/stack:[<0>] _nfs4_proc_getattr+0x19a/0x200 [nfsv4]
/proc/561/stack:[<0>] nfs4_proc_getattr+0xda/0x230 [nfsv4]
/proc/561/stack:[<0>] __nfs_revalidate_inode+0x2ed/0x7a0 [nfs]
/proc/561/stack:[<0>] nfs_do_access+0x605/0xd00 [nfs]
/proc/561/stack:[<0>] nfs_permission+0x500/0x5e0 [nfs]
/proc/561/stack-[<0>] inode_permission+0x2dd/0x3f0
/proc/561/stack-[<0>] link_path_walk.part.60+0x681/0xe40
/proc/561/stack-[<0>] path_lookupat.isra.63+0x1af/0x850
/proc/561/stack-[<0>] filename_lookup.part.79+0x165/0x360
/proc/561/stack-[<0>] vfs_statx+0xb9/0x140
/proc/561/stack-[<0>] __do_sys_newstat+0x77/0xd0
/proc/561/stack-[<0>] do_syscall_64+0x9a/0x370
/proc/561/stack-[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9


In the dmesg of the second VM, NFS complaints are sometimes seen:

[  386.362897] nfs: server xyz not responding, still trying

Any ideas what's going wrong here...?

-Tommi



Re: [PATCH 4.14 43/43] tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb

2019-08-02 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Fri, 2019-08-02 at 09:28 +0200, gre...@linuxfoundation.org wrote:
> On Thu, Aug 01, 2019 at 10:17:30AM +0000, Rantala, Tommi T. (Nokia -
> FI/Espoo) wrote:
> > Hi,
> > 
> > This tipc patch added in 4.14.132 is triggering a crash for me,
> > revert
> > fixes it.
> > 
> > Anyone have ideas if some other commits missing in 4.14.x to make
> > this
> > work...?
> 
> Do you also have a problem with 4.19.y?  How about 5.2.y?  If not, can
> you do 'git bisect' to find the patch that fixes the issue?
> 
> thanks,
> 
> greg k-h

Hi, please pick this to 4.14.y and 4.19.y, tested that it fixes the
crash in both:

commit 5684abf7020dfc5f0b6ba1d68eda3663871fce52
Author: Xin Long 
Date:   Mon Jun 17 21:34:13 2019 +0800

ip_tunnel: allow not to count pkts on tstats by setting skb's dev
to NULL


For 5.2.y nothing is needed, these commits were in v5.2-rc6 already.

-Tommi
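
The idea in the requested commit's subject line (skip the per-device packet accounting when the caller passes a NULL device) can be illustrated with a user-space sketch. The types and the function xmit_account() below are hypothetical stand-ins, not the kernel's structures or the actual patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-device tunnel counters. */
struct tstats_sketch {
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Hypothetical stand-in for a network device that owns such counters. */
struct net_device_sketch {
	struct tstats_sketch *tstats;
};

/* Account a transmitted packet only when a device was supplied.
 * Callers without a tunnel device pass NULL and nothing is touched. */
static void xmit_account(struct net_device_sketch *dev, int pkt_len)
{
	if (dev == NULL || pkt_len <= 0)
		return;

	dev->tstats->tx_packets++;
	dev->tstats->tx_bytes += (uint64_t)pkt_len;
}
```

Without the NULL check, a caller passing no device would dereference a NULL pointer at the offset of the counters field, which is the shape of the oops reported in this thread.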



Re: [PATCH 4.14 43/43] tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb

2019-08-01 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Tue, 2019-07-02 at 10:02 +0200, Greg Kroah-Hartman wrote:
> From: Xin Long 
> 
> commit c3bcde026684c62d7a2b6f626dc7cf763833875c upstream.
> 
> udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel
> device
> to count packets on dev->tstats, a percpu variable. However, TIPC is
> using
> udp tunnel with no tunnel device, and pass the lower dev, like veth
> device
> that only initializes dev->lstats (a percpu variable) when creating
> it.

Hi,

This tipc patch added in 4.14.132 is triggering a crash for me, revert
fixes it.

Anyone have ideas if some other commits missing in 4.14.x to make this
work...?


# modprobe tipc
# tipc node set addr 1.1.2
# tipc bearer enable media udp name UDP1 localip 192.168.1.15

[  143.105529] Own node address <1.1.2>, network identity 4711
[  172.087098] BUG: unable to handle kernel NULL pointer dereference at
04f0
[  172.088375] IP: iptunnel_xmit+0x15e/0x1e0
[  172.089072] PGD 800231306067 P4D 800231306067 PUD 2356e1067
PMD 0
[  172.090094] Oops:  [#1] SMP PTI
[  172.090610] Modules linked in: tipc ip6_udp_tunnel udp_tunnel isofs
kvm_intel kvm irqbypass sch_fq_codel pcbc aesni_intel aes_x86_64
crypto_simd cryptd glue_helper ata_piix dm_mirror dm_region_hash dm_log
dm_mod dax autofs4
[  172.093293] CPU: 1 PID: 747 Comm: tipc Not tainted 4.14.134-1.x86_64 
#1
[  172.094448] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.12.0-2.fc30 04/01/2014
[  172.095703] task: 8b99f12c task.stack: 9ab481198000
[  172.096731] RIP: 0010:iptunnel_xmit+0x15e/0x1e0
[  172.097460] RSP: 0018:9ab48119ba00 EFLAGS: 00010202
[  172.098214] RAX:  RBX: bf4d8140 RCX:
008c
[  172.099320] RDX: 0001 RSI: fe01 RDI:
be944d62
[  172.100392] RBP: 8b99f1e7ed00 R08: 8b99ffc64520 R09:

[  172.101451] R10: 00023426d000 R11: 0002 R12:

[  172.102607] R13: 0040 R14:  R15:
8b99f426e0e8
[  172.103728] FS:  7efc82b96800() GS:8b99ffc4()
knlGS:
[  172.104976] CS:  0010 DS:  ES:  CR0: 80050033
[  172.105821] CR2: 04f0 CR3: 000234250001 CR4:
003606e0
[  172.106981] DR0:  DR1:  DR2:

[  172.108120] DR3:  DR6: fffe0ff0 DR7:
0400
[  172.109386] Call Trace:
[  172.109808]  tipc_udp_xmit.isra.18+0x1a7/0x1c0 [tipc]
[  172.110687]  ? __internal_add_timer+0x1a/0x50
[  172.111369]  ? __skb_clone+0x29/0x130
[  172.111999]  tipc_bearer_xmit_skb+0x4d/0x80 [tipc]
[  172.112845]  tipc_enable_bearer+0x2b9/0x3c0 [tipc]
[  172.113637]  ? __nla_put+0xc/0x20
[  172.114213]  tipc_nl_bearer_enable+0xca/0x100 [tipc]
[  172.114952]  genl_family_rcv_msg+0x190/0x390
[  172.115748]  genl_rcv_msg+0x47/0x90
[  172.116287]  ? __alloc_skb+0x72/0x1b0
[  172.116898]  ? genl_family_rcv_msg+0x390/0x390
[  172.117669]  netlink_rcv_skb+0x3d/0x100
[  172.118361]  genl_rcv+0x24/0x40
[  172.119005]  netlink_unicast+0x16d/0x230
[  172.119777]  netlink_sendmsg+0x1ae/0x3c0
[  172.120525]  SYSC_sendto+0xe6/0x140
[  172.121248]  ? SYSC_getsockname+0x81/0xa0
[  172.121989]  ? sock_alloc_file+0x97/0x120
[  172.122645]  ? sock_map_fd+0x3d/0x60
[  172.123278]  do_syscall_64+0x74/0x190
[  172.123911]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  172.124716] RIP: 0033:0x7efc82d6ac6b
[  172.125368] RSP: 002b:7fff40411ae8 EFLAGS: 0246 ORIG_RAX:
002c
[  172.126486] RAX: ffda RBX: 01dfca20 RCX:
7efc82d6ac6b
[  172.127632] RDX: 0054 RSI: 7fff40411b60 RDI:
0003
[  172.128765] RBP: 7fff40411b50 R08: 7efc82e36000 R09:
000c
[  172.129793] R10:  R11: 0246 R12:
7fff40411b60
[  172.130799] R13: 7fff40412d10 R14: 0040bb44 R15:

[  172.131868] Code: 01 00 00 00 85 d2 0f 44 d0 e8 1f f3 fa ff 48 8b 74
24 08 4c 89 fa 48 89 df e8 9f 94 fb ff 83 e0 fd 75 35 8b 4c 24 1c 85 c9
7e 2b <49> 8b 84 24 f0 04 00 00 65 48 03 05 aa 29 68 41 48 83 40 10 01
[  172.134773] RIP: iptunnel_xmit+0x15e/0x1e0 RSP: 9ab48119ba00
[  172.135697] CR2: 04f0
[  172.136305] ---[ end trace 27f7522ade26797f ]---


> Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() thinks the
> dev as
> a tunnel device, and uses dev->tstats instead of dev->lstats. tstats'
> each
> pointer points to a bigger struct than lstats, so when tstats-
> >tx_bytes is
> increased, other percpu variable's members could be overwritten.
> 
> syzbot has reported quite a few crashes due to fib_nh_common percpu
> member
> 'nhc_pcpu_rth_output' overwritten, call traces are like:
> 
>   BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190
>   net/ipv4/route.c:1556
> rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556
> __mkroute_output net/ipv4/route.c:2332 [inline]
> 

[PATCH 4.14] perf machine: Guard against NULL in machine__exit()

2019-06-19 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Arnaldo Carvalho de Melo 

commit 4a2233b194c77ae1ea8304cb7c00b551de4313f0 upstream.

A recent fix for 'perf trace' introduced a bug where
machine__exit(trace->host) could be called while trace->host was still
NULL, so make this more robust by guarding against NULL, just like
free() does.

The problem happens, for instance, when !root users try to run 'perf
trace':

  [acme@jouet linux]$ trace
  Error:No permissions to read 
/sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
  Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'

  perf: Segmentation fault
  Obtained 7 stack frames.
  [0x4f1b2e]
  /lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f]
  [0x4f3fec]
  [0x47468b]
  [0x42a2db]
  /lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509]
  [0x42a6c9]
  Segmentation fault (core dumped)
  [acme@jouet linux]$

Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Andrei Vagin 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Vasily Averin 
Cc: Wang Nan 
Fixes: 33974a414ce2 ("perf trace: Call machine__exit() at exit")
Signed-off-by: Arnaldo Carvalho de Melo 
Signed-off-by: Tommi Rantala 
---
 tools/perf/util/machine.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 968fd0454e6b..d246080cd85e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -156,6 +156,9 @@ void machine__delete_threads(struct machine *machine)
 
 void machine__exit(struct machine *machine)
 {
+	if (machine == NULL)
+		return;
+
 	machine__destroy_kernel_maps(machine);
 	map_groups__exit(&machine->kmaps);
 	dsos__exit(&machine->dsos);
-- 
2.20.1



perf top --stdio, glibc 2.28, stdio EOF sticky

2019-06-06 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hello,

"perf top --stdio" (or perf kvm top --stdio) keyboard handling does not
work properly for me. Instead of accepting key presses, it just
displays the "Mapped keys:" help output always.

Seems to be related to this glibc 2.28 stdio change:

https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS

* All stdio functions now treat end-of-file as a sticky condition.  If
you
 read from a file until EOF, and then the file is enlarged by another
 process, you must call clearerr or another function with the same
effect
 (e.g. fseek, rewind) before you can read the additional data.  This
 corrects a longstanding C99 conformance bug.  It is most likely to
affect
 programs that use stdio to read interactive input from a terminal.
 (Bug #1190.)
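
A minimal sketch of what the NEWS entry asks of callers, assuming a program that polls a stdio stream which may gain data after hitting EOF. The helper name read_line_retry() is hypothetical, and this is not the change that went into perf:

```c
#include <stdio.h>

/* Read one line from fp into buf. If the read hits end-of-file, clear
 * the sticky EOF indicator so that a later call can pick up data that
 * arrives afterwards. Returns buf on success, NULL if no line was read. */
static char *read_line_retry(FILE *fp, char *buf, int size)
{
	char *line = fgets(buf, size, fp);

	if (line == NULL && feof(fp))
		clearerr(fp);

	return line;
}
```

With glibc 2.28 and later, leaving out the clearerr() call makes every subsequent fgets() on the stream fail immediately, even once new input is available.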


Also "perf top 

Re: [PATCH 4.19 144/187] selftests/bpf: skip verifier tests for unsupported program types

2019-05-23 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Thu, 2019-04-04 at 10:48 +0200, Greg Kroah-Hartman wrote:
> 4.19-stable review patch.  If anyone has any objections, please let
> me know.
> 
> --
> 
> [ Upstream commit 8184d44c9a577a2f1842ed6cc844bfd4a9981d8e ]
> 
> Use recently introduced bpf_probe_prog_type() to skip tests in the
> test_verifier() if bpf_verify_program() fails. The skipped test is
> indicated in the output.

Hi, this patch added in 4.19.34 causes test_verifier build failure, as
bpf_probe_prog_type() is not available:

gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf
-I../../../../include/generated -DHAVE_GENHDR
-I../../../include test_verifier.c /root/linux-
4.19.44/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread
-o /root/linux-4.19.44/tools/testing/selftests/bpf/test_verifier
test_verifier.c: In function ‘do_test_single’:
test_verifier.c:12775:22: warning: implicit declaration of function
‘bpf_probe_prog_type’; did you mean ‘bpf_program__set_type’? [-
Wimplicit-function-declaration]
  if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
  ^~~
  bpf_program__set_type
/usr/bin/ld: /tmp/ccEtyLhk.o: in function `do_test_single':
test_verifier.c:(.text+0xa19): undefined reference to
`bpf_probe_prog_type'
collect2: error: ld returned 1 exit status
make[1]: *** [../lib.mk:152: /root/linux-
4.19.44/tools/testing/selftests/bpf/test_verifier] Error 1


- Tommi

> Example:
> 
> ...
> 679/p bpf_get_stack return R0 within range SKIP (unsupported program
> type 5)
> 680/p ld_abs: invalid op 1 OK
> ...
> Summary: 863 PASSED, 165 SKIPPED, 3 FAILED
> 
> Signed-off-by: Stanislav Fomichev 
> Signed-off-by: Daniel Borkmann 
> Signed-off-by: Sasha Levin 
> ---
>  tools/testing/selftests/bpf/test_verifier.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_verifier.c
> b/tools/testing/selftests/bpf/test_verifier.c
> index 9db5a7378f40..294fc18aba2a 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -32,6 +32,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  #ifdef HAVE_GENHDR
>  # include "autoconf.h"
> @@ -56,6 +57,7 @@
>  
>  #define UNPRIV_SYSCTL "kernel/unprivileged_bpf_disabled"
>  static bool unpriv_disabled = false;
> +static int skips;
>  
>  struct bpf_test {
>   const char *descr;
> @@ -12770,6 +12772,11 @@ static void do_test_single(struct bpf_test
> *test, bool unpriv,
>   fd_prog = bpf_verify_program(prog_type ? :
> BPF_PROG_TYPE_SOCKET_FILTER,
>prog, prog_len, test->flags &
> F_LOAD_WITH_STRICT_ALIGNMENT,
>"GPL", 0, bpf_vlog,
> sizeof(bpf_vlog), 1);
> + if (fd_prog < 0 && !bpf_probe_prog_type(prog_type, 0)) {
> + printf("SKIP (unsupported program type %d)\n",
> prog_type);
> + skips++;
> + goto close_fds;
> + }
>  
>   expected_ret = unpriv && test->result_unpriv != UNDEF ?
>  test->result_unpriv : test->result;
> @@ -12905,7 +12912,7 @@ static void get_unpriv_disabled()
>  
>  static int do_test(bool unpriv, unsigned int from, unsigned int to)
>  {
> - int i, passes = 0, errors = 0, skips = 0;
> + int i, passes = 0, errors = 0;
>  
>   for (i = from; i < to; i++) {
>   struct bpf_test *test = &tests[i];



Re: [PATCH 4.14 09/69] x86: vdso: Use $LD instead of $CC to link

2019-04-26 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Fri, 2019-04-26 at 05:48 -0700, Nathan Chancellor wrote:
> On Fri, Apr 26, 2019 at 11:41:30AM +0000, Rantala, Tommi T. (Nokia -
> FI/Espoo) wrote:
> > On Mon, 2019-04-15 at 20:58 +0200, Greg Kroah-Hartman wrote:
> > > commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
> > > 
> > 
> > Hi,
> > 
> > With this patch in 4.14.112 build-id is now missing in vdso32.so:
> > 
> > $ file arch/x86/entry/vdso/vdso*so*
> > arch/x86/entry/vdso/vdso32.so: ELF 32-bit LSB pie executable,
> > Intel
> > 80386, version 1 (SYSV), dynamically linked, stripped
> > arch/x86/entry/vdso/vdso32.so.dbg: ELF 32-bit LSB pie executable,
> > Intel
> > 80386, version 1 (SYSV), dynamically linked, with debug_info, not
> > stripped
> > arch/x86/entry/vdso/vdso64.so: ELF 64-bit LSB pie executable,
> > x86-
> > 64, version 1 (SYSV), dynamically linked,
> > BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, stripped
> > arch/x86/entry/vdso/vdso64.so.dbg: ELF 64-bit LSB pie executable,
> > x86-
> > 64, version 1 (SYSV), dynamically linked,
> > BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, with
> > debug_info, not stripped
> > 
> > 
> > Based on quick check, "$(call ld-option, --build-id)" fails due to
> > some
> > 32/64 bit mismatch, so the --build-id linker flag is not used when
> > linking vdso32.so
> > 
> > Perhaps scripts/Kbuild.include is missing some change in 4.14.y to
> > make
> > this work properly.
> > 
> 
> Hi Tommi,
> 
> This appears to be fixed by commit 0294e6f4a000 ("kbuild: simplify
> ld-option implementation") upstream. Could you test the attached
> backport and make sure everything works on your end? Assuming that it
> does, I will test the other stable releases and see if this is needed
> and send those backports along.

Yes this patch fixes it. Many thanks!

-Tommi

> Thanks and sorry for the trouble!
> Nathan
> 
> > -Tommi
> > 
> > > The vdso{32,64}.so can fail to link with CC=clang when clang
> > > tries to
> > > find
> > > a suitable GCC toolchain to link these libraries with.
> > > 
> > > /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
> > >   access beyond end of merged section (782)
> > > 
> > > This happens because the host environment leaked into the cross
> > > compiler
> > > environment due to the way clang searches for suitable GCC
> > > toolchains.
> > > 
> > > Clang is a retargetable compiler, and each invocation of it must
> > > provide
> > > --target= --gcc-toolchain= to allow it to
> > > find
> > > the
> > > correct binutils for cross compilation. These flags had been
> > > added to
> > > KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS
> > > (for
> > > various
> > > reasons) which breaks clang's ability to find the correct linker
> > > when
> > > cross
> > > compiling.
> > > 
> > > Most of the time this goes unnoticed because the host linker is
> > > new
> > > enough
> > > to work anyway, or is incompatible and skipped, but this cannot
> > > be
> > > reliably
> > > assumed.
> > > 
> > > This change alters the vdso makefile to just use LD directly,
> > > which
> > > bypasses clang and thus the searching problem. The makefile will
> > > just
> > > use
> > > ${CROSS_COMPILE}ld instead, which is always what we want. This
> > > matches the
> > > method used to link vmlinux.
> > > 
> > > This drops references to DISABLE_LTO; this option doesn't seem to
> > > be
> > > set
> > > anywhere, and not knowing what its possible values are, it's not
> > > clear how
> > > to convert it from CC to LD flag.
> > > 
> > > Signed-off-by: Alistair Strachan 
> > > Signed-off-by: Thomas Gleixner 
> > > Acked-by: Andy Lutomirski 
> > > Cc: "H. Peter Anvin" 
> > > Cc: Greg Kroah-Hartman 
> > > Cc: kernel-t...@android.com
> > > Cc: j...@joelfernandes.org
> > > Cc: Andi Kleen 
> > > Link: 
> > > https://lkml.kernel.org/r/20180803173931.117515-1-astrac...@google.com
> > > Signed-off-by: Nathan Chancellor 
> > > Signed-off-by: Sasha Levin 
> > > ---
> > >  arch/x86/entry/vdso/Makefile | 22 +-
> > >  1 file changed, 9 insertions(+), 13 deletions(-)
> > > 
> > > d

Re: [PATCH 4.14 09/69] x86: vdso: Use $LD instead of $CC to link

2019-04-26 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Mon, 2019-04-15 at 20:58 +0200, Greg Kroah-Hartman wrote:
> commit 379d98ddf41344273d9718556f761420f4dc80b3 upstream.
> 

Hi,

With this patch in 4.14.112 build-id is now missing in vdso32.so:

$ file arch/x86/entry/vdso/vdso*so*
arch/x86/entry/vdso/vdso32.so: ELF 32-bit LSB pie executable, Intel
80386, version 1 (SYSV), dynamically linked, stripped
arch/x86/entry/vdso/vdso32.so.dbg: ELF 32-bit LSB pie executable, Intel
80386, version 1 (SYSV), dynamically linked, with debug_info, not
stripped
arch/x86/entry/vdso/vdso64.so: ELF 64-bit LSB pie executable, x86-
64, version 1 (SYSV), dynamically linked,
BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, stripped
arch/x86/entry/vdso/vdso64.so.dbg: ELF 64-bit LSB pie executable, x86-
64, version 1 (SYSV), dynamically linked,
BuildID[sha1]=d80730a5b561a3161e488a369d1c76c250b584b4, with
debug_info, not stripped


Based on a quick check, "$(call ld-option, --build-id)" fails due to some
32/64-bit mismatch, so the --build-id linker flag is not used when
linking vdso32.so.

Perhaps scripts/Kbuild.include is missing some change in 4.14.y to make
this work properly.

-Tommi

> The vdso{32,64}.so can fail to link with CC=clang when clang tries to
> find
> a suitable GCC toolchain to link these libraries with.
> 
> /usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
>   access beyond end of merged section (782)
> 
> This happens because the host environment leaked into the cross
> compiler
> environment due to the way clang searches for suitable GCC
> toolchains.
> 
> Clang is a retargetable compiler, and each invocation of it must
> provide
> --target= --gcc-toolchain= to allow it to find
> the
> correct binutils for cross compilation. These flags had been added to
> KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for
> various
> reasons) which breaks clang's ability to find the correct linker when
> cross
> compiling.
> 
> Most of the time this goes unnoticed because the host linker is new
> enough
> to work anyway, or is incompatible and skipped, but this cannot be
> reliably
> assumed.
> 
> This change alters the vdso makefile to just use LD directly, which
> bypasses clang and thus the searching problem. The makefile will just
> use
> ${CROSS_COMPILE}ld instead, which is always what we want. This
> matches the
> method used to link vmlinux.
> 
> This drops references to DISABLE_LTO; this option doesn't seem to be
> set
> anywhere, and not knowing what its possible values are, it's not
> clear how
> to convert it from CC to LD flag.
> 
> Signed-off-by: Alistair Strachan 
> Signed-off-by: Thomas Gleixner 
> Acked-by: Andy Lutomirski 
> Cc: "H. Peter Anvin" 
> Cc: Greg Kroah-Hartman 
> Cc: kernel-t...@android.com
> Cc: j...@joelfernandes.org
> Cc: Andi Kleen 
> Link: 
> https://lkml.kernel.org/r/20180803173931.117515-1-astrac...@google.com
> Signed-off-by: Nathan Chancellor 
> Signed-off-by: Sasha Levin 
> ---
>  arch/x86/entry/vdso/Makefile | 22 +-
>  1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/entry/vdso/Makefile
> b/arch/x86/entry/vdso/Makefile
> index 0a550dc5c525..0defcc939ab4 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -48,10 +48,8 @@ targets += $(vdso_img_sodbg)
>  
>  export CPPFLAGS_vdso.lds += -P -C
>  
> -VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
> - -Wl,--no-undefined \
> - -Wl,-z,max-page-size=4096 -Wl,-z,common-page-
> size=4096 \
> - $(DISABLE_LTO)
> +VDSO_LDFLAGS_vdso.lds = -m elf_x86_64 -soname linux-vdso.so.1 --no-
> undefined \
> + -z max-page-size=4096 -z common-page-size=4096
>  
>  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
>   $(call if_changed,vdso)
> @@ -103,10 +101,8 @@ CFLAGS_REMOVE_vvar.o = -pg
>  #
>  
>  CPPFLAGS_vdsox32.lds = $(CPPFLAGS_vdso.lds)
> -VDSO_LDFLAGS_vdsox32.lds = -Wl,-m,elf32_x86_64 \
> --Wl,-soname=linux-vdso.so.1 \
> --Wl,-z,max-page-size=4096 \
> --Wl,-z,common-page-size=4096
> +VDSO_LDFLAGS_vdsox32.lds = -m elf32_x86_64 -soname linux-vdso.so.1 \
> +-z max-page-size=4096 -z common-page-
> size=4096
>  
>  # 64-bit objects to re-brand as x32
>  vobjs64-for-x32 := $(filter-out $(vobjs-nox32),$(vobjs-y))
> @@ -134,7 +130,7 @@ $(obj)/vdsox32.so.dbg: $(src)/vdsox32.lds
> $(vobjx32s) FORCE
>   $(call if_changed,vdso)
>  
>  CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
> -VDSO_LDFLAGS_vdso32.lds = -m32 -Wl,-m,elf_i386 -Wl,-soname=linux-
> gate.so.1
> +VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
>  
>  # This makes sure the $(obj) subdirectory exists even though vdso32/
>  # is not a kbuild sub-make subdirectory.
> @@ -180,13 +176,13 @@ $(obj)/vdso32.so.dbg: FORCE \
>  # The DSO images are built using a special linker script.
>  #
>  quiet_cmd_vdso = VDSO$@
> -  cmd_vdso = $(CC) 

/proc/sys/kernel/sched_domain/, isolcpus, CONFIG_CPUMASK_OFFSTACK

2019-02-15 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hello,

/proc/sys/kernel/sched_domain/ seems to be somewhat broken when the kernel
is configured without CONFIG_CPUMASK_OFFSTACK and booted with the
isolcpus= option.

Example with 8 CPUs.

With CONFIG_CPUMASK_OFFSTACK=y and "isolcpus=2":

# uname -r
5.0.0-0.rc3.git0.1.fc30.x86_64

# ls /proc/sys/kernel/sched_domain/*
/proc/sys/kernel/sched_domain/cpu0:
domain0

/proc/sys/kernel/sched_domain/cpu1:
domain0

/proc/sys/kernel/sched_domain/cpu2:

/proc/sys/kernel/sched_domain/cpu3:
domain0

/proc/sys/kernel/sched_domain/cpu4:
domain0

/proc/sys/kernel/sched_domain/cpu5:
domain0

/proc/sys/kernel/sched_domain/cpu6:
domain0

/proc/sys/kernel/sched_domain/cpu7:
domain0


Another kernel without CONFIG_CPUMASK_OFFSTACK, also booted with
"isolcpus=2"; here the directories for CPUs 2-7 are missing:

# ls /proc/sys/kernel/sched_domain/
cpu0  cpu1

# ls /proc/sys/kernel/sched_domain/*
/proc/sys/kernel/sched_domain/cpu0:
domain0

/proc/sys/kernel/sched_domain/cpu1:
domain0


-Tommi



4.14 perf test patches

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi,

Can you pick these patches to 4.14.y?
These fix some "perf test" errors seen when running in a VM.

commit 10836d9f9ac63d40ccfa756f871ce4ed51ae3b52
Author: Jiri Olsa 
Date:   Mon Jul 3 16:50:30 2017 +0200

perf tests attr: Fix task term values

commit f6a9820d572bd8384d982357cbad214b3a6c04bb
Author: Jiri Olsa 
Date:   Thu Sep 28 18:06:33 2017 +0200

perf tests attr: Fix group stat tests

commit 692f5a22cd284bb8233a38e3ed86881d2d9c89d4
Author: Jiri Olsa 
Date:   Mon Oct 9 15:07:12 2017 +0200

perf tests attr: Make hw events optional


-Tommi



[PATCH 4.14 6/8] uio: fix wrong return value from uio_mmap()

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Hailong Liu 

commit e7de2590f18a272e63732b9d519250d1b522b2c4 upstream.

uio_mmap has multiple fail paths to set return value to nonzero then
goto out. However, it always returns *0* from the *out* at end, and
this will mislead callers who check the return value of this function.

Fixes: 57c5f4df0a5a0ee ("uio: fix crash after the device is unregistered")
CC: Xiubo Li 
Signed-off-by: Hailong Liu 
Cc: stable 
Signed-off-by: Jiang Biao 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Tommi Rantala 
---
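Note for reviewers: the bug pattern described above can be sketched in
userspace as follows. This is only an illustration under stated
assumptions, not the driver code: mmap_sketch(), its parameters and the
lock_depth counter are hypothetical stand-ins for uio_mmap() and the
info_lock mutex.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the mutex: counts how often it is held. */
static int lock_depth;

/*
 * Single-exit function in the style of uio_mmap(): every failure path
 * sets ret and jumps to "out", where the lock is dropped.
 */
static int mmap_sketch(int have_info, int map_index)
{
    int ret = 0;

    lock_depth++;               /* stands in for mutex_lock() */
    if (!have_info) {
        ret = -EINVAL;
        goto out;
    }
    if (map_index < 0) {
        ret = -EINVAL;
        goto out;
    }
out:
    lock_depth--;               /* stands in for mutex_unlock() */
    return ret;                 /* "return 0" here would hide both failures */
}
```

With "return 0" at the out label, a caller could not tell the two
failure cases from success, which is the behaviour the one-line change
below removes.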
 drivers/uio/uio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 262610192755..fed2d8fa4d4d 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -816,7 +816,7 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
 
 out:
mutex_unlock(&idev->info_lock);
-   return 0;
+   return ret;
 }
 
 static const struct file_operations uio_fops = {
-- 
2.20.1



[PATCH 4.14 8/8] Revert "uio: use request_threaded_irq instead"

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Xiubo Li 

commit 3d27c4de8d4fb2d4099ff324671792aa2578c6f9 upstream.

Since the mutex lock in the irq handler is currently useless, remove it
together with the request_threaded_irq() change.

This reverts commit 9421e45f5ff3d558cf8b75a8cc0824530caf3453.

Reported-by: james.r.har...@intel.com
CC: Ahsan Atta 
Signed-off-by: Xiubo Li 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Tommi Rantala 
---
 drivers/uio/uio.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 4e0cb7cdf739..fb5c9701b1fb 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -445,13 +445,10 @@ static irqreturn_t uio_interrupt(int irq, void *dev_id)
struct uio_device *idev = (struct uio_device *)dev_id;
irqreturn_t ret;
 
-   mutex_lock(&idev->info_lock);
-
ret = idev->info->handler(irq, idev->info);
if (ret == IRQ_HANDLED)
uio_event_notify(idev->info);
 
-   mutex_unlock(&idev->info_lock);
return ret;
 }
 
@@ -974,9 +971,8 @@ int __uio_register_device(struct module *owner,
 * FDs at the time of unregister and therefore may not be
 * freed until they are released.
 */
-   ret = request_threaded_irq(info->irq, NULL, uio_interrupt,
-  info->irq_flags, info->name, idev);
-
+   ret = request_irq(info->irq, uio_interrupt,
+ info->irq_flags, info->name, idev);
if (ret) {
info->uio_dev = NULL;
goto err_request_irq;
-- 
2.20.1



[PATCH 4.14 7/8] uio: fix possible circular locking dependency

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Xiubo Li 

commit b34e9a15b37b8ddbf06a4da142b0c39c74211eb4 upstream.

The call trace:
XXX/1910 is trying to acquire lock:
 (&mm->mmap_sem){++}, at: [] might_fault+0x57/0xb0

but task is already holding lock:
 (&idev->info_lock){+.+...}, at: [] uio_write+0x46/0x130 [uio]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&idev->info_lock){+.+...}:
   [] lock_acquire+0x99/0x1e0
   [] mutex_lock_nested+0x93/0x410
   [] uio_mmap+0x2d/0x170 [uio]
   [] mmap_region+0x428/0x650
   [] do_mmap+0x3b8/0x4e0
   [] vm_mmap_pgoff+0xd3/0x120
   [] SyS_mmap_pgoff+0x1f1/0x270
   [] SyS_mmap+0x22/0x30
   [] system_call_fastpath+0x1c/0x21

-> #0 (&mm->mmap_sem){++}:
   [] __lock_acquire+0xdac/0x15f0
   [] lock_acquire+0x99/0x1e0
   [] might_fault+0x84/0xb0
   [] uio_write+0xb4/0x130 [uio]
   [] vfs_write+0xc3/0x1f0
   [] SyS_write+0x8a/0x100
   [] system_call_fastpath+0x1c/0x21

other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&idev->info_lock);
                               lock(&mm->mmap_sem);
                               lock(&idev->info_lock);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***
1 lock held by XXX/1910:
 #0:  (&idev->info_lock){+.+...}, at: [] uio_write+0x46/0x130 [uio]

stack backtrace:
CPU: 0 PID: 1910 Comm: XXX Kdump: loaded Not tainted #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
Call Trace:
 [] dump_stack+0x19/0x1b
 [] print_circular_bug+0x1f9/0x207
 [] check_prevs_add+0x957/0x960
 [] __lock_acquire+0xdac/0x15f0
 [] ? mark_held_locks+0xb9/0x140
 [] lock_acquire+0x99/0x1e0
 [] ? might_fault+0x57/0xb0
 [] might_fault+0x84/0xb0
 [] ? might_fault+0x57/0xb0
 [] uio_write+0xb4/0x130 [uio]
 [] vfs_write+0xc3/0x1f0
 [] ? fget_light+0xfc/0x510
 [] SyS_write+0x8a/0x100
 [] system_call_fastpath+0x1c/0x21

Signed-off-by: Xiubo Li 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Tommi Rantala 
---
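Note for reviewers: the reordering done by this patch can be sketched in
userspace as below. It is a minimal illustration under assumptions, not
the driver code: struct dev_sketch and write_sketch() are hypothetical,
a pthread mutex stands in for info_lock, and memcpy() stands in for
copy_from_user(), which in the kernel may fault and take mmap_sem.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct uio_device. */
struct dev_sketch {
    pthread_mutex_t info_lock;
    int registered;         /* stands in for idev->info != NULL */
    int32_t last_irq_on;    /* value handed to the "irqcontrol" step */
};

/*
 * Validate and copy the user buffer first, then take the lock.
 * Because the copy no longer runs under info_lock, the two locks in
 * the lockdep report above can no longer nest in this path.
 */
static long write_sketch(struct dev_sketch *d, const void *buf, size_t count)
{
    int32_t irq_on;
    long ret = 0;

    if (count != sizeof(irq_on))
        return -EINVAL;
    memcpy(&irq_on, buf, count);    /* copy_from_user() in the kernel */

    pthread_mutex_lock(&d->info_lock);
    if (!d->registered) {
        ret = -EINVAL;
        goto out;
    }
    d->last_irq_on = irq_on;        /* irqcontrol() in the kernel */
out:
    pthread_mutex_unlock(&d->info_lock);
    return ret ? ret : (long)sizeof(irq_on);
}
```

The size check moves along with the copy because the copy depends on
it; neither needs the device state, so neither needs the lock.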
 drivers/uio/uio.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index fed2d8fa4d4d..4e0cb7cdf739 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -627,6 +627,12 @@ static ssize_t uio_write(struct file *filep, const char __user *buf,
ssize_t retval;
s32 irq_on;
 
+   if (count != sizeof(s32))
+   return -EINVAL;
+
+   if (copy_from_user(&irq_on, buf, count))
+   return -EFAULT;
+
mutex_lock(&idev->info_lock);
if (!idev->info) {
retval = -EINVAL;
@@ -638,21 +644,11 @@ static ssize_t uio_write(struct file *filep, const char __user *buf,
goto out;
}
 
-   if (count != sizeof(s32)) {
-   retval = -EINVAL;
-   goto out;
-   }
-
if (!idev->info->irqcontrol) {
retval = -ENOSYS;
goto out;
}
 
-   if (copy_from_user(&irq_on, buf, count)) {
-   retval = -EFAULT;
-   goto out;
-   }
-
retval = idev->info->irqcontrol(idev->info, irq_on);
 
 out:
-- 
2.20.1



[PATCH 4.14 4/8] uio: change to use the mutex lock instead of the spin lock

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Xiubo Li 

commit 543af5861f41af0a5d2432f6fb5976af50f9cee5 upstream.

We are hitting a regression with the following commit:

commit a93e7b331568227500186a465fee3c2cb5dffd1f
Author: Hamish Martin 
Date:   Mon May 14 13:32:23 2018 +1200

uio: Prevent device destruction while fds are open

The problem is the addition of spin_lock_irqsave in uio_write. This
leads to hitting  uio_write -> copy_from_user -> _copy_from_user ->
might_fault and the logs filling up with sleeping warnings.

I also noticed some uio drivers allocate memory, sleep, grab mutexes
from callouts like open() and release and uio is now doing
spin_lock_irqsave while calling them.

Reported-by: Mike Christie 
CC: Hamish Martin 
Reviewed-by: Hamish Martin 
Signed-off-by: Xiubo Li 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Tommi Rantala 
---
 drivers/uio/uio.c  | 32 +---
 include/linux/uio_driver.h |  2 +-
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index c97945a3f572..4441235a56cc 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -435,7 +435,6 @@ static int uio_open(struct inode *inode, struct file *filep)
struct uio_device *idev;
struct uio_listener *listener;
int ret = 0;
-   unsigned long flags;
 
mutex_lock(&minor_lock);
idev = idr_find(&uio_idr, iminor(inode));
@@ -462,10 +461,10 @@ static int uio_open(struct inode *inode, struct file *filep)
listener->event_count = atomic_read(&idev->event);
filep->private_data = listener;
 
-   spin_lock_irqsave(&idev->info_lock, flags);
+   mutex_lock(&idev->info_lock);
if (idev->info && idev->info->open)
ret = idev->info->open(idev->info, inode);
-   spin_unlock_irqrestore(&idev->info_lock, flags);
+   mutex_unlock(&idev->info_lock);
if (ret)
goto err_infoopen;
 
@@ -497,12 +496,11 @@ static int uio_release(struct inode *inode, struct file *filep)
int ret = 0;
struct uio_listener *listener = filep->private_data;
struct uio_device *idev = listener->dev;
-   unsigned long flags;
 
-   spin_lock_irqsave(&idev->info_lock, flags);
+   mutex_lock(&idev->info_lock);
if (idev->info && idev->info->release)
ret = idev->info->release(idev->info, inode);
-   spin_unlock_irqrestore(&idev->info_lock, flags);
+   mutex_unlock(&idev->info_lock);
 
module_put(idev->owner);
kfree(listener);
@@ -515,12 +513,11 @@ static unsigned int uio_poll(struct file *filep, poll_table *wait)
struct uio_listener *listener = filep->private_data;
struct uio_device *idev = listener->dev;
unsigned int ret = 0;
-   unsigned long flags;
 
-   spin_lock_irqsave(&idev->info_lock, flags);
+   mutex_lock(&idev->info_lock);
if (!idev->info || !idev->info->irq)
ret = -EIO;
-   spin_unlock_irqrestore(&idev->info_lock, flags);
+   mutex_unlock(&idev->info_lock);
 
if (ret)
return ret;
@@ -539,12 +536,11 @@ static ssize_t uio_read(struct file *filep, char __user *buf,
DECLARE_WAITQUEUE(wait, current);
ssize_t retval = 0;
s32 event_count;
-   unsigned long flags;
 
-   spin_lock_irqsave(&idev->info_lock, flags);
+   mutex_lock(&idev->info_lock);
if (!idev->info || !idev->info->irq)
retval = -EIO;
-   spin_unlock_irqrestore(&idev->info_lock, flags);
+   mutex_unlock(&idev->info_lock);
 
if (retval)
return retval;
@@ -594,9 +590,8 @@ static ssize_t uio_write(struct file *filep, const char __user *buf,
struct uio_device *idev = listener->dev;
ssize_t retval;
s32 irq_on;
-   unsigned long flags;
 
-   spin_lock_irqsave(&idev->info_lock, flags);
+   mutex_lock(&idev->info_lock);
if (!idev->info || !idev->info->irq) {
retval = -EIO;
goto out;
@@ -620,7 +615,7 @@ static ssize_t uio_write(struct file *filep, const char __user *buf,
retval = idev->info->irqcontrol(idev->info, irq_on);
 
 out:
-   spin_unlock_irqrestore(&idev->info_lock, flags);
+   mutex_unlock(&idev->info_lock);
return retval ? retval : sizeof(s32);
 }
 
@@ -874,7 +869,7 @@ int __uio_register_device(struct module *owner,
 
idev->owner = owner;
idev->info = info;
-   spin_lock_init(&idev->info_lock);
+   mutex_init(&idev->info_lock);
init_waitqueue_head(&idev->wait);
atomic_set(&idev->event, 0);
 
@@ -940,7 +935,6 @@ EXPORT_SYMBOL_GPL(__uio_register_device);
 void uio_unregister_device(struct uio_info *info)
 {
struct uio_device *idev;
-   unsigned long flags;
 
if (!info || !info->uio_dev)
return;
@@ -954,9 +948,9 @@ void uio_unregister_device(struct uio_info *info)
if (info->irq && info->irq != UIO_IRQ_CUSTOM)
free_irq(info->irq, idev);
 
-   spin_lock_irqsave(&idev->info_lock, flags);
+   mutex_lock(&idev->info_lock);
idev->info = 

[PATCH 4.14 5/8] uio: fix crash after the device is unregistered

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Xiubo Li 

commit 57c5f4df0a5a0ee83df71251e2ee93a5e4e9 upstream.

For the target_core_user use case, after the device is unregistered
it may still be open in user space, and then the kernel will crash, like:

[  251.163692] BUG: unable to handle kernel NULL pointer dereference at 0008
[  251.163820] IP: [] show_name+0x23/0x40 [uio]
[  251.163965] PGD 800062694067 PUD 62696067 PMD 0
[  251.164097] Oops:  [#1] SMP
...
[  251.165605]  e1000 mptscsih mptbase drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[  251.166014] CPU: 0 PID: 13380 Comm: tcmu-runner Kdump: loaded Not tainted 3.10.0-916.el7.test.x86_64 #1
[  251.166381] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[  251.166747] task: 971eb91db0c0 ti: 971e9e384000 task.ti: 971e9e384000
[  251.167137] RIP: 0010:[]  [] show_name+0x23/0x40 [uio]
[  251.167563] RSP: 0018:971e9e387dc8  EFLAGS: 00010282
[  251.167978] RAX:  RBX: 971e9e3f8000 RCX: 971eb8368d98
[  251.168408] RDX: 971e9e3f8000 RSI: c0738084 RDI: 971e9e3f8000
[  251.168856] RBP: 971e9e387dd0 R08: 971eb8bc0018 R09: 
[  251.169296] R10: 1000 R11: a09d444d R12: a1076e80
[  251.169750] R13: 971e9e387f18 R14: 0001 R15: 971e9cfb1c80
[  251.170213] FS:  7ff37d175880() GS:971ebb60() knlGS:
[  251.170693] CS:  0010 DS:  ES:  CR0: 80050033
[  251.171248] CR2: 0008 CR3: 001f6000 CR4: 003607f0
[  251.172071] DR0:  DR1:  DR2: 
[  251.172640] DR3:  DR6: fffe0ff0 DR7: 0400
[  251.173236] Call Trace:
[  251.173789]  [] dev_attr_show+0x23/0x60
[  251.174356]  [] ? mutex_lock+0x12/0x2f
[  251.174892]  [] sysfs_kf_seq_show+0xcf/0x1f0
[  251.175433]  [] kernfs_seq_show+0x26/0x30
[  251.175981]  [] seq_read+0x110/0x3f0
[  251.176609]  [] kernfs_fop_read+0xf5/0x160
[  251.177158]  [] vfs_read+0x9f/0x170
[  251.177707]  [] SyS_read+0x7f/0xf0
[  251.178268]  [] system_call_fastpath+0x1c/0x21
[  251.178823] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 d3 e8 7e 96 56 e0 48 8b 80 d8 02 00 00 48 89 df 48 c7 c6 84 80 73 c0 <48> 8b 50 08 31 c0 e8 e2 67 44 e0 5b 48 98 5d c3 0f 1f 00 66 2e
[  251.180115] RIP  [] show_name+0x23/0x40 [uio]
[  251.180820]  RSP 
[  251.181473] CR2: 0008

CC: Hamish Martin 
CC: Mike Christie 
Reviewed-by: Hamish Martin 
Signed-off-by: Xiubo Li 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Tommi Rantala 
---
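Note for reviewers: the check added to the sysfs show functions below
can be sketched in userspace like this. The sketch is an assumption-laden
illustration, not the driver code: struct uio_dev_sketch,
name_show_sketch() and unregister_sketch() are hypothetical, and a
pthread mutex stands in for info_lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for struct uio_info and struct uio_device. */
struct info_sketch {
    const char *name;
};

struct uio_dev_sketch {
    pthread_mutex_t info_lock;
    struct info_sketch *info;   /* NULL once the device is unregistered */
};

/* Reader side: only dereference info while holding the lock. */
static int name_show_sketch(struct uio_dev_sketch *d, char *buf, size_t len)
{
    int ret;

    pthread_mutex_lock(&d->info_lock);
    if (!d->info) {
        ret = -EINVAL;          /* device is gone, nothing to show */
        goto out;
    }
    ret = snprintf(buf, len, "%s\n", d->info->name);
out:
    pthread_mutex_unlock(&d->info_lock);
    return ret;
}

/* Unregister side: clear the pointer under the same lock. */
static void unregister_sketch(struct uio_dev_sketch *d)
{
    pthread_mutex_lock(&d->info_lock);
    d->info = NULL;
    pthread_mutex_unlock(&d->info_lock);
}
```

A reader that still has the file open after unregistration then gets
-EINVAL instead of dereferencing a NULL info pointer, which is the
crash shown in the oops above.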
 drivers/uio/uio.c | 104 +++---
 1 file changed, 88 insertions(+), 16 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 4441235a56cc..262610192755 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -215,7 +215,20 @@ static ssize_t name_show(struct device *dev,
 struct device_attribute *attr, char *buf)
 {
struct uio_device *idev = dev_get_drvdata(dev);
-   return sprintf(buf, "%s\n", idev->info->name);
+   int ret;
+
+   mutex_lock(&idev->info_lock);
+   if (!idev->info) {
+   ret = -EINVAL;
+   dev_err(dev, "the device has been unregistered\n");
+   goto out;
+   }
+
+   ret = sprintf(buf, "%s\n", idev->info->name);
+
+out:
+   mutex_unlock(&idev->info_lock);
+   return ret;
 }
 static DEVICE_ATTR_RO(name);
 
@@ -223,7 +236,20 @@ static ssize_t version_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
struct uio_device *idev = dev_get_drvdata(dev);
-   return sprintf(buf, "%s\n", idev->info->version);
+   int ret;
+
+   mutex_lock(&idev->info_lock);
+   if (!idev->info) {
+   ret = -EINVAL;
+   dev_err(dev, "the device has been unregistered\n");
+   goto out;
+   }
+
+   ret = sprintf(buf, "%s\n", idev->info->version);
+
+out:
+   mutex_unlock(&idev->info_lock);
+   return ret;
 }
 static DEVICE_ATTR_RO(version);
 
@@ -417,11 +443,15 @@ EXPORT_SYMBOL_GPL(uio_event_notify);
 static irqreturn_t uio_interrupt(int irq, void *dev_id)
 {
struct uio_device *idev = (struct uio_device *)dev_id;
-   irqreturn_t ret = idev->info->handler(irq, idev->info);
+   irqreturn_t ret;
+
+   mutex_lock(&idev->info_lock);
 
+   ret = idev->info->handler(irq, idev->info);
if (ret == IRQ_HANDLED)
uio_event_notify(idev->info);
 
+   mutex_unlock(&idev->info_lock);
return ret;
 }
 
@@ -462,6 +492,12 @@ static int uio_open(struct inode *inode, struct file *filep)
filep->private_data = listener;
 
mutex_lock(&idev->info_lock);
+   if (!idev->info) {
+   mutex_unlock(&idev->info_lock);
+   ret = -EINVAL;
+   goto 

[PATCH 4.14 1/8] uio: Reduce return paths from uio_write()

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Hamish Martin 

commit 81daa406c2cc97d85eef9409400404efc2a3f756 upstream.

Drive all return paths for uio_write() through a single block at the
end of the function.

Signed-off-by: Hamish Martin 
Reviewed-by: Chris Packham 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Tommi Rantala 
---
 drivers/uio/uio.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 654579bc1e54..10f249628e79 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -570,20 +570,29 @@ static ssize_t uio_write(struct file *filep, const char __user *buf,
ssize_t retval;
s32 irq_on;
 
-   if (!idev->info->irq)
-   return -EIO;
+   if (!idev->info->irq) {
+   retval = -EIO;
+   goto out;
+   }
 
-   if (count != sizeof(s32))
-   return -EINVAL;
+   if (count != sizeof(s32)) {
+   retval = -EINVAL;
+   goto out;
+   }
 
-   if (!idev->info->irqcontrol)
-   return -ENOSYS;
+   if (!idev->info->irqcontrol) {
+   retval = -ENOSYS;
+   goto out;
+   }
 
-   if (copy_from_user(&irq_on, buf, count))
-   return -EFAULT;
+   if (copy_from_user(&irq_on, buf, count)) {
+   retval = -EFAULT;
+   goto out;
+   }
 
retval = idev->info->irqcontrol(idev->info, irq_on);
 
+out:
return retval ? retval : sizeof(s32);
 }
 
-- 
2.20.1



[PATCH 4.14 3/8] uio: use request_threaded_irq instead

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Xiubo Li 

commit 9421e45f5ff3d558cf8b75a8cc0824530caf3453 upstream.

Preparing for changing to use mutex lock.

Signed-off-by: Xiubo Li 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Tommi Rantala 
---
 drivers/uio/uio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 288c4b977184..c97945a3f572 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -911,8 +911,9 @@ int __uio_register_device(struct module *owner,
 * FDs at the time of unregister and therefore may not be
 * freed until they are released.
 */
-   ret = request_irq(info->irq, uio_interrupt,
- info->irq_flags, info->name, idev);
+   ret = request_threaded_irq(info->irq, NULL, uio_interrupt,
+  info->irq_flags, info->name, idev);
+
if (ret) {
info->uio_dev = NULL;
goto err_request_irq;
-- 
2.20.1



[PATCH 4.14 2/8] uio: Prevent device destruction while fds are open

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
From: Hamish Martin 

commit a93e7b331568227500186a465fee3c2cb5dffd1f upstream.

Prevent destruction of a uio_device while user space apps hold open
file descriptors to that device. Further, access to the 'info' member
of the struct uio_device is protected by spinlock. This is to ensure
stale pointers to data not under control of the UIO subsystem are not
dereferenced.

Signed-off-by: Hamish Martin 
Reviewed-by: Chris Packham 
Signed-off-by: Greg Kroah-Hartman 
[4.14 change __poll_t to unsigned int]
Signed-off-by: Tommi Rantala 
---
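Note for reviewers: the lifetime rule this patch introduces can be
sketched with a plain counter. Everything here is a hypothetical
userspace model, not the driver code: struct refdev_sketch and the
*_sketch() helpers stand in for struct uio_device, get_device(),
put_device(), uio_open(), uio_release() and uio_unregister_device().

```c
#include <assert.h>

/* Hypothetical model of a reference-counted device. */
struct refdev_sketch {
    int refs;        /* stands in for the struct device reference count */
    int info_valid;  /* stands in for idev->info != NULL */
    int freed;       /* set when the last reference goes away */
};

static void get_sketch(struct refdev_sketch *d)
{
    d->refs++;
}

static void put_sketch(struct refdev_sketch *d)
{
    assert(d->refs > 0);
    if (--d->refs == 0)
        d->freed = 1;   /* the release callback would free the device */
}

/* Each open file descriptor pins the device. */
static void open_sketch(struct refdev_sketch *d)
{
    get_sketch(d);
}

static void release_sketch(struct refdev_sketch *d)
{
    put_sketch(d);
}

/* Unregister invalidates info and drops only the registration reference. */
static void unregister_sketch(struct refdev_sketch *d)
{
    d->info_valid = 0;
    put_sketch(d);
}
```

In this model the device memory outlives unregistration for as long as
a descriptor is open, while info_valid tells the file operations that
the driver-owned data must no longer be touched.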
 drivers/uio/uio.c  | 98 --
 include/linux/uio_driver.h |  4 +-
 2 files changed, 75 insertions(+), 27 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 10f249628e79..288c4b977184 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -272,7 +272,7 @@ static int uio_dev_add_attributes(struct uio_device *idev)
if (!map_found) {
map_found = 1;
idev->map_dir = kobject_create_and_add("maps",
-   &idev->dev->kobj);
+   &idev->dev.kobj);
if (!idev->map_dir) {
ret = -ENOMEM;
goto err_map;
@@ -301,7 +301,7 @@ static int uio_dev_add_attributes(struct uio_device *idev)
if (!portio_found) {
portio_found = 1;
idev->portio_dir = kobject_create_and_add("portio",
-   &idev->dev->kobj);
+   &idev->dev.kobj);
if (!idev->portio_dir) {
ret = -ENOMEM;
goto err_portio;
@@ -344,7 +344,7 @@ static int uio_dev_add_attributes(struct uio_device *idev)
kobject_put(&map->kobj);
}
kobject_put(idev->map_dir);
-   dev_err(idev->dev, "error creating sysfs files (%d)\n", ret);
+   dev_err(&idev->dev, "error creating sysfs files (%d)\n", ret);
return ret;
 }
 
@@ -381,7 +381,7 @@ static int uio_get_minor(struct uio_device *idev)
idev->minor = retval;
retval = 0;
} else if (retval == -ENOSPC) {
-   dev_err(idev->dev, "too many uio devices\n");
+   dev_err(&idev->dev, "too many uio devices\n");
retval = -EINVAL;
}
mutex_unlock(&minor_lock);
@@ -435,6 +435,7 @@ static int uio_open(struct inode *inode, struct file *filep)
struct uio_device *idev;
struct uio_listener *listener;
int ret = 0;
+   unsigned long flags;
 
mutex_lock(&minor_lock);
idev = idr_find(&uio_idr, iminor(inode));
@@ -444,9 +445,11 @@ static int uio_open(struct inode *inode, struct file *filep)
goto out;
}
 
+   get_device(&idev->dev);
+
if (!try_module_get(idev->owner)) {
ret = -ENODEV;
-   goto out;
+   goto err_module_get;
}
 
listener = kmalloc(sizeof(*listener), GFP_KERNEL);
@@ -459,11 +462,13 @@ static int uio_open(struct inode *inode, struct file *filep)
listener->event_count = atomic_read(&idev->event);
filep->private_data = listener;
 
-   if (idev->info->open) {
+   spin_lock_irqsave(&idev->info_lock, flags);
+   if (idev->info && idev->info->open)
ret = idev->info->open(idev->info, inode);
-   if (ret)
-   goto err_infoopen;
-   }
+   spin_unlock_irqrestore(&idev->info_lock, flags);
+   if (ret)
+   goto err_infoopen;
+
return 0;
 
 err_infoopen:
@@ -472,6 +477,9 @@ static int uio_open(struct inode *inode, struct file *filep)
 err_alloc_listener:
module_put(idev->owner);
 
+err_module_get:
+   put_device(&idev->dev);
+
 out:
return ret;
 }
@@ -489,12 +497,16 @@ static int uio_release(struct inode *inode, struct file *filep)
int ret = 0;
struct uio_listener *listener = filep->private_data;
struct uio_device *idev = listener->dev;
+   unsigned long flags;
 
-   if (idev->info->release)
+   spin_lock_irqsave(&idev->info_lock, flags);
+   if (idev->info && idev->info->release)
ret = idev->info->release(idev->info, inode);
+   spin_unlock_irqrestore(&idev->info_lock, flags);
 
module_put(idev->owner);
kfree(listener);
+   put_device(&idev->dev);
return ret;
 }
 
@@ -502,9 +514,16 @@ static unsigned int uio_poll(struct file *filep, poll_table *wait)
 {
struct uio_listener *listener = filep->private_data;
struct uio_device *idev = listener->dev;
+   unsigned int ret = 0;
+   unsigned long flags;
 
-   if (!idev->info->irq)
-   return -EIO;
+   spin_lock_irqsave(&idev->info_lock, flags);
+   if (!idev->info || !idev->info->irq)
+   

[PATCH 4.14 0/8] uio backport fixes for 4.14

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Backport uio fixes to 4.14, to fix use-after-free memory errors.

Changed __poll_t to unsigned int, as the former is not available in 4.14,
and resolved some patch context conflicts.

Hailong Liu (1):
  uio: fix wrong return value from uio_mmap()

Hamish Martin (2):
  uio: Reduce return paths from uio_write()
  uio: Prevent device destruction while fds are open

Xiubo Li (5):
  uio: use request_threaded_irq instead
  uio: change to use the mutex lock instead of the spin lock
  uio: fix crash after the device is unregistered
  uio: fix possible circular locking dependency
  Revert "uio: use request_threaded_irq instead"

 drivers/uio/uio.c  | 206 -
 include/linux/uio_driver.h |   4 +-
 2 files changed, 163 insertions(+), 47 deletions(-)

-- 
2.20.1



Re: Suspected SPAM - Re: [PATCH 4.14 198/205] perf/core: Dont WARN() for impossible ring-buffer sizes

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Wed, 2019-02-13 at 13:03 +, Rantala, Tommi T. (Nokia - FI/Espoo)
wrote:
> On Mon, 2019-02-11 at 15:19 +0100, Greg Kroah-Hartman wrote:
> > 4.14-stable review patch.  If anyone has any objections, please let
> > me know.
> > 
> > --
> > 
> > From: Mark Rutland 
> > 
> > commit 9dff0aa95a324e262ffb03f425d00e4751f3294e upstream.
> > 
> > The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to
> > determine
> > how
> > large its ringbuffer mmap should be. This can be configured to
> > arbitrary
> > values, which can be larger than the maximum possible allocation
> > from
> > kmalloc.
> > 
> > When this is configured to a suitably large value (e.g. thanks to
> > the
> > perf fuzzer), attempting to use perf record triggers a
> > WARN_ON_ONCE()
> > in
> > __alloc_pages_nodemask():
> > 
> >WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511
> > __alloc_pages_nodemask+0x3f8/0xbc8
> > 
> > Let's avoid this by checking that the requested allocation is
> > possible
> > before calling kzalloc.
> 
> Hi,
> 
> Perf tool is broken for me in 4.14.99 (running in x86_64 VM),
> bisection
> points to this patch.

... and I see there's a fix available:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=528871b456026e6127d95b1b2bd8e3a003dc1614

-Tommi

> 
> 
> > Reported-by: Julien Thierry 
> > Signed-off-by: Mark Rutland 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Reviewed-by: Julien Thierry 
> > Cc: Alexander Shishkin 
> > Cc: Arnaldo Carvalho de Melo 
> > Cc: Jiri Olsa 
> > Cc: Linus Torvalds 
> > Cc: Namhyung Kim 
> > Cc: Peter Zijlstra 
> > Cc: Thomas Gleixner 
> > Cc: 
> > Link: 
> > https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutl...@arm.com
> > Signed-off-by: Ingo Molnar 
> > Signed-off-by: Greg Kroah-Hartman 
> > 
> > ---
> >  kernel/events/ring_buffer.c |3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > --- a/kernel/events/ring_buffer.c
> > +++ b/kernel/events/ring_buffer.c
> > @@ -719,6 +719,9 @@ struct ring_buffer *rb_alloc(int nr_page
> > size = sizeof(struct ring_buffer);
> > size += nr_pages * sizeof(void *);
> >  
> > +   if (order_base_2(size) >= MAX_ORDER)
> > +   goto fail;
> > +
> > rb = kzalloc(size, GFP_KERNEL);
> > if (!rb)
> > goto fail;
> > 
> > 



Re: [PATCH 4.14 198/205] perf/core: Dont WARN() for impossible ring-buffer sizes

2019-02-13 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
On Mon, 2019-02-11 at 15:19 +0100, Greg Kroah-Hartman wrote:
> 4.14-stable review patch.  If anyone has any objections, please let
> me know.
> 
> --
> 
> From: Mark Rutland 
> 
> commit 9dff0aa95a324e262ffb03f425d00e4751f3294e upstream.
> 
> The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how
> large its ringbuffer mmap should be. This can be configured to arbitrary
> values, which can be larger than the maximum possible allocation from
> kmalloc.
> 
> When this is configured to a suitably large value (e.g. thanks to the
> perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in
> __alloc_pages_nodemask():
> 
>    WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8
> 
> Let's avoid this by checking that the requested allocation is possible
> before calling kzalloc.

Hi,

The perf tool is broken for me in 4.14.99 (running in an x86_64 VM);
bisection points to this patch.

# perf top
Error:
Failed to mmap with 12 (Cannot allocate memory)

# perf trace
Cannot allocate memory

# strace -T -tt -f -y perf top
[...]
14:22:09.829544 openat(AT_FDCWD, "/proc/sys/kernel/perf_event_mlock_kb", O_RDONLY) = 18 <0.15>
14:22:09.829612 read(18, "516\n", 64) = 4 <0.11>
14:22:09.829655 close(18) = 0 <0.08>
14:22:09.829702 mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory) <0.15>
14:22:09.829763 write(2, "Error:\n", 7) = 7 <0.09>
14:22:09.829810 write(2, "Failed to mmap with 12 (Cannot a"..., 48) = 48 <0.08>


Changing the patch like this fixes it...

-   if (order_base_2(size) >= MAX_ORDER)
+   if (order_base_2(size) > MAX_ORDER)
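
For illustration, the arithmetic behind that one-character change can be sketched in userspace C. The 528384-byte mmap in the strace output is 129 pages of 4096 bytes, presumably one control page plus 128 data pages, so rb_alloc() would need a 128-entry pointer array (1024 bytes on x86_64) plus the struct itself. The helper names, the 256-byte struct size, the 8-byte pointer size and MAX_ORDER = 11 below are assumptions made for this sketch, not values taken from the thread. Note also that order_base_2(size) is the log2 of a byte count, while MAX_ORDER is a limit on page orders, so the two are not in the same unit.

```c
#include <stdint.h>

/* All three values are assumptions for this sketch, not from the thread. */
#define SKETCH_MAX_ORDER      11   /* common x86_64 default for 4.14 */
#define SKETCH_RB_STRUCT_SIZE 256  /* hypothetical sizeof(struct ring_buffer) */
#define SKETCH_PTR_SIZE       8    /* sizeof(void *) on x86_64 */

/*
 * Userspace stand-in for the kernel's order_base_2(): the smallest
 * order such that (1 << order) >= n, with n <= 1 giving 0.
 * Saturates at 63 for very large n.
 */
static unsigned int order_base_2_sketch(uint64_t n)
{
    unsigned int order = 0;

    while (order < 63 && (UINT64_C(1) << order) < n)
        order++;
    return order;
}

/* Bytes rb_alloc() would request: the struct plus one pointer per data page. */
static uint64_t rb_alloc_size_sketch(uint64_t nr_pages)
{
    return SKETCH_RB_STRUCT_SIZE + nr_pages * SKETCH_PTR_SIZE;
}

/* The check as backported: rejects when the byte order reaches MAX_ORDER. */
static int rb_rejected_ge(uint64_t nr_pages)
{
    return order_base_2_sketch(rb_alloc_size_sketch(nr_pages)) >= SKETCH_MAX_ORDER;
}

/* The check with the change proposed above. */
static int rb_rejected_gt(uint64_t nr_pages)
{
    return order_base_2_sketch(rb_alloc_size_sketch(nr_pages)) > SKETCH_MAX_ORDER;
}
```

With these assumed values, 128 data pages give a request of 1280 bytes, which rounds up to order 11. rb_rejected_ge(128) is therefore true and rb_rejected_gt(128) is false, which is consistent with the ENOMEM seen above and with the workaround.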

-Tommi


> Reported-by: Julien Thierry 
> Signed-off-by: Mark Rutland 
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Julien Thierry 
> Cc: Alexander Shishkin 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Jiri Olsa 
> Cc: Linus Torvalds 
> Cc: Namhyung Kim 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: 
> Link: https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutl...@arm.com
> Signed-off-by: Ingo Molnar 
> Signed-off-by: Greg Kroah-Hartman 
> 
> ---
>  kernel/events/ring_buffer.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -719,6 +719,9 @@ struct ring_buffer *rb_alloc(int nr_page
>   size = sizeof(struct ring_buffer);
>   size += nr_pages * sizeof(void *);
>  
> + if (order_base_2(size) >= MAX_ORDER)
> + goto fail;
> +
>   rb = kzalloc(size, GFP_KERNEL);
>   if (!rb)
>   goto fail;
> 
> 



4.4 "rcu: Force boolean subscript for expedited stall warnings"

2019-02-06 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi,

Can you pick this tiny one-liner patch to 4.4.y?
It fixes an unexpected null byte in the RCU "expedited stall" message.


commit ec3833ed02ae6ef2a933ece9de7cbab0c64c699e
Author: Paul E. McKenney 
Date:   Mon Jan 11 16:29:29 2016 -0800

rcu: Force boolean subscript for expedited stall warnings
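
As a minimal sketch of the general pattern the commit title refers to (this is not the kernel code, and the function names are hypothetical): a status character is picked by indexing a two-character string literal with a "truthy" value. If that value is 2 rather than 1, the subscript lands on the literal's terminating NUL, which would explain a null byte in the printed message. Larger values would read out of bounds, so the sketch only exercises 0, 1 and 2.

```c
/*
 * Picks a status character by raw subscript. Only safe when the
 * argument is 0 or 1; the value 2 selects the terminating NUL of
 * the string literal, and anything larger is out of bounds.
 */
static char status_char_raw(int truthy)
{
    return "O."[truthy];
}

/* Same lookup, but "!!" first collapses any non-zero value to 1. */
static char status_char_bool(int truthy)
{
    return "O."[!!truthy];
}
```

Here status_char_raw(2) yields '\0', while status_char_bool(2) yields '.', the same character as for an argument of 1.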


-Tommi



4.14 "random: add a config option to trust the CPU's hwrng"

2019-02-06 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi stable maintainers,

Can you consider including these "random" patches in 4.14.y?

These are very useful for fixing boot delays caused by entropy
starvation, especially on the first boot of VMs.


commit 39a8883a2b989d1d21bd8dd99f5557f0c5e89694
Author: Theodore Ts'o 
Date:   Tue Jul 17 18:24:27 2018 -0400

random: add a config option to trust the CPU's hwrng

commit 9b25436662d5fb4c66eb527ead53cab15f596ee0
Author: Kees Cook 
Date:   Mon Aug 27 14:51:54 2018 -0700

random: make CPU trust a boot parameter


-Tommi



4.14 "uio: Prevent device destruction while fds are open"

2019-02-06 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi,

I hit use-after-free issues in UIO in 4.14.x, and discovered that it's
already fixed in later kernel versions:

commit a93e7b331568227500186a465fee3c2cb5dffd1f
Author: Hamish Martin 
Date:   Mon May 14 13:32:23 2018 +1200

uio: Prevent device destruction while fds are open

Can we have this in 4.14.y?
(It would be a good idea for older LTS kernels too.)
I picked and tested the following commits in 4.14.x:


# Temporarily revert "uio: Fix an Oops on load",
# to avoid merge conflict later with "uio: use
# request_threaded_irq instead"
git revert f6a6ae4e0f345aa481535bfe2046cd33f4dc37b8

# "uio: Reduce return paths from uio_write()"
git cherry-pick 81daa406c2cc97d85eef9409400404efc2a3f756

# "uio: Prevent device destruction while fds are open"
# Also amend it: change __poll_t to plain unsigned int,
# since __poll_t does not exist in 4.14.
git cherry-pick a93e7b331568227500186a465fee3c2cb5dffd1f
sed -i "s/__poll_t/unsigned int/" drivers/uio/uio.c
git commit --amend drivers/uio/uio.c

# "uio: use request_threaded_irq instead"
git cherry-pick 9421e45f5ff3d558cf8b75a8cc0824530caf3453

# "uio: change to use the mutex lock instead of the spin lock"
# Resolve conflict due to __poll_t in patch context.
git cherry-pick 543af5861f41af0a5d2432f6fb5976af50f9cee5
sed -i -e '/<<>>/d' \
-e 's/__poll_t/unsigned int/' drivers/uio/uio.c
git add drivers/uio/uio.c
git cherry-pick --continue

# "uio: fix crash after the device is unregistered"
git cherry-pick 57c5f4df0a5a0ee83df71251e2ee93a5e4e9

# "uio: fix wrong return value from uio_mmap()"
git cherry-pick e7de2590f18a272e63732b9d519250d1b522b2c4

# "uio: fix possible circular locking dependency"
git cherry-pick b34e9a15b37b8ddbf06a4da142b0c39c74211eb4

# Revert "uio: use request_threaded_irq instead"
git cherry-pick 3d27c4de8d4fb2d4099ff324671792aa2578c6f9

# re-apply: uio: Fix an Oops on load
git cherry-pick 432798195bbce1f8cd33d1c0284d0538835e25fb

-Tommi



4.14 revert "seccomp: add a selftest for get_metadata"

2019-01-28 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi Greg,

Can you please revert this commit in 4.14?

commit e65cd9a20343ea90f576c24c38ee85ab6e7d5fec
Author: Tycho Andersen 
Date:   Tue Feb 20 19:47:47 2018 -0700

seccomp: add a selftest for get_metadata

[ Upstream commit d057dc4e35e16050befa3dda943876dab39cbf80 ]

Let's test that we get the flags correctly, and that we preserve the filter
index across the ptrace(PTRACE_SECCOMP_GET_METADATA) correctly.



PTRACE_SECCOMP_GET_METADATA was only added in 4.16
(26500475ac1b499d8636ff281311d633909f5d20), so the selftest cannot
pass on 4.14.


And it's also breaking seccomp_bpf.c compilation for me:

seccomp_bpf.c: In function ‘get_metadata’:
seccomp_bpf.c:2878:26: error: storage size of ‘md’ isn’t known
  struct seccomp_metadata md;
  ^~

-Tommi



4.14 perf unwind fixes

2019-01-28 Thread Rantala, Tommi T. (Nokia - FI/Espoo)
Hi Greg,

Can you please pick these two upstream patches to 4.14?
They fix broken perf unwinding for me.


commit 3d20c6246690219881786de10d2dda93f616d0ac
Author: Martin Vuille <jpm...@aim.com>
Date:   Sun Feb 11 16:24:20 2018 -0500

perf unwind: Unwind with libdw doesn't take symfs into account


commit 1fe627da30331024f453faef04d500079b901107
Author: Milian Wolff <milian.wo...@kdab.com>
Date:   Mon Oct 29 15:16:44 2018 +0100

perf unwind: Take pgoff into account when reporting elf to libdwfl


-Tommi