Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-14 Thread Will Deacon
On Fri, Aug 11, 2017 at 04:53:30PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 12:06:39PM +0100, Mark Rutland wrote:
> > On Fri, Aug 11, 2017 at 12:52:52PM +0200, Peter Zijlstra wrote:
> > > On Fri, Aug 11, 2017 at 11:01:27AM +0100, Mark Rutland wrote:
> > > > On Thu, Aug 10, 2017 at 02:48:52PM -0400, Vince Weaver wrote:
> > > > > 
> > > > > So I was working on my perf_event_tests on ARM/ARM64 (the end goal
> > > > > was to get ARM64 rdpmc support working, but apparently those patches
> > > > > never made it upstream?)
> > > > 
> > > > IIUC by 'rdpmc' you mean direct userspace counter access?
> > > > 
> > > > Patches for that never made it upstream. Last I saw, there were no
> > > > patches in a suitable state for review.
> > > > 
> > > > There are also difficulties (e.g. big.LITTLE systems where the number of
> > > > counters can differ across CPUs) which have yet to be solved.
> > > 
> > > How would that be a problem? The API gives an explicit index to use with
> > > the 'rdpmc' instruction.
> > 
> > It's a problem because accesses to unimplemented counters trap. So if a
> > task gets migrated from a CPU with N counters to one with N-1, accessing
> > counter N would be problematic.
> > 
> > So we'd need to account for that somehow, in addition to the usual
> > sequence counter fun to verify the index was valid when the access was
> > performed.
> 
> Aah, you need restartable-sequences :-)

Or, in the absence of those, I wouldn't mind only supporting this for
non-big/little platforms initially.

Will


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Vince Weaver
On Fri, 11 Aug 2017, Mark Rutland wrote:

> > This isn't some key thing that needs to be fixed, I was just curious about 
> > the behavior difference between x86 and ARM. 
> 
> Sure; likewise I'm curious.

well I finally got a current git 64-bit kernel booted on the pi3.

Challenge: USB known to be broken currently, so no keyboard or ethernet.
Extra challenge: had the RX/TX lines switched on the serial connector.
Bonus challenge: the bcm2837 dts file doesn't enable armv8 PMU

I got through all of that, only to find:

$ uname -a
Linux pi3-git 4.13.0-rc4-00152-g2627393 #2 SMP PREEMPT Fri Aug 11 13:58:42 EDT 2017 aarch64 GNU/Linux

$ ./mmap_multiple 
Trying to mmap same perf_event fd multiple times...PASSED

So maybe the issue was fixed between 4.9 and current?

Vince


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Mark Rutland
On Fri, Aug 11, 2017 at 12:51:12PM -0400, Vince Weaver wrote:
> On Fri, 11 Aug 2017, Mark Rutland wrote:
> > Just to check, how does x86 behave on each of those kernel releases?
> > 
> > Many things have changed since v4.4.
> 
> I'm fairly sure this test (well, the equivalent code in
> tests/record_sample/record_mmap that I based the test on) has been passing
> on all of my x86 test machines since ~3.10 or so, or else I would have noticed.

Ok.

> If I can get a custom kernel to boot on one of my machines I can start 
> digging in and see if I can find where the EINVAL comes from.

From a quick scan, I can't spot anything obvious that has changed since
v4.9 and would affect the arm64 perf mmap behaviour.

> This isn't some key thing that needs to be fixed, I was just curious about 
> the behavior difference between x86 and ARM. 

Sure; likewise I'm curious.

Thanks,
Mark.


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Vince Weaver
On Fri, 11 Aug 2017, Mark Rutland wrote:

> IIRC, patches were sent back in 2014, but as I mentioned above, those
> were far from suitable for upstream, even ignoring cases like
> big.LITTLE. Said patches were never reworked and reposted.

Here's the commit message in the perf_event_tests tree, having trouble 
finding the original e-mail that went with it.

commit 2cc2e21e349243889ba59408527cc1a97dd0dc44
Author: Yogesh Tillu 
Date:   Tue Mar 1 14:18:22 2016 +0530

Add support for RDPMC test with mmap way

This test adds support for reading perf hw counter from userspace.
Method (2)
rdpmc_comparision_mmap:
Test read perf hw counter in userspace using open/mmap syscall.
It requires kernel with perf mmap patchset and
echo 1 > /sys/bus/platform/drivers/armv8-pmu/rdpmc

Above Method Tested On:(X86/ARM)
It is tested with perf mmap patchset on kernel v4.5.0-rc5+
With above Tests, we can benchmark access of perf hw counters in
userspace with syscall vs perf_event_mmap_page way.

Signed-off-by: Yogesh Tillu 



> Just to check, how does x86 behave on each of those kernel releases?
> 
> Many things have changed since v4.4.

I'm fairly sure this test (well, the equivalent code in
tests/record_sample/record_mmap that I based the test on) has been passing
on all of my x86 test machines since ~3.10 or so, or else I would have noticed.

If I can get a custom kernel to boot on one of my machines I can start 
digging in and see if I can find where the EINVAL comes from.

This isn't some key thing that needs to be fixed, I was just curious about 
the behavior difference between x86 and ARM.  There are a few other minor 
x86/ARM differences, especially relating to perf_event_open() error
returns, that I had to special case in a few of my tests.

Vince



Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Mark Rutland
On Fri, Aug 11, 2017 at 11:25:37AM -0400, Vince Weaver wrote:
> On Fri, 11 Aug 2017, Mark Rutland wrote:
> 
> > IIUC by 'rdpmc' you mean direct userspace counter access?
> > 
> > Patches for that never made it upstream. Last I saw, there were no
> > patches in a suitable state for review.
> 
> yes, someone from Linaro sent me some code a while back that implemented 
> the userspace side and claimed the kernel patches would appear at some 
> point.  I should try to dig up that e-mail.

IIRC, patches were sent back in 2014, but as I mentioned above, those
were far from suitable for upstream, even ignoring cases like
big.LITTLE. Said patches were never reworked and reposted.

> > > On ARM/ARM64 you can only mmap() it once, any other attempts fail.
> > 
> > Interesting. Which platform(s) are you testing on, with which kernel
> > version(s)?
> 
> This is on a Dragonboard 410c running a vendor 64-bit 4.4 kernel,
> a Nvidia Jetson TX-1 board running a 64-bit 3.10 vendor kernel,

Just to check, how does x86 behave on each of those kernel releases?

Many things have changed since v4.4.

> as well as a Raspberry Pi 3B running a 32-bit 4.9 pi foundation kernel.

Hmm. On 32-bit this might be down to some arch/arm/mm cache aliasing
code, or it might be down to something that's changed since v4.9.

> It's a pain getting a recent-git kernel on these boards but I'm most of 
> the way to getting one booting on the Pi 3B.  (got distracted by the fact 
> that Linpack still reliably crashes the Pi-3b even with a heatsink).

IIUC, were you to modify this test to use SW events, you could test it
on an aarch64 kernel running under QEMU. To the best of my knowledge,
the code paths for HW and SW PMU are identical for mmap.

Otherwise, you might have more luck using a foundation model, which has
a PMU. 

Thanks,
Mark.
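
For anyone wanting to try the software-event variant suggested above, here is
a minimal sketch, not the actual mmap_multiple test. It assumes a 1 + 8 page
buffer (which matches the 36864-byte length in the straces on a 4 KiB page
system) and PERF_COUNT_SW_CPU_CLOCK as the event. The function name
mmap_twice_sw_event() and the sample period are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open(), so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Hypothetical helper: returns 1 if the same fd could be mapped twice,
 * 0 if only the first mmap() succeeded, and -1 if the event could not
 * be opened or mapped at all (e.g. perf is not permitted here).
 */
int mmap_twice_sw_event(void)
{
    struct perf_event_attr attr;
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    void *first, *second;
    int fd, result;

    assert(page > 0);
    len = (1 + 8) * (size_t)page;   /* one metadata page + 8 data pages */

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;           /* no hardware PMU required */
    attr.config = PERF_COUNT_SW_CPU_CLOCK;
    attr.sample_period = 100000;              /* arbitrary, for illustration */
    attr.sample_type = PERF_SAMPLE_IP;
    attr.disabled = 1;                        /* we only care about mmap() */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0) {
        perror("perf_event_open");
        return -1;
    }

    first = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (first == MAP_FAILED) {
        perror("first mmap");
        close(fd);
        return -1;
    }

    second = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    result = (second != MAP_FAILED);
    if (result)
        munmap(second, len);
    else
        fprintf(stderr, "second mmap: %s\n", strerror(errno));

    munmap(first, len);
    close(fd);
    return result;
}
```

Calling mmap_twice_sw_event() from a small main() and printing the result
should be enough to compare kernels. A return of 0 would correspond to the
EINVAL case reported from the vendor kernels.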


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Vince Weaver
On Fri, 11 Aug 2017, Mark Rutland wrote:

> IIUC by 'rdpmc' you mean direct userspace counter access?
> 
> Patches for that never made it upstream. Last I saw, there were no
> patches in a suitable state for review.

yes, someone from Linaro sent me some code a while back that implemented 
the userspace side and claimed the kernel patches would appear at some 
point.  I should try to dig up that e-mail.

The "rdpmc" code looked something like this:

    if (counter == PERF_COUNT_HW_CPU_CYCLES)
        asm volatile("mrs %0, pmccntr_el0" : "=r" (ret));
    else {
        asm volatile("msr pmselr_el0, %0" : : "r" ((counter-1)));
        asm volatile("mrs %0, pmxevcntr_el0" : "=r" (ret));
    }


> > On ARM/ARM64 you can only mmap() it once, any other attempts fail.
> 
> Interesting. Which platform(s) are you testing on, with which kernel
> version(s)?

This is on a Dragonboard 410c running a vendor 64-bit 4.4 kernel,
a Nvidia Jetson TX-1 board running a 64-bit 3.10 vendor kernel,
as well as a Raspberry Pi 3B running a 32-bit 4.9 pi foundation kernel.

It's a pain getting a recent-git kernel on these boards but I'm most of 
the way to getting one booting on the Pi 3B.  (got distracted by the fact 
that Linpack still reliably crashes the Pi-3b even with a heatsink).

Here's strace from the Dragonboard:
perf_event_open(0x7fc649e900, 0, -1, -1, 0) = 3
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f7e1b1000
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 EINVAL (Invalid argument)

Vince


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Peter Zijlstra
On Fri, Aug 11, 2017 at 12:06:39PM +0100, Mark Rutland wrote:
> On Fri, Aug 11, 2017 at 12:52:52PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 11, 2017 at 11:01:27AM +0100, Mark Rutland wrote:
> > > On Thu, Aug 10, 2017 at 02:48:52PM -0400, Vince Weaver wrote:
> > > > 
> > > > So I was working on my perf_event_tests on ARM/ARM64 (the end goal
> > > > was to get ARM64 rdpmc support working, but apparently those patches
> > > > never made it upstream?)
> > > 
> > > IIUC by 'rdpmc' you mean direct userspace counter access?
> > > 
> > > Patches for that never made it upstream. Last I saw, there were no
> > > patches in a suitable state for review.
> > > 
> > > There are also difficulties (e.g. big.LITTLE systems where the number of
> > > counters can differ across CPUs) which have yet to be solved.
> > 
> > How would that be a problem? The API gives an explicit index to use with
> > the 'rdpmc' instruction.
> 
> It's a problem because accesses to unimplemented counters trap. So if a
> task gets migrated from a CPU with N counters to one with N-1, accessing
> counter N would be problematic.
> 
> So we'd need to account for that somehow, in addition to the usual
> sequence counter fun to verify the index was valid when the access was
> performed.

Aah, you need restartable-sequences :-)


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Mark Rutland
On Fri, Aug 11, 2017 at 12:52:52PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 11:01:27AM +0100, Mark Rutland wrote:
> > On Thu, Aug 10, 2017 at 02:48:52PM -0400, Vince Weaver wrote:
> > > 
> > > So I was working on my perf_event_tests on ARM/ARM64 (the end goal was to 
> > > get ARM64 rdpmc support working, but apparently those patches never made 
> > > it upstream?)
> > 
> > IIUC by 'rdpmc' you mean direct userspace counter access?
> > 
> > Patches for that never made it upstream. Last I saw, there were no
> > patches in a suitable state for review.
> > 
> > There are also difficulties (e.g. big.LITTLE systems where the number of
> > counters can differ across CPUs) which have yet to be solved.
> 
> How would that be a problem? The API gives an explicit index to use with
> the 'rdpmc' instruction.

It's a problem because accesses to unimplemented counters trap. So if a
task gets migrated from a CPU with N counters to one with N-1, accessing
counter N would be problematic.

So we'd need to account for that somehow, in addition to the usual
sequence counter fun to verify the index was valid when the access was
performed.

Thanks,
Mark.
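
As an illustration of the "sequence counter fun" mentioned above, the sketch
below follows the read loop documented in the comments of
<linux/perf_event.h>: sample pc->lock, read pc->index and pc->offset, read the
hardware counter, and retry if pc->lock changed. The names read_self_count(),
hw_read_fn and fake_hw_read() are invented for this example. The hardware read
is a callback so the loop can be exercised without counter access. Note that
this loop alone does not address the big.LITTLE case, since the trap would
happen inside the hardware read itself, before the sequence check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Compiler-only barrier, as used in the <linux/perf_event.h> comment. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical hook: read hardware counter 'hw_idx' (zero-based). */
typedef uint64_t (*hw_read_fn)(uint32_t hw_idx);

/* Stand-in reader for illustration; real code would use rdpmc or mrs. */
uint64_t fake_hw_read(uint32_t hw_idx)
{
    return 1000 + hw_idx;
}

/*
 * Read an event count from the mmap()ed metadata page 'pc'.
 * pc->index is 0 when no hardware counter is assigned; otherwise it is
 * the counter number plus one.
 */
uint64_t read_self_count(struct perf_event_mmap_page *pc, hw_read_fn hw_read)
{
    uint32_t seq, idx;
    int64_t count;

    assert(pc != NULL && hw_read != NULL);

    do {
        seq = pc->lock;
        barrier();

        idx = pc->index;
        count = pc->offset;
        if (pc->cap_user_rdpmc && idx) {
            uint16_t width = pc->pmc_width;
            int64_t pmc = (int64_t)hw_read(idx - 1);

            /* Sign-extend the raw value from 'width' bits to 64 bits. */
            if (width > 0 && width < 64)
                pmc = (int64_t)((uint64_t)pmc << (64 - width)) >> (64 - width);
            count += pmc;
        }

        barrier();
    } while (pc->lock != seq);   /* the kernel updated the page: retry */

    return (uint64_t)count;
}
```

With a real event, pc would be the first page returned by mmap() on the
perf_event fd, and hw_read would wrap the architecture's counter read.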


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Peter Zijlstra
On Fri, Aug 11, 2017 at 11:01:27AM +0100, Mark Rutland wrote:
> On Thu, Aug 10, 2017 at 02:48:52PM -0400, Vince Weaver wrote:
> > 
> > So I was working on my perf_event_tests on ARM/ARM64 (the end goal was to 
> > get ARM64 rdpmc support working, but apparently those patches never made 
> > it upstream?)
> 
> IIUC by 'rdpmc' you mean direct userspace counter access?
> 
> Patches for that never made it upstream. Last I saw, there were no
> patches in a suitable state for review.
> 
> There are also difficulties (e.g. big.LITTLE systems where the number of
> counters can differ across CPUs) which have yet to be solved.

How would that be a problem? The API gives an explicit index to use with
the 'rdpmc' instruction.


Re: perf: multiple mmap of fd behavior on x86/ARM

2017-08-11 Thread Mark Rutland
On Thu, Aug 10, 2017 at 02:48:52PM -0400, Vince Weaver wrote:
> 
> So I was working on my perf_event_tests on ARM/ARM64 (the end goal was to 
> get ARM64 rdpmc support working, but apparently those patches never made 
> it upstream?)

IIUC by 'rdpmc' you mean direct userspace counter access?

Patches for that never made it upstream. Last I saw, there were no
patches in a suitable state for review.

There are also difficulties (e.g. big.LITTLE systems where the number of
counters can differ across CPUs) which have yet to be solved.

> anyway one test was failing due to an x86/arm difference, which is 
> possibly only tangentially perf related.
> 
> On x86 you can mmap() a perf_event_open() file descriptor multiple times 
> and it works.
> 
> On ARM/ARM64 you can only mmap() it once, any other attempts fail.

Interesting. Which platform(s) are you testing on, with which kernel
version(s)?

> Is this expected behavior?

I'm not sure, but it sounds surprising.

> You can run the
>   tests/record_sample/mmap_multiple
> test in the current git of my perf_event_tests testsuite for a testcase.

This appears to work for me:

nanook@ribbensteg:~/src/perf_event_tests/tests/record_sample$ ./mmap_multiple 
Trying to mmap same perf_event fd multiple times...PASSED

nanook@ribbensteg:~/src/perf_event_tests/tests/record_sample$ git log --oneline HEAD~1..
c82c4dd tests: huge_grou_start: add info that this was fixed in Linux 4.3
nanook@ribbensteg:~/src/perf_event_tests/tests/record_sample$ uname -a
Linux ribbensteg 4.13.0-rc4-00010-g2ce1491 #229 SMP PREEMPT Thu Aug 10 17:06:56 BST 2017 aarch64 aarch64 aarch64 GNU/Linux

nanook@ribbensteg:~/src/perf_event_tests/tests/record_sample$ strace ./mmap_multiple 
execve("./mmap_multiple", ["./mmap_multiple"], [/* 18 vars */]) = 0
brk(0)  = 0x2d9aa000
faccessat(AT_FDCWD, "/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x9d10e000
faccessat(AT_FDCWD, "/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=42361, ...}) = 0
mmap(NULL, 42361, PROT_READ, MAP_PRIVATE, 3, 0) = 0x9d103000
close(3)= 0
faccessat(AT_FDCWD, "/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0(\17\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1283776, ...}) = 0
mmap(NULL, 1356664, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x9cf9b000
mprotect(0x9d0ce000, 61440, PROT_NONE) = 0
mmap(0x9d0dd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x132000) = 0x9d0dd000
mmap(0x9d0e3000, 13176, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x9d0e3000
close(3)= 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x9cf9a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x9cf99000
mprotect(0x9d0dd000, 16384, PROT_READ) = 0
mprotect(0x412000, 4096, PROT_READ) = 0
mprotect(0x9d112000, 4096, PROT_READ) = 0
munmap(0x9d103000, 42361)   = 0
perf_event_open(0xfbff0310, 0, -1, -1, 0) = 3
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x9d105000
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x9cf9
ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x9cf8
write(1, "Trying to mmap same perf_event f"..., 77Trying to mmap same perf_event fd multiple times...PASSED
) = 77
exit_group(0)   = ?
+++ exited with 0 +++

Thanks,
Mark.


perf: multiple mmap of fd behavior on x86/ARM

2017-08-10 Thread Vince Weaver

So I was working on my perf_event_tests on ARM/ARM64 (the end goal was to 
get ARM64 rdpmc support working, but apparently those patches never made 
it upstream?)

anyway one test was failing due to an x86/arm difference, which is 
possibly only tangentially perf related.

On x86 you can mmap() a perf_event_open() file descriptor multiple times 
and it works.

On ARM/ARM64 you can only mmap() it once, any other attempts fail.

Is this expected behavior?

You can run the
tests/record_sample/mmap_multiple
test in the current git of my perf_event_tests testsuite for a testcase.

Vince