Re: [PATCH 1/2] perf: Add munmap callback

2018-11-06 Thread Liang, Kan




On 11/6/2018 10:00 AM, Stephane Eranian wrote:

/*
* mmap 1 page at the location of the unmapped page (should reuse virtual space).
* This creates a contiguous region built from two mmaps, potentially
* from two different sources, especially with jitted runtimes.
*/

The two mmaps are both anonymous. As I understand it, we cannot symbolize
anonymous addresses, can we?

Can't we build the same test case using an actual file mapping (both
mmap from the same file)?



If they are from the same file, both mmaps have the same symbolization
name, so we don't need to distinguish them either.

I thought the symbolization issue could only happen when the two mmaps
come from different sources. Right?


Thanks,
Kan


Re: [PATCH 1/2] perf: Add munmap callback

2018-11-06 Thread Stephane Eranian
On Mon, Nov 5, 2018 at 7:43 AM Liang, Kan  wrote:
>
>
>
> On 11/5/2018 5:59 AM, Stephane Eranian wrote:
> > Hi Kan,
> >
> > I built a small test case for you to demonstrate the issue for code and 
> > data.
> > Compile the test program and then do:
> > For text:
> > $ perf record  ./mmap
> > $ perf report -D | fgrep MMAP2
> >
> > The test program mmaps 2 pages, unmaps the second, and remaps 1 page
> > over the freed space.
> > If you look at the MMAP2 records, you will not be able to reconstruct
> > what happened, and perf will
> > get confused should it try to symbolize from that address range.
> >
> > With Text:
> > PERF_RECORD_MMAP2 5937/5937: [0x40(0x1000) @ 0 08:01 400938
> > 824817672]: r-xp /home/eranian/mmap
> > PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
> > 00:00 0 0]: rwxp //anon
> > PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
> > 00:00 0 0]: rwxp //anon
> >
> >  captures the whole VMA but not the mapping
> > change in user space
> >
> > For data:
> > $ perf record -d  ./mmap
> > $ perf report -D | fgrep MMAP2
> > With data:
> > PERF_RECORD_MMAP2 6430/6430: [0x40(0x1000) @ 0 08:01 400938
> > 3278843184]: r-xp /home/eranian/mmap
> > PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
> > 00:00 0 0]: rw-p //anon
> > PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
> > 00:00 0 0]: rw-p //anon
> >
> > Same test case with data.
> > Perf will think the entire 2 pages have been replaced when in fact
> > only the second has.
> > I believe the problem is likely to impact data and jitted code caches.
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <getopt.h>
> > #include <sys/mman.h>
> > #include <err.h>
> >
> > int main(int argc, char **argv)
> > {
> > void *addr1, *addr2;
> > size_t pgsz = sysconf(_SC_PAGESIZE);
> > int n = 2;
> > int ret;
> > int c, mode = 0;
> >
> > while ((c = getopt(argc, argv, "hd")) != -1) {
> > switch (c) {
> > case 'h':
> > printf("[-h]\tget this help\n");
> > printf("[-d]\tuse data mmaps (no PROT_EXEC)\n");
> > return 0;
> > case 'd':
> > mode = PROT_WRITE;
> > break;
> > default:
> > errx(1, "unknown option");
> > }
> > }
> > /* default to text */
> > if (mode == 0)
> > mode = PROT_EXEC;
> >
> > /*
> > * mmap 2 contiguous pages
> > */
> > addr1 = mmap(NULL, n * pgsz, PROT_READ | mode, MAP_ANONYMOUS|MAP_PRIVATE,
> > -1, 0);
> > if (addr1 == (void *)MAP_FAILED)
> > err(1, "mmap 1 failed");
> >
> > printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);
> >
> > /*
> > * unmap only the second page
> > */
> > ret = munmap(addr1 + pgsz, pgsz);
> > if (ret == -1)
> > err(1, "munmap failed");
> >
> > /*
> > * mmap 1 page at the location of the unmapped page (should reuse virtual space).
> > * This creates a contiguous region built from two mmaps, potentially
> > * from two different sources, especially with jitted runtimes.
> > */
>
> The two mmaps are both anonymous. As I understand it, we cannot symbolize
> anonymous addresses, can we?
Can't we build the same test case using an actual file mapping (both
mmap from the same file)?

> If we cannot, why do we have to distinguish between them? I think we do
> not need to know their sources for symbolization.
>
> As I understand it, only perf inject --jit can inject MMAP events, which
> tag an anon region. Perf can symbolize the addresses after that, and only
> then is the munmap needed.
>
Yes, perf inject --jit injects timestamped MMAP2 records to cover the
jitted regions, which helps symbolize anonymous mappings.

> Thanks,
> Kan
> > addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ|PROT_WRITE | mode,
> > MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
> >
> > printf("addr2=%p\n", addr2);
> >
> > if (addr2 == (void *)MAP_FAILED)
> > err(1, "mmap 2 failed");
> > if (addr2 != (addr1 + pgsz))
> > errx(1, "wrong mmap2 address");
> >
> > sleep(1);
> >
> > return 0;
> > }
> >
> > On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan  wrote:
> >>
> >>
> >>
> >> On 10/24/2018 3:30 PM, Stephane Eranian wrote:
> >>> The need for this new record type extends beyond physical address 
> >>> conversions
> >>> and PEBS. A long while ago, someone reported issues with symbolization 
> >>> related
> >>> to perf lacking munmap tracking. It had to do with vma merging. I think 
> >>> the
> >>> sequence of mmaps was as follows in the problematic case:
> >>> 1.   addr1 = mmap(8192);
> >>> 2.   munmap(addr1 + 4096, 4096)
> >>> 3.   addr2 = mmap(addr1+4096, 4096)
> >>>
> >>> If successful, that yields addr2 = addr1 + 4096 (could also get the
> >>> same without forcing the address).
> >>>
> >>> In that case, if I recall correctly, the vma for the 1st mapping (now
> >>> at 4k) and that of the 2nd mapping (4k)
> >>> get merged into a single 8k vma, and this is what perf_events will
> >>> record for PERF_RECORD_MMAP.
> >>> On the perf tool side, it is assumed that if two timestamped mappings
> >>> overlap, then the latter overrides
> >>> the former. In this case, perf would lose the mapping of the first
> >>> 4kb and assume all symbols come from
> >>> 

Re: [PATCH 1/2] perf: Add munmap callback

2018-11-05 Thread Liang, Kan




On 11/5/2018 5:59 AM, Stephane Eranian wrote:

Hi Kan,

I built a small test case for you to demonstrate the issue for code and data.
Compile the test program and then do:
For text:
$ perf record  ./mmap
$ perf report -D | fgrep MMAP2

The test program mmaps 2 pages, unmaps the second, and remaps 1 page
over the freed space.
If you look at the MMAP2 records, you will not be able to reconstruct
what happened, and perf will
get confused should it try to symbolize from that address range.

With Text:
PERF_RECORD_MMAP2 5937/5937: [0x40(0x1000) @ 0 08:01 400938
824817672]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
00:00 0 0]: rwxp //anon
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
00:00 0 0]: rwxp //anon

 captures the whole VMA but not the mapping
change in user space

For data:
$ perf record -d  ./mmap
$ perf report -D | fgrep MMAP2
With data:
PERF_RECORD_MMAP2 6430/6430: [0x40(0x1000) @ 0 08:01 400938
3278843184]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
00:00 0 0]: rw-p //anon
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
00:00 0 0]: rw-p //anon

Same test case with data.
Perf will think the entire 2 pages have been replaced when in fact
only the second has.
I believe the problem is likely to impact data and jitted code caches.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <sys/mman.h>
#include <err.h>

int main(int argc, char **argv)
{
void *addr1, *addr2;
size_t pgsz = sysconf(_SC_PAGESIZE);
int n = 2;
int ret;
int c, mode = 0;

while ((c = getopt(argc, argv, "hd")) != -1) {
switch (c) {
case 'h':
printf("[-h]\tget this help\n");
printf("[-d]\tuse data mmaps (no PROT_EXEC)\n");
return 0;
case 'd':
mode = PROT_WRITE;
break;
default:
errx(1, "unknown option");
}
}
/* default to text */
if (mode == 0)
mode = PROT_EXEC;

/*
* mmap 2 contiguous pages
*/
addr1 = mmap(NULL, n * pgsz, PROT_READ | mode, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (addr1 == (void *)MAP_FAILED)
err(1, "mmap 1 failed");

printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);

/*
* unmap only the second page
*/
ret = munmap(addr1 + pgsz, pgsz);
if (ret == -1)
err(1, "munmap failed");

/*
* mmap 1 page at the location of the unmapped page (should reuse virtual space).
* This creates a contiguous region built from two mmaps, potentially
* from two different sources, especially with jitted runtimes.
*/


The two mmaps are both anonymous. As I understand it, we cannot symbolize
anonymous addresses, can we?
If we cannot, why do we have to distinguish between them? I think we do
not need to know their sources for symbolization.


As I understand it, only perf inject --jit can inject MMAP events, which
tag an anon region. Perf can symbolize the addresses after that, and only
then is the munmap needed.


Thanks,
Kan

addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ|PROT_WRITE | mode,
MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

printf("addr2=%p\n", addr2);

if (addr2 == (void *)MAP_FAILED)
err(1, "mmap 2 failed");
if (addr2 != (addr1 + pgsz))
errx(1, "wrong mmap2 address");

sleep(1);

return 0;
}

On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan  wrote:




On 10/24/2018 3:30 PM, Stephane Eranian wrote:

The need for this new record type extends beyond physical address conversions
and PEBS. A long while ago, someone reported issues with symbolization related
to perf lacking munmap tracking. It had to do with vma merging. I think the
sequence of mmaps was as follows in the problematic case:
1.   addr1 = mmap(8192);
2.   munmap(addr1 + 4096, 4096)
3.   addr2 = mmap(addr1+4096, 4096)

If successful, that yields addr2 = addr1 + 4096 (could also get the
same without forcing the address).

In that case, if I recall correctly, the vma for the 1st mapping (now
at 4k) and that of the 2nd mapping (4k)
get merged into a single 8k vma, and this is what perf_events will
record for PERF_RECORD_MMAP.
On the perf tool side, it is assumed that if two timestamped mappings
overlap, then the latter overrides
the former. In this case, perf would lose the mapping of the first
4kb and assume all symbols come from the
2nd mapping. Hopefully I got the scenario right. If so, then you'd
need PERF_RECORD_UNMAP to
disambiguate, assuming the perf tool is modified accordingly.



Hi Stephane and Peter,

I went through the link (https://lkml.org/lkml/2017/1/27/452). I'm trying
to understand the problematic case.

It looks like the issue can only be triggered by perf inject --jit,
because it can inject extra MMAP events.
As I understand it, the Linux kernel only tries to merge VMAs when both
are anonymous or both come from the same file. --jit breaks that rule and
makes the merged VMA partly from anon, partly from a file.
Now, there is a new MMAP event whose range covers the modified VMA.
Without the help of a MUNMAP event, the perf tool has no idea whether the
new one is a newly merged VMA (modified VMA + a new VMA) or a brand new
VMA. The current code just simply overwrites the modified VMAs. 

Re: [PATCH 1/2] perf: Add munmap callback

2018-11-05 Thread Stephane Eranian
Hi Kan,

I built a small test case for you to demonstrate the issue for code and data.
Compile the test program and then do:
For text:
$ perf record  ./mmap
$ perf report -D | fgrep MMAP2

The test program mmaps 2 pages, unmaps the second, and remaps 1 page
over the freed space.
If you look at the MMAP2 records, you will not be able to reconstruct
what happened, and perf will
get confused should it try to symbolize from that address range.

With Text:
PERF_RECORD_MMAP2 5937/5937: [0x40(0x1000) @ 0 08:01 400938
824817672]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
00:00 0 0]: rwxp //anon
PERF_RECORD_MMAP2 5937/5937: [0x7f7c01019000(0x2000) @ 0x7f7c01019000
00:00 0 0]: rwxp //anon

 captures the whole VMA but not the mapping
change in user space

For data:
$ perf record -d  ./mmap
$ perf report -D | fgrep MMAP2
With data:
PERF_RECORD_MMAP2 6430/6430: [0x40(0x1000) @ 0 08:01 400938
3278843184]: r-xp /home/eranian/mmap
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
00:00 0 0]: rw-p //anon
PERF_RECORD_MMAP2 6430/6430: [0x7f4aa704b000(0x2000) @ 0x7f4aa704b000
00:00 0 0]: rw-p //anon

Same test case with data.
Perf will think the entire 2 pages have been replaced when in fact
only the second has.
I believe the problem is likely to impact data and jitted code caches.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <sys/mman.h>
#include <err.h>

int main(int argc, char **argv)
{
void *addr1, *addr2;
size_t pgsz = sysconf(_SC_PAGESIZE);
int n = 2;
int ret;
int c, mode = 0;

while ((c = getopt(argc, argv, "hd")) != -1) {
switch (c) {
case 'h':
printf("[-h]\tget this help\n");
printf("[-d]\tuse data mmaps (no PROT_EXEC)\n");
return 0;
case 'd':
mode = PROT_WRITE;
break;
default:
errx(1, "unknown option");
}
}
/* default to text */
if (mode == 0)
mode = PROT_EXEC;

/*
* mmap 2 contiguous pages
*/
addr1 = mmap(NULL, n * pgsz, PROT_READ | mode, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (addr1 == (void *)MAP_FAILED)
err(1, "mmap 1 failed");

printf("addr1=[%p : %p]\n", addr1, addr1 + n * pgsz);

/*
* unmap only the second page
*/
ret = munmap(addr1 + pgsz, pgsz);
if (ret == -1)
err(1, "munmap failed");

/*
* mmap 1 page at the location of the unmapped page (should reuse virtual space).
* This creates a contiguous region built from two mmaps, potentially
* from two different sources, especially with jitted runtimes.
*/
addr2 = mmap(addr1 + pgsz, 1 * pgsz, PROT_READ|PROT_WRITE | mode,
MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

printf("addr2=%p\n", addr2);

if (addr2 == (void *)MAP_FAILED)
err(1, "mmap 2 failed");
if (addr2 != (addr1 + pgsz))
errx(1, "wrong mmap2 address");

sleep(1);

return 0;
}

On Thu, Nov 1, 2018 at 7:10 AM Liang, Kan  wrote:
>
>
>
> On 10/24/2018 3:30 PM, Stephane Eranian wrote:
> > The need for this new record type extends beyond physical address 
> > conversions
> > and PEBS. A long while ago, someone reported issues with symbolization 
> > related
> > to perf lacking munmap tracking. It had to do with vma merging. I think the
> > sequence of mmaps was as follows in the problematic case:
> > 1.   addr1 = mmap(8192);
> > 2.   munmap(addr1 + 4096, 4096)
> > 3.   addr2 = mmap(addr1+4096, 4096)
> >
> > If successful, that yields addr2 = addr1 + 4096 (could also get the
> > same without forcing the address).
> >
> > In that case, if I recall correctly, the vma for the 1st mapping (now
> > at 4k) and that of the 2nd mapping (4k)
> > get merged into a single 8k vma, and this is what perf_events will
> > record for PERF_RECORD_MMAP.
> > On the perf tool side, it is assumed that if two timestamped mappings
> > overlap, then the latter overrides
> > the former. In this case, perf would lose the mapping of the first
> > 4kb and assume all symbols come from the
> > 2nd mapping. Hopefully I got the scenario right. If so, then you'd
> > need PERF_RECORD_UNMAP to
> > disambiguate, assuming the perf tool is modified accordingly.
> >
>
> Hi Stephane and Peter,
>
> I went through the link (https://lkml.org/lkml/2017/1/27/452). I'm trying
> to understand the problematic case.
>
> It looks like the issue can only be triggered by perf inject --jit,
> because it can inject extra MMAP events.
> As I understand it, the Linux kernel only tries to merge VMAs when both
> are anonymous or both come from the same file. --jit breaks that rule and
> makes the merged VMA partly from anon, partly from a file.
> Now, there is a new MMAP event whose range covers the modified VMA.
> Without the help of a MUNMAP event, the perf tool has no idea whether the
> new one is a newly merged VMA (modified VMA + a new VMA) or a brand new
> VMA. The current code just simply overwrites the modified VMAs. The VMA
> information which --jit injected may be lost, and the symbolization may
> be lost as well.
>
> Except for --jit, the VMA information should be consistent between the
> kernel and the perf tool. We shouldn't observe the problem, and a MUNMAP
> event is not needed.
>
> Is my understanding correct?
>
> Do you have a test 

Re: [PATCH 1/2] perf: Add munmap callback

2018-11-01 Thread Liang, Kan




On 10/24/2018 3:30 PM, Stephane Eranian wrote:

The need for this new record type extends beyond physical address conversions
and PEBS. A long while ago, someone reported issues with symbolization related
to perf lacking munmap tracking. It had to do with vma merging. I think the
sequence of mmaps was as follows in the problematic case:
1.   addr1 = mmap(8192);
2.   munmap(addr1 + 4096, 4096)
3.   addr2 = mmap(addr1+4096, 4096)

If successful, that yields addr2 = addr1 + 4096 (could also get the
same without forcing the address).

In that case, if I recall correctly, the vma for 1st mapping (now at
4k) and that of the 2nd mapping (4k)
get merged into a single 8k vma and this is what perf_events will
record for PERF_RECORD_MMAP.
On the perf tool side, it is assumed that if two timestamped mappings
overlap, then the latter overrides the former. In this case, perf would
lose the mapping of the first 4kb and assume all symbols come from the
2nd mapping. Hopefully I got the scenario right. If so, then you'd need
PERF_RECORD_UNMAP to disambiguate, assuming the perf tool is modified
accordingly.



Hi Stephane and Peter,

I went through the link (https://lkml.org/lkml/2017/1/27/452). I'm trying 
to understand the problematic case.


It looks like the issue can only be triggered by perf inject --jit, 
because it can inject extra MMAP events.
As I understand it, the Linux kernel only tries to merge VMAs if they 
are both anonymous or both from the same file. --jit breaks that rule, 
and makes the merged VMA partly anonymous and partly file-backed.

Now there is a new MMAP event whose range covers the modified VMA.
Without the help of a MUNMAP event, the perf tool has no idea whether 
the new one is a newly merged VMA (modified VMA + a new VMA) or a 
brand-new VMA. The current code simply overwrites the modified VMAs. 
The VMA information which --jit injected may be lost. The 
symbolization may be lost as well.
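The overwrite described above can be modelled in a few lines of user-space C. This is only a toy model of the "latest overlapping mapping wins" rule, not the actual perf tool code; the `toy_map` struct and both helper functions are invented here for illustration.

```c
/* Toy model -- NOT the real perf code -- of a tool that resolves
 * overlapping mappings by letting the most recent event win. */
struct toy_map {
	unsigned long start, len;
	const char *name;
	int live;
};

/* Apply a new MMAP event: every existing map overlapping its range is
 * dropped ("the latter overrides the former"). */
static void toy_apply(struct toy_map *maps, int n,
		      unsigned long start, unsigned long len)
{
	for (int i = 0; i < n; i++)
		if (maps[i].live &&
		    maps[i].start < start + len &&
		    start < maps[i].start + maps[i].len)
			maps[i].live = 0;
}

/* Scenario from the mail: one anon map plus one map injected by
 * perf inject --jit; then a merged MMAP event covering both ranges
 * arrives.  Returns whether the jit-injected map survived. */
static int jit_map_survives(void)
{
	struct toy_map maps[2] = {
		{ 0x1000, 0x1000, "//anon",      1 },
		{ 0x2000, 0x1000, "jitted-1.so", 1 },	/* injected by --jit */
	};

	toy_apply(maps, 2, 0x1000, 0x2000);	/* merged-VMA MMAP event */
	return maps[1].live;
}
```

With this rule, the merged event wipes out the injected mapping, and with it the symbolization of the jitted range, which is exactly the loss described above.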


Except for --jit, the VMA information should be consistent between the 
kernel and the perf tools. We shouldn't observe the problem. A MUNMAP 
event is not needed.


Is my understanding correct?

Do you have a test case for the problem?

Thanks,
Kan


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-30 Thread Peter Zijlstra
On Thu, Oct 25, 2018 at 10:00:07AM -0400, Liang, Kan wrote:
> Is this the patch you mentioned?
> https://lkml.org/lkml/2017/1/27/452

Yes.


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-25 Thread Liang, Kan




On 10/24/2018 8:29 PM, Peter Zijlstra wrote:

On Wed, Oct 24, 2018 at 08:11:15AM -0700, kan.li...@linux.intel.com wrote:

+void perf_event_munmap(void)
+{
+   struct perf_cpu_context *cpuctx;
+   unsigned long flags;
+   struct pmu *pmu;
+
+   local_irq_save(flags);


It is impossible to get here with IRQs already disabled.


I don't think so. Based on my test, IRQs are still enabled here. I once 
observed a deadlock with my stress test. So I have to explicitly disable 
the IRQs here.





+   list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {
+   pmu = cpuctx->ctx.pmu;
+
+   if (!pmu->munmap)
+   continue;
+
+   perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+   perf_pmu_disable(pmu);
+
+   pmu->munmap();
+
+   perf_pmu_enable(pmu);
+
+   perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
+   }
+   local_irq_restore(flags);
+}
+
  static void perf_event_switch(struct task_struct *task,
  struct task_struct *next_prev, bool sched_in);
  
diff --git a/mm/mmap.c b/mm/mmap.c

index 5f2b2b184c60..61978ad8c480 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2777,6 +2777,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, 
size_t len,
/*
 * Remove the vma's, and unmap the actual pages
 */
+   perf_event_munmap();


I think that if you add the munmap hook, you should do it right and at
least do it such that we can solve the other munmap problem.



Is this the patch you mentioned?
https://lkml.org/lkml/2017/1/27/452

I will take a look and find a right place for both problems.

Thanks,
Kan

detach_vmas_to_be_unmapped(mm, vma, prev, end);
unmap_region(mm, vma, prev, start, end);
  
--

2.17.1



Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Stephane Eranian
On Wed, Oct 24, 2018 at 5:34 PM Peter Zijlstra  wrote:
>
> On Wed, Oct 24, 2018 at 05:25:59PM -0700, Stephane Eranian wrote:
> > On Wed, Oct 24, 2018 at 5:23 PM Peter Zijlstra  wrote:
>
> > > That is actually a different problem. And you're right, we never did fix
> > > that.
> > >
> > it is a different problem but the solution is the same: PERF_RECORD_UNMAP!
>
> But he's not actually doing PERF_RECORD_UNMAP, nor is his perf_munmap()
> hook sufficient to actually generate those.
>
You're right. But I saw that and thought: this is going in the right
direction, tracking munmap.
I agree with you that we should use this opportunity to track unmap not
only for his purpose but also for the issue I raised, which needs a new
record type based on the new unmap hook.

> Now I agree that if he's going to do an munmap hook, he should do it
> 'right' and at the very least allow for PERF_RECORD_UNMAP to be done,
> but ideally simply pick up and finish that patch we had back then.
Agreed.
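For reference, a munmap record would presumably mirror PERF_RECORD_MMAP minus the filename, since an unmap only needs the address range. The layout below is purely hypothetical: the `munmap_event` struct and the stand-in `perf_event_header` are invented here, and the actual layout in the 2017 patch may differ.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's struct perf_event_header. */
struct perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Hypothetical PERF_RECORD_MUNMAP body: like PERF_RECORD_MMAP but
 * without the trailing filename. */
struct munmap_event {
	struct perf_event_header header;
	uint32_t pid, tid;
	uint64_t addr;
	uint64_t len;
};
```

The tool side would then shrink or split its stored maps on such a record instead of guessing from the next overlapping MMAP event.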


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Peter Zijlstra
On Wed, Oct 24, 2018 at 08:11:15AM -0700, kan.li...@linux.intel.com wrote:
> +void perf_event_munmap(void)
> +{
> + struct perf_cpu_context *cpuctx;
> + unsigned long flags;
> + struct pmu *pmu;
> +
> + local_irq_save(flags);

It is impossible to get here with IRQs already disabled.

> + list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {
> + pmu = cpuctx->ctx.pmu;
> +
> + if (!pmu->munmap)
> + continue;
> +
> + perf_ctx_lock(cpuctx, cpuctx->task_ctx);
> + perf_pmu_disable(pmu);
> +
> + pmu->munmap();
> +
> + perf_pmu_enable(pmu);
> +
> + perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
> + }
> + local_irq_restore(flags);
> +}
> +
>  static void perf_event_switch(struct task_struct *task,
> struct task_struct *next_prev, bool sched_in);
>  
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 5f2b2b184c60..61978ad8c480 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2777,6 +2777,7 @@ int do_munmap(struct mm_struct *mm, unsigned long 
> start, size_t len,
>   /*
>* Remove the vma's, and unmap the actual pages
>*/
> + perf_event_munmap();

I think that if you add the munmap hook, you should do it right and at
least do it such that we can solve the other munmap problem.

>   detach_vmas_to_be_unmapped(mm, vma, prev, end);
>   unmap_region(mm, vma, prev, start, end);
>  
> -- 
> 2.17.1
> 




Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Peter Zijlstra
On Wed, Oct 24, 2018 at 05:25:59PM -0700, Stephane Eranian wrote:
> On Wed, Oct 24, 2018 at 5:23 PM Peter Zijlstra  wrote:

> > That is actually a different problem. And you're right, we never did fix
> > that.
> >
> it is a different problem but the solution is the same: PERF_RECORD_UNMAP!

But he's not actually doing PERF_RECORD_UNMAP, nor is his perf_munmap()
hook sufficient to actually generate those.

Now I agree that if he's going to do an munmap hook, he should do it
'right' and at the very least allow for PERF_RECORD_UNMAP to be done,
but ideally simply pick up and finish that patch we had back then.


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Peter Zijlstra
On Wed, Oct 24, 2018 at 11:28:54AM -0700, Andi Kleen wrote:
> > > void perf_event_mmap(struct vm_area_struct *vma)
> > > {
> > >  struct perf_mmap_event mmap_event;
> > > 
> > >  if (!atomic_read(&nr_mmap_events))
> > >  return;
> > > 
> > > }
> > > 
> > 
> > Thanks. I'll add the nr_mmap_events check in V2.
> 
> No, that's the wrong check here. The PEBS flush is independent of mmap
> events being requested.
> 
> If anything would need to check for any PEBS events active, which
> would need a new counter.  I think the easiest is to just check if 
> this_cpu_ptr(&sched_cb_list)
> is empty, which should be good enough.

It is just the CLI+STI, not PUSHF;CLI+POPF that are required and that is
a lot cheaper already. Also, you need to have preemption disabled in
order to check the per-cpu cb list.

So I don't think it really makes much sense to try and frob a fast path
in there.
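Putting Peter's two points together, the hook would take roughly the shape below. This is a sketch against the patch quoted earlier in the thread, not a tested change: plain local_irq_disable()/local_irq_enable() instead of saving and restoring flags, which also supplies the preemption protection needed to walk the per-cpu list.

```c
void perf_event_munmap(void)
{
	struct perf_cpu_context *cpuctx;
	struct pmu *pmu;

	/*
	 * Plain cli/sti: cheaper than local_irq_save()/restore(), since
	 * this path is never entered with IRQs already disabled.  It
	 * also disables preemption, which is needed anyway to inspect
	 * the per-cpu sched_cb_list.
	 */
	local_irq_disable();
	list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list),
			    sched_cb_entry) {
		pmu = cpuctx->ctx.pmu;

		if (!pmu->munmap)
			continue;

		perf_ctx_lock(cpuctx, cpuctx->task_ctx);
		perf_pmu_disable(pmu);

		pmu->munmap();

		perf_pmu_enable(pmu);
		perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
	}
	local_irq_enable();
}
```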


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Stephane Eranian
On Wed, Oct 24, 2018 at 5:23 PM Peter Zijlstra  wrote:
>
> On Wed, Oct 24, 2018 at 12:30:52PM -0700, Stephane Eranian wrote:
> > Hi,
> >
> > On Wed, Oct 24, 2018 at 8:12 AM  wrote:
> > >
> > > From: Kan Liang 
> > >
> > > To calculate the physical address, perf needs to walk the page tables.
> > > The related mapping may have already been removed from the page tables in
> > > some cases (e.g. large PEBS). The virtual address recorded in the first
> > > PEBS records may already be unmapped before draining the PEBS buffers.
> > >
> > > Add a munmap callback to notify the PMU of any unmapping, which is only
> > > invoked when the munmap callback is implemented.
> > >
> > The need for this new record type extends beyond physical address 
> > conversions
> > and PEBS. A long while ago, someone reported issues with symbolization 
> > related
> > to perf lacking munmap tracking. It had to do with vma merging. I think the
> > sequence of mmaps was as follows in the problematic case:
> > 1.   addr1 = mmap(8192);
> > 2.   munmap(addr1 + 4096, 4096)
> > 3.   addr2 = mmap(addr1+4096, 4096)
> >
> > If successful, that yields addr2 = addr1 + 4096 (could also get the
> > same without forcing the address).
>
> That is actually a different problem. And you're right, we never did fix
> that.
>
it is a different problem but the solution is the same: PERF_RECORD_UNMAP!
That's why I mentioned it here. To show that this is needed for more
than one reason ;-)


> I agree with you that Kan's Changelog is somewhat cryptic; it took me at
> least 3 times reading and looking at what the patches actually do to
> understand :/


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Peter Zijlstra
On Wed, Oct 24, 2018 at 12:30:52PM -0700, Stephane Eranian wrote:
> Hi,
> 
> On Wed, Oct 24, 2018 at 8:12 AM  wrote:
> >
> > From: Kan Liang 
> >
> > To calculate the physical address, perf needs to walk the page tables.
> > The related mapping may have already been removed from the page tables in
> > some cases (e.g. large PEBS). The virtual address recorded in the first
> > PEBS records may already be unmapped before draining the PEBS buffers.
> >
> > Add a munmap callback to notify the PMU of any unmapping, which is only
> > invoked when the munmap callback is implemented.
> >
> The need for this new record type extends beyond physical address conversions
> and PEBS. A long while ago, someone reported issues with symbolization related
> to perf lacking munmap tracking. It had to do with vma merging. I think the
> sequence of mmaps was as follows in the problematic case:
> 1.   addr1 = mmap(8192);
> 2.   munmap(addr1 + 4096, 4096)
> 3.   addr2 = mmap(addr1+4096, 4096)
> 
> If successful, that yields addr2 = addr1 + 4096 (could also get the
> same without forcing the address).

That is actually a different problem. And you're right, we never did fix
that.

I agree with you that Kan's Changelog is somewhat cryptic; it took me at
least 3 times reading and looking at what the patches actually do to
understand :/


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Stephane Eranian
Hi,

On Wed, Oct 24, 2018 at 8:12 AM  wrote:
>
> From: Kan Liang 
>
> To calculate the physical address, perf needs to walk the page tables.
> The related mapping may have already been removed from the page tables in
> some cases (e.g. large PEBS). The virtual address recorded in the first
> PEBS records may already be unmapped before draining the PEBS buffers.
>
> Add a munmap callback to notify the PMU of any unmapping, which is only
> invoked when the munmap callback is implemented.
>
The need for this new record type extends beyond physical address conversions
and PEBS. A long while ago, someone reported issues with symbolization related
to perf lacking munmap tracking. It had to do with vma merging. I think the
sequence of mmaps was as follows in the problematic case:
1.   addr1 = mmap(8192);
2.   munmap(addr1 + 4096, 4096)
3.   addr2 = mmap(addr1+4096, 4096)

If successful, that yields addr2 = addr1 + 4096 (could also get the
same without forcing the address).

In that case, if I recall correctly, the vma for 1st mapping (now at
4k) and that of the 2nd mapping (4k)
get merged into a single 8k vma and this is what perf_events will
record for PERF_RECORD_MMAP.
On the perf tool side, it is assumed that if two timestamped mappings
overlap, then the latter overrides the former. In this case, perf would
lose the mapping of the first 4kb and assume all symbols come from the
2nd mapping. Hopefully I got the scenario right. If so, then you'd need
PERF_RECORD_UNMAP to disambiguate, assuming the perf tool is modified
accordingly.
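The mmap/munmap/mmap sequence above can be reproduced from user space. The sketch below is Linux-specific and mirrors the described scenario with both mappings anonymous; `remap_second_page()` is a name invented here, and MAP_FIXED is used to force reuse of the freed range.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Map two pages, unmap the second, then map one new page at the freed
 * address.  Both mappings are anonymous, so the kernel is free to merge
 * the VMAs again, and a later PERF_RECORD_MMAP would describe the whole
 * merged range.  Returns 1 when the second mapping landed right after
 * the first, negative on mmap/munmap failure. */
static int remap_second_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *addr1, *addr2;

	addr1 = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr1 == MAP_FAILED)
		return -1;

	if (munmap(addr1 + page, page))
		return -1;

	/* MAP_FIXED forces reuse of the just-freed virtual range. */
	addr2 = mmap(addr1 + page, page, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (addr2 == MAP_FAILED)
		return -1;

	return addr2 == addr1 + page;
}
```

Running a program built around this sequence under perf record should show how the kernel reports the resulting mapping, without any record for the intermediate munmap.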


>
> Signed-off-by: Kan Liang 
> ---
>  include/linux/perf_event.h |  3 +++
>  kernel/events/core.c   | 25 +
>  mm/mmap.c  |  1 +
>  3 files changed, 29 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 53c500f0ca79..7f0a9258ce1f 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -400,6 +400,7 @@ struct pmu {
>  */
> void (*sched_task)  (struct perf_event_context *ctx,
> bool sched_in);
> +   void (*munmap)  (void);
> /*
>  * PMU specific data size
>  */
> @@ -1113,6 +1114,7 @@ static inline void perf_event_task_sched_out(struct 
> task_struct *prev,
>  }
>
>  extern void perf_event_mmap(struct vm_area_struct *vma);
> +extern void perf_event_munmap(void);
>  extern struct perf_guest_info_callbacks *perf_guest_cbs;
>  extern int perf_register_guest_info_callbacks(struct 
> perf_guest_info_callbacks *callbacks);
>  extern int perf_unregister_guest_info_callbacks(struct 
> perf_guest_info_callbacks *callbacks);
> @@ -1333,6 +1335,7 @@ static inline int perf_unregister_guest_info_callbacks
>  (struct perf_guest_info_callbacks *callbacks)  { 
> return 0; }
>
>  static inline void perf_event_mmap(struct vm_area_struct *vma) { }
> +static inline void perf_event_munmap(void) { }
>  static inline void perf_event_exec(void)   { }
>  static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
>  static inline void perf_event_namespaces(struct task_struct *tsk)  { }
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5a97f34bc14c..00338d6fbed7 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3184,6 +3184,31 @@ static void perf_pmu_sched_task(struct task_struct 
> *prev,
> }
>  }
>
> +void perf_event_munmap(void)
> +{
> +   struct perf_cpu_context *cpuctx;
> +   unsigned long flags;
> +   struct pmu *pmu;
> +
> +   local_irq_save(flags);
> +   list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {
> +   pmu = cpuctx->ctx.pmu;
> +
> +   if (!pmu->munmap)
> +   continue;
> +
> +   perf_ctx_lock(cpuctx, cpuctx->task_ctx);
> +   perf_pmu_disable(pmu);
> +
> +   pmu->munmap();
> +
> +   perf_pmu_enable(pmu);
> +
> +   perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
> +   }
> +   local_irq_restore(flags);
> +}
> +
>  static void perf_event_switch(struct task_struct *task,
>   struct task_struct *next_prev, bool sched_in);
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 5f2b2b184c60..61978ad8c480 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2777,6 +2777,7 @@ int do_munmap(struct mm_struct *mm, unsigned long 
> start, size_t len,
> /*
>  * Remove the vma's, and unmap the actual pages
>  */
> +   perf_event_munmap();
> detach_vmas_to_be_unmapped(mm, vma, prev, end);
> unmap_region(mm, vma, prev, start, end);
>
> --
> 2.17.1
>


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 24, 2018 at 02:12:54PM -0400, Liang, Kan escreveu:
> 
> 
> On 10/24/2018 12:32 PM, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Oct 24, 2018 at 09:23:34AM -0700, Andi Kleen escreveu:
> > > > +void perf_event_munmap(void)
> > > > +{
> > > > +   struct perf_cpu_context *cpuctx;
> > > > +   unsigned long flags;
> > > > +   struct pmu *pmu;
> > > > +
> > > > +   local_irq_save(flags);
> > > > +   list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {
> > > 
> > > Would be good to have a fast path here that checks for the list being empty
> > > without disabling the interrupts. munmap can be somewhat hot. I think
> > > it's ok to make it slower with perf running, but we shouldn't impact
> > > it without perf.
> > 
> > Right, look at how its counterpart, perf_event_mmap() works:
> > 
> > void perf_event_mmap(struct vm_area_struct *vma)
> > {
> >  struct perf_mmap_event mmap_event;
> > 
> >  if (!atomic_read(&nr_mmap_events))
> >  return;
> > 
> > }
> > 
> 
> Thanks. I'll add the nr_mmap_events check in V2.

That would be a nr_munmap_events, if this is tied to PERF_RECORD_MUNMAP
(right?), which it isn't right now, check Andi's response, mine was more
of a "hey, perf_event_mmap does an atomic check, before grabbing any
locks".

- Arnaldo


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Andi Kleen
> > void perf_event_mmap(struct vm_area_struct *vma)
> > {
> >  struct perf_mmap_event mmap_event;
> > 
> >  if (!atomic_read(&nr_mmap_events))
> >  return;
> > 
> > }
> > 
> 
> Thanks. I'll add the nr_mmap_events check in V2.

No, that's the wrong check here. The PEBS flush is independent of mmap
events being requested.

If anything, it would need to check for any active PEBS events, which
would need a new counter. I think the easiest is to just check whether
this_cpu_ptr(&sched_cb_list) is empty, which should be good enough.

-Andi


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Liang, Kan




On 10/24/2018 12:32 PM, Arnaldo Carvalho de Melo wrote:

Em Wed, Oct 24, 2018 at 09:23:34AM -0700, Andi Kleen escreveu:

+void perf_event_munmap(void)
+{
+   struct perf_cpu_context *cpuctx;
+   unsigned long flags;
+   struct pmu *pmu;
+
+   local_irq_save(flags);
+   list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {


Would be good to have a fast path here that checks for the list being empty
without disabling the interrupts. munmap can be somewhat hot. I think
it's ok to make it slower with perf running, but we shouldn't impact
it without perf.


Right, look at how its counterpart, perf_event_mmap() works:

void perf_event_mmap(struct vm_area_struct *vma)
{
 struct perf_mmap_event mmap_event;

 if (!atomic_read(&nr_mmap_events))
 return;

}



Thanks. I'll add the nr_mmap_events check in V2.

Thanks,
Kan


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 24, 2018 at 09:23:34AM -0700, Andi Kleen escreveu:
> > +void perf_event_munmap(void)
> > +{
> > +   struct perf_cpu_context *cpuctx;
> > +   unsigned long flags;
> > +   struct pmu *pmu;
> > +
> > +   local_irq_save(flags);
> > +   list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {
> 
> Would be good to have a fast path here that checks for the list being empty
> without disabling the interrupts. munmap can be somewhat hot. I think
> it's ok to make it slower with perf running, but we shouldn't impact
> it without perf.

Right, look at how its counterpart, perf_event_mmap() works:

void perf_event_mmap(struct vm_area_struct *vma)
{
struct perf_mmap_event mmap_event;

if (!atomic_read(&nr_mmap_events))
return;

}

- Arnaldo


Re: [PATCH 1/2] perf: Add munmap callback

2018-10-24 Thread Andi Kleen
> +void perf_event_munmap(void)
> +{
> + struct perf_cpu_context *cpuctx;
> + unsigned long flags;
> + struct pmu *pmu;
> +
> + local_irq_save(flags);
> + list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list), sched_cb_entry) {

Would be good to have a fast path here that checks for the list being empty
without disabling the interrupts. munmap can be somewhat hot. I think
it's ok to make it slower with perf running, but we shouldn't impact
it without perf.

-Andi

