Stephane,
On Tuesday 04 September 2007 01:47, Stephane Eranian wrote:
> Vivek,
>
> > On Monday 03 September 2007 16:21, Stephane Eranian wrote:
> > > Vivek,
> > >
> > > Look at notify_self2.c. It is not system-wide but it can be converted
> > > fairly easily.
>
> Ok, there is another small issue related to switching from per-thread to
> system-wide on IA-64:
>
>       /*
>        * Let's roll now
>        */
>       pfm_self_start(ctx_fd);
>
>       busyloop();
>
>       pfm_self_stop(ctx_fd);
>
> The pfm_self_start/pfm_self_stop macros only work for a per-thread context.
> I should have commented on that in the source code. So your problem is that
> monitoring is never activated. You need to use the actual system call:
>
>       perfmonctl(ctx_fd, PFM_START, 0, 0);
>       busyloop();
>       perfmonctl(ctx_fd, PFM_STOP, 0, 0);
>
> With this, notify_self2.c worked for me.

Thanks a lot! This works for me as well. For my benchmark, however, I observe
the following results when running 16 threads, each bound to a different
processor (in SYSTEM_WIDE mode):

Sampling interval = 1
Event monitored: DATA_EAR_CACHE_LAT_512 (cache latency threshold of 512, i.e.
probably misses in all levels of cache).
Invalid overflows: SIGIO notifications whose fd (in the siginfo structure) is
not the one meant for the receiving thread.

Overflows by thread 9  = 1, Invalid overflows = 0
Overflows by thread 5  = 3, Invalid overflows = 0
Overflows by thread 6  = 2, Invalid overflows = 0
Overflows by thread 10  = 2, Invalid overflows = 0
Overflows by thread 11  = 3, Invalid overflows = 0
Overflows by thread 8  = 8, Invalid overflows = 0
Overflows by thread 4  = 3, Invalid overflows = 0
Overflows by thread 7  = 1, Invalid overflows = 0
Overflows by thread 12  = 3, Invalid overflows = 0
Overflows by thread 14  = 3, Invalid overflows = 0
Overflows by thread 3  = 3, Invalid overflows = 0
Overflows by thread 13  = 23, Invalid overflows = 0
Overflows by thread 15  = 24, Invalid overflows = 0
Overflows by thread 0  = 9, Invalid overflows = 0
Overflows by thread 1  = 88, Invalid overflows = 0
Overflows by thread 2  = 205, Invalid overflows = 0

And with a sampling interval of 10, the results are:

Overflows by thread 1  = 11, Invalid overflows = 0
Overflows by thread 13  = 14, Invalid overflows = 0
Overflows by thread 8  = 5, Invalid overflows = 0
Overflows by thread 0  = 11, Invalid overflows = 0
Overflows by thread 5  = 16, Invalid overflows = 0
Overflows by thread 9  = 15, Invalid overflows = 0
Overflows by thread 2  = 11, Invalid overflows = 0
Overflows by thread 7  = 14, Invalid overflows = 0
Overflows by thread 12  = 16, Invalid overflows = 0
Overflows by thread 14  = 14, Invalid overflows = 0
Overflows by thread 10  = 16, Invalid overflows = 0
Overflows by thread 11  = 14, Invalid overflows = 0
Overflows by thread 4  = 14, Invalid overflows = 0
Overflows by thread 3  = 11, Invalid overflows = 0
Overflows by thread 6  = 21, Invalid overflows = 0
Overflows by thread 15  = 22, Invalid overflows = 0

As you can see, the distribution is much more even in the latter case.
Have you observed anything similar in your tests (while implementing this
feature in the kernel)? Your comments would be really useful to us.
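
One way to put a number on the difference is the ratio of the largest to the
smallest per-thread count. The sketch below uses the counts from the two
listings above, in listing order; max_min_ratio and total are illustrative
helper names, and the ratio is just one possible measure of spread.

```c
/* Overflow counts copied from the two listings, in listing order. */
const unsigned interval_1[16]  = { 1, 3, 2, 2, 3, 8, 3, 1,
                                   3, 3, 3, 23, 24, 9, 88, 205 };
const unsigned interval_10[16] = { 11, 14, 5, 11, 16, 15, 11, 14,
                                   16, 14, 16, 14, 14, 11, 21, 22 };

/* Ratio of the largest to the smallest count; 0.0 if the smallest is 0. */
double max_min_ratio(const unsigned *counts, int n)
{
    unsigned lo = counts[0], hi = counts[0];

    for (int i = 1; i < n; i++) {
        if (counts[i] < lo)
            lo = counts[i];
        if (counts[i] > hi)
            hi = counts[i];
    }
    return lo == 0 ? 0.0 : (double)hi / (double)lo;
}

/* Sum of all per-thread counts. */
unsigned total(const unsigned *counts, int n)
{
    unsigned sum = 0;

    for (int i = 0; i < n; i++)
        sum += counts[i];
    return sum;
}
```

For the numbers above this gives a ratio of 205 (205 vs. 1) at interval 1 and
4.4 (22 vs. 5) at interval 10.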

Regards,
Vivek

> --
> -Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
