Stephane,

On Tuesday 04 September 2007 01:47, Stephane Eranian wrote:
> Vivek,
>
> > On Monday 03 September 2007 16:21, Stephane Eranian wrote:
> > > Vivek,
> > >
> > > Look at notify_self2.c. It is not system-wide but it can be converted
> > > fairly easily.
>
> Ok, there is another small issue related to switching from per-thread to
> system-wide on IA-64:
>
>         /*
>          * Let's roll now
>          */
>         pfm_self_start(ctx_fd);
>
>         busyloop();
>
>         pfm_self_stop(ctx_fd);
>
> The pfm_self_start/pfm_self_stop macros only work for a per-thread context.
> I should have commented on that in the source code. So your problem is that
> monitoring is never activated. You need to use the actual system call:
>
>         perfmonctl(ctx_fd, PFM_START, 0, 0);
>         busyloop();
>         perfmonctl(ctx_fd, PFM_STOP, 0, 0);
>
> With this, notify_self2.c worked for me.
Thanks a lot! This works for me as well. But for my benchmark, I observe the
following results when running with 16 threads, each bound to a different
processor (running in SYSTEM_WIDE mode):

Sampling interval = 1
Event monitored: DATA_EAR_CACHE_LAT_512 (cache latency threshold of 512,
i.e. probably all levels of cache misses).
Invalid overflows => the SIGIO carried an fd (in the siginfo structure) that
was not meant for this thread.

Overflows by thread 9 = 1, Invalid overflows = 0
Overflows by thread 5 = 3, Invalid overflows = 0
Overflows by thread 6 = 2, Invalid overflows = 0
Overflows by thread 10 = 2, Invalid overflows = 0
Overflows by thread 11 = 3, Invalid overflows = 0
Overflows by thread 8 = 8, Invalid overflows = 0
Overflows by thread 4 = 3, Invalid overflows = 0
Overflows by thread 7 = 1, Invalid overflows = 0
Overflows by thread 12 = 3, Invalid overflows = 0
Overflows by thread 14 = 3, Invalid overflows = 0
Overflows by thread 3 = 3, Invalid overflows = 0
Overflows by thread 13 = 23, Invalid overflows = 0
Overflows by thread 15 = 24, Invalid overflows = 0
Overflows by thread 0 = 9, Invalid overflows = 0
Overflows by thread 1 = 88, Invalid overflows = 0
Overflows by thread 2 = 205, Invalid overflows = 0

And with a sampling interval of 10, it is:

Overflows by thread 1 = 11, Invalid overflows = 0
Overflows by thread 13 = 14, Invalid overflows = 0
Overflows by thread 8 = 5, Invalid overflows = 0
Overflows by thread 0 = 11, Invalid overflows = 0
Overflows by thread 5 = 16, Invalid overflows = 0
Overflows by thread 9 = 15, Invalid overflows = 0
Overflows by thread 2 = 11, Invalid overflows = 0
Overflows by thread 7 = 14, Invalid overflows = 0
Overflows by thread 12 = 16, Invalid overflows = 0
Overflows by thread 14 = 14, Invalid overflows = 0
Overflows by thread 10 = 16, Invalid overflows = 0
Overflows by thread 11 = 14, Invalid overflows = 0
Overflows by thread 4 = 14, Invalid overflows = 0
Overflows by thread 3 = 11, Invalid overflows = 0
Overflows by thread 6 = 21, Invalid overflows = 0
Overflows by thread 15 = 22, Invalid overflows = 0

As you can see, the distribution is much more even in the latter case: with
an interval of 10 every thread records between 5 and 22 overflows, whereas
with an interval of 1 the counts range from 1 to 205. Have you observed
anything similar in your tests (while implementing this feature in the
kernel)? Your comments would be really useful to us.

Regards,
Vivek

> --
> -Stephane

_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
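The "invalid overflow" check described above (comparing the fd reported in the SIGIO siginfo against the thread's own context fd) might look roughly like the sketch below. It is a hypothetical illustration, not code from notify_self2.c: the names my_ctx_fd, overflows, invalid_overflows, sigio_handler and route_sigio_to_self are invented here, it assumes one context fd per thread, and F_SETOWN_EX/F_OWNER_TID needs a newer kernel (2.6.32+) than the one discussed in this thread. Note that si_fd is only filled in when F_SETSIG is set to a nonzero signal and the handler uses SA_SIGINFO.

```c
#define _GNU_SOURCE          /* for F_SETSIG, F_SETOWN_EX, f_owner_ex, si_fd */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-thread state: the context fd this thread owns and the
 * two counters printed as "Overflows" / "Invalid overflows". */
static __thread int my_ctx_fd = -1;
static __thread volatile sig_atomic_t overflows;
static __thread volatile sig_atomic_t invalid_overflows;

/* SA_SIGINFO handler: a notification is counted as valid only when si_fd
 * names this thread's own context. A real perfmon handler would typically
 * also issue PFM_RESTART here; that part is omitted. */
static void sigio_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    if (info->si_fd == my_ctx_fd)
        overflows++;
    else
        invalid_overflows++;
}

/* Install the handler and route SIGIO for ctx_fd to the calling thread.
 * Returns 0 on success, -1 on error (errno is left set by the failing call). */
static int route_sigio_to_self(int ctx_fd)
{
    struct sigaction sa;
    struct f_owner_ex owner;
    int flags;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigio_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) != 0)
        return -1;

    /* Deliver to this thread rather than to the whole process. */
    owner.type = F_OWNER_TID;
    owner.pid = (pid_t)syscall(SYS_gettid);
    if (fcntl(ctx_fd, F_SETOWN_EX, &owner) != 0)
        return -1;

    /* A nonzero F_SETSIG (even SIGIO itself) makes the kernel fill si_fd. */
    if (fcntl(ctx_fd, F_SETSIG, SIGIO) != 0)
        return -1;

    flags = fcntl(ctx_fd, F_GETFL);
    if (flags < 0 || fcntl(ctx_fd, F_SETFL, flags | O_ASYNC) != 0)
        return -1;

    my_ctx_fd = ctx_fd;
    return 0;
}
```

For a quick check without perfmon hardware, the read end of a pipe can stand in for ctx_fd: after route_sigio_to_self(p[0]), a write to p[1] should raise SIGIO in the calling thread with si_fd equal to p[0], which the handler counts as a valid overflow.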
