Vivek,

On Tue, Sep 04, 2007 at 12:51:24PM -0400, [EMAIL PROTECTED] wrote:
> >
> >   perfmonctl(ctx_fd, PFM_START, 0, 0);
> >   busyloop();
> >   perfmonctl(ctx_fd, PFM_STOP, 0, 0);
> >
> > With this, notify-self2.c worked for me.
>
> Thanks a lot ! This works for me as well ! But for my benchmark, I observe
> the following results running with 16 threads, each bound to a different
> processor (running in SYSTEM_WIDE mode) :
>
Ok, good to see progress for you as well.

> Sampling Interval = 1
> Event monitored: DATA_EAR_CACHE_LAT_512 (Cache latency threshold of 512 i.e.
> probably all level of cache misses).

No, you got it reversed. The threshold is a lower bound on the latency, so
LAT_512 only captures the slowest accesses. If you want to capture all
levels, you need to use LAT_4. But be careful, a sampling period of 1 is
likely to be way too aggressive here. Remember this is sampling, not tracing.

> Invalid Overflows=>If the SIGIO had fd (in siginfo structure) not meant for
> this thread.

Well, I have seen this problem in the past, even in per-thread mode. It was
reported on this list by U. of Waterloo. Did you do the F_SETSIG/F_SETOWN
using gettid()? I think there may be a problem with this approach because it
seems hard to guarantee which thread is going to receive the signal.

Note that in system-wide mode your monitoring program is probably doing
nothing more than waiting for the signal, in which case it might as well
block on read(fd, &msg). The workload you are running could just as well run
in a separate program pinned to the core.

>
> Overflows by thread 9 = 1, Invalid overflows = 0
> Overflows by thread 5 = 3, Invalid overflows = 0
> Overflows by thread 6 = 2, Invalid overflows = 0
> Overflows by thread 10 = 2, Invalid overflows = 0
> Overflows by thread 11 = 3, Invalid overflows = 0
> Overflows by thread 8 = 8, Invalid overflows = 0
> Overflows by thread 4 = 3, Invalid overflows = 0
> Overflows by thread 7 = 1, Invalid overflows = 0
> Overflows by thread 12 = 3, Invalid overflows = 0
> Overflows by thread 14 = 3, Invalid overflows = 0
> Overflows by thread 3 = 3, Invalid overflows = 0
> Overflows by thread 13 = 23, Invalid overflows = 0
> Overflows by thread 15 = 24, Invalid overflows = 0
> Overflows by thread 0 = 9, Invalid overflows = 0
> Overflows by thread 1 = 88, Invalid overflows = 0
> Overflows by thread 2 = 205, Invalid overflows = 0
>
> And, with sampling interval of 10, it is:
>
> Overflows by thread 1 = 11, Invalid overflows = 0
> Overflows by thread 13 = 14, Invalid overflows = 0
> Overflows by thread 8 = 5, Invalid overflows = 0
> Overflows by thread 0 = 11, Invalid overflows = 0
> Overflows by thread 5 = 16, Invalid overflows = 0
> Overflows by thread 9 = 15, Invalid overflows = 0
> Overflows by thread 2 = 11, Invalid overflows = 0
> Overflows by thread 7 = 14, Invalid overflows = 0
> Overflows by thread 12 = 16, Invalid overflows = 0
> Overflows by thread 14 = 14, Invalid overflows = 0
> Overflows by thread 10 = 16, Invalid overflows = 0
> Overflows by thread 11 = 14, Invalid overflows = 0
> Overflows by thread 4 = 14, Invalid overflows = 0
> Overflows by thread 3 = 11, Invalid overflows = 0
> Overflows by thread 6 = 21, Invalid overflows = 0
> Overflows by thread 15 = 22, Invalid overflows = 0
>
> As you can see, the distribution is much better in the latter case.
> Have you observed anything similar in your tests (while implementing this
> feature in the kernel) ? Your comments would be really useful to us.
>
> Regards,
> Vivek
>
> > --
> > -Stephane

-- 
-Stephane
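
The blocking read(fd, &msg) alternative to SIGIO mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: struct overflow_msg and wait_for_overflow() are hypothetical names, and the struct is a stand-in for the real perfmon message type, whose layout comes from the perfmon headers and is not reproduced here. Only standard POSIX read() is used.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the perfmon overflow message. The real
 * type and its fields are defined by the perfmon headers. */
struct overflow_msg {
    int type;
    int ctx_fd;
};

/* Block until one full message has been read from fd.
 * Returns 0 on success, -1 on error or if the descriptor was closed. */
int wait_for_overflow(int fd, struct overflow_msg *msg)
{
    char *p = (char *)msg;
    size_t left = sizeof(*msg);

    while (left > 0) {
        ssize_t n = read(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        if (n == 0)
            return -1;          /* end-of-file: nothing more to read */
        p += n;
        left -= (size_t)n;
    }
    return 0;
}
```

Each monitoring thread would call wait_for_overflow() on its own context descriptor, so there is no question of which thread receives a notification: only the thread that owns the descriptor reads from it.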
