Vivek,

On Tue, Sep 04, 2007 at 12:51:24PM -0400, [EMAIL PROTECTED] wrote:
> >
> >     perfmonctl(ctx_fd, PFM_START, 0, 0);
> >     busyloop();
> >     perfmonctl(ctx_fd, PFM_STOP, 0, 0);
> >
> > With this, notify-self2.c worked for me.
> 
> Thanks a lot! This works for me as well. But for my benchmark, I observe the
> following results when running with 16 threads, each bound to a different
> processor (running in SYSTEM_WIDE mode):
> 
Ok, good to see progress for you as well.

> Sampling Interval = 1
> Event monitored: DATA_EAR_CACHE_LAT_512 (cache latency threshold of 512, i.e.
> probably all levels of cache misses).
No, you have it reversed. If you want to capture misses at all levels you need
to use LAT_4.
But be careful, a sampling period of 1 is likely to be far too aggressive here.
Remember, this is sampling, not tracing.
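
To illustrate why LAT_4 is the catch-all: as far as I know, each
DATA_EAR_CACHE_LAT_N event only captures loads whose latency is at least N
cycles. The sketch below mimics that filter in plain C. The function name and
the latency values in the note after it are made up for illustration; they are
not perfmon API and not measurements.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many load latencies (in cycles) would pass a D-EAR-style
 * threshold. Assumption: a load qualifies when latency >= threshold.
 * Hypothetical helper, not part of perfmon or libpfm.
 */
size_t count_qualifying(const unsigned *latencies, size_t n, unsigned threshold)
{
    size_t hits = 0;

    assert(latencies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (latencies[i] >= threshold)
            hits++;
    }
    return hits;
}
```

With illustrative latencies {5, 14, 40, 180, 700}, a threshold of 4 keeps all
five loads, while a threshold of 512 keeps only the last one.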

> Invalid overflows => the SIGIO carried an fd (in the siginfo structure) not
> meant for this thread.

Well, I have seen this problem in the past even in per-thread mode. It was
reported on this list by U. of Waterloo.

Did you do the F_SETSIG/F_SETOWN using gettid()?
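
For reference, this is roughly the setup I mean. It is only a sketch: the
helper names are hypothetical, and fd stands for the perfmon context file
descriptor (any descriptor that supports O_ASYNC will do for trying it out).
How F_SETOWN treats a thread id has varied between kernel versions, and newer
kernels also offer F_SETOWN_EX with F_OWNER_TID, so take it as an assumption
to check on your kernel.

```c
#define _GNU_SOURCE          /* needed for F_SETSIG */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread id of the caller; older glibc has no gettid() wrapper. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/*
 * Ask the kernel to send `signo` for events on `fd`, naming `tid` as owner.
 * Returns 0 on success, -1 on error (errno is set by fcntl).
 * Hypothetical helper, not part of perfmon or libpfm.
 */
int route_fd_signal(int fd, int signo, pid_t tid)
{
    int flags;

    assert(fd >= 0 && signo > 0 && tid > 0);

    if (fcntl(fd, F_SETOWN, tid) == -1)      /* who receives the signal */
        return -1;
    if (fcntl(fd, F_SETSIG, signo) == -1)    /* which signal; siginfo carries the fd */
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    /* Turn on signal-driven notification for this descriptor. */
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}
```

Each monitoring thread would call route_fd_signal(ctx_fd, SIGIO, current_tid())
on its own context fd before starting monitoring.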

I think there may be a problem with this approach because it seems hard
to guarantee which thread is going to receive the signal. Note that in
system-wide mode your monitoring program is probably doing nothing more than
waiting for the signal, in which case it might as well block on
read(fd, &msg, sizeof(msg)). The workload you are running could just as well
run in a separate program pinned to the core.
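
A blocking read could look something like the sketch below. It assumes the
context fd delivers fixed-size messages; the real message type comes from the
perfmon headers and is not reproduced here, so the helper just takes a buffer
and a length. The helper name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block until exactly `len` bytes have been read from `fd` into `buf`.
 * Returns 0 on success, -1 on error or if the fd hits end of file first.
 * Hypothetical helper; `buf` would hold one overflow message.
 */
int read_whole_msg(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    assert(fd >= 0 && buf != NULL);
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);  /* sleeps until data arrives */
        if (n > 0)
            done += (size_t)n;
        else if (n == 0)
            return -1;              /* fd closed before a full message */
        else if (errno != EINTR)
            return -1;              /* real error; EINTR just retries */
    }
    return 0;
}
```

The monitoring thread then simply loops on read_whole_msg(fd, &msg, sizeof(msg))
and processes each message as it arrives, with no signal handler involved.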

> 
> Overflows by thread 9  = 1, Invalid overflows = 0
> Overflows by thread 5  = 3, Invalid overflows = 0
> Overflows by thread 6  = 2, Invalid overflows = 0
> Overflows by thread 10  = 2, Invalid overflows = 0
> Overflows by thread 11  = 3, Invalid overflows = 0
> Overflows by thread 8  = 8, Invalid overflows = 0
> Overflows by thread 4  = 3, Invalid overflows = 0
> Overflows by thread 7  = 1, Invalid overflows = 0
> Overflows by thread 12  = 3, Invalid overflows = 0
> Overflows by thread 14  = 3, Invalid overflows = 0
> Overflows by thread 3  = 3, Invalid overflows = 0
> Overflows by thread 13  = 23, Invalid overflows = 0
> Overflows by thread 15  = 24, Invalid overflows = 0
> Overflows by thread 0  = 9, Invalid overflows = 0
> Overflows by thread 1  = 88, Invalid overflows = 0
> Overflows by thread 2  = 205, Invalid overflows = 0
> 
> And, with sampling interval of 10, it is:
> 
> Overflows by thread 1  = 11, Invalid overflows = 0
> Overflows by thread 13  = 14, Invalid overflows = 0
> Overflows by thread 8  = 5, Invalid overflows = 0
> Overflows by thread 0  = 11, Invalid overflows = 0
> Overflows by thread 5  = 16, Invalid overflows = 0
> Overflows by thread 9  = 15, Invalid overflows = 0
> Overflows by thread 2  = 11, Invalid overflows = 0
> Overflows by thread 7  = 14, Invalid overflows = 0
> Overflows by thread 12  = 16, Invalid overflows = 0
> Overflows by thread 14  = 14, Invalid overflows = 0
> Overflows by thread 10  = 16, Invalid overflows = 0
> Overflows by thread 11  = 14, Invalid overflows = 0
> Overflows by thread 4  = 14, Invalid overflows = 0
> Overflows by thread 3  = 11, Invalid overflows = 0
> Overflows by thread 6  = 21, Invalid overflows = 0
> Overflows by thread 15  = 22, Invalid overflows = 0
> 
> As you can see, the distribution is much better in the latter case.
> Have you observed anything similar in your tests (while implementing this
> feature in the kernel)? Your comments would be really useful to us.
> 
> Regards,
> Vivek
> 
> > --
> > -Stephane

-- 

-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
