Dan Terpstra wrote:
> > Phil is correct in that the L3 events on Shanghai are shared
> > across all cores in a chip. I don't know if perfmon2 specifically
> > traps for this; I don't think it does.
> > However the AMD documents suggest that by convention
> > only one core per chip should access these events.

Yes, I'm already doing this: only one thread per chip accesses the counter.
Please look at the demo program.

>> >> On Jul 2, 2009, at 6:18 AM, Martin Vogt wrote:
>> >> [......]
>>> >>> This means:
>>> >>> -one threads monitors PAPI_L3_TCM correct
>>> >>> -two threads on different (or the same) Numa nodes throw
>>> >>> this error.
>>> >>>
>>> >>> Other Event counter seems to work correctly (without errors)
>>> >>>

Thus, if I start two threads on different NUMA nodes, I get the error.
(OK, an error with two threads on the same chip would be expected, then.)

I attach the program, hacked together from the ctest examples.
(The program is not mine; it comes from the person who actually works
on the machine.)
It's more or less a new test case which adds "MASK_L3_TCM".
The program needs libnuma (to bind to the NUMA nodes).
The output from the demo program is:


vogt[BugReport]>./zero_pthreads
Thread 0x40cac940 started
binding to numa node: 0
Thread 0x414ad940 started
binding to numa node: 1
PAPI Error: pfm_load_context(5,0x4508718(31165)): Unknown error 18446744073709551615.
PAPI Error: pfm_stop(5): Unknown error 18446744073709551615.
zero_pthreads.c                          FAILED
Line # 71
System error in PAPI_start: Invalid argument


Unless I have done something wrong, every thread runs on its own chip.

One point is important. Running papi_avail on the machine gives:

> >Model string and code    : Quad-Core .. Processor 8384 (16)
> >CPU Revision             : 2.000000
> >CPU Megahertz            : 800.000000
> >CPU Clock Megahertz      : 800
> >CPU's in this Node       : 32        <----(*)
> >Nodes in this System     : 1         <----(**)
> >Total CPU's              : 32


For (*) I would expect "4" and for (**) I would expect "8":
the machine has 8 Shanghais (4 cores each).
Thus the error is consistent with the papi_avail output:
if I start two threads on different chips but PAPI thinks they are on
the same single node, then the error is what one would expect.



regards,

Martin

Attachment: BugReport.tgz
Description: application/compressed

_______________________________________________
perfmon2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
