> Hi All,
>
> When I use cputrack to track a process on a NUMA system
> (like nhm-ex), I want to see performance events such as
> "RMA" (Remote Memory Access) that the process generates.
>
> cputrack can tell me the RMA count the process generates on
> the whole system, e.g. in the last 5s it generated 1,000 RMA
> events across all 4 nodes (4 sockets).
>
> But sometimes I want to know the RMA cost per node, e.g. how
> many RMA events the process generates on node 1 versus node 2.
>
> cputrack can't give me that result because cpc doesn't support
> separating performance counter values per CPU for a
> thread/process. So I want to provide a patch to enhance cpc to
> support this feature.
>
> Does anybody think it's valuable?
>
> Thanks
> Jin Yao
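To make the limitation concrete, here is a minimal sketch, assuming the documented libcpc(3LIB) calls, of how a thread counts an event with today's cpc interfaces. The event name "RMA" is only a placeholder for whatever the real remote-memory-access event is called on nhm-ex (it has to be looked up with cpustat -h on the target box), and most error handling is trimmed:

/*
 * Sketch only: count a placeholder "RMA" event for the calling LWP
 * with libcpc.  Compile with -lcpc.
 */
#include <libcpc.h>
#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	cpc_t *cpc;
	cpc_set_t *set;
	cpc_buf_t *buf;
	int idx;
	uint64_t val;

	if ((cpc = cpc_open(CPC_VER_CURRENT)) == NULL) {
		perror("cpc_open");
		return (1);
	}

	set = cpc_set_create(cpc);

	/* "RMA" is a placeholder event name; count user and system mode. */
	idx = cpc_set_add_request(cpc, set, "RMA", 0,
	    CPC_COUNT_USER | CPC_COUNT_SYSTEM, 0, NULL);
	if (idx == -1) {
		perror("cpc_set_add_request");
		return (1);
	}

	buf = cpc_buf_create(cpc, set);

	/* Bind the set to the calling LWP, then run the measured work. */
	if (cpc_bind_curlwp(cpc, set, 0) == -1) {
		perror("cpc_bind_curlwp");
		return (1);
	}

	sleep(5);	/* stand-in for the real workload; the LWP may
			   migrate across CPUs/nodes during this time */

	/*
	 * One sample, one value: the count is aggregated over every CPU
	 * the LWP ran on during the interval.  There is no per-CPU or
	 * per-node breakdown available from this interface.
	 */
	if (cpc_set_sample(cpc, set, buf) == -1) {
		perror("cpc_set_sample");
		return (1);
	}
	(void) cpc_buf_get(cpc, buf, idx, &val);
	(void) printf("RMA (all CPUs, last 5s): %llu\n",
	    (unsigned long long)val);

	(void) cpc_unbind(cpc, set);
	(void) cpc_close(cpc);
	return (0);
}

Whether the counters are bound through cputrack or directly like this, the caller only gets one aggregated value per request; the patch I have in mind would let that value be broken down by the CPU (and hence the node) the LWP was running on when the events were counted.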
It looks like the patch idea has drawn little interest so far, so please allow me to give another example to show its value.

We ran the STREAM benchmark twice on a 4-socket system. The benchmark uses OpenMP for parallelism and creates the specified number of threads to do the computation; all computing threads must synchronize at a barrier. We saw a large performance variation (10%) between the two runs:

1. In the better run, all computing threads ran on their own home lgroup.
2. In the worse run, some threads were migrated from their home lgroup to other lgroups.

A DTrace script confirms thread migration between lgroups in the worse run. Our guess is that the root cause is that the threads on their home lgroup run faster than the migrated-off threads, and the fast threads then have to wait for the slow threads to finish their work at the barrier.

If that guess is true, the migrated-off threads should show a lot of RMA (Remote Memory Access) from the other nodes back to the memory on their home lgroup. Unfortunately, we have no data to support this, because in the current cpc implementation we can only get the total RMA of a thread across all CPUs/nodes; there is no way to separate the counts per CPU/node for a thread. So we don't know how many RMA events the migrated-off threads generate on the other lgroups. That's why I want to provide a small patch to enhance cpc.

By the way, this raises another question: why does the scheduler migrate these threads from their home lgroup to other lgroups in the first place? That may be an interesting topic worth digging into later. A small liblgrp sketch for checking each thread's home lgroup follows below.

Thanks
Jin Yao
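P.S. The sketch mentioned above, assuming the documented liblgrp(3LIB) calls lgrp_home() and lgrp_affinity_get(); report_home_lgroup() is just an illustrative helper name, not part of any existing tool:

/*
 * Sketch only: compile with -llgrp.  Each computing thread could call
 * this once so the DTrace migration data can be matched against where
 * the threads are supposed to live.
 */
#include <sys/lgrp_user.h>
#include <sys/procset.h>
#include <sys/lwp.h>
#include <stdio.h>

void
report_home_lgroup(void)
{
	id_t lwp = (id_t)_lwp_self();

	/* Home lgroup the dispatcher assigned to this LWP. */
	lgrp_id_t home = lgrp_home(P_LWPID, lwp);

	/* This LWP's current affinity for its home lgroup. */
	lgrp_affinity_t aff = lgrp_affinity_get(P_LWPID, lwp, home);

	(void) printf("lwp %d: home lgroup %d, affinity %d\n",
	    (int)lwp, (int)home, (int)aff);
}

Calling something like this from each OpenMP thread at startup, together with the per-node RMA counts the proposed cpc enhancement would provide, should be enough to confirm or refute the remote-access guess.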