Li, Aubrey wrote:
Hi Jonathan,
Do you have any comments about this proposal?
Thanks,
-Aubrey
Li, Aubrey wrote:
Jonathan Chew wrote:
Thanks for summarizing the metrics. However, I wanted to see a summary
of the overall NUMAtop proposal given the feedback that you have gotten,
so I can understand what the project is proposing to do now that you
have gotten feedback. Then I can decide whether I have anything to add
and whether I want to approve it as is or not.
From the email thread so far, it looks as though Krish gave a very
brief description of the project, Jin Yao explained some phases for the
project, and you have listed some proposed metrics for the tool.
Has any of this changed given the feedback that you have gotten?
Can you please summarize your latest project proposal including the
description, phases, metrics, and anything else that is useful for
understanding what the project is proposing to do?
Jonathan
NUMAtop focuses on NUMA-related characteristics. It is a tool to help
developers identify memory locality in NUMA systems. The tool is
top-like: it shows the top N processes in the system and their memory
locality, with the processes that have the worst memory locality at the
top of the list. It can also attach to a process to show the memory
locality of its threads in the same top-like style.
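A minimal sketch of the top-like ordering described above, assuming
(hypothetically) that "worst memory locality" is measured as the share
of LLC misses served by remote memory. ProcSample, remote_ratio and
top_n_worst_locality are illustrative names, not part of any NUMAtop
code.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    """Hypothetical per-process counter totals for one sampling interval."""
    pid: int
    name: str
    local_accesses: int   # LLC misses served by local memory (assumed counter)
    remote_accesses: int  # LLC misses served by remote memory (assumed counter)

    def remote_ratio(self) -> float:
        total = self.local_accesses + self.remote_accesses
        # A process with no LLC misses has nothing to rank; treat it as fully local.
        return self.remote_accesses / total if total else 0.0


def top_n_worst_locality(samples, n):
    """Return the n processes with the highest remote share, worst locality first."""
    return sorted(samples, key=lambda s: s.remote_ratio(), reverse=True)[:n]


if __name__ == "__main__":
    procs = [
        ProcSample(101, "db", local_accesses=800, remote_accesses=200),
        ProcSample(202, "web", local_accesses=100, remote_accesses=900),
        ProcSample(303, "idle", local_accesses=0, remote_accesses=0),
    ]
    for p in top_n_worst_locality(procs, 2):
        print(f"{p.pid:>6} {p.name:<8} remote {p.remote_ratio():6.1%}")
```

Per-thread output after attaching to a process would use the same
ordering with per-thread samples in place of per-process ones.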
The information NUMAtop reports is collected from memory-related
hardware counters and the libcpc DTrace provider. Some of these counters
are already supported in kcpc and libcpc, while others are not. Intel
Nehalem-based and next-generation platforms provide a memory load
latency event, which is an important basis for NUMAtop and requires a
Solaris implementation of the PEBS framework.
The following proposed metrics will be one part of our Phase I work.
Applications can be classified as CPU-sensitive, memory-sensitive, or
I/O-sensitive. An I/O-sensitive application can be identified by low
CPU utilization. A memory-sensitive application is expected to be a
CPU-sensitive application, i.e. one with high CPU utilization.
Can you please explain what you mean by CPU, memory, and I/O sensitive?
What do these have to do with memory locality?
So we have the following metrics:
1) sysload - cpu sensitive
What do you mean by "sysload"?
2) LLC Miss per Instruction - memory sensitive
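A rough sketch of how metrics 1 and 2 might combine into the
classification above. It assumes "sysload" means CPU utilization and
that a *high* LLC miss rate per instruction marks a thread as
memory-sensitive; both are assumptions, and the two thresholds are
hypothetical placeholders that would need tuning per platform.

```python
from enum import Enum


class Sensitivity(Enum):
    IO = "I/O-sensitive"
    CPU = "CPU-sensitive"
    MEMORY = "memory-sensitive"


# Hypothetical thresholds, for illustration only.
CPU_UTIL_THRESHOLD = 0.5    # fraction of time the thread is on CPU (assumed "sysload")
LLC_MPI_THRESHOLD = 0.001   # LLC misses per retired instruction


def llc_misses_per_instruction(llc_misses: int, instructions_retired: int) -> float:
    """Metric 2: LLC misses divided by retired instructions over the same interval."""
    if instructions_retired == 0:
        return 0.0
    return llc_misses / instructions_retired


def classify(cpu_util: float, llc_misses: int, instructions_retired: int) -> Sensitivity:
    # Low CPU utilization: the thread is mostly waiting, assumed to be on I/O.
    if cpu_util < CPU_UTIL_THRESHOLD:
        return Sensitivity.IO
    # High CPU utilization plus a high miss rate: assumed memory-sensitive.
    if llc_misses_per_instruction(llc_misses, instructions_retired) >= LLC_MPI_THRESHOLD:
        return Sensitivity.MEMORY
    return Sensitivity.CPU
```

Only threads classified as memory-sensitive would then go on to the
locality metrics (3 and 4).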
So, is a memory-sensitive thread one that has low or high LLC misses per
instruction?
After we determine that the application is memory-sensitive, we'll
check the memory locality metrics to find the cause of the performance
regression.
How will you do that? Do you mean that you will try to use the four
metrics that you have listed here to determine the cause?
3) LLC Latency Ratio (Average Latency for LLC Miss / Local Memory Access
Latency)
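A minimal sketch of metric 3, assuming the inputs are per-sample
latencies (in cycles) of loads that missed the LLC, such as a load
latency sampling facility might report. Whether the local latency
baseline is the unloaded (ideal) value or the currently measured one is
left to the caller, since that is still an open question.
llc_latency_ratio is an illustrative name.

```python
from statistics import mean


def llc_latency_ratio(miss_latencies_cycles, local_mem_latency_cycles):
    """Metric 3: average latency of sampled LLC misses / local memory access latency.

    miss_latencies_cycles: latencies of sampled loads that missed the LLC (assumed input).
    local_mem_latency_cycles: baseline local latency, ideal or current (caller's choice).
    """
    if not miss_latencies_cycles:
        raise ValueError("no LLC miss samples")
    if local_mem_latency_cycles <= 0:
        raise ValueError("local memory latency must be positive")
    return mean(miss_latencies_cycles) / local_mem_latency_cycles
```

A ratio near 1.0 would suggest misses are served about as fast as local
memory; larger values point to remote or contended accesses.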
Will the latency for each LLC miss be measured then? Is the local
memory latency the *ideal* local memory latency when the system is
unloaded or the *current* local memory latency which may be higher than
the ideal because of load?
4) Source distribution for LLC miss:
-4.1) LMA (local memory accesses) / (Total LLC Miss Retired) %
-4.2) RMA (remote memory accesses) / (Total LLC Miss Retired) %
Will these ratios be given for each NUMA node, the whole system, or both?
Here, 4.2) could be further broken down into a separate percentage for
each NUMA node hop distance.
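A minimal sketch of metric 4 with the per-hop breakdown, assuming the
input is a count of retired LLC misses keyed by the NUMA hop distance to
the memory that served them (0 = local node). The same function could be
applied to one node's counts or to system-wide totals.
llc_miss_source_distribution is an illustrative name.

```python
def llc_miss_source_distribution(miss_counts_by_hop):
    """Metric 4: percentage of retired LLC misses served locally and remotely.

    miss_counts_by_hop: {hop_distance: miss_count}, hop 0 meaning local memory (assumed).
    Returns (lma_pct, rma_pct, rma_pct_by_hop).
    """
    total = sum(miss_counts_by_hop.values())
    if total == 0:
        return 0.0, 0.0, {}
    lma_pct = 100.0 * miss_counts_by_hop.get(0, 0) / total
    # 4.2) split by hop distance: one percentage per remote hop count.
    rma_by_hop = {hop: 100.0 * count / total
                  for hop, count in sorted(miss_counts_by_hop.items()) if hop > 0}
    rma_pct = sum(rma_by_hop.values())
    return lma_pct, rma_pct, rma_by_hop
```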
Do you mean that the total RMA will be broken down into percentage of
remote memory accesses to each NUMA node from a given NUMA node?
NUMAtop should provide a useful report showing how effectively the
application is using local memory.
I think that someone already pointed out that you don't seem to mention
anything about where the thread runs as part of your proposal even
though that is pretty important in figuring out how effective a thread
is using local memory. The thread won't be very effective using local
memory if it never runs on CPUs where its local memory lives.
Also, the memory allocation policy may matter too. For example, a
thread may access remote memory a lot if it is accessing shared memory
because the default memory allocation policy for shared memory is to
spread it out by allocating it randomly across lgroups.
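To put a rough number on the shared memory point: if pages are placed
uniformly at random across N lgroups, accesses are spread evenly over
those pages, and the thread stays on one lgroup, then about (N-1)/N of
its accesses are remote. The sketch below states that expectation and
checks it with a seeded simulation; the uniformity assumptions and the
function names are illustrative only.

```python
import random


def expected_remote_fraction(num_lgroups: int) -> float:
    """Expected remote share when pages land uniformly at random across lgroups."""
    return (num_lgroups - 1) / num_lgroups


def simulated_remote_fraction(num_lgroups: int, num_pages: int,
                              home_lgroup: int = 0, seed: int = 1) -> float:
    """Place pages randomly and count how many fall outside the thread's home lgroup."""
    rng = random.Random(seed)
    placement = [rng.randrange(num_lgroups) for _ in range(num_pages)]
    remote = sum(1 for lg in placement if lg != home_lgroup)
    return remote / num_pages
```

With 4 lgroups this gives an expected 75% remote accesses, which is why
shared memory can dominate a thread's remote access count even when the
thread itself is well placed.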
We need the PEBS framework to implement the NUMAtop metrics. We also
need an MPO sponsor and a libcpc DTrace provider sponsor to figure out
where the application is not using memory effectively and why.
Ok.
Suggesting a better memory placement strategy is also a valuable goal
of NUMAtop.
How are you proposing to do that?
Jonathan
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org