Re: [perf-discuss] NUMAtop for OpenSolaris

Jonathan Chew Tue, 23 Feb 2010 19:00:37 -0800

Li, Aubrey wrote:

Hi Jonathan,


Do you have any comments about this proposal?

Thanks,
-Aubrey

Li, Aubrey wrote:

Jonathan Chew wrote:

Thanks for summarizing the metrics.  However, I wanted to see a summary
of the overall NUMAtop proposal given the feedback that you have gotten,
so I can understand what the project is proposing to do now that you
have gotten feedback.  Then I can decide whether I have anything to add
and whether I want to approve it as is or not.

From the email thread so far, it looks as though Krish gave a very
brief description of the project, Jin Yao explained some phases for the
project, and you have listed some proposed metrics for the tool.

Has any of this changed given the feedback that you have gotten?
Can you please summarize your latest project proposal including the
description, phases, metrics, and anything else that is useful for
understanding what the project is proposing to do?


Jonathan

NUMAtop focuses on NUMA-related characteristics. It is a tool to help
developers identify memory locality in NUMA systems. The tool is
top-like: it shows the top N processes in the system and their memory
locality, with the processes that have the worst memory locality at the
top of the list. It can also attach to a process to show the memory
locality of its threads in the same top style.
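
A minimal sketch of the top-like ordering described above, assuming
per-process counts of local (LMA) and remote (RMA) memory accesses are
already available. The ProcSample type, its field names, and the sample
numbers are hypothetical; nothing here reflects an actual NUMAtop
interface.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    """Hypothetical per-process counter snapshot (not a NUMAtop type)."""
    pid: int
    name: str
    lma: int  # local memory accesses observed in the sampling interval
    rma: int  # remote memory accesses observed in the sampling interval

    @property
    def remote_ratio(self) -> float:
        """Fraction of sampled accesses that went to remote memory."""
        total = self.lma + self.rma
        return self.rma / total if total else 0.0


def top_n_worst_locality(samples, n):
    """Return the n processes with the highest remote-access fraction."""
    return sorted(samples, key=lambda s: s.remote_ratio, reverse=True)[:n]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    demo = [
        ProcSample(101, "db", lma=900, rma=100),
        ProcSample(102, "web", lma=300, rma=700),
        ProcSample(103, "batch", lma=500, rma=500),
    ]
    for s in top_n_worst_locality(demo, 2):
        print(f"{s.pid:>6} {s.name:<8} {100 * s.remote_ratio:5.1f}% remote")
```

Sorting on the remote fraction alone is only one possible definition of
"worst locality"; weighting by access volume or by latency would give a
different order.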

The information NUMAtop reports is collected from memory-related
hardware counters and the libcpc DTrace provider. Some of these counters
are already supported in kcpc and libcpc, while some of them are not.
Intel Nehalem-based and next-generation platforms provide a memory load
latency event, which is an important part of the NUMAtop approach and
requires a Solaris implementation of a PEBS framework.

The following proposed metrics will be one part of our phase I work.
Applications can be classified as CPU-sensitive, memory-sensitive, or
I/O-sensitive. An I/O-sensitive application can be identified by low CPU
utilization. A memory-sensitive application should be a CPU-sensitive
application with high CPU utilization.

Can you please explain what you mean by CPU, memory, and I/O sensitive?
What do these have to do with memory locality?

So we have the following metrics:

1) sysload      -  cpu sensitive


What do you mean by "sysload"?

2) LLC Miss per Instruction - memory sensitive

So, is a memory sensitive thread one that has low or high LLC miss per
instruction?

After we figure out that the application is memory-sensitive, we'll
check the memory locality metrics to see what is causing the performance
regression.

How will you do that? Do you mean that you will try to use the four
metrics that you have listed here to determine the cause?

3) LLC Latency Ratio (Average Latency for LLC Miss / Local Memory Access
Latency)

Will the latency for each LLC miss be measured then? Is the local
memory latency the *ideal* local memory latency when the system is
unloaded or the *current* local memory latency which may be higher than
the ideal because of load?
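
To make the ideal-versus-current question concrete, here is a minimal
sketch that computes the proposed ratio against either baseline. It
assumes per-miss latencies are available as a list of cycle counts; the
function name, the sample latencies, and both baseline values are
made-up illustrations, not measurements or a NUMAtop API.

```python
from statistics import mean


def llc_latency_ratio(miss_latencies_cycles, local_latency_cycles):
    """Average LLC-miss latency divided by a local-memory baseline."""
    if not miss_latencies_cycles:
        raise ValueError("no LLC miss samples")
    if local_latency_cycles <= 0:
        raise ValueError("baseline latency must be positive")
    return mean(miss_latencies_cycles) / local_latency_cycles


if __name__ == "__main__":
    samples = [200, 400, 300, 300]  # hypothetical per-miss latencies
    ideal_local = 200               # hypothetical unloaded local latency
    current_local = 250             # hypothetical local latency under load

    print("ratio vs ideal baseline:  ", llc_latency_ratio(samples, ideal_local))
    print("ratio vs current baseline:", llc_latency_ratio(samples, current_local))
```

The same samples give a different ratio depending on which baseline is
used, so a report would need to say which one it means.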

4) Source distribution for LLC miss:
 -4.1)LMA/(Total LLC Miss Retired)%
 -4.2)RMA/(Total LLC Miss Retired)%


Will these ratios be given for each NUMA node, the whole system, or both?

Here, 4.2) could be broken down into separate percentages for each NUMA
node hop.

Do you mean that the total RMA will be broken down into percentages of
remote memory accesses to each NUMA node from a given NUMA node?
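
One possible reading of metrics 4.1 and 4.2 with that per-node
breakdown, as a minimal sketch. It assumes each retired LLC miss can be
tagged with the NUMA node that supplied the data, and that "local" means
the node the thread is running on; the function name and the result keys
are hypothetical.

```python
from collections import Counter


def source_distribution(home_node, miss_source_nodes):
    """Split LLC misses into local and remote percentages.

    home_node: node the thread is assumed to be running on.
    miss_source_nodes: one node id per retired LLC miss.
    All percentages use total retired LLC misses as the denominator.
    """
    counts = Counter(miss_source_nodes)
    total = sum(counts.values())
    if total == 0:
        return {"lma_pct": 0.0, "rma_pct": 0.0, "rma_by_node_pct": {}}
    lma = counts.get(home_node, 0)
    rma_by_node = {
        node: 100.0 * c / total
        for node, c in counts.items()
        if node != home_node
    }
    return {
        "lma_pct": 100.0 * lma / total,
        "rma_pct": 100.0 * (total - lma) / total,
        "rma_by_node_pct": rma_by_node,
    }


if __name__ == "__main__":
    # Made-up sample: thread on node 0, eight misses served by nodes 0-2.
    print(source_distribution(0, [0, 0, 1, 2, 1, 0, 0, 1]))
```

Calling this once per home node gives the per-node view, and pooling all
samples gives the system-wide view.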

NUMAtop should have a useful report to show how effectively the
application is using local memory.

I think that someone already pointed out that you don't seem to mention
anything about where the thread runs as part of your proposal even
though that is pretty important in figuring out how effective a thread
is using local memory. The thread won't be very effective using local
memory if it never runs on CPUs where its local memory lives.

Also, the memory allocation policy may matter too. For example, a
thread may access remote memory a lot if it is accessing shared memory
because the default memory allocation policy for shared memory is to
spread it out by allocating it randomly across lgroups.

We need the PEBS framework to implement the NUMAtop metrics. We need an
MPO sponsor and a libcpc DTrace provider sponsor to figure out where
memory use is not effective and why.

Ok.

Suggesting a better memory placement strategy is also a valuable goal of
NUMAtop.


How are you proposing to do that?



Jonathan

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
