Re: [perf-discuss] NUMAtop for OpenSolaris

Jonathan Chew Tue, 19 Jan 2010 18:41:58 -0800

Li, Aubrey wrote:

Hi Jonathan,


Nice to see that you are interested.

We are discussing the metrics of NUMAtop, and so far the proposal is
that the following parameters will be reported by NUMAtop as the metrics.

1) sysload      -  cpu sensitive
2) LLC Miss per Instruction - memory sensitive
3) LLC Latency ratio - memory locality
4) the percentage of local/remote memory accesses (LMA/RMA) out of total memory accesses
- 4.1) LMA/(total memory access)%
- 4.2) RMA/(total memory access)%

4.2) could be broken down into separate percentages for each NUMA node hop distance.
These parameters are not platform-specific and are probably common
enough to extend to the SPARC platform.

Thanks for summarizing the metrics. However, I wanted to see a summary of the overall NUMAtop proposal given the feedback that you have gotten, so I can understand what the project is proposing to do now that you have gotten feedback. Then I can decide whether I have anything to add and whether I want to approve it as is or not.

From the email thread so far, it looks as though Krish gave a very brief description of the project, Jin Yao explained some phases for the project, and you have listed some proposed metrics for the tool.

Has any of this changed given the feedback that you have gotten? Can you please summarize your latest project proposal, including the description, phases, metrics, and anything else that is useful for understanding what the project is proposing to do?




Jonathan

Jonathan Chew wrote:

There has been a lot of discussion on this since it was proposed last
month.  I want to know what is currently being proposed given the
lengthy discussion.

Can someone please summarize what the current proposal is now?



Jonathan


Li, Aubrey wrote:

johansen wrote:

On Tue, Jan 12, 2010 at 02:20:02PM +0800, zhihui Chen wrote:

Applications can be categorized as CPU-sensitive, memory-sensitive, or
IO-sensitive.

My concern here is that unless the customer knows how to determine
whether his application is CPU, memory, or IO sensitive, it's going to
be hard to use the tools well.

"sysload" in NUMAtop can tell the customer if the app is cpu sensitive.
"Last Level Cache Miss per Instruction" will be added into NUMAtop to
determine if the app is memory sensitive.

When the CPU triggers an LLC miss, the data can be fetched from local
memory, or from the cache or memory of a remote node. Generally, the
latency of local memory will be close to the latency of a remote cache,
while the latency of remote memory should be much higher.

This isn't universally true.  On some SPARC platforms, it actually takes
longer to read a line out of a remote CPU's cache than it does to access
the memory on a remote system board.  On a large system, many CPUs may
have this address in their cache, and they all need to know that it has
become owned by the reading CPU.  If you're going to make this tool work
on SPARC, it won't always be safe to make this assumption.

-j

Thanks for pointing this issue out. We are not SPARC experts, and I think
a SPARC NUMAtop design is not part of our phase I design. :)
We hope SPARC experts like you, or others, can take SPARC into
account and extend this tool to the SPARC platform.

On systems where some remote memory accesses take longer than others,
this could be especially useful.  Instead of just reporting the number
of remote accesses, it would be useful to report the amount of time the
application spent accessing that memory.  Then it's possible for the
user to figure out what kind of performance win they might achieve by
making the memory accesses local.
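A minimal sketch of the time-based report suggested above, in Python for illustration only. It assumes hypothetical inputs: per-hop access counts (hop 0 is local memory) and an assumed latency per hop in nanoseconds; neither is an existing NUMAtop interface. Because it treats accesses as non-overlapping, both results are upper bounds.

```python
def remote_access_cost(accesses_by_hop, latency_ns_by_hop):
    """Estimate memory access time and the best-case saving from localizing it.

    accesses_by_hop[h]:   accesses served h NUMA hops away (h == 0 is local).
    latency_ns_by_hop[h]: assumed latency in ns of one access at hop h.
    Accesses are treated as non-overlapping, so both values are upper bounds.
    """
    if len(accesses_by_hop) != len(latency_ns_by_hop):
        raise ValueError("need one latency per hop")
    local_ns = latency_ns_by_hop[0]
    pairs = list(zip(accesses_by_hop, latency_ns_by_hop))
    # Total time spent waiting on memory across all hop distances.
    total_ns = sum(n * lat for n, lat in pairs)
    # Time saved if every remote access paid only the local latency.
    saving_ns = sum(n * (lat - local_ns) for n, lat in pairs)
    return total_ns, saving_ns
```

For example, with made-up counts [1000, 200, 50] and latencies [100, 150, 250] ns, remote_access_cost returns (142500, 17500): at most 17.5 us could be recovered by making the remote accesses local.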

As a NUMAtop metric, memory access latency is a good idea, but the
absolute value is not a good indicator. It differs between platforms:
a given value may be good on one platform and bad on another, so it is
hard to tell the customer what a good number is. So we will introduce a
ratio into NUMAtop:

"LLC Latency ratio" =
    "the actual memory access latency" / "calibrated local memory access latency"

We assume that different node hop distances have different memory access
latencies, with longer hop distances having longer latency. This ratio
will be close to 1 if most of the application's memory accesses go to
local memory.
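The ratio above can be sketched as follows, assuming for illustration that "the actual memory access latency" is approximated by the access-weighted average of hypothetical per-hop latencies; a real tool would more likely measure it directly. The function name and inputs are illustrative, not NUMAtop's.

```python
def llc_latency_ratio(accesses_by_hop, latency_ns_by_hop, calibrated_local_ns):
    """LLC Latency ratio = actual memory access latency / calibrated local latency.

    The actual latency is approximated here as the access-weighted mean of
    the per-hop latencies (hop 0 is local memory).
    """
    total = sum(accesses_by_hop)
    if total == 0 or calibrated_local_ns <= 0:
        raise ValueError("need at least one access and a positive calibration")
    weighted_ns = sum(n * lat for n, lat in zip(accesses_by_hop, latency_ns_by_hop))
    actual_ns = weighted_ns / total
    return actual_ns / calibrated_local_ns
```

If every access is local and the local latency matches the calibration, the ratio is exactly 1.0; an even split between a 100 ns local node and a 200 ns one-hop node gives 1.5.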

In conclusion, we propose the following metrics for NUMAtop:
1) sysload      -  cpu sensitive
2) LLC Miss per Instruction - memory sensitive
3) LLC Latency ratio - memory locality
4) the percentage of local/remote memory accesses (LMA/RMA) out of total memory accesses
- 4.1) LMA/(total memory access)%
- 4.2) RMA/(total memory access)%
- ...

4.2) could be broken down into separate percentages for each NUMA hop
distance. These parameters are not platform-specific and are probably
common enough to extend to the SPARC platform.
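A minimal sketch of how metrics 2) and 4) could be derived from raw counts. The inputs (instructions retired, LLC misses, and memory accesses bucketed by hop distance, hop 0 being local) are hypothetical counter values, not a real counter interface; sysload is left out because the thread does not define how it is computed.

```python
def numatop_metrics(instructions, llc_misses, accesses_by_hop):
    """Return LLC misses per instruction plus LMA/RMA percentages.

    accesses_by_hop[0] counts local memory accesses (LMA); entries 1..n
    count remote accesses (RMA) at increasing hop distances.
    """
    llc_mpi = llc_misses / instructions if instructions else 0.0
    total = sum(accesses_by_hop)
    if total == 0:
        return {"llc_mpi": llc_mpi, "lma_pct": 0.0, "rma_pct": 0.0,
                "rma_pct_by_hop": []}
    lma_pct = 100.0 * accesses_by_hop[0] / total
    # Metric 4.2 broken down per hop distance, as proposed above.
    rma_pct_by_hop = [100.0 * n / total for n in accesses_by_hop[1:]]
    return {"llc_mpi": llc_mpi, "lma_pct": lma_pct,
            "rma_pct": sum(rma_pct_by_hop), "rma_pct_by_hop": rma_pct_by_hop}
```

With made-up inputs of 1,000,000 instructions, 5,000 LLC misses, and [600, 300, 100] accesses, this reports 0.005 misses per instruction, 60% LMA, and 40% RMA split as 30% at one hop and 10% at two hops.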

Looking forward to your thoughts.

BTW: Do we still need one more +1 vote for the NUMAtop project?

Thanks,
-Aubrey
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
