Li, Aubrey wrote:
Hi Jonathan,

Nice to see your interest.

We have been discussing the NUMAtop metrics, and so far the proposal is
that NUMAtop will report the following parameters as its metrics:

1) sysload - CPU sensitive
2) LLC Miss per Instruction - memory sensitive
3) LLC Latency ratio - memory locality
4) the percentage of local/remote memory accesses (LMA/RMA) out of total memory accesses
- 4.1) LMA/(total memory accesses)%
- 4.2) RMA/(total memory accesses)%

4.2) could be further broken down into a separate percentage for each
NUMA node hop distance. These parameters are not platform specific and
should be generic enough to extend to the SPARC platform.
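
A minimal sketch of how metrics 2) and 4) could be derived from raw counter values for one sampling interval. It assumes the tool can read a retired-instruction count, an LLC miss count, and memory access counts keyed by NUMA hop distance, with hop 0 meaning local memory. The `NodeSample` structure and function names are hypothetical illustrations, not NUMAtop's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSample:
    """Hypothetical per-process counter sample for one interval."""
    instructions: int  # retired instructions
    llc_misses: int    # last level cache misses
    # memory accesses keyed by hop distance; hop 0 = local memory (assumed)
    accesses_by_hop: dict[int, int] = field(default_factory=dict)


def llc_miss_per_instruction(s: NodeSample) -> float:
    """Metric 2: LLC misses per retired instruction."""
    return s.llc_misses / s.instructions if s.instructions else 0.0


def access_percentages(s: NodeSample) -> dict[str, float]:
    """Metric 4: LMA% and RMA%, with RMA% also split per hop (4.2)."""
    total = sum(s.accesses_by_hop.values())
    if total == 0:
        return {"LMA%": 0.0, "RMA%": 0.0}
    result = {"LMA%": 100.0 * s.accesses_by_hop.get(0, 0) / total}
    remote = 0
    for hop, count in sorted(s.accesses_by_hop.items()):
        if hop > 0:
            remote += count
            result[f"RMA hop{hop}%"] = 100.0 * count / total
    result["RMA%"] = 100.0 * remote / total
    return result
```

For example, with `accesses_by_hop={0: 600, 1: 300, 2: 100}`, `access_percentages` reports 60% LMA and 40% RMA (30% at one hop, 10% at two hops).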

Thanks for summarizing the metrics. However, I was hoping for a summary of the overall NUMAtop proposal in light of the feedback you have received, so that I can understand what the project now proposes to do. Then I can decide whether I have anything to add and whether I want to approve it as is.

From the email thread so far, it looks as though Krish gave a very brief description of the project, Jin Yao explained some phases for the project, and you have listed some proposed metrics for the tool.

Has any of this changed given the feedback you have received? Can you please summarize your latest project proposal, including the description, phases, metrics, and anything else that is useful for understanding what the project is proposing to do?



Jonathan
Jonathan Chew wrote:
There has been a lot of discussion on this since it was proposed last
month.  I want to know what is currently being proposed given the
lengthy discussion.

Can someone please summarize what the current proposal is now?



Jonathan


Li, Aubrey wrote:
johansen wrote:

On Tue, Jan 12, 2010 at 02:20:02PM +0800, zhihui Chen wrote:

Applications can be categorized as CPU-sensitive, memory-sensitive, or
IO-sensitive.

My concern here is that unless the customer knows how to determine
whether his application is CPU, memory, or IO sensitive, it's going to
be hard to use the tools well.


"sysload" in NUMAtop can tell the customer whether the app is CPU sensitive.
"Last Level Cache Miss per Instruction" will be added to NUMAtop to
determine whether the app is memory sensitive.
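
A minimal sketch of how the two metrics could be combined to label an application. The thread does not define cut-off values, so both thresholds below are hypothetical placeholders, and sysload is assumed to be a CPU-utilization fraction between 0 and 1.

```python
# Hypothetical thresholds: not defined anywhere in this thread.
CPU_SENSITIVE_SYSLOAD = 0.8        # sysload as a 0..1 utilization fraction (assumed)
MEMORY_SENSITIVE_LLC_MPI = 0.001   # LLC misses per instruction (assumed)


def classify(sysload: float, llc_mpi: float) -> list[str]:
    """Return the sensitivity labels suggested by sysload and LLC MPI."""
    labels = []
    if sysload >= CPU_SENSITIVE_SYSLOAD:
        labels.append("cpu-sensitive")
    if llc_mpi >= MEMORY_SENSITIVE_LLC_MPI:
        labels.append("memory-sensitive")
    return labels
```

An application can carry both labels at once; neither metric says anything about IO sensitivity.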


When the CPU triggers an LLC miss, the data can come from local
memory, or from a cache or memory on a remote node. Generally, the
latency for local memory will be close to the latency for a remote
cache, while the latency for remote memory should be much higher.

This isn't universally true. On some SPARC platforms, it actually takes
longer to read a line out of a remote CPU's cache than it does to access
the memory on a remote system board. On a large system, many CPUs may
have this address in their cache, and they all need to know that it has
become owned by the reading CPU. If you're going to make this tool work
on SPARC, it won't always be safe to make this assumption.

-j

Thanks for pointing this issue out. We are not SPARC experts, and SPARC
support for NUMAtop is not part of our phase I design. :)
We hope SPARC experts like you or others can take SPARC into account
and extend this tool to the SPARC platform.


On systems where some remote memory accesses take longer than others,
this could be especially useful. Instead of just reporting the number
of remote accesses, it would be useful to report the amount of time
the application spent accessing that memory. Then it's possible for
the user to figure out what kind of performance win they might achieve
by making the memory accesses local.
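
A rough sketch of the estimate suggested here: weight each class of access by its latency to get the time spent, then compare that with the same number of accesses served at local-memory latency. The source names and latency values are placeholders, not measurements from any real platform; given the SPARC caveat above, a real latency table would have to come from calibration on the target machine.

```python
def memory_time_ns(accesses_by_source: dict[str, int],
                   latency_ns: dict[str, float]) -> float:
    """Estimated time spent on memory accesses: sum of count * latency."""
    return sum(n * latency_ns[src] for src, n in accesses_by_source.items())


def potential_saving_ns(accesses_by_source: dict[str, int],
                        latency_ns: dict[str, float],
                        local: str = "local_mem") -> float:
    """Time saved if every access were served at local-memory latency."""
    total = sum(accesses_by_source.values())
    all_local = total * latency_ns[local]
    return memory_time_ns(accesses_by_source, latency_ns) - all_local
```

With placeholder latencies of 100, 150 and 200 ns for local, one-hop and two-hop memory, and 600/300/100 accesses respectively, the estimate is 125,000 ns, of which 25,000 ns could be saved by making everything local. The model treats accesses as serialized, so actual stall time is likely lower when misses overlap.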


As for the NUMAtop metrics, memory access latency is a good idea.
But the absolute latency is not a good indicator for NUMAtop. It differs
across platforms: a specific value may be good on one platform and bad
on another, so it's hard to tell the customer what value is good.
So we will introduce a ratio into NUMAtop:

"LLC Latency ratio" =
"the actual memory access latency" / "calibrated local memory access latency"

We assume that different node hops have different memory access
latencies, with a longer-distance hop having a longer memory access
latency. This ratio will be close to 1 if most of the application's
memory accesses are to local memory.
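
A minimal sketch of the ratio, assuming the tool can collect a set of sampled per-access latencies for the application (the sampling mechanism is an assumption, not something specified in this thread) and has a calibrated local-memory latency for the platform.

```python
from statistics import fmean


def llc_latency_ratio(sampled_latencies_ns: list[float],
                      calibrated_local_ns: float) -> float:
    """Average observed access latency divided by calibrated local latency."""
    if not sampled_latencies_ns:
        raise ValueError("no latency samples")
    if calibrated_local_ns <= 0:
        raise ValueError("calibrated local latency must be positive")
    return fmean(sampled_latencies_ns) / calibrated_local_ns
```

If every sample sits at the calibrated local latency of 100 ns, the ratio is 1.0; if half the samples take 200 ns, it rises to 1.5.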

So, in conclusion, we propose the following NUMAtop metrics:
1) sysload - CPU sensitive
2) LLC Miss per Instruction - memory sensitive
3) LLC Latency ratio - memory locality
4) the percentage of local/remote memory accesses (LMA/RMA) out of total memory accesses
- 4.1) LMA/(total memory accesses)%
- 4.2) RMA/(total memory accesses)%
- ...

4.2) could be further broken down into a separate percentage for each
NUMA hop distance. These parameters are not platform specific and
should be generic enough to extend to the SPARC platform.

Looking forward to your thoughts.

BTW: Do we still need one more +1 vote for the NUMAtop project?

Thanks,
-Aubrey
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org