Marty Itzkowitz of the Sun Studio team had this to say:
"We would not use the buffering, because of the restriction to only accumulate
PCs from the events, and we want full callstacks. But it seems like
a good idea.
I'd suggest extending the API for counters that refer to memory
operations, to allow for an event to be a triple of:
    interrupt-PC
    trigger-PC
    Virtual address
Or even a four-vector, adding:
    Physical address
The trigger-PC is the true PC of the instruction causing the
overflow, virtual address is the one being referenced by that
instruction. Physical address is the PA that the VA maps to.
(This is predicated on cpc being able to get that info, but, based on
a conversation I had with someone from AMD at SC|05, I think their future
chips will have it).
I don't know if the current single-overflow interface can have such
extra data, but, if not, it should be expanded to do so.
(In the Analyzer, we currently use a backtracking heuristic to get at
those three fields, but getting them from the HW would be an enormous
step forward, and extending the interface to allow for that seems like a
Good Thing.)
Marty"
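The event record Marty describes could be sketched roughly as the C struct below. This is only an illustration: the type and function names (memop_event_t, memop_triple, memop_four_vector, memop_paddr) are hypothetical and do not exist in libcpc, and the comment on interrupt_pc is an assumption about what "interrupt-PC" means, since the message does not define it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event record; none of these names exist in libcpc. */
typedef struct {
    uint64_t interrupt_pc; /* assumed: PC at which the overflow interrupt was taken */
    uint64_t trigger_pc;   /* true PC of the instruction causing the overflow */
    uint64_t vaddr;        /* virtual address referenced by that instruction */
    uint64_t paddr;        /* physical address for vaddr; valid only if has_paddr */
    bool     has_paddr;    /* false for the triple, true for the four-vector */
} memop_event_t;

/* Build the three-field form: no physical address available. */
static memop_event_t memop_triple(uint64_t ipc, uint64_t tpc, uint64_t va)
{
    memop_event_t ev = { ipc, tpc, va, 0, false };
    return ev;
}

/* Build the four-field form by adding a physical address to the triple. */
static memop_event_t memop_four_vector(uint64_t ipc, uint64_t tpc,
                                       uint64_t va, uint64_t pa)
{
    memop_event_t ev = memop_triple(ipc, tpc, va);
    ev.paddr = pa;
    ev.has_paddr = true;
    return ev;
}

/* Read the physical address; only legal on a four-vector event. */
static uint64_t memop_paddr(const memop_event_t *ev)
{
    assert(ev->has_paddr);
    return ev->paddr;
}
```

Carrying an explicit has_paddr flag lets one record type serve both the triple and the four-vector, so hardware that cannot supply a physical address can still fill in the first three fields.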
Russ Blaine wrote:
Here is a paper written by William Chen of SunLabs. He proposes a change
to the CPC (CPU Performance Counters) subsystem whereby Solaris will
buffer overflow events in the kernel.
In the interests of soliciting feedback from community members and being
more public about the types of changes we're considering for Solaris,
I'm posting this here for all to see and comment on before we submit it
to PSARC.
-------
Project name : CPC in-kernel buffering
Project summary :
CPC in-kernel buffering adds a counter buffer so that the user can
ask the kernel to raise an overflow interrupt only when the counter
buffer overflows.
Business Summary :
Any application that requires low sampling overhead might benefit from
in-kernel buffering. Our experiment shows that, on average, 6-10% of
runtime can be saved using CPC in-kernel buffering.
The Run-Time Dynamic Optimizer is a user of CPC in-kernel buffering, which
it uses to reduce sampling overhead and achieve better SPEC numbers. The
Run-Time Dynamic Optimizer is also targeting commercial applications, such
as Oracle, that do not ship profile-optimized binaries.
Profile collection of cache information for various compilers is another
client for in-kernel buffering.
Technical Description :
Counter overflow interrupts are one use of libcpc. Upon counter
overflow, the most useful information is likely the PC value of the
instruction which triggered the event which caused the overflow. If the
user expects to accumulate a number of counter samples before the
collected information becomes useful, then it's possible to aggregate
multiple counter overflow interrupts into a single interrupt by storing
the counter samples in a PC buffer (i.e., counter overflows are deferred
and saved in the in-kernel buffer until the buffer overflows, and only
then is SIGEMT delivered).
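The aggregation described above can be sketched in plain C as follows. This is a user-space simulation under stated assumptions, not kernel code: DEMO_PCBUF_SIZE is a made-up stand-in for CPC_PCBUF_SIZE (the proposal gives no value), and demo_pcbuf_t, demo_record_overflow, and demo_signals_for are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity: the proposal names CPC_PCBUF_SIZE but gives no value. */
#define DEMO_PCBUF_SIZE 4

typedef struct {
    uint64_t pcs[DEMO_PCBUF_SIZE]; /* buffered PC samples */
    size_t   count;                /* samples currently held */
} demo_pcbuf_t;

/*
 * Record one counter overflow. Returns 1 when this sample fills the
 * buffer (the point at which the signal would be delivered), else 0.
 */
static int demo_record_overflow(demo_pcbuf_t *b, uint64_t pc)
{
    assert(b->count < DEMO_PCBUF_SIZE);
    b->pcs[b->count++] = pc;
    return b->count == DEMO_PCBUF_SIZE;
}

/*
 * Simulate n overflows and return how many signals would be delivered.
 * Resetting count stands in for the consumer draining the buffer.
 */
static unsigned demo_signals_for(unsigned n)
{
    demo_pcbuf_t b = { {0}, 0 };
    unsigned signals = 0;

    for (unsigned i = 0; i < n; i++) {
        if (demo_record_overflow(&b, 0x1000 + 4 * (uint64_t)i)) {
            signals++;
            b.count = 0;
        }
    }
    return signals;
}
```

With a capacity of 4, demo_signals_for(8) counts two notifications where unbuffered delivery would have produced eight, which is where the reduction in sampling overhead would come from.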
This proposal enhances libcpc with one additional command,
'CPC_OVF_BUFFERED', and one function, 'cpc_set_sample_pcbuf', to read the
PC samples. By specifying CPC_OVF_BUFFERED, the user indicates that the
PC information should be saved into a buffer of size 'CPC_PCBUF_SIZE'
upon counter overflow; the user then calls cpc_set_sample_pcbuf to read
the buffered samples.
For example, to trigger an interrupt only when the in-kernel buffer is
filled, the user would add CPC_OVF_BUFFERED to the cpc request command
as follows:
    cpc_set_add_request(cpc, set, name, period,
        (CPC_COUNT_USER | CPC_OVF_NOTIFY_EMT | CPC_OVF_BUFFERED),
        0, NULL);
Then, in the EMT handler, cpc_set_sample_pcbuf is called to copy the
buffered PC values back into a user-specified buffer. After calling
cpc_set_sample_pcbuf, the kernel buffer must be cleared.
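A rough sketch of what the handler side might look like is given below. Since cpc_set_sample_pcbuf is only proposed, the sketch uses a stand-in, sketch_sample_pcbuf, operating on a mock kernel buffer; all names here are hypothetical. It also assumes the clearing step happens as part of the copy-out, which the proposal does not actually specify.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PCBUF_SIZE 8 /* made-up stand-in for CPC_PCBUF_SIZE */

/* Mock of the in-kernel PC buffer; not a real kernel structure. */
typedef struct {
    uint64_t pcs[SKETCH_PCBUF_SIZE];
    unsigned nsamples;
} sketch_kbuf_t;

/*
 * Stand-in for the proposed cpc_set_sample_pcbuf: copy the buffered PCs
 * into the caller's array, then clear the mock kernel buffer (assumed).
 * Returns the number of samples copied.
 */
static unsigned sketch_sample_pcbuf(sketch_kbuf_t *kbuf, uint64_t *pcbuf)
{
    assert(kbuf->nsamples <= SKETCH_PCBUF_SIZE);
    unsigned n = kbuf->nsamples;
    memcpy(pcbuf, kbuf->pcs, n * sizeof (kbuf->pcs[0]));
    memset(kbuf, 0, sizeof (*kbuf));
    return n;
}

/*
 * Body of a would-be EMT handler: drain the buffer into user memory and
 * count how many samples fall in the PC range [lo, hi).
 */
static unsigned sketch_handle_emt(sketch_kbuf_t *kbuf, uint64_t lo, uint64_t hi)
{
    uint64_t user_pcs[SKETCH_PCBUF_SIZE];
    unsigned n = sketch_sample_pcbuf(kbuf, user_pcs);
    unsigned hits = 0;

    for (unsigned i = 0; i < n; i++) {
        if (user_pcs[i] >= lo && user_pcs[i] < hi)
            hits++;
    }
    return hits;
}
```

The range count is just one example of what a profiler could do with the drained samples; the point is that a single handler invocation processes a whole batch of PCs.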
A high-level overview of the proposed changes is as follows.
1. Create a new command CPC_OVF_BUFFERED.
2. Define size of PC buffer CPC_PCBUF_SIZE.
3. Implement cpc_set_sample_pcbuf to complement cpc_set_sample.
4. Implement kcpc_sample_pcbuf to complement kcpc_sample.
5. Modify kcpc_overflow_ast to buffer samples if command
CPC_OVF_BUFFERED is issued.
Here are the proposed prototypes for the new functions.
    int cpc_set_sample_pcbuf(cpc_t *cpc, cpc_set_t *set, cpc_buf_t *buf,
        uint64_t *pcbuf);
    int kcpc_sample_pcbuf(kcpc_set_t *set, uint64_t *buf, hrtime_t *hr_time,
        uint64_t *tick);
Here is the proposed change to the prototype of the existing function.
    int kcpc_overflow_ast(uint64_t pc);
Limitations :
1. Only PC values are stored in the buffer.
Thanks to William for this contribution. There were a few responses to
this inside Sun which I will post shortly.
- Russ
-----------------------------------------------------
Russ Blaine | Solaris Kernel | [EMAIL PROTECTED]
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org