I'm submitting the following closed approved automatic fasttrack on behalf
of Jon Haslam and the DTrace community. It has been approved by the community
after discussion on dtrace-discuss at opensolaris.org. The stability is
Committed and the binding is Patch.
Adam
---8<---
A. INTRODUCTION
This case adds the 'cpc' provider, which will enable consumers to access the
performance counters of a CPU. This will allow users to easily connect CPU
events (e.g. TLB misses, L2 cache misses) to the cause of the event on a
system-wide basis.
The Solaris CPU Performance Counter (CPC) subsystem (PSARC 2002/180) gives
general-purpose access to the hardware performance counters of a
microprocessor. The cpc provider leverages the infrastructure provided by
the CPC subsystem to access the CPU performance counter resources of a system.
The provider utilises the hardware overflow interrupt mechanism to allow
profiling based upon CPU performance counter events (in the same way that
the profile provider allows us to profile by time).
B. DESCRIPTION
1. Probe Format
The format of probes made available by the cpc provider:
cpc:::<event_name>-<mode>-<optional mask>-<count>
where:
event_name:     The event name of interest. A full list of the events
                available on each platform is given in the output of
                `cpustat -h`.
mode:           The operating mode of the processor in which the event is
                counted. Valid settings are "user" (user mode), "kernel"
                (kernel mode) and "all" (user and kernel mode).
optional mask:  Some platform-specific events can be further specified with
                a mask (sometimes known as a 'umask' or an 'emask'). This
                field is optional and can only be specified for
                platform-specific events; it cannot be used with generic
                performance counter events (PSARC 2008/334). The mask is
                specified as a hex value.
count:          The number of events that must be counted on a CPU for the
                probe to fire on that CPU.
As an example, the specification for a probe that fires every 10000 user-mode
DTLB misses on an UltraSPARC IV processor would look like:
cpc:::DTLB_miss-user-10000
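A probe that uses the optional mask field places the mask, as a hex value,
between the mode and the count. For example, the following (used again in
example 3 of section C below) names an AMD platform-specific event qualified
with a mask of 0x7:
cpc:::BU_fill_req_missed_L2-all-0x7-10000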
The probes exported by the cpc provider are unanchored and are not associated
with a particular point of execution, but rather an asynchronous performance
counter event interrupt. When a probe fires we can sample aspects of system
state and draw inferences about system behaviour. The following example fires
every 10000 user-mode L1 instruction cache misses and records the user-land
stack trace if the "foo" executable was running when the probe fired (note
that "foo" may have generated anywhere between 1 and 10000 of those events).
cpc:::IC_miss-user-10000
/execname == "foo"/
{
        @[ustack()] = count();
}
2. Probe arguments
All probes provide two arguments:
arg0 The program counter (PC) in the kernel at the time the probe
fired, or 0 if the current process was not executing in the
kernel at the time the probe fired.
arg1 The PC in the user-level process at the time the probe fired,
or 0 if the current process was executing in the kernel at the
time the probe fired.
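As a minimal sketch of how these arguments can be used (the generic event
name PAPI_tot_cyc is an assumption here; substitute any event listed by
`cpustat -h` on your platform), the following clause attributes each overflow
to kernel or user context:
cpc:::PAPI_tot_cyc-all-50000
{
        /*
         * arg0 is non-zero only if the CPU was executing in the kernel
         * when the overflow occurred; otherwise arg1 holds the user PC.
         */
        @[arg0 != 0 ? "kernel" : "user"] = count();
}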
3. Probe Availability
Probes are made available dynamically when requested by a user. The probes
available will differ according to the events exported by the CPC subsystem
on a platform. The names of available events can be discovered, as mentioned
in section 'B1 - Probe Format', using the output of `cpustat -h`.
CPU performance counters are a finite resource and the number of probes
that can be enabled depends upon hardware capabilities. Processors
that cannot determine which counter has overflowed when multiple counters
are programmed (e.g. AMD, UltraSPARC) are only allowed to have a single
enabling at any one time. On such platforms, consumers attempting to enable
more than one probe will fail, as will consumers attempting to enable a probe
when a disparate enabling already exists. Processors that can detect which
counter has overflowed (e.g. Niagara2, Intel P4) are allowed to have as many
probes enabled as the hardware will allow. This will be, at most, the number
of counters available on a processor. On such configurations, multiple probes
can be enabled at any one time.
Probes are enabled by consumers on a first-come, first-served basis. When
hardware resources are fully utilised subsequent enablings will fail until
resources become available.
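For illustration, the following sketch requests two simultaneous enablings
using generic events (the event names are assumptions; check `cpustat -h` for
your platform). It will only succeed on processors that can identify the
overflowing counter; on single-enabling platforms the second enabling will
fail as described above:
cpc:::PAPI_tot_ins-user-10000
{
        /* retired instructions, per executable */
        @insts[execname] = count();
}
cpc:::PAPI_l2_dcm-user-10000
{
        /* L2 data cache misses, per executable */
        @misses[execname] = count();
}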
4. Co-existence with existing tools
The provider has priority over per-LWP libcpc usage (i.e. cputrack)
for access to counters. In the same manner as cpustat, enabling probes
causes all existing per-LWP counter contexts to be invalidated. As long as
these enablings remain active, the counters will remain unavailable to
cputrack-type consumers.
Only one of cpustat and DTrace may use the counter hardware at any one time.
Ownership of the counters is given on a first-come, first-served basis.
5. Limiting Overflow Rate
So as not to saturate the system with overflow interrupts, a default minimum
of 5000 is imposed on the value that can be specified for the 'count'
part of the probename (refer to section 'B1 - Probe Format'). This can be
reduced explicitly by altering the 'dcpc_min_overflow' kernel variable with
mdb(1) or by modifying the dcpc.conf driver configuration file and unloading
and reloading the dcpc driver module.
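As a minimal sketch of the mdb(1) approach (this assumes dcpc_min_overflow is
a 32-bit value; '0t' is mdb notation for a decimal constant):
# echo 'dcpc_min_overflow/W 0t1000' | mdb -kw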
C. EXAMPLES
1. Instructions executed by applications on an AMD platform:
#!/usr/sbin/dtrace -s

cpc:::FR_retired_x86_instr_w_excp_intr-user-10000
{
        @[execname] = count();
}
# ./user-insts.d
dtrace: script './user-insts.d' matched 2 probes
^C
[chop]
init 138
dtrace 175
nis_cachemgr 179
automountd 183
intrd 235
run-mozilla.sh 306
thunderbird 316
Xorg 453
thunderbird-bin 2370
sshd 8114
2. Kernel profiling by cycle usage on an AMD platform:
#!/usr/sbin/dtrace -s

cpc:::BU_cpu_clk_unhalted-kernel-10000
{
        @[func(arg0)] = count();
}
# ./kerncycprof.d
dtrace: script './kerncycprof.d' matched 1 probe
^C
[chop]
genunix`vpm_sync_pages 478948
genunix`vpm_unmap_pages 496626
genunix`vpm_map_pages 640785
unix`mutex_delay_default 916703
unix`hat_kpm_page2va 988880
tmpfs`rdtmp 991252
unix`hat_page_setattr 1077717
unix`page_try_reclaim_lock 1213379
genunix`free_vpmap 1914810
genunix`get_vpmap 2417896
unix`page_lookup_create 3992197
unix`mutex_enter 5595647
unix`do_copy_fault_nta 27803554
3. L2 cache misses, by function, generated by any running executables
called 'brendan' on an AMD platform:
#!/usr/sbin/dtrace -s

cpc:::BU_fill_req_missed_L2-all-0x7-10000
/execname == "brendan"/
{
        @[ufunc(arg1)] = count();
}
# ./brendan-l2miss.d
dtrace: script './brendan-l2miss.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
^C
brendan`func_gamma 930
brendan`func_beta 1578
brendan`func_alpha 2945
4. The same as example (3) above, but using a generic event to specify
L2 data cache misses:
#!/usr/sbin/dtrace -s

cpc:::PAPI_l2_dcm-all-10000
/execname == "brendan"/
{
        @[ufunc(arg1)] = count();
}
# ./papi-l2miss.d
dtrace: script './papi-l2miss.d' matched 1 probe
^C
brendan`func_gamma 1681
brendan`func_beta 2521
brendan`func_alpha 5068
D. REFERENCES
http://bugs.opensolaris.org/view_bug.do?bug_id=6486156
PSARC/2002/180 CPU Performance Counters (CPC) Version 2
PSARC/2008/334 CPU Performance Counter Generic Event Names
E. DOCUMENTATION
A new chapter has been added to the Solaris Dynamic Tracing Guide for this
proposed provider:
http://wikis.sun.com/display/DTrace/Documentation # DTrace Guide
http://wikis.sun.com/display/DTrace/cpc+Provider # CPC Provider Chapter
F. STABILITY
The DTrace internal stability table is described below:
Element         Name stability  Data stability  Dependency class
Provider        Evolving        Evolving        Common
Module          Private         Private         Unknown
Function        Private         Private         Unknown
Name            Evolving        Evolving        CPU
Arguments       Evolving        Evolving        Common