Since I was the one who originally created this topic I'd like to
restate what I said that got this all started. I'm trying to do
relatively lightweight monitoring of lots of system performance counters
(on the order of 100-200 or more) across a number of subsystems using
standard interfaces. While I don't feel the need to be able to take
tens of thousands of samples/sec, I would like to at least run
efficiently in the 1-10/sec range. I also want to avoid writing custom
kernel code and/or talking
directly to hardware.
As I said in my base note, I'm currently reading from /proc, and while
some sets of counters are better organized than others, I can still
access them relatively efficiently. While I could certainly "get by"
reading one variable per file, I do worry about the overhead as the
sampling interval goes down. This will also be a problem as the number
of counters and devices grows. The suggestion about using perfquery would
certainly work, but I'd also be concerned about the overhead in running
it at smaller sampling intervals.
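To put a rough number on the one-value-per-file concern, here is a
minimal Python sketch that reads every file in a counters directory and
reports the mean time per full pass. The names read_counter_dir and
time_samples are hypothetical helpers made up for illustration, and the
layout (one integer per file) is an assumption; this is not taken from
any existing tool.

```python
import os
import time


def read_counter_dir(path):
    """Read every regular file in `path` as one integer counter.

    Assumes one value per file (an assumption for this sketch). Files that
    cannot be read or do not hold an integer are skipped.
    """
    counters = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        try:
            with open(full) as f:
                counters[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue
    return counters


def time_samples(path, samples=10):
    """Return the mean number of seconds spent per full read of `path`."""
    start = time.perf_counter()
    for _ in range(samples):
        read_counter_dir(path)
    return (time.perf_counter() - start) / samples
```

Each pass costs one open/read/close per counter, so the result should
scale roughly with the number of files; comparing it against the
sampling interval gives a feel for whether the overhead matters.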
I certainly understand the desire to move to sysfs, and I know that
/usr/src/linux/Documentation/filesystems/sysfs.txt states that "Mixing
types, expressing multiple lines of data, and doing fancy formatting of
data is heavily frowned upon. Doing these things may get you publically
humiliated and your code rewritten without notice." However, I don't
read this to mean you must only have one data item per file. For
example, I took a look at /sys/block/hda/stat because one of the types
of data I collect is disk stats and I was wondering how sysfs dealt with
them. Sure enough, they're all in one file per disk as shown below:
dl380-2: cat /sys/block/hda/stat
0 0 0 0 0 0 0 0 0 0 0
Also note some of these count bytes, some sectors and others jiffies, so
even the units need not be identical.
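As a sketch of why a multi-value file is cheap to consume, the Python
below turns the 11 values of a block stat file into a dictionary with a
single read. The field names are my assumption, based on the 11-field
layout described in the kernel's block-layer stat documentation, and
should be checked against the kernel in use; parse_block_stat and
read_block_stat are hypothetical helper names.

```python
# Field names are an assumption based on the 11-field layout in the kernel's
# block-layer stat documentation; verify them for your kernel version.
STAT_FIELDS = (
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
)


def parse_block_stat(text):
    """Map the whitespace-separated values of a stat file to field names."""
    values = [int(v) for v in text.split()]
    if len(values) != len(STAT_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(STAT_FIELDS), len(values))
        )
    return dict(zip(STAT_FIELDS, values))


def read_block_stat(device):
    """Read /sys/block/<device>/stat with a single open and read."""
    with open("/sys/block/%s/stat" % device) as f:
        return parse_block_stat(f.read())
```

For example, read_block_stat("hda") would return all the disk counters
for one device from one file access.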
-mark
Hal Rosenstock wrote:
On Mon, 2005-05-23 at 12:27, Sean Hefty wrote:
Are there any performance counters that aren't available through the PMA
MADs? If not, is there any reason why the PMA interface shouldn't be used
for programmatic access?
All the counters found in:
/sys/class/infiniband/mthca0/ports/1/counters
excessive_buffer_overrun_errors port_rcv_remote_physical_errors
link_downed port_rcv_switch_relay_errors
link_error_recovery port_xmit_constraint_errors
local_link_integrity_errors port_xmit_data
port_rcv_constraint_errors port_xmit_discards
port_rcv_data port_xmit_packets
port_rcv_errors symbol_error
port_rcv_packets VL15_dropped
are available via the PMA (and via the perfquery tool):
/usr/local/ib/bin/perfquery 1 1
# Port counters: Lid 0x1 port 1
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrors:....................10344
LinkRecovers:....................255
LinkDowned:......................4
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................19
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtBytes:........................126990
RcvBytes:........................126952
XmtPkts:.........................1791
RcvPkts:.........................1790
One advantage is that all counters are retrieved with one MAD.
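For programmatic use, output in the "Name:....value" form shown above
could be turned into a dictionary along these lines. This is only a
sketch: parse_perfquery is a hypothetical helper, it works on text that
has already been captured, and it assumes the output format stays as
shown here.

```python
import re

# Matches lines such as "XmtDiscards:.....................19".
_LINE = re.compile(r"^(\w+):\.*(\S+)$")


def parse_perfquery(text):
    """Parse "Name:....value" lines into a dict of counters.

    Comment lines starting with "#" and lines that do not match are skipped.
    Values are converted to int where possible (hex such as 0x0000 included).
    If a name appears more than once, the last value wins.
    """
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            counters[name] = int(raw, 0)
        except ValueError:
            counters[name] = raw
    return counters
```

Because a repeated name overwrites the earlier value, it is worth
checking the captured output for duplicate labels before relying on the
result.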
-- Hal