Since I was the one who originally created this topic, I'd like to restate what I said that got this all started. I'm trying to do relatively lightweight monitoring of lots of system performance counters (on the order of 100-200 or more) across a number of subsystems using standard interfaces. While I don't feel the need to take tens of thousands of samples/sec, I would like to at least run efficiently in the 1-10 samples/sec range. I also want to avoid writing custom kernel code and/or talking directly to hardware.

As I said in my base note, I'm currently reading from /proc, and while some sets of counters are better organized than others, I can still access them relatively efficiently. While I could certainly "get by" reading one variable per file, I do worry about the overhead as the sampling frequency goes up. This will also become a problem as the number of counters and devices grows. The suggestion about using perfquery would certainly work, but I'd also be concerned about the overhead of running it at smaller sampling intervals.
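To put a rough number on the one-variable-per-file overhead, something like the following could be used. It is only a sketch: the function names `read_counter_dir` and `time_samples` are made up for illustration, and it assumes each file in the directory holds a single integer, as the sysfs counter directories discussed in this thread appear to.

```python
import os
import time


def read_counter_dir(path):
    """Read every one-value-per-file counter in `path` into a dict."""
    counters = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            with open(entry.path) as f:
                text = f.read().strip()
            try:
                counters[entry.name] = int(text)
            except ValueError:
                # Skip files that do not hold a single integer.
                continue
    return counters


def time_samples(path, samples=100):
    """Return the average time in seconds of one full pass over `path`."""
    start = time.perf_counter()
    for _ in range(samples):
        read_counter_dir(path)
    return (time.perf_counter() - start) / samples
```

Pointing `time_samples` at a directory such as /sys/class/infiniband/mthca0/ports/1/counters and multiplying the result by the number of devices and the sampling rate should give a first estimate of the cost per second.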

I certainly understand the desire to move to sysfs and that /usr/src/linux/Documentation/filesystems/sysfs.txt states that "Mixing types, expressing multiple lines of data, and doing fancy formatting of data is heavily frowned upon. Doing these things may get you publically humiliated and your code rewritten without notice." However, I don't read this to mean you must only have one data item per file. For example, I took a look at /sys/block/hda/stat because one of the types of data I collect is disk stats and I was wondering how sysfs dealt with them. Sure enough, they're all in one file per disk as shown below:

dl380-2: cat /sys/block/hda/stat
0 0 0 0 0 0 0 0 0 0 0

Also note that some of these count bytes, some sectors and others jiffies, so even the units need not be identical.
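Reading a file like this costs one open/read/close per sample, however many values it holds. A minimal sketch of how a collector might consume it follows. The helper names `parse_stat_line`, `read_stat` and `deltas` are hypothetical, and the code deliberately does not name the individual fields, since it only assumes they are whitespace-separated integers.

```python
def parse_stat_line(text):
    """Split one whitespace-separated stat line into a list of ints."""
    return [int(field) for field in text.split()]


def read_stat(path="/sys/block/hda/stat"):
    """Fetch all of a disk's counters with a single file read."""
    with open(path) as f:
        return parse_stat_line(f.read())


def deltas(previous, current):
    """Per-field change between two samples of the same file."""
    return [c - p for p, c in zip(previous, current)]
```

Calling `read_stat()` once per interval and passing consecutive results to `deltas` gives the per-interval change for every field.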

-mark

Hal Rosenstock wrote:

On Mon, 2005-05-23 at 12:27, Sean Hefty wrote:
Are there any performance counters that aren't available through the PMA MADs? If not, is there any reason why the PMA interface shouldn't be used for programmatic access?

All the counters found in:
/sys/class/infiniband/mthca0/ports/1/counters
excessive_buffer_overrun_errors  port_rcv_remote_physical_errors
link_downed                      port_rcv_switch_relay_errors
link_error_recovery              port_xmit_constraint_errors
local_link_integrity_errors      port_xmit_data
port_rcv_constraint_errors       port_xmit_discards
port_rcv_data                    port_xmit_packets
port_rcv_errors                  symbol_error
port_rcv_packets                 VL15_dropped

are available via the PMA (and via the perfquery tool):
/usr/local/ib/bin/perfquery 1 1
# Port counters: Lid 0x1 port 1
PortSelect:......................1
CounterSelect:...................0x0000
SymbolErrors:....................10344
LinkRecovers:....................255
LinkDowned:......................4
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................19
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtBytes:........................126990
RcvBytes:........................126952
XmtPkts:.........................1791
RcvBytes:........................1790

One advantage is that all counters are retrieved with one MAD.

-- Hal
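
For anyone who wants to consume perfquery output programmatically, a rough sketch of a parser for the `Name:....value` lines shown above might look like this. The function name `parse_perfquery` is made up, and the line format is assumed only from the sample output in this message, not from any perfquery documentation. It returns a list of pairs rather than a dict because a label can appear more than once in the output, as `RcvBytes` does above.

```python
import re

# A label, a colon, a run of dots used as padding, then the value.
_LINE = re.compile(r"^(\w+):\.*(\S+)$")


def parse_perfquery(text):
    """Return (name, value) pairs in the order they appear in `text`."""
    pairs = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            # Skips the "# Port counters: ..." header and blank lines.
            continue
        name, raw = match.groups()
        base = 16 if raw.lower().startswith("0x") else 10
        try:
            value = int(raw, base)
        except ValueError:
            continue
        pairs.append((name, value))
    return pairs
```

The text passed in would be the captured standard output of a perfquery run such as the one above.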