On 01/13/2011 04:42 PM, Andrew Deason wrote:
On Thu, 13 Jan 2011 15:24:00 -0500
Dale Pontius <[email protected]> wrote:

I'm wondering if it's possible to collect access time statistics out
of an OpenAFS Linux client.
"access time" is a bit vague to me; you just want to see how quickly it
is getting a response from the fileserver? There are numerous steps
involved in fetching data, and the cause of bad performance could be in
many places.
I guess I'm thinking "round-trip time" from data request to data response. I installed and fired up wireshark a while back, with the thought of tying together request and response packets to measure response time. But this is far from my normal job, I was just starting to play with wireshark, and normal-job demands pushed it to the back burner.
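The request/response pairing described above can be sketched roughly as follows. This is only an illustration, not a Wireshark or OpenAFS interface: it assumes the capture has already been reduced to hypothetical (timestamp, call_key, kind) records, where call_key is whatever identifier ties a response to its request and kind is either "request" or "response".

```python
def response_times(packets):
    """Pair each request with the first later response sharing its key.

    `packets` is an iterable of (timestamp_seconds, call_key, kind) tuples.
    This record layout is a hypothetical simplification of a packet capture,
    not a real capture format.
    Returns a list of (call_key, elapsed_seconds) in order of completion.
    """
    pending = {}   # call_key -> timestamp of the earliest unanswered request
    deltas = []
    # Process records in time order so a response is matched to a prior request.
    for ts, key, kind in sorted(packets, key=lambda p: p[0]):
        if kind == "request":
            # Keep the first request time if a request is retransmitted.
            pending.setdefault(key, ts)
        elif kind == "response" and key in pending:
            deltas.append((key, ts - pending.pop(key)))
    return deltas
```

Responses with no matching request are ignored, and a retransmitted request keeps its original timestamp, so the figure reflects the time until the first response arrived.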
A little time with google and I see the "-enable_peer_stats" and
"-enable_process_stats" options when starting the client daemon, and
this very well may furnish the information that I need.
You don't need to start the client with those options; see the 'fs
rxstatpeer' and 'fs rxstatproc' commands to turn the stats on and off.

However, your bigger problem is retrieving the statistics. I don't think
we offer much in the tree that's very useful; you can try
src/libadmin/samples/rxstat_get_peer and rxstat_get_process, but I don't
expect them to be very robust. Of course, I'm not sure if there are
other tools to retrieve the data floating around somewhere (or in
IBM...).
Perhaps there are tools, but I don't have them. In fact, the "standard" deployments don't even have some of the standard OpenAFS tools like afsmonitor and its underlying programs. My system is multiboot, with one of my options being Gentoo, and its OpenAFS install is more complete. I have played with afsmonitor a little, rapidly getting swamped in information. At the time I was hoping to tune my cache parameters, and again normal-job demands pushed that to the back burner, too.
A subsequent search gets me to the "rxdebug" document, though that
document appears to be server-centric as opposed to querying the
client.  Nor does it tell me what information I can collect or if
access time is part of that information - only mentioning several
parameters that it does collect.
rxdebug is useful for clients and servers. The 'rxdebug -rxstats'
statistics and other information are useful for debugging performance
problems, but won't tell you much about time taken to process RPCs. It's
more useful for just indicating if there's a problem with packets
getting lost or if there's some other problem interfering with packets
and such.

If you just want the RTT to the various fileservers, 'rxdebug -peers'
can tell you that. The RTT calculated by Rx isn't always accurate
(depending on the version in use and other factors), but it will tell
you what Rx thinks the RTT is.

Oh, and also, 'rxdebug' can be used as a simple test of fileserver
overloaded-ness. If you just run 'rxdebug <fileserver>', you'll see a
couple of lines that say

X calls waiting for a thread

and

Y calls have waited for a thread

Which is how many calls are currently not being serviced due to a lack
of available threads, and a running count of how many calls have waited,
respectively. You normally want them to be 0; the higher they are, the
slower the fileserver is going to be.
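For checking these two counters repeatedly, a minimal sketch along these lines might help. It assumes the rxdebug output contains lines worded exactly as quoted above, and that 'rxdebug <fileserver>' is on the PATH; the function names are made up for illustration.

```python
import re
import subprocess

# Patterns follow the two line forms quoted above; the exact wording is an
# assumption and may differ between OpenAFS versions.
WAITING_RE = re.compile(r"(\d+) calls waiting for a thread")
WAITED_RE = re.compile(r"(\d+) calls have waited for a thread")


def parse_thread_waits(text):
    """Return (currently_waiting, total_waited); None for a missing line."""
    waiting = WAITING_RE.search(text)
    waited = WAITED_RE.search(text)
    return (
        int(waiting.group(1)) if waiting else None,
        int(waited.group(1)) if waited else None,
    )


def check_fileserver(host):
    """Run 'rxdebug <host>' and parse its output (hypothetical helper)."""
    result = subprocess.run(
        ["rxdebug", host], capture_output=True, text=True, check=True
    )
    return parse_thread_waits(result.stdout)
```

Calling check_fileserver("<fileserver>") at intervals and watching whether the second number keeps growing would show whether calls are still queueing for threads.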
I'll have to give this a try. I know that "thread waiting" is one of the things that they have looked at and occasionally found, but is not all of the problem that we see.
Can someone toss me a bone here - or a link?
If you want something quick, you can look at the output of

$ xstat_cm_test <client> -collID 2 -onceonly

Which will give you a bunch of statistics for the client. Many of the
fields are briefly described here:
<http://docs.openafs.org/AdminGuide/apc.html#HDRWQ618>.

For RPC timings, for reading data you probably want to be looking at
FetchStatus, FetchData, and InlineBulkStatus.
I'm currently running Fedora Core 13 on a multiboot machine, and:
[user@hostname~]$ xstat_cm_test hostname -collID 2 -onceonly

Starting up the xstat_cm service, no debugging, one-shot operation

-----------------------------------------------------------
** Data size mismatch in performance collection!** Expecting 1064, got 759
** Version mismatch with Cache Manager
[user@hostname~]$

I'll have to reboot with Gentoo and give this another try.

Dale
--

Dale Pontius
Senior Engineer
IBM Corporation
Phone: (802) 769-6850
Tie-Line: 446-6850
email: [email protected]

