On 01/13/2011 04:42 PM, Andrew Deason wrote:
On Thu, 13 Jan 2011 15:24:00 -0500
Dale Pontius <[email protected]> wrote:
I'm wondering if it's possible to collect access time statistics out
of an OpenAFS Linux client.
"access time" is a bit vague to me; you just want to see how quickly it
is getting a response from the fileserver? There are numerous steps
involved in fetching data, and the cause of bad performance could be in
many places.
I guess I'm thinking "round-trip time" from data request to data
response. I installed and fired up Wireshark a while back, with the
thought of tying together request and response packets to measure
response time. But this is far from my normal job: I was just starting
to play with Wireshark, and normal-job demands pushed it to the back burner.
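A minimal sketch of that request/response pairing, assuming the capture has already been reduced to (timestamp, call identifier, direction) records. The Packet type and the call_id key are hypothetical simplifications, not Wireshark's API; for real Rx traffic the key would presumably have to combine the connection and call numbers. It measures the time from a request to the first matching reply packet:

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical, simplified view of one captured packet (not a Wireshark API).
    time: float       # capture timestamp in seconds
    call_id: int      # assumed identifier shared by a request and its reply
    is_request: bool  # True for client->server, False for server->client


def response_times(packets):
    """Pair each request with the first later reply carrying the same
    call_id and return {call_id: seconds elapsed}."""
    pending = {}  # call_id -> timestamp of the outstanding request
    rtts = {}
    for pkt in sorted(packets, key=lambda p: p.time):
        if pkt.is_request:
            # Keep the earliest request, so a retransmission does not reset the clock.
            pending.setdefault(pkt.call_id, pkt.time)
        elif pkt.call_id in pending:
            rtts[pkt.call_id] = pkt.time - pending.pop(pkt.call_id)
        # Replies with no outstanding request are ignored.
    return rtts
```

For example, a request for call 1 at 0.000 s (retransmitted at 0.005 s) answered at 0.012 s yields roughly 0.012 s for that call, since the earliest request is the one that counts.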
A little time with Google and I see the "-enable_peer_stats" and
"-enable_process_stats" options when starting the client daemon, and
this very well may furnish the information that I need.
You don't need to start the client with those options; see the 'fs
rxstatpeer' and 'fs rxstatproc' commands to turn the stats on and off.
However, your bigger problem is retrieving the statistics. I don't think
we offer much in the tree that's very useful; you can try
src/libadmin/samples/rxstat_get_peer and rxstat_get_process, but I don't
expect them to be very robust. Of course, I'm not sure if there are
other tools to retrieve the data floating around somewhere (or in
IBM...).
Perhaps there are tools, but I don't have them. In fact, the "standard"
deployments don't even include some of the standard OpenAFS tools, like
afsmonitor and its underlying programs. My system is multiboot, with
one of my options being Gentoo, and its OpenAFS install is more
complete. I have played with afsmonitor a little, and rapidly got
swamped in information. At the time I was hoping to tune my cache
parameters, and again normal-job demands pushed that to the back burner,
too.
A subsequent search gets me to the "rxdebug" document, though that
document appears to be server-centric as opposed to querying the
client. Nor does it tell me what information I can collect, or whether
access time is part of that information; it only mentions several
parameters that it does collect.
rxdebug is useful for clients and servers. The 'rxdebug -rxstats'
statistics and other information are useful for debugging performance
problems, but won't tell you much about time taken to process RPCs. It's
more useful for just indicating if there's a problem with packets
getting lost, or if there are other problems interfering with packets
and such.
If you just want the RTT to the various fileservers, 'rxdebug -peers'
can tell you that. The RTT calculated by Rx isn't always accurate
(depending on the version in use and other factors), but it will tell
you what Rx thinks the RTT is.
Oh, and also, 'rxdebug' can be used as a simple test of fileserver
overloaded-ness. If you just run 'rxdebug <fileserver>', you'll see a
couple of lines that say
X calls waiting for a thread
and
Y calls have waited for a thread
These are the number of calls currently not being serviced due to a lack
of available threads, and a running count of how many calls have waited,
respectively. You normally want them to be 0; the higher they are, the
slower the fileserver is going to be.
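A small Python sketch that pulls those two counters out of captured rxdebug output. It relies only on the two line formats quoted above; the function names and the "any call currently waiting is a warning sign" threshold are illustrative assumptions, not part of rxdebug itself:

```python
import re

# Patterns for the two rxdebug lines quoted above; all other lines are ignored.
WAITING = re.compile(r"^\s*(\d+) calls waiting for a thread", re.MULTILINE)
WAITED = re.compile(r"^\s*(\d+) calls have waited for a thread", re.MULTILINE)


def thread_wait_counts(rxdebug_output: str):
    """Return (currently_waiting, have_waited) parsed from rxdebug text,
    using None for a counter whose line is missing."""
    def grab(pattern):
        m = pattern.search(rxdebug_output)
        return int(m.group(1)) if m else None

    return grab(WAITING), grab(WAITED)


def looks_overloaded(rxdebug_output: str) -> bool:
    """Hypothetical heuristic: any call currently waiting for a thread is a warning sign."""
    waiting, _ = thread_wait_counts(rxdebug_output)
    return bool(waiting)
```

Feed it the text printed by 'rxdebug <fileserver>'. Since the second number is a running count, sampling it periodically and comparing successive values shows whether calls are still having to wait.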
I'll have to give this a try. I know that "thread waiting" is one of
the things that they have looked at and occasionally found, but it is not
the whole problem that we see.
Can someone toss me a bone here - or a link?
If you want something quick, you can look at the output of
$ xstat_cm_test <client> -collID 2 -onceonly
Which will give you a bunch of statistics for the client. Many of the
fields are briefly described here:
<http://docs.openafs.org/AdminGuide/apc.html#HDRWQ618>.
For RPC timings, for reading data you probably want to be looking at
FetchStatus, FetchData, and InlineBulkStatus.
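If an RPC timing entry gives the number of calls together with the sum of their times and the sum of their squared times, the mean latency and its spread follow directly. I believe the cache manager keeps its per-RPC timings in roughly that form, but treat that as an assumption: check the field descriptions at the link above and convert everything to a single unit first. A minimal sketch, with a hypothetical function name:

```python
import math


def timing_summary(num_ops: int, sum_time: float, sqr_time: float):
    """Mean and population standard deviation of per-call latency, given
    the number of calls, the sum of their times, and the sum of their
    squared times (all in consistent units, e.g. seconds and seconds^2)."""
    if num_ops <= 0:
        raise ValueError("need at least one operation")
    mean = sum_time / num_ops
    # Var = E[t^2] - (E[t])^2; clamp tiny negative values caused by rounding.
    variance = max(sqr_time / num_ops - mean * mean, 0.0)
    return mean, math.sqrt(variance)
```

For example, three FetchData calls taking 1, 2 and 3 seconds (sum 6, sum of squares 14) give a mean of 2 s and a standard deviation of about 0.82 s.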
I'm currently running Fedora 13 on a multiboot machine, and:
[user@hostname ~]$ xstat_cm_test hostname -collID 2 -onceonly
Starting up the xstat_cm service, no debugging, one-shot operation
-----------------------------------------------------------
** Data size mismatch in performance collection!** Expecting 1064, got 759
** Version mismatch with Cache Manager
[user@hostname ~]$
I'll have to reboot with Gentoo and give this another try.
Dale
--
Dale Pontius
Senior Engineer
IBM Corporation
Phone: (802) 769-6850
Tie-Line: 446-6850
email: [email protected]
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info