Hi Tom!

On Jun 19, 2005, at 5:03 AM, Tom Keiser wrote:

On 6/16/05, Roland Kuhn <[EMAIL PROTECTED]> wrote:

Dear experts!

We have been fighting with fileserver performance for a long time.
Once I got the advice to use the single-threaded fileserver, which
helped, but didn't get me more than 10 MB/s. Now we upgraded to Debian
sarge (openafs 1.3.81), which again comes with the threaded server.
With default settings we get 1 MB/s (the underlying RAID can easily
deliver >200 MB/s, which shows that the VM settings are okay). Now I
tried with -L -vc 10000 -cb 100000 -udpsize 12800, which brings it
back to about 6 MB/s (all numbers with >>1 simultaneous clients
reading). This is still a factor of 30 below the capabilities of the
RAID (okay, we only have gigabit ethernet ;-) ). I've seen excessive
context-switch rates (>>100000/s), which obviously don't happen with
the single-threaded fileserver.

So, can anybody comment on these numbers? These are dual Opteron
boxes with plenty of RAM, so please suggest which options I should
try to get closer to the real performance a fileserver can deliver...
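
For reference, a context-switch rate like the one above can be sampled
from /proc/stat (a minimal sketch, assuming a Linux fileserver; the
"ctxt" counter is cumulative since boot, so the rate is just the
difference over an interval):

#!/usr/bin/env python
# Minimal sketch: system-wide context-switch rate on Linux.
# Assumes /proc/stat exposes the cumulative "ctxt" counter.
import time

def read_ctxt():
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt"):
                return int(line.split()[1])
    raise RuntimeError("no ctxt line in /proc/stat")

interval = 5.0
before = read_ctxt()
time.sleep(interval)
after = read_ctxt()
print("context switches/s: %.0f" % ((after - before) / interval))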


This is very interesting.  On much, much older hardware (2x 300MHz Sun
E450 running Solaris 10) I can get >15 MB/s aggregate off a single
FC-AL disk with >>1 clients over GigE with absolutely no tweaking of
fileserver parameters.  Of course, there are many performance
bottlenecks in multithreading that are actually exacerbated by faster
CPUs, so the results you're seeing are plausible.

Interesting indeed. Thanks for the very detailed answer; it'll take me some time to digest it ;-)

Right now I'm using the single threaded fileserver with

-L -vc 10000 -udpsize 128000 -rxpck 400 -busyat 200 -cb 100000

and I'm getting 11 MB/s using 8 slow and 3 fast clients. The workload is sequential reading of 1.5 GB files (thousands of them).
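
The client-side numbers come from plain sequential reads; a minimal
sketch of that kind of measurement (the file paths are placeholders,
not our real volume layout) looks like this:

#!/usr/bin/env python
# Minimal sketch: sequential-read throughput over a list of large files,
# e.g. files on an /afs mount.  Paths are passed on the command line.
import sys, time

BLOCK = 1024 * 1024   # read in 1 MB chunks

def read_file(path):
    n = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(BLOCK)
            if not buf:
                break
            n += len(buf)
    return n

start = time.time()
total = sum(read_file(p) for p in sys.argv[1:])
elapsed = time.time() - start
print("%.1f MB in %.1f s -> %.1f MB/s"
      % (total / 1e6, elapsed, total / 1e6 / elapsed))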

I'd be interested in seeing a comparison of 1.3.81 and 1.3.84
performance.  Several threading patches were integrated between these
revisions, and it would be interesting to see how they affect your
problem.  I know they are making a difference on SPARC, but that
doesn't necessarily correlate to amd64.

We're not using amd64 yet; we're still on i386 Debian. But I can easily compile the 1.3.84 fileserver, unless problems are to be expected with 1.2.13 clients (I've not seen any so far with the 1.3.81-sarge1 fileserver).

If you upgrade to 1.3.84, there is another fileserver option that you
will want to experiment with: -rxpck.  Sometime after 1.3.81,
thread-local packet queues were integrated, and they may reduce your
context switch rate due to less contention for the global packet queue
lock.  The default value for -rxpck will give you approximately 500
rx_packet structures.  I recommend trying several values in the range
1000-5000.  At some point, you will reach an optimal tradeoff between
a small value that fits within your cache hierarchy, and a large value
that reduces the number of transfers between the thread-local and
global rx_packet queues.  Before submitting the thread-local patch to
RT, I was only able to test on a few architectures, and I'd like to
get feedback for amd64.
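
A rough way to think about that tradeoff is the memory footprint of the
packet pool versus the cache.  The per-packet size below is only an
assumption for illustration, not the exact sizeof(struct rx_packet):

#!/usr/bin/env python
# Back-of-the-envelope sketch: approximate rx_packet pool size for
# candidate -rxpck values.  RX_PACKET_BYTES is an assumed per-structure
# size (payload buffer plus headers), not taken from the OpenAFS source.
RX_PACKET_BYTES = 2048          # assumption for illustration only
L2_CACHE_BYTES = 1024 * 1024    # example: 1 MB L2 per Opteron core

for rxpck in (500, 1000, 2000, 3000, 5000):
    pool = rxpck * RX_PACKET_BYTES
    fits = "fits in" if pool <= L2_CACHE_BYTES else "exceeds"
    print("-rxpck %5d -> ~%5d KB packet pool (%s L2)"
          % (rxpck, pool // 1024, fits))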

Sorry, the upgrade to amd64 will only be done in a few months :-(

Another option you might care to experiment with is -p.  IIRC, the
default will give you 12 worker threads.  It sounds like many of your
worker threads are busy handling calls, but are constantly contending
over locks and blocking on I/O.  You will need to experiment with
this, but you may find that reducing the number of worker threads
actually improves performance by forcing new calls to queue up, thereby
allowing your active calls to complete with less contention.  Of
course, this won't alleviate the problems caused by blocking I/O.
Reducing this value too far is dangerous because some calls have high
latencies (e.g. some calls in turn make calls to the ptserver).
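
One way to watch whether calls are actually queueing for a worker
thread is rxdebug against the fileserver port; the sketch below just
polls it and pulls out the thread-related lines (the matched substrings
are from memory, so treat them as assumptions about rxdebug's output):

#!/usr/bin/env python
# Minimal sketch: poll "rxdebug <server> 7000" and report lines about
# idle threads and calls waiting for a thread, to help judge -p.
# The matched substrings are assumptions about rxdebug's output format.
import subprocess, sys, time

server = sys.argv[1]   # fileserver hostname

while True:
    out = subprocess.check_output(["rxdebug", server, "7000"])
    for line in out.decode("latin-1", "replace").splitlines():
        if "waiting for a thread" in line or "threads are idle" in line:
            print("%s  %s" % (time.strftime("%H:%M:%S"), line.strip()))
    time.sleep(10)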

Have you looked at the xstat results from your servers?  afsmonitor is
a great little tool, and it can even dump these results periodically
to a log.  This data could help us to understand your workload.
Seeing those numbers would also help us with suggesting changes to
parameters in the volume package.
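
If you do capture them periodically, even a trivial loop that appends
timestamped snapshots to a log is enough; which probe command to run is
left as a placeholder here, since afsmonitor itself is a full-screen
tool:

#!/usr/bin/env python
# Minimal sketch: append timestamped snapshots of a stats probe to a log.
# PROBE_CMD is a placeholder -- substitute whatever non-interactive
# xstat-based probe is available; afsmonitor itself is screen-oriented.
import subprocess, time

PROBE_CMD = ["echo", "replace-me-with-an-xstat-probe"]   # placeholder
LOGFILE = "fileserver-xstat.log"

while True:
    snap = subprocess.check_output(PROBE_CMD).decode("latin-1", "replace")
    with open(LOGFILE, "a") as f:
        f.write("=== %s ===\n%s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), snap))
    time.sleep(60)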

Well, I have no idea what the numbers mean, but I've attached the output of "afsmonitor -detailed -fshost <XXX>". In this case the workload was such that the computation on the client side limited the throughput to about 8 MB/s. The problem is that this fileserver is heavily used by our cluster, so I cannot easily introduce downtime or change the workload.

Thanks for all your help!

Ciao,
                    Roland

Attachment: afs.log
Description: Binary data


--
TU Muenchen, Physik-Department E18, James-Franck-Str. 85747 Garching
Telefon 089/289-12592; Telefax 089/289-12570
--
A mouse is a device used to point at
the xterm you want to type in.
Kim Alm on a.s.r.


