nfs performance at high loads
Hello!! Well... thanks for all the suggestions, but we might need to stick with 2.4.2 because of various other dependencies. I do have a surprising observation to report, though. I tried the zero-copy patch on 2.4.0, and it seemed to help solve the memory allocation problem, and it also gave decent throughput and response time (around 5 milliseconds or so). But with 2.4.2, it's horrible!!! Yes, we don't see any memory allocation problems, but nfs seems to have been really screwed up somehow. I haven't had the chance to look at the code yet (I should try to do so soon), but does anybody know of lurking bugs in this area?? This is totally unacceptable: we see response times of 80 milliseconds!!! Something has really gone wrong here... any ideas?? Thanks Get your own "800" number Voicemail, fax, email, and a lot more http://www.ureach.com/reg/tag - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nfs performance at high loads
Kapish K wrote:
>
> Hello,
> I had sent in a note on nfs performance issues some time back,
> and Mark Hemment had been kind enough to point me to the
> zerocopy networking patch. Well, we tried it, and it does
> seem to have some improvement, but it seems to have screwed up
> nfs performance a bit, because we see a LOT of rpc failures for
> all kinds of calls, from lookups to reads and writes.
> Could this possibly be triggered by this patch (picked up from
> davem's site for 2.4.0)?
> On the other hand, we do plan to migrate to 2.4.2. Can somebody
> update me or provide pointers to info on whether we can
> expect some of these problems to have been resolved in 2.4.2? We
> should soon be testing on 2.4.2.
> Thanks
>

While I'm not an expert hacker or anything, I can tell you for sure that even 2.4.2 is full of really system-crippling bugs. You need to track the current kernels. All of the 2.4.x series should be compatible; in other words, you should upgrade as soon as possible to the latest stable kernel. Currently, that's 2.4.3 (not 2.4.2). And even 2.4.3 has many known bugs that are capable of 1) destroying performance and 2) destroying filesystems. At the very least, you should upgrade to 2.4.3, but better yet would be to upgrade to 2.4.4 when it comes out (soon?), or even the 2.4.4-pre patch, since it already includes the zero-copy networking patch, as well as fixes for bugs that could corrupt your filesystems. The zero-copy patch as it existed for 2.4.0 was also buggy in itself, which would explain some of your extended problems. Really, 2.4.0 is a 'horrible' kernel to be running, as it is missing an enormous number of performance fixes and bug fixes.
David -- David Mansfield (718) 963-2020 [EMAIL PROTECTED] Ultramaster Group, LLC www.ultramaster.com
Re: Re: nfs performance at high loads
Hello, I had sent in a note on nfs performance issues some time back, and Mark Hemment had been kind enough to point me to the zerocopy networking patch. Well, we tried it, and it does seem to have some improvement, but it seems to have screwed up nfs performance a bit, because we see a LOT of rpc failures for all kinds of calls, from lookups to reads and writes. Could this possibly be triggered by this patch (picked up from davem's site for 2.4.0)? On the other hand, we do plan to migrate to 2.4.2. Can somebody update me or provide pointers to info on whether we can expect some of these problems to have been resolved in 2.4.2? We should soon be testing on 2.4.2. Thanks

On Wed, 4 Apr 2001, Mark Hemment ([EMAIL PROTECTED]) wrote:
> I believe David Miller's latest zero-copy patches might help here.
> In his patch, the pull-up buffer is now allocated near the top of stack
> (in the sunrpc code), so it can be a blocking allocation.
> This doesn't fix the core VM problems, but does relieve the pressure
> _slightly_ on the VM (I assume; I haven't tried David's patch yet).
Re: Re: nfs performance at high loads
> Thanks for the inputs.. But, if we cannot move back to 2.2.19
> and need to stick with 2.4.0 for our own reasons concerning the
> work underway, would it be possible to give us a pointer to
> the list of issues related to this problem in the vm, so that we
> may attempt to try and get some fixes or workarounds done -
> well, they may or may not be accepted into mainstream linux for
> various reasons, but we may need to get them fixed to ship our
> stuff and may plan to do so..

See the linux-mm list. Or talk to Rik van Riel <[EMAIL PROTECTED]>
Re: Re: nfs performance at high loads
Hello, Thanks for the inputs.. But, if we cannot move back to 2.2.19 and need to stick with 2.4.0 for our own reasons concerning the work underway, would it be possible to give us a pointer to the list of issues related to this problem in the vm, so that we may attempt to try and get some fixes or workarounds done - well, they may or may not be accepted into mainstream linux for various reasons, but we may need to get them fixed to ship our stuff and may plan to do so.. Any pointers, suggestions, opinions, etc. are most welcome.. Thanks

On Wed, 4 Apr 2001, Alan Cox ([EMAIL PROTECTED]) wrote:
> > We have been seeing some problems with running nfs benchmarks
> > at very high loads and were wondering if somebody could show
> > some pointers to where the problem lies.
> > The system is a 2.4.0 kernel on a Red Hat 6.2 distribution ( so
>
> Use 2.2.19. The 2.4 VM is currently too broken to survive high I/O benchmark
> tests without going silly
Re: nfs performance at high loads
I believe David Miller's latest zero-copy patches might help here. In his patch, the pull-up buffer is now allocated near the top of stack (in the sunrpc code), so it can be a blocking allocation. This doesn't fix the core VM problems, but does relieve the pressure _slightly_ on the VM (I assume; I haven't tried David's patch yet).

One of the core problems is that the VM keeps no measure of page fragmentation in the free page pool. The system reaches the state of having plenty of free single pages (so kswapd and friends aren't kicked - or if they are, they do little or no work), and very few buddied pages (which you need for some of the NFS requests).

Unfortunately, keeping a measure of fragmentation, and ensuring work is done when a threshold is reached, doesn't solve the next problem.

When a large order request comes in, the inactive_clean page list is reaped. As reclaim_page() simply selects the "oldest" page it can, with no regard as to whether it will buddy (now, or possibly in the near future), this list is quickly shrunk by a large order request - far too quickly for a well behaved system.

An NFS write request, with an 8K block size, needs an order-2 (16K) pull-up buffer (we shouldn't really be pulling the header into the same buffer as the data - perhaps we aren't any more?). On a well used system, an order-2 _blocking_ allocation ends up populating the order-0 and order-1 free lists with quite a few pages from the inactive_clean list.

This then triggers another problem. :(

As large (non-zero) order requests are always from the NORMAL or DMA zones, these zones tend to have a lot of free pages (put there by the blind reclaim_page() - well, once you can do a blocking allocation they are, or when the fragmentation kicking is working). New page-cache page allocations often ignore the HIGHMEM zone (it reaches a steady state), and so it is passed over by the loop at the head of __alloc_pages().
However, the NORMAL and DMA zones tend to be above pages_low (for the reason above), and so new page-cache pages come from these zones. On a HIGHMEM system this leads to thrashing of the NORMAL zone, while the HIGHMEM zone stays (relatively) quiet. Note: to make matters even worse under this condition, pulling pages out of the NORMAL zone is exactly what you don't want to happen! It would be much better if they could be left alone for a (short) while to give them a chance to buddy. Linux (at present) doesn't care about the buddying of pages in the HIGHMEM zone (no non-zero order allocations come from there).

I was working on these problems (and some others) a few months back, and will return to them shortly. Unfortunately, the changes started to look too large for 2.4.

Also, for NFS, the best solution now might be to give the nfsd threads a receive buffer. With David's patches, the pull-up occurs in the context of a thread, making this possible. This doesn't solve the problem for other subsystems which do non-zero order page allocations, but (perhaps) they have a low enough frequency not to be of real issue.

Kapish,

Note: Ensure you put a "sync" in your /etc/exports - the default behaviour was "async" (not legal for a valid SpecFS run).

Mark

On Wed, 4 Apr 2001, Alan Cox wrote:
> > We have been seeing some problems with running nfs benchmarks
> > at very high loads and were wondering if somebody could show
> > some pointers to where the problem lies.
> > The system is a 2.4.0 kernel on a Red Hat 6.2 distribution ( so
>
> Use 2.2.19. The 2.4 VM is currently too broken to survive high I/O benchmark
> tests without going silly
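Mark's point about order-2 buffers and fragmentation comes down to buddy-allocator arithmetic, which can be sketched in a few lines of C. This is a hedged illustration, not kernel code: the 4 KiB page size, the ten orders, the 256-byte header, and every helper name below (order_for_size, free_pages_total, free_pages_below_order, order_available) are assumptions made up for the example. It shows two things: an 8K payload alone fits in an order-1 block, but payload plus header needs order 2 (16K); and a pool can report plenty of free pages while none of them can serve that order-2 request.

```c
/* Sketch only: 4 KiB pages and 10 buddy orders (0..9) are assumptions
 * meant to mirror a typical 2.4-era i386 setup; names are hypothetical. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_NR_ORDERS 10

/* Smallest buddy order n such that (page size << n) >= bytes. */
unsigned int order_for_size(unsigned long bytes)
{
    unsigned int order = 0;
    while ((SKETCH_PAGE_SIZE << order) < bytes)
        order++;
    return order;
}

/* free_blocks[n] is a hypothetical snapshot of the per-order free lists:
 * the number of free blocks of 2^n contiguous pages. */
unsigned long free_pages_total(const unsigned long free_blocks[SKETCH_NR_ORDERS])
{
    unsigned long pages = 0;
    for (unsigned int n = 0; n < SKETCH_NR_ORDERS; n++)
        pages += free_blocks[n] << n;
    return pages;
}

/* Free pages stranded in blocks smaller than order k: they count as free
 * but cannot serve an order-k request until they buddy up. This is one
 * possible fragmentation measure (the 2.4 VM, per the thread, keeps none). */
unsigned long free_pages_below_order(const unsigned long free_blocks[SKETCH_NR_ORDERS],
                                     unsigned int k)
{
    unsigned long pages = 0;
    for (unsigned int n = 0; n < k && n < SKETCH_NR_ORDERS; n++)
        pages += free_blocks[n] << n;
    return pages;
}

/* An order-k request can be met straight from the free lists only if a
 * free block of order >= k exists (a larger block is split down to size). */
int order_available(const unsigned long free_blocks[SKETCH_NR_ORDERS], unsigned int k)
{
    for (unsigned int n = k; n < SKETCH_NR_ORDERS; n++)
        if (free_blocks[n] > 0)
            return 1;
    return 0;
}
```

With a hypothetical snapshot of { 2000, 50 } (2000 free single pages, 50 free pairs, nothing larger), free_pages_total reports 2100 free pages, yet free_pages_below_order(..., 2) is also 2100 and order_available(..., 2) is 0: an 8K write whose header pushes it past 8192 bytes cannot be served without reclaim, which is exactly the state in which reclaim_page() starts draining inactive_clean.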
Re: nfs performance at high loads
> We have been seeing some problems with running nfs benchmarks
> at very high loads and were wondering if somebody could show
> some pointers to where the problem lies.
> The system is a 2.4.0 kernel on a Red Hat 6.2 distribution ( so

Use 2.2.19. The 2.4 VM is currently too broken to survive high I/O benchmark tests without going silly
nfs performance at high loads
Hello, We have been seeing some problems with running nfs benchmarks at very high loads and were wondering if somebody could show some pointers to where the problem lies. The system is a 2.4.0 kernel on a Red Hat 6.2 distribution (so nfs-utils from 6.2 and the nfsd of 2.4.0). The benchmark run is SPECsfs97, which runs through a series of nfs operations. We have about 4 nfs clients, each invoking the operations via 8 processes. Everything goes fine up to the 500-1000 IOPS mark: no errors, response time is good (0.8 msec/op), and throughput is as expected. But at the 1500 IOPS mark, errors show up (nfs operation failures) and response time degrades to 1.4 msec/op. At 2000 IOPS, the error count drops and the response time improves to 1.0 msec/op. But from there on it gets worse: at 2500 and 3000 IOPS there are huge numbers of nfs errors, and finally the nfs server console scrolls on with an endless stream of 'alloc-pages: 0-order allocation failed' messages, the clients shut down due to too many rpc call failures, and all that can be done on the server is to reboot it, as it becomes practically locked up. Any hints or directions to follow, or news of whether such benchmark testing has been carried out by somebody else for nfs performance, would be very much appreciated. Thanks, KK