Hello,

I've been looking at performance issues on our NFS server, which I tracked down to overquota writes. The problem is caused by software that does writes without error checking. When this happens, the nfsd threads become 100% busy, and NFS requests from other clients can be delayed by several seconds.

To reproduce this, I've used the attached program. Basically it does an endless write, without error checking. I first ran it on an NFS client against a test NFS server and could reproduce the problem. Then I ran it directly on the server against the FFS-exported filesystem, and saw similar behavior: when the uid running it is overquota, the process starts using 100% CPU in system time and the number of write syscalls per second drops dramatically (from about 170 to about 20). I can see there is still some write activity on the disk (about 55 KB/s, 76 writes/s).
The problem is that by the time we notice the write can't be done, ffs_write() has already done some things that need to be undone. One of them, which is time consuming, is truncating the file back to its original size. Most of the time is spent in genfs_do_putpages(). The problem here seems to be that we always do a page-list walk because endoff is 0. If the file is large enough to have lots of pages in core, a lot of time is spent here.

The attached patch improves this a bit, by not always using a list walk. But I wonder if this could cause some pages to be lost until the vnode is recycled. AFAIK neither v_writesize nor v_size can be shrunk without genfs_do_putpages() being called, but I may have missed something. I'll also see if we can take shortcuts in ffs_write(), at least for some trivial cases.

--
Manuel Bouyer <bou...@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--
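To make the undo step concrete, here is a minimal user-space sketch of the same save-size / write / truncate-back pattern. It is only an analogue of what ffs_write() has to do internally (in the kernel the truncation is what ends up in genfs_do_putpages(), which is where the time is spent); everything in it is plain POSIX, nothing is taken from the kernel sources:

#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	char buf[65536] = { 0 };
	struct stat st;
	off_t osize;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		exit(1);
	}
	if ((fd = open(argv[1], O_WRONLY|O_CREAT|O_APPEND, 0600)) < 0) {
		perror("open");
		exit(1);
	}
	if (fstat(fd, &st) != 0) {
		perror("fstat");
		exit(1);
	}
	osize = st.st_size;	/* remember the size before the write */

	if (write(fd, buf, sizeof(buf)) < 0) {
		perror("write");
		/*
		 * Undo the failed update by shrinking the file back to
		 * its old size.  In the kernel this is the step that
		 * goes through genfs_do_putpages() and eats the CPU.
		 */
		if (ftruncate(fd, osize) != 0)
			perror("ftruncate");
	}
	close(fd);
	return 0;
}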
#include <sys/time.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, const char *argv[])
{
	char buf[65536];	/* 64 KB write buffer */
	int fd;
	int saved_errno = -1;
	struct timeval t1, t2;
	uint64_t ms1, ms2;
	int i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s file\n", argv[0]);
		exit(1);
	}
	fd = open(argv[1], O_WRONLY|O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	/* endless write loop; report errors only when errno changes */
	while (1) {
		if (gettimeofday(&t1, NULL) != 0) {
			perror("gettimeofday");
		}
		for (i = 0; i < 1000; i++) {
			if (write(fd, buf, sizeof(buf)) >= 0) {
				errno = 0;
			}
			if (errno != saved_errno) {
				perror("write");
				saved_errno = errno;
			}
		}
		if (gettimeofday(&t2, NULL) != 0) {
			perror("gettimeofday");
		}
		/* 1000 write calls completed in (ms2 - ms1) milliseconds */
		ms1 = (uint64_t)t1.tv_sec * 1000ULL + (uint64_t)t1.tv_usec / 1000;
		ms2 = (uint64_t)t2.tv_sec * 1000ULL + (uint64_t)t2.tv_usec / 1000;
		printf("%f syscalls per second\n",
		    (float)(1000 * 1000) / (float)(ms2 - ms1));
	}
	exit(0);
}
Index: genfs_io.c
===================================================================
RCS file: /cvsroot/src/sys/miscfs/genfs/genfs_io.c,v
retrieving revision 1.53.8.1
diff -u -p -u -r1.53.8.1 genfs_io.c
--- genfs_io.c	7 May 2012 03:01:12 -0000	1.53.8.1
+++ genfs_io.c	21 Nov 2012 13:59:45 -0000
@@ -892,12 +900,16 @@ retry:
 	error = 0;
 	wasclean = (vp->v_numoutput == 0);
 	off = startoff;
-	if (endoff == 0 || flags & PGO_ALLPAGES) {
-		endoff = trunc_page(LLONG_MAX);
+	if (endoff == 0) {
+		if (flags & PGO_ALLPAGES)
+			endoff = trunc_page(LLONG_MAX);
+		else
+			endoff = MAX(vp->v_writesize, vp->v_size);
 	}
+	by_list = (uobj->uo_npages <= ((endoff - startoff) >> PAGE_SHIFT) * UVM_PAGE_TREE_PENALTY);
 
 	/*
 	 * if this vnode is known not to have dirty pages,
 	 * don't bother to clean it out.
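For illustration, here is the by_list heuristic the patch relies on, rendered as a stand-alone user-space sketch. The PAGE_SHIFT and UVM_PAGE_TREE_PENALTY values are assumptions picked for the example, and choose_by_list() is a made-up name; only the comparison itself mirrors the patch:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT		12	/* assumed: 4 KB pages */
#define UVM_PAGE_TREE_PENALTY	4	/* assumed cost ratio: tree lookup vs. list step */

/* made-up helper mirroring the by_list expression in the patch */
static bool
choose_by_list(uint64_t uo_npages, int64_t startoff, int64_t endoff)
{
	return uo_npages <=
	    ((uint64_t)(endoff - startoff) >> PAGE_SHIFT) * UVM_PAGE_TREE_PENALTY;
}

int
main(void)
{
	/* a fully cached 1 GB file: 262144 pages in core */
	uint64_t npages = 262144;
	int64_t osize = (int64_t)npages << PAGE_SHIFT;

	/* old behaviour: endoff == 0 was turned into trunc_page(LLONG_MAX) */
	printf("unbounded endoff: %s\n",
	    choose_by_list(npages, osize - 65536, INT64_MAX) ?
	    "list walk (visits all 262144 pages)" : "tree lookups");

	/* patched: endoff capped at MAX(v_writesize, v_size), here osize */
	printf("bounded endoff:   %s\n",
	    choose_by_list(npages, osize - 65536, osize) ?
	    "list walk" : "tree lookups (16 offsets only)");
	return 0;
}

With an unbounded endoff the right-hand side of the comparison is effectively infinite, so the list walk is always chosen and every cached page of the vnode gets visited. Bounding endoff at MAX(vp->v_writesize, vp->v_size) keeps the range small in the truncate-back case, so the per-offset lookups win.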