Hi All,
It looks like we’ve run into an NFS client bug in SL4. We stumbled
upon this while trying to check out code from a Subversion repository
into an NFS directory. Our NFS servers are SL3, and we only see this
bug with SL4 clients. Things work when mounting the directory with
‘noac’, but we can’t live with the performance hit.
We get the following error when running an svn checkout on SL4 from
an nfs-mounted directory:
REPORT request failed on '/svn/!svn/vcc/default'
svn: REPORT of '/svn/!svn/vcc/default': 400 Bad Request (https://accserv)
Comparing system call traces from SL3 and SL4 clients doing the
checkout, here's where we think it's going pear-shaped, about 1800
syscalls in:
open("bmad/.svn/tmp/tempfile.tmp", O_RDWR|O_CREAT|O_EXCL, 0666) = 3
[...]
write(3, "<S:update-report send-all=\"true\""..., 218) = 218
-fstat64(3, {st_mode=S_IFREG|0644, st_size=218, ...}) = 0
+fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
'-' is SL3, '+' is SL4. So it's opening a temporary file under
the working area, writing 218 bytes to it, and immediately
statting it. SL3 gets 218 bytes, SL4 gets 0. On SL3, the next
step is to rewind the file, read 218 bytes, and send something
(probably the same 218 bytes SSL encrypted) to the svn server.
With the attribute cache on, the answer to the fstat64 comes out
of the cache; with it off, the write has to commit and the fstat64
round-trips to the server, so this is all looking consistent with
an NFS attribute cache problem on SL4.
Doing a second fstat gets the right answer, so that could be a
workaround if we can identify the right spot in the svn code. (We
suppose it's possible this is timing related, in which case the
second stat might not *reliably* fix the problem...)
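For what it's worth, the second-stat idea could be sketched roughly
as below. The helper name fstat_expect_size is made up for
illustration; it is not anything in svn or APR, and this is an
untested guess at a workaround, not a fix. It assumes the caller
knows how many bytes it has just written.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of svn or APR): stat fd, and if the
 * reported size differs from what the caller knows it wrote, assume
 * the first answer was stale and stat once more.
 * Returns 0 on success, -1 if fstat fails (errno is set by fstat). */
static int fstat_expect_size(int fd, off_t expected, struct stat *st)
{
    if (fstat(fd, st) != 0)
        return -1;
    if (st->st_size != expected) {
        /* First answer disagrees with what was written; ask again. */
        if (fstat(fd, st) != 0)
            return -1;
    }
    return 0;
}
```

A caller would still want to compare st_size against the expected
size afterwards, since if this is timing related the second answer
may be wrong too.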
Here is a test program to demonstrate the bug.
dsr_lnxcu9% cat svnbug.c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>   /* memset */
#include <fcntl.h>
#include <unistd.h>

#ifndef TESTSIZE
#define TESTSIZE 52
#endif

int main(int argc, char** argv)
{
    char s[TESTSIZE+1];
    struct stat st1, st2;
    int r;
    ssize_t len;

    /* O_EXCL: fails if tmpfile.xyzzy is left over from an earlier run */
    int fd = open("tmpfile.xyzzy", O_RDWR|O_CREAT|O_EXCL, 0666);
    if (fd < 0) {
        perror("open");
        exit(errno);
    }

    memset(s, 'x', TESTSIZE);
    len = write(fd, s, TESTSIZE);
    if (len < 0) {
        perror("write");
        exit(errno);
    }

    /* stat immediately after the write: this is the call that goes wrong */
    r = fstat(fd, &st1);
    if (0 != r) {
        perror("fstat");
        exit(errno);
    }

    /* stat again, for comparison */
    r = fstat(fd, &st2);
    if (0 != r) {
        perror("fstat");
        exit(errno);
    }

    /* st_size is an off_t, so cast it for printing */
    printf("len = %zd, st1 = %lld, st2 = %lld\n",
           len, (long long) st1.st_size, (long long) st2.st_size);
    close(fd);
    return 0;
}
dsr_lnxcu9% gcc svnbug.c
dsr_lnxcu9% ~/a.out
len = 52, st1 = 52, st2 = 52
dsr_lnxcu9% cd /cdat/tem/dsr
dsr_lnxcu9% ~/a.out
len = 52, st1 = 0, st2 = 52
I have also posted a message to the Subversion developers mailing
list, and here's a bugzilla report with TUV:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=236308
Finally, here are others with the same problem:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=4875
Any suggestions or workarounds would be greatly appreciated.
Devin
------
Devin Bougie
Laboratory for Elementary-Particle Physics
[EMAIL PROTECTED]