On Tue, 13 Nov 2012, Richard Brittain wrote:
While testing new client installs, I've got a regular habit of banging hard
on my fileservers and checking the md5sums of a bunch of random files. I came
across an odd error recently with this scenario (a rough sketch of my test
loop follows the list):
- A client (the platform doesn't seem to matter) writes a bunch of largish
files to the fileserver.
- A Linux client tries to read the same files before the writes have finished.
Mostly this results in a premature EOF, but eventually the whole file can be
read and the checksum is correct.
- Occasionally the short read leaves corrupt blocks in the cache, which the
local client believes are good, and when the complete file is available the
checksum is wrong. Running 'cmp' between the bad file and a copy of the
original shows a similar number of changed bytes (~4k) regardless of file
size.
More testing shows that every time I reproduce this scenario, it is the
first 4kB of the file that has been replaced by nulls. The initial tests
were confusing because some of my test files legitimately contain nulls.
- Running 'fs flushvolume' on the client and recomputing the md5sum always
checks out fine, so the fileserver has the correct data.
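
For anyone trying to reproduce this, my test loop looks roughly like the
sketch below. In my actual tests the writer and reader are separate client
machines; the paths, sizes, and cell name here are placeholders, not my real
setup:

  #!/bin/bash
  # Rough sketch of the race, collapsed onto one machine for brevity.
  # In the real tests the writer and reader are separate AFS clients.
  # /afs/example.org/... is a placeholder path, not my actual cell.
  SRC=/tmp/bigfile
  DST=/afs/example.org/test/bigfile

  dd if=/dev/urandom of="$SRC" bs=1M count=100 2>/dev/null
  GOOD=$(md5sum < "$SRC" | awk '{print $1}')

  cp "$SRC" "$DST" &            # writer
  CP=$!

  # Reader: keep checksumming while the copy is still in flight.
  # Mostly this just gets a premature EOF.
  while kill -0 "$CP" 2>/dev/null; do
      md5sum < "$DST" >/dev/null 2>&1
  done
  wait "$CP"

  # Once the copy has finished, the cached copy should match the source.
  SUM=$(md5sum < "$DST" | awk '{print $1}')
  [ "$SUM" = "$GOOD" ] || echo "MISMATCH: client cache holds corrupt data"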
Tested with a 1.6.1 client on RHEL5 and RHEL6, and a 1.6.1 fileserver on
RHEL5 and RHEL6. It is reasonably reproducible, although the locations in the
files might change. Small files don't show the problem, but I also never get
partial reads on them. If I'm patient and let the files finish copying to the
server, there is never a problem.
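
For reference, the checks I use to confirm the null pattern and verify the
fileserver copy look roughly like this (again a sketch, with $SRC and $DST
as in the loop above):

  # Count non-null bytes in the first 4kB of the bad copy; 0 means pure nulls.
  head -c 4096 "$DST" | tr -d '\0' | wc -c

  # Byte-level diff against the original; the offsets all land in the first 4kB.
  cmp -l "$SRC" "$DST" | head

  # Flush the client's cached data for the volume, then recheck.
  fs flushvolume -path "$DST"
  md5sum "$SRC" "$DST"          # the checksums now agree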
Richard
--
Richard Brittain, Research Computing Group,
Computing Services, 37 Dewey Field Road, HB6219
Dartmouth College, Hanover NH 03755
[email protected] 6-2085