On Tue, 13 Nov 2012, Richard Brittain wrote:

While testing new client installs, I've got a regular habit of banging hard on my fileservers and checking the md5sum of a bunch of random files. I came across an odd error recently with this scenario:

- Client (platform doesn't seem to matter) writes a bunch of largish files to the fileserver.

- A Linux client tries to read the same files before they have finished writing.
Mostly this results in a premature EOF, but eventually the whole file can be read and the checksum is correct.

- Occasionally the short file results in corrupt blocks in the cache, which the local client treats as valid, and when the complete file is available the checksum is wrong. Running 'cmp' between the bad file and a copy of the original shows roughly the same number of changed bytes (~4 kB) regardless of the size of the file.

More testing shows that every time I reproduce this scenario, it is the first 4 kB of the file that has been replaced by nulls. The initial tests were confusing because some of my test files already contain nulls.
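A quick way to confirm that pattern is to compare the suspect file against a known-good copy, block by block, and check whether each differing block is all nulls. Here's a minimal sketch of such a check (the function name and the demo filenames are just placeholders, not anything from my actual test setup):

```python
import os
import tempfile

def differing_blocks(good_path, suspect_path, block=4096):
    """Return (offset, is_all_null) for each block that differs
    between the known-good file and the suspect file."""
    out = []
    with open(good_path, 'rb') as g, open(suspect_path, 'rb') as s:
        offset = 0
        while True:
            gb = g.read(block)
            sb = s.read(block)
            if not gb and not sb:
                break
            if gb != sb:
                out.append((offset, sb == b'\0' * len(sb)))
            offset += block
    return out

# Demo: a 64 KiB file whose first 4 KiB has been replaced by nulls,
# mimicking the corruption pattern described above.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, 'good')
bad = os.path.join(tmp, 'bad')
data = os.urandom(65536)
with open(good, 'wb') as f:
    f.write(data)
with open(bad, 'wb') as f:
    f.write(b'\0' * 4096 + data[4096:])
print(differing_blocks(good, bad))   # -> [(0, True)]
```

With a check like this the all-null prefix stands out immediately, even in test files that legitimately contain nulls elsewhere.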

- Running 'fs flushvolume' on the client and recomputing the md5sum always checks out fine, so the fileserver holds the correct data.

Tested with a 1.6.1 client on RHEL5 and RHEL6, and a 1.6.1 fileserver on RHEL5 and RHEL6. Reasonably reproducible, although the locations in the files might change. Small files don't show the problem, but I never get partial reads on them either. If I'm patient and let the files finish copying to the server, there is never a problem.
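For anyone who wants to try this, the procedure is essentially the loop below: copy a largish file while a second thread repeatedly checksums it, and flag any read that returns a full-length file with a wrong checksum. This is only a sketch of the idea, not my actual harness; the scratch directory is a placeholder, and you'd point src/dst at an AFS path to exercise the cache manager:

```python
import hashlib
import os
import shutil
import tempfile
import threading
import time

def md5sum(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()

def hammer(src, dst, delay=0.01):
    """Copy src to dst while repeatedly checksumming dst. Record any
    read that sees a full-length file with a wrong checksum (the
    corruption case), then verify the final checksum."""
    want = md5sum(src)
    size = os.path.getsize(src)
    done = threading.Event()
    anomalies = []

    def writer():
        shutil.copyfile(src, dst)
        done.set()

    t = threading.Thread(target=writer)
    t.start()
    while not done.is_set():
        try:
            if os.path.getsize(dst) == size and md5sum(dst) != want:
                anomalies.append('full-size read, wrong md5')
        except OSError:          # dst may not exist yet
            pass
        time.sleep(delay)
    t.join()
    return md5sum(dst) == want, anomalies

# Demo on a local scratch directory (substitute an AFS path to test a
# client cache; on a local filesystem no anomalies are expected).
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'src')
dst = os.path.join(tmp, 'dst')
with open(src, 'wb') as f:
    f.write(os.urandom(4 << 20))     # 4 MiB test file
ok, anomalies = hammer(src, dst)
print(ok)    # -> True on a local filesystem
```

On an affected client, the interesting case is when 'anomalies' is non-empty, or when the final checksum stays wrong until 'fs flushvolume' is run.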


Richard


--
Richard Brittain,  Research Computing Group,
                   Computing Services, 37 Dewey Field Road, HB6219
                   Dartmouth College, Hanover NH 03755
[email protected] 6-2085
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info