On Tue, 13 Nov 2012, Richard Brittain wrote:
While testing new client installs, I've got a regular habit of banging hard
on my fileservers and checking the md5sums of a bunch of random files. I came
across an odd error recently with this scenario (a rough sketch of my test
loop follows the list):
- A client (the platform doesn't seem to matter) writes a bunch of largish
files to the fileserver.
- A Linux client tries to read the same files before the writes have finished.
Mostly this results in a premature EOF, but eventually the whole file can be
read and the checksum is correct.
- Occasionally the short read leaves corrupt blocks in the cache, which the
local client believes are good, and when the complete file is available the
checksum is wrong. Running 'cmp' between the bad file and a copy of the
original shows a similar number of changed bytes (~4k) regardless of file
size.
More testing shows that every time I reproduce this scenario, it is the
first 4kB of the file that has been replaced by nulls. The initial tests
were confusing because some of my test files legitimately contain nulls.
- Running 'fs flushvolume' on the client and recomputing the md5sum always
checks out fine, so the fileserver has the correct data.
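
For anyone trying to reproduce this, my test loop looks roughly like the
sketch below. In my actual tests the writer and reader are separate client
machines; the paths, sizes, and cell name here are placeholders, not my real
setup:

  #!/bin/bash
  # Rough sketch of the race, collapsed onto one machine for brevity.
  # In the real tests the writer and reader are separate AFS clients.
  # /afs/example.org/... is a placeholder path, not my actual cell.
  SRC=/tmp/bigfile
  DST=/afs/example.org/test/bigfile

  dd if=/dev/urandom of="$SRC" bs=1M count=100 2>/dev/null
  GOOD=$(md5sum < "$SRC" | awk '{print $1}')

  cp "$SRC" "$DST" &            # writer
  CP=$!

  # Reader: keep checksumming while the copy is still in flight.
  # Mostly this just gets a premature EOF.
  while kill -0 "$CP" 2>/dev/null; do
      md5sum < "$DST" >/dev/null 2>&1
  done
  wait "$CP"

  # Once the copy has finished, the cached copy should match the source.
  SUM=$(md5sum < "$DST" | awk '{print $1}')
  [ "$SUM" = "$GOOD" ] || echo "MISMATCH: client cache holds corrupt data"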
Tested with a 1.6.1 client on RHEL5 and RHEL6, and a 1.6.1 fileserver on
RHEL5 and RHEL6. It is reasonably reproducible, although the locations in the
files might change. Small files don't show the problem, but I also never get
partial reads on them. If I'm patient and let the files finish copying to the
server, there is never a problem.
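
For reference, the checks I use to confirm the null pattern and verify the
fileserver copy look roughly like this (again a sketch, with $SRC and $DST
as in the loop above):

  # Count non-null bytes in the first 4kB of the bad copy; 0 means pure nulls.
  head -c 4096 "$DST" | tr -d '\0' | wc -c

  # Byte-level diff against the original; the offsets all land in the first 4kB.
  cmp -l "$SRC" "$DST" | head

  # Flush the client's cached data for the volume, then recheck.
  fs flushvolume -path "$DST"
  md5sum "$SRC" "$DST"          # the checksums now agree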
Richard
--
Richard Brittain, Research Computing Group,
Computing Services, 37 Dewey Field Road, HB6219
Dartmouth College, Hanover NH 03755
[email protected] 6-2085