On Mon, Jan 25, 2010 at 10:52:03PM +0100, Christoph Badura wrote:
> I am seeing FS corruption on my development server in the source trees.
> The server is running Xen on i386 with a 128MB RAM dom0 and 256MB RAM domUs.
> I'm using netbsd-5 in the dom0 and some domUs -current in other domUs.
>
> Typical ways to provoke corruption is rsync'ing a source tree from the
> vnd-backed xbd in a domU to local partition in the dom0 or running "cvs
> update" in the dom0 on a tree. The most obvious damage was corrupt CVS/Root
> and directory contents.

Can you give more details on the corruption? Was it only directory entries
that were corrupted, or did you notice corruption in the data blocks too?
I'm seeing panics like:
bad dir ino 14212602 at offset 0: mangled entry
on NFS servers (a few times a year), and the directory is indeed corrupted
on fsck. I've seen this with both netbsd-3 and netbsd-5.

> Once I got an I/O error in a domU from the xbd with the sources on it during
> a build.sh run. At that point I noticed the following messages in the
> kernel message buffer:
>
> raid1: IO failed after 5 retries.
> cgd1: error 5
> xbd IO domain 1: error 5

It seems raidframe doesn't do anything special for memory allocation
failure. It returns EIO for the whole request if it can't get an entry from
bufio_cache for I/O to one component. Maybe it should wait and retry the
I/O later? dk(4) does this ...

-- 
Manuel Bouyer <[email protected]>
     NetBSD: 26 ans d'experience feront toujours la difference
