On Sat, 2007-11-17 at 19:42 -0500, Tom Sightler wrote:
> Running this command on RHEL5.1 consistently produces errors on the
> filesystem with messages like the following:
>
> EXT3-fs error (device dm-19) in ext3_orphan_del: Journal has aborted
> EXT3-fs error (device dm-19) in ext3_reserve_inode_write: Journal has aborted
> __journal_remove_journal_head: freeing b_committed_data
> __journal_remove_journal_head: freeing b_committed_data
> attempt to access beyond end of device
> dm-19: rw=0, want=17247241224, limit=4688363520
> attempt to access beyond end of device
>
> If we reboot the very same hardware with RHEL4.5, mount the same volume,
> and run the same test it works perfectly every time.
>
> Has anyone else run significant I/O stress test on RHEL5.1 yet? We have
> not been able to reproduce this issue with non-striped volumes but we're
> still very early in our testing and are just looking for community
> feedback before taking up the problem with Redhat.

I know it's poor form to reply to myself, but looking deeper into the test results, it seems the corruption only happens when the underlying physical volumes use dm-multipath with round-robin load balancing, and perhaps only with certain hardware. We can easily reproduce the issue with a simple partition over a single dm-multipath device to a LUN on an Apple Xserve RAID.

This still seems like it's probably a bug, since the exact same config works flawlessly with RHEL4.5 and the hardware works fine with round-robin. Changing the policy to "failover" rather than "multibus" seems to work around the problem, since that leaves only one path active.

We'll do more testing with a wider array of storage next week, but I'd still love to hear from others who are running dm-multipath with round-robin load balancing about whether they're seeing any issues with 5.1.

Thanks,
Tom
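
For anyone wanting to try the workaround, here is a minimal sketch of the change in /etc/multipath.conf. It assumes the policy is set globally in the defaults section; the post above does not say where it was set, and it could equally be set per device or per multipath.

```
# /etc/multipath.conf (fragment; placement in "defaults" is an assumption)
defaults {
        # "multibus" puts all paths in one group and load-balances across them.
        # "failover" puts one path per group, so only a single path is active.
        path_grouping_policy    failover
}
```

The running multipath maps need to be reloaded before the new policy takes effect.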
