Forgot to add that this issue is limited to metaecc, so you could avoid it in your current setup by not enabling metaecc on the volume. Last I checked, mkfs did not enable it by default.
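A rough way to check both conditions from the shell, assuming GNU `sort -V` is available. The helper names `has_metaecc` and `kernel_older_than_fix` are made up for this sketch, `/dev/sdX` is a placeholder, and the `tunefs.ocfs2` invocations in the comments are as I understand them from the man page, so please verify them against your ocfs2-tools version. The 2.6.36 cutoff is the one from the quoted message below; distro kernels may carry backports, so the version check is only a hint.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ocfs2-tools.

# Succeeds if the space-separated feature list in $1 contains "metaecc".
has_metaecc() {
    case " $1 " in
        *" metaecc "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if kernel release $1 is older than 2.6.36.
# Relies on GNU `sort -V` for version ordering.
kernel_older_than_fix() {
    ver=${1%%-*}    # "2.6.34.7-0.7-desktop" -> "2.6.34.7"
    [ "$ver" != "2.6.36" ] &&
        [ "$(printf '%s\n' "$ver" 2.6.36 | sort -V | head -n 1)" = "$ver" ]
}

# Intended use on a real node (assumption: the %H query prints the
# incompat feature names, which is where metaecc should show up):
#
#   features=$(tunefs.ocfs2 -Q "%H\n" /dev/sdX)
#   if has_metaecc "$features" && kernel_older_than_fix "$(uname -r)"; then
#       echo "metaecc is on and the kernel predates 2.6.36"
#   fi
#
# Turning the feature off (assumption: needs the volume unmounted on
# every node):
#
#   tunefs.ocfs2 --fs-features=nometaecc /dev/sdX
```

With the kernel reported earlier in this thread, `kernel_older_than_fix "2.6.34.7-0.7-desktop"` succeeds, so on that box the only remaining question is whether metaecc is enabled on the volume.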
On Mon, Aug 27, 2012 at 10:35 AM, Sunil Mushran <sunil.mush...@gmail.com> wrote:

> So you are running into a bug that has been fixed in 2.6.36. Upgrade to
> that version, if not something more current.
>
> $ git describe --tags 13ceef09
> v2.6.35-rc3-14-g13ceef0
>
> commit 13ceef099edd2b70c5a6f3a9ef5d6d97cda2e096
> Author: Jan Kara <j...@suse.cz>
> Date:   Wed Jul 14 07:56:33 2010 +0200
>
>     jbd2/ocfs2: Fix block checksumming when a buffer is used in several
>     transactions
>
>     OCFS2 uses t_commit trigger to compute and store checksum of the just
>     committed blocks. When a buffer has b_frozen_data, checksum is computed
>     for it instead of b_data but this can result in an old checksum being
>     written to the filesystem in the following scenario:
>
>     1) transaction1 is opened
>     2) handle1 is opened
>     3) journal_access(handle1, bh)
>         - This sets jh->b_transaction to transaction1
>     4) modify(bh)
>     5) journal_dirty(handle1, bh)
>     6) handle1 is closed
>     7) start committing transaction1, opening transaction2
>     8) handle2 is opened
>     9) journal_access(handle2, bh)
>         - This copies off b_frozen_data to make it safe for transaction1
>           to commit. jh->b_next_transaction is set to transaction2.
>     10) jbd2_journal_write_metadata() checksums b_frozen_data
>     11) the journal correctly writes b_frozen_data to the disk journal
>     12) handle2 is closed
>         - There was no dirty call for the bh on handle2, so it is never
>           queued for any more journal operation
>     13) Checkpointing finally happens, and it just spools the bh via
>         normal buffer writeback. This will write b_data, which was never
>         triggered on and thus contains a wrong (old) checksum.
>
>     This patch fixes the problem by calling the trigger at the moment data
>     is frozen for journal commit - i.e., either when b_frozen_data is
>     created by do_get_write_access or just before we write a buffer to the
>     log if b_frozen_data does not exist. We also rename the trigger to
>     t_frozen as that better describes when it is called.
>
>     Signed-off-by: Jan Kara <j...@suse.cz>
>     Signed-off-by: Mark Fasheh <mfas...@suse.com>
>     Signed-off-by: Joel Becker <joel.bec...@oracle.com>
>
>
> On Mon, Aug 27, 2012 at 5:10 AM, Rory Kilkenny <rory.kilke...@ticoon.com> wrote:
>
>> # uname -a
>> Linux FILEt1 2.6.34.7-0.7-desktop #1 SMP PREEMPT 2010-12-13 11:13:53 +0100 x86_64 x86_64 x86_64 GNU/Linux
>>
>> # modinfo ocfs2
>> filename: /lib/modules/2.6.34.7-0.7-desktop/kernel/fs/ocfs2/ocfs2.ko
>> license: GPL
>> author: Oracle
>> version: 1.5.0
>> description: OCFS2 1.5.0
>> srcversion: B13569B35F99D43FA80D129
>> depends: jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
>> vermagic: 2.6.34.7-0.7-desktop SMP preempt mod_unload modversions
>>
>> # mkfs.ocfs2 --version
>> mkfs.ocfs2 1.4.3
>>
>>
>> On 12-08-24 5:44 PM, "Sunil Mushran" <sunil.mush...@gmail.com> wrote:
>>
>> What is the version of the kernel, ocfs2 and ocfs2 tools?
>>
>> uname -a
>> modinfo ocfs2
>> mkfs.ocfs2 --version
>>
>> On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny <rory.kilke...@ticoon.com> wrote:
>>
>> We have an HP P2000 G3 Storage array, fiber connected. The storage array
>> has a RAID5 array broken into 2 physical OCFS2 volumes (A & B).
>>
>> A & B are both mounted and formatted as NTFS.
>>
>> One of the volumes is NFS mounted.
>>
>> Every couple of months or so we start getting tons of errors on the NFS
>> mounted volume:
>>
>>
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086. Applying ECC.
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5
>>
>>
>> If we pull all the data off, destroy the volume, rebuilt it, and copy our
>> data back, all works fine; for a while.
>>
>> This issue does not happen on the non NFS mounted volume. I am currently
>> assuming the issue is with NFS and how we have it configured (which to the
>> best of my knowledge is default).
>>
>> Has anyone had a similar experience and be able to share some insight and
>> knowledge on any tricks with NFS and OCFS2 volumes?
>>
>> Thanks in advance.
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users