Forgot to add that this issue is limited to metaecc, so you can avoid it on
your current setup by not enabling metaecc on the volume. Last I checked, mkfs
did not enable it by default.
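
As a rough way to reason about whether a given node is exposed, here is a small Python sketch. It only encodes the two conditions mentioned in this thread (metaecc enabled, kernel older than 2.6.36). The function names and the idea of passing the volume's feature names in as a list are my own assumptions for illustration; how you obtain that list is left to you. Distribution kernels may carry backported fixes, so treat the version comparison as a hint, not proof.

```python
import re
from typing import Iterable, Tuple

# First mainline release containing the fix, per the reply quoted below.
FIXED_IN = (2, 6, 36)


def kernel_version(release: str) -> Tuple[int, int, int]:
    """Extract the leading major.minor[.patch] from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def likely_affected(release: str, features: Iterable[str]) -> bool:
    """Heuristic only: metaecc is enabled AND the kernel predates the fix.

    `features` is a hypothetical list of feature names enabled on the volume.
    """
    return "metaecc" in set(features) and kernel_version(release) < FIXED_IN
```

For the kernel reported in this thread, `likely_affected("2.6.34.7-0.7-desktop", ["metaecc"])` is true, while the same kernel with metaecc left off is not flagged.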

On Mon, Aug 27, 2012 at 10:35 AM, Sunil Mushran <sunil.mush...@gmail.com> wrote:

> So you are running into a bug that has been fixed in 2.6.36. Upgrade to that
> version, if not something more current.
>
> $ git describe --tags 13ceef09
> v2.6.35-rc3-14-g13ceef0
>
> commit 13ceef099edd2b70c5a6f3a9ef5d6d97cda2e096
> Author: Jan Kara <j...@suse.cz>
> Date:   Wed Jul 14 07:56:33 2010 +0200
>
>     jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
>
>     OCFS2 uses t_commit trigger to compute and store checksum of the just
>     committed blocks. When a buffer has b_frozen_data, checksum is computed
>     for it instead of b_data but this can result in an old checksum being
>     written to the filesystem in the following scenario:
>
>     1) transaction1 is opened
>     2) handle1 is opened
>     3) journal_access(handle1, bh)
>         - This sets jh->b_transaction to transaction1
>     4) modify(bh)
>     5) journal_dirty(handle1, bh)
>     6) handle1 is closed
>     7) start committing transaction1, opening transaction2
>     8) handle2 is opened
>     9) journal_access(handle2, bh)
>         - This copies off b_frozen_data to make it safe for transaction1 to commit.
>           jh->b_next_transaction is set to transaction2.
>     10) jbd2_journal_write_metadata() checksums b_frozen_data
>     11) the journal correctly writes b_frozen_data to the disk journal
>     12) handle2 is closed
>         - There was no dirty call for the bh on handle2, so it is never queued for
>           any more journal operation
>     13) Checkpointing finally happens, and it just spools the bh via normal buffer
>     writeback.  This will write b_data, which was never triggered on and thus
>     contains a wrong (old) checksum.
>
>     This patch fixes the problem by calling the trigger at the moment data is
>     frozen for journal commit - i.e., either when b_frozen_data is created by
>     do_get_write_access or just before we write a buffer to the log if
>     b_frozen_data does not exist. We also rename the trigger to t_frozen as
>     that better describes when it is called.
>
>     Signed-off-by: Jan Kara <j...@suse.cz>
>     Signed-off-by: Mark Fasheh <mfas...@suse.com>
>     Signed-off-by: Joel Becker <joel.bec...@oracle.com>
>
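
To make the thirteen steps above concrete, here is a toy Python model of the sequence. It is not kernel code. The Buffer class, the 4-byte CRC32 header layout, and the helper names are all made up for illustration. The trigger_at_freeze switch stands in for the fix; applying the trigger to b_data just before the copy is an assumption of this sketch, chosen because it leaves both copies with a fresh checksum.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple

CSUM_LEN = 4  # toy layout: the first 4 bytes hold the CRC32 of the rest


def with_checksum(block: bytes) -> bytes:
    """Return the block with its checksum field recomputed over the payload."""
    payload = block[CSUM_LEN:]
    return zlib.crc32(payload).to_bytes(CSUM_LEN, "little") + payload


def checksum_ok(block: bytes) -> bool:
    """Check the stored checksum field against the payload."""
    stored = int.from_bytes(block[:CSUM_LEN], "little")
    return stored == zlib.crc32(block[CSUM_LEN:])


@dataclass
class Buffer:
    b_data: bytes
    b_frozen_data: Optional[bytes] = None


def run_scenario(trigger_at_freeze: bool) -> Tuple[bool, bool]:
    """Replay steps 3-13; return (journal copy valid, checkpointed block valid)."""
    bh = Buffer(b_data=with_checksum(bytes(CSUM_LEN) + b"old contents"))

    # Steps 3-6: handle1 modifies the payload; the checksum field is now stale.
    bh.b_data = bh.b_data[:CSUM_LEN] + b"new contents"

    # Step 9: handle2 requests write access while transaction1 is committing,
    # so the current contents are copied off into b_frozen_data.
    if trigger_at_freeze:
        # Modelled fix: checksum the data at the moment it is frozen.
        bh.b_data = with_checksum(bh.b_data)
    bh.b_frozen_data = bh.b_data

    # Step 10: the old commit-time trigger checksums only the frozen copy.
    if not trigger_at_freeze:
        bh.b_frozen_data = with_checksum(bh.b_frozen_data)

    journal_copy = bh.b_frozen_data  # step 11: written to the journal
    checkpointed = bh.b_data         # step 13: written by normal writeback
    return checksum_ok(journal_copy), checksum_ok(checkpointed)
```

In this model, run_scenario(False) gives a valid journal copy but a checkpointed block whose checksum fails, which is the symptom described in step 13. run_scenario(True) leaves both copies valid.
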
>
> On Mon, Aug 27, 2012 at 5:10 AM, Rory Kilkenny <rory.kilke...@ticoon.com> wrote:
>
>> # uname -a
>> Linux FILEt1 2.6.34.7-0.7-desktop #1 SMP PREEMPT 2010-12-13 11:13:53 +0100 x86_64 x86_64 x86_64 GNU/Linux
>>
>> # modinfo ocfs2
>> filename:       /lib/modules/2.6.34.7-0.7-desktop/kernel/fs/ocfs2/ocfs2.ko
>> license:        GPL
>> author:         Oracle
>> version:        1.5.0
>> description:    OCFS2 1.5.0
>> srcversion:     B13569B35F99D43FA80D129
>> depends:        jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
>> vermagic:       2.6.34.7-0.7-desktop SMP preempt mod_unload modversions
>>
>> # mkfs.ocfs2 --version
>> mkfs.ocfs2 1.4.3
>>
>>
>>
>>
>> On 12-08-24 5:44 PM, "Sunil Mushran" <sunil.mush...@gmail.com> wrote:
>>
>> What is the version of the kernel, ocfs2 and ocfs2 tools?
>>
>> uname -a
>> modinfo ocfs2
>> mkfs.ocfs2 --version
>>
>> On Fri, Aug 24, 2012 at 1:09 PM, Rory Kilkenny <rory.kilke...@ticoon.com> wrote:
>>
>> We have an HP P2000 G3 Storage array, fiber connected.  The storage array
>> has a RAID5 array broken into 2 physical OCFS2 volumes (A & B).
>>
>> A & B are both mounted and formatted as NTFS.
>>
>> One of the volumes is NFS mounted.
>>
>> Every couple of months or so we start getting tons of errors on the NFS
>> mounted volume:
>>
>>
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.848940] (ocfs2_wq,13844,7):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 0, computed 1467126086.  Applying ECC.
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849252] (ocfs2_wq,13844,7):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 0, computed 3828104806
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849256] (ocfs2_wq,13844,7):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 1169089
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849261] (ocfs2_wq,13844,7):__ocfs2_find_path:1861 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849264] (ocfs2_wq,13844,7):ocfs2_find_leaf:1958 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849267] (ocfs2_wq,13844,7):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849270] (ocfs2_wq,13844,7):ocfs2_do_truncate:6900 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849274] (ocfs2_wq,13844,7):ocfs2_commit_truncate:7556 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849280] (ocfs2_wq,13844,7):ocfs2_truncate_for_delete:593 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849284] (ocfs2_wq,13844,7):ocfs2_wipe_inode:769 ERROR: status = -5
>> Aug 24 09:48:13 FILEt2 kernel: [2234285.849287] (ocfs2_wq,13844,7):ocfs2_delete_inode:1067 ERROR: status = -5
>>
>>
>> If we pull all the data off, destroy the volume, rebuild it, and copy our
>> data back, all works fine, for a while.
>>
>> This issue does not happen on the volume that is not NFS mounted. I am
>> currently assuming the issue is with NFS and how we have it configured
>> (which, to the best of my knowledge, is the default).
>>
>> Has anyone had a similar experience and can share some insight or any
>> tricks for using NFS with OCFS2 volumes?
>>
>> Thanks in advance.
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>>
>>
>>
>
