We ran fsck.ocfs2 -f -F, and the machine crashed with:

May 14 17:10:34 lp-bbprd1-rh4v kernel: ------------[ cut here ]------------
May 14 17:10:34 lp-bbprd1-rh4v kernel: kernel BUG at /var/autofs/ca-fileserver2/home/seeda/tmp/kernel/BUILD/ocfs2-1.2.9/fs/ocfs2/file.c:794!
May 14 17:10:34 lp-bbprd1-rh4v kernel: invalid operand: 0000 [#1]
May 14 17:10:34 lp-bbprd1-rh4v kernel: SMP
May 14 17:10:34 lp-bbprd1-rh4v kernel: Modules linked in: md5 ipv6 parport_pc lp parport autofs4 ocfs2(U) debugfs(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) vmmemctl(U) sunrpc cpufreq_powersave dm_mod button battery ac pcnet32 vmxnet(U) mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
May 14 17:10:34 lp-bbprd1-rh4v kernel: CPU: 1
May 14 17:10:34 lp-bbprd1-rh4v kernel: EIP: 0060:[<f8ce77b7>] Not tainted VLI
May 14 17:10:34 lp-bbprd1-rh4v kernel: EFLAGS: 00210292 (2.6.9-78.0.22.ELsmp)
May 14 17:10:34 lp-bbprd1-rh4v kernel: EIP is at ocfs2_extend_file+0x38f/0xf77 [ocfs2]
May 14 17:10:34 lp-bbprd1-rh4v kernel: eax: 0000008c ebx: 00000000 ecx: f4db2e6c edx: f8d11f97
May 14 17:10:34 lp-bbprd1-rh4v kernel: esi: f5c899a8 edi: f4db2f18 ebp: d9a29000 esp: f4db2ea4
May 14 17:10:34 lp-bbprd1-rh4v kernel: ds: 007b es: 007b ss: 0068
May 14 17:10:34 lp-bbprd1-rh4v kernel: Process java (pid: 4594, threadinfo=f4db2000 task=f18ca1f0)
May 14 17:10:34 lp-bbprd1-rh4v kernel: Stack: c32324c0 00000000 00000000 00000000 f5c899a8 f7054100 f4db2f58 00000000
May 14 17:10:34 lp-bbprd1-rh4v kernel: 00000000 e017ee18 00000001 f8cdd01f 00000000 00000000 00000000 f4db2f68
May 14 17:10:34 lp-bbprd1-rh4v kernel: f7054100 f5c899a8 f8cf6038 0033ffcc 00000000 f4db2f18 0032ffcd 00000000
May 14 17:10:34 lp-bbprd1-rh4v kernel: Call Trace:
May 14 17:10:34 lp-bbprd1-rh4v kernel: [<f8cdd01f>] ocfs2_data_lock+0x19d/0x27f [ocfs2]
May 14 17:10:34 lp-bbprd1-rh4v kernel: [<f8cf6038>] ocfs2_write_lock_maybe_extend+0x860/0xb4c [ocfs2]
May 14 17:10:34 lp-bbprd1-rh4v kernel: [<f8ce5744>] ocfs2_file_write+0x11f/0x254 [ocfs2]
May 14 17:10:34 lp-bbprd1-rh4v kernel: [<c015caef>] vfs_write+0xb6/0xe2
May 14 17:10:34 lp-bbprd1-rh4v kernel: [<c015cbb9>] sys_write+0x3c/0x62
May 14 17:10:34 lp-bbprd1-rh4v kernel: [<c02e0a2f>] syscall_call+0x7/0xb
May 14 17:10:34 lp-bbprd1-rh4v kernel: Code: b1 dc fd ff ff ff b1 d8 fd ff ff 68 1a 03 00 00 68 15 c3 d0 f8 ff 70 10 ff b2 94 00 00 00 68 97 1f d1 f8 e8 ae b3 43 c7 83 c4 3c <0f> 0b 1a 03 dd 1c d1 f8 8b 5c 24 10 8b 83 54 01 00 00 0f ae e8
May 14 17:10:34 lp-bbprd1-rh4v kernel: <0>Fatal exception: panic in 5 seconds

But we tried again on the other cluster member, and it's currently running.

-- Update

The fsck eventually died with a segmentation fault, after repairing a couple of thousand errors. Today we've taken a snap-clone of the filesystem, and we're running an offline fsck over the copy. This time we've got a stack of

o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate cluster

errors.

We're now running ocfs2-2.6.9-78.0.22.ELsmp-1.2.9-1.2.el4.

Any suggestions?

Sunil Mushran wrote:
> Did you run fsck with the force flag?
> $ fsck.ocfs2 -f /dev/sdX
>
> By default, fsck only replays the journals.
>
> Paul Taylor wrote:
>> Hi
>>
>> errors like the one listed below have been coming through in our logs on
>> a daily basis. We tried to run fsck.ocfs2 over the file system but it
>> thinks that it is clean. We are wondering if there is another tool
>> available or process to follow to resolve the corruption.
>>
>> OS: Linux lp-bbprd1-rh4v 2.6.9-78.0.22.ELsmp
>> ocfs2.1.2.9-1
>>
>> May 14 10:03:14 lp-bbprd1-rh4v kernel: (20690,1):ocfs2_lookup:183 ERROR:
>> Unable to create inode 75385485
>> May 14 10:03:43 lp-bbprd1-rh4v kernel:
>> (14213,0):ocfs2_check_dir_entry:1727 ERROR: bad entry in directory
>> #76004924: rec_len is smaller than minimal - offset=0, inode=0,
>> rec_len=0, name_len=0
>> May 14 10:03:43 lp-bbprd1-rh4v kernel:
>> (14213,0):ocfs2_check_dir_entry:1727 ERROR: bad entry in directory
>> #76004924: rec_len is smaller than minimal - offset=0, inode=0,
>> rec_len=0, name_len=0
>> May 14 10:03:43 lp-bbprd1-rh4v kernel: (14213,0):ocfs2_empty_dir:305
>> ERROR: bad directory (dir #76004924) - no `.' or `..'
>> May 14 10:03:43 lp-bbprd1-rh4v kernel: (14213,0):ocfs2_empty_dir:305
>> ERROR: bad directory (dir #76004924) - no `.' or `..'
>>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users