On 2016/4/22 11:17, Gang He wrote:
>
>> Hi Gang,
>>
>> On 2016/4/21 17:50, Gang He wrote:
>>>
>>>> Hi Gang,
>>>> Maybe. If so, it seems to be related to locks.
>>>> Can we know which data is newer?
>>> Hi Joseph, I do not get your question.
>>> About this bug, the customer only has a panic message; no kernel
>>> dump was provided.
>>>
>> The filesystem not being corrupted doesn't mean the disk inode is right
>> (i.e. the latest). The in-memory inode may be right, while the flush to
>> disk happens not to have completed successfully.
>> But as of now, we can't distinguish which case it is.
> True. I just brought this bug to everyone's attention; I need more time to
> analyze the code, to see if we can reproduce this issue manually in the
> future.

Fine, we will also set up a race environment and try to reproduce it,
and give feedback to the mailing list ASAP once we have more info.
Thanks,
Joseph

> Thanks
> Gang
>
>> Thanks,
>> Joseph
>>
>>> Thanks
>>> Gang
>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>> On 2016/4/21 15:59, Gang He wrote:
>>>>> Hello Guys,
>>>>>
>>>>> Sorry for the delayed reply; the fsck log below was reported by the customer.
>>>>> The OCFS2 volumes to check are:
>>>>>   /dev/dm-11   83886080   44540260   39345820   54%   /sapmnt
>>>>>   /dev/dm-14   62914560   15374960   47539600   25%   /usr/sap
>>>>>   /dev/dm-13   89128960   19976272   69152688   23%   /usr/sap/trans
>>>>>
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-13
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-13:
>>>>>   Label:              <NONE>
>>>>>   UUID:               E35AF95C8114494391E0FFDEFD07309B
>>>>>   Number of blocks:   22282240
>>>>>   Block size:         4096
>>>>>   Number of clusters: 22282240
>>>>>   Cluster size:       4096
>>>>>   Number of slots:    2
>>>>> /dev/dm-13 is clean. It will be checked after 20 additional mounts.
>>>>>
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-14
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-14:
>>>>>   Label:              <NONE>
>>>>>   UUID:               43F2D9C8D70B47388527075F3C6C38BB
>>>>>   Number of blocks:   15728640
>>>>>   Block size:         4096
>>>>>   Number of clusters: 15728640
>>>>>   Cluster size:       4096
>>>>>   Number of slots:    2
>>>>> /dev/dm-14 is clean. It will be checked after 20 additional mounts.
>>>>>
>>>>> uii316:~ # fsck.ocfs2 /dev/dm-11
>>>>> fsck.ocfs2 1.8.2
>>>>> Checking OCFS2 filesystem in /dev/dm-11:
>>>>>   Label:              <NONE>
>>>>>   UUID:               0F1CBE6017934B5E8AD80149ED332659
>>>>>   Number of blocks:   20971520
>>>>>   Block size:         4096
>>>>>   Number of clusters: 20971520
>>>>>   Cluster size:       4096
>>>>>   Number of slots:    2
>>>>> /dev/dm-11 is clean. It will be checked after 20 additional mounts.
>>>>>
>>>>> From the log, it looks like the file systems were not corrupted, so this
>>>>> should be a race-condition issue?!
>>>>>
>>>>> Thanks
>>>>> Gang
>>>>>
>>>>>> On 03/31/2016 10:56 AM, Gang He wrote:
>>>>>>> Hello Joseph and Junxiao,
>>>>>>>
>>>>>>> Did you encounter this issue in the past?
>>>>>>> I suspect this is possibly a race
>>>>>>> condition bug (rather than data inconsistency).
>>>>>> Never saw this. Did fsck report any corruption?
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Gang
>>>>>>>
>>>>>>>> Hello Guys,
>>>>>>>>
>>>>>>>> I got a bug report of a kernel BUG in the function ocfs2_truncate_file.
>>>>>>>> Based on my initial analysis, this looks like a race condition problem.
>>>>>>>> Unfortunately, no kernel crash dump was caught; we just got some kernel
>>>>>>>> log, as below:
>>>>>>>>
>>>>>>>> kernel BUG at /usr/src/packages/BUILD/ocfs2-1.6/default/ocfs2/file.c:466!
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831230] Supported: Yes
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831238] Pid: 7134, comm: saposcol Not tainted 3.0.101-0.47.67-default #1 HP ProLiant BL460c G1
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831247] RIP: 0010:[<ffffffffa0744b95>]  [<ffffffffa0744b95>] ocfs2_truncate_file+0xa5/0x490 [ocfs2]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831312] RSP: 0018:ffff880f39d79b68  EFLAGS: 00010296
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831321] RAX: 000000000000008f RBX: ffff880f39c5e240 RCX: 00000000000039fd
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831326] RDX: 0000000000000000 RSI: 0000000000000007 RDI: 0000000000000246
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831331] RBP: 1000000000000000 R08: ffffffff81da0ac0 R09: 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831336] R10: 0000000000000003 R11: 00000000ffffffff R12: ffff880f3949bc78
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831342] R13: ffff880f39c5e888 R14: ffff880f3d481000 R15: 00000000000e43bc
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831347] FS: 00007f11cda9d720(0000) GS:ffff880fefd40000(0000) knlGS:0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831353] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831358] CR2: 00007f11cdad4000 CR3: 0000000f39d35000 CR4: 00000000000007e0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831363] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831368] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831374] Process saposcol (pid: 7134, threadinfo ffff880f39d78000, task ffff880f39c5e240)
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831379] Stack:
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831383]  000000000002433c 00000000000e43bc 00000000000eab40 ffff880f00000001
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831397]  ffff880f394956e0 ffff880f8e0d1000 ffff880f3949b800 0000000000000000
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831410]  ffff880f39454980 0000000000000001 000000000002433c 0000000000000008
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831423] Call Trace:
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831492]  [<ffffffffa074742e>] ocfs2_setattr+0x26e/0xa90 [ocfs2]
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831522]  [<ffffffff81178faf>] notify_change+0x19f/0x2f0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831534]  [<ffffffff8115d517>] do_truncate+0x57/0x80
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831544]  [<ffffffff8116c053>] do_last+0x603/0x800
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831551]  [<ffffffff8116ceb9>] path_openat+0xd9/0x420
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831558]  [<ffffffff8116d33c>] do_filp_open+0x4c/0xc0
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831566]  [<ffffffff8115de5f>] do_sys_open+0x17f/0x250
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831575]  [<ffffffff8146e1f2>] system_call_fastpath+0x16/0x1b
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831588]  [<00007f11ccb07080>] 0x7f11ccb0707f
>>>>>>>> Oct 21 13:02:19 uii316 [ 1766.831592] Code:
>>>>>>>>
>>>>>>>> The source code in question is as below:
>>>>>>>>
>>>>>>>> 444 static int ocfs2_truncate_file(struct inode *inode,
>>>>>>>> 445                                struct buffer_head *di_bh,
>>>>>>>> 446                                u64 new_i_size)
>>>>>>>> 447 {
>>>>>>>> 448         int status = 0;
>>>>>>>> 449         struct ocfs2_dinode *fe = NULL;
>>>>>>>> 450         struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>>> 451
>>>>>>>> 452         /* We trust di_bh because it comes from ocfs2_inode_lock(), which
>>>>>>>> 453          * already validated it */
>>>>>>>> 454         fe = (struct ocfs2_dinode *) di_bh->b_data;
>>>>>>>> 455
>>>>>>>> 456         trace_ocfs2_truncate_file((unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>>>> 457                                   (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>>>> 458                                   (unsigned long long)new_i_size);
>>>>>>>> 459
>>>>>>>> 460         mlog_bug_on_msg(le64_to_cpu(fe->i_size) != i_size_read(inode),   <<= here
>>>>>>>> 461                         "Inode %llu, inode i_size = %lld != di "
>>>>>>>> 462                         "i_size = %llu, i_flags = 0x%x\n",
>>>>>>>> 463                         (unsigned long long)OCFS2_I(inode)->ip_blkno,
>>>>>>>> 464                         i_size_read(inode),
>>>>>>>> 465                         (unsigned long long)le64_to_cpu(fe->i_size),
>>>>>>>> 466                         le32_to_cpu(fe->i_flags));
>>>>>>>> 467
>>>>>>>>
>>>>>>>> If you encountered a similar issue in the past, please let me know,
>>>>>>>> to see if there is any patch/discussion for this bug.
>>>>>>>>
>>>>>>>> Thanks a lot.
>>>>>>>> Gang

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel