[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

** Changed in: linux (Ubuntu)
Status: Incomplete => Expired

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1049267

Title: XFS corruption on machine which never suffered a hard reset or disk failure

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1049267/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
I was going to install the latest mainline kernel. HOWEVER:
- dpkg-sig --list shows that the packages contain no signatures at all.
- Further, there don't seem to be any signature files on the webserver [0].
- The webserver does not accept https connections.

While installing a release-candidate kernel on a production machine is something I already dislike, the fact that it isn't even signed makes this unacceptable. Please provide signed packages.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc7-quantal/
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
I have now done the following:
- I put the disks of an affected machine (not the original one in this bug report) into a Debian 6 machine which has been running rock-solid with XFS for years.
- I used a script of my own to generate a checksum and file date listing of ALL files (~2.5TB) on the disks using the Debian 6 machine.
- I then used a USB stick with Ubuntu 12.04 to run xfs_repair on the affected XFS.
- After the repair finished, I again put the disks into the Debian 6 machine and generated the checksum / file date listing.
- I diff'ed the pre-repair and post-repair checksums and file dates. They are absolutely identical.

Conclusion: Debian did not complain about corruption while generating the checksums, and the checksums are not affected by the repair. Maybe this shows that there is no actual physical corruption and that it was rather a crash bug?

I will put the affected machine back into operation with a 3.6 kernel as requested. HOWEVER, I should say that it took multiple weeks of operation until the issue first happened, so I don't think that testing this with 3.6 will disprove anything anytime soon. I think you guys should read the changelogs of the kernels or actually look at the stack trace and see what happened :|
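The pre-repair / post-repair comparison described above could be sketched roughly as follows. This is not the reporter's actual script, which is not included in this report. The function names build_manifest and diff_manifests, the choice of SHA-256, and the comparison of modification times in whole seconds are all assumptions made for illustration.

```python
import hashlib
import os


def build_manifest(root):
    """Map each regular file under root (by relative path) to
    (sha256 hex digest, modification time in whole seconds).

    Hypothetical helper: hash algorithm and mtime granularity are assumptions.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                    digest.update(chunk)
            rel = os.path.relpath(full, root)
            manifest[rel] = (digest.hexdigest(), int(os.stat(full).st_mtime))
    return manifest


def diff_manifests(before, after):
    """Return the sorted relative paths that were added, removed, or whose
    checksum or modification time differs between the two manifests."""
    return sorted(
        path
        for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )
```

Under these assumptions, one would call build_manifest on the mounted filesystem before xfs_repair and again afterwards; an empty list from diff_manifests corresponds to the "absolutely identical" result reported above.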
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
As requested in comment #4, it would be helpful to know whether this bug, as well as bug 1051689, also exists upstream. There is no indication that this specific issue is already fixed upstream, but testing the mainline kernel will prove or disprove that.
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
apport information

** Tags added: apport-collected staging

** Description changed:

Using Ubuntu 12.04 server, we installed a machine using the following disk layout: XFS = dm-crypt = RAID5. A *complete* list of ALL configuration of the machine including the setup can be provided if you need it, we documented everything.

The harddisks are tested weekly with a full SMART test and they are okay. The machine is attached to a UPS and therefore never suffered a hard reset. Also, the memory was tested with memtest86+.

Nevertheless, the kernel reports XFS problems:

Sep 10 10:01:00 server kernel: [379001.376989] XFS (dm-0): xfs_da_do_buf: bno 0 dir: inode 3045868
Sep 10 10:01:00 server kernel: [379001.377011] XFS (dm-0): [00] br_startoff 0 br_startblock -2 br_blockcount 1 br_state 0
Sep 10 10:01:00 server kernel: [379001.377032] XFS (dm-0): Internal error xfs_da_do_buf(1) at line 2011 of file /build/buildd/linux-3.2.0/fs/xfs/xfs_da_btree.c. Caller 0xa01feeef
Sep 10 10:01:00 server kernel: [379001.377033]
Sep 10 10:01:00 server kernel: [379001.377069] Pid: 26624, comm: updatedb.mlocat Tainted: G C 3.2.0-30-generic #48-Ubuntu
Sep 10 10:01:00 server kernel: [379001.377071] Call Trace:
Sep 10 10:01:00 server kernel: [379001.377089] [a01cb6bf] xfs_error_report+0x3f/0x50 [xfs]
Sep 10 10:01:00 server kernel: [379001.377099] [a01feeef] ? xfs_da_reada_buf+0x2f/0x40 [xfs]
Sep 10 10:01:00 server kernel: [379001.377108] [a01fea12] xfs_da_do_buf+0x182/0x630 [xfs]
Sep 10 10:01:00 server kernel: [379001.377117] [a01feeef] xfs_da_reada_buf+0x2f/0x40 [xfs]
Sep 10 10:01:00 server kernel: [379001.377124] [a01cbdc8] xfs_dir_open+0x68/0x80 [xfs]
Sep 10 10:01:00 server kernel: [379001.377127] [81175bd0] __dentry_open+0x290/0x360
Sep 10 10:01:00 server kernel: [379001.377133] [a01cbd60] ? xfs_dir_fsync+0x110/0x110 [xfs]
Sep 10 10:01:00 server kernel: [379001.377136] [8129cdbc] ? security_inode_permission+0x1c/0x30
Sep 10 10:01:00 server kernel: [379001.377138] [8118389a] ? inode_permission+0x4a/0x110
Sep 10 10:01:00 server kernel: [379001.377139] [8117624d] vfs_open+0x3d/0x40
Sep 10 10:01:00 server kernel: [379001.377141] [81177130] nameidata_to_filp+0x40/0x50
Sep 10 10:01:00 server kernel: [379001.377143] [811860d8] do_last+0x3f8/0x730
Sep 10 10:01:00 server kernel: [379001.377144] [811877b1] path_openat+0xd1/0x3f0
Sep 10 10:01:00 server kernel: [379001.377146] [811830f5] ? putname+0x35/0x50
Sep 10 10:01:00 server kernel: [379001.377147] [81187b53] ? user_path_at_empty+0x63/0xa0
Sep 10 10:01:00 server kernel: [379001.377149] [81187bf2] do_filp_open+0x42/0xa0
Sep 10 10:01:00 server kernel: [379001.377152] [81319321] ? strncpy_from_user+0x31/0x40
Sep 10 10:01:00 server kernel: [379001.377153] [81182f3a] ? do_getname+0x10a/0x180
Sep 10 10:01:00 server kernel: [379001.377156] [8165a41e] ? _raw_spin_lock+0xe/0x20
Sep 10 10:01:00 server kernel: [379001.377158] [81194eb7] ? alloc_fd+0xf7/0x150
Sep 10 10:01:00 server kernel: [379001.377159] [8117722d] do_sys_open+0xed/0x220
Sep 10 10:01:00 server kernel: [379001.377161] [81177380] sys_open+0x20/0x30
Sep 10 10:01:00 server kernel: [379001.377163] [81662a02] system_call_fastpath+0x16/0x1b
Sep 10 10:01:00 server kernel: [379001.377170] BUG: unable to handle kernel paging request at 0108
Sep 10 10:01:00 server kernel: [379001.377197] IP: [81122869] file_ra_state_init+0x9/0x30
Sep 10 10:01:00 server kernel: [379001.377215] PGD 176937067 PUD 20eb89067 PMD 0
Sep 10 10:01:00 server kernel: [379001.377230] Oops: [#1] SMP
Sep 10 10:01:00 server kernel: [379001.377241] CPU 2
Sep 10 10:01:00 server kernel: [379001.377247] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 10:01:00 server kernel: [379001.377384]
Sep 10 10:01:00 server kernel: [379001.377390] Pid: 26624, comm: updatedb.mlocat Tainted: G C 3.2.0-30-generic #48-Ubuntu /DH67GD
Sep 10 10:01:00 server kernel: [379001.377419] RIP: 0010:[81122869] [81122869] file_ra_state_init+0x9/0x30
Sep 10 10:01:00 server kernel: [379001.377441] RSP: 0018:8801d6a35c98 EFLAGS: 00010206
Sep 10 10:01:00 server kernel: [379001.377454] RAX: 880073981bc5 RBX: 880157dde800 RCX: 0001
Sep 10 10:01:00 server kernel: [379001.377471] RDX:
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
NOTICE: This happened on the same machine as bug #1051689. After the machine suffered from #1051689, we tried to do a full backup of the machine onto ext4, which then also crashed due to a NULL pointer dereference. Maybe the underlying issue is a RAID/dm-crypt bug? Both the XFS and the ext4 filesystems were on RAID/dm-crypt.
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
In reply to bot comment #3: We will try to do that. We hope that apport-collect is not a GUI application, since the affected machine does not have an X server.

In reply to comment #4: Do you have an actual indication that the upstream kernel would fix this? In other words: Does its changelog contain something about XFS? The machine is a multi-user production machine. We CAN do some testing with it, but it needs to be justified.
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.6 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. Please only remove that one tag and leave the other tags. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as Confirmed.

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc5-quantal/

** Changed in: linux (Ubuntu)
Importance: Undecided => High

** Tags added: kernel-da-key

** Tags added: file-ra-state-init

** Tags added: needs-upstream-testing
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
After that, what happened very often is the following:

Sep 10 11:58:11 server kernel: [386031.913144] BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:35]
Sep 10 11:58:11 server kernel: [386031.913200] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 11:58:11 server kernel: [386031.913512] CPU 0
Sep 10 11:58:11 server kernel: [386031.913526] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 11:58:11 server kernel: [386031.921028]
Sep 10 11:58:11 server kernel: [386031.923600] Pid: 35, comm: kswapd0 Tainted: G D C 3.2.0-30-generic #48-Ubuntu /DH67GD
Sep 10 11:58:11 server kernel: [386031.926215] RIP: 0010:[8103dc4d] [8103dc4d] __ticket_spin_lock+0xd/0x30
Sep 10 11:58:11 server kernel: [386031.928810] RSP: 0018:88020f911b80 EFLAGS: 0286
Sep 10 11:58:11 server kernel: [386031.931399] RAX: ed7ded7d RBX: 88021f20ec40 RCX: 880073983d80
Sep 10 11:58:11 server kernel: [386031.933958] RDX: 88013113d740 RSI: 0001 RDI: 88013113d71c
Sep 10 11:58:11 server kernel: [386031.936490] RBP: 88020f911b80 R08: 0001 R09: dead00200200
Sep 10 11:58:11 server kernel: [386031.939047] R10: R11: dead00200200 R12:
Sep 10 11:58:11 server kernel: [386031.941586] R13: R14: 0020 R15: 8112a74f
Sep 10 11:58:11 server kernel: [386031.944133] FS: () GS:88021f20() knlGS:
Sep 10 11:58:11 server kernel: [386031.946716] CS: 0010 DS: ES: CR0: 8005003b
Sep 10 11:58:11 server kernel: [386031.949245] CR2: 7f6cd7267400 CR3: 01c05000 CR4: 000406f0
Sep 10 11:58:11 server kernel: [386031.951695] DR0: DR1: DR2:
Sep 10 11:58:11 server kernel: [386031.954102] DR3: DR6: 0ff0 DR7: 0400
Sep 10 11:58:11 server kernel: [386031.956484] Process kswapd0 (pid: 35, threadinfo 88020f91, task 88020f908000)
Sep 10 11:58:11 server kernel: [386031.958863] Stack:
Sep 10 11:58:11 server kernel: [386031.961214] 88020f911b90 8165a41e 88020f911c00 8118eadf
Sep 10 11:58:11 server kernel: [386031.963586] 88020c7f1000 880073983d80 88013113d740 88018d115600
Sep 10 11:58:11 server kernel: [386031.965963] 88020f911bd0 88013113d740 88020f911c30 8801765034dc
Sep 10 11:58:11 server kernel: [386031.968308] Call Trace:
Sep 10 11:58:11 server kernel: [386031.970613] [8165a41e] _raw_spin_lock+0xe/0x20
Sep 10 11:58:11 server kernel: [386031.972937] [8118eadf] shrink_dentry_list+0x4f/0x370
Sep 10 11:58:11 server kernel: [386031.975267] [8118f93a] prune_dcache_sb+0x15a/0x190
Sep 10 11:58:11 server kernel: [386031.977579] [8117b083] prune_super+0xe3/0x1a0
Sep 10 11:58:11 server kernel: [386031.979859] [81129834] shrink_slab+0x154/0x310
Sep 10 11:58:11 server kernel: [386031.982124] [8112cb3a] balance_pgdat+0x50a/0x6d0
Sep 10 11:58:11 server kernel: [386031.984401] [8112ce21] kswapd+0x121/0x210
Sep 10 11:58:11 server kernel: [386031.986658] [8112cd00] ? balance_pgdat+0x6d0/0x6d0
Sep 10 11:58:11 server kernel: [386031.988796] [8108a03c] kthread+0x8c/0xa0
Sep 10 11:58:11 server kernel: [386031.991028] [81664b74] kernel_thread_helper+0x4/0x10
Sep 10 11:58:11 server kernel: [386031.993263] [81089fb0] ? flush_kthread_worker+0xa0/0xa0
Sep 10 11:58:11 server kernel: [386031.995473] [81664b70] ? gs_change+0x13/0x13
Sep 10 11:58:11 server kernel: [386031.997681] Code: c1 51 da 03 81 48 c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 13 66 0f 1f 84 00 00 00 00 00 f3 90
Sep 10 11:58:11 server kernel: [386032.002613] Call Trace:
Sep 10 11:58:11 server kernel: [386032.005004] [8165a41e] _raw_spin_lock+0xe/0x20
Sep 10 11:58:11 server kernel: [386032.007426] [8118eadf] shrink_dentry_list+0x4f/0x370
Sep 10 11:58:11 server kernel: [386032.009852] [8118f93a] prune_dcache_sb+0x15a/0x190
Sep 10 11:58:11 server
[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure
** Package changed: ubuntu => linux (Ubuntu)