[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-11-24 Thread Launchpad Bug Tracker
[Expired for linux (Ubuntu) because there has been no activity for 60
days.]

** Changed in: linux (Ubuntu)
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1049267

Title:
  XFS corruption on machine which never suffered a hard reset or disk
  failure

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1049267/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-25 Thread xor
I was going to install the latest mainline kernel. HOWEVER:

- dpkg-sig --list shows that the packages contain no signatures at all.
- Further, there don't seem to be any signature files on the webserver [0].
- The webserver does not accept HTTPS connections.

I already dislike installing a release-candidate kernel on a production
machine; the fact that the packages are not even signed makes it unacceptable.
Please provide signed packages.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc7-quantal/



[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-24 Thread xor
I have now done the following:
- I put the disks of an affected machine (not the original one in this bug
  report) into a Debian 6 machine which has been running rock-solid with XFS
  for years.
- I used a script of my own to generate a checksum and file-date listing of
  ALL files (~2.5TB) on the disks under Debian 6.
- I then used a USB stick with Ubuntu 12.04 to run xfs_repair on the affected
  XFS.
- After the repair finished, I put the disks back into the Debian 6 machine
  and generated the checksum / file-date listing again.
- I diffed the pre-repair and post-repair checksums and file dates. They are
  absolutely identical.
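
For reference, a minimal sketch of the kind of checksum / file-date listing
described in the steps above. The script actually used was not posted, so the
function names (file_sha256, build_listing, diff_listings), the choice of
SHA-256 and the listing format are all assumptions made for illustration.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_listing(root):
    """Return sorted 'sha256  mtime  relative/path' lines for regular files."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are hashed.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            mtime = int(os.stat(path).st_mtime)
            rel = os.path.relpath(path, root)
            lines.append(f"{file_sha256(path)}  {mtime}  {rel}")
    return sorted(lines)


def diff_listings(before, after):
    """Return (lines only in 'before', lines only in 'after')."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)
```

Calling build_listing() on the mount point before and after the repair and
passing both results to diff_listings() yields two empty lists when contents
and modification times are unchanged, which corresponds to the "absolutely
identical" result reported above.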

Conclusion:
The Debian machine did not complain about corruption while generating the
checksums, and the checksums were not affected by the repair. Maybe this shows
that there is no actual physical corruption and that it was rather a crash bug?

I will put the affected machine back into operation with a 3.6 kernel as
requested.
HOWEVER, I should say that it took multiple weeks of operation until the issue
first happened, so I don't think that testing with 3.6 will prove or disprove
anything anytime soon. I think you guys should read the changelogs of the
kernels or actually look at the stack trace and see what happened :|
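
As an aid for reading the stack trace from the original report, the frames can
be pulled out of the syslog lines. The following is only a sketch: it assumes
each log entry sits on a single line (the mail archive wrapped them) and that
frames look like the ones quoted in this bug. FRAME_RE and parse_frames are
hypothetical names, not part of any kernel tooling.

```python
import re

# Matches the frame part of a call-trace line as quoted in this report, e.g.
#   "[379001.377099]  [a01feeef] ? xfs_da_reada_buf+0x2f/0x40 [xfs]"
# Angle brackets around the address are optional (raw dmesg output has them).
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+"                  # kernel timestamp
    r"\[<?([0-9a-f]+)>?\]\s+"              # address
    r"(\?\s+)?"                            # '?' = unwinder is unsure of frame
    r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"   # symbol+offset/size
    r"(?:\s+\[(\w+)\])?"                   # optional module name
)

# Four lines copied from the trace in the bug description.
SAMPLE = """\
Sep 10 10:01:00 server kernel: [379001.377089]  [a01cb6bf] xfs_error_report+0x3f/0x50 [xfs]
Sep 10 10:01:00 server kernel: [379001.377099]  [a01feeef] ? xfs_da_reada_buf+0x2f/0x40 [xfs]
Sep 10 10:01:00 server kernel: [379001.377108]  [a01fea12] xfs_da_do_buf+0x182/0x630 [xfs]
Sep 10 10:01:00 server kernel: [379001.377127]  [81175bd0] __dentry_open+0x290/0x360
"""


def parse_frames(log_text):
    """Return one dict per call-trace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            _address, unsure, symbol, module = match.groups()
            frames.append({
                "symbol": symbol,
                "module": module or "kernel",
                "reliable": unsure is None,
            })
    return frames


if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        marker = " " if frame["reliable"] else "?"
        print(f"{marker} {frame['module']:>6}  {frame['symbol']}")
```

Filtering on "reliable" leaves the frames without a question mark, which in
the reported trace run from sys_open through xfs_dir_open and
xfs_da_reada_buf into xfs_da_do_buf, where the XFS internal error is raised.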



[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-17 Thread Joseph Salisbury
As requested in comment #4, it would be helpful to know whether this bug,
as well as bug 1051689, also exists upstream.  There is no indication that
this specific issue is already fixed upstream, but testing the mainline
kernel will prove or disprove that.



[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-16 Thread xor
apport information

** Tags added: apport-collected staging

** Description changed:

  Using Ubuntu 12.04 server, we installed a machine using the following
  disk layout:
  XFS => dm-crypt => RAID5.
  
  A *complete* list of ALL configuration of the machine including the
  setup can be provided if you need it, we documented everything.
  
  The harddisks are tested weekly with a full SMART test and they are okay.
  The machine is attached to a UPS and therefore never suffered a hard reset.
  Also, the memory was tested with memtest86+.
  
  Nevertheless, the kernel reports XFS problems:
  
  Sep 10 10:01:00 server kernel: [379001.376989] XFS (dm-0): xfs_da_do_buf: bno 0 dir: inode 3045868
  Sep 10 10:01:00 server kernel: [379001.377011] XFS (dm-0): [00] br_startoff 0 br_startblock -2 br_blockcount 1 br_state 0
  Sep 10 10:01:00 server kernel: [379001.377032] XFS (dm-0): Internal error xfs_da_do_buf(1) at line 2011 of file /build/buildd/linux-3.2.0/fs/xfs/xfs_da_btree.c.  Caller 0xa01feeef
  Sep 10 10:01:00 server kernel: [379001.377033] 
  Sep 10 10:01:00 server kernel: [379001.377069] Pid: 26624, comm: updatedb.mlocat Tainted: G C   3.2.0-30-generic #48-Ubuntu
  Sep 10 10:01:00 server kernel: [379001.377071] Call Trace:
  Sep 10 10:01:00 server kernel: [379001.377089]  [a01cb6bf] xfs_error_report+0x3f/0x50 [xfs]
  Sep 10 10:01:00 server kernel: [379001.377099]  [a01feeef] ? xfs_da_reada_buf+0x2f/0x40 [xfs]
  Sep 10 10:01:00 server kernel: [379001.377108]  [a01fea12] xfs_da_do_buf+0x182/0x630 [xfs]
  Sep 10 10:01:00 server kernel: [379001.377117]  [a01feeef] xfs_da_reada_buf+0x2f/0x40 [xfs]
  Sep 10 10:01:00 server kernel: [379001.377124]  [a01cbdc8] xfs_dir_open+0x68/0x80 [xfs]
  Sep 10 10:01:00 server kernel: [379001.377127]  [81175bd0] __dentry_open+0x290/0x360
  Sep 10 10:01:00 server kernel: [379001.377133]  [a01cbd60] ? xfs_dir_fsync+0x110/0x110 [xfs]
  Sep 10 10:01:00 server kernel: [379001.377136]  [8129cdbc] ? security_inode_permission+0x1c/0x30
  Sep 10 10:01:00 server kernel: [379001.377138]  [8118389a] ? inode_permission+0x4a/0x110
  Sep 10 10:01:00 server kernel: [379001.377139]  [8117624d] vfs_open+0x3d/0x40
  Sep 10 10:01:00 server kernel: [379001.377141]  [81177130] nameidata_to_filp+0x40/0x50
  Sep 10 10:01:00 server kernel: [379001.377143]  [811860d8] do_last+0x3f8/0x730
  Sep 10 10:01:00 server kernel: [379001.377144]  [811877b1] path_openat+0xd1/0x3f0
  Sep 10 10:01:00 server kernel: [379001.377146]  [811830f5] ? putname+0x35/0x50
  Sep 10 10:01:00 server kernel: [379001.377147]  [81187b53] ? user_path_at_empty+0x63/0xa0
  Sep 10 10:01:00 server kernel: [379001.377149]  [81187bf2] do_filp_open+0x42/0xa0
  Sep 10 10:01:00 server kernel: [379001.377152]  [81319321] ? strncpy_from_user+0x31/0x40
  Sep 10 10:01:00 server kernel: [379001.377153]  [81182f3a] ? do_getname+0x10a/0x180
  Sep 10 10:01:00 server kernel: [379001.377156]  [8165a41e] ? _raw_spin_lock+0xe/0x20
  Sep 10 10:01:00 server kernel: [379001.377158]  [81194eb7] ? alloc_fd+0xf7/0x150
  Sep 10 10:01:00 server kernel: [379001.377159]  [8117722d] do_sys_open+0xed/0x220
  Sep 10 10:01:00 server kernel: [379001.377161]  [81177380] sys_open+0x20/0x30
  Sep 10 10:01:00 server kernel: [379001.377163]  [81662a02] system_call_fastpath+0x16/0x1b
  Sep 10 10:01:00 server kernel: [379001.377170] BUG: unable to handle kernel paging request at 0108
  Sep 10 10:01:00 server kernel: [379001.377197] IP: [81122869] file_ra_state_init+0x9/0x30
  Sep 10 10:01:00 server kernel: [379001.377215] PGD 176937067 PUD 20eb89067 PMD 0 
  Sep 10 10:01:00 server kernel: [379001.377230] Oops:  [#1] SMP 
  Sep 10 10:01:00 server kernel: [379001.377241] CPU 2 
  Sep 10 10:01:00 server kernel: [379001.377247] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
  Sep 10 10:01:00 server kernel: [379001.377384] 
  Sep 10 10:01:00 server kernel: [379001.377390] Pid: 26624, comm: updatedb.mlocat Tainted: G C   3.2.0-30-generic #48-Ubuntu  /DH67GD
  Sep 10 10:01:00 server kernel: [379001.377419] RIP: 0010:[81122869]  [81122869] file_ra_state_init+0x9/0x30
  Sep 10 10:01:00 server kernel: [379001.377441] RSP: 0018:8801d6a35c98  EFLAGS: 00010206
  Sep 10 10:01:00 server kernel: [379001.377454] RAX: 880073981bc5 RBX: 880157dde800 RCX: 0001
  Sep 10 10:01:00 server kernel: [379001.377471] RDX: 

[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-16 Thread xor
NOTICE: This happened on the same machine as bug #1051689. After the machine
suffered from #1051689, we tried to do a full backup of the machine onto ext4,
which then also crashed due to a NULL pointer dereference.
Maybe the underlying issue is a RAID/dm-crypt bug? Both the XFS and the ext4
filesystems were on RAID/dm-crypt.



[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-14 Thread xor
In reply to bot comment #3:
We will try to do that. We hope that apport-collect is not a GUI
application, since the affected machine does not have an X server.

In reply to comment #4:
Do you have an actual indication that the upstream kernel would fix this? In
other words: Does its changelog contain something about XFS? The machine is a
multi-user production machine. We CAN do some testing with it, but it needs to
be justified.



[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-12 Thread Joseph Salisbury
Would it be possible for you to test the latest upstream kernel?  Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v3.6 kernel[0] (Not a kernel in the daily directory) and install both
the linux-image and linux-image-extra .deb packages.

Once you've tested the upstream kernel, please remove the
'needs-upstream-testing' tag.  Please only remove that one tag and leave the
other tags. This can be done by clicking on the yellow pencil icon next to
the tag located at the bottom of the bug description and deleting the
'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example because it will
not boot, please add the tag: 'kernel-unable-to-test-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
Confirmed.


Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc5-quantal/

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Tags added: kernel-da-key

** Tags added: file-ra-state-init

** Tags added: needs-upstream-testing



[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-11 Thread xor
After that, what happened very often is the following:

Sep 10 11:58:11 server kernel: [386031.913144] BUG: soft lockup - CPU#0 stuck for 23s! [kswapd0:35]
Sep 10 11:58:11 server kernel: [386031.913200] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 11:58:11 server kernel: [386031.913512] CPU 0 
Sep 10 11:58:11 server kernel: [386031.913526] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage uas nfsd nfs lockd fscache binfmt_misc auth_rpcgss nfs_acl sunrpc psmouse joydev serio_raw mei(C) mac_hid lp parport xfs dm_crypt raid10 raid0 multipath linear aesni_intel cryptd aes_x86_64 usbhid hid raid1 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx i915 drm_kms_helper drm i2c_algo_bit video e1000e
Sep 10 11:58:11 server kernel: [386031.921028] 
Sep 10 11:58:11 server kernel: [386031.923600] Pid: 35, comm: kswapd0 Tainted: G  D  C   3.2.0-30-generic #48-Ubuntu  /DH67GD
Sep 10 11:58:11 server kernel: [386031.926215] RIP: 0010:[8103dc4d]  [8103dc4d] __ticket_spin_lock+0xd/0x30
Sep 10 11:58:11 server kernel: [386031.928810] RSP: 0018:88020f911b80  EFLAGS: 0286
Sep 10 11:58:11 server kernel: [386031.931399] RAX: ed7ded7d RBX: 88021f20ec40 RCX: 880073983d80
Sep 10 11:58:11 server kernel: [386031.933958] RDX: 88013113d740 RSI: 0001 RDI: 88013113d71c
Sep 10 11:58:11 server kernel: [386031.936490] RBP: 88020f911b80 R08: 0001 R09: dead00200200
Sep 10 11:58:11 server kernel: [386031.939047] R10:  R11: dead00200200 R12: 
Sep 10 11:58:11 server kernel: [386031.941586] R13:  R14: 0020 R15: 8112a74f
Sep 10 11:58:11 server kernel: [386031.944133] FS:  () GS:88021f20() knlGS:
Sep 10 11:58:11 server kernel: [386031.946716] CS:  0010 DS:  ES:  CR0: 8005003b
Sep 10 11:58:11 server kernel: [386031.949245] CR2: 7f6cd7267400 CR3: 01c05000 CR4: 000406f0
Sep 10 11:58:11 server kernel: [386031.951695] DR0:  DR1:  DR2: 
Sep 10 11:58:11 server kernel: [386031.954102] DR3:  DR6: 0ff0 DR7: 0400
Sep 10 11:58:11 server kernel: [386031.956484] Process kswapd0 (pid: 35, threadinfo 88020f91, task 88020f908000)
Sep 10 11:58:11 server kernel: [386031.958863] Stack:
Sep 10 11:58:11 server kernel: [386031.961214]  88020f911b90 8165a41e 88020f911c00 8118eadf
Sep 10 11:58:11 server kernel: [386031.963586]  88020c7f1000 880073983d80 88013113d740 88018d115600
Sep 10 11:58:11 server kernel: [386031.965963]  88020f911bd0 88013113d740 88020f911c30 8801765034dc
Sep 10 11:58:11 server kernel: [386031.968308] Call Trace:
Sep 10 11:58:11 server kernel: [386031.970613]  [8165a41e] _raw_spin_lock+0xe/0x20
Sep 10 11:58:11 server kernel: [386031.972937]  [8118eadf] shrink_dentry_list+0x4f/0x370
Sep 10 11:58:11 server kernel: [386031.975267]  [8118f93a] prune_dcache_sb+0x15a/0x190
Sep 10 11:58:11 server kernel: [386031.977579]  [8117b083] prune_super+0xe3/0x1a0
Sep 10 11:58:11 server kernel: [386031.979859]  [81129834] shrink_slab+0x154/0x310
Sep 10 11:58:11 server kernel: [386031.982124]  [8112cb3a] balance_pgdat+0x50a/0x6d0
Sep 10 11:58:11 server kernel: [386031.984401]  [8112ce21] kswapd+0x121/0x210
Sep 10 11:58:11 server kernel: [386031.986658]  [8112cd00] ? balance_pgdat+0x6d0/0x6d0
Sep 10 11:58:11 server kernel: [386031.988796]  [8108a03c] kthread+0x8c/0xa0
Sep 10 11:58:11 server kernel: [386031.991028]  [81664b74] kernel_thread_helper+0x4/0x10
Sep 10 11:58:11 server kernel: [386031.993263]  [81089fb0] ? flush_kthread_worker+0xa0/0xa0
Sep 10 11:58:11 server kernel: [386031.995473]  [81664b70] ? gs_change+0x13/0x13
Sep 10 11:58:11 server kernel: [386031.997681] Code: c1 51 da 03 81 48 c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 13 66 0f 1f 84 00 00 00 00 00 f3 90 
Sep 10 11:58:11 server kernel: [386032.002613] Call Trace:
Sep 10 11:58:11 server kernel: [386032.005004]  [8165a41e] _raw_spin_lock+0xe/0x20
Sep 10 11:58:11 server kernel: [386032.007426]  [8118eadf] shrink_dentry_list+0x4f/0x370
Sep 10 11:58:11 server kernel: [386032.009852]  [8118f93a] prune_dcache_sb+0x15a/0x190
Sep 10 11:58:11 server 

[Bug 1049267] Re: XFS corruption on machine which never suffered a hard reset or disk failure

2012-09-11 Thread Edward Donovan
** Package changed: ubuntu => linux (Ubuntu)
