** Description changed:

  [Impact]
  
  A node that mounts a Ceph RBD volume panics when the OSD daemons on
  all Ceph nodes are restarted.
  
- [642981.871592] ------------[ cut here ]------------ 
+ [642981.871592] ------------[ cut here ]------------
  [642981.912255] kernel BUG at
- /build/buildd/linux-3.13.0/net/ceph/osd_client.c:892! 
- [642981.994517] invalid opcode: 0000 [#1] SMP 
+ /build/buildd/linux-3.13.0/net/ceph/osd_client.c:892!
+ [642981.994517] invalid opcode: 0000 [#1] SMP
  [642982.037227] Modules linked in: xt_multiport iptable_mangle xt_nat
  xt_tcpudp veth xfs rbd libceph libcrc32c xt_addrtype xt_conntrack
  ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
  iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge aufs
  ipmi_devintf joydev gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp
  kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
  aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd hid_generic mei_me
  ioatdma mei lpc_ich wmi ipmi_si 8021q garp stp mrp llc bonding
  acpi_power_meter mac_hid lp parport ixgbe usbhid dca tg3 ahci libahci hid
- ptp megaraid_sas mdio pps_core 
+ ptp megaraid_sas mdio pps_core
  [642982.528519] CPU: 0 PID: 1062099 Comm: kworker/0:6 Not tainted
- 3.13.0-45-generic #74-Ubuntu 
- [642982.648057] Hardware name: NEC Express5800/R120f-1M 
- [N8100-2203Y]/MS-S0901, BIOS 5.0.4016 12/17/2014 
- [642982.775433] Workqueue: ceph-msgr con_work [libceph] 
+ 3.13.0-45-generic #74-Ubuntu
+ [642982.648057] Hardware name: NEC Express5800/R120f-1M
+ [N8100-2203Y]/MS-S0901, BIOS 5.0.4016 12/17/2014
+ [642982.775433] Workqueue: ceph-msgr con_work [libceph]
  [642982.841300] task: ffff881028444800 ti: ffff880d92374000 task.ti:
- ffff880d92374000 
+ ffff880d92374000
  [642982.973255] RIP: 0010:[<ffffffffa025f5be>] [<ffffffffa025f5be>]
- osd_reset+0x22e/0x2c0 [libceph] 
- [642983.114484] RSP: 0018:ffff880d92375d80 EFLAGS: 00010283 
+ osd_reset+0x22e/0x2c0 [libceph]
+ [642983.114484] RSP: 0018:ffff880d92375d80 EFLAGS: 00010283
  [642983.188540] RAX: ffff8800197f2ca8 RBX: ffff882028194750 RCX:
- ffff880036bcdc48 
+ ffff880036bcdc48
  [642983.334096] RDX: ffff8800197f2ca8 RSI: ffff8800197f2c10 RDI:
- 0000000000000286 
+ 0000000000000286
  [642983.485552] RBP: ffff880d92375dd8 R08: 0000000000000000 R09:
- 0000000000000000 
+ 0000000000000000
  [642983.643277] R10: ffffffff8160afcf R11: ffffea00710cae00 R12:
- ffff8800197f2c58 
+ ffff8800197f2c58
  [642983.805364] R13: ffff882028194810 R14: ffff880036bcdbf8 R15:
- ffff880036bcdc18 
+ ffff880036bcdc18
  [642983.968728] FS: 0000000000000000(0000) GS:ffff88103fa00000(0000)
- knlGS:0000000000000000 
- [642984.135368] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
+ knlGS:0000000000000000
+ [642984.135368] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [642984.220577] CR2: 00007f60d4cb7868 CR3: 0000000001c0e000 CR4:
- 00000000001407f0 
- [642984.383051] Stack: 
+ 00000000001407f0
+ [642984.383051] Stack:
  [642984.459809] ffff8820281947a8 ffff882028194760 ffff8800197f2800
- ffff8800197f2ca8 
+ ffff8800197f2ca8
  [642984.618038] ffff880d92375da0 ffff880d92375da0 ffff8800197f2c10
- ffff8800197f2830 
+ ffff8800197f2830
  
  [Fix]
  
  A linked list used to manage OSDs in the kernel was corrupted when
- all OSD daemons on all Ceph nodes were restarted at almost the same time. 
+ all OSD daemons on all Ceph nodes were restarted at almost the same time.
  
  The issue should be fixed by the following upstream change.
  
- libceph: must use new tid when watch is resent 
- http://tracker.ceph.com/issues/8806 
+ libceph: must use new tid when watch is resent
+ http://tracker.ceph.com/issues/8806
  
  It consists of two patches, both of which have already been released.
  
- http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20878 
- [PATCH 1/2] libceph: abstract out ceph_osd_request enqueue logic 
- [PATCH 2/2] libceph: resend lingering requests with a new tid 
+ http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20878
+ [PATCH 1/2] libceph: abstract out ceph_osd_request enqueue logic
+ [PATCH 2/2] libceph: resend lingering requests with a new tid
  
  The 3.18 kernel includes both fixes.
  
- libceph: abstract out ceph_osd_request enqueue logic 
- https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f671b581f1dac61354186b7373af5f97fe420584
- libceph: resend lingering requests with a new tid 
- https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2cc6128ab2afff7864dbdc33a73e2deaa935d9e0
+ libceph: abstract out ceph_osd_request enqueue logic
+ https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f671b581f1dac61354186b7373af5f97fe420584
+ libceph: resend lingering requests with a new tid
+ https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2cc6128ab2afff7864dbdc33a73e2deaa935d9e0
  
  [Test Case]
  
  After setting up the Ceph environment, repeatedly issue the following
- command from one node against all Ceph nodes. 
+ command from one node against all Ceph nodes.
  
  rsh -i key -l ubuntu sn_hostname sudo service ceph-all restart
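
The restart command above can be wrapped in a loop so that every node is restarted at almost the same time, for several rounds. This is only a minimal sketch: the function name `restart_all_osds`, the `RSH_CMD` variable, the round count, and the `ceph-node*` hostnames are assumptions for illustration, while the `rsh` arguments are taken from the command above.

```shell
#!/bin/sh
# Sketch: restart the Ceph daemons on every node, several rounds in a row.
# RSH_CMD is overridable so the loop can be dry-run, e.g. RSH_CMD=echo.
RSH_CMD="${RSH_CMD:-rsh}"

restart_all_osds() {
    rounds="$1"; shift
    i=1
    while [ "$i" -le "$rounds" ]; do
        for host in "$@"; do
            # Same command as in the test case; run in the background so
            # all nodes restart at almost the same time.
            "$RSH_CMD" -i key -l ubuntu "$host" sudo service ceph-all restart &
        done
        # Wait for every restart in this round before starting the next.
        wait
        i=$((i + 1))
    done
}

# Example (hypothetical hostnames):
# restart_all_osds 10 ceph-node1 ceph-node2 ceph-node3
```

Setting `RSH_CMD=echo` prints the commands instead of running them, which is a cheap way to check the host list before touching a real cluster.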
  
  Then verify whether any node panics.
+ 
+ A test kernel with this fix was verified to resolve the problem.
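
Checking for the panic can also be scripted by searching a captured kernel log for the `osd_client.c` BUG line shown in the trace. This is a rough sketch that assumes the log has been saved to a file (for example from `dmesg` if the node is still usable, or from a serial console); the function name `has_osd_client_bug` and the log file name are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a captured kernel log contains the BUG from this report.
# The log path is supplied by the caller; how the log is captured is up to the tester.
has_osd_client_bug() {
    log_file="$1"
    # The trace wraps "kernel BUG at" and the path onto separate lines,
    # so match only the source path and line-number part.
    grep -q 'net/ceph/osd_client\.c:[0-9][0-9]*!' "$log_file"
}

# Example (hypothetical file name):
# dmesg > after-restart.log
# if has_osd_client_bug after-restart.log; then echo "panic reproduced"; fi
```

The pattern accepts any line number rather than only 892, since the line may differ between kernel builds.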

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1488035

Title:
  OSDs Linked list corruption causes kernel BUG at
  /build/buildd/linux-3.13.0/net/ceph/osd_client.c:892!
