[Kernel-packages] [Bug 1614565] panic dmesg from chig5/Trusty

2016-10-13 Thread bugproxy
--- Comment (attachment only) From ru...@us.ibm.com 2016-09-28 14:08 EDT---


** Attachment added: "panic dmesg from chig5/Trusty"
   https://bugs.launchpad.net/bugs/1614565/+attachment/4760440/+files/dmesg.201609281035

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1614565

Title:
  ISST-LTE:pKVM311:lotg5:Ubutu16041:lotg5 crashed @
  writeback_sb_inodes+0x30c/0x590

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  == Comment: #0 - PRIYA M. A  - 2016-06-17 10:01:28 ==
  Problem Description:
  
  - lotg5 crashed at writeback_sb_inodes+0x30c/0x590

  Steps to re-create:
  ==
  - Install lotg5 with Ubuntu16041(4.4.0-24-generic)
  - Start the regression tests in lotg5
  Logs:
  
  root@lotg5:~# show.report.py
  HOSTNAME  KERNEL VERSION    DISTRO INFO
  --------  --------------    -----------
  lotg5     4.4.0-24-generic  Ubuntu 16.04 LTS \n \l

  Current Time: Fri Jun 17 01:10:46 2016
  Job-ID  FOCUS  Start-Time         Duration                Function
  ------  -----  ----------         --------                --------
  1       BASE   20160614-05:50:19  67.0 hr(s) 20.0 min(s)  Test
  2       IO     20160614-05:50:26  67.0 hr(s) 20.0 min(s)  IO_Focus
  3       NFS    20160614-06:24:35  66.0 hr(s) 46.0 min(s)  DistributeFS_Testing
  4       TCP    20160614-06:32:03  66.0 hr(s) 38.0 min(s)  networkTest2lotg3

  FOCUS   BASE    IO      NFS     TCP     SUM
  TOTAL   48647   1825    517     82690   133679
  FAIL    5028    0       0       24      5052
  PASS    43619   1825    517     82666   128627
  (%)     (89%)   (100%)  (100%)  (99%)   (96%)

  DLPAR is not tested!
  root@lotg5:~#
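
  The summary table above reads consistently only as five columns (BASE, IO,
  NFS, TCP, SUM). The following is a small sanity check of that reading, not
  part of the original report: the variable and function names are
  illustrative, and the use of integer truncation for the (%) row is an
  assumption that happens to reproduce the reported figures.

```python
# Values as read from the summary table, split into BASE, IO, NFS, TCP, SUM.
FOCUS = ["BASE", "IO", "NFS", "TCP"]
total = {"BASE": 48647, "IO": 1825, "NFS": 517, "TCP": 82690, "SUM": 133679}
fail = {"BASE": 5028, "IO": 0, "NFS": 0, "TCP": 24, "SUM": 5052}
passed = {"BASE": 43619, "IO": 1825, "NFS": 517, "TCP": 82666, "SUM": 128627}


def row_is_consistent(row):
    """True if the four focus columns add up to the SUM column."""
    return sum(row[name] for name in FOCUS) == row["SUM"]


def pass_percent(column):
    """Pass rate for one column, truncated to a whole percent (assumption)."""
    return passed[column] * 100 // total[column]


if __name__ == "__main__":
    for label, row in (("TOTAL", total), ("FAIL", fail), ("PASS", passed)):
        print(label, "row consistent:", row_is_consistent(row))
    for column in FOCUS + ["SUM"]:
        # PASS + FAIL should equal TOTAL in every column.
        balanced = passed[column] + fail[column] == total[column]
        print(column, "balanced:", balanced, "pass %:", pass_percent(column))
```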

  - After 65+ hr of execution lotg5 crashed with the following call traces
  Logs:
  
  [root@lotkvm ~]# virsh console lotg5
  Connected to domain lotg5
  Escape character is ^]

  0:mon> c
  cpus stopped: 0x0 0x4 0x8 0xc
  0:mon> d
      ||
  0:mon> e
  cpu 0x0: Vector: 300 (Data Access) at [c000c4f4b620]
      pc: c0323720: locked_inode_to_wb_and_lock_list+0x50/0x290
      lr: c0326dbc: writeback_sb_inodes+0x30c/0x590
      sp: c000c4f4b8a0
     msr: 80019033
     dar: 0
   dsisr: 4000
    current = 0xc0017191cf60
    paca    = 0xc7b4   softe: 0   irq_happened: 0x01
      pid   = 5792, comm = kworker/u32:5
  0:mon> t
  [c000c4f4b900] c0326dbc writeback_sb_inodes+0x30c/0x590
  [c000c4f4ba10] c0327124 __writeback_inodes_wb+0xe4/0x150
  [c000c4f4ba70] c032758c wb_writeback+0x30c/0x450
  [c000c4f4bb40] c032803c wb_workfn+0x14c/0x570
  [c000c4f4bc50] c00dd1d0 process_one_work+0x1e0/0x5a0
  [c000c4f4bce0] c00dd724 worker_thread+0x194/0x680
  [c000c4f4bd80] c00e61e0 kthread+0x110/0x130
  [c000c4f4be30] c0009538 ret_from_kernel_thread+0x5c/0xa4
  --- Exception: 0  at 
  0:mon>

  
  == Comment: #4 - Chandan Kumar  - 2016-06-20 06:23:33 ==
  dmesg log:
  -
  [251403.003999] EXT4-fs (loop0): mounted filesystem without journal. Opts: (null)
  [251403.471118] Unable to handle kernel paging request for data at address 0x
  [251403.473391] Faulting instruction address: 0xc0323720  <<  PC
  -

  0:mon> di c0323720
  c0323720  e93f      ld      r9,0(r31)
  // [R31 = , trying to de-reference null address]
  c0323724  39290050  addi    r9,r9,80
  c0323728  7fbf4840  cmpld   cr7,r31,r9
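
  As a cross-check of the listing above, a complete 32-bit instruction word can
  be split into its PowerPC D-form fields by hand. This is a minimal sketch and
  not part of the original report; the helper name decode_dform is
  hypothetical. Only the addi word is decoded here, because the word for the
  faulting ld appears truncated in the listing.

```python
def decode_dform(word):
    """Split a 32-bit PowerPC D-form instruction word into its fields.

    Returns (primary_opcode, rt, ra, signed_immediate).
    """
    opcode = (word >> 26) & 0x3F   # bits 0-5: primary opcode
    rt = (word >> 21) & 0x1F       # bits 6-10: target register
    ra = (word >> 16) & 0x1F       # bits 11-15: source/base register
    imm = word & 0xFFFF            # bits 16-31: 16-bit immediate
    if imm & 0x8000:               # sign-extend the immediate
        imm -= 0x10000
    return opcode, rt, ra, imm


if __name__ == "__main__":
    # 0x39290050 is the word shown at c0323724 in the listing.
    opcode, rt, ra, imm = decode_dform(0x39290050)
    # Primary opcode 14 is addi, so this reads as: addi r9,r9,80
    print(opcode, rt, ra, imm)
```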

  

  Dominic,

  Can you please take a look and assign this to a suitable developer?

  Thanks,
  Chandan

  == Comment: #6 - Laurent Dufour  - 2016-06-20 13:03:15 ==
  It sounds like inode->i_wb has been cleared while waiting for IO to be dropped in writeback_sb_inodes().

  That needs to be double-checked...
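
  The suspected sequence can be illustrated with a toy, single-threaded model.
  This is only a sketch of the hypothesis in the comment above, not kernel
  code: the classes Wb and Inode and the functions detach, relock and
  writeback_one are hypothetical stand-ins, and the real locking is far more
  involved.

```python
class Wb:
    """Stand-in for a writeback structure that owns a list lock."""

    def __init__(self):
        self.list_lock_held = False


class Inode:
    """Stand-in for an inode that points at its writeback structure."""

    def __init__(self, wb):
        self.i_wb = wb


def detach(inode):
    """Models another code path clearing inode->i_wb."""
    inode.i_wb = None


def relock(inode):
    """Re-reads i_wb and takes its list lock, as the faulting helper would."""
    wb = inode.i_wb
    if wb is None:
        # In the kernel there is no such check: the NULL pointer is used.
        raise RuntimeError("i_wb is NULL: the kernel would oops here")
    wb.list_lock_held = True
    return wb


def writeback_one(inode, during_io):
    """Runs 'during_io' in the window where locks are dropped, then relocks."""
    during_io(inode)
    return relock(inode)


if __name__ == "__main__":
    ok = writeback_one(Inode(Wb()), lambda inode: None)
    print("no interference, lock held:", ok.list_lock_held)
    try:
        writeback_one(Inode(Wb()), detach)
    except RuntimeError as err:
        print("with detach in the window:", err)
```

  In this model the failure appears only when detach runs inside the window,
  which matches the observation that the guest ran for 65+ hours before
  crashing.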

  == Comment: #10 - Laurent Dufour  - 2016-06-21 05:11:35 ==
  That seems to be an already known issue introduced by commit 43d1c0eb7e11 "block: detach bdev inode from its wb in __blkdev_put()".

  A patch has been posted to the LKML, but the discussion about it is still ongoing:
  https://patchwork.kernel.org/patch/9184495/
  https://lkml.org/lkml/2016/6/17/676

  == Comment: #13 - Laurent Dufour  - 2016-06-22 03:29:00 ==
  It appears that the right way to fix this would be https://patchwork.kernel.org/patch/9187409/.

  I may build a patched Ubuntu kernel on your node, and you may then restart the test.
  Do 

[Kernel-packages] [Bug 1614565] panic dmesg from chig5/Trusty

2016-09-28 Thread bugproxy
--- Comment (attachment only) From ru...@us.ibm.com 2016-09-28 14:08 EDT---


** Attachment added: "panic dmesg from chig5/Trusty"
   https://bugs.launchpad.net/bugs/1614565/+attachment/4750345/+files/dmesg.201609281035
