[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-06-11 Thread Manoj Iyer
** Changed in: linux (Ubuntu Artful)
   Status: Incomplete => Won't Fix

** Changed in: ubuntu-power-systems
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757402

Title:
  Ubuntu18.04:pKVM - Host  in hung state and out of network after few
  hours of stress run on all guests

Status in The Ubuntu-power-systems project:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Artful:
  Won't Fix
Status in linux source package in Bionic:
  Fix Released

Bug description:
  == Comment: #0 - INDIRA P. JOGA - 2018-02-11 12:37:25 ==
  Problem Description:
  ===
  After a few hours of the stress run, the host is in a hung state, with
  "rcu_sched detected stalls on CPUs/tasks" messages on the host IPMI
  console, and the host has dropped off the network.

  Steps to re-create:
  ==
  > Installed Ubuntu1804 on boslcp3 host.

  root@boslcp3:~# uname -a
  Linux boslcp3 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:15:55 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
  root@boslcp3:~# uname -r
  4.13.0-25-generic

  > root@boslcp3:~# ppc64_cpu --smt
  SMT is off

  > Hugepage set up

  echo 8500 > /proc/sys/vm/nr_hugepages

  > Defined the guests from host machine

   Id    Name         State
  ----------------------------
   2     boslcp3g2    shut off
   3     boslcp3g3    shut off
   4     boslcp3g4    shut off
   6     boslcp3g1    shut off
   7     boslcp3g5    shut off

  > Started and installed ubuntu1804 daily build on all the guests.

  root@boslcp3:~# virsh list --all
   Id    Name         State
  ----------------------------
   2     boslcp3g2    running
   3     boslcp3g3    running
   4     boslcp3g4    running
   6     boslcp3g1    running
   7     boslcp3g5    running

  > Started regression run (IO_BASE_TCP_NFS) tests on all 5 guests.
  NOTE: Removed madvise test case from BASE focus areas.

  > Run went fine for a few hours on all guests.

  > After a few hours of running, the host system is in a hung state and the
  host console dumps CPU stall messages as below

  [SOL Session operational.  Use ~? for help]
  [250867.133429] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250867.133499]   (detected by 86, t=62711832 jiffies, g=497, c=496, q=31987857)
  [250867.133554] All QSes seen, last rcu_sched kthread activity 62711828 (4357609080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250867.133690] rcu_sched kthread starved for 62711828 jiffies! g497 c496 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
  [250931.133433] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250931.133494]   (detected by 3, t=62727832 jiffies, g=497, c=496, q=31995625)
  [250931.133572] All QSes seen, last rcu_sched kthread activity 62727828 (4357625080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250931.133741] rcu_sched kthread starved for 62727828 jiffies! g497 c496 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
  [250995.133432] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250995.133480]   (detected by 54, t=62743832 jiffies, g=497, c=496, q=32004479)
  [250995.133526] All QSes seen, last rcu_sched kthread activity 62743828 (4357641080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250995.133645] rcu_sched kthread starved for 62743828 jiffies! g497 c496 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100

  > Not able to get the prompt

  > Ping/ssh to boslcp3 also fails

  [ipjoga@kte ~]$ ping boslcp3
  PING boslcp3.isst.aus.stglabs.ibm.com (10.33.0.157) 56(84) bytes of data.
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=1 Destination Host Unreachable
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=2 Destination Host Unreachable
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=3 Destination Host Unreachable

  [ipjoga@kte ~]$ ssh root@boslcp3
  ssh: connect to host boslcp3 port 22: No route to host

  > boslcp3 is not reachable

  > Attached boslcp3 host console logs
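
The stall reports above carry enough numbers to estimate how long the grace period has been stuck. The following is a minimal sketch and not part of the original report: the helper names are hypothetical, and the tick rate is not assumed but derived from two consecutive reports in the log.

```python
import re

# Two consecutive stall reports copied from the console log above.
LOG = """\
[250867.133499]   (detected by 86, t=62711832 jiffies, g=497, c=496, q=31987857)
[250931.133494]   (detected by 3, t=62727832 jiffies, g=497, c=496, q=31995625)
"""

# Matches the kernel timestamp and the t=<jiffies> field of a stall report.
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\].*?\bt=(\d+) jiffies")


def parse_stalls(text):
    """Return a list of (timestamp_seconds, stalled_jiffies) tuples."""
    return [(float(ts), int(t)) for ts, t in STALL_RE.findall(text)]


def estimate_hz(stalls):
    """Estimate ticks per second from the first and last report."""
    (ts0, t0), (ts1, t1) = stalls[0], stalls[-1]
    return round((t1 - t0) / (ts1 - ts0))


stalls = parse_stalls(LOG)
hz = estimate_hz(stalls)
ts, t = stalls[0]
stalled_seconds = t / hz
print(f"estimated HZ: {hz}")
print(f"grace period stalled for about {stalled_seconds:.0f} s")
print(f"stall began about {ts - stalled_seconds:.0f} s after boot")
```

Taken at face value, the reports are 64 s and 16000 jiffies apart, which gives 250 ticks per second, so t=62711832 jiffies corresponds to roughly 250847 s. That is almost the whole uptime shown in the timestamps, so the reported figure may reflect how the counter is computed rather than a stall that truly began shortly after boot.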

  == Comment: #1 - INDIRA P. JOGA - 2018-02-11 12:39:29 ==
  Added Host console logs

  == Comment: #24 - VIPIN K. PARASHAR - 2018-02-16 05:46:13 ==
  From Linux logs
  ===

  [72072.290071] watchdog: BUG: soft lockup - CPU#132 stuck for 22s! [CPU 12/KVM:15579]
  [72072.290218] CPU: 132 PID: 15579 Comm: CPU 12/KVM Tainted: GWL  4.13.0-32-generic #35-Ubuntu
  [72072.290220] task: c000200debf82e00 task.stack: c000200e140f8000
  [72072.290221] NIP: c0c779e0 LR: c008166893a0 CTR: c0c77980
  [72072.290223] REGS: c000200e140fb790 TRAP: 0901   Tainted: GWL   (4.13.0-32-generic)
  [72072.290224] MSR: 9280b033
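
The first line of the soft lockup report above names the stuck CPU, the duration, and the task that was running. As a rough sketch for pulling those fields out of a console log (the `SoftLockup` type and `parse_soft_lockup` function are hypothetical names, and the pattern is written only against the line format shown here):

```python
import re
from typing import NamedTuple, Optional


class SoftLockup(NamedTuple):
    timestamp: float  # seconds since boot, from the printk prefix
    cpu: int          # CPU the watchdog found stuck
    seconds: int      # how long that CPU had been stuck
    comm: str         # name of the task running on that CPU
    pid: int          # PID of that task


# Pattern for the "soft lockup" line as it appears in this report.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a lockup report."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(
        float(m["ts"]), int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])
    )


line = (
    "[72072.290071] watchdog: BUG: soft lockup - "
    "CPU#132 stuck for 22s! [CPU 12/KVM:15579]"
)
event = parse_soft_lockup(line)
print(event)
```

The task name `CPU 12/KVM` follows the pattern QEMU uses for its vCPU threads, which suggests that host CPU 132 was running a guest vCPU thread when the watchdog fired.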

[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-04-23 Thread Andrew Cloke
** Changed in: linux (Ubuntu Bionic)
   Status: Fix Committed => Fix Released

** Changed in: ubuntu-power-systems
   Status: Triaged => Incomplete


Status in The Ubuntu-power-systems project:
  Incomplete
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Artful:
  Incomplete
Status in linux source package in Bionic:
  Fix Released


[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-04-09 Thread Andrew Cloke
** Changed in: linux (Ubuntu Artful)
   Status: New => Incomplete


Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Artful:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed


[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-04-05 Thread Manoj Iyer
** Changed in: linux (Ubuntu Artful)
 Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Changed in: linux (Ubuntu Artful)
   Importance: Undecided => High


Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Artful:
  New
Status in linux source package in Bionic:
  Fix Committed


[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-03-29 Thread Thadeu Lima de Souza Cascardo
This has been committed to Bionic, as part of another bug.

However, it looks like this would affect Artful as well. Would you be
able to confirm and test a patched kernel for Artful?

Thanks.
Cascardo.


Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Artful:
  New
Status in linux source package in Bionic:
  Fix Committed


[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-03-29 Thread Thadeu Lima de Souza Cascardo
** Description changed:

  == Comment: #0 - INDIRA P. JOGA  - 2018-02-11 
12:37:25 ==
  Problem Description:
  ===
  After few hours of run system is in hung state with, "rcu_sched detected 
stalls on CPUs/tasks" messages on the host IPMI console and host is out of 
network .
  
  Steps to re-create:
  ==
  > Installed Ubuntu1804 on boslcp3 host.
  
  root@boslcp3:~# uname -a
  Linux boslcp3 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:15:55 UTC 2018 
ppc64le ppc64le ppc64le GNU/Linux
  root@boslcp3:~# uname -r
  4.13.0-25-generic
  
  > root@boslcp3:~# ppc64_cpu --smt
  SMT is off
  
  > Hugepage set up
  
  echo 8500 > /proc/sys/vm/nr_hugepages
  
  > Defined the guests from host machine
  
  IdName   State
  
-  2 boslcp3g2  shut off
-  3 boslcp3g3  shut off
-  4 boslcp3g4  shut off
-  6 boslcp3g1  shut off
-  7 boslcp3g5  shut off
- 
+  2 boslcp3g2  shut off
+  3 boslcp3g3  shut off
+  4 boslcp3g4  shut off
+  6 boslcp3g1  shut off
+  7 boslcp3g5  shut off
  
  > Started and installed ubuntu1804 daily build on all the guests.
  
  root@boslcp3:~# virsh list --all
-  IdName   State
+  IdName   State
  
-  2 boslcp3g2  running
-  3 boslcp3g3  running
-  4 boslcp3g4  running
-  6 boslcp3g1  running
-  7 boslcp3g5  running
+  2 boslcp3g2  running
+  3 boslcp3g3  running
+  4 boslcp3g4  running
+  6 boslcp3g1  running
+  7 boslcp3g5  running
  
  > Started regression run (IO_BASE_TCP_NFS) tests on all 5 guests.
  NOTE: Removed madvise test case from BASE focus areas.
  
  > Run went fine for few hours on all guests.
  
  >  After few hours of run ,Host system is in hung state and  host
  console dumps CPU stall messages as below
  
  [SOL Session operational.  Use ~? for help]
  [250867.133429] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250867.133499]   (detected by 86, t=62711832 jiffies, g=497, c=496, 
q=31987857)
  [250867.133554] All QSes seen, last rcu_sched kthread activity 62711828 
(4357609080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250867.133690] rcu_sched kthread starved for 62711828 jiffies! g497 c496 
f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
  [250931.133433] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250931.133494]   (detected by 3, t=62727832 jiffies, g=497, c=496, 
q=31995625)
  [250931.133572] All QSes seen, last rcu_sched kthread activity 62727828 
(4357625080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250931.133741] rcu_sched kthread starved for 62727828 jiffies! g497 c496 
f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
  [250995.133432] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250995.133480]   (detected by 54, t=62743832 jiffies, g=497, c=496, 
q=32004479)
  [250995.133526] All QSes seen, last rcu_sched kthread activity 62743828 
(4357641080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250995.133645] rcu_sched kthread starved for 62743828 jiffies! g497 c496 
f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
  
  > Not able to get the prompt
  
  > Ping /shh to boslcp3 also fails
  
  [ipjoga@kte ~]$ ping boslcp3
  PING boslcp3.isst.aus.stglabs.ibm.com (10.33.0.157) 56(84) bytes of data.
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=1 Destination Host 
Unreachable
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=2 Destination Host 
Unreachable
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=3 Destination Host 
Unreachable
  
  [ipjoga@kte ~]$ ssh root@boslcp3
  ssh: connect to host boslcp3 port 22: No route to host
  
  > boslcp3 is not reachable
  
  > Attached boslcp3 host console logs
  
  == Comment: #1 - INDIRA P. JOGA  - 2018-02-11 
12:39:29 ==
  Added Host console logs
  
  == Comment: #24 - VIPIN K. PARASHAR  - 2018-02-16 
05:46:13 ==
  From Linux logs
  ===
  
  [72072.290071] watchdog: BUG: soft lockup - CPU#132 stuck for 22s! [CPU 
12/KVM:15579]
  [72072.290218] CPU: 132 PID: 15579 Comm: CPU 12/KVM Tainted: GWL  
4.13.0-32-generic #35-Ubuntu
  [72072.290220] task: c000200debf82e00 task.stack: c000200e140f8000
  [72072.290221] NIP: c0c779e0 LR: c008166893a0 CTR: 
c0c77980
  [72072.290223] REGS: c000200e140fb790 TRAP: 0901   Tainted: GWL   
(4.13.0-32-generic)
  [72072.290224] 

[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-03-26 Thread Frank Heimes
** Changed in: ubuntu-power-systems
   Status: New => Triaged


Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-03-26 Thread Manoj Iyer
** Changed in: linux (Ubuntu Bionic)
 Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => Canonical Kernel Team (canonical-kernel-team)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757402

Title:
  Ubuntu18.04:pKVM - Host  in hung state and out of network after few
  hours of stress run on all guests

Status in The Ubuntu-power-systems project:
  Triaged
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  == Comment: #0 - INDIRA P. JOGA - 2018-02-11 12:37:25 ==
  Problem Description:
  ===
  After a few hours of run, the system is in a hung state with "rcu_sched detected
  stalls on CPUs/tasks" messages on the host IPMI console, and the host is out of
  network.

  Steps to re-create:
  ==
  > Installed Ubuntu1804 on boslcp3 host.

  root@boslcp3:~# uname -a
  Linux boslcp3 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:15:55 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
  root@boslcp3:~# uname -r
  4.13.0-25-generic

  > root@boslcp3:~# ppc64_cpu --smt
  SMT is off

  > Hugepage set up

  echo 8500 > /proc/sys/vm/nr_hugepages

  > Defined the guests from host machine

   Id   Name        State
  -------------------------
   2    boslcp3g2   shut off
   3    boslcp3g3   shut off
   4    boslcp3g4   shut off
   6    boslcp3g1   shut off
   7    boslcp3g5   shut off

  
  > Started and installed ubuntu1804 daily build on all the guests.

  root@boslcp3:~# virsh list --all
   Id   Name        State
  -------------------------
   2    boslcp3g2   running
   3    boslcp3g3   running
   4    boslcp3g4   running
   6    boslcp3g1   running
   7    boslcp3g5   running

  > Started regression run (IO_BASE_TCP_NFS) tests on all 5 guests.
  NOTE: Removed madvise test case from BASE focus areas.

  > Run went fine for a few hours on all guests.

  > After a few hours of run, the host system is in a hung state and the host
  console dumps CPU stall messages as below

  [SOL Session operational.  Use ~? for help]
  [250867.133429] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250867.133499]   (detected by 86, t=62711832 jiffies, g=497, c=496, q=31987857)
  [250867.133554] All QSes seen, last rcu_sched kthread activity 62711828 (4357609080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250867.133690] rcu_sched kthread starved for 62711828 jiffies! g497 c496 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
  [250931.133433] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250931.133494]   (detected by 3, t=62727832 jiffies, g=497, c=496, q=31995625)
  [250931.133572] All QSes seen, last rcu_sched kthread activity 62727828 (4357625080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250931.133741] rcu_sched kthread starved for 62727828 jiffies! g497 c496 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
  [250995.133432] INFO: rcu_sched detected stalls on CPUs/tasks:
  [250995.133480]   (detected by 54, t=62743832 jiffies, g=497, c=496, q=32004479)
  [250995.133526] All QSes seen, last rcu_sched kthread activity 62743828 (4357641080-4294897252), jiffies_till_next_fqs=1, root ->qsmask 0x0
  [250995.133645] rcu_sched kthread starved for 62743828 jiffies! g497 c496 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
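
  The stall reports above carry enough numbers for a quick sanity check. The
  sketch below (Python, not part of the original report) parses the three
  "detected by" lines, estimates the kernel tick rate from how far t= advances
  between reports, and converts the first t= value to seconds. The names LOG,
  parse_stalls and estimate_hz are made up for this illustration, and reading
  t= as jiffies since the grace period began follows the kernel's RCU
  stall-warning documentation rather than anything stated in this bug.

```python
import re

# "detected by" lines copied from the console log above, with the mail
# client's line wrapping undone.
LOG = """\
[250867.133499]   (detected by 86, t=62711832 jiffies, g=497, c=496, q=31987857)
[250931.133494]   (detected by 3, t=62727832 jiffies, g=497, c=496, q=31995625)
[250995.133480]   (detected by 54, t=62743832 jiffies, g=497, c=496, q=32004479)
"""

PATTERN = re.compile(r"\[\s*(\d+\.\d+)\]\s+\(detected by (\d+), t=(\d+) jiffies")


def parse_stalls(text):
    """Return (timestamp_seconds, detecting_cpu, stall_jiffies) per report."""
    return [(float(ts), int(cpu), int(t)) for ts, cpu, t in PATTERN.findall(text)]


def estimate_hz(reports):
    """Estimate ticks per second from the first and last report.

    Assumption: the printk timestamp and the t= counter advance together,
    so delta(jiffies) / delta(seconds) approximates the tick rate.
    """
    (ts0, _, t0), (ts1, _, t1) = reports[0], reports[-1]
    return round((t1 - t0) / (ts1 - ts0))


reports = parse_stalls(LOG)
hz = estimate_hz(reports)

# Stall length at the time of the first report, in seconds.
stall_seconds = reports[0][2] / hz

print(hz, round(stall_seconds))
```

  For these three reports it prints 250 and 250847: t= grows by 16000 jiffies
  per 64 seconds of log time, and the first report's stall figure is within
  about 20 seconds of the host's uptime (250867 s). If the t= value can be
  trusted, that would suggest the grace period had been stuck since shortly
  after boot.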

  > Not able to get the prompt

  > Ping and ssh to boslcp3 also fail

  [ipjoga@kte ~]$ ping boslcp3
  PING boslcp3.isst.aus.stglabs.ibm.com (10.33.0.157) 56(84) bytes of data.
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=1 Destination Host Unreachable
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=2 Destination Host Unreachable
  From kte.isst.aus.stglabs.ibm.com (10.33.11.31) icmp_seq=3 Destination Host Unreachable

  [ipjoga@kte ~]$ ssh root@boslcp3
  ssh: connect to host boslcp3 port 22: No route to host

  > boslcp3 is not reachable

  > Attached boslcp3 host console logs

  == Comment: #1 - INDIRA P. JOGA - 2018-02-11 12:39:29 ==
  Added Host console logs

  == Comment: #24 - VIPIN K. PARASHAR - 2018-02-16 05:46:13 ==
  From Linux logs
  ===

  [72072.290071] watchdog: BUG: soft lockup - CPU#132 stuck for 22s! [CPU 12/KVM:15579]
  [72072.290218] CPU: 132 PID: 15579 Comm: CPU 12/KVM Tainted: GWL  4.13.0-32-generic #35-Ubuntu
  [72072.290220] task: c000200debf82e00 task.stack: c000200e140f8000
  [72072.290221] NIP: c0c779e0 LR: c008166893a0 CTR: c0c77980
  [72072.290223] REGS: c000200e140fb790 TRAP: 0901   Tainted: GWL   (4.13.0-32-generic)
  

[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-03-21 Thread Joseph Salisbury
** Changed in: linux (Ubuntu)
   Status: New => Triaged

** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical

** Also affects: linux (Ubuntu Bionic)
   Importance: Critical
 Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
   Status: Triaged

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757402

Title:
  Ubuntu18.04:pKVM - Host  in hung state and out of network after few
  hours of stress run on all guests

Status in The Ubuntu-power-systems project:
  New
Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged


[Kernel-packages] [Bug 1757402] Re: Ubuntu18.04:pKVM - Host in hung state and out of network after few hours of stress run on all guests

2018-03-21 Thread Andrew Cloke
** Also affects: ubuntu-power-systems
   Importance: Undecided
   Status: New

** Changed in: ubuntu-power-systems
   Importance: Undecided => Critical

** Changed in: ubuntu-power-systems
 Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Tags added: triage-g

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1757402

Title:
  Ubuntu18.04:pKVM - Host  in hung state and out of network after few
  hours of stress run on all guests

Status in The Ubuntu-power-systems project:
  New
Status in linux package in Ubuntu:
  New
