Public bug reported:
Description:
------------
Call traces dump continuously on the console during Leaf IO and SMT tests
with the SMT fix (140718) applied, after 10+ hours of regression run, and
the prompt cannot be reached.
Steps to re-create:
------------------
> cap installed with the latest ubuntu160401 kernel, 4.4.0-38-generic.
> Applied the SMT kernel patch on system cap for issue 140718
root@cap:~# ls -l
total 56572
-rw-r--r-- 1 root root 18838772 Sep 22 12:24 linux-image-4.4.0-21-generic_4.4.0-21.37+smt_ppc64el.deb
-rw-r--r-- 1 root root 39081588 Sep 22 12:24 linux-image-extra-4.4.0-21-generic_4.4.0-21.37+smt_ppc64el.deb
-rw------- 1 root root 70 Sep 21 04:24 nohup.out
> Booted with above kernel
root@cap:~# uname -a
Linux cap 4.4.0-21-generic #37+smt SMP Mon Aug 29 15:07:28 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
root@cap:~# uname -r
4.4.0-21-generic
> Enabled sysrq and also xmon before starting tests
root@cap:~# cat /proc/sys/kernel/sysrq
1
root@cap:~# cat /proc/cmdline
root=UUID=4114e1ef-5e30-45ae-a5fb-a5429946434c ro xmon=on splash quiet crashkernel=384M-:128M
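
The sysrq and xmon checks above can be scripted before starting a long run. A minimal sketch, assuming only the `xmon=on` spelling seen in this report counts as enabled; the `has_xmon` helper name is hypothetical and not part of any existing tool:

```shell
#!/bin/sh
# has_xmon is a hypothetical helper, not part of any existing tool.
# It takes the command line as an argument so it can be checked offline.
# Assumption: only the "xmon=on" spelling used in this report is accepted.
has_xmon() {
    case " $1 " in
        *" xmon=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# These are the same /proc files that were read by hand in this report.
if has_xmon "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "xmon=on present on kernel command line"
else
    echo "xmon=on missing from kernel command line"
fi
echo "sysrq setting: $(cat /proc/sys/kernel/sysrq 2>/dev/null)"
```

On the system in this report the script would be expected to find `xmon=on` and a sysrq setting of 1, matching the manual output above.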
> root@cap:~/fix_140718# ppc64_cpu --smt
SMT=8
> Started tests with Leaf IO and SMT. After 10+ hours of run, the ipmi console
> is dumping call traces continuously and the prompt is not reachable.
ssh to cap hangs, ping to cap works fine
[ipjoga@kte ~]$ ssh root@cap
[ipjoga@kte ~]$ ping cap
PING cap.isst.aus.stglabs.ibm.com (10.33.17.16) 56(84) bytes of data.
64 bytes from cap.isst.aus.stglabs.ibm.com (10.33.17.16): icmp_seq=1 ttl=64 time=0.095 ms
64 bytes from cap.isst.aus.stglabs.ibm.com (10.33.17.16): icmp_seq=2 ttl=64 time=0.055 ms
^C
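
The "ping answers but ssh hangs" state seen above can be detected from a monitoring host with time-limited probes. This is only a sketch: `classify_host` and `probe_host` are hypothetical helper names, and it assumes Linux iputils `ping`, OpenSSH with key-based root login, and coreutils `timeout` are available.

```shell
#!/bin/sh
# classify_host and probe_host are hypothetical helpers for illustration.
# classify_host maps two exit statuses to a short verdict:
#   $1 = exit status of ping, $2 = exit status of the ssh probe
classify_host() {
    if [ "$1" -ne 0 ]; then
        echo "unreachable"
    elif [ "$2" -ne 0 ]; then
        echo "pingable-but-ssh-unresponsive"
    else
        echo "ok"
    fi
}

# probe_host runs both probes with time limits so that a hung sshd
# cannot block the caller. BatchMode=yes avoids password prompts, so a
# host without key-based login is also reported as ssh-unresponsive.
probe_host() {
    ping -c 2 -W 2 "$1" >/dev/null 2>&1
    ping_rc=$?
    timeout 30 ssh -o BatchMode=yes -o ConnectTimeout=10 "root@$1" true \
        >/dev/null 2>&1
    ssh_rc=$?
    classify_host "$ping_rc" "$ssh_rc"
}

# Example (not run here): probe_host cap
```

A verdict of `pingable-but-ssh-unresponsive` corresponds to the state described in this report, where the network stack still answers ICMP but userspace is not making progress.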
> Attached Call traces
> Also memory in this system is
root@cap:/kte/tools/setup.d# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.0T        4.4G        1.0T         37M        9.4G        1.0T
Swap:           37G          0B         37G
root@cap:/kte/tools/setup.d#
UBUNTU BUILD: 4.4.0-38-generic
SL Firmware Version : IBM-garrison-ibm-OP8_v1.10_2.17
The IO team thinks this is related to, and fixed by, commit 135e8c9250dd
("sched/core: Fix a race between try_to_wake_up() and a woken up task").
We built a kernel with that patch applied and asked Indira to restart
the tests.
The developer provided a fix for the above issue. Applied it and restarted
the Leaf IO and SMT tests on a kernel that has both the SMT fix and the
memory barrier fix.
root@cap:~# uname -r
4.4.0-38.58+ibm-smt1-generic
The run went fine for 60+ hours without any system hang.
Canonical, we believe this issue is fixed by:
commit 135e8c9250dd ("sched/core: Fix a race between try_to_wake_up()
and a woken up task"),
which was marked for the -stable tree. Can you pull it into your kernel?
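
One way to check whether a given kernel tree already carries this fix is to search its history by commit subject, since a backport to a stable or distro tree gets a new commit ID. A minimal sketch, assuming a local git clone of the tree; `has_fix` is a hypothetical helper and the path in the example is a placeholder:

```shell
#!/bin/sh
# has_fix is a hypothetical helper; the tree path passed to it is an
# assumption (any local git clone of a kernel tree).
FIX_SUBJECT='sched/core: Fix a race between try_to_wake_up() and a woken up task'

has_fix() {
    # -F treats the subject as a fixed string; -1 stops at the first match.
    # Caveat: --grep searches the whole commit message, so a later commit
    # that merely quotes this subject would also match.
    match=$(git -C "$1" log -1 --format=%h -F --grep="$FIX_SUBJECT" 2>/dev/null)
    [ -n "$match" ]
}

# Example (not run here):
#   has_fix /path/to/kernel-tree && echo "fix present"
```

Searching by subject is a deliberate choice here: looking up 135e8c9250dd directly would only succeed in trees that contain the original mainline commit object.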
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Taco Screen team (taco-screen-team)
Status: New
** Tags: architecture-ppc64le bugnameltc-146713 severity-critical
targetmilestone-inin16041
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1629872
Title:
ISST-LTE:pNV:cap: Call traces dumping continuously after 10+ hours of
regression with Leaf IO and SMT tests with SMT fix