There are certainly no user inactivity monitors in the default Ubuntu
Server install and we have never seen systemd-suspend.service triggered
on any of our systems.  I don't think this is a systemd bug - it's
either a configuration error, or it's a bug in whatever is triggering
systemd-suspend.service (which systemd itself does not do).

This bug report indicates that there is desktop software installed on
the system where this bug was encountered.  There certainly may be idle
timers in desktop software that trigger a suspend; and there may be bugs
in how those triggers detect that the system is "idle"; possibly
including platform-specific bugs.  Canonical does not support the Ubuntu
Desktop on POWER.  Is this problem reproducible for you with a pristine
install of Ubuntu Server?

** Changed in: systemd (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1758273

Title:
  DD2.2 freezes/hangs after 20mins of uptime

Status in The Ubuntu-power-systems project:
  Triaged
Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  == Comment: #0 - Application Cdeadmin  - 2018-03-19 09:30:53 ==
  == Comment: #1 - Application Cdeadmin <> - 2018-03-19 09:30:55 ==
  == Comment: #2 - Application Cdeadmin <> - 2018-03-19 09:30:57 ==
  ------- Comment From brihh 2018-03-19 09:10:30 EDT -------
  Needless to say, machine is pretty unusable. Needed for Performance testing 
for release.

  == Comment: #3 - Application Cdeadmin <> - 2018-03-19 10:10:59 ==
  ------- Comment From vaibhav92 2018-03-19 10:06:17 EDT -------
  @pridhiviraj Any idea who can look into this ?
  ------- Comment From dougmill-ibm 2018-03-19 10:09:59 EDT -------
  If machine is locked-up and console is unresponsive, try collecting eSEL data 
from the BMC.

  == Comment: #4 - Application Cdeadmin <c> - 2018-03-19 10:40:59 ==
  ------- Comment From pridhiviraj 2018-03-19 10:38:47 EDT -------
  @brihh Can you use latest 03/15 PNOR and re-create the issue. And also before 
it hangs please collect OPAL and kernel logs. @vaibhav92 If it is re-creatable 
OPAL/EM team need to look at it.

  == Comment: #5 - Application Cdeadmin <> - 2018-03-19 11:01:58 ==
  ------- Comment From vaibhav92 2018-03-19 10:41:53 EDT -------
  Saw this in the kernel-log of the system:

  [ 1247.404962] PM: suspend entry (s2idle)
  [ 1247.404970] PM: Syncing filesystems ... done.

  Looks like its getting suspended after 20 mins of inactivity. Looking at the 
/etc/systemd/logind.conf see IdleAction by default is 'ignore':
  #IdleAction=ignore
  #IdleActionSec=30min

  But clearly someone is issuing a suspend to the system. So this probably need 
to be looked by the distro/Power-Management/EM team
  ------- Comment From megcurry 2018-03-19 10:43:14 EDT -------
  Please advise re. Assignments and Labels that would get the right team 
working on this.....Mirror label gets a Bz opened, and that is necessary or at 
least useful for some of the LTC teams to look at things, right?
  ------- Comment From brihh 2018-03-19 10:47:45 EDT -------
  odd about inactivity - i once had a test running for 20mins and it still 
froze:

  `08:52:23 up 20 min,  4 users,  load average: 64.40, 98.94, 64.32`
  ------- Comment From brihh 2018-03-19 10:49:30 EDT -------
  @pridhiviraj  where is the latest 3/15 PNOR that I can load?

  ```

  == Comment: #8 - Application Cdeadmin <> - 2018-03-19 12:40:54 ==
  ------- Comment From pridhiviraj 2018-03-19 12:34:18 EDT -------
  ```
  Mar 19 11:35:01 p215n15 rsyslogd-2007: action 'action 13' suspended, next 
retry is Mon Mar 19 11:35:31 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:35:01 p215n15 CRON[6464]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
  Mar 19 11:42:39 p215n15 systemd[1]: Starting Cleanup of Temporary 
Directories...
  Mar 19 11:42:39 p215n15 rsyslogd-2007: action 'action 13' suspended, next 
retry is Mon Mar 19 11:43:39 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:42:40 p215n15 systemd-tmpfiles[6504]: 
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
  Mar 19 11:42:40 p215n15 systemd[1]: Started Cleanup of Temporary Directories.
  Mar 19 11:45:01 p215n15 rsyslogd-2007: action 'action 13' suspended, next 
retry is Mon Mar 19 11:46:01 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:45:01 p215n15 CRON[6524]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
  Mar 19 11:47:39 p215n15 systemd[1]: Starting Message of the Day...
  Mar 19 11:47:39 p215n15 rsyslogd-2007: action 'action 13' suspended, next 
retry is Mon Mar 19 11:48:39 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:47:40 p215n15 50-motd-news[6528]:  * Meltdown, Spectre and Ubuntu: 
What are the attack vectors,
  Mar 19 11:47:40 p215n15 50-motd-news[6528]:    how the fixes work, and 
everything else you need to know
  Mar 19 11:47:40 p215n15 50-motd-news[6528]:    - https://ubu.one/u2Know
  Mar 19 11:47:40 p215n15 systemd[1]: Started Message of the Day.
  Mar 19 11:48:02 p215n15 NetworkManager[4662]: <info>  [1521474482.7252] 
manager: sleep: sleep requested (sleeping: no  enabled: yes)
  Mar 19 11:48:02 p215n15 NetworkManager[4662]: <info>  [1521474482.7259] 
manager: NetworkManager state is now ASLEEP
  Mar 19 11:48:02 p215n15 gnome-shell[5538]: Screen lock is locked down, not 
locking
  Mar 19 11:48:02 p215n15 whoopsie[5264]: [11:48:02] offline
  Mar 19 11:48:02 p215n15 systemd[1]: Reached target Sleep.
  Mar 19 11:48:02 p215n15 systemd[1]: Starting Suspend...
  Mar 19 11:48:02 p215n15 systemd-sleep[6576]: Suspending system...
  Mar 19 11:48:02 p215n15 kernel: [ 1246.906320] PM: suspend entry (s2idle)
  ```
  @vaibhav92 You are right, from the above messages looks like system is 
suspending by some cron job i guess.

  == Comment: #9 - Application Cdeadmin <> - 2018-03-19 13:30:57 ==
  ------- Comment From brihh 2018-03-19 13:28:36 EDT -------
  @pridhiviraj interesting. know offhand how i can shut that off? seems a bit 
annoying .. :-)

  == Comment: #11 - Application Cdeadmin <> - 2018-03-19 17:10:51 ==
  ------- Comment From mzipse 2018-03-19 16:52:11 EDT -------
  At our daily defect call, it was suggested that we check to see if Opal-PRD 
is running, which is a prereq for the Firmware recovery to work properly.  
Opal-PRD is an app that should be part of the distro and should automatically 
be started.  Never-the-less, if you are want to check on it, here's the command 
to run at the OS..... 

  sudo service opal-prd status

  You'll need root authority to do this.

  The output will look something like this.....
  # sudo service opal-prd status
  ? opal-prd.service - OPAL PRD daemon
     Loaded: loaded (/lib/systemd/system/opal-prd.service; enabled; vendor 
preset: enabled)
     Active: active (running) since Wed 2018-03-14 19:04:36 CDT; 4 days ago
       Docs: man:opal-prd(8)
   Main PID: 5085 (opal-prd)
      Tasks: 1
     CGroup: /system.slice/opal-prd.service
             ??5085 /usr/sbin/opal-prd --pnor /dev/mtd0

  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00d0] 1d000000 
bd021c00 0000bf02 1c000000    *................*
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00e0] c1021b00 
0000c302 1d000000 00030000    *................*
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00f0] 0000ff06 
21465245 51000206 16000000    *....!FREQ.......*
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0100] 5d08f600 
00006008 f6000000 6308f600    *].....`.....c...*
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0110] 00006608 
f6000000 6908f600 00006c08    *..f.....i.....l.*
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0120] f6000000 
6f08f600 00007208             *....o.....r.    *
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:OCC1 rsp status=0x00, 
length=0x01CC
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:rsp data: (up to 16 bytes)
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0000] 03100200 
03000000 00000000 00000000    *................*
  Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:<<processOccError()

  == Comment: #13 - Application Cdeadmin <> - 2018-03-19 17:30:51 ==
  ------- Comment From brihh 2018-03-19 17:22:06 EDT -------
  @mzipse seems to be there and running:
  ```
  root@p215n15:~# service opal-prd status
  ? opal-prd.service - OPAL PRD daemon
     Loaded: loaded (/lib/systemd/system/opal-prd.service; enabled; vendor 
preset: enabled)
     Active: active (running) since Mon 2018-03-19 18:19:58 EDT; 1min 35s ago
       Docs: man:opal-prd(8)
   Main PID: 4648 (opal-prd)
      Tasks: 1 (limit: 27033)
     CGroup: /system.slice/opal-prd.service
             ??4648 /usr/sbin/opal-prd

  Mar 19 18:19:58 p215n15 opal-prd[4648]: SCOM: read: chip 0x8, addr 0x8010a50, 
val 0x0, rc 0
  Mar 19 18:19:58 p215n15 opal-prd[4648]: SCOM: read: chip 0x8, addr 0x8010a54, 
val 0x0, rc 0
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: PRDF:<<PRDF::noLock_initialize()
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: PRDF:<<PRDF::initialize()
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: 
ATTN_SLOW:I>Service::enableAttns() enter
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: 
ATTN_SLOW:I>Service::enableAttns() exit
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: 
ATTN_SLOW:I><<ATTN_RT::enableAttns rc: 0
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: calling get_ipoll_events
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: enabling IPOLL events 
0x5b90000000000000
  Mar 19 18:19:58 p215n15 opal-prd[4648]: FW: writing init message
  ```

  == Comment: #14 - Application Cdeadmin <> - 2018-03-20 03:50:53 ==
  ------- Comment From vaibhav92 2018-03-20 03:47:08 EDT -------
  Did some kernel tracing and it seems that someone is starting the 
systemd-suspend.service that invokes systemd-sleep which then forces a suspend 
by writing to /sys/power/state file. Further investigation is needed as to who 
is invoking the systemd-suspend.service. In the meantime you can issue command 
on the host that will disable the service and prevent system from getting 
suspended:
  `systemctl mask systemd-suspend.service`

  
  Actually, it is also possible that the presence of the suspend config 
indicates user error. If Ubuntu Server was installed, I don't think suspend 
should have been configured. Did someone install "desktop Ubuntu" (or some 
other non-server config) on a server?

  Also, 18.04 should be used, not 17.10 (and never 17.04), on P9.

  == Comment: #25 - Application Cdeadmin <> - 2018-03-23 00:20:53 ==
  ------- Comment From vaibhav92 2018-03-23 00:19:40 EDT -------
  Hi @mzipse @stewart-ibm. AFAIK this issue is not related to CAPP or CAPI at 
all. @brian_horton had confirmed that he was seeing this even without 
enabling/running any CAPI workloads. Disabling the systemd-suspend.service made 
the problem go away. Hence asked the bug to be mirrored to canonical.  @mzipse 
not sure what CAPP errors PRD team saw. Can you please ask them to get in touch 
with me.

  == Comment: #26 - Vaibhav Jain <> - 2018-03-23 00:39:45 ==
  Summary of the issue:

  Ubuntu 18.04 is forcing a system suspend after 20 mins of system boot. 
Suspend is forced even if system is running a workload or user is logged on to 
the terminal and performing any activity.  The issue goes away if 
systemd-suspend service is disabled via:
  "systemctl mask systemd-suspend.service"

  So requesting canonical to look into this issue as a possible bug in
  systemd or user inactivity monitor.

  ~ Vaibhav

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1758273/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to