Hi Ivan, the best way to get assistance from Canonical Support with
this issue is to file a support case on support.canonical.com and
attach an sosreport of the affected system, collected while the issue
is happening. See my previous comment #5 for details on sosreport.
Please check with Stephen Zarkos if you need access to the Canonical
Support Portal.

One other idea that may help when your system becomes unresponsive is
to log the serial console output in a GNU screen or tmux session.
Inside this console session, you can raise the console log level to the
maximum ("echo 9 > /proc/sysrq-trigger"; you might have to run "sysctl
-w kernel.sysrq=1" first to enable SysRq) and run "dmesg -w", which
prints the kernel log and then keeps printing new entries as they
arrive. This way, you won't depend on saving logs to disk to see what's
going on, since disk access could freeze at the moment of the failure.
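For reference, the console-side steps above could be wrapped in a small
shell script like the one below. It is only a sketch, assuming it is
run as root inside the screen/tmux session attached to the affected
system's serial console; the function names console_loglevel and
watch_kernel_log are hypothetical. It also prints the first field of
/proc/sys/kernel/printk (the current console log level), so you can
confirm that the SysRq step took effect.

```shell
#!/bin/sh
# Sketch only: run as root inside the serial console session (tmux or GNU screen).

# Print the console log level, i.e. the first field of kernel.printk.
# The optional argument points it at another file (handy for testing).
console_loglevel() {
    awk '{ print $1 }' "${1:-/proc/sys/kernel/printk}"
}

# Raise the console log level via SysRq, then follow the kernel log.
watch_kernel_log() {
    # kernel.sysrq mainly gates the keyboard combination; harmless to set anyway
    sysctl -w kernel.sysrq=1 >/dev/null
    # SysRq '9' sets the console log level to 9, so every message is printed
    echo 9 > /proc/sysrq-trigger
    echo "console log level is now $(console_loglevel)"
    # Print the kernel ring buffer, then keep waiting for new messages
    dmesg -w
}
```

For example, save it as kernlog.sh, source it (". ./kernlog.sh") and
run "watch_kernel_log". If the session runs in tmux, "tmux pipe-pane -o
'cat >> serial-console.log'" on the machine hosting the session also
copies everything the pane shows into a file there (the file name is
just an example).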

You can also enable kdump and all the "panic_on_X" sysctl settings (see
the section "Enabling various types of panics" in the CrashdumpRecipe
article [1]). If the system locks up so hard that it freezes, it may
then panic and capture a dump so that we can see what's going on. Refer
to the CrashdumpRecipe article [1] for more information.
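As an illustration, the panic settings could be made persistent with a
sysctl.d drop-in written by a script like the one below. This is a
sketch: the selection of sysctls is my assumption of which ones matter
for a hard hang, so compare it with the "Enabling various types of
panics" section of the article before using it. The file name
99-panic-on-hang.conf and the function name write_panic_sysctls are
hypothetical.

```shell
#!/bin/sh
# Sketch: persist a selection of "panic on X" sysctls so that a hang becomes
# a panic that kdump can capture. The list is an assumption; check it
# against the CrashdumpRecipe article. The file name is hypothetical.

# write_panic_sysctls [DIR]: write DIR/99-panic-on-hang.conf (default /etc/sysctl.d)
write_panic_sysctls() {
    dir=${1:-/etc/sysctl.d}
    cat > "$dir/99-panic-on-hang.conf" <<'EOF'
# Panic on a kernel oops instead of trying to continue
kernel.panic_on_oops = 1
# Panic when the soft-lockup watchdog fires
kernel.softlockup_panic = 1
# Panic when a task stays blocked longer than kernel.hung_task_timeout_secs
kernel.hung_task_panic = 1
# Panic on unknown, unrecovered or I/O-check NMIs (e.g. one sent from outside the guest)
kernel.unknown_nmi_panic = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.panic_on_io_nmi = 1
EOF
}
```

On Ubuntu, kdump itself is enabled by installing the linux-crashdump
package (a reboot is needed so that the crash kernel memory gets
reserved), and "kdump-config show" reports whether it is ready. After
that, as root, "write_panic_sysctls && sysctl -p
/etc/sysctl.d/99-panic-on-hang.conf" applies the settings without a
reboot. Note that hung_task_panic can also fire on slow but otherwise
healthy I/O, so it is best kept only while chasing this issue.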

[1] https://wiki.ubuntu.com/Kernel/CrashdumpRecipe

Thank you,
David

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1788643

Title:
  zombies pile up, system becomes unresponsive

Status in systemd package in Ubuntu:
  New

Bug description:
  Description:    Ubuntu 16.04.5 LTS
  Release:        16.04

  systemd:
    Installed: 229-4ubuntu21.4
    Candidate: 229-4ubuntu21.4
    Version table:
   *** 229-4ubuntu21.4 500
          500 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
          100 /var/lib/dpkg/status
       229-4ubuntu21.1 500
          500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
       229-4ubuntu4 500
          500 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

  This problem occurs on Azure, and we are seeing it on different
  systems: worker nodes (Ubuntu 16.04) in a Hadoop cluster start piling
  up zombies and become unresponsive. The syslog and the kernel logs
  don't provide much information.

  The only error we could correlate with what we are seeing was in the
  audit logs: see the "Connection timed out" and "Cannot create session:
  Already running in a session" messages at the end of this message.

  Our first suspect was memory pressure on the machines. We added
  logging and settings to reboot on out-of-memory conditions, but all
  of these turned out to be red herrings.

  Aug 18 19:11:08 wn2-d3ncsp su[112600]: Successful su for root by root
  Aug 18 19:11:08 wn2-d3ncsp su[112600]: + ??? root:root
  Aug 18 19:11:08 wn2-d3ncsp su[112600]: pam_unix(su:session): session opened for user root by (uid=0)
  Aug 18 19:11:08 wn2-d3ncsp systemd-logind[1486]: New session c8 of user root.
  Aug 18 19:11:26 wn2-d3ncsp sshd[112690]: Did not receive identification string from 10.84.93.35
  Aug 18 19:11:34 wn2-d3ncsp su[112600]: pam_systemd(su:session): Failed to create session: Connection timed out
  Aug 18 19:11:34 wn2-d3ncsp su[112600]: pam_unix(su:session): session closed for user root
  Aug 18 19:11:34 wn2-d3ncsp systemd-logind[1486]: Removed session c8.

  Aug 18 19:12:03 wn2-d3ncsp sudo: ehiadmin : TTY=pts/1 ; PWD=/home/ehiadmin ; USER=root ; COMMAND=/bin/su -
  Aug 18 19:12:03 wn2-d3ncsp sudo: pam_unix(sudo:session): session opened for user root by ehiadmin(uid=0)
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: Successful su for root by root
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: + /dev/pts/1 root:root
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: pam_unix(su:session): session opened for user root by ehiadmin(uid=0)
  Aug 18 19:12:03 wn2-d3ncsp su[113085]: pam_systemd(su:session): Cannot create session: Already running in a session
  Aug 18 19:12:42 wn2-d3ncsp sshd[113274]: Did not receive identification string from 10.84.93.42
  Aug 18 19:13:37 wn2-d3ncsp su[113085]: pam_unix(su:session): session closed for user root
  Aug 18 19:13:37 wn2-d3ncsp sudo: pam_unix(sudo:session): session closed for user root
  Aug 18 19:13:37 wn2-d3ncsp sshd[112285]: pam_unix(sshd:session): session closed for user ehiadmin
  Aug 18 19:13:37 wn2-d3ncsp systemd-logind[1486]: Removed session 1291.
