Re: [Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-16 Thread Luis Rodriguez
It sounds like what I was getting.


On Thu, Jan 16, 2020 at 11:05 PM Colin Ian King <1799...@bugs.launchpad.net>
wrote:

> After quite a bit of experimentation I found that I can reproduce the bug
> if I have zram *and* also swap on the filesystem enabled while exercising
> the brk stressors and aiol (to cause lots of I/O). Eventually the system
> grinds to a halt, we lose interactivity, and then we get lockups as
> follows:
> [ 2012.040006] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [stress-ng-brk:1632]
> [ 2012.040922] Modules linked in: zram(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) glue_helper(E) cryptd(E) psmouse(E) input_leds(E) floppy(E) virtio_scsi(E) serio_raw(E) i2c_piix4(E) mac_hid(E) pata_acpi(E) qemu_fw_cfg(E) 9pnet_virtio(E) 9p(E) 9pnet(E) fscache(E)
> [ 2012.044655] CPU: 2 PID: 1632 Comm: stress-ng-brk Tainted: G EL   4.15.18 #1
> [ 2012.045581] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> [ 2012.046555] RIP: 0010:__raw_callee_save___pv_queued_spin_unlock+0x10/0x17
> [ 2012.047340] RSP: 0018:b73382083718 EFLAGS: 0246 ORIG_RAX: ff11
> [ 2012.048238] RAX: 0001 RBX:  RCX: 0002
> [ 2012.049078] RDX:  RSI: 9d327c2f6918 RDI: a3269978
> [ 2012.049909] RBP: b73382083720 R08: 9d327c2f6918 R09: 9d327c0a5328
> [ 2012.050746] R10: 9d327c1e2310 R11: 9d327c1e2328 R12: 9d327c2f6800
> [ 2012.051574] R13: 9d327c1e2328 R14: 9d327c1e2310 R15: 9d327c1e2200
> [ 2012.052436] FS:  7f89f2ccd740() GS:9d327f28() knlGS:
> [ 2012.053382] CS:  0010 DS:  ES:  CR0: 80050033
> [ 2012.054058] CR2: 7f1350a8dd90 CR3: 311a4004 CR4: 00160ee0
> [ 2012.054889] Call Trace:
> [ 2012.055192]  get_swap_pages+0x193/0x360
> [ 2012.055652]  get_swap_page+0x13f/0x1e0
> [ 2012.056123]  add_to_swap+0x14/0x70
> [ 2012.056530]  shrink_page_list+0x81d/0xbc0
> [ 2012.057013]  shrink_inactive_list+0x242/0x590
> [ 2012.057523]  shrink_node_memcg+0x364/0x770
> [ 2012.058012]  shrink_node+0xf7/0x300
> [ 2012.058432]  ? shrink_node+0xf7/0x300
> [ 2012.058863]  do_try_to_free_pages+0xc9/0x330
> [ 2012.059368]  try_to_free_pages+0xee/0x1b0
> [ 2012.059842]  __alloc_pages_slowpath+0x3fc/0xe00
> [ 2012.060424]  __alloc_pages_nodemask+0x29a/0x2c0
> [ 2012.060963]  alloc_pages_vma+0x88/0x1f0
> [ 2012.061414]  __handle_mm_fault+0x8b7/0x12e0
> [ 2012.061909]  handle_mm_fault+0xb1/0x210
> [ 2012.062375]  __do_page_fault+0x281/0x4b0
> [ 2012.062848]  do_page_fault+0x2e/0xe0
> [ 2012.063274]  ? async_page_fault+0x2f/0x50
> [ 2012.063751]  do_async_page_fault+0x51/0x80
> [ 2012.064262]  async_page_fault+0x45/0x50
> [ 2012.064719] RIP: 0033:0x55ec1997bd0a
> [ 2012.065147] RSP: 002b:7ffeacd21600 EFLAGS: 00010246
> [ 2012.065754] RAX: 55ec28601000 RBX: 0005 RCX: 7f89f2de956b
> [ 2012.066580] RDX: 55ec28601000 RSI: 7ffeacd216d0 RDI: 55ec28602000
> [ 2012.067410] RBP: 7ffeacd216c0 R08:  R09: 7f89f3d0c2f0
> [ 2012.068290] R10:  R11: 0246 R12:
> [ 2012.069129] R13: 0002 R14: 0001 R15: 7ffeacd216d0
> [ 2012.069965] Code: 50 41 51 41 52 41 53 e8 3b 05 00 00 41 5b 41 5a 41 59 41 58 5f 5e 5a 59 5d c3 90 55 48 89 e5 52 b8 01 00 00 00 31 d2 f0 0f b0 17 <3c> 01 75 03 5a 5d c3 56 0f b6 f0 e8 bc ff ff ff 5e 5a 5d c3 0f
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1799497
>
> Title:
>   4.15 kernel hard lockup about once a week
>
> Status in linux package in Ubuntu:
>   Incomplete
> Status in zram-config package in Ubuntu:
>   Incomplete
> Status in linux source package in Bionic:
>   Confirmed
> Status in zram-config source package in Bionic:
>   Confirmed
>
> Bug description:
>   My main server has been running into hard lockups about once a week
>   ever since I switched to the 4.15 Ubuntu 18.04 kernel.
>
>   When this happens, nothing is printed to the console, it's effectively
>   stuck showing a login prompt. The system is running with panic=1 on
>   the cmdline but isn't rebooting so the kernel isn't even processing
>   this as a kernel panic.
>
>
>   As this felt like a potential hardware issue, I had my hosting provider
> give me a completely different system, different motherboard, different
> CPU, different RAM and different storage, I installed that system on 18.04
> and moved my data over, a week later, I hit the issue again.
>
>   We've since also had a LXD user reporting similar symptoms here also on
> varying hardware:
> https://github.com/lxc/lxd/issues/5197
>
>
>   My system doesn't have a lot of memory pressure with about 50% of free
> memory:
>
>   
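The watchdog line at the top of Colin's trace is the easiest thing to grep for when checking whether other machines have logged the same soft lockup. Below is a minimal Python sketch (the function and type names are illustrative, not part of any existing tool) that extracts the CPU, stall time and task from such lines. It assumes each message sits on a single line, as `dmesg` prints it, rather than wrapped as in the mail above.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class SoftLockup(NamedTuple):
    timestamp: float  # seconds since boot, from the "[ ... ]" prefix
    cpu: int
    stuck_seconds: int
    comm: str
    pid: int


# Assumed format, taken from the trace above (unwrapped onto one line):
# "[ 2012.040006] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [stress-ng-brk:1632]"
_SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the fields of a soft-lockup report line, or None for any other line."""
    m = _SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(
        timestamp=float(m["ts"]),
        cpu=int(m["cpu"]),
        stuck_seconds=int(m["secs"]),
        comm=m["comm"],
        pid=int(m["pid"]),
    )


def scan(lines: Iterable[str]) -> Iterator[SoftLockup]:
    """Yield every soft-lockup report found in an iterable of log lines."""
    for line in lines:
        record = parse_soft_lockup(line)
        if record is not None:
            yield record
```

Because the pattern is matched with `search`, it also works on syslog-style lines that carry a date and hostname before the `[ timestamp]` prefix, e.g. `list(scan(open("kern.log")))` on a saved log (the path is only an example).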

Re: [Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-10 Thread Luis Rodriguez
Hi. I had to remove zram-config from my production servers a long time
ago, and since then I haven't had the issue. I was using LXD containers
heavily on those hosts, with different kinds of workloads, but I don't have
any other setup to test with at the moment.

On Fri, Jan 10, 2020 at 12:11 AM Colin Ian King <1799...@bugs.launchpad.net>
wrote:

> Can reproduce this with stress-ng exercising a high memory pressure scenario
> using:
> stress-ng --brk 0 -v --aiol 0
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1799497
>
> Title:
>   4.15 kernel hard lockup about once a week
>
> Status in linux package in Ubuntu:
>   Incomplete
> Status in zram-config package in Ubuntu:
>   Incomplete
> Status in linux source package in Bionic:
>   Confirmed
> Status in zram-config source package in Bionic:
>   Confirmed
>
> Bug description:
>   My main server has been running into hard lockups about once a week
>   ever since I switched to the 4.15 Ubuntu 18.04 kernel.
>
>   When this happens, nothing is printed to the console, it's effectively
>   stuck showing a login prompt. The system is running with panic=1 on
>   the cmdline but isn't rebooting so the kernel isn't even processing
>   this as a kernel panic.
>
>
>   As this felt like a potential hardware issue, I had my hosting provider
> give me a completely different system, different motherboard, different
> CPU, different RAM and different storage, I installed that system on 18.04
> and moved my data over, a week later, I hit the issue again.
>
>   We've since also had a LXD user reporting similar symptoms here also on
> varying hardware:
> https://github.com/lxc/lxd/issues/5197
>
>
>   My system doesn't have a lot of memory pressure with about 50% of free
> memory:
>
>   root@vorash:~# free -m
>                 total        used        free      shared  buff/cache   available
>   Mem:          31819       17574         402         513       13842       13292
>   Swap:         15909        2687       13222
>
>   I will now try to increase console logging as much as possible on the
>   system in the hopes that next time it hangs we can get a better idea
>   of what happened but I'm not too hopeful given the complete silence on
>   the console when this occurs.
>
>   System is currently on:
> Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC
> 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>   But I've seen this since the GA kernel on 4.15 so it's not a recent
> regression.
>   ---
>   ProblemType: Bug
>   AlsaDevices:
>total 0
>crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
>crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
>   AplayDevices: Error: [Errno 2] No such file or directory: 'aplay':
> 'aplay'
>   ApportVersion: 2.20.9-0ubuntu7.4
>   Architecture: amd64
>   ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord':
> 'arecord'
>   AudioDevicesInUse:
>Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed
> with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
>Cannot stat file /proc/22831/fd/10: Permission denied
>   DistroRelease: Ubuntu 18.04
>   HibernationDevice:
>RESUME=none
>CRYPTSETUP=n
>   IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig':
> 'iwconfig'
>   Lsusb:
>Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
>Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual
> Keyboard and Mouse
>Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>   MachineType: Intel Corporation S1200SP
>   NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
>   Package: linux (not installed)
>   PciMultimedia:
>
>   ProcEnviron:
>TERM=xterm
>PATH=(custom, no user)
>XDG_RUNTIME_DIR=
>LANG=en_US.UTF-8
>SHELL=/bin/bash
>   ProcFB: 0 mgadrmfb
>   ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic
> root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0
> net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8
>   ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
>   RelatedPackageVersions:
>linux-restricted-modules-4.15.0-38-generic N/A
>linux-backports-modules-4.15.0-38-generic  N/A
>linux-firmware 1.173.1
>   RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
>   Tags:  bionic
>   Uname: Linux 4.15.0-38-generic x86_64
>   UnreportableReason: This report is about a package that is not installed.
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   UserGroups:
>
>   _MarkForUpload: False
>   dmi.bios.date: 01/25/2018
>   dmi.bios.vendor: Intel Corporation
>   dmi.bios.version: S1200SP.86B.03.01.1029.012520180838
>   dmi.board.asset.tag: Base Board Asset Tag
>   dmi.board.name: S1200SP
>   dmi.board.vendor: Intel Corporation
>   dmi.board.version: H57532-271
>   dmi.chassis.asset.tag: 
>   dmi.chassis.type: 
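The `free -m` output quoted in the bug description is easier to judge as ratios. Here is a small Python sketch (helper names are illustrative) that parses the procps-ng column layout and expresses each column as a share of its row's total. Fed the figures from the description, it gives about 42% of RAM available but only about 1% strictly free, with most of the rest in page cache, and roughly 17% of swap in use.

```python
from typing import Dict


def parse_free_m(text: str) -> Dict[str, Dict[str, int]]:
    """Parse `free -m` output (procps-ng layout) into {row label: {column: MiB}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # total used free shared buff/cache available
    rows: Dict[str, Dict[str, int]] = {}
    for ln in lines[1:]:
        label, *values = ln.split()
        # The Swap row only has total/used/free; zip() stops at the shorter list.
        rows[label.rstrip(":")] = {col: int(v) for col, v in zip(header, values)}
    return rows


def percent_of_total(row: Dict[str, int], column: str) -> float:
    """Return row[column] as a percentage of row['total']."""
    return 100.0 * row[column] / row["total"]


# Figures from the bug description.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:          31819       17574         402         513       13842       13292
Swap:         15909        2687       13222
"""

if __name__ == "__main__":
    rows = parse_free_m(SAMPLE)
    mem, swap = rows["Mem"], rows["Swap"]
    print(f"RAM available: {percent_of_total(mem, 'available'):.1f}%")
    print(f"RAM free:      {percent_of_total(mem, 'free'):.1f}%")
    print(f"Swap used:     {percent_of_total(swap, 'used'):.1f}%")
```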

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2018-12-18 Thread Luis Rodriguez
OK. It has been quite a while with no lockups. I had one after the
zram-config package was removed, but none since then.

The kernel version ranges from 4.15.0-33 to 4.15.0-38 across the different
servers. I am going to update the servers to the latest version, reboot,
and wait a little longer.

Then I am going to reinstall zram-config on certain servers to see if the
issue shows up again.
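Since Colin could only reproduce the lockup with zram *and* filesystem-backed swap enabled at the same time, it may be worth recording which swap areas are active before and after reinstalling zram-config. A minimal Python sketch of that check, parsing `/proc/swaps` (a header line followed by Filename, Type, Size, Used and Priority columns). The helper names are hypothetical, and it assumes zram devices appear as `/dev/zramN`.

```python
from typing import List, NamedTuple


class SwapArea(NamedTuple):
    filename: str
    kind: str       # "partition" or "file", as reported by the kernel
    size_kib: int
    used_kib: int
    priority: int


def parse_proc_swaps(text: str) -> List[SwapArea]:
    """Parse /proc/swaps contents; the first line is a column header."""
    areas = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        name, kind, size, used, prio = fields
        areas.append(SwapArea(name, kind, int(size), int(used), int(prio)))
    return areas


def is_zram(area: SwapArea) -> bool:
    # Assumption: zram swap devices are named /dev/zram0, /dev/zram1, ...
    return area.filename.startswith("/dev/zram")


def zram_and_other_swap_active(areas: List[SwapArea]) -> bool:
    """True when zram swap and at least one non-zram swap area are both active."""
    return any(is_zram(a) for a in areas) and any(not is_zram(a) for a in areas)
```

On a live host, pass it the file contents, e.g. `parse_proc_swaps(open("/proc/swaps").read())`; `swapon --show` reports the same information interactively.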

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2018-11-03 Thread Luis Rodriguez
Got a hard lockup with no zram-config installed. Same behaviour: no log
information, I can't even type in the console, no SSH, no ping. None of
the LXD containers respond to ping either.
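On the missing reboot: `panic=1` only makes the kernel reboot one second after an actual panic. A detected lockup is turned into a panic only when the corresponding sysctl (`kernel.softlockup_panic`, `kernel.hardlockup_panic`) is set, and these typically default to 0, in which case the watchdog just logs a report like the soft lockup quoted in the first message above. A minimal Python sketch, with hypothetical helper names, that reads those settings from `/proc/sys` so they can be compared across hosts:

```python
import os
from typing import Dict, Mapping, Optional

# Sysctls under /proc/sys/kernel that decide whether a detected lockup
# ends in a panic (and therefore in the reboot requested by panic=N).
LOCKUP_SYSCTLS = (
    "panic",             # seconds before rebooting after a panic; 0 = no reboot
    "softlockup_panic",  # 1 = panic on a soft lockup instead of just logging it
    "hardlockup_panic",  # 1 = panic on a hard lockup (needs the NMI watchdog)
    "nmi_watchdog",      # 1 = hard-lockup detector enabled
    "watchdog_thresh",   # base lockup-detection threshold, in seconds
)


def read_lockup_settings(proc_sys: str = "/proc/sys") -> Dict[str, Optional[int]]:
    """Read each setting; None means the file is absent or unreadable."""
    settings: Dict[str, Optional[int]] = {}
    for name in LOCKUP_SYSCTLS:
        path = os.path.join(proc_sys, "kernel", name)
        try:
            with open(path) as f:
                settings[name] = int(f.read().split()[0])
        except (OSError, ValueError, IndexError):
            settings[name] = None
    return settings


def lockups_trigger_reboot(settings: Mapping[str, Optional[int]]) -> bool:
    """True only if soft and hard lockups both panic and a panic reboots."""
    return (
        bool(settings.get("panic"))
        and settings.get("softlockup_panic") == 1
        and settings.get("hardlockup_panic") == 1
    )
```

If both lockup sysctls report 0 (or are missing), setting them, for example with `sysctl -w kernel.softlockup_panic=1`, should at least turn a detected lockup into a panic and a reboot. That will not help if the machine is wedged so hard that the detectors themselves never get to run.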

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2018-11-01 Thread Luis Rodriguez
Correct. I would like to give it some more time to see whether it happens
again. So far so good: no lockups, and I haven't had to restart any server
in a week and a half.

I'll try to prepare the same setup with zram-config on another server to
see if it happens again on that particular server.

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2018-11-01 Thread Luis Rodriguez
In my case it hasn't happened again, although I removed the zram-config
package from the host servers (I think this is the only software I added
when moving from 16.04 to 18.04). I would like to either rule out or
confirm that it has an effect on the issue.

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2018-10-24 Thread Luis Rodriguez
Hello, I submitted the report on LXD, as Stéphane mentioned on
https://github.com/lxc/lxd/issues/5197, since that is the only thing I
have installed and actively running on the server.

I also thought it might be a hardware issue, but since upgrading to 18.04
in May I have experienced this on a variety of hardware. I also thought it
might be an upgrade issue, but that is not the case either.

I also thought it was memory related, since it now occurs, as Stéphane
mentioned, around once a week, but in my case on different servers. The
last server where it happened hadn't had any issue for maybe the last two
months and was not heavily loaded in terms of memory, but it seems to
happen more often on servers that are actively used in both memory and CPU.

It doesn't happen on blade hosts that only have 2-4 LXD containers and 4GB
of RAM; it has only happened on HP and Dell servers with 16GB, 24GB, 48GB
and 128GB of RAM that carry a little more load (from 6 up to 20
containers).

At least I am not alone, but I have no clue how to recreate or address this
issue (the logs provide no information either).

I could also try some kernels. As Stéphane mentioned, it didn't happen on
4.4; it only started happening with the 18.04 GA kernel (as he also
mentioned). I have been constantly upgrading the kernel to no avail, so it
seems the problem could have been introduced earlier.

Strangely, and thankfully, it doesn't happen on my main production servers
(except for yesterday's crash on one of them). It happens mostly on
development servers that are actively used (the developers are not happy).


** Bug watch added: LXD bug tracker #5197
   https://github.com/lxc/lxd/issues/5197
