from:"Colin Ian King"

[Kernel-packages] [Bug 1866730] Re: Need patch for post 5.5 low-latency kernels

2020-03-10 Thread Colin Ian King

Thanks for the update on this compat fix.

I've tested this on:

upstream 5.6-rc5 lowlatency + generic
upstream 5.5 lowlatency + generic
ubuntu 5.4.0-18

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1866730

Title:
  Need patch for post 5.5 low-latency kernels

Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  Post-5.4.x kernels built with CONFIG_PREEMPT_RCU=y export __rcu_read_lock
  as GPL-only, which breaks zfs compilation on those kernels. (Ubuntu
  low-latency kernels have that option enabled, IIRC.)
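
  Whether a given kernel is affected depends on its build configuration.
  The following is only a minimal sketch of checking that, assuming the
  kernel config is available as text (for example from
  /boot/config-$(uname -r), a location assumed here and not mentioned in
  this bug). The helper name and the sample config are made up for
  illustration.

```python
def config_option_enabled(config_text, option):
    """Return True if `option` is set to 'y' in kernel-config-style text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name == option:
            return value == "y"
    return False


# Illustrative sample only, not taken from a real Ubuntu kernel config.
sample = """\
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
"""
print(config_option_enabled(sample, "CONFIG_PREEMPT_RCU"))
```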

  
  The patch for the 0.8 series, which implements this inside zfs to
  circumvent the issue, is here:
  https://github.com/openzfs/zfs/commit/2fcab8795c7c493845bfa277d44bc443802000b8

  This is from this comment in the relevant issue:

  https://github.com/openzfs/zfs/issues/9745#issuecomment-592617605

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1866730/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1866730] Re: Need patch for post 5.5 low-latency kernels

2020-03-10 Thread Colin Ian King

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu)
   Status: New => In Progress


[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

2020-03-10 Thread Colin Ian King

Can this be re-tested to see if it still fails now that I have cleaned up
kernel03?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863989

Title:
  bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

Status in Stress-ng:
  Invalid
Status in ubuntu-kernel-tests:
  Fix Committed
Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Issue found on Eoan zVM node kernel03

  Test hung at bad-altstack test.

  Reproducible rate: 4 out of 4 attempts

  02:36:12 DEBUG| [stdout] aiol STARTING
  02:36:17 DEBUG| [stdout] aiol RETURNED 0
  02:36:17 DEBUG| [stdout] aiol PASSED
  02:36:17 DEBUG| [stdout] bad-altstack STARTING
  + ARCHIVE=/var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive
  + scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -r ubuntu@kernel03:kernel-test-results /var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive

  dmesg only shows:
  [  102.352136] Adding 1048572k swap on /home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img.  Priority:-3 extents:95 across:26763272k SSFS
  [  122.402895] NET: Registered protocol family 38

  It looks like this is caused by an OOM issue; the x3270 console was
  flooded with OOM error messages.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: linux-image-5.3.0-41-generic 5.3.0-41.33
  ProcVersionSignature: Ubuntu 5.3.0-41.33-generic 5.3.18
  Uname: Linux 5.3.0-41-generic s390x
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.11-0ubuntu8.4
  Architecture: s390x
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
  Date: Thu Feb 20 06:06:04 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_GB.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: root=/dev/mapper/kl03vg01-kl03root crashkernel=196M BOOT_IMAGE=0
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-41-generic N/A
   linux-backports-modules-5.3.0-41-generic  N/A
   linux-firmware                            1.183.4
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to eoan on 2019-09-30 (142 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/stress-ng/+bug/1863989/+subscriptions


[Kernel-packages] [Bug 1852119] Re: Please add zfs modules to linux-raspi2

2020-03-04 Thread Colin Ian King

The Ubuntu kernel team recommends having at least 4GB of free memory to
run ZFS on slow backing store devices for nominal performance.  Since
there is OS overhead (kernel, userspace processes etc.), a 4GB Raspberry
Pi will perform sub-optimally. Note that the document you referenced in
comment #1 states:

"Computers that have less than 2 GiB of memory run ZFS slowly. 4 GiB of
memory is recommended for normal performance in basic workloads. "

Once you start to add in ZFS options such as compression and/or run
scrubs on a slow device, you are likely to start seeing high memory
pressure issues.  Hence we do not support ZFS unless you have
at least 4GB of memory free.
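
As a rough illustration of the 4GB guideline above, the sketch below
compares the MemAvailable figure from /proc/meminfo-style text against a
4 GiB threshold. Using MemAvailable as the measure of "free" memory is an
assumption made for this example, the helper names are hypothetical, and
the sample values are only approximations of the free -m output quoted in
the bug description.

```python
FOUR_GIB_KIB = 4 * 1024 * 1024  # 4 GiB in KiB, the unit /proc/meminfo uses


def mem_available_kib(meminfo_text):
    """Return MemAvailable (KiB) from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def meets_zfs_memory_guideline(meminfo_text, threshold_kib=FOUR_GIB_KIB):
    """True when the available memory is at least the threshold."""
    available = mem_available_kib(meminfo_text)
    return available is not None and available >= threshold_kib


# Approximate figures for the 4GB Raspberry Pi 4 in the bug description
# (about 3791 MiB total, about 2836 MiB available).
sample = "MemTotal:        3881984 kB\nMemAvailable:    2904064 kB\n"
print(meets_zfs_memory_guideline(sample))
```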

** Changed in: linux-raspi2 (Ubuntu)
   Status: Confirmed => Won't Fix

** Changed in: linux-raspi2 (Ubuntu)
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-raspi2 in Ubuntu.
https://bugs.launchpad.net/bugs/1852119

Title:
  Please add zfs modules to linux-raspi2

Status in linux-raspi2 package in Ubuntu:
  Won't Fix

Bug description:
  The 4GB RPI4 is more than capable of handling zfs.
  (Even zfs root can be enabled manually with arm64 eoan builds using a
  variant of the steps at
  https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS .)

  Currently one has to install zfs-dkms for zfs support, but ideally one
  would have zfs modules come with the standard arm64 kernel so that one
  does not have to recompile zfs on the pi.

  Example:

  uname -a
  Linux rpi4 5.3.0-1011-raspi2 #12-Ubuntu SMP Fri Nov 1 09:07:06 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
  @rpi4:~$ zpool status
    pool: bpool
   state: ONLINE
  status: Some supported features are not enabled on the pool. The pool can
          still be used, but some features are unavailable.
  action: Enable all features using 'zpool upgrade'. Once this is done,
          the pool may no longer be accessible by software that does not support
          the features. See zpool-features(5) for details.
    scan: scrub repaired 0B in 0 days 00:00:00 with 0 errors on Mon Nov 11 13:50:14 2019
  config:

          NAME                                                      STATE     READ WRITE CKSUM
          bpool                                                     ONLINE       0     0     0
            usb-Samsung_Flash_Drive_FIT_0309318110004882-0:0-part3  ONLINE       0     0     0

  errors: No known data errors

    pool: rpool
   state: ONLINE
    scan: scrub repaired 0B in 0 days 00:01:21 with 0 errors on Mon Nov 11 13:51:39 2019
  config:

          NAME        STATE     READ WRITE CKSUM
          rpool       ONLINE       0     0     0
            sda4      ONLINE       0     0     0

  errors: No known data errors

  dkms status
  zfs, 0.8.1, 5.3.0-1011-raspi2, aarch64: installed

  @rpi4:~$ cat /proc/cpuinfo 
  processor   : 0
  BogoMIPS: 108.00
  Features: fp asimd evtstrm crc32 cpuid
  CPU implementer : 0x41
  CPU architecture: 8
  CPU variant : 0x0
  CPU part: 0xd08
  CPU revision: 3

  processor   : 1
  BogoMIPS: 108.00
  Features: fp asimd evtstrm crc32 cpuid
  CPU implementer : 0x41
  CPU architecture: 8
  CPU variant : 0x0
  CPU part: 0xd08
  CPU revision: 3

  processor   : 2
  BogoMIPS: 108.00
  Features: fp asimd evtstrm crc32 cpuid
  CPU implementer : 0x41
  CPU architecture: 8
  CPU variant : 0x0
  CPU part: 0xd08
  CPU revision: 3

  processor   : 3
  BogoMIPS: 108.00
  Features: fp asimd evtstrm crc32 cpuid
  CPU implementer : 0x41
  CPU architecture: 8
  CPU variant : 0x0
  CPU part: 0xd08
  CPU revision: 3

  Hardware: BCM2835
  Revision: c03111
  Serial  : ---
  Model   : Raspberry Pi 4 Model B Rev 1.1
  @rpi4:~$ free -m
                total        used        free      shared  buff/cache   available
  Mem:           3791         883        2612          18         295        2836
  Swap:          4095           0        4095

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1852119/+subscriptions


[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

2020-03-03 Thread Colin Ian King

I found this was failing on kernel03 because there was very little space
for the test to enable a large swap file. I cleaned the machine up and
was unable to reproduce the failure. I'm assuming the tests were failing
on kernel03; if not, which machine were they being run on?


[Kernel-packages] [Bug 1814983] Re: zfs poor sustained read performance from ssd pool

2020-02-28 Thread Colin Ian King

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814983

Title:
  zfs poor sustained read performance from ssd pool

Status in zfs-linux package in Ubuntu:
  Confirmed

Bug description:
  Hello,

  I'm seeing substantially slower read performance from an ssd pool than
  I expected.

  I have two pools on this computer; one ('fst') is four sata ssds, the
  other ('srv') is nine spinning metal drives.

  With a long-running ripgrep process on the fst pool, performance
  started out really good and grew to astonishingly good (iirc ~30kiops,
  as measured by zpool iostat -v 1). However after a few hours the
  performance has dropped to 30-40 iops. top reports an arc_reclaim and
  many arc_prune processes to be consuming most of the CPU time.

  I've included a screenshot of top, some output from zpool iostat -v 1,
  and arc_summary, with "===" to indicate the start of the next
  command's output:

  ===
  top (memory in gigabytes):

  top - 16:27:53 up 70 days, 16:03,  3 users,  load average: 35.67, 35.81, 35.58
  Tasks: 809 total,  19 running, 612 sleeping,   0 stopped,   0 zombie
  %Cpu(s):  0.0 us, 58.1 sy,  0.0 ni, 39.2 id,  2.6 wa,  0.0 hi,  0.0 si,  0.0 st
  GiB Mem :  125.805 total,    0.620 free,   96.942 used,   28.243 buff/cache
  GiB Swap:    5.694 total,    5.688 free,    0.006 used.   27.840 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   1523 root      20   0    0.0m   0.0m   0.0m R 100.0  0.0 290:52.26 arc_reclaim
   4484 root      20   0    0.0m   0.0m   0.0m R  56.2  0.0   1:18.79 arc_prune
   6225 root      20   0    0.0m   0.0m   0.0m R  56.2  0.0   1:11.92 arc_prune
   7601 root      20   0    0.0m   0.0m   0.0m S  56.2  0.0   2:50.25 arc_prune
  30891 root      20   0    0.0m   0.0m   0.0m S  56.2  0.0   1:33.08 arc_prune
   3057 root      20   0    0.0m   0.0m   0.0m S  55.9  0.0   9:00.95 arc_prune
   3259 root      20   0    0.0m   0.0m   0.0m R  55.9  0.0   3:16.84 arc_prune
  24008 root      20   0    0.0m   0.0m   0.0m S  55.9  0.0   1:55.71 arc_prune
   1285 root      20   0    0.0m   0.0m   0.0m R  55.6  0.0   3:20.52 arc_prune
   5345 root      20   0    0.0m   0.0m   0.0m R  55.6  0.0   1:15.99 arc_prune
  30121 root      20   0    0.0m   0.0m   0.0m S  55.6  0.0   1:35.50 arc_prune
  31192 root      20   0    0.0m   0.0m   0.0m S  55.6  0.0   6:17.16 arc_prune
  32287 root      20   0    0.0m   0.0m   0.0m S  55.6  0.0   1:28.02 arc_prune
  32625 root      20   0    0.0m   0.0m   0.0m R  55.6  0.0   1:27.34 arc_prune
  22572 root      20   0    0.0m   0.0m   0.0m S  55.3  0.0  10:02.92 arc_prune
  31989 root      20   0    0.0m   0.0m   0.0m R  55.3  0.0   1:28.03 arc_prune
   3353 root      20   0    0.0m   0.0m   0.0m R  54.9  0.0   8:58.81 arc_prune
  10252 root      20   0    0.0m   0.0m   0.0m R  54.9  0.0   2:36.37 arc_prune
   1522 root      20   0    0.0m   0.0m   0.0m S  53.9  0.0 158:42.45 arc_prune
   3694 root      20   0    0.0m   0.0m   0.0m R  53.9  0.0   1:20.79 arc_prune
  13394 root      20   0    0.0m   0.0m   0.0m R  53.9  0.0  10:35.78 arc_prune
  24592 root      20   0    0.0m   0.0m   0.0m R  53.9  0.0   1:54.19 arc_prune
  25859 root      20   0    0.0m   0.0m   0.0m S  53.9  0.0   1:51.71 arc_prune
   8194 root      20   0    0.0m   0.0m   0.0m S  53.6  0.0   0:54.51 arc_prune
  18472 root      20   0    0.0m   0.0m   0.0m R  53.6  0.0   2:08.73 arc_prune
  29525 root      20   0    0.0m   0.0m   0.0m R  53.6  0.0   1:35.81 arc_prune
  32291 root      20   0    0.0m   0.0m   0.0m S  53.6  0.0

[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade

2020-02-27 Thread Colin Ian King

I've uploaded a fixed package; it will now proceed via the normal
SRU process.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1860182

Title:
  zpool scrub malfunction after kernel upgrade

Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  == SRU Request [BIONIC] ==

  The HWE kernel on bionic provides the zfs 0.8.1 driver, which includes an
  improved scrub; however, the progress stats reported by the kernel are
  incompatible with the 0.7.x zfs driver.

  == Fix ==

  Use the extra fields in the new zfs 0.8.x pool_scan_stat_t to calculate
  the scan progress when using zfs 0.8.x kernel drivers. Add detection of the
  kernel module version and use an approximation to the zfs 0.8.0 progress and
  rate reporting for newer kernels.

  For 0.7.5 we can pass the larger 0.8.x pool_scan_stat_t to 0.7.5
  zfs without problems, ignore the new fields, and continue
  to use the 0.7.5 rate calculations.
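
  The version detection step can be pictured roughly as below. This is
  only a sketch in Python, not the actual fix (which lives in the zfs
  userspace tools). The helper names are hypothetical, and reading the
  version string from /sys/module/zfs/version is an assumption about
  where the loaded module reports its version.

```python
def parse_zfs_version(version_text):
    """Parse a string such as '0.8.1-1ubuntu14' into a tuple like (0, 8, 1)."""
    # Drop any packaging suffix after the first '-' and keep the dotted part.
    numeric = version_text.strip().split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def uses_new_scan_stats(version_text):
    """True when the module is 0.8.0 or newer, so the extra fields apply."""
    return parse_zfs_version(version_text) >= (0, 8, 0)


# The strings below are illustrative; on a live system the text might be
# read from /sys/module/zfs/version (assumed location).
for text in ("0.7.5-1ubuntu16.7", "0.8.1-1ubuntu14"):
    print(text, uses_new_scan_stats(text))
```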

  == Test ==

  Install the HWE kernel on Bionic, create some large ZFS pools and
  populate with a lot of data.  Issue:

  sudo zpool scrub poolname
  and then look at the progress using

  sudo zpool status

  Without the fix, the progress stats are incorrect. With the fix, the
  duration and rate stats are a fairly good approximation of the
  progress. Since the newer 0.8.x zfs now scans in two phases,
  the older zfs tools will only report accurate stats for phase #2 of
  the scan, to keep it roughly compatible with the 0.7.x zfs utils
  output.

  == Regression Potential ==

  This is a userspace reporting fix so the zpool status output is only
  affected by this fix when doing a scrub, so the impact of this fix is
  very small and limited.

  

  I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE
  kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly:

  eric@eric-8700K:~$ zpool status
    pool: storagepool1
   state: ONLINE
    scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
  config:

   NAME                                          STATE     READ WRITE CKSUM
   storagepool1                                  ONLINE       0     0     0
     mirror-0                                    ONLINE       0     0     0
       ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
       ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
     mirror-1                                    ONLINE       0     0     0
       ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
       ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0

  I ran zpool scrub after upgrading the kernel and rebooting, and now it
  fails to work properly. It appeared to finish in about 5 minutes but
  did not, and says it is going slow:

  eric@eric-8700K:~$ sudo zpool status
    pool: storagepool1
   state: ONLINE
    scan: scrub in progress since Fri Jan 17 15:32:07 2020
   1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
   0B repaired, 100.00% done
  config:

   NAME                                          STATE     READ WRITE CKSUM
   storagepool1                                  ONLINE       0     0     0
     mirror-0                                    ONLINE       0     0     0
       ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
       ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
     mirror-1                                    ONLINE       0     0     0
       ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
       ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0

  errors: No known data errors

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: zfsutils-linux 0.7.5-1ubuntu16.7
  ProcVersionSignature: Ubuntu 5.3.0-26.28~18.04.1-generic 5.3.13
  Uname: Linux 5.3.0-26-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.9-0ubuntu7.9
  Architecture: amd64
  CurrentDesktop: ubuntu:GNOME
  Date: Fri Jan 17 16:22:01 2020
  InstallationDate: Installed on 2018-03-07 (681 days ago)
  InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
  SourcePackage: zfs-linux
  UpgradeStatus: Upgraded to bionic on 2018-08-02 (533 days ago)
  modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission 
denied: '/etc/sudoers.d/zfs']

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1860182/+subscriptions


[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade

2020-02-27 Thread Colin Ian King

** Description changed:

+ == SRU Request [BIONIC] ==
+ 
+ The HWE kernel on bionic provides the zfs 0.8.1 driver, which includes an
+ improved scrub; however, the progress stats reported by the kernel are
+ incompatible with the 0.7.x zfs driver.
+ 
+ == Fix ==
+ 
+ Use the extra fields in the new zfs 0.8.x pool_scan_stat_t to calculate
+ the scan progress when using zfs 0.8.x kernel drivers. Add detection of the
+ kernel module version and use an approximation to the zfs 0.8.0 progress and
+ rate reporting for newer kernels.
+ 
+ For 0.7.5 we can pass the larger 0.8.x pool_scan_stat_t to 0.7.5
+ zfs without problems, ignore the new fields, and continue
+ to use the 0.7.5 rate calculations.
+ 
+ == Test ==
+ 
+ Install the HWE kernel on Bionic, create some large ZFS pools and
+ populate with a lot of data.  Issue:
+ 
+ sudo zpool scrub poolname
+ and then look at the progress using
+ 
+ sudo zpool status
+ 
+ Without the fix, the progress stats are incorrect. With the fix, the
+ duration and rate stats are a fairly good approximation of the progress.
+ Since the newer 0.8.x zfs now scans in two phases, the older zfs
+ tools will only report accurate stats for phase #2 of the scan, to keep
+ it roughly compatible with the 0.7.x zfs utils output.
+ 
+ == Regression Potential ==
+ 
+ This is a userspace reporting fix so the zpool status output is only
+ affected by this fix when doing a scrub, so the impact of this fix is
+ very small and limited.
+ 
+ 
+ 
  I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE kernel
  (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly:
  
  eric@eric-8700K:~$ zpool status
-   pool: storagepool1
-  state: ONLINE
-   scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
+   pool: storagepool1
+  state: ONLINE
+   scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
  config:
  
-   NAME  STATE READ WRITE CKSUM
-   storagepool1  ONLINE   0 0 0
- mirror-0ONLINE   0 0 0
-   ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE   0 0 0
-   ata-ST2000DM001-1CH164_Z1E285A4   ONLINE   0 0 0
- mirror-1ONLINE   0 0 0
-   ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE   0 0 0
-   ata-ST2000DM006-2DM164_Z4ZA3ENE   ONLINE   0 0 0
- 
+  NAME  STATE READ WRITE CKSUM
+  storagepool1  ONLINE   0 0 0
+    mirror-0ONLINE   0 0 0
+  ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE   0 0 0
+  ata-ST2000DM001-1CH164_Z1E285A4   ONLINE   0 0 0
+    mirror-1ONLINE   0 0 0
+  ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE   0 0 0
+  ata-ST2000DM006-2DM164_Z4ZA3ENE   ONLINE   0 0 0
  
  I ran zpool scrub after upgrading the kernel and rebooting, and now it
  fails to work properly. It appeared to finish in about 5 minutes but did
  not, and says it is going slow:
  
- 
  eric@eric-8700K:~$ sudo zpool status
-   pool: storagepool1
-  state: ONLINE
-   scan: scrub in progress since Fri Jan 17 15:32:07 2020
-   1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
-   0B repaired, 100.00% done
+   pool: storagepool1
+  state: ONLINE
+   scan: scrub in progress since Fri Jan 17 15:32:07 2020
+  1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
+  0B repaired, 100.00% done
  config:
  
-   NAME  STATE READ WRITE CKSUM
-   storagepool1  ONLINE   0 0 0
- mirror-0ONLINE   0 0 0
-   ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE   0 0 0
-   ata-ST2000DM001-1CH164_Z1E285A4   ONLINE   0 0 0
- mirror-1ONLINE   0 0 0
-   ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE   0 0 0
-   ata-ST2000DM006-2DM164_Z4ZA3ENE   ONLINE   0 0 0
+  NAME  STATE READ WRITE CKSUM
+  storagepool1  ONLINE   0 0 0
+    mirror-0ONLINE   0 0 0
+  ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE   0 0 0
+  ata-ST2000DM001-1CH164_Z1E285A4   ONLINE   0 0 0
+    mirror-1ONLINE   0 0 0
+  ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE

[Kernel-packages] [Bug 1858495] Re: multiple long delays during kernel and userspace boot

2020-02-26 Thread Colin Ian King

Hi Ryan,

We would like to reproduce this bug to debug it further. Can you answer
the questions below relating to your initial comments in the bug:

"Booting some Bionic instances in Azure (gen1 machines).."
Q: What is a gen1 machine?  What instance type is this?

"..I see some large delays during kernel/userspace boot that it would be
good to understand what's going on. Additionally, the areas during
boot that see delays are different for an image that's been created from
a template vs. stock images."

Q: I don't know what these are. Can you explain how these are created?
Do you have any exact examples of a template and stock image?

Thanks, Colin

** Changed in: linux-signed-azure (Ubuntu)
   Status: In Progress => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1858495

Title:
  multiple long delays during kernel and userspace boot

Status in linux-signed-azure package in Ubuntu:
  Incomplete

Bug description:
  Booting some Bionic instances in Azure (gen1 machines), I see some
  large delays during kernel/userspace boot, and it would be good to
  understand what's going on.  Additionally, the areas during boot
  that see delays are different for an image that's been created from a
  template vs. stock images.

  I'm attaching some data: 10 runs of the same image in a scaling set
  that run the initial boot.  Processing the journal output and looking at
  delays of over 2.0 seconds between messages shows some cause for concern.

  
  [1.788581] localhost.localdomain kernel: * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
                                           * this clock source is slow. Consider trying other clock sources
  [3.545974] localhost.localdomain kernel: Unstable clock detected, switching default tracing clock to "global"
                                           If you want to keep using the local clock, then add:
                                             "trace_clock=local"
                                           on the kernel command line
  [6.401684] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
  [   15.280390] localhost.localdomain kernel: EXT4-fs (sda1): re-mounted. Opts: discard

  
  After capturing the bionic image as a template and creating a new VM, we
  see new hot spots we didn't see before.

  
  # HotSpot maximum delta between kernel messages: 2.0
  # [2.846188] localhost.localdomain kernel: AES CTR mode by8 optimization enabled
  # [5.919313] localhost.localdomain kernel: raid6: avx2x4   gen() 21512 MB/s
  #
  # [6.591530] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
  # [9.031051] localhost.localdomain systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
  #
  # [   13.773554] localhost.localdomain sh[871]: + exit 0
  # [   21.625467] localhost.localdomain kernel: UDF-fs: INFO Mounting volume 'UDF Volume', timestamp 2019/12/17 00:00 (1000)
  #
  # [   24.919359] bugbif2be01 systemd-timesyncd[771]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).
  # [   29.787339] bugbif2be01 cloud-init[1026]: Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init' at Mon, 16 Dec 2019 18:14:47 +. Up 25.20 seconds.
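
  The hot-spot listing above pairs consecutive journal lines whose
  timestamps are more than 2.0 apart. The reporter's actual tooling is not
  shown in this bug, so the following is only a sketch of that idea; the
  function name and the regular expression are assumptions made for
  illustration.

```python
import re

# Matches a leading "[   13.773554]" timestamp, with an optional "#" prefix.
TIMESTAMP = re.compile(r"^#?\s*\[\s*(\d+\.\d+)\]")


def find_hot_spots(lines, max_delta=2.0):
    """Return (previous, current) timestamp pairs whose gap exceeds max_delta."""
    hot_spots = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # separator or non-journal line
        current = float(match.group(1))
        if previous is not None and current - previous > max_delta:
            hot_spots.append((previous, current))
        previous = current
    return hot_spots


# Lines taken from the listing above (the systemd line is shortened).
journal = [
    "# [2.846188] localhost.localdomain kernel: AES CTR mode by8 optimization enabled",
    "# [5.919313] localhost.localdomain kernel: raid6: avx2x4   gen() 21512 MB/s",
    "#",
    "# [6.591530] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem",
    "# [9.031051] localhost.localdomain systemd[1]: systemd 237 running in system mode.",
]
for before, after in find_hot_spots(journal):
    print(f"{after - before:.3f}s gap between {before} and {after}")
```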

  The easiest comparison kernel-side is the systemd-analyze value:

  Grepping in the debug data:

  
  % grep "Startup finished.*kernel" bug-bionic-baseline-no*.debug/*/journal.log | cut -d" " -f 7-
  Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s.
  Startup finished in 3.355s (kernel) + 51.732s (userspace) = 55.088s.
  Startup finished in 3.287s (kernel) + 51.747s (userspace) = 55.035s.
  Startup finished in 3.129s (kernel) + 50.066s (userspace) = 53.195s.
  Startup finished in 3.350s (kernel) + 50.682s (userspace) = 54.032s.
  Startup finished in 3.355s (kernel) + 49.322s (userspace) = 52.678s.
  Startup finished in 3.219s (kernel) + 51.124s (userspace) = 54.343s.
  Startup finished in 3.128s (kernel) + 49.226s (userspace) = 52.354s.
  Startup finished in 3.193s (kernel) + 53.197s (userspace) = 56.390s.
  Startup finished in 3.118s (kernel) + 46.203s (userspace) = 49.322s.

  foofoo % grep "Startup finished.*kernel" bug-bionic-baseline-after*.debug/*/journal.log | cut -d" " -f 7-
  Startup finished in 7.685s (kernel) + 32.463s (userspace) = 40.148s.
  Startup finished in 7.041s (kernel) + 35.998s (userspace) = 43.040s.
  Startup finished in 7.808s (kernel) + 35.444s (userspace) = 43.253s.
  Startup finished in 7.206s (kernel) + 37.952s (userspace) = 45.159s.
  Startup finished in 8.426s (kernel) + 36.976s (userspace) = 45.403s.
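
  To compare the two sets of runs numerically, the "Startup finished"
  lines can be averaged. The sketch below is illustrative only: the
  function name is hypothetical, it uses just the first three lines from
  each listing above, and it assumes every time is printed as plain
  seconds (longer boots that systemd prints with minutes are not handled).

```python
import re
from statistics import mean

STARTUP = re.compile(
    r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
)


def summarize(lines):
    """Return mean (kernel, userspace, total) seconds, or None if no match."""
    rows = []
    for line in lines:
        match = STARTUP.search(line)
        if match:
            rows.append(tuple(float(value) for value in match.groups()))
    if not rows:
        return None
    kernel, userspace, total = zip(*rows)
    return mean(kernel), mean(userspace), mean(total)


# First three runs from each grep output above.
baseline_no = [
    "Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s.",
    "Startup finished in 3.355s (kernel) + 51.732s (userspace) = 55.088s.",
    "Startup finished in 3.287s (kernel) + 51.747s (userspace) = 55.035s.",
]
baseline_after = [
    "Startup finished in 7.685s (kernel) + 32.463s (userspace) = 40.148s.",
    "Startup finished in 7.041s (kernel) + 35.998s (userspace) = 43.040s.",
    "Startup finished in 7.808s (kernel) + 35.444s (userspace) = 43.253s.",
]
for name, runs in (("no", baseline_no), ("after", baseline_after)):
    kernel, userspace, total = summarize(runs)
    print(f"{name}: kernel {kernel:.3f}s userspace {userspace:.3f}s total {total:.3f}s")
```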

[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

2020-02-25 Thread Colin Ian King

This is probably fixed with commit:
https://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/commit/?id=4db07fef60449c786364638d7978b239676624eb

I've run this a few times with the fix above and can't reproduce this
issue.

** Changed in: ubuntu-kernel-tests
   Status: New => Fix Committed

** Changed in: stress-ng
   Status: New => Invalid

** Changed in: ubuntu-kernel-tests
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863989

Title:
  bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

Status in Stress-ng:
  Invalid
Status in ubuntu-kernel-tests:
  Fix Committed
Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Issue found on Eoan zVM node kernel03

  Test hung at bad-altstack test.

  Reproducible rate: 4 out of 4 attempts

  02:36:12 DEBUG| [stdout] aiol STARTING
  02:36:17 DEBUG| [stdout] aiol RETURNED 0
  02:36:17 DEBUG| [stdout] aiol PASSED
  02:36:17 DEBUG| [stdout] bad-altstack STARTING
  + ARCHIVE=/var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive
  + scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -r ubuntu@kernel03:kernel-test-results /var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive

  dmesg only shows:
  [  102.352136] Adding 1048572k swap on /home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img.  Priority:-3 extents:95 across:26763272k SSFS
  [  122.402895] NET: Registered protocol family 38

  It looks like this is caused by an OOM issue; the x3270 console was
  flooded with OOM error messages.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: linux-image-5.3.0-41-generic 5.3.0-41.33
  ProcVersionSignature: Ubuntu 5.3.0-41.33-generic 5.3.18
  Uname: Linux 5.3.0-41-generic s390x
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.11-0ubuntu8.4
  Architecture: s390x
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
  Date: Thu Feb 20 06:06:04 2020
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lspci:
   
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_GB.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: root=/dev/mapper/kl03vg01-kl03root crashkernel=196M BOOT_IMAGE=0
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-41-generic N/A
   linux-backports-modules-5.3.0-41-generic  N/A
   linux-firmware                            1.183.4
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to eoan on 2019-09-30 (142 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/stress-ng/+bug/1863989/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

2020-02-25 Thread Colin Ian King

@Sam, can you re-run the test? If it's OK, I no longer require the
kernel03 instance.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863989

Title:
  bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

Status in Stress-ng:
  Invalid
Status in ubuntu-kernel-tests:
  Fix Committed
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

2020-02-25 Thread Colin Ian King

** Changed in: stress-ng
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: stress-ng
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863989

Title:
  bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM

Status in Stress-ng:
  New
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B

2020-02-25 Thread Colin Ian King

I confirm in my testing I get a hard kernel lockup with no log output.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1864063

Title:
  vm-segv from ubuntu_stress_smoke_test failed on B

Status in Stress-ng:
  Incomplete
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed

Bug description:
  Issue found on node onibi, with kernel 4.15.0-89.89/ 4.15.0-89.89~16.04.1
  Reproduce rate: 2/2 on generic kernel, 2/2 on lowlatency kernel, 2/2 on X-hwe generic kernel

  Test hang with vm-segv:
  05:58:36 DEBUG| [stdout] vm-addr PASSED
  05:58:36 DEBUG| [stdout] vm-rw STARTING
  05:58:41 DEBUG| [stdout] vm-rw RETURNED 0
  05:58:41 DEBUG| [stdout] vm-rw PASSED
  05:58:41 DEBUG| [stdout] vm-segv STARTING
  + ARCHIVE=/var/lib/jenkins/jobs/smoke__B_amd64-generic__using_onibi__for_kernel/builds/2/archive
  + scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -r ubuntu@onibi:kernel-test-results /var/lib/jenkins/jobs/smoke__B_amd64-generic__using_onibi__for_kernel/builds/2/archive

To manage notifications about this bug go to:
https://bugs.launchpad.net/stress-ng/+bug/1864063/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

2020-02-24 Thread Colin Ian King

Please ignore the above. Apparently the issue needs a little more
digging and the workaround is insufficient.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1861235

Title:
  zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

Status in Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete
Status in zfs-linux source package in Bionic:
  New

Bug description:
  Same as bug 1861228 but with a newer kernel installed.

  [  790.702566] VERIFY(size != 0) failed
  [  790.702590] PANIC at range_tree.c:304:range_tree_find_impl()
  [  790.702611] Showing stack for process 28685
  [  790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
  [  790.702616] Call Trace:
  [  790.702626]  dump_stack+0x6d/0x8e
  [  790.702637]  spl_dumpstack+0x42/0x50 [spl]
  [  790.702640]  spl_panic+0xc8/0x110 [spl]
  [  790.702645]  ? __switch_to_asm+0x41/0x70
  [  790.702714]  ? arc_prune_task+0x1a/0x40 [zfs]
  [  790.702740]  ? dbuf_dirty+0x43d/0x850 [zfs]
  [  790.702745]  ? getrawmonotonic64+0x43/0xd0
  [  790.702746]  ? getrawmonotonic64+0x43/0xd0
  [  790.702775]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702778]  ? getrawmonotonic64+0x43/0xd0
  [  790.702805]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702807]  ? mutex_lock+0x12/0x40
  [  790.702833]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [  790.702866]  range_tree_find_impl+0x88/0x90 [zfs]
  [  790.702870]  ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
  [  790.702902]  range_tree_clear+0x4f/0x60 [zfs]
  [  790.702930]  dnode_free_range+0x11f/0x5a0 [zfs]
  [  790.702957]  dmu_object_free+0x53/0x90 [zfs]
  [  790.702983]  dmu_free_long_object+0x9f/0xc0 [zfs]
  [  790.703010]  receive_freeobjects.isra.12+0x7a/0x100 [zfs]
  [  790.703036]  receive_writer_thread+0x6d2/0xa60 [zfs]
  [  790.703040]  ? set_curr_task_fair+0x2b/0x60
  [  790.703043]  ? spl_kmem_free+0x33/0x40 [spl]
  [  790.703048]  ? kfree+0x165/0x180
  [  790.703073]  ? receive_free.isra.13+0xc0/0xc0 [zfs]
  [  790.703078]  thread_generic_wrapper+0x74/0x90 [spl]
  [  790.703081]  kthread+0x121/0x140
  [  790.703084]  ? __thread_exit+0x20/0x20 [spl]
  [  790.703085]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  790.703088]  ret_from_fork+0x35/0x40
  [  967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
  [  967.636979]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  967.637076] txg_quiesce D0 14810  2 0x8000
  [  967.637080] Call Trace:
  [  967.637089]  __schedule+0x24e/0x880
  [  967.637092]  schedule+0x2c/0x80
  [  967.637106]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637114]  ? wait_woken+0x80/0x80
  [  967.637122]  __cv_wait+0x15/0x20 [spl]
  [  967.637210]  txg_quiesce_thread+0x2cb/0x3d0 [zfs]
  [  967.637278]  ? txg_delay+0x1b0/0x1b0 [zfs]
  [  967.637286]  thread_generic_wrapper+0x74/0x90 [spl]
  [  967.637291]  kthread+0x121/0x140
  [  967.637297]  ? __thread_exit+0x20/0x20 [spl]
  [  967.637299]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  967.637304]  ret_from_fork+0x35/0x40
  [  967.637326] INFO: task zfs:28590 blocked for more than 120 seconds.
  [  967.637371]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  967.637467] zfs D0 28590  28587 0x8080
  [  967.637470] Call Trace:
  [  967.637474]  __schedule+0x24e/0x880
  [  967.637477]  schedule+0x2c/0x80
  [  967.637486]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637491]  ? wait_woken+0x80/0x80
  [  967.637498]  __cv_wait+0x15/0x20 [spl]
  [  967.637554]  dmu_recv_stream+0xa51/0xef0 [zfs]
  [  967.637630]  zfs_ioc_recv_impl+0x306/0x1100 [zfs]
  [  967.637679]  ? dbuf_read+0x34a/0x920 [zfs]
  [  967.637725]  ? dbuf_rele+0x36/0x40 [zfs]
  [  967.637728]  ? _cond_resched+0x19/0x40
  [  967.637798]  zfs_ioc_recv_new+0x33d/0x410 [zfs]
  [  967.637809]  ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl]
  [  967.637816]  ? spl_vmem_alloc+0x19/0x20 [spl]
  [  967.637828]  ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
  [  967.637834]  ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair]
  [  967.637840]  ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair]
  [  967.637905]  zfsdev_ioctl+0x451/0x610 [zfs]
  [  967.637913]  do_vfs_ioctl+0xa8/0x630
  [  967.637917]  ? __audit_syscall_entry+0xbc/0x110
  [  967.637924]  ? syscall_trace_enter+0x1da/0x2d0
  [  967.637927]  SyS_ioctl+0x79/0x90
  [  967.637930]  do_syscall_64+0x73/0x130
  [  967.637935]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
  [  967.637938] RIP: 0033:0x7fc305a905d7
  [  967.637940] RSP: 002b:7ffc45e39618 EFLAGS: 0246 ORIG_RAX: 0010
  [  967.637943] RAX: ffda

[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

2020-02-24 Thread Colin Ian King

I've uploaded a potential fix to a PPA. Would you mind testing it using
the zfs-dkms kernel modules as follows:

sudo add-apt-repository ppa:colin-king/zfs-sru-1861235
sudo apt-get update
sudo apt-get install zfs-dkms

and reboot.

Then check that the correct ZFS module is being used with:

dmesg | grep ZFS

It should be version 0.7.5-1ubuntu16.9~lp1861235.

Then see whether this avoids the issue.


** Changed in: linux (Ubuntu)
   Status: Confirmed => Incomplete

** Also affects: zfs-linux (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: zfs-linux (Ubuntu)
   Status: New => Incomplete

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: zfs-linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** No longer affects: linux (Ubuntu)

** No longer affects: linux (Ubuntu Bionic)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1861235

Title:
  zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

Status in Linux:
  Unknown
Status in zfs-linux package in Ubuntu:
  Incomplete
Status in zfs-linux source package in Bionic:
  New

[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B

2020-02-24 Thread Colin Ian King

** Changed in: stress-ng
 Assignee: Colin Ian King (colin-king) => Kleber Sacilotto de Souza (kleber-souza)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1864063

Title:
  vm-segv from ubuntu_stress_smoke_test failed on B

Status in Stress-ng:
  Incomplete
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B

2020-02-24 Thread Colin Ian King

To reproduce (on an 8 CPU VM):

sudo apt-get update && sudo apt-get dist-upgrade
sudo apt-get build-dep stress-ng
git clone git://kernel.ubuntu.com/cking/stress-ng
cd stress-ng
make
sudo ./stress-ng --vm-segv 0 -t 10 -v

Comment out the ptrace line, rebuild and re-run, and the hang does not
occur, so it's ptrace related.

diff --git a/stress-vm-segv.c b/stress-vm-segv.c
index 39e4cbeb..54d590cd 100644
--- a/stress-vm-segv.c
+++ b/stress-vm-segv.c
@@ -129,7 +129,7 @@ kill_child:
    stress_process_dumpable(false);
 
 #if defined(HAVE_PTRACE)
-   (void)ptrace(PTRACE_TRACEME);
+   //(void)ptrace(PTRACE_TRACEME);
    kill(getpid(), SIGSTOP);
 #endif
    (void)sigemptyset();
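
For readers unfamiliar with the pattern in the diff context, here is a minimal, self-contained sketch of it. It is not stress-ng code and not a reproducer for the lockup: the helper name traceme_stop_demo() is hypothetical, and the sketch only shows a child calling ptrace(PTRACE_TRACEME) and then stopping itself with SIGSTOP while the parent observes the stop. It assumes Linux with glibc.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of stress-ng): fork a child that asks to be
 * traced and then stops itself. Returns 0 if the parent saw the child
 * stopped by SIGSTOP, -1 otherwise.
 */
static int traceme_stop_demo(void)
{
	int status = 0;
	int stopped;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the same two calls that appear in the diff context. */
		(void)ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		(void)kill(getpid(), SIGSTOP);
		_exit(0);	/* only reached if the child is resumed */
	}

	/* Parent from here on. */
	assert(pid > 0);

	/* WUNTRACED reports the stop even if the ptrace request failed. */
	if (waitpid(pid, &status, WUNTRACED) != pid)
		return -1;
	stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

	/* Clean up: kill the stopped child and reap it. */
	(void)kill(pid, SIGKILL);
	(void)waitpid(pid, &status, 0);

	return stopped ? 0 : -1;
}
```

Calling traceme_stop_demo() from a small main() and checking for a 0 return is enough to exercise the path. With the ptrace line commented out, as in the diff, the child still stops but is no longer a tracee of its parent.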

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1864063

Title:
  vm-segv from ubuntu_stress_smoke_test failed on B

Status in Stress-ng:
  Incomplete
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B

2020-02-24 Thread Colin Ian King

I spoke too soon. I was able to trip this on 4.15.0-89.89 but not on
4.15.0-88, so this looks like a regression.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1864063

Title:
  vm-segv from ubuntu_stress_smoke_test failed on B

Status in Stress-ng:
  Incomplete
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B

2020-02-24 Thread Colin Ian King

I can't reproduce this on the systems I'm using. Can I get access to
onibi to try and reproduce this issue?

** Changed in: stress-ng
   Status: In Progress => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1864063

Title:
  vm-segv from ubuntu_stress_smoke_test failed on B

Status in Stress-ng:
  Incomplete
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B

2020-02-24 Thread Colin Ian King

Do you have any info on the number of CPUs, memory and swap size of
onibi, so I can try to reproduce the issue? Better still, access to
onibi would be most helpful.

** Changed in: stress-ng
   Status: New => In Progress

** Changed in: stress-ng
   Importance: Undecided => Medium

** Changed in: stress-ng
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1864063

Title:
  vm-segv from ubuntu_stress_smoke_test failed on B

Status in Stress-ng:
  In Progress
Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

2020-02-24 Thread Colin Ian King

Interestingly, the following commit modifies range_tree_clear() so that
it performs a zero-size check and returns before calling
range_tree_find_impl(). This commit is not in the 18.10 and 19.04 Ubuntu
ZFS releases.

commit a1d477c
Author: Matthew Ahrens mahr...@delphix.com
Date: Thu Sep 22 09:30:13 2016 -0700

OpenZFS 7614, 9064 - zfs device evacuation/removal

OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete

the specific change is:

@@ -560,6 +536,9 @@ range_tree_clear(range_tree_t *rt, uint64_t start, uint64_t size)
 {
    range_seg_t *rs;
 
+   if (size == 0)
+       return;
+
    while ((rs = range_tree_find_impl(rt, start, size)) != NULL) {
        uint64_t free_start = MAX(rs->rs_start, start);
        uint64_t free_end = MIN(rs->rs_end, start + size);

I'm not sure why this check was added, but I guess it handles the cases
where zero-sized allocations are allowed, stopping these from doing any
unnecessary clearing and avoiding the assertion. However, the change in
semantics is not explained in the commit message.
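
To make the effect of the early return concrete, here is a toy model of the two functions. It is only a sketch under stated assumptions, not ZFS code: toy_range_t, toy_find() and toy_clear() are hypothetical names, the "tree" is a single segment, and where ZFS would hit VERIFY(size != 0) and panic the toy only sets a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a range tree: one segment covering [start, end). */
typedef struct {
	uint64_t start;
	uint64_t end;
	bool present;
} toy_range_t;

static int toy_find_calls;	/* how often the strict lookup was reached */
static bool toy_would_panic;	/* set where ZFS would VERIFY() and panic */

/* Strict lookup: a zero-sized query is treated as a caller bug. */
static toy_range_t *toy_find(toy_range_t *rt, uint64_t start, uint64_t size)
{
	toy_find_calls++;
	if (size == 0) {
		toy_would_panic = true;
		return NULL;
	}
	/* Does [start, start + size) overlap the stored segment? */
	if (rt->present && start < rt->end && start + size > rt->start)
		return rt;
	return NULL;
}

/* Clear with the guard from the commit: zero-sized requests are a no-op. */
static void toy_clear(toy_range_t *rt, uint64_t start, uint64_t size)
{
	assert(rt != NULL);

	if (size == 0)
		return;

	while (toy_find(rt, start, size) != NULL) {
		/* Toy simplification: drop the whole segment on any overlap. */
		rt->present = false;
	}
}
```

With the guard in place, toy_clear(&rt, 12, 0) returns without ever reaching toy_find(), which corresponds to avoiding the VERIFY in the stack trace; without the guard the same call would set toy_would_panic.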

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1861235

Title:
  zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Same as bug 1861228 but with a newer kernel installed.

  [  790.702566] VERIFY(size != 0) failed
  [  790.702590] PANIC at range_tree.c:304:range_tree_find_impl()
  [  790.702611] Showing stack for process 28685
  [  790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
  [  790.702616] Call Trace:
  [  790.702626]  dump_stack+0x6d/0x8e
  [  790.702637]  spl_dumpstack+0x42/0x50 [spl]
  [  790.702640]  spl_panic+0xc8/0x110 [spl]
  [  790.702645]  ? __switch_to_asm+0x41/0x70
  [  790.702714]  ? arc_prune_task+0x1a/0x40 [zfs]
  [  790.702740]  ? dbuf_dirty+0x43d/0x850 [zfs]
  [  790.702745]  ? getrawmonotonic64+0x43/0xd0
  [  790.702746]  ? getrawmonotonic64+0x43/0xd0
  [  790.702775]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702778]  ? getrawmonotonic64+0x43/0xd0
  [  790.702805]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702807]  ? mutex_lock+0x12/0x40
  [  790.702833]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [  790.702866]  range_tree_find_impl+0x88/0x90 [zfs]
  [  790.702870]  ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
  [  790.702902]  range_tree_clear+0x4f/0x60 [zfs]
  [  790.702930]  dnode_free_range+0x11f/0x5a0 [zfs]
  [  790.702957]  dmu_object_free+0x53/0x90 [zfs]
  [  790.702983]  dmu_free_long_object+0x9f/0xc0 [zfs]
  [  790.703010]  receive_freeobjects.isra.12+0x7a/0x100 [zfs]
  [  790.703036]  receive_writer_thread+0x6d2/0xa60 [zfs]
  [  790.703040]  ? set_curr_task_fair+0x2b/0x60
  [  790.703043]  ? spl_kmem_free+0x33/0x40 [spl]
  [  790.703048]  ? kfree+0x165/0x180
  [  790.703073]  ? receive_free.isra.13+0xc0/0xc0 [zfs]
  [  790.703078]  thread_generic_wrapper+0x74/0x90 [spl]
  [  790.703081]  kthread+0x121/0x140
  [  790.703084]  ? __thread_exit+0x20/0x20 [spl]
  [  790.703085]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  790.703088]  ret_from_fork+0x35/0x40
  [  967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
  [  967.636979]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  967.637076] txg_quiesce D0 14810  2 0x8000
  [  967.637080] Call Trace:
  [  967.637089]  __schedule+0x24e/0x880
  [  967.637092]  schedule+0x2c/0x80
  [  967.637106]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637114]  ? wait_woken+0x80/0x80
  [  967.637122]  __cv_wait+0x15/0x20 [spl]
  [  967.637210]  txg_quiesce_thread+0x2cb/0x3d0 [zfs]
  [  967.637278]  ? txg_delay+0x1b0/0x1b0 [zfs]
  [  967.637286]  thread_generic_wrapper+0x74/0x90 [spl]
  [  967.637291]  kthread+0x121/0x140
  [  967.637297]  ? __thread_exit+0x20/0x20 [spl]
  [  967.637299]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  967.637304]  ret_from_fork+0x35/0x40
  [  967.637326] INFO: task zfs:28590 blocked for more than 120 seconds.
  [  967.637371]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  967.637467] zfs D0 28590  28587 0x8080
  [  967.637470] Call Trace:
  [  967.637474]  __schedule+0x24e/0x880
  [  967.637477]  schedule+0x2c/0x80
  [  967.637486]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637491]  ? wait_woken+0x80/0x80
  [  967.637498]  __cv_wait+0x15/0x20 [spl]
  [  967.637554]  dmu_recv_stream+0xa51/0xef0 [zfs]
  [  967.637630]  zfs_ioc_recv_impl+0x306/0x1100 [zfs]
  [  967.637679]  ?

[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support

2020-02-24 Thread Colin Ian King

** Changed in: zfs-linux (Ubuntu)
   Status: Fix Committed => Fix Released

** Changed in: spl-linux (Ubuntu)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856704

Title:
  backport 5.3 zfs support to bionic for HWE kernel support

Status in spl-linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in spl-linux source package in Bionic:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Released

Bug description:
  == SRU Justification Bionic ==

  The HWE 5.3 kernel requires ZFS + SPL to support dkms module build
  functionality for kernels 4.15 through to 5.3.  Basically, the ZFS+SPL
  compat commits between 4.15 and 5.3 are required to allow the modules
  to build on kernels up to and including the HWE 5.3 kernel.

  == The Fix ==

  Backport of upstream commits:

  SPL:
  - 0002-fix-spl-build-shrinker-callback-check.patch
  - 0003-remove-deprecated-set-fs-pwd-check.patch
  - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch
  - 0005-Linux-4.20-compat-current_kernel_time.patch
  - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch
  - 0007-Linux-5.0-compat-Use-totalram_pages.patch
  - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch
  - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch
  - 0010-Linux-5.1-compat-get_ds-removed.patch
  - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch
  - 0012-Linux-5.2-compat-rw_tryupgrade.patch
  - 0013-Linux-5.3-compat-rw_semaphore-owner.patch
  - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch
  - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
  - 0016-Linux-compat-4.16-SECTOR_SIZE.patch
  - 0017-Linux-compat-spl-timespec_sub.patch
  - 0018-deprecate-splat-rwlock-test6.patch

  ZFS:
  - 3300-Linux-4.16-compat-inode_set_iversion.patch
  - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch
  - 3302-Linux-4.16-compat-get_disk_and_module.patch
  - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch
  - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch
  - 3305-Linux-4.14-compat-blk_queue_stackable.patch
  - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch
  - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch
  - 3308-Linux-5.0-compat-Use-totalram_pages.patch
  - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch
  - 3310-Linux-5.0-compat-Fix-SUBDIRs.patch
  - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch
  - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch
  - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch
  - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch
  - 3315-Linux-5.0-compat-ASM_BUG-macro.patch
  - 3316-Linux-5.2-compat-rw_tryupgrade.patch
  - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch
  - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
  - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch
  - 3320-zpios-deprecate-current-kernel-time.patch
  - 3321-add-compat-check-disk-size-change.patch

  == Testcase ==

  Without these commits users who install kernels and kernel headers
  from 4.16 through to 5.3 inclusive won't be able to build spl + zfs in
  Bionic because of the lack of the kernel compat fixes.  With the
  commits, zfs + spl dkms modules can build cleanly and pass the ubuntu
  ZFS regression tests found in the kernel team autotests git
  repository.

  == Risk ==

  This is a sizeable backport that touches a fair amount of spl + zfs
  kernel interfacing code. There is a risk that the backport may cause a
  regression in functionality that has not been exercised by the ZFS
  regression tests. This backport with the zfs regression testing
  ensures that no regression in core zfs functionality has been found.
  It must be noted that most of the patches are upstream compat fixes
  that are known to be working with the latest ZFS that is being used in
  focal, so we are confident the original compat changes work.

  Note that these updates have all been build tested on x86-64, arm64
  and s390x systems with kernels from 4.16 to 5.3 and regression tested
  with the ubuntu zfs regression tests.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/spl-linux/+bug/1856704/+subscriptions


[Kernel-packages] [Bug 1863136] Re: zfs-dkms will not compile on kernel 5.6rc1

2020-02-24 Thread Colin Ian King

Unfortunately we require some more 5.6 compat fixes as 5.6-rc3 fails to
build with the current 2 compat upstream fixes:

make[3]: Entering directory '/usr/src/linux-headers-5.6.0-050600rc3-generic'
  CC [M]  /var/lib/dkms/zfs/0.8.3/build/module/avl/avl.o
  CC [M]  /var/lib/dkms/zfs/0.8.3/build/module/icp/illumos-crypto.o
  LD [M]  /var/lib/dkms/zfs/0.8.3/build/module/avl/zavl.o
  CC [M]  /var/lib/dkms/zfs/0.8.3/build/module/lua/lapi.o
In file included from /var/lib/dkms/zfs/0.8.3/build/include/spl/sys/condvar.h:33,
 from /var/lib/dkms/zfs/0.8.3/build/include/sys/zfs_context.h:38,
 from /var/lib/dkms/zfs/0.8.3/build/include/sys/crypto/common.h:39,
 from /var/lib/dkms/zfs/0.8.3/build/module/icp/illumos-crypto.c:35:
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:88:15: error: unknown type name ‘time_t’
   88 | static inline time_t
  |   ^~
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h: In function ‘gethrtime’:
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:108:18: error: storage size of ‘ts’ isn’t known
  108 |  struct timespec ts;
  |  ^~
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:109:2: error: implicit declaration of function ‘getrawmonotonic’ [-Werror=implicit-function-declaration]
  109 |  getrawmonotonic();
  |  ^~~
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:108:18: warning: unused variable ‘ts’ [-Wunused-variable]
  108 |  struct timespec ts;
  |  ^~
cc1: some warnings being treated as errors

I'll revisit this once the appropriate 5.6 compat fixes have landed.
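
For context, the failing gethrtime() helper reads a raw monotonic clock
into a timespec and returns nanoseconds. The sketch below is a userspace
analogue only, under the assumption that clock_gettime() with
CLOCK_MONOTONIC_RAW is an acceptable stand-in; toy_gethrtime() and
toy_ts_to_ns() are hypothetical names, and this is not one of the
upstream compat fixes, which have to use the kernel's own time
interfaces.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for CLOCK_MONOTONIC_RAW on Linux/glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define TOY_NSEC_PER_SEC 1000000000LL

/* Convert a seconds/nanoseconds pair into one 64-bit nanosecond count. */
static int64_t
toy_ts_to_ns(int64_t sec, int64_t nsec)
{
	return (sec * TOY_NSEC_PER_SEC + nsec);
}

/* Userspace analogue of gethrtime(): raw monotonic time in ns, -1 on error. */
static int64_t
toy_gethrtime(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
		return (-1);
	return (toy_ts_to_ns((int64_t)ts.tv_sec, (int64_t)ts.tv_nsec));
}
```

Keeping the conversion in a helper that takes 64-bit values means the
caller does not depend on the width of time_t, which is the type the 5.6
build can no longer find.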


** Changed in: zfs-linux (Ubuntu)
   Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863136

Title:
  zfs-dkms will not compile on kernel 5.6rc1

Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  Bug mentioned here: https://github.com/zfsonlinux/zfs/issues/10001

  Patch for zfs master branch here:
  https://github.com/zfsonlinux/zfs/pull/9961

  Patch modified for zfs-dkms_0.8.3-1ubuntu3 here:
  https://paste.ubuntu.com/p/wsS9GFHjyv/

  This is working for me in getting a 5.5.x kernel to access zpools on
  arm64 and 5.5.2 mainline & 5.6rc1 mainline kernels to access zpools on
  amd64.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1863136/+subscriptions


[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-02-21 Thread Colin Ian King

Soak tested the -proposed kernel for 2 hours with no hang occurring.
Verified OK.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  == SRU Justification ==

  When using zram (as installed and configured with the zram-config package)
  systems can lock up after about a week of use.  This occurs because of
  a hang in a lock in zram.

  == Test Case ==

  Run stress-ng --brk 0 --stack 0 in a Bionic amd64 server VM with 1GB of
  memory, 16 CPU threads and zram-config installed.  Without the fix the
  kernel will hang in a spinlock after 1-2 hours of run time. With the fix,
  the hang does not occur.  Testing shows that with the fix, 5 x 16 CPU hours
  of stress testing with stress-ng works fine without the lockup occurring.

  == The fix ==

  Upstream commit c4d6c4cc7bfd ("zram: correct flag name of ZRAM_ACCESS") as
  a prerequisite followed by a minor context wiggle backport of the fix with
  commit 3c9959e02547 ("zram: fix lockdep warning of free block handling").

  == Regression Potential ==

  This touches the zram locking, so the core zram driver is affected. However
  the fixes are backports from 5.0, so the fixes have had a fair amount of
  testing in later kernels.


  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  As this felt like a potential hardware issue, I had my hosting
  provider give me a completely different system, different motherboard,
  different CPU, different RAM and different storage, I installed that
  system on 18.04 and moved my data over, a week later, I hit the issue
  again.

  We've since also had an LXD user report similar symptoms, also on
  varying hardware:
    https://github.com/lxc/lxd/issues/5197

  My system doesn't have a lot of memory pressure with about 50% of free
  memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea
  of what happened but I'm not too hopeful given the complete silence on
  the console when this occurs.

  System is currently on:
    Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15 so it's not a recent
  regression.
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
   crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with 
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
   Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice:
   RESUME=none
   CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard 
and Mouse
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Intel Corporation S1200SP
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic 
root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 
panic=1 verbose console=tty0 console=ttyS0,115200n8
  ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-38-generic N/A
   linux-backports-modules-4.15.0-38-generic  N/A
   linux-firmware 1.173.1

[Kernel-packages] [Bug 1863136] Re: zfs-dkms will not compile on kernel 5.6rc1

2020-02-13 Thread Colin Ian King

Thanks for confirming that 5.6-rc1 won't yet build with zfs-dkms. I
will roll in all the compat fixes required for 5.5 and 5.6 once we reach
a later kernel release candidate, to avoid the extra uploads and
regression testing that we run before making a release.

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Status: In Progress => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863136

Title:
  zfs-dkms will not compile on kernel 5.6rc1

Status in zfs-linux package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1832384] Re: Unable to unmount apparently unused filesystem

2020-02-11 Thread Colin Ian King

** Changed in: linux (Ubuntu)
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832384

Title:
  Unable to unmount apparently unused filesystem

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  We periodically see an issue where unmounting a ZFS filesystem fails
  with EBUSY, even though there appears to be no one using it.

  # cat /proc/self/mounts | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive zfs rw,nosuid,nodev,noexec,relatime,xattr,noacl 0 0

  'lsof' and 'fuser' show no processes using any of the files in the
  problematic filesystem:

  # ls -l /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/
  total 221
  -rw-r- 1 500 500  52736 May 22 11:01 1_19_1008904362.dbf
  -rw-r- 1 500 500 541696 May 22 11:03 1_20_1008904362.dbf
  # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/1_20_1008904362.dbf
  # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/1_19_1008904362.dbf
  # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/
  # lsof | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  #

  The filesystem was shared over NFS, but has since been unshared:

  # showmount -e | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  #

  Since no one appears to be using the filesystem, our expectation is
  that it should be possible to unmount the filesystem. However,
  attempts to unmount the filesystem fail with EBUSY:

  # zfs destroy domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  umount: /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive: target is busy.
  cannot unmount '/domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive': umount failed
  # umount /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  umount: /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive: target is busy.

  
  Using bpftrace, we can see that the unmount is failing in
  'propagate_mount_busy()' in the kernel. Using a live kernel debugger,
  we can look at the 'mount' struct for this particular mount and see
  that the 'mnt_count' refcount summed across all CPUs is 2. For
  filesystems that are eligible for unmounting, the refcount is 1.
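
  As a rough illustration of why the count has to be summed across CPUs,
  here is a small userspace sketch. It is not the kernel's
  implementation: toy_mount_t, TOY_NR_CPUS and the toy_* helpers are
  hypothetical. Each CPU keeps its own signed delta, so a single slot
  can be negative and only the total is meaningful.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* Hypothetical per-CPU reference count: one signed delta per CPU. */
typedef struct toy_mount {
	int count[TOY_NR_CPUS];
} toy_mount_t;

/* A get on one CPU and a put on another leave +1 and -1 behind. */
static void toy_get(toy_mount_t *m, int cpu) { m->count[cpu]++; }
static void toy_put(toy_mount_t *m, int cpu) { m->count[cpu]--; }

/* Only the sum over all CPUs is the real reference count. */
static int
toy_total(const toy_mount_t *m)
{
	int sum = 0;

	for (size_t i = 0; i < TOY_NR_CPUS; i++)
		sum += m->count[i];
	return (sum);
}

/* Busy when more references are held than the caller expects. */
static int
toy_is_busy(const toy_mount_t *m, int expected)
{
	return (toy_total(m) > expected);
}
```

  In this model a total of 2 against an expected 1 reports busy, which
  matches the symptom above: one reference that was never dropped.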

  The only way to work around this issue that we have found is to
  reboot, at which point the filesystem can be unmounted and destroyed.

  
  So far, we have only been able to reproduce this using a workload
  driven by our application. The application manages ZFS filesystems in
  groups, and the lifecycle of each group looks something like this:

  - Create and mount a group of filesystems, 1 parent and 4 children:
  /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370
  /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/datafile
  /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/external
  /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/temp
  - Share all 5 filesystems over NFS
  - A client mounts all 5 shares using NFSv3
  - For a few hours, the client does NFS operations on the filesystems and
    the server occasionally takes ZFS snapshots of them
  - Unshare filesystems
  - Unmount filesystems
  - Delete filesystems

  These groups of filesystems are constantly being created and
  destroyed. At any given time, we have ~30k filesystems on the system,
  about 5k of which are shared. On average, one out of ~200-300k
  unmounts fails with this EBUSY error. To create and destroy this many
  filesystems takes us about a week or so.

  Note that we are using ZFS built from https://github.com/delphix/zfs,
  which is essentially master ZFS on Linux.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-50-generic 4.15.0-50.54
  ProcVersionSignature: Ubuntu 4.15.0-50.54-generic 4.15.18
  Uname: Linux 4.15.0-50-generic x86_64
  NonfreeKernelModules: zfs zunicode zcommon znvpair zavl icp
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 May 20 19:10 seq
   crw-rw 1 root audio 116, 33 May 20 19:10 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.6
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v',

[Kernel-packages] [Bug 1832384] Re: Unable to unmount apparently unused filesystem

2020-02-11 Thread Colin Ian King

@John, I was wondering what to do about this bug report. Is it still an
issue or shall I close it?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832384

Title:
  Unable to unmount apparently unused filesystem

Status in linux package in Ubuntu:
  Incomplete


[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2020-02-11 Thread Colin Ian King

AppleTalk is disabled on focal s390x 5.4.0-12 kernels, so this bug can no
longer be tripped. Marking this as Fix Released: although it is not a
direct fix, it does stop the issue from occurring.

** Changed in: linux (Ubuntu)
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT
  regression testing:

  
  https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal-canonical-kernel-team-unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz

  14:44:30 DEBUG| [stdout] sctp STARTING
  14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 256/256)
  14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked for more than 122 seconds.
  14:47:44 DEBUG| [stdout] [ 3684.814345]   Tainted: P   OE 5.4.0-7-generic #8-Ubuntu
  14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 2063618 0x0800
  14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace:
  14:47:44 DEBUG| [stdout] [ 3684.814360] ([] __schedule+0x304/0x7b0)
  14:47:44 DEBUG| [stdout] [ 3684.814362]  [] schedule+0x4a/0xe0
  14:47:44 DEBUG| [stdout] [ 3684.814366]  [] rwsem_down_write_slowpath+0x22c/0x530
  14:47:44 DEBUG| [stdout] [ 3684.814370]  [] register_pernet_subsys+0x2c/0x60
  14:47:44 DEBUG| [stdout] [ 3684.814411]  [<03ff80766638>] sctp_init+0x2f0/0x520 [sctp]
  14:47:44 DEBUG| [stdout] [ 3684.814414]  [] do_one_initcall+0x40/0x200
  14:47:44 DEBUG| [stdout] [ 3684.814416]  [] do_init_module+0x70/0x270
  14:47:44 DEBUG| [stdout] [ 3684.814418]  [] load_module+0x1142/0x1440
  14:47:44 DEBUG| [stdout] [ 3684.814419]  [] __do_sys_finit_module+0xa4/0xf0
  14:47:44 DEBUG| [stdout] [ 3684.814421]  [] system_call+0x2aa/0x2c8
  14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:27 DEBUG| [stdout] [ 3728.214335]

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2020-02-11 Thread Colin Ian King

Just to say, I did retry the reproducer test and also re-ran the adt
tests to double check that this no longer fails on 5.4.0-12.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT
  regression testing:

  
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac
  /autopkgtest-focal-canonical-kernel-team-
  unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz

  14:44:30 DEBUG| [stdout] sctp STARTING
  14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 
256/256)
  14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked 
for more than 122 seconds.
  14:47:44 DEBUG| [stdout] [ 3684.814345]   Tainted: P   OE 
5.4.0-7-generic #8-Ubuntu
  14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
  14:47:44 DEBUG| [stdout] [ 3684.814348] modprobe D 0 2063628 
2063618 0x0800
  14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace:
  14:47:44 DEBUG| [stdout] [ 3684.814360] ([] 
__schedule+0x304/0x7b0)
  14:47:44 DEBUG| [stdout] [ 3684.814362]  [] 
schedule+0x4a/0xe0 
  14:47:44 DEBUG| [stdout] [ 3684.814366]  [] 
rwsem_down_write_slowpath+0x22c/0x530 
  14:47:44 DEBUG| [stdout] [ 3684.814370]  [] 
register_pernet_subsys+0x2c/0x60 
  14:47:44 DEBUG| [stdout] [ 3684.814411]  [<03ff80766638>] 
sctp_init+0x2f0/0x520 [sctp] 
  14:47:44 DEBUG| [stdout] [ 3684.814414]  [] 
do_one_initcall+0x40/0x200 
  14:47:44 DEBUG| [stdout] [ 3684.814416]  [] 
do_init_module+0x70/0x270 
  14:47:44 DEBUG| [stdout] [ 3684.814418]  [] 
load_module+0x1142/0x1440 
  14:47:44 DEBUG| [stdout] [ 3684.814419]  [] 
__do_sys_finit_module+0xa4/0xf0 
  14:47:44 DEBUG| [stdout] [ 3684.814421]  [] 
system_call+0x2aa/0x2c8 
  14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:37 DEBUG| [stdout] [ 3738.474354]

[Kernel-packages] [Bug 1822118] Re: Kernel Panic while rebooting cloud instance

2020-02-11 Thread Colin Ian King

@Finom, that's a good observation, much appreciated.

** Changed in: systemd (Ubuntu)
   Importance: Undecided => High

** Changed in: systemd (Ubuntu)
 Assignee: (unassigned) => Dimitri John Ledkov (xnox)

** Description changed:

- Description:   In the event a particular Azure cloud instance is
- rebooted it's possible that it may never recover and the instance will
- break indefinitely.
+ Very occasionally systemd panics on reboots of an azure instance. A
+ workaround to this issue is described in comment #20
+ 
+ 
+ 
+ 
+ 
+ Description:   In the event a particular Azure cloud instance is rebooted 
it's possible that it may never recover and the instance will break 
indefinitely.
  
  In my case, it was a kernel panic. See specifics below.
- 
  
  Series: Disco
  Instance Size: Basic_A3
  Region: (Default) US-WEST-2
  Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 
x86_64 x86_64 x86_64 GNU/Linux
  
- 
- I had a simple script to reboot an instance (X) amount of times, I chose 50, 
so the machine would power cycle by issuing a "reboot" from the terminal prompt 
just as a user would.   Once the machine came up, it captured dmesg and other 
bits then rebooted again until it reached 50. 
+ I had a simple script to reboot an instance (X) amount of times, I chose
+ 50, so the machine would power cycle by issuing a "reboot" from the
+ terminal prompt just as a user would.   Once the machine came up, it
+ captured dmesg and other bits then rebooted again until it reached 50.
  
  After the 4th attempt, my script timed out, I took a look at the
  instance console log and the following displayed on the console.
- 
  
  [  OK  ] Reached target Reboot.
  /shutdown: error while loading shared libra[   89.498980] Kernel panic - not 
syncing: Attempted to kill init! exitcode=0x7f00
  [   89.498980]
  [   89.500042] CPU: 0 PID: 1 Comm: shutdown Not tainted 4.18.0-1013-azure 
#13-Ubuntu
  [   89.508026] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS 090007  06/02/2017
  [   89.508026] Call Trace:
  [   89.508026]  dump_stack+0x63/0x8a
  [   89.508026]  panic+0xe7/0x247
  [   89.508026]  do_exit.cold.23+0x26/0x75
  [   89.508026]  do_group_exit+0x43/0xb0
  [   89.508026]  __x64_sys_exit_group+0x18/0x20
  [   89.508026]  do_syscall_64+0x5a/0x110
  [   89.508026]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [   89.508026] RIP: 0033:0x7f7bf0154d86
  [   89.508026] Code: Bad RIP value.
  [   89.508026] RSP: 002b:7ffd6be693b8 EFLAGS: 0206 ORIG_RAX: 
00e7
  [   89.508026] RAX: ffda RBX: 7f7bf015e420 RCX: 
7f7bf0154d86
  [   89.508026] RDX: 007f RSI: 003c RDI: 
007f
  [   89.508026] RBP: 7f7bef9449c0 R08: 00e7 R09: 

  [   89.508026] R10: 7ffd6be6974c R11: 0206 R12: 
0018
  [   89.508026] R13: 7f7bef944ac8 R14: 7f7bef944a00 R15: 

  [   89.508026] Kernel Offset: 0x1600 from 0x8100 (relocation 
range: 0x8000-0xbfff)
  [   89.508026] ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x7f00
  [   89.508026]  ]---
  
- 
  this only occurred once in my testing.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1822118

Title:
  Kernel Panic while rebooting cloud instance

Status in linux-azure package in Ubuntu:
  Incomplete
Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  Very occasionally systemd panics on reboots of an azure instance. A
  workaround to this issue is described in comment #20

  
  

  
  Description:   In the event a particular Azure cloud instance is rebooted 
it's possible that it may never recover and the instance will break 
indefinitely.

  In my case, it was a kernel panic. See specifics below.

  Series: Disco
  Instance Size: Basic_A3
  Region: (Default) US-WEST-2
  Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 
x86_64 x86_64 x86_64 GNU/Linux

  I had a simple script to reboot an instance X times (I chose 50); the
  machine would power cycle by issuing a "reboot" from the terminal
  prompt, just as a user would.  Once the machine came up, it captured
  dmesg and other bits, then rebooted again until it reached
  50.
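
The reboot loop described above can be sketched roughly as follows. This is
not the reporter's script, which is not included in the bug; the function
name reboot_cycle, the state directory layout and the file names are all
hypothetical. The sketch assumes it is started once per boot (for example
from a systemd unit or an @reboot cron entry) with enough privilege to run
"dmesg" and "reboot".

```python
import subprocess
from pathlib import Path


def reboot_cycle(state_dir, limit=50, run=subprocess.run):
    """Capture dmesg, then reboot unless `limit` reboots were already issued.

    Returns True if a reboot was requested, False once the limit is reached.
    """
    state = Path(state_dir)
    state.mkdir(parents=True, exist_ok=True)
    counter = state / "count"
    # Number of reboots issued so far; it survives reboots because it is on disk.
    count = int(counter.read_text()) if counter.exists() else 0

    # Save the kernel log of the boot that has just completed.
    result = run(["dmesg"], capture_output=True, text=True)
    (state / f"dmesg-{count:02d}.log").write_text(result.stdout)

    if count >= limit:
        return False
    counter.write_text(str(count + 1))
    run(["reboot"])
    return True
```

Calling reboot_cycle("/var/tmp/reboot-test", limit=50) at each boot (the
path is only an example) would leave one dmesg capture per boot in the state
directory and stop after 50 reboots. The `run` parameter exists only so the
logic can be exercised without actually rebooting.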

  After the 4th attempt, my script timed out. I took a look at the
  instance console log, and the following was displayed on the console.

  [  OK  ] Reached target Reboot.
  /shutdown: error while loading shared libra[   89.498980] Kernel panic - not 
syncing: Attempted to kill init! exitcode=0x7f00
  [   89.498980]
  [   89.500042] CPU: 0 PID: 1 Comm: shutdown Not tainted 4.18.0-1013-azure 
#13-Ubuntu
  [   89.508026] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS 090007

[Kernel-packages] [Bug 1855100] Re: bpf self tests break 5.4.0-7-generic on power8 system

2020-02-11 Thread Colin Ian King

...and on a power9 box too.  Marking as fix committed for 5.4.0-12.

** Changed in: linux (Ubuntu)
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855100

Title:
  bpf self tests break 5.4.0-7-generic on power8 system

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of
  the bare metal system.

  Last output seen while ssh'd into the box:

  11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls
  11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # running socket test
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # [PASS]
  11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests
  11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # running psock_fanout test
  11:52:34 DEBUG| [stdout] # 
  client_loop: send disconnect: Broken pipe

  last output in (truncated) nohup output:

  f -emit-llvm -c progs/pyperf180.c -o - || \
  11:52:15 DEBUG| [stdout]echo "clang failed") | \
  11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe  \
  11:52:15 DEBUG| [stdout]-filetype=obj -o 
/home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o

  this suggests the bpf selftests are causing the breakage.

  last output logged in /var/log/dmesg.log :

  Dec  4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to 
MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to 
MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to 
MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to 
MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to 
MEM_GOING_OFFLINE

  next entries on dmesg.log show machine had rebooted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855100/+subscriptions


[Kernel-packages] [Bug 1855100] Re: bpf self tests break 5.4.0-7-generic on power8 system

2020-02-10 Thread Colin Ian King

I've re-run this on a power8 VM with 5.4.0-12 and cannot trigger this
failure.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855100

Title:
  bpf self tests break 5.4.0-7-generic on power8 system

Status in linux package in Ubuntu:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855100/+subscriptions


[Kernel-packages] [Bug 1861228] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

2020-02-10 Thread Colin Ian King

*** This bug is a duplicate of bug 1861235 ***
https://bugs.launchpad.net/bugs/1861235

** Bug watch added: Github Issue Tracker for ZFS #8637
   https://github.com/zfsonlinux/zfs/issues/8637

** Also affects: linux via
   https://github.com/zfsonlinux/zfs/issues/8637
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1861228

Title:
  zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hello, I believe these errors happened due to a zfs recv command that
  was executing at the time:

  [10823702.582392] VERIFY(size != 0) failed
  [10823702.582428] PANIC at range_tree.c:304:range_tree_find_impl()
  [10823702.582463] Showing stack for process 693172
  [10823702.582466] CPU: 7 PID: 693172 Comm: receive_writer Tainted: P  
 O 4.15.0-60-generic #67-Ubuntu
  [10823702.582466] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 
2.0 12/17/2015
  [10823702.582467] Call Trace:
  [10823702.582475]  dump_stack+0x63/0x8b
  [10823702.582489]  spl_dumpstack+0x42/0x50 [spl]
  [10823702.582494]  spl_panic+0xc8/0x110 [spl]
  [10823702.582539]  ? dbuf_dirty+0x43d/0x850 [zfs]
  [10823702.582542]  ? getrawmonotonic64+0x43/0xd0
  [10823702.582544]  ? getrawmonotonic64+0x43/0xd0
  [10823702.582581]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [10823702.582583]  ? getrawmonotonic64+0x43/0xd0
  [10823702.582619]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [10823702.582621]  ? mutex_lock+0x12/0x40
  [10823702.582654]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [10823702.582697]  range_tree_find_impl+0x88/0x90 [zfs]
  [10823702.582702]  ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
  [10823702.582743]  range_tree_clear+0x4f/0x60 [zfs]
  [10823702.582780]  dnode_free_range+0x11f/0x5a0 [zfs]
  [10823702.582815]  dmu_object_free+0x53/0x90 [zfs]
  [10823702.582850]  dmu_free_long_object+0x9f/0xc0 [zfs]
  [10823702.582885]  receive_freeobjects.isra.12+0x7a/0x100 [zfs]
  [10823702.582918]  receive_writer_thread+0x6d2/0xa60 [zfs]
  [10823702.582920]  ? set_curr_task_fair+0x2b/0x60
  [10823702.582925]  ? spl_kmem_free+0x33/0x40 [spl]
  [10823702.582928]  ? kfree+0x165/0x180
  [10823702.582961]  ? receive_free.isra.13+0xc0/0xc0 [zfs]
  [10823702.582967]  thread_generic_wrapper+0x74/0x90 [spl]
  [10823702.582969]  kthread+0x121/0x140
  [10823702.582974]  ? __thread_exit+0x20/0x20 [spl]
  [10823702.582975]  ? kthread_create_worker_on_cpu+0x70/0x70
  [10823702.582978]  ret_from_fork+0x35/0x40
  [10823907.445420] INFO: task txg_quiesce:4485 blocked for more than 120 
seconds.
  [10823907.445486]   Tainted: P   O 4.15.0-60-generic 
#67-Ubuntu
  [10823907.445535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [10823907.445589] txg_quiesce D 0  4485  2 0x8000
  [10823907.445594] Call Trace:
  [10823907.445608]  __schedule+0x24e/0x880
  [10823907.445613]  schedule+0x2c/0x80
  [10823907.445629]  cv_wait_common+0x11e/0x140 [spl]
  [10823907.445638]  ? wait_woken+0x80/0x80
  [10823907.445647]  __cv_wait+0x15/0x20 [spl]
  [10823907.445766]  txg_quiesce_thread+0x2cb/0x3d0 [zfs]
  [10823907.445835]  ? txg_delay+0x1b0/0x1b0 [zfs]
  [10823907.445843]  thread_generic_wrapper+0x74/0x90 [spl]
  [10823907.445848]  kthread+0x121/0x140
  [10823907.445854]  ? __thread_exit+0x20/0x20 [spl]
  [10823907.445857]  ? kthread_create_worker_on_cpu+0x70/0x70
  [10823907.445861]  ret_from_fork+0x35/0x40
  [10823907.445916] INFO: task zfs:688217 blocked for more than 120 seconds.
  [10823907.445962]   Tainted: P   O 4.15.0-60-generic 
#67-Ubuntu
  [10823907.446010] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [10823907.446063] zfs D 0 688217 688214 0x8080
  [10823907.446066] Call Trace:
  [10823907.446071]  __schedule+0x24e/0x880
  [10823907.446075]  schedule+0x2c/0x80
  [10823907.446084]  cv_wait_common+0x11e/0x140 [spl]
  [10823907.446088]  ? wait_woken+0x80/0x80
  [10823907.446095]  __cv_wait+0x15/0x20 [spl]
  [10823907.446151]  dmu_recv_stream+0xa51/0xef0 [zfs]
  [10823907.446227]  zfs_ioc_recv_impl+0x306/0x1100 [zfs]
  [10823907.446232]  ? ttwu_do_activate+0x77/0x80
  [10823907.446303]  zfs_ioc_recv_new+0x33d/0x410 [zfs]
  [10823907.446312]  ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl]
  [10823907.446320]  ? spl_vmem_alloc+0x19/0x20 [spl]
  [10823907.446332]  ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
  [10823907.446338]  ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair]
  [10823907.446344]  ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair]
  [10823907.446409]  zfsdev_ioctl+0x451/0x610 [zfs]
  [10823907.446415]  do_vfs_ioctl+0xa8/0x630
  [10823907.446419]  ? __audit_syscall_entry+0xbc/0x110
  [10823907.446424]  ? syscall_trace_enter+0x1da/0x2d0
  [10823907.446426]  SyS_ioctl+0x79/0x90
  [10823907.446430]

[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

2020-02-10 Thread Colin Ian King

Can you describe the zfs environment and the command that was being
actioned that triggered this issue?

** Bug watch added: Github Issue Tracker for ZFS #8637
   https://github.com/zfsonlinux/zfs/issues/8637

** Also affects: linux via
   https://github.com/zfsonlinux/zfs/issues/8637
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1861235

Title:
  zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Same as bug 1861228 but with a newer kernel installed.

  [  790.702566] VERIFY(size != 0) failed
  [  790.702590] PANIC at range_tree.c:304:range_tree_find_impl()
  [  790.702611] Showing stack for process 28685
  [  790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P   O 
4.15.0-76-generic #86-Ubuntu
  [  790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 
2.0 12/17/2015
  [  790.702616] Call Trace:
  [  790.702626]  dump_stack+0x6d/0x8e
  [  790.702637]  spl_dumpstack+0x42/0x50 [spl]
  [  790.702640]  spl_panic+0xc8/0x110 [spl]
  [  790.702645]  ? __switch_to_asm+0x41/0x70
  [  790.702714]  ? arc_prune_task+0x1a/0x40 [zfs]
  [  790.702740]  ? dbuf_dirty+0x43d/0x850 [zfs]
  [  790.702745]  ? getrawmonotonic64+0x43/0xd0
  [  790.702746]  ? getrawmonotonic64+0x43/0xd0
  [  790.702775]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702778]  ? getrawmonotonic64+0x43/0xd0
  [  790.702805]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702807]  ? mutex_lock+0x12/0x40
  [  790.702833]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [  790.702866]  range_tree_find_impl+0x88/0x90 [zfs]
  [  790.702870]  ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
  [  790.702902]  range_tree_clear+0x4f/0x60 [zfs]
  [  790.702930]  dnode_free_range+0x11f/0x5a0 [zfs]
  [  790.702957]  dmu_object_free+0x53/0x90 [zfs]
  [  790.702983]  dmu_free_long_object+0x9f/0xc0 [zfs]
  [  790.703010]  receive_freeobjects.isra.12+0x7a/0x100 [zfs]
  [  790.703036]  receive_writer_thread+0x6d2/0xa60 [zfs]
  [  790.703040]  ? set_curr_task_fair+0x2b/0x60
  [  790.703043]  ? spl_kmem_free+0x33/0x40 [spl]
  [  790.703048]  ? kfree+0x165/0x180
  [  790.703073]  ? receive_free.isra.13+0xc0/0xc0 [zfs]
  [  790.703078]  thread_generic_wrapper+0x74/0x90 [spl]
  [  790.703081]  kthread+0x121/0x140
  [  790.703084]  ? __thread_exit+0x20/0x20 [spl]
  [  790.703085]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  790.703088]  ret_from_fork+0x35/0x40
  [  967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
  [  967.636979]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  967.637076] txg_quiesce D 0 14810  2 0x8000
  [  967.637080] Call Trace:
  [  967.637089]  __schedule+0x24e/0x880
  [  967.637092]  schedule+0x2c/0x80
  [  967.637106]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637114]  ? wait_woken+0x80/0x80
  [  967.637122]  __cv_wait+0x15/0x20 [spl]
  [  967.637210]  txg_quiesce_thread+0x2cb/0x3d0 [zfs]
  [  967.637278]  ? txg_delay+0x1b0/0x1b0 [zfs]
  [  967.637286]  thread_generic_wrapper+0x74/0x90 [spl]
  [  967.637291]  kthread+0x121/0x140
  [  967.637297]  ? __thread_exit+0x20/0x20 [spl]
  [  967.637299]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  967.637304]  ret_from_fork+0x35/0x40
  [  967.637326] INFO: task zfs:28590 blocked for more than 120 seconds.
  [  967.637371]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  967.637467] zfs D 0 28590  28587 0x8080
  [  967.637470] Call Trace:
  [  967.637474]  __schedule+0x24e/0x880
  [  967.637477]  schedule+0x2c/0x80
  [  967.637486]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637491]  ? wait_woken+0x80/0x80
  [  967.637498]  __cv_wait+0x15/0x20 [spl]
  [  967.637554]  dmu_recv_stream+0xa51/0xef0 [zfs]
  [  967.637630]  zfs_ioc_recv_impl+0x306/0x1100 [zfs]
  [  967.637679]  ? dbuf_read+0x34a/0x920 [zfs]
  [  967.637725]  ? dbuf_rele+0x36/0x40 [zfs]
  [  967.637728]  ? _cond_resched+0x19/0x40
  [  967.637798]  zfs_ioc_recv_new+0x33d/0x410 [zfs]
  [  967.637809]  ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl]
  [  967.637816]  ? spl_vmem_alloc+0x19/0x20 [spl]
  [  967.637828]  ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
  [  967.637834]  ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair]
  [  967.637840]  ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair]
  [  967.637905]  zfsdev_ioctl+0x451/0x610 [zfs]
  [  967.637913]  do_vfs_ioctl+0xa8/0x630
  [  967.637917]  ? __audit_syscall_entry+0xbc/0x110
  [  967.637924]  ? syscall_trace_enter+0x1da/0x2d0
  [  967.637927]  SyS_ioctl+0x79/0x90
  [  967.637930]  do_syscall_64+0x73/0x130
  [  967.637935]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
  [

[Kernel-packages] [Bug 1858495] Re: multiple long delays during kernel and userspace boot

2020-02-08 Thread Colin Ian King

** Changed in: linux-signed-azure (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux-signed-azure (Ubuntu)
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1858495

Title:
  multiple long delays during kernel and userspace boot

Status in linux-signed-azure package in Ubuntu:
  In Progress

Bug description:
  Booting some Bionic instances in Azure (gen1 machines), I see some
  large delays during kernel/userspace boot, and it would be good to
  understand what's going on.  Additionally, the areas of boot that
  see delays are different for an image that's been created from a
  template vs. stock images.

  I'm attaching some data: 10 runs of the same image in a scaling set,
  each performing its initial boot.  Processing the journal output and
  looking for delays of over 2.0 seconds shows some areas of concern.

  
  [1.788581] localhost.localdomain kernel: * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
                                           * this clock source is slow. Consider trying other clock sources
  [3.545974] localhost.localdomain kernel: Unstable clock detected, switching default tracing clock to "global"
                                           If you want to keep using the local clock, then add:
                                             "trace_clock=local"
                                           on the kernel command line
  [6.401684] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
  [   15.280390] localhost.localdomain kernel: EXT4-fs (sda1): re-mounted. Opts: discard

  
  After capturing the Bionic image as a template and creating a new VM, we see 
new hot spots that we didn't see before.

  
  # HotSpot maximum delta between kernel messages: 2.0
  # [2.846188] localhost.localdomain kernel: AES CTR mode by8 optimization 
enabled
  # [5.919313] localhost.localdomain kernel: raid6: avx2x4   gen() 21512 
MB/s
  #
  # [6.591530] localhost.localdomain kernel: EXT4-fs (sda1): mounted 
filesystem with ordered data mode. Opts: (null)
  # [9.031051] localhost.localdomain systemd[1]: systemd 237 running in 
system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP 
+LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD 
-IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
  #
  # [   13.773554] localhost.localdomain sh[871]: + exit 0
  # [   21.625467] localhost.localdomain kernel: UDF-fs: INFO Mounting volume 
'UDF Volume', timestamp 2019/12/17 00:00 (1000)
  #
  # [   24.919359] bugbif2be01 systemd-timesyncd[771]: Synchronized to time 
server 91.189.89.198:123 (ntp.ubuntu.com).
  # [   29.787339] bugbif2be01 cloud-init[1026]: Cloud-init v. 
19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init' at Mon, 16 Dec 2019 18:14:47 
+. Up 25.20 seconds.
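
The hot-spot listings above pair up consecutive journal messages whose
timestamps are more than 2.0 seconds apart. The tool that produced them is
not part of this report, so the following is only a minimal sketch of that
kind of scan. It assumes each message starts with a "[seconds.micros]"
timestamp and that wrapped continuation lines can be ignored; the name
find_hotspots is hypothetical.

```python
import re

# A journal line that starts with a "[  seconds.micros]" monotonic timestamp.
TS_LINE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]")


def find_hotspots(lines, threshold=2.0):
    """Return (delta, before, after) for each gap larger than `threshold`."""
    hotspots = []
    prev = None  # (timestamp, text) of the previous timestamped line
    for line in lines:
        match = TS_LINE.match(line)
        if match is None:
            continue  # continuation or unrelated line, skip it
        stamp = float(match.group(1))
        text = line.strip()
        if prev is not None and stamp - prev[0] > threshold:
            hotspots.append((stamp - prev[0], prev[1], text))
        prev = (stamp, text)
    return hotspots
```

Feeding it the lines of a journal.log and printing each returned pair gives
output similar in shape to the listings above.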

  The easiest comparison kernel-side is the systemd-analyze value:

  Grepping in the debug data:

  
  % grep "Startup finished.*kernel" bug-bionic-baseline-no*.debug/*/journal.log 
| cut -d" " -f 7-
  Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s.
  Startup finished in 3.355s (kernel) + 51.732s (userspace) = 55.088s.
  Startup finished in 3.287s (kernel) + 51.747s (userspace) = 55.035s.
  Startup finished in 3.129s (kernel) + 50.066s (userspace) = 53.195s.
  Startup finished in 3.350s (kernel) + 50.682s (userspace) = 54.032s.
  Startup finished in 3.355s (kernel) + 49.322s (userspace) = 52.678s.
  Startup finished in 3.219s (kernel) + 51.124s (userspace) = 54.343s.
  Startup finished in 3.128s (kernel) + 49.226s (userspace) = 52.354s.
  Startup finished in 3.193s (kernel) + 53.197s (userspace) = 56.390s.
  Startup finished in 3.118s (kernel) + 46.203s (userspace) = 49.322s.

  foofoo % grep "Startup finished.*kernel" 
bug-bionic-baseline-after*.debug/*/journal.log | cut -d" " -f 7-
  Startup finished in 7.685s (kernel) + 32.463s (userspace) = 40.148s.
  Startup finished in 7.041s (kernel) + 35.998s (userspace) = 43.040s.
  Startup finished in 7.808s (kernel) + 35.444s (userspace) = 43.253s.
  Startup finished in 7.206s (kernel) + 37.952s (userspace) = 45.159s.
  Startup finished in 8.426s (kernel) + 36.976s (userspace) = 45.403s.
  Startup finished in 6.731s (kernel) + 35.484s (userspace) = 42.216s.
  Startup finished in 7.152s (kernel) + 32.664s (userspace) = 39.817s.
  Startup finished in 7.429s (kernel) + 36.177s (userspace) = 43.606s.
  Startup finished in 9.075s (kernel) + 32.494s (userspace) = 41.570s.
  Startup finished in 7.281s (kernel) + 32.732s (userspace) = 40.013s.
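
To compare the two sets of runs above as averages rather than by eye, the
"Startup finished" lines can be parsed and summarised. This is a sketch
only: it assumes every duration is printed in plain seconds, as in the
lines above, and the function name summarize is hypothetical.

```python
import re
from statistics import mean

# Matches e.g. "Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s."
STARTUP = re.compile(
    r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
)


def summarize(lines):
    """Return the run count and mean kernel/userspace/total times, or None."""
    rows = []
    for line in lines:
        match = STARTUP.search(line)
        if match:
            rows.append(tuple(float(value) for value in match.groups()))
    if not rows:
        return None
    kernel, userspace, total = zip(*rows)
    return {
        "runs": len(rows),
        "kernel": mean(kernel),
        "userspace": mean(userspace),
        "total": mean(total),
    }
```

Running summarize() once on the grep output for the stock image and once on
the output for the templated image gives directly comparable mean kernel and
userspace times.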

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-5.0.0-1027-azure 5.0.0-1027.29~18.04.1
  ProcVersionSignature: User Name 5.0.0-1027.29~18.04.1-azure 5.0.21
  Uname: Linux 5.0.0-1027-azure x86_64

[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support

2020-02-08 Thread Colin Ian King

Tested these updates with the kernel team ZFS autotest regression tests
on the following architectures:

arm64 - PASSED
amd64 - PASSED
s390x - PASSED
ppc64el - PASSED

I re-ran the failed lxd test as referenced in comment #5 and it passed,
so I believe the original failure was an artifact of the test system and
not with ZFS per se.


** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856704

Title:
  backport 5.3 zfs support to bionic for HWE kernel support

Status in spl-linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Committed
Status in spl-linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Committed

Bug description:
  == SRU Justification Bionic ==

  The HWE 5.3 kernel requires ZFS + SPL to support dkms module build
  functionality for kernels 4.15 through to 5.3.  Basically, the ZFS+SPL
  compat commits between 4.15 and 5.3 are required to allow the modules
  to build on kernels up to and including the HWE 5.3 kernel.

  == The Fix ==

  Backport of upstream commits:

  SPL:
  - 0002-fix-spl-build-shrinker-callback-check.patch
  - 0003-remove-deprecated-set-fs-pwd-check.patch
  - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch
  - 0005-Linux-4.20-compat-current_kernel_time.patch
  - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch
  - 0007-Linux-5.0-compat-Use-totalram_pages.patch
  - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch
  - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch
  - 0010-Linux-5.1-compat-get_ds-removed.patch
  - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch
  - 0012-Linux-5.2-compat-rw_tryupgrade.patch
  - 0013-Linux-5.3-compat-rw_semaphore-owner.patch
  - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch
  - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
  - 0016-Linux-compat-4.16-SECTOR_SIZE.patch
  - 0017-Linux-compat-spl-timespec_sub.patch
  - 0018-deprecate-splat-rwlock-test6.patch

  ZFS:
  - 3300-Linux-4.16-compat-inode_set_iversion.patch
  - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch
  - 3302-Linux-4.16-compat-get_disk_and_module.patch
  - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch
  - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch
  - 3305-Linux-4.14-compat-blk_queue_stackable.patch
  - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch
  - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch
  - 3308-Linux-5.0-compat-Use-totalram_pages.patch
  - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch
  - 3310-Linux-5.0-compat-Fix-SUBDIRs.patch
  - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch
  - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch
  - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch
  - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch
  - 3315-Linux-5.0-compat-ASM_BUG-macro.patch
  - 3316-Linux-5.2-compat-rw_tryupgrade.patch
  - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch
  - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
  - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch
  - 3320-zpios-deprecate-current-kernel-time.patch
  - 3321-add-compat-check-disk-size-change.patch

  == Testcase ==

  Without these commits users who install kernels and kernel headers
  from 4.16 through to 5.3 inclusive won't be able to build spl + zfs in
  Bionic because of the lack of the kernel compat fixes.  With the
  commits, zfs + spl dkms modules can build cleanly and pass the ubuntu
  ZFS regression tests found in the kernel team autotests git
  repository.
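
  As a rough illustration of the kernel range this testcase covers, the
  sketch below checks whether a "uname -r" style release string falls
  within 4.16 to 5.3 inclusive. The helper name and the parsing are
  assumptions made for illustration; they are not part of the SRU or of
  the dkms tooling.

```python
import re

# Range bounds taken from the testcase text above (4.16 through 5.3 inclusive).
MIN_KERNEL = (4, 16)
MAX_KERNEL = (5, 3)


def kernel_in_backport_range(release: str) -> bool:
    """Return True if a release string such as '5.3.0-26-generic' has a
    major.minor version within MIN_KERNEL..MAX_KERNEL inclusive.

    Hypothetical helper, for illustration only.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison orders by major first, then minor.
    return MIN_KERNEL <= version <= MAX_KERNEL


for release in ("4.15.0-76-generic", "5.3.0-26-generic", "5.4.0-12-generic"):
    print(release, kernel_in_backport_range(release))
```

  Comparing (major, minor) tuples avoids the usual pitfall of comparing
  version strings lexically, where "4.9" would sort after "4.16".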

  == Risk ==

  This is a sizeable backport that touches a fair amount of spl + zfs
  kernel interfacing code. There is a risk that the backport may cause a
  regression in functionality that has not been exercised by the ZFS
  regression tests. The backport has been run through the ZFS regression
  tests and no regression in core ZFS functionality has been found.
  It must be noted that most of the patches are upstream compat fixes
  that are known to be working with the latest ZFS that is being used in
  focal, so we are confident the original compat changes work.

  Note that these updates have all been build tested on x86-64, arm64
  and s390x systems with kernels from 4.16 to 5.3 and regression tested
  with the ubuntu zfs regression tests.


[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade

2020-02-08 Thread Colin Ian King

OK, I'll look into this sometime this week. Thanks for the information.

** Changed in: zfs-linux (Ubuntu)
   Importance: Medium => High

** Changed in: zfs-linux (Ubuntu)
   Status: Triaged => In Progress

https://bugs.launchpad.net/bugs/1860182

Title:
  zpool scrub malfunction after kernel upgrade

Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE
  kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly:

  eric@eric-8700K:~$ zpool status
    pool: storagepool1
   state: ONLINE
    scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
  config:

          NAME                                          STATE     READ WRITE CKSUM
          storagepool1                                  ONLINE       0     0     0
            mirror-0                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
              ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
            mirror-1                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
              ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0


  I ran zpool scrub after upgrading the kernel and rebooting, and now it
  fails to work properly. It appeared to finish in about 5 minutes but
  did not, and says it is going slow:


  eric@eric-8700K:~$ sudo zpool status
    pool: storagepool1
   state: ONLINE
    scan: scrub in progress since Fri Jan 17 15:32:07 2020
          1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
          0B repaired, 100.00% done
  config:

          NAME                                          STATE     READ WRITE CKSUM
          storagepool1                                  ONLINE       0     0     0
            mirror-0                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
              ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
            mirror-1                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
              ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0

  errors: No known data errors
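
  The symptom in the second status output (a scrub still marked "in
  progress" while reporting 100.00% done) can be spotted mechanically. The
  following is a minimal sketch that only inspects status text of the
  shape quoted above; the function name is hypothetical and the check is
  not something zpool itself provides.

```python
import re


def scrub_looks_stuck(status_text: str) -> bool:
    """Return True if the status text says a scrub is still in progress
    yet reports at least 100% done (hypothetical helper)."""
    if "scrub in progress" not in status_text:
        return False
    match = re.search(r"(\d+(?:\.\d+)?)% done", status_text)
    if match is None:
        return False
    return float(match.group(1)) >= 100.0


# Scan lines copied from the status output in this report.
sample = """\
scan: scrub in progress since Fri Jan 17 15:32:07 2020
1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
0B repaired, 100.00% done
"""

print(scrub_looks_stuck(sample))
```

  A completed scrub, such as the "scrub repaired 1M in 4h21m" line from
  before the kernel upgrade, is not flagged because it no longer contains
  the "scrub in progress" marker.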

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: zfsutils-linux 0.7.5-1ubuntu16.7
  ProcVersionSignature: Ubuntu 5.3.0-26.28~18.04.1-generic 5.3.13
  Uname: Linux 5.3.0-26-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.9-0ubuntu7.9
  Architecture: amd64
  CurrentDesktop: ubuntu:GNOME
  Date: Fri Jan 17 16:22:01 2020
  InstallationDate: Installed on 2018-03-07 (681 days ago)
  InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
  SourcePackage: zfs-linux
  UpgradeStatus: Upgraded to bionic on 2018-08-02 (533 days ago)
  modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs']


[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade

2020-02-07 Thread Colin Ian King

** Changed in: zfs-linux (Ubuntu)
   Importance: High => Medium

https://bugs.launchpad.net/bugs/1860182

Title:
  zpool scrub malfunction after kernel upgrade

Status in zfs-linux package in Ubuntu:
  Triaged



[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-02-07 Thread Colin Ian King

Cornered this to zswap and not an issue with mm or I/O.  Figured out
that 3 hours soak testing on each bisect step is the only reliable way
to do a bisect.  Bisected between 4.20 and 5.0, finally cornered the
issue, and hence found the commits required to fix this.
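
To give a feel for why this was slow: a bisect needs roughly log2(N)
steps for N candidate commits, so a 3 hour soak per step adds up quickly.
A minimal sketch of that arithmetic follows; the commit count used is a
made-up placeholder and not the real number of commits between 4.20 and
5.0, and the function name is hypothetical.

```python
import math


def bisect_soak_hours(num_commits: int, hours_per_step: float = 3.0) -> float:
    """Estimate total soak time for a bisect over num_commits commits.

    Assumes about ceil(log2(N)) bisect steps, each needing one soak run.
    """
    if num_commits <= 1:
        return 0.0
    steps = math.ceil(math.log2(num_commits))
    return steps * hours_per_step


# Placeholder commit count, chosen only to illustrate the scale.
print(bisect_soak_hours(16000))
```

With the placeholder of 16000 commits this comes to 14 steps, or about
42 hours of soak testing, before counting builds and reboots.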

** Description changed:

+ == SRU Justification ==
+ 
+ When using zram (as installed and configured with the zram-config package)
+ systems can lock up after about a week of use.  This occurs because of
+ a hang in a lock in zram.
+ 
+ == Test Case ==
+ 
+ Run stress-ng --brk 0 --stack 0 in a Bionic amd64 server VM with 1GB of
+ memory, 16 CPU threads and zram-config installed.  Without the fix the
+ kernel will hang in a spinlock after 1-2 hours of run time. With the fix,
+ the hang does not occur.  Testing shows that with the fix, 5 x 16 CPU hours
+ of stress testing with stress-ng works fine without the lockup occurring.
+ 
+ == The fix ==
+ 
+ Upstream commit c4d6c4cc7bfd ("zram: correct flag name of ZRAM_ACCESS") as
+ a prerequisite followed by a minor context wiggle backport of the fix with
+ commit 3c9959e02547 ("zram: fix lockdep warning of free block handling").
+ 
+ == Regression Potential ==
+ 
+ This touches the zram locking, so the core zram driver is affected. However
+ the fixes are backports from 5.0, so the fixes have had a fair amount of
+ testing in later kernels.
+ 
+ 
  My main server has been running into hard lockups about once a week ever
  since I switched to the 4.15 Ubuntu 18.04 kernel.
  
  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on the
  cmdline but isn't rebooting so the kernel isn't even processing this as
  a kernel panic.
  
- 
- As this felt like a potential hardware issue, I had my hosting provider give 
me a completely different system, different motherboard, different CPU, 
different RAM and different storage, I installed that system on 18.04 and moved 
my data over, a week later, I hit the issue again.
+ As this felt like a potential hardware issue, I had my hosting provider
+ give me a completely different system, different motherboard, different
+ CPU, different RAM and different storage, I installed that system on
+ 18.04 and moved my data over, a week later, I hit the issue again.
  
  We've since also had a LXD user reporting similar symptoms here also on 
varying hardware:
-   https://github.com/lxc/lxd/issues/5197
+   https://github.com/lxc/lxd/issues/5197
  
- 
- My system doesn't have a lot of memory pressure with about 50% of free memory:
+ My system doesn't have a lot of memory pressure with about 50% of free
+ memory:
  
  root@vorash:~# free -m
-   totalusedfree  shared  buff/cache   
available
+   totalusedfree  shared  buff/cache   
available
  Mem:  31819   17574 402 513   13842   
13292
  Swap: 159092687   13222
  
  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea of
  what happened but I'm not too hopeful given the complete silence on the
  console when this occurs.
  
  System is currently on:
-   Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 
x86_64 x86_64 x86_64 GNU/Linux
+   Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 
x86_64 x86_64 x86_64 GNU/Linux
  
  But I've seen this since the GA kernel on 4.15 so it's not a recent 
regression.
- --- 
+ ---
  ProblemType: Bug
  AlsaDevices:
-  total 0
-  crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
-  crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
+  total 0
+  crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
+  crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse:
-  Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with 
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
-  Cannot stat file /proc/22831/fd/10: Permission denied
+  Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with 
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
+  Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice:
-  RESUME=none
-  CRYPTSETUP=n
+  RESUME=none
+  CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
-  Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
-  Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard 
and Mouse
-  Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
+  Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
+

[Kernel-packages] [Bug 1862101] Re: ubuntu_zfs_fstest / ubuntu_zfs_xfs_generic failed to build on Focal 5.4

2020-02-06 Thread Colin Ian King

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Critical

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Status: New => In Progress

https://bugs.launchpad.net/bugs/1862101

Title:
  ubuntu_zfs_fstest / ubuntu_zfs_xfs_generic failed to build on Focal
  5.4

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete
Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  The test build will fail because of unmet dependencies between the
  zfsutils-linux and zfs-dkms packages.

  apt-get install --yes --force-yes build-essential gdb git gcc
  zfsutils-linux

  stdout:
  Reading package lists...
  Building dependency tree...
  Reading state information...
  build-essential is already the newest version (12.8ubuntu1).
  gcc is already the newest version (4:9.2.1-3.1ubuntu1).
  gdb is already the newest version (9.0.90.20200117-0ubuntu1).
  git is already the newest version (1:2.25.0-1ubuntu1).
  Some packages could not be installed. This may mean that you have
  requested an impossible situation or if you are using the unstable
  distribution that some required packages have not yet been created
  or been moved out of Incoming.
  The following information may help to resolve the situation:

  The following packages have unmet dependencies:
   zfsutils-linux : Breaks: zfs-dkms (< 0.8.3-1ubuntu2)
  stderr:
  W: --force-yes is deprecated, use one of the options starting with --allow instead.
  E: Unable to correct problems, you have held broken packages.
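
  The key line in the output above is the "Breaks:" relation: zfsutils-linux
  cannot be installed alongside a zfs-dkms older than 0.8.3-1ubuntu2. As a
  small sketch, the relation can be pulled out of the apt message with a
  regular expression. The pattern and the function name are assumptions
  written for this one line of output, not an apt API.

```python
import re

# Matches one "unmet dependencies" line of the shape shown in this report:
#   " zfsutils-linux : Breaks: zfs-dkms (< 0.8.3-1ubuntu2)"
UNMET_RE = re.compile(
    r"^\s*(?P<package>\S+) : (?P<relation>\w+): "
    r"(?P<target>\S+) \((?P<op>[<>=]+) (?P<version>\S+)\)\s*$"
)


def parse_unmet(line: str) -> dict:
    """Split an unmet-dependency line into its parts (hypothetical helper)."""
    match = UNMET_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return match.groupdict()


info = parse_unmet(" zfsutils-linux : Breaks: zfs-dkms (< 0.8.3-1ubuntu2)")
print(info)
```

  Here the parsed fields say that zfs-dkms must be at version
  0.8.3-1ubuntu2 or later before zfsutils-linux can be installed.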


[Kernel-packages] [Bug 1861359] Re: swap storms kills interactive use

2020-01-31 Thread Colin Ian King

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1861359

Title:
  swap storms kills interactive use

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hello, several times since upgrading to focal from 19.04 I've found my
  computer entirely unresponsive for periods of twenty or thirty
  seconds. No mouse movement, no keyboard input, the screen output does
  not change.

  My computer was using swap space and despite very slow writeout speeds
  well below what the NVME drive can handle, the computer was unusable.

  I've captured some vmstat 1 output and top output that I started
  collecting during the event. (Normally one very long painful period is
  followed by several shorter periods of uselessness.)

  Thanks

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-5.4.0-12-generic 5.4.0-12.15
  ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
  Uname: Linux 5.4.0-12-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu15
  Architecture: amd64
  Date: Wed Jan 29 23:44:05 2020
  ProcEnviron:
   TERM=rxvt-unicode-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-signed-5.4
  UpgradeStatus: Upgraded to focal on 2020-01-24 (5 days ago)
  --- 
  ProblemType: Bug
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 
k5.4.0-12-generic.
  ApportVersion: 2.20.11-0ubuntu16
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  sarnold    2734 F pulseaudio
   /dev/snd/controlC1:  sarnold    2734 F pulseaudio
  Card0.Amixer.info:
   Card hw:0 'PCH'/'HDA Intel PCH at 0x2fe1028000 irq 145'
 Mixer name : 'Realtek ALC285'
 Components : 'HDA:10ec0285,17aa225c,0012 
HDA:8086280b,80860101,0010'
 Controls  : 53
 Simple ctrls  : 15
  Card1.Amixer.info:
   Card hw:1 'Audio'/'Generic ThinkPad Dock USB Audio at 
usb-:00:14.0-4.2.4, high speed'
 Mixer name : 'USB Mixer'
 Components : 'USB17ef:306f'
 Controls  : 9
 Simple ctrls  : 4
  DistroRelease: Ubuntu 20.04
  HibernationDevice: RESUME=none
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  MachineType: LENOVO 20KHCTO1WW
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  ProcEnviron:
   TERM=rxvt-unicode-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 i915drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu@/vmlinuz-5.4.0-12-generic 
root=ZFS=rpool/ROOT/ubuntu ro root=ZFS=rpool/ROOT/ubuntu quiet splash 
acpi_osi=! "acpi_osi=Windows 2015" vt.handoff=1
  ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
  RelatedPackageVersions:
   linux-restricted-modules-5.4.0-12-generic N/A
   linux-backports-modules-5.4.0-12-generic  N/A
   linux-firmware1.185
  Tags:  focal
  Uname: Linux 5.4.0-12-generic x86_64
  UpgradeStatus: Upgraded to focal on 2020-01-24 (5 days ago)
  UserGroups: adm cdrom libvirt lpadmin plugdev sambashare sbuild sudo
  _MarkForUpload: True
  dmi.bios.date: 11/25/2019
  dmi.bios.vendor: LENOVO
  dmi.bios.version: N23ET69W (1.44 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 20KHCTO1WW
  dmi.board.vendor: LENOVO
  dmi.board.version: SDK0J40709 WIN
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: None
  dmi.modalias: 
dmi:bvnLENOVO:bvrN23ET69W(1.44):bd11/25/2019:svnLENOVO:pn20KHCTO1WW:pvrThinkPadX1Carbon6th:rvnLENOVO:rn20KHCTO1WW:rvrSDK0J40709WIN:cvnLENOVO:ct10:cvrNone:
  dmi.product.family: ThinkPad X1 Carbon 6th
  dmi.product.name: 20KHCTO1WW
  dmi.product.sku: LENOVO_MT_20KH_BU_Think_FM_ThinkPad X1 Carbon 6th
  dmi.product.version: ThinkPad X1 Carbon 6th
  dmi.sys.vendor: LENOVO

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-30 Thread Colin Ian King

Running w/o swapfile and zswap and just stress-ng brk and stack
stressors with NO file I/O can also lock the system.

https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  
  As this felt like a potential hardware issue, I had my hosting provider
  give me a completely different system, different motherboard, different
  CPU, different RAM and different storage, I installed that system on
  18.04 and moved my data over, a week later, I hit the issue again.

  We've since also had a LXD user reporting similar symptoms here also on
  varying hardware:
    https://github.com/lxc/lxd/issues/5197

  My system doesn't have a lot of memory pressure with about 50% of free
  memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222
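
  For reference, the percentages behind the free -m figures above can be
  worked out as in the sketch below. The values are copied from that
  output (in MiB); the variable names are only illustrative.

```python
# Figures from the "Mem:" and "Swap:" rows of the free -m output, in MiB.
mem_total, mem_used, mem_available = 31819, 17574, 13292
swap_total, swap_used = 15909, 2687


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


mem_used_pct = percent(mem_used, mem_total)
mem_available_pct = percent(mem_available, mem_total)
swap_used_pct = percent(swap_used, swap_total)

print(f"memory used:      {mem_used_pct}%")
print(f"memory available: {mem_available_pct}%")
print(f"swap used:        {swap_used_pct}%")
```

  Most of the memory that is not counted as used sits in buff/cache rather
  than in the free column, which is why "available" is the more useful
  figure here.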

  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea
  of what happened but I'm not too hopeful given the complete silence on
  the console when this occurs.

  System is currently on:
    Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15 so it's not a recent
  regression.
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
   crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with 
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
   Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice:
   RESUME=none
   CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard 
and Mouse
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Intel Corporation S1200SP
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic 
root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 
panic=1 verbose console=tty0 console=ttyS0,115200n8
  ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-38-generic N/A
   linux-backports-modules-4.15.0-38-generic  N/A
   linux-firmware 1.173.1
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  Tags:  bionic
  Uname: Linux 4.15.0-38-generic x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: False
  dmi.bios.date: 01/25/2018
  dmi.bios.vendor: Intel Corporation
  dmi.bios.version: S1200SP.86B.03.01.1029.012520180838
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: S1200SP
  dmi.board.vendor: Intel Corporation
  dmi.board.version: H57532-271
  dmi.chassis.asset.tag: 
  dmi.chassis.type: 23
  dmi.chassis.vendor: ...
  dmi.chassis.version: ..
  dmi.modalias: 
dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...:ct23:cvr..:
  dmi.product.family: Family
  dmi.product.name: S1200SP
  dmi.product.version: 
  dmi.sys.vendor: Intel

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-30 Thread Colin Ian King

A couple more notes:

1. Disabled file-based swap on /swapfile - can still reproduce the issue
2. Used partition-based swap on a 2nd disk - can still reproduce the issue

https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed


[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

2020-01-29 Thread Colin Ian King

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
   Importance: Medium => High

https://bugs.launchpad.net/bugs/1861235

Title:
  zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Same as bug 1861228 but with a newer kernel installed.

  [  790.702566] VERIFY(size != 0) failed
  [  790.702590] PANIC at range_tree.c:304:range_tree_find_impl()
  [  790.702611] Showing stack for process 28685
  [  790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
  [  790.702616] Call Trace:
  [  790.702626]  dump_stack+0x6d/0x8e
  [  790.702637]  spl_dumpstack+0x42/0x50 [spl]
  [  790.702640]  spl_panic+0xc8/0x110 [spl]
  [  790.702645]  ? __switch_to_asm+0x41/0x70
  [  790.702714]  ? arc_prune_task+0x1a/0x40 [zfs]
  [  790.702740]  ? dbuf_dirty+0x43d/0x850 [zfs]
  [  790.702745]  ? getrawmonotonic64+0x43/0xd0
  [  790.702746]  ? getrawmonotonic64+0x43/0xd0
  [  790.702775]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702778]  ? getrawmonotonic64+0x43/0xd0
  [  790.702805]  ? dmu_zfetch+0x49a/0x500 [zfs]
  [  790.702807]  ? mutex_lock+0x12/0x40
  [  790.702833]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [  790.702866]  range_tree_find_impl+0x88/0x90 [zfs]
  [  790.702870]  ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
  [  790.702902]  range_tree_clear+0x4f/0x60 [zfs]
  [  790.702930]  dnode_free_range+0x11f/0x5a0 [zfs]
  [  790.702957]  dmu_object_free+0x53/0x90 [zfs]
  [  790.702983]  dmu_free_long_object+0x9f/0xc0 [zfs]
  [  790.703010]  receive_freeobjects.isra.12+0x7a/0x100 [zfs]
  [  790.703036]  receive_writer_thread+0x6d2/0xa60 [zfs]
  [  790.703040]  ? set_curr_task_fair+0x2b/0x60
  [  790.703043]  ? spl_kmem_free+0x33/0x40 [spl]
  [  790.703048]  ? kfree+0x165/0x180
  [  790.703073]  ? receive_free.isra.13+0xc0/0xc0 [zfs]
  [  790.703078]  thread_generic_wrapper+0x74/0x90 [spl]
  [  790.703081]  kthread+0x121/0x140
  [  790.703084]  ? __thread_exit+0x20/0x20 [spl]
  [  790.703085]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  790.703088]  ret_from_fork+0x35/0x40
  [  967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
  [  967.636979]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  967.637076] txg_quiesce D0 14810  2 0x8000
  [  967.637080] Call Trace:
  [  967.637089]  __schedule+0x24e/0x880
  [  967.637092]  schedule+0x2c/0x80
  [  967.637106]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637114]  ? wait_woken+0x80/0x80
  [  967.637122]  __cv_wait+0x15/0x20 [spl]
  [  967.637210]  txg_quiesce_thread+0x2cb/0x3d0 [zfs]
  [  967.637278]  ? txg_delay+0x1b0/0x1b0 [zfs]
  [  967.637286]  thread_generic_wrapper+0x74/0x90 [spl]
  [  967.637291]  kthread+0x121/0x140
  [  967.637297]  ? __thread_exit+0x20/0x20 [spl]
  [  967.637299]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  967.637304]  ret_from_fork+0x35/0x40
  [  967.637326] INFO: task zfs:28590 blocked for more than 120 seconds.
  [  967.637371]   Tainted: P   O 4.15.0-76-generic #86-Ubuntu
  [  967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  967.637467] zfs D0 28590  28587 0x8080
  [  967.637470] Call Trace:
  [  967.637474]  __schedule+0x24e/0x880
  [  967.637477]  schedule+0x2c/0x80
  [  967.637486]  cv_wait_common+0x11e/0x140 [spl]
  [  967.637491]  ? wait_woken+0x80/0x80
  [  967.637498]  __cv_wait+0x15/0x20 [spl]
  [  967.637554]  dmu_recv_stream+0xa51/0xef0 [zfs]
  [  967.637630]  zfs_ioc_recv_impl+0x306/0x1100 [zfs]
  [  967.637679]  ? dbuf_read+0x34a/0x920 [zfs]
  [  967.637725]  ? dbuf_rele+0x36/0x40 [zfs]
  [  967.637728]  ? _cond_resched+0x19/0x40
  [  967.637798]  zfs_ioc_recv_new+0x33d/0x410 [zfs]
  [  967.637809]  ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl]
  [  967.637816]  ? spl_vmem_alloc+0x19/0x20 [spl]
  [  967.637828]  ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
  [  967.637834]  ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair]
  [  967.637840]  ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair]
  [  967.637905]  zfsdev_ioctl+0x451/0x610 [zfs]
  [  967.637913]  do_vfs_ioctl+0xa8/0x630
  [  967.637917]  ? __audit_syscall_entry+0xbc/0x110
  [  967.637924]  ? syscall_trace_enter+0x1da/0x2d0
  [  967.637927]  SyS_ioctl+0x79/0x90
  [  967.637930]  do_syscall_64+0x73/0x130
  [  967.637935]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
  [  967.637938] RIP: 0033:0x7fc305a905d7
  [  967.637940] RSP: 002b:7ffc45e39618 EFLAGS: 0246 ORIG_RAX: 
00

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-28 Thread Colin Ian King

Captured the hard lockup spinning in the following loop:

(gdb) stepi
0x8c4e29e5 in ?? ()
=> 0x8c4e29e5:  eb ec   jmp    0x8c4e29d3
(gdb) stepi
0x8c4e29d3 in ?? ()
=> 0x8c4e29d3:  8b 07   mov    (%rdi),%eax
(gdb) stepi
0x8c4e29d5 in ?? ()
=> 0x8c4e29d5:  85 c0   test   %eax,%eax
(gdb) stepi
0x8c4e29d7 in ?? ()
=> 0x8c4e29d7:  75 0a   jne    0x8c4e29e3
(gdb) stepi
0x8c4e29e3 in ?? ()
=> 0x8c4e29e3:  f3 90   pause
(gdb) stepi
0x8c4e29e5 in ?? ()
=> 0x8c4e29e5:  eb ec   jmp    0x8c4e29d3

This maps to:

810e29c0 :

810e29d3:   8b 07          mov    (%rdi),%eax
810e29d5:   85 c0          test   %eax,%eax
810e29d7:   75 0a          jne    810e29e3

810e29d9:   f0 0f b1 17    lock cmpxchg %edx,(%rdi)
810e29dd:   85 c0          test   %eax,%eax
810e29df:   75 f2          jne    810e29d3

810e29e1:   5d             pop    %rbp
810e29e2:   c3             retq
810e29e3:   f3 90          pause
810e29e5:   eb ec          jmp    810e29d3
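
The disassembly has the shape of a test-and-test-and-set spin: read the
lock word, pause while it is non-zero, and only try the locked
compare-and-exchange once it reads zero. A minimal user-space sketch of
that pattern with C11 atomics is below. The names tts_lock,
tts_try_acquire, tts_acquire and tts_release are made up for
illustration; this is not the kernel source, and which kernel function
the listing belongs to is not stated here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 means free, non-zero means held. */
typedef struct {
    atomic_uint word;
} tts_lock;

/* One attempt, the "lock cmpxchg %edx,(%rdi)" step. Returns 1 on success. */
static int tts_try_acquire(tts_lock *lock, unsigned int held)
{
    unsigned int expected = 0;

    assert(held != 0);
    return atomic_compare_exchange_strong(&lock->word, &expected, held);
}

/* Spin until the word reads 0 and the compare-and-exchange wins. */
static void tts_acquire(tts_lock *lock, unsigned int held)
{
    for (;;) {
        /* mov (%rdi),%eax; test %eax,%eax; jne to pause; jmp back */
        while (atomic_load_explicit(&lock->word, memory_order_relaxed) != 0)
            ; /* a real implementation would issue a pause hint here */

        if (tts_try_acquire(lock, held))
            return;
    }
}

/* Store 0 so that a spinning waiter can get through. */
static void tts_release(tts_lock *lock)
{
    atomic_store(&lock->word, 0);
}
```

If the holder never stores 0 back, tts_acquire() never returns. That
matches the single-stepping above, which only ever cycles through
mov/test/jne/pause/jmp and never reaches the cmpxchg.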


-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  
  As this felt like a potential hardware issue, I had my hosting provider
  give me a completely different system (different motherboard, CPU, RAM
  and storage). I installed 18.04 on that system and moved my data over;
  a week later, I hit the issue again.

  We've since also had an LXD user report similar symptoms, also on
  varying hardware:
  https://github.com/lxc/lxd/issues/5197

  
  My system doesn't have a lot of memory pressure with about 50% of free memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea
  of what happened but I'm not too hopeful given the complete silence on
  the console when this occurs.

  System is currently on:
    Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15 so it's not a recent
  regression.
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
   crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
   Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice:
   RESUME=none
   CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Intel Corporation S1200SP
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8
  ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-38-generic N/A
   linux-backports-modules-4.15.0-38-generic

[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-27 Thread Colin Ian King

Dann, I tested this on my 6640 with an older kernel and now get:

sudo dmidecode
# dmidecode 3.1
# No SMBIOS nor DMI entry point found, sorry.

I guess that's expected.

I'd like to see what Ethan gets on his H/W as I'm not running a cloud
installation on my dev board.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Fix Released
Status in dmidecode source package in Bionic:
  In Progress
Status in dmidecode source package in Eoan:
  In Progress
Status in dmidecode source package in Focal:
  Fix Released
Status in dmidecode package in Debian:
  Unknown

Bug description:
  [Impact]
  Running 'sudo dmidecode' on non-UEFI ARM systems can cause them to
  crash/reboot. cloud-init apparently runs dmidecode as root, so it breaks
  any cloud-init based installation.

  [Test Case]
  sudo dmidecode

  [Fix]
  Upstream has the following fix:

  commit e12ec26e19e02281d3e7258c3aabb88a5cf5ec1d
  Author: Jean Delvare 
  Date: Mon Aug 26 14:20:15 2019 +0200

  dmidecode: Only scan /dev/mem for entry point on x86

  [Regression Risk]
  In Ubuntu, dmidecode only builds on amd64, arm64, armhf & i386.
  The fix is to disable code on !x86, so the regression risk is restricted
  to ARM platforms, where we know /dev/mem trolling is bad news.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-27 Thread Colin Ian King

It needs backporting to eoan, disco and bionic. I was just about to
upload a fix to my PPA so I could get it sponsored. Do you want to take
it from here, Dann?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Fix Released
Status in dmidecode source package in Bionic:
  In Progress
Status in dmidecode source package in Eoan:
  In Progress
Status in dmidecode source package in Focal:
  Fix Released
Status in dmidecode package in Debian:
  Unknown

Bug description:
  Device: Inforce 6640
  https://www.inforcecomputing.com/products/single-board-computers-sbc/qualcomm-snapdragon-820-inforce-6640-sbc
  SoC: Snapdragon 820

  sysname='Linux',
  nodename='ubuntu',
  release='4.15.0-1069-snapdragon', 
  version='#76-Ubuntu SMP Tue Nov 26 16:10:14 UTC 2019', 
  machine='aarch64'

  The issue is caused by the following commit.
  Inforce 6640 doesn't have functional dmidecode.
  The system will reboot when executing dmidecode.

  commit 3416e2ee7f65defdb15aab861a85767d13e8c34c
  Author: Robert Schweikert 
  Date: Sat Oct 29 09:29:53 2016 -0400
  dmidecode: Allow dmidecode to be used on aarch64
  aarch64 systems have functional dmidecode, so allow that to be used.
  - aarch64 has support for dmidecode as well

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions


[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-27 Thread Colin Ian King

Upstream has a fix like the one I was hinting at in comment #9, I'll SRU
this fix.

commit e12ec26e19e02281d3e7258c3aabb88a5cf5ec1d
Author: Jean Delvare 
Date:   Mon Aug 26 14:20:15 2019 +0200

dmidecode: Only scan /dev/mem for entry point on x86

x86 is the only architecture which can have a DMI entry point scanned
from /dev/mem. Do not attempt it on other architectures, because not
only it can't work, but it can even cause the system to reboot.

This fixes support request #109697:
https://savannah.nongnu.org/support/?109697


** Changed in: dmidecode (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: dmidecode (Ubuntu)
   Status: Triaged => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  In Progress


[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-27 Thread Colin Ian King

I guess the next question is why cloud-init needs to run dmidecode as
root. What happens on arches that don't have DMI data?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  In Progress


[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-27 Thread Colin Ian King

dmidecode.c directly accesses memory and assumes it's running on x86
without checking that the arch is x86. Randomly scanning arbitrary hunks
of memory on non-x86 as root will lead to all sorts of woe:

memory_scan:
        if (!(opt.flags & FLAG_QUIET))
                printf("Scanning %s for entry point.\n", opt.devmem);
        /* Fallback to memory scan (x86, x86_64) */
        if ((buf = mem_chunk(0xF0000, 0x10000, opt.devmem)) == NULL)
        {
                ret = 1;
                goto exit_free;
        }

It probably needs wrapping with:

#if defined(__x86_64__) || defined(__x86_64) || \
defined(__i386__)   || defined(__i386)

...

#endif

Anyhow, I don't think this is a kernel-specific issue. I can trigger
this with various kernels - we just don't protect users with
CAP_SYS_ADMIN rights from doing crazy probing on /dev/mem.
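
As a rough illustration of the wrapping suggested above, here is a
self-contained sketch that compiles the scan path in only on x86 builds.
HAVE_LEGACY_DMI_SCAN, scan_entry_point() and count_scan() are
hypothetical names invented for this example; the callback merely stands
in for the real memory scan and never touches /dev/mem.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time switch: 1 on x86/x86_64 builds, 0 everywhere else. */
#if defined(__x86_64__) || defined(__x86_64) || \
    defined(__i386__)   || defined(__i386)
#define HAVE_LEGACY_DMI_SCAN 1
#else
#define HAVE_LEGACY_DMI_SCAN 0
#endif

/* Hypothetical result codes for the illustration. */
enum scan_result { SCAN_SKIPPED = 0, SCAN_DONE = 1 };

/* Hypothetical callback type standing in for the real memory scan. */
typedef void (*scan_fn)(const char *devmem);

/* Stand-in scan that only counts how often it was invoked. */
static int scan_calls;

static void count_scan(const char *devmem)
{
    (void)devmem;
    scan_calls++;
}

/* Run the scan only when the build target is x86; otherwise skip it. */
static enum scan_result scan_entry_point(const char *devmem, scan_fn scan)
{
    assert(devmem != NULL && scan != NULL);
#if HAVE_LEGACY_DMI_SCAN
    scan(devmem);
    return SCAN_DONE;
#else
    (void)scan;
    return SCAN_SKIPPED;
#endif
}
```

On an arm64 build, scan_entry_point("/dev/mem", count_scan) returns
SCAN_SKIPPED and the callback is never invoked, so the probing code is
not even present in the binary.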

** Changed in: dmidecode (Ubuntu)
 Assignee: Colin Ian King (colin-king) => (unassigned)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Triaged


[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-27 Thread Colin Ian King

So, dmidecode directly mmaps /dev/mem and does some probing based on
the belief that the system is an x86 architecture, even on arm
architectures.

openat(AT_FDCWD, "/dev/mem", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFCHR|0640, st_rdev=makedev(0x1, 0x1), ...}) = 0
mmap(NULL, 65536, PROT_READ, MAP_SHARED, 3, 0xf0000) = 0x7f9f6fd000

etc

So that's kind of intrusive, and as root one can read any physical
address via /dev/mem, which may cause breakage.
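
For reference, mmap() only accepts a file offset that is a multiple of
the page size, so a helper that maps a physical range has to round the
base address down and remember how far into the mapping the wanted bytes
start. The sketch below shows only that arithmetic; struct map_window
and window_for() are hypothetical names, and the 0xF0000 / 0x10000 values
in the note after it are example inputs, not something the code reads
from /dev/mem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper result: where to mmap from, and how far into the
 * mapping the requested physical address then sits. */
struct map_window {
    uint64_t aligned_base; /* offset to hand to mmap(), page aligned */
    size_t   delta;        /* bytes to skip inside the mapping */
    size_t   map_len;      /* length to map so [base, base + len) fits */
};

/* Split a physical base address into a page-aligned offset plus delta. */
static struct map_window window_for(uint64_t base, size_t len, size_t page_size)
{
    struct map_window w;

    assert(page_size != 0);
    w.delta = (size_t)(base % page_size);
    w.aligned_base = base - w.delta;
    w.map_len = w.delta + len;
    return w;
}
```

With a 4096-byte page, window_for(0xF0000, 0x10000, 4096) yields an
aligned base of 0xF0000, a delta of 0 and a length of 65536, which is the
same mapping length as in the strace output above.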

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Triaged


[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support

2020-01-27 Thread Colin Ian King

@ubuntu stable folks - can this be uploaded sometime soon?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856704

Title:
  backport 5.3 zfs support to bionic for HWE kernel support

Status in spl-linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Committed
Status in spl-linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Committed

Bug description:
  == SRU Justification Bionic ==

  The HWE 5.3 kernel requires ZFS + SPL to support dkms module build
  functionality for kernels 4.15 through to 5.3.  Basically, the ZFS+SPL
  compat commits between 4.15 and 5.3 are required to allow the modules
  to build on kernels up to and including the HWE 5.3 kernel.
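
  To illustrate what a compat fix has to cope with, here is a small
  sketch that selects behaviour by kernel version. It is a
  simplification: ZFS and SPL mostly probe for kernel features at
  configure time rather than comparing version numbers. KVER() mirrors
  the encoding of the kernel's KERNEL_VERSION() macro, and
  access_ok_arg_count() is a hypothetical helper based on the 5.0 change
  in which access_ok() dropped its type parameter.

```c
#include <assert.h>

/* Same encoding as the kernel's KERNEL_VERSION(major, minor, patch). */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical: number of arguments access_ok() takes on a given kernel.
 * From 5.0 onwards the leading type argument is gone. */
static int access_ok_arg_count(int version_code)
{
    return version_code >= KVER(5, 0, 0) ? 2 : 3;
}
```

  A module that must build on everything from 4.15 to 5.3 needs both
  forms available, each selected for the kernel it is being compiled
  against.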

  == The Fix ==

  Backport of upstream commits:

  SPL:
  - 0002-fix-spl-build-shrinker-callback-check.patch
  - 0003-remove-deprecated-set-fs-pwd-check.patch
  - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch
  - 0005-Linux-4.20-compat-current_kernel_time.patch
  - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch
  - 0007-Linux-5.0-compat-Use-totalram_pages.patch
  - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch
  - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch
  - 0010-Linux-5.1-compat-get_ds-removed.patch
  - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch
  - 0012-Linux-5.2-compat-rw_tryupgrade.patch
  - 0013-Linux-5.3-compat-rw_semaphore-owner.patch
  - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch
  - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
  - 0016-Linux-compat-4.16-SECTOR_SIZE.patch
  - 0017-Linux-compat-spl-timespec_sub.patch
  - 0018-deprecate-splat-rwlock-test6.patch

  ZFS:
  - 3300-Linux-4.16-compat-inode_set_iversion.patch
  - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch
  - 3302-Linux-4.16-compat-get_disk_and_module.patch
  - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch
  - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch
  - 3305-Linux-4.14-compat-blk_queue_stackable.patch
  - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch
  - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch
  - 3308-Linux-5.0-compat-Use-totalram_pages.patch
  - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch
  - 3310-Linux-5.0-compat-Fix-SUBDIRs.patch
  - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch
  - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch
  - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch
  - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch
  - 3315-Linux-5.0-compat-ASM_BUG-macro.patch
  - 3316-Linux-5.2-compat-rw_tryupgrade.patch
  - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch
  - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
  - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch
  - 3320-zpios-deprecate-current-kernel-time.patch
  - 3321-add-compat-check-disk-size-change.patch

  == Testcase ==

  Without these commits users who install kernels and kernel headers
  from 4.16 through to 5.3 inclusive won't be able to build spl + zfs in
  Bionic because of the lack of the kernel compat fixes.  With the
  commits, zfs + spl dkms modules can build cleanly and pass the ubuntu
  ZFS regression tests found in the kernel team autotests git
  repository.

  == Risk ==

  This is a sizeable backport that touches a fair amount of spl + zfs
  kernel interfacing code. There is a risk that the backport may cause a
  regression in functionality that has not been exercised by the ZFS
  regression tests. The ZFS regression tests run against this backport
  found no regression in core zfs functionality. It must be noted that
  most of the patches are upstream compat fixes that are known to work
  with the latest ZFS that is being used in focal, so we are confident
  the original compat changes work.

  Note that these updates have all been build tested on x86-64, arm64
  and s390x systems with kernels from 4.16 to 5.3 and regression tested
  with the ubuntu zfs regression tests.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/spl-linux/+bug/1856704/+subscriptions


[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-27 Thread Colin Ian King

Hi, can you provide me with instructions on how to get and install the
image for this board? I'd like to reproduce this issue and work out a
suitable fix.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Triaged


[Kernel-packages] [Bug 1855100] Re: bpf self tests break 5.4.0-7-generic on power8 system

2020-01-20 Thread Colin Ian King

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855100

Title:
  bpf self tests break 5.4.0-7-generic on power8 system

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of
  the bare metal system.

  Last output seen while ssh'd into the box:

  11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls
  11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # running socket test
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # [PASS]
  11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests
  11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # running psock_fanout test
  11:52:34 DEBUG| [stdout] # 
  client_loop: send disconnect: Broken pipe

  last output in (truncated) nohup output:

  f -emit-llvm -c progs/pyperf180.c -o - || \
  11:52:15 DEBUG| [stdout]echo "clang failed") | \
  11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe  \
  11:52:15 DEBUG| [stdout]-filetype=obj -o /home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o

  this suggests the bpf selftests are causing the breakage.

  last output logged in /var/log/dmesg.log :

  Dec  4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to MEM_GOING_OFFLINE

  next entries on dmesg.log show machine had rebooted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855100/+subscriptions


[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade

2020-01-17 Thread Colin Ian King

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu)
   Status: New => Triaged

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1860182

Title:
  zpool scrub malfunction after kernel upgrade

Status in zfs-linux package in Ubuntu:
  Triaged

Bug description:
  I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE
  kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly:

  eric@eric-8700K:~$ zpool status
    pool: storagepool1
   state: ONLINE
    scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
  config:

          NAME                                          STATE     READ WRITE CKSUM
          storagepool1                                  ONLINE       0     0     0
            mirror-0                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
              ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
            mirror-1                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
              ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0


  I ran zpool scrub after upgrading the kernel and rebooting, and now it
  fails to work properly. It appeared to finish in about 5 minutes but
  did not, and says it is going slow:


  eric@eric-8700K:~$ sudo zpool status
    pool: storagepool1
   state: ONLINE
    scan: scrub in progress since Fri Jan 17 15:32:07 2020
          1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
          0B repaired, 100.00% done
  config:

          NAME                                          STATE     READ WRITE CKSUM
          storagepool1                                  ONLINE       0     0     0
            mirror-0                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
              ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
            mirror-1                                    ONLINE       0     0     0
              ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
              ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0

  errors: No known data errors

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: zfsutils-linux 0.7.5-1ubuntu16.7
  ProcVersionSignature: Ubuntu 5.3.0-26.28~18.04.1-generic 5.3.13
  Uname: Linux 5.3.0-26-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.9-0ubuntu7.9
  Architecture: amd64
  CurrentDesktop: ubuntu:GNOME
  Date: Fri Jan 17 16:22:01 2020
  InstallationDate: Installed on 2018-03-07 (681 days ago)
  InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
  SourcePackage: zfs-linux
  UpgradeStatus: Upgraded to bionic on 2018-08-02 (533 days ago)
  modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs']

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1860182/+subscriptions


[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support

2020-01-17 Thread Colin Ian King

** Description changed:

- 5.3 kernel functionality back through to 4.15 is required for 5.3 HWE
- kernel support in ZFS and SPL modules.
+ == SRU Justification Bionic ==
+ 
+ The HWE 5.3 kernel requires ZFS + SPL to support dkms module build
+ functionality for kernels 4.15 through to 5.3.  Basically, the ZFS+SPL
+ compat commits between 4.15 and 5.3 are required to allow the modules to
+ build on kernels up to and including the HWE 5.3 kernel.
+ 
+ == The Fix ==
+ 
+ Backport of upstream commits:
+ 
+ SPL:
+ - 0002-fix-spl-build-shrinker-callback-check.patch
+ - 0003-remove-deprecated-set-fs-pwd-check.patch
+ - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch
+ - 0005-Linux-4.20-compat-current_kernel_time.patch
+ - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch
+ - 0007-Linux-5.0-compat-Use-totalram_pages.patch
+ - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch
+ - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch
+ - 0010-Linux-5.1-compat-get_ds-removed.patch
+ - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch
+ - 0012-Linux-5.2-compat-rw_tryupgrade.patch
+ - 0013-Linux-5.3-compat-rw_semaphore-owner.patch
+ - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch
+ - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
+ - 0016-Linux-compat-4.16-SECTOR_SIZE.patch
+ - 0017-Linux-compat-spl-timespec_sub.patch
+ - 0018-deprecate-splat-rwlock-test6.patch
+ 
+ ZFS:
+ - 3300-Linux-4.16-compat-inode_set_iversion.patch
+ - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch
+ - 3302-Linux-4.16-compat-get_disk_and_module.patch
+ - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch
+ - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch
+ - 3305-Linux-4.14-compat-blk_queue_stackable.patch
+ - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch
+ - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch
+ - 3308-Linux-5.0-compat-Use-totalram_pages.patch
+ - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch
+ - 3310-Linux-5.0-compat-Fix-SUBDIRs.patch
+ - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch
+ - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch
+ - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch
+ - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch
+ - 3315-Linux-5.0-compat-ASM_BUG-macro.patch
+ - 3316-Linux-5.2-compat-rw_tryupgrade.patch
+ - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch
+ - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
+ - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch
+ - 3320-zpios-deprecate-current-kernel-time.patch
+ - 3321-add-compat-check-disk-size-change.patch
+ 
+ == Testcase ==
+ 
+ Without these commits users who install kernels and kernel headers from
+ 4.16 through to 5.3 inclusive won't be able to build spl + zfs in Bionic
+ because of the lack of the kernel compat fixes.  With the commits, zfs +
+ spl dkms modules can build cleanly and pass the ubuntu ZFS regression
+ tests found in the kernel team autotests git repository.
+ 
+ == Risk ==
+ 
+ This is a sizeable backport that touches a fair amount of spl + zfs
+ kernel interfacing code. There is a risk that the backport may cause a
+ regression in functionality that has not been exercised by the ZFS
+ regression tests. Those tests were run against this backport and found
+ no regression in core ZFS functionality.  It must
+ be noted that most of the patches are upstream compat fixes that are
+ known to be working with the latest ZFS that is being used in focal, so
+ we are confident the original compat changes work.
+ 
+ Note that these updates have all been build tested on x86-64, arm64 and
+ s390x systems with kernels from 4.16 to 5.3 and regression tested with
+ the ubuntu zfs regression tests.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856704

Title:
  backport 5.3 zfs support to bionic for HWE kernel support

Status in spl-linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Committed
Status in spl-linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Committed

Bug description:
  == SRU Justification Bionic ==

  The HWE 5.3 kernel requires ZFS + SPL to support dkms module build
  functionality for kernels 4.15 through to 5.3.  Basically, the ZFS+SPL
  compat commits between 4.15 and 5.3 are required to allow the modules
  to build on kernels up to and including the HWE 5.3 kernel.

  == The Fix ==

  Backport of upstream commits:

  SPL:
  - 0002-fix-spl-build-shrinker-callback-check.patch
  - 0003-remove-deprecated-set-fs-pwd-check.patch
  - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch
  -

[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support

2020-01-17 Thread Colin Ian King

** Changed in: spl-linux (Ubuntu)
   Status: In Progress => Fix Committed

** Changed in: spl-linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

** Changed in: zfs-linux (Ubuntu)
   Status: In Progress => Fix Committed

** Changed in: zfs-linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856704

Title:
  backport 5.3 zfs support to bionic for HWE kernel support

Status in spl-linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Committed
Status in spl-linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Committed

Bug description:
  5.3 kernel functionality back through to 4.15 is required for 5.3 HWE
  kernel support in ZFS and SPL modules.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/spl-linux/+bug/1856704/+subscriptions


[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption

2020-01-17 Thread Colin Ian King

** Also affects: zfs-linux (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: zfs-linux (Ubuntu)
   Status: New => Fix Committed

** Changed in: zfs-linux (Ubuntu)
   Status: Fix Committed => Fix Released

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857040

Title:
  zfs: upstream support for hardware-accelerated encryption

Status in linux package in Ubuntu:
  In Progress
Status in zfs-linux package in Ubuntu:
  Fix Released

Bug description:
  I understand that in Linux 5.0+, certain encryption-related symbols
  have been marked GPL-only, making them unavailable for use by zfs.  As
  a result, using encryption in zfs pools increases cpu load / decreases
  disk throughput.

  There are a pair of upstream pull requests that should improve the
  performance (with performance measurement done on x86-64).  Can these
  be pulled into the Ubuntu kernel?

  https://github.com/zfsonlinux/zfs/pull/9515
  https://github.com/zfsonlinux/zfs/pull/9296

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857040/+subscriptions


[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-16 Thread Colin Ian King

After quite a bit of experimentation I found that I can reproduce the bug
when both zram *and* swap on the filesystem are enabled while exercising
the stress-ng brk and aiol stressors (to cause lots of I/O). Eventually the
system grinds to a halt, loses interactivity and reports lockups such as
the following:
[ 2012.040006] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [stress-ng-brk:1632]
[ 2012.040922] Modules linked in: zram(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) glue_helper(E) cryptd(E) psmouse(E) input_leds(E) floppy(E) virtio_scsi(E) serio_raw(E) i2c_piix4(E) mac_hid(E) pata_acpi(E) qemu_fw_cfg(E) 9pnet_virtio(E) 9p(E) 9pnet(E) fscache(E)
[ 2012.044655] CPU: 2 PID: 1632 Comm: stress-ng-brk Tainted: GEL   4.15.18 #1
[ 2012.045581] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 2012.046555] RIP: 0010:__raw_callee_save___pv_queued_spin_unlock+0x10/0x17
[ 2012.047340] RSP: 0018:b73382083718 EFLAGS: 0246 ORIG_RAX: ff11
[ 2012.048238] RAX: 0001 RBX:  RCX: 0002
[ 2012.049078] RDX:  RSI: 9d327c2f6918 RDI: a3269978
[ 2012.049909] RBP: b73382083720 R08: 9d327c2f6918 R09: 9d327c0a5328
[ 2012.050746] R10: 9d327c1e2310 R11: 9d327c1e2328 R12: 9d327c2f6800
[ 2012.051574] R13: 9d327c1e2328 R14: 9d327c1e2310 R15: 9d327c1e2200
[ 2012.052436] FS:  7f89f2ccd740() GS:9d327f28() knlGS:
[ 2012.053382] CS:  0010 DS:  ES:  CR0: 80050033
[ 2012.054058] CR2: 7f1350a8dd90 CR3: 311a4004 CR4: 00160ee0
[ 2012.054889] Call Trace:
[ 2012.055192]  get_swap_pages+0x193/0x360
[ 2012.055652]  get_swap_page+0x13f/0x1e0
[ 2012.056123]  add_to_swap+0x14/0x70
[ 2012.056530]  shrink_page_list+0x81d/0xbc0
[ 2012.057013]  shrink_inactive_list+0x242/0x590
[ 2012.057523]  shrink_node_memcg+0x364/0x770
[ 2012.058012]  shrink_node+0xf7/0x300
[ 2012.058432]  ? shrink_node+0xf7/0x300
[ 2012.058863]  do_try_to_free_pages+0xc9/0x330
[ 2012.059368]  try_to_free_pages+0xee/0x1b0
[ 2012.059842]  __alloc_pages_slowpath+0x3fc/0xe00
[ 2012.060424]  __alloc_pages_nodemask+0x29a/0x2c0
[ 2012.060963]  alloc_pages_vma+0x88/0x1f0
[ 2012.061414]  __handle_mm_fault+0x8b7/0x12e0
[ 2012.061909]  handle_mm_fault+0xb1/0x210
[ 2012.062375]  __do_page_fault+0x281/0x4b0
[ 2012.062848]  do_page_fault+0x2e/0xe0
[ 2012.063274]  ? async_page_fault+0x2f/0x50
[ 2012.063751]  do_async_page_fault+0x51/0x80
[ 2012.064262]  async_page_fault+0x45/0x50
[ 2012.064719] RIP: 0033:0x55ec1997bd0a
[ 2012.065147] RSP: 002b:7ffeacd21600 EFLAGS: 00010246
[ 2012.065754] RAX: 55ec28601000 RBX: 0005 RCX: 7f89f2de956b
[ 2012.066580] RDX: 55ec28601000 RSI: 7ffeacd216d0 RDI: 55ec28602000
[ 2012.067410] RBP: 7ffeacd216c0 R08:  R09: 7f89f3d0c2f0
[ 2012.068290] R10:  R11: 0246 R12: 
[ 2012.069129] R13: 0002 R14: 0001 R15: 7ffeacd216d0
[ 2012.069965] Code: 50 41 51 41 52 41 53 e8 3b 05 00 00 41 5b 41 5a 41 59 41 58 5f 5e 5a 59 5d c3 90 55 48 89 e5 52 b8 01 00 00 00 31 d2 f0 0f b0 17 <3c> 01 75 03 5a 5d c3 56 0f b6 f0 e8 bc ff ff ff 5e 5a 5d c3 0f
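
When sifting through a long dmesg capture for reports like the one above,
a small filter can help pick out which CPU and task were stuck. This is
only a minimal sketch: it assumes the log is piped in on stdin and that the
message uses the same "watchdog: BUG: soft lockup" wording as the trace in
this comment. The function name soft_lockups is made up for illustration.

```shell
# soft_lockups: read kernel log text on stdin and print one line per
# soft lockup report in the form "<cpu> <duration> <task:pid>".
# Assumption: the message looks like
#   watchdog: BUG: soft lockup - CPU#N stuck for Ns! [task:pid]
# as in the trace above; other kernel versions may word it differently.
soft_lockups() {
    sed -n 's/.*BUG: soft lockup - \(CPU#[0-9][0-9]*\) stuck for \([0-9][0-9]*s\)! *\[\(.*\)\].*/\1 \2 \3/p'
}
```

It could be used as "dmesg | soft_lockups". Note that it expects each
report on a single line, so logs that have been re-wrapped by a mail client
would need the wrapped lines joined first.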

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  
  As this felt like a potential hardware issue, I had my hosting provider
  give me a completely different system: different motherboard, different
  CPU, different RAM and different storage. I installed 18.04 on that
  system and moved my data over. A week later, I hit the issue again.

  We've since also had a LXD user reporting similar symptoms, also on
  varying hardware:
  https://github.com/lxc/lxd/issues/5197

  
  My system doesn't have a lot of memory pressure with about 50% of free memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase

[Kernel-packages] [Bug 1858495] Re: multiple long delays during kernel and userspace boot

2020-01-16 Thread Colin Ian King

** Changed in: linux-signed-azure (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-signed-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1858495

Title:
  multiple long delays during kernel and userspace boot

Status in linux-signed-azure package in Ubuntu:
  New

Bug description:
  Booting some Bionic instances in Azure (gen1 machines), I see some
  large delays during kernel/userspace boot, and it would be good to
  understand what's going on.  Additionally, the areas of boot that see
  delays are different for an image that's been created from a template
  vs. stock images.

  I'm attaching some data: 10 runs of the same image in a scaling set,
  each running the initial boot.  Processing the journal output and
  looking for delays of over 2.0 seconds between messages shows some
  areas of concern.

  
  [    1.788581] localhost.localdomain kernel: * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
                                               * this clock source is slow. Consider trying other clock sources
  [    3.545974] localhost.localdomain kernel: Unstable clock detected, switching default tracing clock to "global"
                                               If you want to keep using the local clock, then add:
                                                 "trace_clock=local"
                                               on the kernel command line
  [    6.401684] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
  [   15.280390] localhost.localdomain kernel: EXT4-fs (sda1): re-mounted. Opts: discard

  
  After capturing the Bionic image as a template and creating a new VM,
  we see new hot spots we didn't see before.

  
  # HotSpot maximum delta between kernel messages: 2.0
  # [    2.846188] localhost.localdomain kernel: AES CTR mode by8 optimization enabled
  # [    5.919313] localhost.localdomain kernel: raid6: avx2x4   gen() 21512 MB/s
  #
  # [    6.591530] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
  # [    9.031051] localhost.localdomain systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
  #
  # [   13.773554] localhost.localdomain sh[871]: + exit 0
  # [   21.625467] localhost.localdomain kernel: UDF-fs: INFO Mounting volume 'UDF Volume', timestamp 2019/12/17 00:00 (1000)
  #
  # [   24.919359] bugbif2be01 systemd-timesyncd[771]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).
  # [   29.787339] bugbif2be01 cloud-init[1026]: Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init' at Mon, 16 Dec 2019 18:14:47 +. Up 25.20 seconds.
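
  The hot spot listings above pair up consecutive journal lines whose
  timestamps are more than a threshold apart. The tool that produced them
  is not part of this report, so the following is only a sketch of the
  same idea. It assumes each input line starts with a monotonic
  "[ seconds.micros]" stamp on a single line; the function name hotspots
  and its default threshold argument are hypothetical.

```shell
# hotspots [THRESHOLD]: read journal lines on stdin and print each pair of
# consecutive lines whose timestamps differ by more than THRESHOLD seconds
# (default 2.0), followed by a blank line.
# Assumption: every line of interest begins with "[<seconds>.<fraction>]".
hotspots() {
    threshold=${1:-2.0}
    awk -v thr="$threshold" '
        match($0, /^[ \t]*\[[ \t]*[0-9]+\.[0-9]+\]/) {
            # Keep only the digits and the dot of the leading stamp.
            stamp = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9.]/, "", stamp)
            t = stamp + 0
            if (have_prev && t - prev_t > thr + 0) {
                print prev_line
                print $0
                print ""
            }
            prev_t = t
            prev_line = $0
            have_prev = 1
        }'
}
```

  Lines without a leading timestamp (for example wrapped continuation
  lines) are skipped, so the comparison is always between two stamped
  lines.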

  The easiest comparison kernel-side is the systemd-analyze value:

  Grepping in the debug data:

  
  % grep "Startup finished.*kernel" bug-bionic-baseline-no*.debug/*/journal.log | cut -d" " -f 7-
  Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s.
  Startup finished in 3.355s (kernel) + 51.732s (userspace) = 55.088s.
  Startup finished in 3.287s (kernel) + 51.747s (userspace) = 55.035s.
  Startup finished in 3.129s (kernel) + 50.066s (userspace) = 53.195s.
  Startup finished in 3.350s (kernel) + 50.682s (userspace) = 54.032s.
  Startup finished in 3.355s (kernel) + 49.322s (userspace) = 52.678s.
  Startup finished in 3.219s (kernel) + 51.124s (userspace) = 54.343s.
  Startup finished in 3.128s (kernel) + 49.226s (userspace) = 52.354s.
  Startup finished in 3.193s (kernel) + 53.197s (userspace) = 56.390s.
  Startup finished in 3.118s (kernel) + 46.203s (userspace) = 49.322s.

  foofoo % grep "Startup finished.*kernel" bug-bionic-baseline-after*.debug/*/journal.log | cut -d" " -f 7-
  Startup finished in 7.685s (kernel) + 32.463s (userspace) = 40.148s.
  Startup finished in 7.041s (kernel) + 35.998s (userspace) = 43.040s.
  Startup finished in 7.808s (kernel) + 35.444s (userspace) = 43.253s.
  Startup finished in 7.206s (kernel) + 37.952s (userspace) = 45.159s.
  Startup finished in 8.426s (kernel) + 36.976s (userspace) = 45.403s.
  Startup finished in 6.731s (kernel) + 35.484s (userspace) = 42.216s.
  Startup finished in 7.152s (kernel) + 32.664s (userspace) = 39.817s.
  Startup finished in 7.429s (kernel) + 36.177s (userspace) = 43.606s.
  Startup finished in 9.075s (kernel) + 32.494s (userspace) = 41.570s.
  Startup finished in 7.281s (kernel) + 32.732s (userspace) = 40.013s.
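
  To compare the two sets of runs at a glance, the kernel and userspace
  parts of each "Startup finished" line can be averaged. A minimal sketch,
  assuming the lines are in the seconds-only format shown above (systemd
  can also print minutes, which this does not handle); the function name
  boot_averages is made up for illustration.

```shell
# boot_averages: read "Startup finished in ..." lines on stdin and print
# the number of runs plus the mean kernel and userspace times in seconds.
# Assumption: times are plain seconds such as "3.209s", as in the listings
# above; values like "1min 2.5s" are not handled.
boot_averages() {
    awk '
        /Startup finished in/ {
            k = ""; u = ""
            # The time value is the field just before each "(...)" label.
            for (i = 2; i <= NF; i++) {
                if ($i == "(kernel)")    k = $(i - 1)
                if ($i == "(userspace)") u = $(i - 1)
            }
            if (k == "" || u == "") next
            sub(/s$/, "", k)
            sub(/s$/, "", u)
            ksum += k; usum += u; n++
        }
        END {
            if (n > 0)
                printf "runs=%d kernel_avg=%.3f userspace_avg=%.3f\n", n, ksum / n, usum / n
        }'
}
```

  The output of either grep command above can be piped straight into
  boot_averages, with or without the trailing cut.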

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-5.0.0-1027-azure 5.0.0-1027.29~18.04.1
  ProcVersionSignature: User Name 5.0.0-1027.29~18.04.1-azure 5.0.21
  Uname: Linux 5.0.0-1027-azure x86_64
  ApportVers

[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-14 Thread Colin Ian King

Oh, stupid me, I've just read the info in comment #1

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Triaged

Bug description:
  Device: Inforce 6640
  
https://www.inforcecomputing.com/products/single-board-computers-sbc/qualcomm-snapdragon-820-inforce-6640-sbc
  SoC: Snapdragon 820

  sysname='Linux',
  nodename='ubuntu',
  release='4.15.0-1069-snapdragon', 
  version='#76-Ubuntu SMP Tue Nov 26 16:10:14 UTC 2019', 
  machine='aarch64'

  The issue is caused by the following commit.
  Inforce 6640 doesn't have functional dmidecode.
  System will reboot when executing dmidecode.

  commit 3416e2ee7f65defdb15aab861a85767d13e8c34c
  Author: Robert Schweikert 
  Date: Sat Oct 29 09:29:53 2016 -0400
  dmidecode: Allow dmidecode to be used on aarch64
  aarch64 systems have functional dmidecode, so allow that to be used.
  - aarch64 has support for dmidecode as well

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions


[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640

2020-01-14 Thread Colin Ian King

Does the kernel expose the DMI tables via the following sysfs
file:  /sys/firmware/dmi/tables/DMI ?

If so, can you do the following:

sudo cat  /sys/firmware/dmi/tables/DMI > dmi.raw

and attach it to the bug report.  Also a dump of the kernel dmesg log
after it boots may be useful to see if it's a broken firmware DMI table
or a kernel issue.
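
If it helps, the check and the dump can be combined so that a missing or
unreadable table gives a clear message instead of an empty file. This is
just a sketch around the cat command above; the function name dump_dmi and
its optional arguments are hypothetical, and it needs to be run as root to
read the sysfs file.

```shell
# dump_dmi [SOURCE] [OUTPUT]: copy the raw DMI table to a file.
# Defaults to the sysfs path and output name mentioned above.
# Returns non-zero (and writes nothing) if the source is not readable.
dump_dmi() {
    src=${1:-/sys/firmware/dmi/tables/DMI}
    out=${2:-dmi.raw}
    if [ ! -r "$src" ]; then
        echo "DMI table not readable at $src" >&2
        return 1
    fi
    cat "$src" > "$out"
}
```

Running "dump_dmi" with no arguments as root should produce dmi.raw in the
current directory, and "dmesg > dmesg.log" right after boot captures the
kernel log to attach alongside it.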

** Changed in: dmidecode (Ubuntu)
   Status: New => Triaged

** Changed in: dmidecode (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: dmidecode (Ubuntu)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Triaged

Bug description:
  Device: Inforce 6640
  
https://www.inforcecomputing.com/products/single-board-computers-sbc/qualcomm-snapdragon-820-inforce-6640-sbc
  SoC: Snapdragon 820

  sysname='Linux',
  nodename='ubuntu',
  release='4.15.0-1069-snapdragon', 
  version='#76-Ubuntu SMP Tue Nov 26 16:10:14 UTC 2019', 
  machine='aarch64'

  The issue is caused by the following commit.
  Inforce 6640 doesn't have functional dmidecode.
  System will reboot when executing dmidecode.

  commit 3416e2ee7f65defdb15aab861a85767d13e8c34c
  Author: Robert Schweikert 
  Date: Sat Oct 29 09:29:53 2016 -0400
  dmidecode: Allow dmidecode to be used on aarch64
  aarch64 systems have functional dmidecode, so allow that to be used.
  - aarch64 has support for dmidecode as well

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions


[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-09 Thread Colin Ian King

I can reproduce this with stress-ng exercising a high memory pressure
scenario using:
stress-ng --brk 0 -v --aiol 0

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  
  As this felt like a potential hardware issue, I had my hosting provider
  give me a completely different system: different motherboard, different
  CPU, different RAM and different storage. I installed 18.04 on that
  system and moved my data over. A week later, I hit the issue again.

  We've since also had a LXD user reporting similar symptoms, also on
  varying hardware:
  https://github.com/lxc/lxd/issues/5197

  
  My system doesn't have a lot of memory pressure with about 50% of free memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea
  of what happened but I'm not too hopeful given the complete silence on
  the console when this occurs.

  System is currently on:
    Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15, so it's not a recent
  regression.
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
   crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with 
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
   Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice:
   RESUME=none
   CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard 
and Mouse
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Intel Corporation S1200SP
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic 
root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 
panic=1 verbose console=tty0 console=ttyS0,115200n8
  ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-38-generic N/A
   linux-backports-modules-4.15.0-38-generic  N/A
   linux-firmware 1.173.1
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  Tags:  bionic
  Uname: Linux 4.15.0-38-generic x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: False
  dmi.bios.date: 01/25/2018
  dmi.bios.vendor: Intel Corporation
  dmi.bios.version: S1200SP.86B.03.01.1029.012520180838
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: S1200SP
  dmi.board.vendor: Intel Corporation
  dmi.board.version: H57532-271
  dmi.chassis.asset.tag: 
  dmi.chassis.type: 23
  dmi.chassis.vendor: ...
  dmi.chassis.version: ..
  dmi.modalias: 
dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...:ct23:cvr..:
  dmi.product.family: Family
  dmi.product.name: S1200SP
  dmi.product.version: 
  dmi.sys.vendor: Intel

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-09 Thread Colin Ian King

I'm assuming the defaults are being used for the moment; this means 50%
of total memory is used for zram in total, distributed across the number
of CPUs, as defined in /usr/bin/init-zram-swapping
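
As a rough illustration of what that default works out to per device,
here is the arithmetic as a small shell function. It is only a sketch of
the sizing rule described above, not a copy of the init-zram-swapping
script, which may round or scale differently; the function name
zram_per_device_kb and the CPU count in the example are assumptions.

```shell
# zram_per_device_kb TOTAL_KB NCPUS: size in kB of each zram device when
# half of total memory is split evenly across one device per CPU.
# Integer division, so any remainder is dropped.
zram_per_device_kb() {
    total_kb=$1
    ncpus=$2
    if [ "$ncpus" -le 0 ]; then
        echo "CPU count must be positive" >&2
        return 1
    fi
    echo $(( total_kb / 2 / ncpus ))
}
```

For a machine with 31819 MiB of RAM (32582656 kB, the Mem total in the
free -m output quoted in the bug description) and a hypothetical 8 CPUs,
"zram_per_device_kb 32582656 8" gives 2036416 kB per device. The Swap
total of 15909 in that same output is roughly half of the Mem total, which
would be consistent with the default, though the report does not confirm
where that swap comes from.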

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  
  As this felt like a potential hardware issue, I had my hosting provider
  give me a completely different system: different motherboard, different
  CPU, different RAM and different storage. I installed 18.04 on that
  system and moved my data over. A week later, I hit the issue again.

  We've since also had a LXD user reporting similar symptoms, also on
  varying hardware:
  https://github.com/lxc/lxd/issues/5197

  
  My system doesn't have a lot of memory pressure with about 50% of free memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea
  of what happened but I'm not too hopeful given the complete silence on
  the console when this occurs.

  System is currently on:
  Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15 so it's not a recent
  regression.
  --- 
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Oct 23 16:12 seq
   crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 
'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with 
exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
   Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice:
   RESUME=none
   CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard 
and Mouse
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Intel Corporation S1200SP
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8
  ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-38-generic N/A
   linux-backports-modules-4.15.0-38-generic  N/A
   linux-firmware 1.173.1
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  Tags:  bionic
  Uname: Linux 4.15.0-38-generic x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: False
  dmi.bios.date: 01/25/2018
  dmi.bios.vendor: Intel Corporation
  dmi.bios.version: S1200SP.86B.03.01.1029.012520180838
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: S1200SP
  dmi.board.vendor: Intel Corporation
  dmi.board.version: H57532-271
  dmi.chassis.asset.tag: 
  dmi.chassis.type: 23
  dmi.chassis.vendor: ...
  dmi.chassis.version: ..
  dmi.modalias: 
dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...:ct23:cvr..:
  dmi.product.family: Family
  dmi.product.name: S1200SP

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-09 Thread Colin Ian King

It would be useful to know whether you have made any specific zram
config changes and, if so, what your current config is, just to help
with the debugging of this issue.
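
One possible way to gather the current zram settings for a reply is sketched below in Python. It reads the `disksize`, `comp_algorithm` and `max_comp_streams` attributes that the kernel's zram driver exposes under /sys/block/zramN; the helper name `read_zram_config` and the choice of attributes are assumptions for this example, not something requested in the bug.

```python
import glob
import os


def read_zram_config(sysfs_root="/sys/block"):
    """Return {device: {attribute: value}} for every zram device found."""
    config = {}
    for dev_path in sorted(glob.glob(os.path.join(sysfs_root, "zram*"))):
        attrs = {}
        for name in ("disksize", "comp_algorithm", "max_comp_streams"):
            try:
                with open(os.path.join(dev_path, name)) as f:
                    attrs[name] = f.read().strip()
            except OSError:
                attrs[name] = None  # attribute missing or unreadable
        config[os.path.basename(dev_path)] = attrs
    return config


if __name__ == "__main__":
    for dev, attrs in read_zram_config().items():
        print(dev, attrs)
```

The `sysfs_root` parameter exists only so the function can be pointed at a test directory; on a real system the default is what you would want.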

** Changed in: linux (Ubuntu)
   Status: Confirmed => Incomplete

** Changed in: zram-config (Ubuntu)
   Status: Confirmed => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed


[Kernel-packages] [Bug 1853044] Re: 5.3.0-23-generic causes fans to spin when idle

2020-01-09 Thread Colin Ian King

I'll get a kernel sorted out for testing by EOD.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1853044

Title:
  5.3.0-23-generic causes fans to spin when idle

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  After upgrading to 5.3.0-23-generic the fans in my machine don't stop
  running. They always sound like something is utilizing CPU - even with
  no applications running after boot.

  If I boot back to 5.3.0-19-generic it's fine.

  My microcode version is reported as 0xd4 and iucode-tool reports:

  iucode-tool: system has processor(s) with signature 0x000506e3

  Let me know if you need anything else.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: linux-image-5.3.0-23-generic 5.3.0-23.25
  ProcVersionSignature: Ubuntu 5.3.0-23.25-generic 5.3.7
  Uname: Linux 5.3.0-23-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  ApportVersion: 2.20.11-0ubuntu8.2
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC2:  dean   2898 F pulseaudio
   /dev/snd/pcmC2D0p:   dean   2898 F...m pulseaudio
   /dev/snd/controlC0:  dean   2898 F pulseaudio
   /dev/snd/controlC1:  dean   2898 F pulseaudio
  CurrentDesktop: ubuntu:GNOME
  Date: Mon Nov 18 13:03:34 2019
  HibernationDevice: RESUME=UUID=55a42c82-50bf-4e75-a133-dbd3aa93611b
  InstallationDate: Installed on 2018-07-24 (482 days ago)
  InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 
(20180724)
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 i915drmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.3.0-23-generic 
root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-5.3.0-23-generic N/A
   linux-backports-modules-5.3.0-23-generic  N/A
   linux-firmware                            1.183.2
  SourcePackage: linux
  UpgradeStatus: Upgraded to eoan on 2019-07-19 (121 days ago)
  dmi.bios.date: 05/16/2018
  dmi.bios.vendor: Intel Corp.
  dmi.bios.version: KYSKLi70.86A.0055.2018.0516.1629
  dmi.board.name: NUC6i7KYB
  dmi.board.vendor: Intel Corporation
  dmi.board.version: H90766-406
  dmi.chassis.type: 3
  dmi.chassis.vendor: Intel Corporation
  dmi.chassis.version: 1.0
  dmi.modalias: 
dmi:bvnIntelCorp.:bvrKYSKLi70.86A.0055.2018.0516.1629:bd05/16/2018:svn:pn:pvr:rvnIntelCorporation:rnNUC6i7KYB:rvrH90766-406:cvnIntelCorporation:ct3:cvr1.0:

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853044/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption

2020-01-09 Thread Colin Ian King

The next spin of the focal kernel will pick this up when it is built
with the new zfs-dkms driver.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857040

Title:
  zfs: upstream support for hardware-accelerated encryption

Status in linux package in Ubuntu:
  In Progress

Bug description:
  I understand that in Linux 5.0+, certain encryption-related symbols
  have been marked GPL-only, making them unavailable for use by zfs.  As
  a result, using encryption in zfs pools increases cpu load / decreases
  disk throughput.

  There are a pair of upstream pull requests that should improve the
  performance (with performance measurement done on x86-64).  Can these
  be pulled into the Ubuntu kernel?

  https://github.com/zfsonlinux/zfs/pull/9515
  https://github.com/zfsonlinux/zfs/pull/9296

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857040/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week

2020-01-07 Thread Colin Ian King

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Confirmed
Status in zram-config package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed


[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption

2020-01-07 Thread Colin Ian King

We should also apply:

commit 10fa254539ec41c6b043785d4e7ab34bce383b9f
Author: Brian Behlendorf
Date:   Thu Oct 24 10:17:33 2019 -0700

    Linux 4.14, 4.19, 5.0+ compat: SIMD save/restore

but this also requires a rather tricky backport of:

commit 006e9a40882468be68f276c946bae812b74ac35c
Author: Matthew Macy
Date:   Thu Sep 5 09:34:54 2019 -0700

    OpenZFS restructuring - move platform specific headers

and we are also dependent on a backport of:

commit 608f8749a1055e6769899788e11bd51fd396f9e5
Author: Brian Behlendorf
Date:   Tue Oct 1 12:50:34 2019 -0700

    Perform KABI checks in parallel
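
The dependency between these three commits can be written down as a small graph and sorted, which gives one valid order in which to attempt the backports. This is only a sketch using Python's standard `graphlib`; the variable names are made up for the example, and the relative order of the two prerequisite commits is not stated in this message, so the sorter is free to emit them either way round.

```python
from graphlib import TopologicalSorter

# Commit IDs as quoted in the message above.
SIMD = "10fa254539ec41c6b043785d4e7ab34bce383b9f"     # SIMD save/restore
HEADERS = "006e9a40882468be68f276c946bae812b74ac35c"  # move platform specific headers
KABI = "608f8749a1055e6769899788e11bd51fd396f9e5"     # KABI checks in parallel

# Each key maps to the set of commits that must be applied before it.
DEPS = {SIMD: {HEADERS, KABI}}


def backport_order(deps):
    """Return the commits in an order that respects the dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for commit in backport_order(DEPS):
        print(commit)
```

Whatever order the prerequisites come out in, the SIMD save/restore commit is always last.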

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857040

Title:
  zfs: upstream support for hardware-accelerated encryption

Status in linux package in Ubuntu:
  In Progress


[Kernel-packages] [Bug 1858650] Re: package zfsutils-linux 0.8.1-1ubuntu14.2 failed to install/upgrade: installed zfsutils-linux package post-installation script subprocess returned error exit status

2020-01-07 Thread Colin Ian King

ZFS kernel modules are not supported on small-memory ARM platforms such
as the Raspberry Pi, as ZFS requires at least 4GB of memory to perform
without causing memory pressure issues.
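
For illustration only, a check against the 4GB figure mentioned above could look like the Python sketch below, which parses the `MemTotal` line of /proc/meminfo. The function names and the idea of checking this in a script are assumptions for the example; nothing here is taken from the zfsutils-linux packaging.

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the value is reported in kB (KiB)
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")


def has_enough_memory_for_zfs(meminfo_text, min_gib=4.0):
    """Return True if MemTotal is at least min_gib (4 by default)."""
    return mem_total_gib(meminfo_text) >= min_gib


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(has_enough_memory_for_zfs(f.read()))
```

Note that a machine sold as "4GB" reports somewhat less than 4 GiB of MemTotal, since some memory is reserved, so a practical check would probably need some slack in the threshold.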

** Changed in: zfs-linux (Ubuntu)
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1858650

Title:
  package zfsutils-linux 0.8.1-1ubuntu14.2 failed to install/upgrade:
  installed zfsutils-linux package post-installation script subprocess
  returned error exit status 1

Status in zfs-linux package in Ubuntu:
  Won't Fix

Bug description:
  Can't install zfsutils-linux on either headless or desktop.

  Jan 07 15:06:06 ubuntu zfs[3775]: The ZFS modules are not loaded.
  Jan 07 15:06:06 ubuntu zfs[3775]: Try running '/sbin/modprobe zfs' as root to load them.
  Jan 07 15:06:06 ubuntu systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
  Jan 07 15:06:06 ubuntu systemd[1]: zfs-mount.service: Failed with result 'exit-code'.
  Jan 07 15:06:06 ubuntu systemd[1]: Failed to start Mount ZFS filesystems.

  ProblemType: Package
  DistroRelease: Ubuntu 19.10
  Package: zfsutils-linux 0.8.1-1ubuntu14.2
  ProcVersionSignature: Ubuntu 5.3.0-1015.17-raspi2 5.3.13
  Uname: Linux 5.3.0-1015-raspi2 aarch64
  ApportVersion: 2.20.11-0ubuntu8.2
  Architecture: arm64
  Date: Tue Jan  7 15:06:06 2020
  ErrorMessage: installed zfsutils-linux package post-installation script 
subprocess returned error exit status 1
  Python3Details: /usr/bin/python3.7, Python 3.7.5, python3-minimal, 3.7.5-1
  PythonDetails: N/A
  RelatedPackageVersions:
   dpkg 1.19.7ubuntu2
   apt  1.9.4
  SourcePackage: zfs-linux
  Title: package zfsutils-linux 0.8.1-1ubuntu14.2 failed to install/upgrade: 
installed zfsutils-linux package post-installation script subprocess returned 
error exit status 1
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1858650/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption

2020-01-06 Thread Colin Ian King

** Changed in: linux (Ubuntu)
   Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857040

Title:
  zfs: upstream support for hardware-accelerated encryption

Status in linux package in Ubuntu:
  In Progress


[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption

2019-12-22 Thread Colin Ian King

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857040

Title:
  zfs: upstream support for hardware-accelerated encryption

Status in linux package in Ubuntu:
  Confirmed


[Kernel-packages] [Bug 1856900] Re: stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12

2019-12-18 Thread Colin Ian King

I believe this is because a FUSE-based file system is being used in the
prior ADT testing and sysinfo is breaking on the FUSE filesystem, so it
may be a problem with the fuse fs itself or with the fuse file system
that is using the kernel fuse core.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856900

Title:
  stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  During autopkgtest testing the sysinfo stressor failed, causing the
  kernel to oops.

  16:20:34 DEBUG| [stdout] sysinfo STARTING
  16:20:39 DEBUG| [stdout] sysinfo RETURNED 0
  16:20:39 DEBUG| [stdout] sysinfo FAILED (kernel oopsed)
  16:20:39 DEBUG| [stdout] [ 6521.203448] kernel tried to execute 
exec-protected page (c000c25ffce0) - exploit attempt? (uid: 0)
  16:20:39 DEBUG| [stdout] [ 6521.207260] BUG: Unable to handle kernel 
instruction fetch
  16:20:39 DEBUG| [stdout] [ 6521.207307] Faulting instruction address: 
0xc000c25ffce0
  16:20:39 DEBUG| [stdout] [ 6521.207367] Oops: Kernel access of bad area, sig: 
11 [#1]
  16:20:39 DEBUG| [stdout] [ 6521.207416] LE PAGE_SIZE=64K MMU=Hash SMP 
NR_CPUS=2048 NUMA pSeries
  16:20:39 DEBUG| [stdout] [ 6521.207481] Modules linked in: unix_diag sctp 
vhost_vsock vmw_vsock_virtio_transport_common vsock zfs(PO) zunicode(PO) 
zavl(PO) icp(PO) zlua(PO) userio zcommon(PO) znvpair(PO) cuse spl(O) kvm_pr kvm 
snd_seq snd_seq_device snd_timer snd soundcore hci_vhci bluetooth ecdh_generic 
ecc uhid hid vhost_net vhost tap atm algif_rng aegis128 algif_aead anubis 
fcrypt khazad seed sm4_generic tea crc32_generic md4 michael_mic nhpoly1305 
poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic 
streebog_generic tgr192 wp512 xxhash_generic blowfish_generic blowfish_common 
cast5_generic des_generic libdes salsa20_generic chacha_generic 
camellia_generic cast6_generic cast_common serpent_generic twofish_generic 
twofish_common algif_skcipher aufs sch_etf sch_fq dccp_ipv6 dccp_ipv4 dccp 
ip6table_nat ip6_tables iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 algif_hash af_alg ip_vti ip6_vti fou6 sit ipip tunnel4 fou 
geneve act_mirred cls_basic esp6 authenc echainiv
  16:20:39 DEBUG| [stdout] [ 6521.208045]  iptable_filter xt_policy veth 
esp4_offload esp4 xfrm_user xfrm_algo macsec vxlan ip6_udp_tunnel udp_tunnel 
vrf 8021q garp mrp bridge stp llc ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel 
gre cls_u32 sch_htb dummy tls binfmt_misc af_packet_diag tcp_diag udp_diag 
raw_diag inet_diag iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel 
ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c 
crc32c_vpmsum virtio_blk virtio_net net_failover failover [last unloaded: 
trace_printk]
  16:20:39 DEBUG| [stdout] [ 6521.209360] CPU: 1 PID: 2647099 Comm: fuse_mnt 
Tainted: P   OE 5.4.0-9-generic #12-Ubuntu
  16:20:39 DEBUG| [stdout] [ 6521.209457] NIP:  c000c25ffce0 LR: 
c063f058 CTR: c000c25ffce0
  16:20:39 DEBUG| [stdout] [ 6521.209528] REGS: c00109703810 TRAP: 0400   
Tainted: P   OE  (5.4.0-9-generic)
  16:20:39 DEBUG| [stdout] [ 6521.209608] MSR:  800010009033 
  CR: 88002440  XER: 2000
  16:20:39 DEBUG| [stdout] [ 6521.209681] CFAR: c063f054 IRQMASK: 0 
  16:20:39 DEBUG| [stdout]GPR00: c063f034 
c00109703aa0 c1a4bb00 c0007cef3000 
  16:20:39 DEBUG| [stdout]GPR04: c000c25ffc18 
   
  16:20:39 DEBUG| [stdout]GPR08:  
   
  16:20:39 DEBUG| [stdout]GPR12: c000c25ffce0 
c0003fffee00 79b6987b4410  
  16:20:39 DEBUG| [stdout]GPR16: 79b698b3 
79b6987b0320 79b69771f240 79b6987b4420 
  16:20:39 DEBUG| [stdout]GPR20:  
 79b6880010a0 79b698a4d3a0 
  16:20:39 DEBUG| [stdout]GPR24: c00109d56cc0 
c001fde0cd8c c000c25ffce0 c00109d56ca0 
  16:20:39 DEBUG| [stdout]GPR28: c00109d56cc0 
 c0007cef3000 c00109d56c90 
  16:20:39 DEBUG| [stdout] [ 6521.210276] NIP [c000c25ffce0] 
0xc000c25ffce0
  16:20:39 DEBUG| [stdout] [ 6521.210355] LR [c063f058] 
fuse_request_end+0x128/0x2f0
  16:20:39 DEBUG| [stdout] [ 6521.210423] Call Trace:
  16:20:39 DEBUG| [stdout] [ 6521.210448] [c00109703aa0] [c063f034] 
fuse_request_end+0x104/0x2f0 (unreliable)
  16:20:39 DEBUG| [stdout] [ 6521.210520] [c00109703af0] [c0642ebc] 
fuse_dev_do_write+0x2cc/0x5c0
  16:20:39 DEBUG| [stdout] [ 6521.210591] [c00109703b70] [c0643654]

[Kernel-packages] [Bug 1856900] Re: stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12

2019-12-18 Thread Colin Ian King

I've seen something very similar to this on this platform and I believe
it's a combination of previous regression tests and the stress-ng
sysinfo test that triggers this.  Running the stress-ng stressor after a
clean boot won't trigger this issue.

** Changed in: linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856900

Title:
  stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12

Status in linux package in Ubuntu:
  Incomplete


[Kernel-packages] [Bug 1856704] [NEW] backport 5.3 zfs support to bionic for HWE kernel support

2019-12-17 Thread Colin Ian King

Public bug reported:

5.3 kernel functionality back through to 4.15 is required for 5.3 HWE
kernel support in ZFS and SPL modules.

** Affects: spl-linux (Ubuntu)
 Importance: High
 Assignee: Colin Ian King (colin-king)
 Status: In Progress

** Affects: zfs-linux (Ubuntu)
 Importance: High
 Assignee: Colin Ian King (colin-king)
 Status: In Progress

** Also affects: spl-linux (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: spl-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: spl-linux (Ubuntu)
   Status: New => Incomplete

** Changed in: spl-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: spl-linux (Ubuntu)
   Status: Incomplete => In Progress

** Changed in: zfs-linux (Ubuntu)
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856704

Title:
  backport 5.3 zfs support to bionic for HWE kernel support

Status in spl-linux package in Ubuntu:
  In Progress
Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  5.3 kernel functionality back through to 4.15 is required for 5.3 HWE
  kernel support in ZFS and SPL modules.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/spl-linux/+bug/1856704/+subscriptions

[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads

2019-12-16 Thread Colin Ian King

I've tested zfs from the -proposed pockets with the ubuntu ZFS autotest
regression tests:

ubuntu_zfs_fstest
ubuntu_zfs_smoke_test
ubuntu_zfs_stress
ubuntu_zfs_xfs_generic

All the following passed the regression testing.

bionic: 0.7.5-1ubuntu16.7 
disco:  0.7.12-1ubuntu5.1 
eoan:   0.8.1-1ubuntu14.3

I was unable to trip any lockups, so as far as I'm concerned I'm happy
for these updates to be released.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856084

Title:
  Livelock between ZFS evict and writeback threads

Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Disco:
  Fix Committed
Status in zfs-linux source package in Eoan:
  Fix Committed
Status in zfs-linux source package in Focal:
  Fix Released
Status in zfs-linux package in Debian:
  Unknown

Bug description:
  Livelock between ZFS evict and writeback threads

  [Impact]
  ZIO pipeline stalls, causing ZFS workloads to hang indefinitely

  [Description]
  For certain ZFS workloads, we start seeing hung task timeouts in the kernel
  logs due to zil_commit() stalling. This is due to zfs_zget() not detecting
  whether a znode has been marked for deletion before attempting to access it,
  causing a constant "retry loop" in zfs_get_data() if that znode has been
  unlinked already. An example of the stack traces follows:

  [72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [72742.070429] mysqld  D0  5713   2881 0x0320
  [72742.073220] Call Trace:
  [72742.075305]  __schedule+0x24e/0x880
  [72742.090436]  schedule+0x2c/0x80
  [72742.090438]  schedule_preempt_disabled+0xe/0x10
  [72742.090441]  __mutex_lock.isra.5+0x276/0x4e0
  [72742.090547]  ? dmu_tx_destroy+0x105/0x130 [zfs]
  [72742.090555]  __mutex_lock_slowpath+0x13/0x20
  [72742.115374]  ? __mutex_lock_slowpath+0x13/0x20
  [72742.132266]  mutex_lock+0x2f/0x40
  [72742.134207]  zil_commit_impl+0x1b0/0x1b30 [zfs]
  [72742.150428]  ? spl_kmem_alloc+0x115/0x180 [spl]
  [72742.152622]  ? mutex_lock+0x12/0x40
  [72742.154819]  ? zfs_refcount_add_many+0x9a/0x100 [zfs]
  [72742.171450]  zil_commit+0xde/0x150 [zfs]
  [72742.173687]  zfs_fsync+0x77/0xe0 [zfs]
  [72742.175044]  zpl_fsync+0x80/0x110 [zfs]
  [72742.191690]  vfs_fsync_range+0x51/0xb0
  [72742.193876]  do_fsync+0x3d/0x70
  [72742.195126]  SyS_fsync+0x10/0x20
  [72742.211059]  do_syscall_64+0x73/0x130
  [72742.214078]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

  It's possible to hit this issue due to a race between the ZFS evict
  and writeback threads. If the z_iput task is trying to evict a znode
  that's currently sitting in the writeback thread, both will "livelock"
  each other and stall the ZIO pipeline, causing other ZFS operations
  (such as zil_commit) to hang indefinitely.
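
  When triaging a hang like the one in the trace above, it can help to
  reduce a call trace to the frames owned by a module such as zfs or spl.
  The following is a minimal sketch, not part of the original report; the
  function name module_frames is hypothetical, and it assumes frames are
  formatted as "function+0xOFF/0xLEN [module]" as in the trace above.

```shell
#!/bin/bash
# Print the module-owned frames of a kernel call trace read from stdin,
# one "function [module]" pair per line. Frames marked with "?" (unreliable)
# and frames without a "[module]" suffix are skipped.
# NOTE: module_frames is an illustrative helper, not an existing tool.
module_frames() {
    sed -n -E 's#^.*\] +([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+/0x[0-9a-f]+ \[([a-z0-9_]+)\] *$#\1 [\2]#p'
}
```

  For example, piping a saved kernel log through "module_frames" and then
  "grep zfs" leaves only the ZFS frames, which makes traces from different
  hung tasks easier to compare.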

  This has been documented and fixed upstream in PR#9583 [0]. We need to
  pull two fixes from upstream: the first one fixes the zfs_zget() issue
  in the writeback thread, while the second fixes a regression on
  O_TMPFILE descriptors caused by the first one.

  Upstream patches:
   - Break out of zfs_zget early if unlinked znode (41e1aa2a06f8)
   - Check for unlinked znodes after igrab() (0c46813805f4)

  [Test Case]
  Being a race condition, this issue has been hard to reproduce consistently.
  The race window between evict() and the ZFS writeback thread is quite narrow,
  but users have reported it showing up after some hours of running
  LXD-containerized MySQL workloads.

  [Regression Potential]
  These patches have been tested both in the ZFS test suite and in production
  environments, so the potential for further regressions should be low.
  Additional regressions would likely cause issues with the ZFS
  writeback/commit and IO pipeline, so they should be spotted fairly quickly.

  [0] https://github.com/zfsonlinux/zfs/pull/9583
  [1] https://github.com/zfsonlinux/zfs/commit/41e1aa2a06f8
  [2] https://github.com/zfsonlinux/zfs/commit/0c46813805f4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions

[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads

2019-12-16 Thread Colin Ian King

*I was unable to trip any lockups

** Tags added: verification-done-bionic verification-done-disco
verification-done-eoan

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856084

Title:
  Livelock between ZFS evict and writeback threads

Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Disco:
  Fix Committed
Status in zfs-linux source package in Eoan:
  Fix Committed
Status in zfs-linux source package in Focal:
  Fix Released
Status in zfs-linux package in Debian:
  Unknown

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions

[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads

2019-12-13 Thread Colin Ian King

I've checked that the zfs kernel driver builds and it passes the ZFS
regression tests. Patches look good, so I've uploaded these packages.

** Changed in: zfs-linux (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu Eoan)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856084

Title:
  Livelock between ZFS evict and writeback threads

Status in zfs-linux package in Ubuntu:
  Confirmed
Status in zfs-linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Disco:
  Confirmed
Status in zfs-linux source package in Eoan:
  Confirmed
Status in zfs-linux source package in Focal:
  Confirmed
Status in zfs-linux package in Debian:
  Unknown

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions

[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads

2019-12-11 Thread Colin Ian King

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856084

Title:
  Livelock between ZFS evict and writeback threads

Status in zfs-linux package in Ubuntu:
  Confirmed
Status in zfs-linux package in Debian:
  Unknown

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions

[Kernel-packages] [Bug 1824407] Re: remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files

2019-12-08 Thread Colin Ian King

Tested with 5.3.0-25-generic #27-Ubuntu with the regression test and it
now works fine. Marking bug as verification-done for eoan

** Tags removed: verification-needed-eoan
** Tags added: verification-done-eoan

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824407

Title:
  remount of multilower moved pivoted-root overlayfs root, results in
  I/O errors on some modified files

Status in linux package in Ubuntu:
  In Progress
Status in linux-hwe package in Ubuntu:
  Invalid
Status in linux-hwe source package in Bionic:
  In Progress
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in Focal:
  In Progress

Bug description:
  == SRU Justification Disco, Eoan, Focal ==

  Multiple squashfs filesystems with overlayfs cause file corruption issues
  when modifying zero sized files

  == Fix ==

  The current fix is pending in
  
https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a

  == Test case ==

  With an Ubuntu ISO on the cdrom drive, use:

  #!/bin/bash -x
  mkdir -p /cdrom
  mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom
  sleep 1
  mkdir -p /cow
  mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow
  sleep 1
  mkdir -p /cow/upper
  mkdir -p /cow/work
  modprobe -q -b overlay
  sleep 1
  modprobe -q -b loop
  sleep 1
  dev=$(losetup -f)
  mkdir -p /filesystem.squashfs
  losetup $dev /cdrom/casper/filesystem.squashfs
  mount -t squashfs -o ro,noatime $dev /filesystem.squashfs
  sleep 1

  dev=$(losetup -f)
  mkdir -p /installer.squashfs
  losetup $dev /cdrom/casper/installer.squashfs
  mount -t squashfs -o ro,noatime $dev /installer.squashfs
  sleep 1

  mkdir -p /root-tmp
  mount -t overlay \
    -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' \
    /cow /root-tmp

  FILE=/root-tmp/etc/.pwd.lock

  echo foo > $FILE
  cat $FILE
  sync
  #
  # dropping caches or remounting causes the bug
  #
  echo 3 > /proc/sys/vm/drop_caches
  cat $FILE

  Without the fix, the final cat of the file produces an error. With the
  fix, the cat works correctly.
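
  The write, flush, read-back step at the end of the test case can be
  wrapped in a small pass/fail helper. This is a sketch under assumptions:
  the function name check_readback and its calling convention are
  hypothetical, and the flush action (cache drop or remount) is supplied
  by the caller rather than hard-coded.

```shell
#!/bin/bash
# Write a marker to FILE, run the caller-supplied flush command, then
# confirm the file still reads back with the same content.
# Returns 0 on success, 1 if the read-back fails or differs, 2 on setup errors.
# NOTE: check_readback is an illustrative helper, not part of the test suite.
check_readback() {
    local file="$1"; shift
    local marker="foo"
    local got
    echo "$marker" > "$file" || return 2
    sync
    "$@" || return 2                 # e.g. a cache drop or a remount
    got=$(cat "$file") || return 1   # a read error here is the reported symptom
    [ "$got" = "$marker" ]
}
```

  On an affected setup this might be called as
  check_readback /root-tmp/etc/.pwd.lock sh -c 'echo 3 > /proc/sys/vm/drop_caches'
  (as root), with a non-zero exit status indicating the bug.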

  == Regression Potential ==

  There is an unhandled corner case:
  - two filesystems, A and B, both have null uuid
  - upper layer is on A
  - lower layer 1 is also on A
  - lower layer 2 is on B

  However, this corner case is also broken without the fix, and it will be
  addressed by subsequent fixes once they are accepted upstream. I think the
  risk is minimal, considering nobody is complaining about these corner
  cases with the current broken overlayfs squashfs layering.

  ---

  1) Download focal subiquity pending image, or eoan release image
  2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI)
  3) After --- insert the following options

     break=top debug init=/bin/bash

  4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
  5) in the initramfs execute:

  rm /scripts/casper-bottom/25adduser
  exit

  6) you will be dropped into the pivoted root filesystem, before systemd is
  execed as pid one
  7) /run/initramfs/ will contain a debug log, showing how everything was
  mounted, i.e. cdrom mounted, squashfs losetup from there, then multilower
  overlay set up from them, moved to /root, and then pivot-root to /root done
  to finally end up as /. Underlying layers are moved into /cow for your
  convenience.

  8) At this point, modifying zero-byte files that exist in the lowest
  layer but not the middle one, in certain ways, will result in them
  being corrupted after / is remounted.

  9) Corruption examples

  (On both focal & eoan)

  cat /etc/.pwd.lock
  systemd-sysusers
  cat /etc/.pwd.lock
  mount -o remount /
  cat /etc/.pwd.lock
  overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)
  cat: /etc/.pwd.lock: Input/output error

  (Only on eoan)

  cat /etc/machine-id
  systemd-machine-id-setup
  cat /etc/machine-id
  mount -o remount /
  cat /etc/machine-id
  overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000)
  cat: /etc/machine-id: Input/output error
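
  The "invalid origin" messages above can be decoded mechanically. The
  sketch below assumes the ftype values are the hexadecimal file-type bits
  (S_IFMT) of the inode mode, so 8000 is a regular file and 4000 is a
  directory; that reading is an assumption, not something stated in this
  report. The function names are hypothetical.

```shell
#!/bin/bash
# Map a hexadecimal S_IFMT value (as printed in the overlayfs message)
# to a human-readable file type. Assumes standard Linux S_IF* constants.
ftype_name() {
    case "$1" in
        1000) echo "fifo" ;;
        2000) echo "character device" ;;
        4000) echo "directory" ;;
        6000) echo "block device" ;;
        8000) echo "regular file" ;;
        a000|A000) echo "symlink" ;;
        c000|C000) echo "socket" ;;
        *) echo "unknown" ;;
    esac
}

# Read kernel log text on stdin and print one summary line per
# "overlayfs: invalid origin" message.
invalid_origins() {
    sed -n -E 's#^.*overlayfs: invalid origin \(([^,]+), ftype=([0-9a-fA-F]+), origin ftype=([0-9a-fA-F]+)\).*$#\2 \3 \1#p' |
    while read -r ftype origin path; do
        printf '%s: %s, origin is %s\n' "$path" \
            "$(ftype_name "$ftype")" "$(ftype_name "$origin")"
    done
}
```

  Under that assumption, both examples above show a regular file in the
  upper layer whose recorded origin resolves to a directory.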

  Lots of things break once machine-id and .pwd.lock are corrupted. I.e.
  unable to dhcp, connect to dbus, add/remove/change users or groups,
  etc.

  We were unable to recreate the issue outside of booting things with
  casper, i.e. statically on a regular host machine without pivot-root.
  But hopefully booting to a quiet state with nothing running is
  sufficient to reproduce this.

  Instead of booting with `bebroken init=/bin/bash` you can boot with
  `bebroken systemd.mask=systemd-remount-fs.service` this will complete
  the boot, with /etc/machine-id & .pwd.lock modified, meaning that
  remount of / will cause IO errors on those files.

  Currently, we are shipping two hacks in casper's 25adduser script to
  "rm" the offending files, and

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-06 Thread Colin Ian King

** Changed in: linux (Ubuntu)
   Importance: High => Low

** Changed in: linux (Ubuntu)
   Status: Incomplete => Triaged

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

Status in linux package in Ubuntu:
  Triaged

Bug description:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT
  regression testing:

  
  https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal-canonical-kernel-team-unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz

  14:44:30 DEBUG| [stdout] sctp STARTING
  14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 
256/256)
  14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked 
for more than 122 seconds.
  14:47:44 DEBUG| [stdout] [ 3684.814345]   Tainted: P   OE 
5.4.0-7-generic #8-Ubuntu
  14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
  14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 
2063618 0x0800
  14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace:
  14:47:44 DEBUG| [stdout] [ 3684.814360] ([] 
__schedule+0x304/0x7b0)
  14:47:44 DEBUG| [stdout] [ 3684.814362]  [] 
schedule+0x4a/0xe0 
  14:47:44 DEBUG| [stdout] [ 3684.814366]  [] 
rwsem_down_write_slowpath+0x22c/0x530 
  14:47:44 DEBUG| [stdout] [ 3684.814370]  [] 
register_pernet_subsys+0x2c/0x60 
  14:47:44 DEBUG| [stdout] [ 3684.814411]  [<03ff80766638>] 
sctp_init+0x2f0/0x520 [sctp] 
  14:47:44 DEBUG| [stdout] [ 3684.814414]  [] 
do_one_initcall+0x40/0x200 
  14:47:44 DEBUG| [stdout] [ 3684.814416]  [] 
do_init_module+0x70/0x270 
  14:47:44 DEBUG| [stdout] [ 3684.814418]  [] 
load_module+0x1142/0x1440 
  14:47:44 DEBUG| [stdout] [ 3684.814419]  [] 
__do_sys_finit_module+0xa4/0xf0 
  14:47:44 DEBUG| [stdout] [ 3684.814421]  [] 
system_call+0x2aa/0x2c8 
  14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:37 DEBUG| [stdout] [ 3738.474354] unregister_netdevice: waiting
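
  For long logs like the one above, the hung-task reports can be matched
  against a function of interest. The following is a minimal sketch, not
  part of the original report: blocked_tasks_in is a hypothetical helper,
  and it assumes each report starts with an "INFO: task NAME:PID blocked"
  line followed by frames of the form "function+0xOFF/0xLEN".

```shell
#!/bin/bash
# Read kernel log text on stdin and print NAME:PID for every hung-task
# report whose call trace contains the function given as $1.
# NOTE: blocked_tasks_in is an illustrative helper, not an existing tool.
blocked_tasks_in() {
    awk -v want="$1" '
        # A new hung-task report: remember which task it is about.
        match($0, /INFO: task [^ ]+ blocked/) {
            # Skip the 11-char "INFO: task " prefix and the 8-char " blocked" suffix.
            task = substr($0, RSTART + 11, RLENGTH - 19)
            seen = 0
            next
        }
        # First frame in this report that names the wanted function.
        task != "" && !seen && index($0, want "+0x") {
            print task
            seen = 1
        }
    '
}
```

  Run against the log above with register_pernet_subsys as the argument,
  this would report the modprobe task that is stuck loading the sctp
  module while unregister_netdevice waits for lo to become free.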

[Kernel-packages] [Bug 1822133] Re: Azure Instance never recovered during series of instance reboots.

2019-12-06 Thread Colin Ian King

Indeed, the commit is in 4.15.0-1057 and has been released. Marking
this bug as Fix Released.

commit b502cfeffec81be8564189e5498fd3f252b27900
Author: Taehee Yoo 
Date:   Wed Sep 4 14:40:49 2019 -0300

ip: frags: fix crash in ip_do_fragment()

BugLink: https://bugs.launchpad.net/bugs/1842447

commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream

A kernel crash occurrs when defragmented packet is fragmented
in ip_do_fragment().
In defragment routine, skb_orphan() is called and
skb->ip_defrag_offset is set. but skb->sk and
skb->ip_defrag_offset are same union member. so that
frag->sk is not NULL.
Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
defragmented packet is fragmented.


** Changed in: linux-azure (Ubuntu)
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1822133

Title:
  Azure Instance never recovered during series of instance reboots.

Status in linux-azure package in Ubuntu:
  Fix Released

Bug description:
  Description: During SRU Testing of various Azure Instances, there will
  be some cases where the instance will not respond following a system
  reboot.  SRU Testing only restarts a given instance once, after it
  preps all of the necessary files to-be-tested.

  Series: Disco
  Instance Size: Basic_A3
  Region: (Default) US-WEST-2
  Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  I initiated a series of tests which rebooted Azure Cloud instances 50
  times. During the 49th reboot, an instance failed to return from a
  reboot. Upon grabbing the console output, the following was seen
  scrolling endlessly. I have seen this failure in cases where the
  instance only restarted a handful of times >5

  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus

  In another test attempt I saw the following failure:

  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes

  
  Both of these failures broke networking. Each was seen at least two or
  three times, which may explain why in some cases we never recover
  from an instance reboot.
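
The two failure signatures quoted above can be told apart in a saved
console log with a small helper. This is only a sketch: the function name
classify_console_log and the idea of saving the console output to a file
are assumptions, not part of the original SRU test tooling.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original report): count how often
# each of the two failure signatures appears in a saved console log.
classify_console_log() {
    local log="$1"
    local vmbus routes
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps that from aborting callers that run with "set -e".
    vmbus=$(grep -c 'hyperv_fb: unable to send packet via vmbus' "$log" || true)
    routes=$(grep -c '/proc/net/route contains no routes' "$log" || true)
    echo "vmbus_errors=${vmbus} no_route_errors=${routes}"
}
```

Calling classify_console_log on the captured console output prints both
counts on one line, which makes it easy to tally which signature a failed
reboot hit.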

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1822133/+subscriptions


[Kernel-packages] [Bug 1854959] Re: stress-ng sysinfo stressor trips kernel oops on ppc64el with 5.4.0.7-8

2019-12-06 Thread Colin Ian King

*** This bug is a duplicate of bug 1854968 ***
https://bugs.launchpad.net/bugs/1854968

Same root corruption issue as bug 1854968

** This bug has been marked a duplicate of bug 1854968
   stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854959

Title:
  stress-ng sysinfo stressor trips kernel oops on ppc64el with 5.4.0.7-8

Status in linux package in Ubuntu:
  In Progress

Bug description:
  stress-ng on ppc64el with 5.4.0.7-8, sysinfo stressor seems to tickle
  a bug:

  06:26:02 DEBUG| [stdout] sysinfo FAILED (kernel oopsed)
  06:26:02 DEBUG| [stdout] [ 7262.965483] kernel tried to execute 
exec-protected page (c00017407ce0) - exploit attempt? (uid: 0)
  06:26:02 DEBUG| [stdout] [ 7262.968030] BUG: Unable to handle kernel 
instruction fetch
  06:26:02 DEBUG| [stdout] [ 7262.968121] Faulting instruction address: 
0xc00017407ce0
  06:26:02 DEBUG| [stdout] [ 7262.968224] Oops: Kernel access of bad area, sig: 
11 [#1]
  06:26:02 DEBUG| [stdout] [ 7262.968292] LE PAGE_SIZE=64K MMU=Hash SMP 
NR_CPUS=2048 NUMA pSeries
  06:26:02 DEBUG| [stdout] [ 7262.968403] Modules linked in: unix_diag sctp 
zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) zcommon(PO) znvpair(PO) spl(O) 
snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock 
vmw_vsock_virtio_transport_common vsock kvm_pr kvm hci_vhci bluetooth 
ecdh_generic ecc userio uhid hid vhost_net vhost tap cuse dccp_ipv4 dccp psnap 
llc algif_rng aegis128 algif_aead anubis fcrypt khazad seed sm4_generic tea 
crc32_generic md4 michael_mic nhpoly1305 poly1305_generic rmd128 rmd160 rmd256 
rmd320 sha3_generic sm3_generic streebog_generic tgr192 wp512 xxhash_generic 
algif_hash blowfish_generic blowfish_common cast5_generic des_generic libdes 
salsa20_generic chacha_generic camellia_generic cast6_generic cast_common 
serpent_generic twofish_generic twofish_common algif_skcipher af_alg aufs 
binfmt_misc af_packet_diag tcp_diag udp_diag raw_diag inet_diag iptable_mangle 
xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables
  06:26:02 DEBUG| [stdout] [ 7262.969078]  x_tables autofs4 btrfs xor 
zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_net virtio_blk 
net_failover failover [last unloaded: trace_printk]
  06:26:02 DEBUG| [stdout] [ 7262.970416] CPU: 1 PID: 2613531 Comm: fuse_mnt 
Tainted: P   OE 5.4.0-7-generic #8-Ubuntu
  06:26:02 DEBUG| [stdout] [ 7262.970532] NIP:  c00017407ce0 LR: 
c063e968 CTR: c00017407ce0
  06:26:02 DEBUG| [stdout] [ 7262.970623] REGS: c001d8393810 TRAP: 0400   
Tainted: P   OE  (5.4.0-7-generic)
  06:26:02 DEBUG| [stdout] [ 7262.970737] MSR:  800010009033 
  CR: 88002440  XER: 2000
  06:26:02 DEBUG| [stdout] [ 7262.970850] CFAR: c063e964 IRQMASK: 0 
  06:26:02 DEBUG| [stdout]GPR00: c063e944 
c001d8393aa0 c1a5bf00 c0003d95ec00 
  06:26:02 DEBUG| [stdout]GPR04: c00017407c18 
   
  06:26:02 DEBUG| [stdout]GPR08:  
   
  06:26:02 DEBUG| [stdout]GPR12: c00017407ce0 
c0003fffee00 7c8ab4814410  
  06:26:02 DEBUG| [stdout]GPR16: 7c8ab4b9 
7c8ab4810320 7c8ab2f6f240 7c8ab4814420 
  06:26:02 DEBUG| [stdout]GPR20:  
 7c8aa8000b60 7c8ab4aad3a0 
  06:26:02 DEBUG| [stdout]GPR24: c001f38f7da0 
c001fbb81e4c c00017407ce0 c001f38f7d80 
  06:26:02 DEBUG| [stdout]GPR28: c001f38f7da0 
 c0003d95ec00 c001f38f7d70 
  06:26:02 DEBUG| [stdout] [ 7262.971713] NIP [c00017407ce0] 
0xc00017407ce0
  06:26:02 DEBUG| [stdout] [ 7262.971804] LR [c063e968] 
fuse_request_end+0x128/0x2f0
  06:26:02 DEBUG| [stdout] [ 7262.971893] Call Trace:
  06:26:02 DEBUG| [stdout] [ 7262.971930] [c001d8393aa0] [c063e944] 
fuse_request_end+0x104/0x2f0 (unreliable)
  06:26:02 DEBUG| [stdout] [ 7262.972035] [c001d8393af0] [c06427cc] 
fuse_dev_do_write+0x2cc/0x5c0
  06:26:02 DEBUG| [stdout] [ 7262.972138] [c001d8393b70] [c0642f64] 
fuse_dev_write+0x74/0xd0
  06:26:02 DEBUG| [stdout] [ 7262.972221] [c001d8393c00] [c04702b0] 
do_iter_readv_writev+0x240/0x290
  06:26:02 DEBUG| [stdout] [ 7262.972334] [c001d8393c70] [c0472bc8] 
do_iter_write+0xc8/0x280
  06:26:02 DEBUG| [stdout] [ 7262.972424] [c001d8393cc0] [c0472e90] 
vfs_writev+0xe0/0x180
  06:26:02 DEBUG| [stdout] [ 7262.972508] [c001d8393dc0] [c0472fcc] 
do_writev+0x9c/0x1a0
  06:26:02 DEBUG| [stdout] [ 7262.972588] [c001d8393e20]

[Kernel-packages] [Bug 1855151] Re: adt bpf tests crash 5.4.0-7 on ppc64el on power box

2019-12-06 Thread Colin Ian King

*** This bug is a duplicate of bug 1854968 ***
https://bugs.launchpad.net/bugs/1854968

Same root issue as bug 1854968

** This bug has been marked a duplicate of bug 1854968
   stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855151

Title:
  adt bpf tests crash 5.4.0-7 on ppc64el on power box

Status in linux package in Ubuntu:
  In Progress

Bug description:
  Running the ADT tests on a power box, the bpf tests crash the kernel
  as follows:

  [ 2745.079592] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  [ 2745.079808] Faulting instruction address: 0x
  [ 2745.079824] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 2745.079993] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  [ 2745.080011] Modules linked in: af_packet_diag tcp_diag udp_diag raw_diag 
inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev 
input_leds mac_hid ofpart 
  cmdlinepart powernv_flash mtd ibmpowernv at24 uio_pdrv_genirq uio 
ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd powernv_rng vmx_crypto 
sch_fq_codel ip_tables x_tables autofs4 bt
  rfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid ast drm_vram_he
  lper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm tg3 ahci libahci 
drm_panel_orientation_quirks [last unloaded: no
  tifier_error_inject]
  [ 2745.080195] CPU: 0 PID: 366 Comm: reuseport_bpf_c Not tainted 
5.4.0-7-generic #8
  [ 2745.080214] NIP:   LR: c0ce8710 CTR: 

  [ 2745.080233] REGS: c007ff6eb550 TRAP: 0400   Not tainted  
(5.4.0-7-generic)
  [ 2745.080250] MSR:  900040009033   CR: 24002282 
 XER: 2000
  [ 2745.080272] CFAR: c000de44 IRQMASK: 0 
  [ 2745.080272] GPR00: c0d67c9c c007ff6eb7e0 c1a5bf00 
c004258e10e0 
  [ 2745.080272] GPR04: c00802830038 c004258e10e0 0028 
e3c2 
  [ 2745.080272] GPR08:    
 
  [ 2745.080272] GPR12:  c1cf  
0001 
  [ 2745.080272] GPR16: 22b8 017f e3c2 
017f 
  [ 2745.080272] GPR20: c198c100   
22b8 
  [ 2745.080272] GPR24:  0028 0080 
017f 
  [ 2745.080272] GPR28: c0080283 18ed5e01 c004258e10e0 
c0075f0ff000 
  [ 2745.080409] NIP [] 0x0
  [ 2745.080423] LR [c0ce8710] reuseport_select_sock+0x100/0x400
  [ 2745.080439] Call Trace:
  [ 2745.080448] [c007ff6eb7e0] [c007ff6eb8a0] 0xc007ff6eb8a0 
(unreliable)
  [ 2745.080469] [c007ff6eb880] [c0d67c9c] 
inet_lhash2_lookup+0x1ec/0x220
  [ 2745.080490] [c007ff6eb900] [c0d6849c] 
__inet_lookup_listener+0x1ec/0x1f0
  [ 2745.080509] [c007ff6eb9d0] [c0d96608] tcp_v4_rcv+0x6e8/0xe70
  [ 2745.080527] [c007ff6ebb00] [c0d5a480] 
ip_protocol_deliver_rcu+0x60/0x2b0
  [ 2745.080547] [c007ff6ebb50] [c0d5a740] 
ip_local_deliver_finish+0x70/0x90
  [ 2745.080566] [c007ff6ebb70] [c0d5a7ec] 
ip_local_deliver+0x8c/0x140
  [ 2745.080585] [c007ff6ebbe0] [c0d59aec] ip_rcv_finish+0xbc/0xf0
  [ 2745.080602] [c007ff6ebc20] [c0d5a9a0] ip_rcv+0x100/0x110
  [ 2745.080619] [c007ff6ebca0] [c0cab220] 
__netif_receive_skb_one_core+0x70/0xb0
  [ 2745.080638] [c007ff6ebce0] [c0cac4f0] 
process_backlog+0xd0/0x230
  [ 2745.080657] [c007ff6ebd50] [c0cadc68] net_rx_action+0x1e8/0x520
  [ 2745.080674] [c007ff6ebe70] [c0ee2a7c] __do_softirq+0x15c/0x3b8
  [ 2745.080692] [c007ff6ebf90] [c0030678] call_do_softirq+0x14/0x24
  [ 2745.080709] [c0070656f7c0] [c001bf58] 
do_softirq_own_stack+0x38/0x50
  [ 2745.080729] [c0070656f7e0] [c0143d60] 
do_softirq.part.0+0x80/0xb0
  [ 2745.080914] [c0070656f810] [c0143e54] 
__local_bh_enable_ip+0xc4/0xf0
  [ 2745.080933] [c0070656f830] [c0d5f8fc] 
ip_finish_output2+0x1fc/0x740
  [ 2745.080953] [c0070656f8d0] [c0d61fe4] ip_output+0xd4/0x190
  [ 2745.080971] [c0070656f960] [c0d61444] ip_local_out+0x64/0x90
  [ 2745.080988] [c0070656f9a0] [c0d61838] 
__ip_queue_xmit+0x168/0x4d0
  [ 2745.081007] [c0070656fa30] [c0d90a3c] ip_queue_xmit+0x1c/0x30
  [ 2745.081024] [c0070656fa50] [c0d887e4] 
__tcp_transmit_skb+0x574/0xda0
  [ 2745.081044] [c0070656fb00] [c0d89a88] tcp_connect+0x4b8/0x600
  [ 2745.081060] [c0070656fbb0]

[Kernel-packages] [Bug 1824407] Re: remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files

2019-12-06 Thread Colin Ian King

Hrm, I can't see the fix in the Ubuntu-5.3.0-24.26 kernel, so I think
comment #34 is a premature SRU test request. As it stands, I tested
Ubuntu-5.3.0-24.26 and the issue still exists, and looking at the source
the fix isn't present so that correlates with my test observations.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824407

Title:
  remount of multilower moved pivoted-root overlayfs root, results in
  I/O errors on some modified files

Status in linux package in Ubuntu:
  In Progress
Status in linux-hwe package in Ubuntu:
  Invalid
Status in linux-hwe source package in Bionic:
  In Progress
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in Focal:
  In Progress

Bug description:
  == SRU Justification Disco, Eoan, Focal ==

  Multiple squashfs filesystems with overlayfs cause file corruption issues
  when modifying zero sized files

  == Fix ==

  The current fix is pending in
  
https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a

  == Test case ==

  With an Ubuntu ISO on the cdrom drive, use:

  #!/bin/bash -x
  mkdir -p /cdrom
  mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom
  sleep 1
  mkdir -p /cow
  mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow
  sleep 1
  mkdir -p /cow/upper
  mkdir -p /cow/work
  modprobe -q -b overlay
  sleep 1
  modprobe -q -b loop
  sleep 1
  dev=$(losetup -f)
  mkdir -p /filesystem.squashfs
  losetup $dev /cdrom/casper/filesystem.squashfs
  mount -t squashfs -o ro,noatime $dev /filesystem.squashfs
  sleep 1

  dev=$(losetup -f)
  mkdir -p /installer.squashfs
  losetup $dev /cdrom/casper/installer.squashfs
  mount -t squashfs -o ro,noatime $dev /installer.squashfs
  sleep 1

  mkdir -p /root-tmp
  mount -t overlay -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' /cow /root-tmp

  FILE=/root-tmp/etc/.pwd.lock

  echo foo > $FILE
  cat $FILE
  sync
  #
  # dropping caches or remounting causes the bug
  #
  echo 3 > /proc/sys/vm/drop_caches
  cat $FILE

  Without the fix the cat of the file will produce an error. With the
  fix the cat will work correctly.
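
The pass/fail decision in the test case (does cat of the file still work
and return what was written?) can be sketched as a helper. The name
verify_readback and its PASS/FAIL output are assumptions made for
illustration; the original test case simply runs cat by hand.

```shell
#!/bin/bash
# Hypothetical helper: read a file back and compare it with the text
# that was written to it. A read error (such as the I/O error seen
# without the fix) or a content mismatch is reported as FAIL.
verify_readback() {
    local file="$1" expected="$2" actual
    if ! actual=$(cat "$file" 2>/dev/null); then
        echo "FAIL: cannot read $file"
        return 1
    fi
    if [ "$actual" = "$expected" ]; then
        echo "PASS"
    else
        echo "FAIL: unexpected contents in $file"
        return 1
    fi
}
```

In the script above it would be called as verify_readback "$FILE" foo,
once before and once after dropping caches.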

  == Regression Potential ==

  There is an unhandled corner case:
  - two filesystems, A and B, both have null uuid
  - upper layer is on A
  - lower layer 1 is also on A
  - lower layer 2 is on B

  However, since this is an issue without the fix and will be addressed
  later with subsequent fixes once they are OK with upstream I think the
  risk is minimal considering nobody is complaining about these corner
  cases with the current broken overlayfs squashfs layering.

  ---

  1) Download focal subiquity pending image, or eoan release image
  2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI)
  3) After --- insert the following options

     break=top debug init=/bin/bash

  4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
  5) in the initramfs execute:

  rm /scripts/casper-bottom/25adduser
  exit

  6) you will be dropped into the pivoted root filesystem, before systemd
  is execed as pid one
  7) /run/initramfs/ will contain a debug log, showing how everything was
  mounted. I.e. cdrom mounted, squashfs losetup from there, then
  multilower overlay setup from them, moved to /root, and then pivot-root
  to /root done to finally end up as /. Underlying layers are moved into
  /cow for your convenience.

  8) At this point, modifying zero-byte length files that exist in the
  lowest layer, but not the middle one, in certain ways will result in
  them being corrupted after / is remounted.

  9) Corruption examples

  (On both focal & eoan)

  cat /etc/.pwd.lock
  systemd-sysusers
  cat /etc/.pwd.lock
  mount -o remount /
  cat /etc/.pwd.lock
  overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)
  cat: /etc/.pwd.lock: Input/output error

  (Only on eoan)

  cat /etc/machine-id
  systemd-machine-id-setup
  cat /etc/machine-id
  mount -o remount /
  cat /etc/machine-id
  overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000)
  cat: /etc/machine-id: Input/output error

  Lots of things break once machine-id and .pwd.lock are corrupted. I.e.
  unable to dhcp, connect to dbus, add/remove/change users or groups,
  etc.
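
The affected files can be pulled out of the kernel log by matching the
"overlayfs: invalid origin" message shown in the examples above. This is
a sketch only: the function name list_invalid_origins is an assumption,
and it relies on the message format exactly as quoted in this report.

```shell
#!/bin/bash
# Hypothetical helper: print the unique file paths named in
# "overlayfs: invalid origin (<path>, ftype=..., origin ftype=...)"
# messages. Reads the files given as arguments, or stdin if none.
list_invalid_origins() {
    sed -n 's/^.*overlayfs: invalid origin (\([^,]*\),.*$/\1/p' "$@" |
        LC_ALL=C sort -u
}
```

For example, dmesg | list_invalid_origins would list etc/.pwd.lock and
etc/machine-id on a system showing both corruption examples.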

  We were unable to recreate the issue outside of booting things with
  casper, i.e. statically on a regular host machine without pivot-root.
  But hopefully booting to a quiet state with nothing running is
  sufficient to reproduce this.

  Instead of booting with `bebroken init=/bin/bash` you can boot with
  `bebroken systemd.mask=systemd-remount-fs.service` this will complete
  the boot, with /etc/machine-id & .pwd.lock modified, meaning that
  remount of / will cause IO errors on those files.

  Currently, we are shipping two hacks

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-05 Thread Colin Ian King

See: https://lkml.org/lkml/2019/12/5/476

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT
  regression testing:

  
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac
  /autopkgtest-focal-canonical-kernel-team-
  unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz

  14:44:30 DEBUG| [stdout] sctp STARTING
  14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 
256/256)
  14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked 
for more than 122 seconds.
  14:47:44 DEBUG| [stdout] [ 3684.814345]   Tainted: P   OE 
5.4.0-7-generic #8-Ubuntu
  14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
  14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 
2063618 0x0800
  14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace:
  14:47:44 DEBUG| [stdout] [ 3684.814360] ([] 
__schedule+0x304/0x7b0)
  14:47:44 DEBUG| [stdout] [ 3684.814362]  [] 
schedule+0x4a/0xe0 
  14:47:44 DEBUG| [stdout] [ 3684.814366]  [] 
rwsem_down_write_slowpath+0x22c/0x530 
  14:47:44 DEBUG| [stdout] [ 3684.814370]  [] 
register_pernet_subsys+0x2c/0x60 
  14:47:44 DEBUG| [stdout] [ 3684.814411]  [<03ff80766638>] 
sctp_init+0x2f0/0x520 [sctp] 
  14:47:44 DEBUG| [stdout] [ 3684.814414]  [] 
do_one_initcall+0x40/0x200 
  14:47:44 DEBUG| [stdout] [ 3684.814416]  [] 
do_init_module+0x70/0x270 
  14:47:44 DEBUG| [stdout] [ 3684.814418]  [] 
load_module+0x1142/0x1440 
  14:47:44 DEBUG| [stdout] [ 3684.814419]  [] 
__do_sys_finit_module+0xa4/0xf0 
  14:47:44 DEBUG| [stdout] [ 3684.814421]  [] 
system_call+0x2aa/0x2c8 
  14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:37 DEBUG| [stdout] [ 3738.474354] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:48 DEBUG| [stdout] [ 3748.734396]

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-05 Thread Colin Ian King

This goes right back to 4.6.x:

4.6.7 crash (see below)
4.7.10 crash in xfrm6_dst_ifdown
4.8.17 crash in xfrm6_dst_ifdown
4.12.14 crash (see below)
4.13.16 reports "unregister_netdevice: waiting for eth0 to become free. Usage count = 2"
4.14.157 reports "unregister_netdevice: waiting for eth0 to become free. Usage count = 2"
4.15.18 .. 5.4 hangs on socket() call
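
When checking many kernels like this, the unregister_netdevice symptom can
be summarised from a saved kernel log. The sketch below is an assumption
for illustration (the function name summarize_netdev_waits is made up) and
expects each message on a single, unwrapped line as dmesg prints it.

```shell
#!/bin/bash
# Hypothetical helper: for each (device, usage count) pair seen in
# "unregister_netdevice: waiting for <dev> to become free. Usage count = <n>"
# messages, print how many times it was logged. Reads the files given
# as arguments, or stdin if none.
summarize_netdev_waits() {
    sed -n 's/^.*unregister_netdevice: waiting for \([^ ]*\) to become free\. Usage count = \([0-9]*\).*$/\1 \2/p' "$@" |
        LC_ALL=C sort | uniq -c |
        awk '{ print $2 ": usage_count=" $3 " seen=" $1 }'
}
```

A kernel that only hangs or crashes produces no output here, so an empty
result does not mean the kernel is unaffected.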

4.6.7:
[   34.457967] BUG: scheduling while atomic: kworker/u8:0/6/0x0200
[   34.458021] Modules linked in: esp6 xfrm6_mode_transport drbg ansi_cprng 
seqiv esp4 xfrm4_mode_transport xfrm_user xfrm_algo l2tp_ip6 l2tp_eth l2tp_ip 
l2tp_netlink veth l2tp_core ip6_udp_tunnel udp_tunnel squashfs binfmt_misc 
dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev kvm_intel kvm 
irqbypass joydev input_leds snd_hda_codec_generic serio_raw snd_hda_intel 
snd_hda_codec parport_pc 8250_fintek parport snd_hda_core qemu_fw_cfg snd_hwdep 
snd_pcm snd_timer mac_hid snd soundcore sch_fq_codel virtio_rng ip_tables 
x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 
multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel qxl ttm 
drm_kms_helper syscopyarea sysfillrect aesni_intel sysimgblt
[   34.458086]  fb_sys_fops aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd i2c_piix4 drm psmouse pata_acpi floppy
[   34.458100] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 
4.6.7-040607-generic #201608160432
[   34.458103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[   34.458131] Workqueue: netns cleanup_net
[   34.458135]  0286 2fa171e7 88007c8e7ab8 
813f7594
[   34.458139]  88007fc96b80 7fff 88007c8e7ac8 
810a8f6b
[   34.458143]  88007c8e7b18 8184905b 00ff88007c8e7ae8 
8106463e
[   34.458147] Call Trace:
[   34.458161]  [] dump_stack+0x63/0x8f
[   34.458166]  [] __schedule_bug+0x4b/0x60
[   34.458185]  [] __schedule+0x5eb/0x7a0
[   34.458191]  [] ? kvm_sched_clock_read+0x1e/0x30
[   34.458195]  [] schedule+0x35/0x80
[   34.458203]  [] schedule_timeout+0x1b2/0x270
[   34.458207]  [] ? __schedule+0x304/0x7a0
[   34.458212]  [] wait_for_completion+0xb3/0x140
[   34.458217]  [] ? wake_up_q+0x70/0x70
[   34.458226]  [] __wait_rcu_gp+0xc8/0xf0
[   34.458231]  [] synchronize_sched.part.58+0x38/0x50
[   34.458235]  [] ? call_rcu_bh+0x20/0x20
[   34.458239]  [] ? 
trace_raw_output_rcu_utilization+0x60/0x60
[   34.458244]  [] synchronize_sched+0x33/0x40
[   34.458251]  [] __l2tp_session_unhash+0xd1/0xe0 [l2tp_core]
[   34.458256]  [] l2tp_tunnel_closeall+0x9e/0x140 [l2tp_core]
[   34.458261]  [] l2tp_tunnel_delete+0x19/0x70 [l2tp_core]
[   34.458265]  [] l2tp_exit_net+0x4b/0x80 [l2tp_core]
[   34.458269]  [] ops_exit_list.isra.4+0x38/0x60
[   34.458273]  [] cleanup_net+0x1c4/0x2a0
[   34.458281]  [] process_one_work+0x1fc/0x490
[   34.458285]  [] worker_thread+0x4b/0x500
[   34.458290]  [] ? process_one_work+0x490/0x490
[   34.458293]  [] kthread+0xd8/0xf0
[   34.458298]  [] ret_from_fork+0x22/0x40
[   34.458302]  [] ? kthread_create_on_node+0x1b0/0x1b0
[   34.514067] [ cut here ]

4.12.14:
[   20.760253] [ cut here ]
[   20.760256] kernel BUG at /home/kernel/COD/linux/net/ipv6/xfrm6_policy.c:265!
[   20.760299] invalid opcode:  [#1] SMP
[   20.760320] Modules linked in: appletalk psnap llc esp6 xfrm6_mode_transport 
esp4 xfrm4_mode_transport xfrm_user xfrm_algo l2tp_ip6 l2tp_eth l2tp_ip 
l2tp_netlink veth l2tp_core ip6_udp_tunnel udp_tunnel binfmt_misc dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev ppdev snd_hda_codec_generic 
kvm_intel kvm irqbypass snd_hda_intel snd_hda_codec snd_hda_core input_leds 
snd_hwdep serio_raw snd_pcm snd_timer hid_generic snd soundcore parport_pc 
parport mac_hid qemu_fw_cfg sch_fq_codel virtio_rng ip_tables x_tables autofs4 
usbhid hid btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 
crypto_simd qxl glue_helper ttm cryptd drm_kms_helper psmouse
[   20.760677]  syscopyarea sysfillrect virtio_blk sysimgblt fb_sys_fops drm 
floppy virtio_net i2c_piix4 pata_acpi
[   20.760731] CPU: 3 PID: 49 Comm: kworker/u8:1 Not tainted 
4.12.14-041214-generic #201709200843
[   20.760772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-1 04/01/2014
[   20.760814] Workqueue: netns cleanup_net
[   20.760836] task: 8aa4bcbbad00 task.stack: 9dc5804c
[   20.760867] RIP: 0010:xfrm6_dst_ifdown+0xa0/0xb0
[   20.760890] RSP: 0018:9dc5804c3be0 EFLAGS: 00010246
[   20.760916] RAX: 8aa4b6e6a000 RBX: 8aa4bc1b3500 RCX: 
[   20.760950] RDX: 0001 RSI: 8aa4b6f39000 RDI: 8aa4bc1b3500
[   20.760984] RBP: 9dc5804c3c08 R08:

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-05 Thread Colin Ian King

Ah, fails on 5.2.0-15-generic, 5.3.0-18-generic too. Appears that the
regression test was enabled quite recently:

commit b5b9181c2403025b2c7ae7ea44333fd8fe6dbb54 (between 5.4-rc3 and 5.4-rc4)
Author: David Ahern 
Date:   Mon Oct 21 19:02:43 2019 -0600

selftests: Make l2tp.sh executable

commit e858ef1cd4bc1bdfcd18114a8195236e336cee42 (between 5.4-rc3 and 5.4-rc4)
Author: David Ahern 
Date:   Mon Aug 5 15:41:37 2019 -0700

Since this also breaks in 5.3, the issue is present in eoan and hence is
not a focal regression.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

Status in linux package in Ubuntu:
  Incomplete


[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-05 Thread Colin Ian King

Occurs between 5.3 and 5.4-rc1

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT
  regression testing:

  
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac
  /autopkgtest-focal-canonical-kernel-team-
  unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz

  14:44:30 DEBUG| [stdout] sctp STARTING
  14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 
256/256)
  14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked 
for more than 122 seconds.
  14:47:44 DEBUG| [stdout] [ 3684.814345]   Tainted: P   OE 
5.4.0-7-generic #8-Ubuntu
  14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
  14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 
2063618 0x0800
  14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace:
  14:47:44 DEBUG| [stdout] [ 3684.814360] ([] 
__schedule+0x304/0x7b0)
  14:47:44 DEBUG| [stdout] [ 3684.814362]  [] 
schedule+0x4a/0xe0 
  14:47:44 DEBUG| [stdout] [ 3684.814366]  [] 
rwsem_down_write_slowpath+0x22c/0x530 
  14:47:44 DEBUG| [stdout] [ 3684.814370]  [] 
register_pernet_subsys+0x2c/0x60 
  14:47:44 DEBUG| [stdout] [ 3684.814411]  [<03ff80766638>] 
sctp_init+0x2f0/0x520 [sctp] 
  14:47:44 DEBUG| [stdout] [ 3684.814414]  [] 
do_one_initcall+0x40/0x200 
  14:47:44 DEBUG| [stdout] [ 3684.814416]  [] 
do_init_module+0x70/0x270 
  14:47:44 DEBUG| [stdout] [ 3684.814418]  [] 
load_module+0x1142/0x1440 
  14:47:44 DEBUG| [stdout] [ 3684.814419]  [] 
__do_sys_finit_module+0xa4/0xf0 
  14:47:44 DEBUG| [stdout] [ 3684.814421]  [] 
system_call+0x2aa/0x2c8 
  14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:37 DEBUG| [stdout] [ 3738.474354] unregister_netdevice: waiting for lo 
to become free. Usage count = 1
  14:48:48 DEBUG| [stdout] [ 3748.734396]

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-05 Thread Colin Ian King

Easy steps to reproduce this issue:

sudo modprobe l2tp_core
sudo ./linux-5.4.0/tools/testing/selftests/net/l2tp.sh
./close

where close is compiled from:

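#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

int main(void)
{
    int fd;

    /* On an affected kernel the socket() call below never returns. */
    printf("calling socket..\n");
    fd = socket(AF_APPLETALK, SOCK_STREAM, 0);
    printf("socket returned: %d\n", fd);
    return 0;
}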

When running the above program we just see "calling socket" and it
blocks forever on the socket call.  After a couple of minutes we get the
kernel hung task warning.  We also see repeated messages:

unregister_netdevice: waiting for eth0 to become free. Usage count = 1

I'll bisect the kernel next.
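
Because the reproducer blocks forever, it can be awkward to run from a script or during a bisect. The following is only a sketch of one way to guard the call with a timeout, not something taken from the bug report: the helper name socket_call_completes() and the 100 ms polling interval are assumptions made for illustration. It runs socket() in a child process and reports whether the call returned before a deadline.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the original reproducer): call socket()
 * in a child process and report whether it returned within timeout_secs.
 * Returns 1 if the call completed (with success or an error), 0 if it was
 * still blocked at the deadline, and -1 if fork() failed.
 */
int socket_call_completes(int domain, int type, int timeout_secs)
{
    assert(timeout_secs > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only the return from socket() matters, not its result. */
        int fd = socket(domain, type, 0);
        if (fd >= 0)
            close(fd);
        _exit(0);
    }

    /* Parent: poll for the child every 100 ms until the deadline. */
    const struct timespec tick = { 0, 100 * 1000 * 1000 };
    for (int waited_ms = 0; waited_ms < timeout_secs * 1000; waited_ms += 100) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        /* ECHILD means the child has already exited and been reaped. */
        if (r == pid || (r < 0 && errno == ECHILD))
            return 1;
        nanosleep(&tick, NULL);
    }

    /*
     * Still blocked. A task stuck in uninterruptible sleep will not act on
     * the signal, so do not wait for it here; just report the hang.
     */
    kill(pid, SIGKILL);
    return 0;
}
```

For the case above, calling socket_call_completes(AF_APPLETALK, SOCK_STREAM, 130) would be one way to wait slightly longer than the 122 second hung task warning before declaring a hang. The value 130 is an arbitrary choice.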

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-05 Thread Colin Ian King

The unregister_netdevice issue occurs when running the kernel self test in
testing/selftests/net/l2tp.sh after modprobing the l2tp driver.  A hang
can then be produced by running the stress-ng close stressor; this is
just expediting an eventual hang caused by this test.

[Kernel-packages] [Bug 1855151] Re: adt bpf tests crash 5.4.0-7 on ppc64el on power box

2019-12-04 Thread Colin Ian King

17:59:24 DEBUG| [stdout] # send cpu 63, receive socket 63
17:59:24 DEBUG| [stdout] # send cpu 65, receive socket 65
17:59:24 DEBUG| [stdout] # send cpu 67, receive socket 67
17:59:24 DEBUG| [stdout] # send cpu 69, receive socket 69
17:59:24 DEBUG| [stdout] # send cpu 71, receive socket 71
17:59:24 DEBUG| [stdout] # send cpu 73, receive socket 73
[ 3269.552837] test_bpf: #0 TAX jited:1 
[ 3269.552885] Oops: Exception in kernel mode, sig: 4 [#1]
[ 3269.552916] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 3269.552928] Modules linked in: test_bpf(+) tls af_packet_diag tcp_diag 
udp_diag raw_diag inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc 
scsi_dh_alua joydev input_leds
 mac_hid ofpart cmdlinepart powernv_flash mtd at24 opal_prd uio_pdrv_genirq uio 
ipmi_powernv ipmi_devintf ipmi_msghandler ibmpowernv vmx_crypto powernv_rng 
sch_fq_codel ip_tables x_t
ables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid
 ast drm_vram_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm ahci tg3 libahci 
drm_panel_orientation_quirks [l
ast unloaded: notifier_error_inject]
[ 3269.847244] CPU: 55 PID: 137 Comm: modprobe Not tainted 5.4.0-7-generic 
#8
[ 3269.926547] NIP:  c008029f80b4 LR: c0080465106c CTR: c008029f80b4
[ 3269.927427] REGS: c00712eb3410 TRAP: 0e40   Not tainted  
(5.4.0-7-generic)
[ 3269.928286] MSR:  9288b033   CR: 
28222422  XER: 2000
[ 3270.036372] CFAR: c000de44 IRQMASK: 0 
[ 3270.036372] GPR00: c00804651044 c00712eb36a0 c0080465dd00 
c00415ee1600 
[ 3270.036372] GPR04: c00802850038  01f401dc 
00025a599268f4d4 
[ 3270.036372] GPR08: 0018 018acb48de01 0018f194 
c00804651ac0 
[ 3270.036372] GPR12: c008029f80b4 c007ff741c80 0008 
007b 
[ 3270.036372] GPR16: 00081234aaab 0241 024c 
20c49ba5e353f7cf 
[ 3270.036372] GPR20: c00415ee1600 c00804656dc9 c00804656e74 
03e8 
[ 3270.036372] GPR24: c00802850038 1234 c00804656e50 
c0080285 
[ 3270.036372] GPR28:  02f94279bb09  
c00804655dc0 
[ 3270.306180] NIP [c008029f80b4] 0xc008029f80b4
[ 3270.307006] LR [c0080465106c] run_one+0x2b0/0x41c [test_bpf]
[ 3270.307912] Call Trace:
[ 3270.307923] [c00712eb36a0] [c00804651044] run_one+0x288/0x41c 
[test_bpf] (unreliable)
[ 3270.415622] [c00712eb37b0] [c00804651474] test_bpf+0x29c/0x3d8 
[test_bpf]
[ 3270.416485] [c00712eb38a0] [c00804651714] test_bpf_init+0x164/0x468 
[test_bpf]
[ 3270.505901] [c00712eb3990] [c00100c4] do_one_initcall+0x64/0x2b0
[ 3270.506777] [c00712eb3a60] [c0225bec] do_init_module+0x7c/0x2e0
[ 3270.507674] [c00712eb3af0] [c0228e88] load_module+0x1628/0x1a40
[ 3270.606197] [c00712eb3d00] [c02295a8] 
__do_sys_finit_module+0xc8/0x150
[ 3270.607134] [c00712eb3e20] [c000b278] system_call+0x5c/0x68
[ 3270.608814] Instruction dump:
[ 3270.608857]        
 
[ 3270.713687]        
 
[ 3270.716164] ---[ end trace fd593383c9195849 ]---
17:59:24 DEBUG| [stdout] # send c[ 3270.826052] 
pu 75, receive socket 75
17:59:24 DEBUG| [stdout] # send cpu 77, receive socket 77

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855151

Title:
  adt bpf tests crash 5.4.0-7 on ppc64el on power box

Status in linux package in Ubuntu:
  In Progress

Bug description:
  Running the ADT tests on a power box, the bpf tests crash the kernel
  as follows:

  [ 2745.079592] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  [ 2745.079808] Faulting instruction address: 0x
  [ 2745.079824] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 2745.079993] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  [ 2745.080011] Modules linked in: af_packet_diag tcp_diag udp_diag raw_diag 
inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev 
input_leds mac_hid ofpart 
  cmdlinepart powernv_flash mtd ibmpowernv at24 uio_pdrv_genirq uio 
ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd powernv_rng vmx_crypto 
sch_fq_codel ip_tables x_tables autofs4 bt
  rfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid ast drm_vram_he
  lper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm tg3 ahci libahci

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-04 Thread Colin Ian King

I added a background task to dump out new dmesg messages and I now see
messages such as the following *before* any stress-ng tests run. I think
we can therefore assume the damage to the kernel occurred in prior ADT
tests.

11:02:46 DEBUG| [stdout] [ 3093.210307] unregister_netdevice: waiting
for lo to become free. Usage count = 1

Current hypothesis is that corruption is happening with the bpf kernel
regression tests.
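
To spot these messages automatically in the dmesg output, a small filter along the following lines could be used. This is a minimal sketch and not the background task actually used here: the function name parse_unregister_wait() is hypothetical, and the 16 byte device name buffer assumes the usual Linux interface name limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan one log line for
 *   "unregister_netdevice: waiting for <dev> to become free. Usage count = <n>"
 * On a match, store the device name in dev (at most 15 characters plus the
 * terminating NUL) and the usage count in *count, then return 1.
 * Return 0 if the line does not contain the message.
 */
int parse_unregister_wait(const char *line, char dev[16], int *count)
{
    static const char marker[] = "unregister_netdevice: waiting for ";

    assert(line != NULL && dev != NULL && count != NULL);

    const char *p = strstr(line, marker);
    if (p == NULL)
        return 0;
    p += sizeof(marker) - 1;

    /* Whitespace in the format also matches a newline, so a message that
     * was wrapped after the device name still parses. */
    return sscanf(p, "%15s to become free. Usage count = %d", dev, count) == 2;
}
```

Fed the line quoted above, this would report device "lo" with a usage count of 1. Lines without the marker text are ignored.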

[Kernel-packages] [Bug 1855151] [NEW] adt bpf tests crash 5.4.0-7 on ppc64el on power box

2019-12-04 Thread Colin Ian King

   
 
[ 2745.096394] ---[ end trace d347ca85a257c66f ]---
[ 2745.208020] 
[ 2746.208219] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 274[ 2796.226294116,5] OPAL: Reboot request...
6.316857] Rebooting in 10 seconds..

The final ADT test output recorded was:

17:03:13 DEBUG| [stdout] #  IPv6 TCP 
17:03:13 DEBUG| [stdout] # Testing EBPF mod 10...
17:03:13 DEBUG| [stdout] # Socket 0: 0
17:03:13 DEBUG| [stdout] # Socket 1: 1
17:03:13 DEBUG| [stdout] # Socket 2: 2
17:03:13 DEBUG| [stdout] # Socket 3: 3
... etc ...
17:03:13 DEBUG| [stdout] # Socket 4: 4
17:03:13 DEBUG| [stdout] # Socket 5: 5
17:03:13 DEBUG| [stdout] # Socket 9: 19
17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5...
17:03:13 DEBUG| [stdout] # Socket 0: 0
...
17:03:13 DEBUG| [stdout] # Socket 3: 18
17:03:13 DEBUG| [stdout] # Socket 4: 19
...
17:03:13 DEBUG| [stdout] # Testing CBPF mod 10...
17:03:13 DEBUG| [stdout] # Socket 0: 0
...
17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5...
17:03:13 DEBUG| [stdout] # Socket 0: 0
17:03:13 DEBUG| [stdout] # Socket 1: 1
...
17:03:13 DEBUG| [stdout] # Socket 4: 19
17:03:13 DEBUG| [stdout] # Testing too many filters...
17:03:13 DEBUG| [stdout] # Testing filters on non-SO_REUSEPORT socket...
17:03:13 DEBUG| [stdout] #  IPv6 TCP w/ mapped IPv4 
17:03:13 DEBUG| [stdout] # Testing EBPF mod 10...
17:03:13 DEBUG| [stdout] # Socket 0: 0
17:03:13 DEBUG| [stdout] # Socket 1: 1
...
17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5...
17:03:13 DEBUG| [stdout] # Socket 0: 0
17:03:13 DEBUG| [stdout] # Socket 1: 1
...
17:03:13 DEBUG| [stdout] # Testing CBPF mod 10...
17:03:13 DEBUG| [stdout] # Socket 0: 0
17:03:13 DEBUG| [stdout] # Socket 1: 1
...
17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5...
17:03:13 DEBUG| [stdout] # Socket 0: 0
17:03:13 DEBUG| [stdout] # Socket 1: 1
...
17:03:13 DEBUG| [stdout] # Testing filter add without bind...
17:03:13 DEBUG| [stdout] # SUCCESS
17:03:13 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
17:03:13 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
17:03:13 DEBUG| [stdout] #  IPv4 UDP 
17:03:13 DEBUG| [stdout] # send cpu 0, receive socket 0
17:03:13 DEBUG| [stdout] # send cpu 1, receive socket 1
...
17:03:13 DEBUG| [stdout] # send cpu 125, receive socket 125
17:03:13 DEBUG| [stdout] # send cpu 127, receive socket 127
17:03:13 DEBUG| [stdout] #  IPv4 TCP 
[ end of output as the machine panics ]

So the crash occurred sometime around or after this point. I'll re-run
this with the ipmi tool on the console to see how far it got before the
kernel panicked.

** Affects: linux (Ubuntu)
 Importance: High
 Assignee: Colin Ian King (colin-king)
 Status: In Progress

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: linux (Ubuntu)
   Status: New => In Progress

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855151

Title:
  adt bpf tests crash 5.4.0-7 on ppc64el on power box

Status in linux package in Ubuntu:
  In Progress

Bug description:
  Running the ADT tests on a power box, the bpf tests crash the kernel
  as follows:

  [ 2745.079592] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
  [ 2745.079808] Faulting instruction address: 0x
  [ 2745.079824] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 2745.079993] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  [ 2745.080011] Modules linked in: af_packet_diag tcp_diag udp_diag raw_diag 
inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev 
input_leds mac_hid ofpart 
  cmdlinepart powernv_flash mtd ibmpowernv at24 uio_pdrv_genirq uio 
ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd powernv_rng vmx_crypto 
sch_fq_codel ip_tables x_tables autofs4 bt
  rfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid ast drm_vram_he
  lper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt 
fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm tg3 ahci libahci 
drm_panel_orientation_quirks [last unloaded: no
  tifier_error_inject]
  [ 2745.080195] CPU: 0 PID: 366 Comm: reuseport_bpf_c Not tainted 
5.4.0-7-generic #8
  [ 2745.080214] NIP:   LR: c0ce8710 CTR: 

  [ 2745.080233] REGS: c007ff6eb550 TRAP: 0400   Not tainted  
(5.4.0-7-generic)
  [ 2745.080250] MSR:  900040009033   CR: 24002282 
 XER: 2000
  [ 2745.080272] CFAR: c000de44 IRQMASK: 0 
  [ 2745.080272] GPR00: c0d67c9c c007ff6eb7e0 c1a5bf00 
c004258e10e0 
  [ 2745.080272] GPR04: c00802830038 c004258e10e0 0028 
e3c2 
  [ 2745.080272] GPR08: 000

[Kernel-packages] [Bug 1855143] Re: 5.4.0-7 kernel crash on boot on power box

2019-12-04 Thread Colin Ian King

Re-installed the kernel, it's booting fine now. I wonder if I had some
kind of corruption from a previous test crash. Can't reproduce this now.
Marking it as Invalid.

** Changed in: linux (Ubuntu)
   Status: New => Invalid

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855143

Title:
  5.4.0-7 kernel crash on boot on power box

Status in linux package in Ubuntu:
  Invalid

Bug description:
  boot failures with 5.4.0-7-generic on OPAL power box:

  I was running ADT tests and the machine hung/rebooted. I was unable to
  log in. After I rebooted the machine with the ipmi tool the machine
  crashed with the following kernel output:

  [   51.081421774,5] SkiBoot skiboot-5.4.8-5787ad3 starting...
  [   51.081426316,5] initial console log level: memory 7, driver 5
  [   51.081429224,6] CPU: P8 generation processor(max 8 threads/core)
  [   51.081432044,7] CPU: Boot CPU PIR is 0x0470 PVR is 0x004d0200
  [   51.081435009,7] CPU: Initial max PIR set to 0x1fff
  [   51.082535316,5] OPAL table: 0x300bfc40 .. 0x300c0110, branch table: 
0x30002000
  [   51.082543101,5] FDT: Parsing fdt @0xff0
  [   51.087692296,5] XSCOM: chip 0x0 at 0x3fc00 [P8 DD2.0]
  [   51.087702232,5] XSCOM: chip 0x8 at 0x3fc40 [P8 DD2.0]
  [   51.087709775,6] XSTOP: XSCOM addr = 0x2010c82, FIR bit = 31
  [   51.087713185,6] MFSI 0:0: Initialized
  [   51.087715462,6] MFSI 0:2: Initialized
  [   51.087717669,6] MFSI 0:1: Initialized
  [   51.087720203,6] MFSI 8:0: Initialized
  [   51.087722365,6] MFSI 8:2: Initialized
  [   51.087724518,6] MFSI 8:1: Initialized
  [   51.088044434,5] LPC: LPC[000]: Initialized, access via XSCOM @0xb0020
  [   51.088162270,5] LPC: LPC: Default bus on chip 0x0
  [   51.088303476,6] MEM: parsing reserved memory from node 
/ibm,hostboot/reserved-memory
  [   51.088313438,7] HOMER: Init chip 0
  [   51.088316406,7]   PBA BAR0 : 0x0007fd80
  [   51.088319108,7]   PBA MASK0: 0x0030
  [   51.088321761,7]   HOMER Image at 0x7fd80 size 4MB
  [   51.088325579,7]   PBA BAR2 : 0x4007fda0
  [   51.088328358,7]   PBA MASK2: 0x
  [   51.088330928,7]   SLW Image at 0x7fda0 size 1MB
  [   51.088334409,7]   PBA BAR3 : 0x0007ff80
  [   51.088337060,7]   PBA MASK3: 0x0070
  [   51.088339732,7]   OCC Common Area at 0x7ff80 size 8MB
  [   51.088342594,7] HOMER: Init chip 8
  [   51.088345257,7]   PBA BAR0 : 0x0007fdc0
  [   51.088347872,7]   PBA MASK0: 0x0030
  [   51.088350519,7]   HOMER Image at 0x7fdc0 size 4MB
  [   51.088354173,7]   PBA BAR2 : 0x4007fde0
  [   51.088356860,7]   PBA MASK2: 0x
  [   51.088359365,7]   SLW Image at 0x7fde0 size 1MB
  [   51.088362788,7]   PBA BAR3 : 0x0007ff80
  [   51.088365419,7]   PBA MASK3: 0x0070
  [   51.088367946,7]   OCC Common Area at 0x7ff80 size 8MB
  [   51.088387526,7] CPU idle state device tree init
  [   51.088391002,4] SLW: HB-provided idle states property found
  [   51.088567406,7] AST: PNOR LPC offset: 0x0c00
  [   51.088650577,5] PLAT: Using virtual UART
  [   51.088977615,7] UART: Using LPC IRQ 4
  [   51.203625382,5] PLAT: Detected Firestone platform
  [   51.219765305,5] PLAT: Detected BMC platform AMI
  [   51.239417466,5] CENTAUR: Found centaur for chip 0x0 channel 4
  [   51.239524825,5] CENTAUR:   FSI host: 0x0 cMFSI0 port 7
  [   51.241283553,5] CENTAUR: Found centaur for chip 0x0 channel 5
  [   51.241759761,5] CENTAUR:   FSI host: 0x0 cMFSI0 port 6
  [   51.242362656,5] PSI[0x000]: Found PSI bridge [active=0]
  [   51.242690427,5] PSI[0x008]: Found PSI bridge [active=0]
  [   51.245117930,5] CPU: All 128 processors called in...
  [2.472212005,5] FLASH: Found system flash: Macronix MXxxL51235F id:0
  [2.472354468,5] BT: Interface initialized, IO 0x00e4
  [3.421491873,5] NVRAM: Size is 576 KB
  [4.095942958,5] STB: secure mode off
  [4.096004331,5] STB: trusted mode off
  [4.096965839,5] CAPI: Preloading ucode 200ea
  [4.097023615,5] FLASH: Queueing preload of 2/200ea
  [4.097202595,5] FLASH: Queueing preload of 0/0
  [4.097723471,5] FLASH: Queueing preload of 1/0
  [4.097739635,7] FFS: Partition map size: 0x1000
  [4.101069429,7] FLASH: CAPP partition has ECC
  [4.117588444,5] STB: sb_verify skipped resource 2, secure_mode=0
  [4.117607170,5] Chip 0 Found PBCQ0 at /xscom@3fc00/pbcq@2012000
  [4.117610665,7] PHB3[0:0]: X[PE]=0x02012000 X[PCI]=0x09012000 X[SPCI]=0x09013c00
  [4.117690635,7] PHB3[0:0] REGS = 0x0003fffe4000 [4k]
  [4.124862367,7] PHB3[0:0] PCIBAR   = 0x0003fffe4000
  [4.144741905,7] PHB3[0:0] MMIO0= 0x2000 [0x0100]
  [4.147663099,7] PHB3[0:0]

[Kernel-packages] [Bug 1855143] [NEW] 5.4.0-7 kernel crash on boot on power box

2019-12-04 Thread Colin Ian King

Public bug reported:

boot failures with 5.4.0-7-generic on OPAL power box:

I was running ADT tests and the machine hung/rebooted. I was unable to
log in. After I rebooted the machine with the ipmi tool the machine
crashed with the following kernel output:

[   51.081421774,5] SkiBoot skiboot-5.4.8-5787ad3 starting...
[   51.081426316,5] initial console log level: memory 7, driver 5
[   51.081429224,6] CPU: P8 generation processor(max 8 threads/core)
[   51.081432044,7] CPU: Boot CPU PIR is 0x0470 PVR is 0x004d0200
[   51.081435009,7] CPU: Initial max PIR set to 0x1fff
[   51.082535316,5] OPAL table: 0x300bfc40 .. 0x300c0110, branch table: 0x30002000
[   51.082543101,5] FDT: Parsing fdt @0xff0
[   51.087692296,5] XSCOM: chip 0x0 at 0x3fc00 [P8 DD2.0]
[   51.087702232,5] XSCOM: chip 0x8 at 0x3fc40 [P8 DD2.0]
[   51.087709775,6] XSTOP: XSCOM addr = 0x2010c82, FIR bit = 31
[   51.087713185,6] MFSI 0:0: Initialized
[   51.087715462,6] MFSI 0:2: Initialized
[   51.087717669,6] MFSI 0:1: Initialized
[   51.087720203,6] MFSI 8:0: Initialized
[   51.087722365,6] MFSI 8:2: Initialized
[   51.087724518,6] MFSI 8:1: Initialized
[   51.088044434,5] LPC: LPC[000]: Initialized, access via XSCOM @0xb0020
[   51.088162270,5] LPC: LPC: Default bus on chip 0x0
[   51.088303476,6] MEM: parsing reserved memory from node /ibm,hostboot/reserved-memory
[   51.088313438,7] HOMER: Init chip 0
[   51.088316406,7]   PBA BAR0 : 0x0007fd80
[   51.088319108,7]   PBA MASK0: 0x0030
[   51.088321761,7]   HOMER Image at 0x7fd80 size 4MB
[   51.088325579,7]   PBA BAR2 : 0x4007fda0
[   51.088328358,7]   PBA MASK2: 0x
[   51.088330928,7]   SLW Image at 0x7fda0 size 1MB
[   51.088334409,7]   PBA BAR3 : 0x0007ff80
[   51.088337060,7]   PBA MASK3: 0x0070
[   51.088339732,7]   OCC Common Area at 0x7ff80 size 8MB
[   51.088342594,7] HOMER: Init chip 8
[   51.088345257,7]   PBA BAR0 : 0x0007fdc0
[   51.088347872,7]   PBA MASK0: 0x0030
[   51.088350519,7]   HOMER Image at 0x7fdc0 size 4MB
[   51.088354173,7]   PBA BAR2 : 0x4007fde0
[   51.088356860,7]   PBA MASK2: 0x
[   51.088359365,7]   SLW Image at 0x7fde0 size 1MB
[   51.088362788,7]   PBA BAR3 : 0x0007ff80
[   51.088365419,7]   PBA MASK3: 0x0070
[   51.088367946,7]   OCC Common Area at 0x7ff80 size 8MB
[   51.088387526,7] CPU idle state device tree init
[   51.088391002,4] SLW: HB-provided idle states property found
[   51.088567406,7] AST: PNOR LPC offset: 0x0c00
[   51.088650577,5] PLAT: Using virtual UART
[   51.088977615,7] UART: Using LPC IRQ 4
[   51.203625382,5] PLAT: Detected Firestone platform
[   51.219765305,5] PLAT: Detected BMC platform AMI
[   51.239417466,5] CENTAUR: Found centaur for chip 0x0 channel 4
[   51.239524825,5] CENTAUR:   FSI host: 0x0 cMFSI0 port 7
[   51.241283553,5] CENTAUR: Found centaur for chip 0x0 channel 5
[   51.241759761,5] CENTAUR:   FSI host: 0x0 cMFSI0 port 6
[   51.242362656,5] PSI[0x000]: Found PSI bridge [active=0]
[   51.242690427,5] PSI[0x008]: Found PSI bridge [active=0]
[   51.245117930,5] CPU: All 128 processors called in...
[2.472212005,5] FLASH: Found system flash: Macronix MXxxL51235F id:0
[2.472354468,5] BT: Interface initialized, IO 0x00e4
[3.421491873,5] NVRAM: Size is 576 KB
[4.095942958,5] STB: secure mode off
[4.096004331,5] STB: trusted mode off
[4.096965839,5] CAPI: Preloading ucode 200ea
[4.097023615,5] FLASH: Queueing preload of 2/200ea
[4.097202595,5] FLASH: Queueing preload of 0/0
[4.097723471,5] FLASH: Queueing preload of 1/0
[4.097739635,7] FFS: Partition map size: 0x1000
[4.101069429,7] FLASH: CAPP partition has ECC
[4.117588444,5] STB: sb_verify skipped resource 2, secure_mode=0
[4.117607170,5] Chip 0 Found PBCQ0 at /xscom@3fc00/pbcq@2012000
[4.117610665,7] PHB3[0:0]: X[PE]=0x02012000 X[PCI]=0x09012000 X[SPCI]=0x09013c00
[4.117690635,7] PHB3[0:0] REGS = 0x0003fffe4000 [4k]
[4.124862367,7] PHB3[0:0] PCIBAR   = 0x0003fffe4000
[4.144741905,7] PHB3[0:0] MMIO0= 0x2000 [0x0100]
[4.147663099,7] PHB3[0:0] MMIO1= 0x3fe0 [0x8000]
[4.151015049,7] PHB3[0:0] BAREN= 0xf800
[4.151018735,7] PHB3[0:0] NEWBAREN = 0xf800
[4.152491015,7] PHB3[0:0] IRSNC= 0x0100
[4.177266431,5] STB: tb_measure skipped resource 2, trusted_mode=0
[4.177266472,7] PHB3[0:0] IRSNM= 0xff00
[4.177269336,7] PHB3[0:0] LSI  = 0xff00
[4.177278668,5] Chip 0 Found PBCQ1 at /xscom@3fc00/pbcq@2012400
[4.177282022,7] PHB3[0:1]: X[PE]=0x02012400 X[PCI]=0x09012400 X[SPCI]=0x09013c40
[4.178715842,7] PHB3[0:1] REGS = 0x0003fffe4010 [4k]
[4.183043807,7] PHB3[0:1] PCIBAR   = 0x0003fffe4010
[4.190163295,5] Chip 8 Found PBCQ0 at

[Kernel-packages] [Bug 1855100] [NEW] bpf self tests break 5.4.0-7-generic on power8 system

2019-12-04 Thread Colin Ian King

Public bug reported:

Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of
the bare metal system.

Last output seen while ssh'd into the box:

11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls
11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests
11:52:34 DEBUG| [stdout] # 
11:52:34 DEBUG| [stdout] # running socket test
11:52:34 DEBUG| [stdout] # 
11:52:34 DEBUG| [stdout] # [PASS]
11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests
11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests
11:52:34 DEBUG| [stdout] # 
11:52:34 DEBUG| [stdout] # running psock_fanout test
11:52:34 DEBUG| [stdout] # 
client_loop: send disconnect: Broken pipe

last output in (truncated) nohup output:

f -emit-llvm -c progs/pyperf180.c -o - || \
11:52:15 DEBUG| [stdout]echo "clang failed") | \
11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe  \
11:52:15 DEBUG| [stdout]-filetype=obj -o /home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o

this suggests the bpf selftests are causing the breakage.

last output logged in /var/log/dmesg.log :

Dec  4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to MEM_GOING_OFFLINE
Dec  4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to MEM_GOING_OFFLINE
Dec  4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to MEM_GOING_OFFLINE
Dec  4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to MEM_GOING_OFFLINE
Dec  4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to MEM_GOING_OFFLINE

The next entries in dmesg.log show that the machine had rebooted.

** Affects: linux (Ubuntu)
 Importance: High
 Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855100

Title:
  bpf self tests break 5.4.0-7-generic on power8 system

Status in linux package in Ubuntu:
  New

Bug description:
  Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of
  the bare metal system.

  Last output seen while ssh'd into the box:

  11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls
  11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # running socket test
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # [PASS]
  11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests
  11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests
  11:52:34 DEBUG| [stdout] # 
  11:52:34 DEBUG| [stdout] # running psock_fanout test
  11:52:34 DEBUG| [stdout] # 
  client_loop: send disconnect: Broken pipe

  last output in (truncated) nohup output:

  f -emit-llvm -c progs/pyperf180.c -o - || \
  11:52:15 DEBUG| [stdout]echo "clang failed") | \
  11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe  \
  11:52:15 DEBUG| [stdout]-filetype=obj -o /home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o

  this suggests the bpf selftests are causing the breakage.

  last output logged in /var/log/dmesg.log :

  Dec  4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to MEM_GOING_OFFLINE
  Dec  4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to MEM_GOING_OFFLINE

  The next entries in dmesg.log show that the machine had rebooted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855100/+subscriptions


[Kernel-packages] [Bug 1824407] Re: remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files

2019-12-04 Thread Colin Ian King

Verified for disco:

Run reproducer script with old kernel: 5.0.0-37-generic, results:

cat /root-tmp/etc/.pwd.lock
cat: /root-tmp/etc/.pwd.lock: Input/output error

Run with -proposed kernel: 5.0.0-38-generic

cat /root-tmp/etc/.pwd.lock
foo

Marking as verification-done-disco

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824407

Title:
  remount of multilower moved pivoted-root overlayfs root, results in
  I/O errors on some modified files

Status in linux package in Ubuntu:
  In Progress
Status in linux-hwe package in Ubuntu:
  Invalid
Status in linux-hwe source package in Bionic:
  In Progress
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in Focal:
  In Progress

Bug description:
  == SRU Justification Disco, Eoan, Focal ==

  Multiple squashfs filesystems with overlayfs cause file corruption issues
  when modifying zero sized files

  == Fix ==

  The current fix is pending in
  
https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a

  == Test case ==

  With an Ubuntu ISO on the cdrom drive, use:

  #!/bin/bash -x
  mkdir -p /cdrom
  mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom
  sleep 1
  mkdir -p /cow
  mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow
  sleep 1
  mkdir -p /cow/upper
  mkdir -p /cow/work
  modprobe -q -b overlay
  sleep 1
  modprobe -q -b loop
  sleep 1
  dev=$(losetup -f)
  mkdir -p /filesystem.squashfs
  losetup $dev /cdrom/casper/filesystem.squashfs
  mount -t squashfs -o ro,noatime $dev /filesystem.squashfs
  sleep 1

  dev=$(losetup -f)
  mkdir -p /installer.squashfs
  losetup $dev /cdrom/casper/installer.squashfs
  mount -t squashfs -o ro,noatime $dev /installer.squashfs
  sleep 1

  mkdir -p /root-tmp
  mount -t overlay -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' /cow /root-tmp

  FILE=/root-tmp/etc/.pwd.lock

  echo foo > $FILE
  cat $FILE
  sync
  #
  # dropping caches or remounting causes the bug
  #
  echo 3 > /proc/sys/vm/drop_caches
  cat $FILE

  Without the fix the cat of the file will produce an error. With the
  fix the cat will work correctly.
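
The pass/fail check at the end of the test case could be automated along the following lines. This is only a minimal sketch: the helper name readback_ok is hypothetical and not part of the original reproducer, and it assumes that a failed read (such as the Input/output error) makes cat exit non-zero.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original reproducer script.
# Usage: readback_ok FILE EXPECTED
# Returns 0 only if FILE can be read and its contents equal EXPECTED.
readback_ok() {
    # cat exits non-zero on a read error such as EIO; treat that as failure
    actual=$(cat "$1" 2>/dev/null) || return 1
    [ "$actual" = "$2" ]
}
```

In the reproducer this could be called after the drop_caches step, for example: readback_ok "$FILE" foo && echo PASS || echo FAIL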

  == Regression Potential ==

  There is an unhandled corner case:
  - two filesystems, A and B, both have null uuid
  - upper layer is on A
  - lower layer 1 is also on A
  - lower layer 2 is on B

  However, this corner case is already an issue without the fix and will
  be addressed by subsequent fixes once upstream has accepted them. I
  think the risk is minimal, considering that nobody is complaining about
  these corner cases with the current broken overlayfs squashfs layering.

  ---

  1) Download focal subiquity pending image, or eoan release image
  2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI)
  3) After --- insert the following options

     break=top debug init=/bin/bash

  4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
  5) in the initramfs execute:

  rm /scripts/casper-bottom/25adduser
  exit

  6) you will be dropped into the pivoted root filesystem, before systemd is execed as pid one
  7) /run/initramfs/ will contain a debug log, showing how everything was mounted. I.e. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience.

  8) At this point, modifying zero-byte length files that exist in the
  lowest layer but not the middle one, in certain ways, will result in
  them being corrupted after / is remounted.

  9) Corruption examples

  (On both focal & eoan)

  cat /etc/.pwd.lock
  systemd-sysusers
  cat /etc/.pwd.lock
  mount -o remount /
  cat /etc/.pwd.lock
  overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)
  cat: /etc/.pwd.lock: Input/output error

  (Only on eoan)

  cat /etc/machine-id
  systemd-machine-id-setup
  cat /etc/machine-id
  mount -o remount /
  cat /etc/machine-id
  overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000)
  cat: /etc/machine-id: Input/output error
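
The ftype values in the overlayfs messages above look like the file-type bits of the inode mode (the S_IFMT mask) printed in hexadecimal. That reading is an assumption based on the standard mode constants, where 0x8000 is S_IFREG and 0x4000 is S_IFDIR. Under that assumption, a small sketch to decode them (the function name ftype_name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: map the hex file-type bits (mode & S_IFMT) from an
# overlayfs "invalid origin" message to a human-readable name.
ftype_name() {
    case "$1" in
        1000) echo "fifo" ;;
        2000) echo "character device" ;;
        4000) echo "directory" ;;
        6000) echo "block device" ;;
        8000) echo "regular file" ;;
        a000|A000) echo "symlink" ;;
        c000|C000) echo "socket" ;;
        *) echo "unknown" ;;
    esac
}

# Values taken from the messages above
echo "ftype=8000        -> $(ftype_name 8000)"
echo "origin ftype=4000 -> $(ftype_name 4000)"
```

Read this way, the messages say that the upper file is a regular file while the origin it resolves to is a directory, and that mismatch is what is being rejected.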

  Lots of things break once machine-id and .pwd.lock are corrupted. I.e.
  unable to dhcp, connect to dbus, add/remove/change users or groups,
  etc.

  We were unable to recreate the issue outside of booting things with
  casper, i.e. statically on a regular host machine without pivot-root.
  But hopefully booting to a quiet state with nothing running is
  sufficient to reproduce this.

  Instead of booting with `bebroken init=/bin/bash` you can boot with
  `bebroken systemd.mask=systemd-remount-fs.service` this will complete
  the boot, with /etc/machine-id & .pwd.lock modified, meaning that

[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

2019-12-03 Thread Colin Ian King

** Changed in: linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8  on s390x

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT
  regression testing:

  https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal-canonical-kernel-team-unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz

  14:44:30 DEBUG| [stdout] sctp STARTING
  14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 256/256)
  14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked for more than 122 seconds.
  14:47:44 DEBUG| [stdout] [ 3684.814345]   Tainted: P   OE 5.4.0-7-generic #8-Ubuntu
  14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 2063618 0x0800
  14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace:
  14:47:44 DEBUG| [stdout] [ 3684.814360] ([<be310914>] __schedule+0x304/0x7b0)
  14:47:44 DEBUG| [stdout] [ 3684.814362]  [<be310e0a>] schedule+0x4a/0xe0
  14:47:44 DEBUG| [stdout] [ 3684.814366]  [<bdb071cc>] rwsem_down_write_slowpath+0x22c/0x530
  14:47:44 DEBUG| [stdout] [ 3684.814370]  [<be14d66c>] register_pernet_subsys+0x2c/0x60
  14:47:44 DEBUG| [stdout] [ 3684.814411]  [<03ff80766638>] sctp_init+0x2f0/0x520 [sctp]
  14:47:44 DEBUG| [stdout] [ 3684.814414]  [<bda288c0>] do_one_initcall+0x40/0x200
  14:47:44 DEBUG| [stdout] [ 3684.814416]  [<bdb594a0>] do_init_module+0x70/0x270
  14:47:44 DEBUG| [stdout] [ 3684.814418]  [<bdb5b892>] load_module+0x1142/0x1440
  14:47:44 DEBUG| [stdout] [ 3684.814419]  [<bdb5bdc4>] __do_sys_finit_module+0xa4/0xf0
  14:47:44 DEBUG| [stdout] [ 3684.814421]  [<be315fc6>] system_call+0x2aa/0x2c8
  14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:37 DEBUG| [stdout
