[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2019-07-24 Thread Brad Figg
** Tags added: cscc

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in zfs-linux source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, XENIAL, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.
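
A minimal sketch of how a reproduction run could be judged from the kernel log, assuming the lockup always shows up as the "blocked for more than N seconds" message quoted in this bug. The helper names count_hung_tasks and log_is_clean are hypothetical and not part of lxd or zfs.

```shell
#!/bin/sh
# Count hung-task reports in a kernel log read from stdin.
# The pattern follows the message format quoted in this bug report.
count_hung_tasks() {
    # grep -c exits non-zero when it finds nothing; keep the "0" it prints.
    grep -c 'INFO: task .* blocked for more than [0-9]* seconds' || true
}

# Succeed (exit status 0) only when the log on stdin has no hung-task reports.
log_is_clean() {
    [ "$(count_hung_tasks)" -eq 0 ]
}

# Possible use on a test VM (not executed here):
#   sudo dmesg --clear
#   lxd-benchmark launch --count 96 --parallel 96
#   sudo dmesg | log_is_clean && echo "no lockup" || echo "lockup reproduced"
```

On an unfixed system the benchmark itself may never return, so in practice the log check would probably have to run from a second shell.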

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mount/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercised
  under multi-thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
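
To see at a glance which tasks are stuck in a log like the one above, the task name and PID can be pulled out of each "blocked for more than" line. This is only a sketch: list_blocked_tasks and summarise_blocked_tasks are hypothetical helper names, and the summary assumes task names without spaces.

```shell
#!/bin/sh
# Print "name pid" for every hung-task report in a kernel log on stdin,
# one line per report, in log order.
list_blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than [0-9][0-9]* seconds.*/\1 \2/p'
}

# Print "name count" for each task name, sorted by name.
summarise_blocked_tasks() {
    list_blocked_tasks | cut -d' ' -f1 | sort | uniq -c | awk '{print $2, $1}'
}

# Possible use (not executed here):
#   sudo dmesg | summarise_blocked_tasks
```

For the log quoted above this would report lxd and zfs twice each and txg_sync once, which fits a mount-path deadlock where several callers wait on each other.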

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
  Do you want to configure a new storage pool? (yes/no) [default=yes]:
  Name of the new storage pool [default=default]:
  Name of the storage backend to use (dir, zfs) [default=zfs]:
  Create a new ZFS pool? (yes/no) [default=yes]:
  Would you like to use an existing block device? (yes/no) [default=no]: yes
  Path to the existing block device: /dev/sdb
  Would you like to connect to a MAAS server? (yes/no) [default=no]:
  Would you like to create a new local network bridge? (yes/no) [default=yes]: no
  Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
  Would you like LXD to be available over the network? (yes/no) [default=no]:
  Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
  Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not manage to continue.
  Include any warning/errors/backtraces from the system logs
  dmesg output

  [  

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-10-08 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu25

---
zfs-linux (0.6.5.6-0ubuntu25) xenial; urgency=medium

  * Fix zpl_mount() deadlock (LP: #1781364)
    - Upstream ZFS fix ac09630d8b0b ("Fix zpl_mount() deadlock")
      fixes deadlock on multiple parallelized mount/umounts

 -- Colin Ian King   Thu, 12 Jul 2018 09:18:24 +0100
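
A rough way to check whether a Xenial machine already has the fixed userland package is to compare the installed version against 0.6.5.6-0ubuntu25 from the changelog above. The sketch below assumes GNU `sort -V` orders these particular version strings the same way Debian version comparison does; on an Ubuntu host `dpkg --compare-versions` is the more precise tool. The name version_at_least is hypothetical.

```shell
#!/bin/sh
# First Xenial zfs-linux version carrying the fix, per the changelog above.
FIXED_XENIAL="0.6.5.6-0ubuntu25"

# Return 0 if version $1 sorts at or after version $2.
# Assumption: GNU `sort -V` is a good enough stand-in for Debian ordering here.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible use on a Xenial host (not executed here):
#   installed=$(dpkg-query -W -f='${Version}' zfsutils-linux)
#   version_at_least "$installed" "$FIXED_XENIAL" && echo "fix present"
```

This only inspects the userland package; the bug is also tracked against the linux package, so the running kernel would need to be checked separately.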

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: Fix Committed => Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-09-27 Thread Colin Ian King
Tested 0.6.5.6-0ubuntu25 and it works without any issues, so marking
this as verified.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

** Changed in: linux (Ubuntu Xenial)
 Assignee: tenox (senseimyijaki) => Colin Ian King (colin-king)

** Changed in: linux (Ubuntu Bionic)
 Assignee: tenox (senseimyijaki) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu Bionic)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu Xenial)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High


Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in zfs-linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-09-27 Thread Łukasz Zemczak
Hello Colin, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.6-0ubuntu25 in a
few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-xenial to verification-done-xenial. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-xenial. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
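
For testers, enabling -proposed comes down to adding one APT source line for the release. The sketch below prints such a line; the archive.ubuntu.com mirror, the component list, and the helper name proposed_source_line are assumptions and should be checked against the EnableProposed wiki page linked above.

```shell
#!/bin/sh
# Print an APT source line for the -proposed pocket of the release named in $1.
# Assumptions: the main archive.ubuntu.com mirror and the usual four components.
proposed_source_line() {
    printf 'deb http://archive.ubuntu.com/ubuntu/ %s-proposed restricted main multiverse universe\n' "$1"
}

# Possible use on a Xenial test machine (not executed here):
#   proposed_source_line xenial | sudo tee /etc/apt/sources.list.d/xenial-proposed.list
#   sudo apt update
#   sudo apt install zfsutils-linux/xenial-proposed
```

Installing only the package under test from -proposed, rather than upgrading everything, keeps the verification focused on this fix.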

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: Confirmed => Fix Committed


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-09-14 Thread Colin Ian King
Diff: http://launchpadlibrarian.net/388404653/zfs-linux_0.6.5.6-0ubuntu24_0.6.5.6-0ubuntu25.diff.gz


Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-09-14 Thread Colin Ian King
Wrong URL, ignore that.


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-09-14 Thread Colin Ian King
Diff:

http://launchpadlibrarian.net/388038683/zfs-linux_0.6.5.6-0ubuntu24_0.6.5.6-0ubuntu25.diff.gz

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, XENIAL, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mounts/unmounts.  The regression potential is small, as the
  fix touches a very small code path that has been exhaustively exercised
  under multiple thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the
  use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
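
For anyone checking their own logs for the same symptom, the warnings quoted above can be picked out mechanically. The following is only a rough sketch: the regular expression is written against the message format shown in this report, the function name find_hung_tasks is made up for illustration, and the sample text is a subset of the lines above.

```python
import re
from collections import Counter

# Matches kernel hung-task warnings in the format quoted in this report, e.g.
#   [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return one (timestamp, command, pid, seconds) tuple per hung-task warning."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append(
                (
                    float(m.group("ts")),
                    m.group("comm"),
                    int(m.group("pid")),
                    int(m.group("secs")),
                )
            )
    return hits


# Subset of the dmesg lines from the report, used here as sample input.
SAMPLE = """\
[  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
[  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
[  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
[  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
[  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
[  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    tasks = find_hung_tasks(SAMPLE)
    # Count how many blocked tasks were reported per command name.
    for name, count in Counter(comm for _, comm, _, _ in tasks).items():
        print(f"{name}: {count} blocked task(s)")
```

To use it on a live system, feed it the text of the kernel log (for example, saved dmesg output) instead of SAMPLE.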

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
  Do you want to configure a new storage pool? (yes/no) [default=yes]:
  Name of the new storage pool [default=default]:
  Name of the storage backend to use (dir, zfs) [default=zfs]:
  Create a new ZFS pool? (yes/no) [default=yes]:
  Would you like to use an existing block device? (yes/no) [default=no]: yes
  Path to the existing block device: /dev/sdb
  Would you like to connect to a MAAS server? (yes/no) [default=no]:
  Would you like to create a new local network bridge? (yes/no) [default=yes]: no
  Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
  Would you like LXD to be available over the network? (yes/no) [default=no]:
  Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
  Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
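
The answers above could probably also be supplied non-interactively, which helps when repeating the reproduction on fresh VMs. The sketch below is an assumption, not something taken from this report: it presumes the installed LXD accepts a YAML preseed on standard input via "lxd init --preseed", and that the keys shown (storage_pools, profiles) correspond to what the last prompt would print. The helper name build_preseed is hypothetical; the pool name and /dev/sdb are the values from the transcript.

```python
def build_preseed(pool_name="default", block_device="/dev/sdb"):
    """Return preseed YAML text describing a ZFS pool on an existing block device.

    Assumed layout: one storage pool using the zfs driver, and a default
    profile whose root disk lives on that pool. No network is configured,
    matching the "no" answers given to the bridge prompts.
    """
    return (
        "config: {}\n"
        "storage_pools:\n"
        f"- name: {pool_name}\n"
        "  driver: zfs\n"
        "  config:\n"
        f"    source: {block_device}\n"
        "profiles:\n"
        "- name: default\n"
        "  devices:\n"
        "    root:\n"
        "      path: /\n"
        f"      pool: {pool_name}\n"
        "      type: disk\n"
    )


if __name__ == "__main__":
    # Print the preseed so it can be piped into lxd.
    print(build_preseed(), end="")
```

If the assumption holds, the output would be piped into "sudo lxd init --preseed". Check the generated YAML against what your LXD version prints before relying on it.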

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not manage to continue.
  

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-09-12 Thread Colin Ian King
** Tags removed: verification-done-xenial
** Tags added: verification-required-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-23 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-134.160

---
linux (4.4.0-134.160) xenial; urgency=medium

  * linux: 4.4.0-134.160 -proposed tracker (LP: #1787177)

  * locking sockets broken due to missing AppArmor socket mediation patches
(LP: #1780227)
- UBUNTU SAUCE: apparmor: fix apparmor mediating locking non-fs, unix sockets

  * Backport namespaced fscaps to xenial 4.4 (LP: #1778286)
- Introduce v3 namespaced file capabilities
- commoncap: move assignment of fs_ns to avoid null pointer dereference
- capabilities: fix buffer overread on very short xattr
- commoncap: Handle memory allocation failure.

  * Xenial update to 4.4.140 stable release (LP: #1784409)
- usb: cdc_acm: Add quirk for Uniden UBC125 scanner
- USB: serial: cp210x: add CESINEL device ids
- USB: serial: cp210x: add Silicon Labs IDs for Windows Update
- n_tty: Fix stall at n_tty_receive_char_special().
- staging: android: ion: Return an ERR_PTR in ion_map_kernel
- n_tty: Access echo_* variables carefully.
- x86/boot: Fix early command-line parsing when matching at end
- ath10k: fix rfc1042 header retrieval in QCA4019 with eth decap mode
- i2c: rcar: fix resume by always initializing registers before transfer
- ipv4: Fix error return value in fib_convert_metrics()
- kprobes/x86: Do not modify singlestep buffer while resuming
- nvme-pci: initialize queue memory before interrupts
- netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()
- ARM: dts: imx6q: Use correct SDMA script for SPI5 core
- ubi: fastmap: Correctly handle interrupted erasures in EBA
- mm: hugetlb: yield when prepping struct pages
- tracing: Fix missing return symbol in function_graph output
- scsi: sg: mitigate read/write abuse
- s390: Correct register corruption in critical section cleanup
- drbd: fix access after free
- cifs: Fix infinite loop when using hard mount option
- jbd2: don't mark block as modified if the handle is out of credits
- ext4: make sure bitmaps and the inode table don't overlap with bg
  descriptors
- ext4: always check block group bounds in ext4_init_block_bitmap()
- ext4: only look at the bg_flags field if it is valid
- ext4: verify the depth of extent tree in ext4_find_extent()
- ext4: include the illegal physical block in the bad map ext4_error msg
- ext4: clear i_data in ext4_inode_info when removing inline data
- ext4: add more inode number paranoia checks
- ext4: add more mount time checks of the superblock
- ext4: check superblock mapped prior to committing
- HID: i2c-hid: Fix "incomplete report" noise
- HID: hiddev: fix potential Spectre v1
- HID: debug: check length before copy_to_user()
- x86/mce: Detect local MCEs properly
- x86/mce: Fix incorrect "Machine check from unknown source" message
- media: cx25840: Use subdev host data for PLL override
- mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
- dm bufio: avoid sleeping while holding the dm_bufio lock
- dm bufio: drop the lock when doing GFP_NOIO allocation
- mtd: rawnand: mxc: set spare area size register explicitly
- dm bufio: don't take the lock in dm_bufio_shrink_count
- mtd: cfi_cmdset_0002: Change definition naming to retry write operation
- mtd: cfi_cmdset_0002: Change erase functions to retry for error
- mtd: cfi_cmdset_0002: Change erase functions to check chip good only
- netfilter: nf_log: don't hold nf_log_mutex during user access
- staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
- Linux 4.4.140

  * Xenial update to 4.4.139 stable release (LP: #1784382)
- xfrm6: avoid potential infinite loop in _decode_session6()
- netfilter: ebtables: handle string from userspace with care
- ipvs: fix buffer overflow with sync daemon and service
- atm: zatm: fix memcmp casting
- net: qmi_wwan: Add Netgear Aircard 779S
- net/sonic: Use dma_mapping_error()
- Revert "Btrfs: fix scrub to repair raid6 corruption"
- tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
- Btrfs: make raid6 rebuild retry more
- usb: musb: fix remote wakeup racing with suspend
- bonding: re-evaluate force_primary when the primary slave name changes
- tcp: verify the checksum of the first data segment in a new connection
- ext4: update mtime in ext4_punch_hole even if no blocks are released
- ext4: fix fencepost error in check for inode count overflow during resize
- driver core: Don't ignore class_dir_create_and_add() failure.
- btrfs: scrub: Don't use inode pages for device replace
- ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream()
- ALSA: hda: add dock and led support for HP EliteBook 830 G5
- ALSA: hda: add dock and led support for HP ProBook 640 G4
- cpufreq: Fix new policy initialization during limits updates via 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-23 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-33.36

---
linux (4.15.0-33.36) bionic; urgency=medium

  * linux: 4.15.0-33.36 -proposed tracker (LP: #1787149)

  * RTNL assertion failure on ipvlan (LP: #1776927)
- ipvlan: drop ipv6 dependency
- ipvlan: use per device spinlock to protect addrs list updates
- SAUCE: fix warning from "ipvlan: drop ipv6 dependency"

  * ubuntu_bpf_jit test failed on Bionic s390x systems (LP: #1753941)
- test_bpf: flag tests that cannot be jited on s390

  * HDMI/DP audio can't work on the laptop of Dell Latitude 5495 (LP: #1782689)
- drm/nouveau: fix nouveau_dsm_get_client_id()'s return type
- drm/radeon: fix radeon_atpx_get_client_id()'s return type
- drm/amdgpu: fix amdgpu_atpx_get_client_id()'s return type
- platform/x86: apple-gmux: fix gmux_get_client_id()'s return type
- ALSA: hda: use PCI_BASE_CLASS_DISPLAY to replace PCI_CLASS_DISPLAY_VGA
- vga_switcheroo: set audio client id according to bound GPU id

  * locking sockets broken due to missing AppArmor socket mediation patches
(LP: #1780227)
- UBUNTU SAUCE: apparmor: fix apparmor mediating locking non-fs, unix sockets

  * Update2 for ocxl driver (LP: #1781436)
- ocxl: Fix page fault handler in case of fault on dying process

  * netns: unable to follow an interface that moves to another netns
(LP: #1774225)
- net: core: Expose number of link up/down transitions
- dev: always advertise the new nsid when the netns iface changes
- dev: advertise the new ifindex when the netns iface changes

  * [Bionic] Disk IO hangs when using BFQ as io scheduler (LP: #1780066)
- block, bfq: fix occurrences of request finish method's old name
- block, bfq: remove batches of confusing ifdefs
- block, bfq: add requeue-request hook

  * HP ProBook 455 G5 needs mute-led-gpio fixup (LP: #1781763)
- ALSA: hda: add mute led support for HP ProBook 455 G5

  * [Bionic] bug fixes to improve stability of the ThunderX2 i2c driver
(LP: #1781476)
- i2c: xlp9xx: Fix issue seen when updating receive length
- i2c: xlp9xx: Make sure the transfer size is not more than
  I2C_SMBUS_BLOCK_SIZE

  * x86/kvm: fix LAPIC timer drift when guest uses periodic mode (LP: #1778486)
- x86/kvm: fix LAPIC timer drift when guest uses periodic mode

  * Please include ax88179_178a and r8152 modules in d-i udeb (LP: #1771823)
- [Config:] d-i: Add ax88179_178a and r8152 to nic-modules

  * Nvidia fails after switching its mode (LP: #1778658)
- PCI: Restore config space on runtime resume despite being unbound

  * Kernel error "task zfs:pid blocked for more than 120 seconds" (LP: #1781364)
- SAUCE: (noup) zfs to 0.7.5-1ubuntu16.3

  * CVE-2018-12232
- PATCH 1/1] socket: close race condition between sock_close() and
  sockfs_setattr()

  * CVE-2018-10323
- xfs: set format back to extents if xfs_bmap_extents_to_btree

  * change front mic location for more lenovo m7/8/9xx machines (LP: #1781316)
- ALSA: hda/realtek - Fix the problem of two front mics on more machines
- ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION

  * Cephfs + fscache: unable to handle kernel NULL pointer dereference at
 IP: jbd2__journal_start+0x22/0x1f0 (LP: #1783246)
- ceph: track read contexts in ceph_file_info

  * Touchpad of ThinkPad P52 failed to work with message "lost sync at byte"
(LP: #1779802)
- Input: elantech - fix V4 report decoding for module with middle key
- Input: elantech - enable middle button of touchpads on ThinkPad P52

  * xhci_hcd :00:14.0: Root hub is not suspended (LP: #1779823)
- usb: xhci: dbc: Fix lockdep warning
- usb: xhci: dbc: Don't decrement runtime PM counter if DBC is not started

  * CVE-2018-13406
- video: uvesafb: Fix integer overflow in allocation

  * CVE-2018-10840
- ext4: correctly handle a zero-length xattr with a non-zero e_value_offs

  * CVE-2018-11412
- ext4: do not allow external inodes for inline data

  * CVE-2018-10881
- ext4: clear i_data in ext4_inode_info when removing inline data

  * CVE-2018-12233
- jfs: Fix inconsistency between memory allocation and ea_buf->max_size

  * CVE-2018-12904
- kvm: nVMX: Enforce cpl=0 for VMX instructions

  * Error parsing PCC subspaces from PCCT (LP: #1528684)
- mailbox: PCC: erroneous error message when parsing ACPI PCCT

  * CVE-2018-13094
- xfs: don't call xfs_da_shrink_inode with NULL bp

  * other users' coredumps can be read via setgid directory and killpriv bypass
(LP: #1779923) // CVE-2018-13405
- Fix up non-directory creation in SGID directories

  * Invoking obsolete 'firmware_install' target breaks snap build (LP: #1782166)
- snapcraft.yaml: stop invoking the obsolete (and non-existing)
  'firmware_install' target

  * snapcraft.yaml: missing ubuntu-retpoline-extract-one script breaks the build
(LP: #1782116)
- snapcraft.yaml: copy 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-19 Thread Colin Ian King
The bug will be fixed once both the zfs package and the bionic kernel
(which contains the zfs driver changes) have been released. So far, only
the zfs package has been released, and we are waiting for the kernel to
complete the SRU update and verification phase. This takes a bit longer
because the kernel contains many other changes and we have to do more
exhaustive testing.
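
Since the fix needs a new enough kernel as well as the zfs package, it can help to check which kernel a machine is running. What follows is a minimal sketch under stated assumptions: the two version thresholds are the ones named in the "This bug was fixed in the package linux" notices in this thread (4.4.0-134.160 for xenial, 4.15.0-33.36 for bionic), and it assumes that any later ABI number in the same series also carries the fix. The names FIXED_KERNELS, parse_release and has_fix are made up for illustration.

```python
import re

# First fixed kernel per series, as announced in this thread.
# Key: upstream base version; value: first Ubuntu ABI number with the fix.
FIXED_KERNELS = {
    (4, 4, 0): 134,   # xenial: linux 4.4.0-134.160
    (4, 15, 0): 33,   # bionic: linux 4.15.0-33.36
}


def parse_release(release):
    """Split a release string like '4.15.0-20-generic' into ((4, 15, 0), 20)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(g) for g in m.groups())
    return (major, minor, patch), abi


def has_fix(release):
    """Return True or False for a listed series, or None if the series is unknown."""
    base, abi = parse_release(release)
    fixed_abi = FIXED_KERNELS.get(base)
    if fixed_abi is None:
        return None
    return abi >= fixed_abi


if __name__ == "__main__":
    # 4.15.0-20-generic is the kernel shown in the dmesg output of the report.
    for release in ("4.15.0-20-generic", "4.15.0-33-generic", "4.4.0-134-generic"):
        print(release, has_fix(release))
```

On a real system the release string would come from "uname -r" (or platform.release() in Python). A None result only means the series is not in the table above, not that the kernel is affected or fixed.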

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-19 Thread Sam Van den Eynde
I don't see the confirmation for Bionic in this bug report. Is there any
update on when the 4.17 kernel will land in bionic-proposed? Or do I need
another kernel version for Bionic? What exactly do I need for my Bionic
server?

This bug prevents me from updating my lxd containers; doing so hangs the
system consistently.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-09 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.7.5-1ubuntu16.3

---
zfs-linux (0.7.5-1ubuntu16.3) bionic; urgency=medium

  * Fix zpl_mount() deadlock (LP: #1781364)
- Upstream ZFS fix ac09630d8b0b ("Fix zpl_mount() deadlock")
  fixes deadlock on multiple parallelized mount/umounts

 -- Colin Ian King  Thu, 12 Jul 2018 09:18:24 +0100

** Changed in: zfs-linux (Ubuntu Bionic)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-07 Thread Colin Ian King
@Vasiliy, hopefully by early next week.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Released
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, XENIAL, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mount/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercised
  under multiple thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the
  use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
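
A minimal sketch for pulling the blocked task names out of a log like the one above. The function name hung_tasks is hypothetical, and the sample lines are copied from this report; on a live system one would presumably pipe dmesg into it instead.

```shell
#!/bin/sh
# Print one "task:pid" per hung-task report found on stdin,
# de-duplicated and in first-seen order.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than [0-9]* seconds\..*/\1/p' |
        awk '!seen[$0]++'
}

# Sample input taken from the kernel log quoted in this bug report.
sample='[  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
[  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
[  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
[  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
[  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.'

printf '%s\n' "$sample" | hung_tasks
```

Usage on a running machine would be something like `dmesg | hung_tasks`, which may need root depending on the kernel.dmesg_restrict setting.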

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
  Do you want to configure a new storage pool? (yes/no) [default=yes]:
  Name of the new storage pool [default=default]:
  Name of the storage backend to use (dir, zfs) [default=zfs]:
  Create a new ZFS pool? (yes/no) [default=yes]:
  Would you like to use an existing block device? (yes/no) [default=no]: yes
  Path to the existing block device: /dev/sdb
  Would you like to connect to a MAAS server? (yes/no) [default=no]:
  Would you like to create a new local network bridge? (yes/no) [default=yes]: no
  Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
  Would you like LXD to be available over the network? (yes/no) [default=no]:
  Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
  Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.
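
Because the lockup is intermittent with these parameters, repeating the launch and comparing the kernel log before and after each run gives a rough failure rate. The sketch below is an assumption-laden illustration, not part of the original report: the function names are hypothetical, it assumes `lxd-benchmark delete` is available to clean up between runs, and a real deadlock may prevent the launch from ever returning.

```shell
#!/bin/sh
# Count hung-task reports in kernel log text read on stdin.
count_hung() {
    grep -c 'blocked for more than [0-9]* seconds' || true
}

# Run the reproducer N times and report how many runs added new
# hung-task reports to the kernel log. If the kernel ring buffer wraps,
# the counts are only approximate.
run_attempts() {
    attempts=$1
    failed=0
    i=1
    while [ "$i" -le "$attempts" ]; do
        before=$(dmesg | count_hung)
        lxd-benchmark launch --count 48 --parallel 12
        after=$(dmesg | count_hung)
        if [ "$after" -gt "$before" ]; then
            failed=$((failed + 1))
            echo "attempt $i: new hung-task reports in the kernel log"
        fi
        # Assumed cleanup step; may itself hang after a deadlock.
        lxd-benchmark delete
        i=$((i + 1))
    done
    echo "$failed of $attempts attempts showed hung tasks"
}

# Only touch the system when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    run_attempts 4
fi
```

Invoked as `sh script.sh --run`, this would repeat the four attempts described above; without the flag it only defines the helpers.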

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not manage to continue.
  Include any warning/errors/backtraces from the system logs
  

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-07 Thread Colin Ian King
Verification passed for Ubuntu Bionic using the reproducer described in
comment #1. Marking as verified.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-07 Thread Colin Ian King
Verification passed for Ubuntu Xenial using the reproducer described in
comment #1. Marking as verified.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-07 Thread Vasiliy
I can now confirm that the fix works for me.
Does anybody know when this kernel and zfsutils package will move from proposed
to updates?

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-07 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!
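
As a rough illustration of what enabling -proposed involves, the sketch below prints the apt source line for a given release codename. The function name and the suggested file path are hypothetical, and the archive URL and component list are assumptions based on the usual Ubuntu archive layout; the wiki page above remains the authoritative procedure.

```shell
#!/bin/sh
# Print an apt source line for the -proposed pocket of the given
# Ubuntu release codename (for example "bionic" or "xenial").
proposed_source_line() {
    printf 'deb http://archive.ubuntu.com/ubuntu/ %s-proposed restricted main multiverse universe\n' "$1"
}

proposed_source_line bionic
```

One possible use is to write that line to a file such as /etc/apt/sources.list.d/proposed.list, run `sudo apt update`, and then install only the packages under test, for example `sudo apt install zfsutils-linux/bionic-proposed`.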


** Tags added: verification-needed-bionic

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-03 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-08-03 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.17.0-6.7

---
linux (4.17.0-6.7) cosmic; urgency=medium

  * linux: 4.17.0-6.7 -proposed tracker (LP: #1783396)

  * [Regression] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383:
comm stress-ng: bg 4705: bad block bitmap checksum (LP: #1781709)
- SAUCE: Revert "UBUNTU: SAUCE: ext4: fix ext4_validate_inode_bitmap: comm
  stress-ng: Corrupt inode bitmap"
- SAUCE: ext4: check for allocation block validity with block group locked

  * Cosmic update to 4.17.9 stable release (LP: #1783201)
- userfaultfd: hugetlbfs: fix userfaultfd_huge_must_wait() pte access
- mm: hugetlb: yield when prepping struct pages
- mm: teach dump_page() to correctly output poisoned struct pages
- PCI / ACPI / PM: Resume bridges w/o drivers on suspend-to-RAM
- ACPICA: Drop leading newlines from error messages
- ACPI / battery: Safe unregistering of hooks
- drm/amdgpu: Make struct amdgpu_atif private to amdgpu_acpi.c
- tracing: Avoid string overflow
- tracing: Fix missing return symbol in function_graph output
- scsi: sg: mitigate read/write abuse
- scsi: aacraid: Fix PD performance regression over incorrect qd being set
- scsi: target: Fix truncated PR-in ReadKeys response
- s390: Correct register corruption in critical section cleanup
- drbd: fix access after free
- vfio: Use get_user_pages_longterm correctly
- ARM: dts: imx51-zii-rdu1: fix touchscreen pinctrl
- ARM: dts: omap3: Fix am3517 mdio and emac clock references
- ARM: dts: dra7: Disable metastability workaround for USB2
- cifs: Fix use after free of a mid_q_entry
- cifs: Fix memory leak in smb2_set_ea()
- cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting
- cifs: Fix infinite loop when using hard mount option
- drm: Use kvzalloc for allocating blob property memory
- drm/udl: fix display corruption of the last line
- drm/amdgpu: Add amdgpu_atpx_get_dhandle()
- drm/amdgpu: Dynamically probe for ATIF handle (v2)
- jbd2: don't mark block as modified if the handle is out of credits
- ext4: add corruption check in ext4_xattr_set_entry()
- ext4: always verify the magic number in xattr blocks
- ext4: make sure bitmaps and the inode table don't overlap with bg
  descriptors
- ext4: always check block group bounds in ext4_init_block_bitmap()
- ext4: only look at the bg_flags field if it is valid
- ext4: verify the depth of extent tree in ext4_find_extent()
- ext4: include the illegal physical block in the bad map ext4_error msg
- ext4: clear i_data in ext4_inode_info when removing inline data
- ext4: never move the system.data xattr out of the inode body
- ext4: avoid running out of journal credits when appending to an inline
  file
- ext4: add more inode number paranoia checks
- ext4: add more mount time checks of the superblock
- ext4: check superblock mapped prior to committing
- HID: i2c-hid: Fix "incomplete report" noise
- HID: hiddev: fix potential Spectre v1
- HID: debug: check length before copy_to_user()
- HID: core: allow concurrent registration of drivers
- i2c: core: smbus: fix a potential missing-check bug
- i2c: smbus: kill memory leak on emulated and failed DMA SMBus xfers
- fs: allow per-device dax status checking for filesystems
- dax: change bdev_dax_supported() to support boolean returns
- dax: check for QUEUE_FLAG_DAX in bdev_dax_supported()
- dm: prevent DAX mounts if not supported
- mtd: cfi_cmdset_0002: Change definition naming to retry write operation
- mtd: cfi_cmdset_0002: Change erase functions to retry for error
- mtd: cfi_cmdset_0002: Change erase functions to check chip good only
- netfilter: nf_log: don't hold nf_log_mutex during user access
- staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
- Revert mm/vmstat.c: fix vmstat_update() preemption BUG
- Linux 4.17.6
- bpf: reject passing modified ctx to helper functions
- MIPS: Call dump_stack() from show_regs()
- MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
- MIPS: Fix ioremap() RAM check
- drm/etnaviv: Check for platform_device_register_simple() failure
- drm/etnaviv: Fix driver unregistering
- drm/etnaviv: bring back progress check in job timeout handler
- ACPICA: Clear status of all events when entering S5
- mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz pinctrl states
- mmc: dw_mmc: fix card threshold control configuration
- mmc: renesas_sdhi_internal_dmac: Cannot clear the RX_IN_USE in abort
- ibmasm: don't write out of bounds in read handler
- staging: rtl8723bs: Prevent an underflow in rtw_check_beacon_data().
- staging: r8822be: Fix RTL8822be can't find any wireless AP
- ata: Fix ZBC_OUT command block check
- ata: Fix ZBC_OUT all bit handling
- mei: discard 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-28 Thread tenox
** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => tenox (senseimyijaki)

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => tenox (senseimyijaki)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Fix Committed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, XENIAL, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mounts/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercised
  under multiple thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the
  use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
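
The hung-task reports above can be tallied per task name from a saved
kernel log. The following is a minimal sketch and not part of the
original report: the function name count_hung_tasks is hypothetical, and
it assumes the log lines follow the "INFO: task NAME:PID blocked for more
than N seconds." form shown above, with task names that contain no spaces.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<count> <task name>" for every task reported as blocked.
count_hung_tasks() {
    # Keep only the task name from each hung-task report line,
    # then count how often each name occurs.
    sed -n 's/.*INFO: task \(.*\):[0-9]* blocked for more than [0-9]* seconds\..*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}
```

It could be fed from the live log, for example with
"dmesg | count_hung_tasks", or from a saved file on stdin.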

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
  Do you want to configure a new storage pool? (yes/no) [default=yes]:
  Name of the new storage pool [default=default]:
  Name of the storage backend to use (dir, zfs) [default=zfs]:
  Create a new ZFS pool? (yes/no) [default=yes]:
  Would you like to use an existing block device? (yes/no) [default=no]: yes
  Path to the existing block device: /dev/sdb
  Would you like to connect to a MAAS server? (yes/no) [default=no]:
  Would you like to create a new local network bridge? (yes/no) [default=yes]: no
  Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
  Would you like LXD to be available over the network? (yes/no) [default=no]:
  Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
  Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
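
The interactive answers above could probably also be supplied
non-interactively through the preseed mode that the last prompt refers
to. The sketch below is an assumption rather than something the reporter
ran: the function name lxd_preseed_yaml is hypothetical, the YAML keys
are the ones "lxd init --preseed" is understood to accept in LXD 3.0,
and /dev/sdb is simply the device used in this report.

```shell
# Hypothetical helper: print a preseed document that mirrors the answers
# above (a ZFS pool named "default" on an existing block device, no
# network bridge). The device path is passed as the first argument.
lxd_preseed_yaml() {
    device="$1"
    cat <<EOF
config: {}
networks: []
storage_pools:
- name: default
  driver: zfs
  config:
    source: ${device}
profiles:
- name: default
  devices:
    root:
      path: /
      pool: default
      type: disk
EOF
}
```

Under those assumptions the output would be piped into LXD with
"lxd_preseed_yaml /dev/sdb | sudo lxd init --preseed".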

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not manage to continue.

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-27 Thread Kleber Sacilotto de Souza
** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Bionic)
   Status: Fix Committed => Confirmed

** Changed in: linux (Ubuntu Xenial)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Fix Committed

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-18 Thread Colin Ian King
The bug will be automatically updated when the -proposed kernel
containing the fix is ready; please wait for that message.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-18 Thread Simos Xenitellis 
@Vasiliy: Indeed.

The version in -proposed is "4.15.0-29.31" (source:
https://launchpad.net/ubuntu/bionic/+queue?queue_state=3&queue_text=linux-image)

The page for that version at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1782173
does not have a reference to this bug number #1781364.
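
One way to check whether a given kernel upload carries the fix is to
search its changelog for this bug's Launchpad number. This is a minimal
sketch, assuming the changelog text has already been saved locally; the
function name changelog_mentions_bug and the file name changelog.txt are
hypothetical.

```shell
# Hypothetical helper: succeed if the changelog text on stdin references
# the given Launchpad bug number in the "LP: #NNNNNNN" form.
changelog_mentions_bug() {
    bug="$1"
    # Require a non-digit or end of line after the number so that a
    # longer bug number sharing the same prefix does not match.
    grep -Eq "LP: #${bug}([^0-9]|\$)"
}
```

For example, "changelog_mentions_bug 1781364 < changelog.txt" would
return success only if that changelog lists this bug.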

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-18 Thread Vasiliy
It seems like 4.15.0-29 (uploaded right now to proposed) is not the
kernel we are looking for.

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-16 Thread Simos Xenitellis 
zfsutils-linux (zfs-linux, zfs-linux_0.7.5-1ubuntu16.3) is already in proposed:
https://launchpad.net/ubuntu/bionic/+queue?queue_state=3&queue_text=zfs-linux

Please report when linux-image gets into -proposed:
https://launchpad.net/ubuntu/bionic/+queue?queue_state=3&queue_text=linux-image

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, XENIAL, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mount/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercises
  this code under multiple thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the 
use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
   Do you want to configure a new storage pool? (yes/no) [default=yes]:
   Name of the new storage pool [default=default]:
   Name of the storage backend to use (dir, zfs) [default=zfs]:
   Create a new ZFS pool? (yes/no) [default=yes]:
   Would you like to use an existing block device? (yes/no) [default=no]: yes
   Path to the existing block device: /dev/sdb
   Would you like to connect to a MAAS server? (yes/no) [default=no]:
   Would you like to create a new local network bridge? (yes/no) [default=yes]: no
   Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
   Would you like LXD to be available over the network? (yes/no) [default=no]:
   Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
   Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-16 Thread Vasiliy
Ah, I thought the kernel (module) in proposed was already patched. I read
comment #8 as "don't forget to upgrade the kernel from proposed!", whereas
it means "wait until the fixed kernel is in proposed".
Now it makes sense, thank you.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-16 Thread Colin Ian King
Vasiliy, please refer to comment #8

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, XENIAL, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mounts/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercised
  under multiple thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses
  the use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
   Do you want to configure a new storage pool? (yes/no) [default=yes]:
   Name of the new storage pool [default=default]:
   Name of the storage backend to use (dir, zfs) [default=zfs]:
   Create a new ZFS pool? (yes/no) [default=yes]:
   Would you like to use an existing block device? (yes/no) [default=no]: yes
   Path to the existing block device: /dev/sdb
   Would you like to connect to a MAAS server? (yes/no) [default=no]:
   Would you like to create a new local network bridge? (yes/no) [default=yes]: no
   Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
   Would you like LXD to be available over the network? (yes/no) [default=no]:
   Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
   Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not manage to continue.
  Include any warning/errors/backtraces from the system logs
  dmesg output
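
The hung-task reports quoted in this description follow a fixed pattern, so a
reproduction run can be checked mechanically. A minimal sketch, assuming the
kernel log text is supplied on standard input (for example from dmesg); the
function name list_hung_tasks is hypothetical and not part of any tool
mentioned in this thread:

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): read kernel log text on
# stdin and print one "name:pid" entry per hung-task report found.
list_hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9][0-9]* seconds\..*/\1/p'
}

# Example use against the live kernel log (may need root on some systems):
#   dmesg | list_hung_tasks
```

An empty result after a full lxd-benchmark run would suggest the lockup did
not occur during that run; it does not prove the fix is installed.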


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-16 Thread Colin Ian King
The kernel driver fix will land in the next -proposed kernel as the
Ubuntu ZFS driver comes bundled with the kernel.

If you build zfs from source, then that will build the kernel driver as
a DKMS module with the fix in it and *that* will work.

One needs both the zfs userspace and the kernel for the entire bug fix.
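
Since the fix spans both the userspace package and the kernel-bundled driver,
it can help to record which versions are actually in use before testing. A
rough sketch, assuming a Debian/Ubuntu system; the function name
report_zfs_versions is made up for illustration, and because this thread does
not state which kernel version first carries the fix, the script only reports
what it finds:

```shell
#!/bin/sh
# Hypothetical helper: print the running kernel, the loaded ZFS kernel
# module version, and the installed zfsutils-linux package version.
report_zfs_versions() {
    echo "running kernel: $(uname -r)"

    # The module version is exposed in sysfs only while the module is loaded.
    if [ -r /sys/module/zfs/version ]; then
        echo "zfs kernel module: $(cat /sys/module/zfs/version)"
    else
        echo "zfs kernel module: not loaded"
    fi

    # dpkg-query exists only on Debian-based systems.
    if command -v dpkg-query >/dev/null 2>&1; then
        ver=$(dpkg-query -W -f='${Version}' zfsutils-linux 2>/dev/null) || ver=""
        echo "zfsutils-linux package: ${ver:-not installed}"
    else
        echo "zfsutils-linux package: dpkg-query not available"
    fi
}

report_zfs_versions
```

Including this output in a verification comment makes it clear whether the
test covered the userspace package only or the kernel driver as well.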

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-16 Thread Vasiliy
My experiments were here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1773392

I installed new 18.04 + zfs + lxd   - zfs hangs (well, it would be awkward if it did not)
+zfsutils-linux=0.7.5-1ubuntu16.3   - zfs hangs (ok, we need to upgrade the kernel too)
+kernel=4.15.0.28 (all packages)    - still hangs :( Am I missing something?
+everything_0.7.5=0.7.5-1ubuntu16.3 - still hangs...
+upgrade+dist-upgrade from proposed - still hangs...

So this bug isn't fixed for me... And it is strange, because I previously
installed zfs 0.7.9 from source with the commit that fixed the bug
(https://github.com/zfsonlinux/zfs/pull/7693), and that worked for me
on the default kernel (4.15.0.20).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-16 Thread Colin Ian King
This fix will only work once it lands in the updated kernel as well as
the user space packages, so please test once the updated kernel is also
in -proposed.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-16 Thread Łukasz Zemczak
Hello Colin, or anyone else affected,

Accepted zfs-linux into bionic-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/zfs-linux/0.7.5-1ubuntu16.3 in a
few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-bionic to verification-done-bionic. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-bionic. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance!
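
For testers, the steps on the EnableProposed page roughly amount to adding a
-proposed archive line and installing the package from that pocket. A sketch
only, assuming the default archive.ubuntu.com mirror; the function name
proposed_sources_line and the .list file name in the comments are
illustrative choices, not something this thread prescribes:

```shell
#!/bin/sh
# Hypothetical helper: print the apt sources line for the -proposed pocket
# of the given release codename (e.g. bionic). Assumes the default mirror.
proposed_sources_line() {
    echo "deb http://archive.ubuntu.com/ubuntu/ $1-proposed restricted main multiverse universe"
}

# To apply on a Bionic test machine (not run here; file name is an assumption):
#   proposed_sources_line bionic | sudo tee /etc/apt/sources.list.d/bionic-proposed.list
#   sudo apt update
#   sudo apt install zfsutils-linux/bionic-proposed
```

Installing with the package/release form pulls only the named package from
-proposed rather than upgrading everything that happens to be there.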

** Changed in: zfs-linux (Ubuntu Bionic)
   Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-13 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu Bionic)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, XENIAL, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.
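
A minimal sketch of the reproducer above, wrapped so that the number of new hung-task reports can be read off afterwards. The helper names hung_count and run_repro and the default 300-second wait are assumptions for illustration and do not come from the bug report; only the lxd-benchmark command line is taken from it. The benchmark is started in the background because it may never return once the deadlock triggers.

```shell
#!/bin/sh
# Count hung-task reports in kernel log text read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
hung_count() {
    grep -c -E 'INFO: task [^ ]+ blocked for more than [0-9]+ seconds' || true
}

# Run the reproducer and report how many hung-task messages appeared.
# Assumes an initialised LXD with a ZFS pool; dmesg may need sudo on some hosts.
# $1 is the wait in seconds (hypothetical default: 300, longer than the
# 120-second detector threshold quoted in the messages).
run_repro() {
    before=$(dmesg | hung_count)
    lxd-benchmark launch --count 96 --parallel 96 &
    sleep "${1:-300}"
    after=$(dmesg | hung_count)
    echo "new hung-task reports: $((after - before))"
}
```

Calling run_repro on an unfixed kernel would be expected to print a non-zero count. The difference is only approximate, since the kernel ring buffer can wrap between the two dmesg calls.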

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mount/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercised
  under multiple thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
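
The excerpt above names each blocked task as "task NAME:PID". A small sketch for pulling those pairs out of a kernel log follows; the function name hung_tasks is an assumption for illustration, and the pattern assumes task names without spaces or colons.

```shell
#!/bin/sh
# Print "<task> <pid>" for every hung-task report found on stdin.
# Lines that are not hung-task reports (Tainted, echo hint) are skipped.
hung_tasks() {
    sed -n -E 's/.*INFO: task ([^: ]+):([0-9]+) blocked for more than [0-9]+ seconds.*/\1 \2/p'
}
```

For a per-task tally, the output can be piped further, for example: dmesg | hung_tasks | cut -d' ' -f1 | sort | uniq -c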

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
   Do you want to configure a new storage pool? (yes/no) [default=yes]:
   Name of the new storage pool [default=default]:
   Name of the storage backend to use (dir, zfs) [default=zfs]:
   Create a new ZFS pool? (yes/no) [default=yes]:
   Would you like to use an existing block device? (yes/no) [default=no]: yes
   Path to the existing block device: /dev/sdb
   Would you like to connect to a MAAS server? (yes/no) [default=no]:
   Would you like to create a new local network bridge? (yes/no) [default=yes]: no
   Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
   Would you like LXD to be available over the network? (yes/no) [default=no]:
   Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
   Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
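
The interactive answers above could probably be supplied non-interactively through a preseed. The following is a rough sketch only: the function name make_preseed is hypothetical, the YAML layout is an assumption about what "lxd init --preseed" accepts on the LXD release shipped with Bionic, and it has not been checked against the reporter's setup.

```shell
#!/bin/sh
# Emit a preseed document that mirrors the answers given above:
# a ZFS pool named "default" on the block device passed as $1,
# no network bridge, and a default profile rooted on that pool.
make_preseed() {
    cat <<EOF
config: {}
networks: []
storage_pools:
- name: default
  driver: zfs
  config:
    source: $1
profiles:
- name: default
  devices:
    root:
      path: /
      pool: default
      type: disk
EOF
}

# Usage (destructive: the device is handed over to ZFS):
#   make_preseed /dev/sdb | sudo lxd init --preseed
```

Generating the document from a function keeps the device path in one place, so the same sketch can be reused on a test VM with a different empty disk.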

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-13 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu Xenial)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-13 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-13 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: zfs-linux (Ubuntu Bionic)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Confirmed
Status in zfs-linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in zfs-linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-12 Thread Colin Ian King
** Description changed:

- == SRU Justification, BIONIC ==
+ == SRU Justification, XENIAL, BIONIC ==
  
  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.
  
  == How to reproduce bug ==
  
  In a VM, 2 CPUs, 16GB of memory running Bionic:
  
  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init
  
  (and with the default init options)
  
  then run:
  
  lxd-benchmark launch --count 96 --parallel 96
  
  This will reliably show the lockup every time without the fix.  With the
  fix (detailed below) one cannot reproduce the lockup.
  
  == Fix ==
  
  Upstream ZFS commit
  
  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700
  
- Fix zpl_mount() deadlock
+ Fix zpl_mount() deadlock
  
  == Regression Potential ==
  
  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mount/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercised
  under multiple thread/CPU contention and shown not to break.
  
  --
  
  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691
  
  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the use of ZFS.
  In two out of four attempts, I got
  
  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  
  Describe how to reproduce the problem
  
  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.
  
  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  
  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.
  
  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
   Do you want to configure a new storage pool? (yes/no) [default=yes]:
   Name of the new storage pool [default=default]:
   Name of the storage backend to use (dir, zfs) [default=zfs]:
   Create a new ZFS pool? (yes/no) [default=yes]:
   Would you like to use an existing block device? (yes/no) [default=no]: yes
   Path to the existing block device: /dev/sdb
   Would you like to connect to a MAAS server? (yes/no) [default=no]:
   Would you like to create a new local network bridge? (yes/no) [default=yes]: no
   Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
   Would you like LXD to be available over the network? (yes/no) [default=no]:
   Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
   Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:
  
  Now run the following to launch 48 containers in batches of 12.
  
  lxd-benchmark launch --count 48 --parallel 12
  
  In two out of four attempts, I got the kernel errors.
  
  I also tried
  
  echo 1 >/sys/module/spl/parameters/spl_taskq_kick
  
  but did not manage to continue.
  Include any warning/errors/backtraces from the system logs
  dmesg output
  
  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991408] lxd D0  4455  1 0x
  [  725.991412] Call Trace:
  [  725.991424]  __schedule+0x297/0x8b0
  [  725.991428]  schedule+0x2c/0x80
  [  725.991429]  rwsem_down_write_failed+0x162/0x360
  [  725.991460]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [  725.991465]  call_rwsem_down_write_failed+0x17/0x30
  [  725.991468]  ? 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-12 Thread Seth Forshee
** Changed in: linux (Ubuntu Cosmic)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  Fix Committed
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in zfs-linux source package in Xenial:
  New
Status in linux source package in Bionic:
  New
Status in zfs-linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  Fix Committed
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-12 Thread Colin Ian King
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: zfs-linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in zfs-linux source package in Xenial:
  New
Status in linux source package in Bionic:
  New
Status in zfs-linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  In Progress
Status in zfs-linux source package in Cosmic:
  Fix Released

Bug description:
  == SRU Justification, BIONIC ==

  Exercising ZFS with lxd with many mount/umounts can cause lockups and
  120 second timeout messages.

  == How to reproduce bug ==

  In a VM, 2 CPUs, 16GB of memory running Bionic:

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  sudo lxd init

  (and with the default init options)

  then run:

  lxd-benchmark launch --count 96 --parallel 96

  This will reliably show the lockup every time without the fix.  With
  the fix (detailed below) one cannot reproduce the lockup.

  == Fix ==

  Upstream ZFS commit

  commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
  Author: Brian Behlendorf 
  Date: Wed Jul 11 15:49:10 2018 -0700

  Fix zpl_mount() deadlock

  == Regression Potential ==

  This just changes the locking in the mount path of ZFS and will only
  affect ZFS mount/unmounts.  The regression potential is small as this
  touches a very small code path that has been exhaustively exercised
  under multiple thread/CPU contention and shown not to break.

  --

  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the 
use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
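
To see at a glance which tasks are stuck, the hung-task lines in such a log can be summarized per task name. The following is a minimal sketch, not something from the original report: the function name `hung_tasks` is made up, and it assumes task names contain no spaces. It reads log text on stdin, so on a live system it could be fed from `dmesg`.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <task>" for each task name that
# appears in a kernel hung-task warning read from stdin.
hung_tasks() {
    # Keep only the task name from lines of the form
    # "INFO: task NAME:PID blocked for more than N seconds."
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than [0-9][0-9]* seconds.*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Sample input: the five warnings quoted in the report above.
hung_tasks <<'EOF'
[  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
[  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
[  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
[  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
[  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
EOF
# Prints:
#   2 lxd
#   1 txg_sync
#   2 zfs
```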

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]:
   Do you want to configure a new storage pool? (yes/no) [default=yes]:
   Name of the new storage pool [default=default]:
   Name of the storage backend to use (dir, zfs) [default=zfs]:
   Create a new ZFS pool? (yes/no) [default=yes]:
   Would you like to use an existing block device? (yes/no) [default=no]: yes
   Path to the existing block device: /dev/sdb
   Would you like to connect to a MAAS server? (yes/no) [default=no]:
   Would you like to create a new local network bridge? (yes/no) [default=yes]: no
   Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
   Would you like LXD to be available over the network? (yes/no) [default=no]:
   Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
   Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-12 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.7.9-3ubuntu4

---
zfs-linux (0.7.9-3ubuntu4) cosmic; urgency=medium

  * Fix zpl_mount() deadlock (LP: #1781364)
    - Upstream ZFS fix ac09630d8b0b ("Fix zpl_mount() deadlock")
      fixes deadlock on multiple parallelized mount/umounts

 -- Colin Ian King   Thu, 12 Jul 2018 09:18:24 +0100

** Changed in: zfs-linux (Ubuntu Cosmic)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in zfs-linux source package in Xenial:
  New
Status in linux source package in Bionic:
  New
Status in zfs-linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  In Progress
Status in zfs-linux source package in Cosmic:
  Fix Released


[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-12 Thread Colin Ian King
** Description changed:

+ == SRU Justification, BIONIC ==
+ 
+ Exercising ZFS with lxd with many mount/umounts can cause lockups and
+ 120 second timeout messages.
+ 
+ == How to reproduce bug ==
+ 
+ In a VM, 2 CPUs, 16GB of memory running Bionic:
+ 
+ sudo apt update
+ sudo apt install lxd lxd-client lxd-tools zfsutils-linux
+ sudo lxd init
+ 
+ (and with the default init options)
+ 
+ then run:
+ 
+ lxd-benchmark launch --count 96 --parallel 96
+ 
+ This will reliably show the lockup every time without the fix.  With the
+ fix (detailed below) one cannot reproduce the lockup.
+ 
+ == Fix ==
+ 
+ Upstream ZFS commit
+ 
+ commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
+ Author: Brian Behlendorf 
+ Date: Wed Jul 11 15:49:10 2018 -0700
+ 
+ Fix zpl_mount() deadlock
+ 
+ == Regression Potential ==
+ 
+ This just changes the locking in the mount path of ZFS and will only
+ affect ZFS mount/unmounts.  The regression potential is small as this
+ touches a very small code path that has been exhaustively exercised
+ under multiple thread/CPU contention and shown not to break.
+ 
+ --
+ 
  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691
  
  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the use of ZFS.
  In two out of four attempts, I got
  
  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  
  Describe how to reproduce the problem
  
- Start an Ubuntu 18.04 LTS server.
- Install LXD if not already installed.
+ Start an Ubuntu 18.04 LTS server.
+ Install LXD if not already installed.
  
  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux
  
- Configure LXD with sudo lxd init. When prompted for the storage
+ Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.
  
  $ sudo lxd init
- Would you like to use LXD clustering? (yes/no) [default=no]: 
-  Do you want to configure a new storage pool? (yes/no) [default=yes]: 
-  Name of the new storage pool [default=default]: 
-  Name of the storage backend to use (dir, zfs) [default=zfs]: 
-  Create a new ZFS pool? (yes/no) [default=yes]: 
-  Would you like to use an existing block device? (yes/no) [default=no]: yes
-  Path to the existing block device: /dev/sdb
-  Would you like to connect to a MAAS server? (yes/no) [default=no]: 
-  Would you like to create a new local network bridge? (yes/no) [default=yes]: no
-  Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
-  Would you like LXD to be available over the network? (yes/no) [default=no]: 
-  Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
-  Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
- 
- Now run the following to launch 48 containers in batches of 12.
+ Would you like to use LXD clustering? (yes/no) [default=no]:
+  Do you want to configure a new storage pool? (yes/no) [default=yes]:
+  Name of the new storage pool [default=default]:
+  Name of the storage backend to use (dir, zfs) [default=zfs]:
+  Create a new ZFS pool? (yes/no) [default=yes]:
+  Would you like to use an existing block device? (yes/no) [default=no]: yes
+  Path to the existing block device: /dev/sdb
+  Would you like to connect to a MAAS server? (yes/no) [default=no]:
+  Would you like to create a new local network bridge? (yes/no) [default=yes]: no
+  Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
+  Would you like LXD to be available over the network? (yes/no) [default=no]:
+  Would you like stale cached images to be updated automatically? 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-12 Thread Bug Watch Updater
** Changed in: linux
   Status: Unknown => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in zfs-linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  New
Status in zfs-linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  In Progress
Status in zfs-linux source package in Cosmic:
  In Progress

Bug description:
  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]: 
   Do you want to configure a new storage pool? (yes/no) [default=yes]: 
   Name of the new storage pool [default=default]: 
   Name of the storage backend to use (dir, zfs) [default=zfs]: 
   Create a new ZFS pool? (yes/no) [default=yes]: 
   Would you like to use an existing block device? (yes/no) [default=no]: yes
   Path to the existing block device: /dev/sdb
   Would you like to connect to a MAAS server? (yes/no) [default=no]: 
   Would you like to create a new local network bridge? (yes/no) [default=yes]: no
   Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
   Would you like LXD to be available over the network? (yes/no) [default=no]:
   Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
   Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not manage to continue.

  Include any warning/errors/backtraces from the system logs

  dmesg output

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991408] lxd D0  4455  1 0x
  [  725.991412] Call Trace:
  [  725.991424]  __schedule+0x297/0x8b0
  [  725.991428]  schedule+0x2c/0x80
  [  725.991429]  rwsem_down_write_failed+0x162/0x360
  [  725.991460]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [  725.991465]  call_rwsem_down_write_failed+0x17/0x30
  [  725.991468]  ? call_rwsem_down_write_failed+0x17/0x30
  [  725.991469]  down_write+0x2d/0x40
  [  725.991472]  grab_super+0x30/0x90
  [  725.991501]  ? zpl_create+0x160/0x160 [zfs]
  [  725.991504]  sget_userns+0x91/0x490
  [  725.991507]  ? get_anon_bdev+0x100/0x100
  [  725.991534]  ? zpl_create+0x160/0x160 [zfs]
  [  725.991537]  sget+0x7d/0xa0
  [  725.991540]  ? get_anon_bdev+0x100/0x100
  [  725.991567]  zpl_mount+0xa8/0x160 [zfs]
  [  725.991570]  mount_fs+0x37/0x150
  [  725.991574]  vfs_kern_mount.part.23+0x5d/0x110
  [  725.991576]  do_mount+0x5ed/0xce0
  [  725.991577]  ? 

[Kernel-packages] [Bug 1781364] Re: Kernel error "task zfs:pid blocked for more than 120 seconds"

2018-07-12 Thread Colin Ian King
Upstream ZFS fix:

commit ac09630d8b0bf6c92084a30fdaefd03fd0adbdc1
Author: Brian Behlendorf 
Date:   Wed Jul 11 15:49:10 2018 -0700

Fix zpl_mount() deadlock


** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: zfs-linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: High
 Assignee: Colin Ian King (colin-king)
   Status: In Progress

** Also affects: zfs-linux (Ubuntu Cosmic)
   Importance: High
 Assignee: Colin Ian King (colin-king)
   Status: In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781364

Title:
  Kernel error "task zfs:pid blocked for more than 120 seconds"

Status in Linux:
  Unknown
Status in linux package in Ubuntu:
  In Progress
Status in zfs-linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  New
Status in zfs-linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  In Progress
Status in zfs-linux source package in Cosmic:
  In Progress

Bug description:
  ZFS bug report: https://github.com/zfsonlinux/zfs/issues/7691

  "I am using LXD containers that are configured to use a ZFS storage backend.
  I create many containers using a benchmark tool, which probably stresses the use of ZFS.
  In two out of four attempts, I got

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991624] INFO: task txg_sync:4202 blocked for more than 120 seconds.
  [  725.998264]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.005071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.013313] INFO: task lxd:99919 blocked for more than 120 seconds.
  [  726.019609]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.026418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.034560] INFO: task zfs:100513 blocked for more than 120 seconds.
  [  726.040936]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.047746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.055791] INFO: task zfs:100584 blocked for more than 120 seconds.
  [  726.062170]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  726.068979] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

  Describe how to reproduce the problem

  Start an Ubuntu 18.04 LTS server.
  Install LXD if not already installed.

  sudo apt update
  sudo apt install lxd lxd-client lxd-tools zfsutils-linux

  Configure LXD with sudo lxd init. When prompted for the storage
  backend, select ZFS and specify an empty disk.

  $ sudo lxd init
  Would you like to use LXD clustering? (yes/no) [default=no]: 
   Do you want to configure a new storage pool? (yes/no) [default=yes]: 
   Name of the new storage pool [default=default]: 
   Name of the storage backend to use (dir, zfs) [default=zfs]: 
   Create a new ZFS pool? (yes/no) [default=yes]: 
   Would you like to use an existing block device? (yes/no) [default=no]: yes
   Path to the existing block device: /dev/sdb
   Would you like to connect to a MAAS server? (yes/no) [default=no]: 
   Would you like to create a new local network bridge? (yes/no) [default=yes]: no
   Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: no
   Would you like LXD to be available over the network? (yes/no) [default=no]:
   Would you like stale cached images to be updated automatically? (yes/no) [default=yes]
   Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

  Now run the following to launch 48 containers in batches of 12.

  lxd-benchmark launch --count 48 --parallel 12

  In two out of four attempts, I got the kernel errors.

  I also tried

  echo 1 >/sys/module/spl/parameters/spl_taskq_kick

  but did not manage to continue.

  Include any warning/errors/backtraces from the system logs

  dmesg output

  [  725.970508] INFO: task lxd:4455 blocked for more than 120 seconds.
  [  725.976730]   Tainted: P   O 4.15.0-20-generic #21-Ubuntu
  [  725.983551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  725.991408] lxd D0  4455  1 0x
  [  725.991412] Call Trace:
  [  725.991424]  __schedule+0x297/0x8b0
  [  725.991428]  schedule+0x2c/0x80
  [  725.991429]  rwsem_down_write_failed+0x162/0x360
  [  725.991460]  ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
  [  725.991465]  call_rwsem_down_write_failed+0x17/0x30
  [  725.991468]  ? call_rwsem_down_write_failed+0x17/0x30
  [