[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2019-09-15 Thread Bug Watch Updater
** Changed in: zfs
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1567557

Title:
  Performance degradation of "zfs clone"

Status in Native ZFS for Linux:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in zfs-linux source package in Xenial:
  Fix Released
Status in zfs-linux source package in Zesty:
  Fix Released
Status in zfs-linux source package in Artful:
  Fix Released

Bug description:
  [SRU Justification]

  Creating many hundreds of clones can be prohibitively slow. The
  underlying mechanism for gathering clone information uses a 16K
  buffer, which limits performance.  Also, a zero-sized buffer is
  initially passed to the underlying ioctl() to discover the size of
  the buffer required to fetch the information back to userspace.
  Bumping the initial buffer to a larger size removes the need for
  two ioctl() calls, which improves performance.

  [Fix]
  Bump the initial buffer size from 16K to 256K.

  [Regression Potential]
  This is minimal, as this is just a tweak to the initial buffer size.
  Larger sizes are handled correctly by ZFS, since they are normally
  used on the second ioctl() call once the required buffer size has
  been established by the first ioctl() call. A larger initial buffer
  just removes the need for the initial size estimation in most cases,
  namely where the number of clones is less than ~5000. There is a
  risk that a larger buffer size could lead to an ENOMEM issue when
  allocating the buffer, but the buffer size used is still trivial for
  modern large 64-bit servers running ZFS.
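
  For illustration, here is a minimal C sketch of the grow-on-ENOMEM
  pattern described above. This is not the libzfs source; the struct
  and function names are hypothetical, and only the control flow
  mirrors the mechanism: start with an initial size guess, and if the
  kernel reports ENOMEM, grow the buffer and retry the ioctl().

  #include <errno.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>

  #define INITIAL_BUF_SIZE (256 * 1024)  /* was 16 * 1024 before the fix */

  struct clone_query {                   /* hypothetical command struct */
      uint64_t nv_dst;                   /* userspace buffer address */
      uint64_t nv_dst_size;              /* buffer size in bytes */
  };

  static int fetch_clone_info(int fd, unsigned long request,
                              void **bufp, uint64_t *sizep)
  {
      uint64_t size = INITIAL_BUF_SIZE;
      void *buf = malloc(size);
      struct clone_query q;

      if (buf == NULL)
          return -1;

      for (;;) {
          q.nv_dst = (uint64_t)(uintptr_t)buf;
          q.nv_dst_size = size;
          if (ioctl(fd, request, &q) == 0)
              break;
          if (errno != ENOMEM) {         /* real failure, give up */
              free(buf);
              return -1;
          }
          /*
           * Buffer too small. With a 16K initial size this retry is
           * the common case on pools with many clones; with 256K a
           * single ioctl() suffices for up to roughly ~5000 clones.
           */
          size *= 2;
          void *nbuf = realloc(buf, size);
          if (nbuf == NULL) {
              free(buf);
              return -1;
          }
          buf = nbuf;
      }
      *bufp = buf;
      *sizep = size;
      return 0;
  }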

  [Test case]
  Create 4000 clones. With the fix this takes 35-40% less time than
  without it. See the attached test.sh script for an example of how to
  create this many clones.
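
  The test.sh attachment itself is not reproduced in this digest. As a
  rough stand-in, the following C loop snapshots a base filesystem and
  clones it 4000 times via the zfs CLI; the pool and dataset names
  ("tank/base" and so on) are assumptions for illustration. Timing it
  against the patched and unpatched tools shows the difference:

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      char cmd[128];
      int i;

      /* one base snapshot to clone from (names are assumptions) */
      if (system("zfs snapshot tank/base@img") != 0)
          return 1;

      for (i = 0; i < 4000; i++) {
          snprintf(cmd, sizeof(cmd),
              "zfs clone tank/base@img tank/clone%d", i);
          if (system(cmd) != 0) {
              fprintf(stderr, "clone %d failed\n", i);
              return 1;
          }
      }
      return 0;
  }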


  
  --

  I've been running some scale tests for LXD, and what I've noticed is
  that "zfs clone" gets slower and slower as the zfs filesystem gets
  busier.

  It feels like "zfs clone" requires some kind of pool-wide lock or
  something, and so needs all operations to complete before it can
  clone a new filesystem.

  A basic LXD scale test with btrfs vs zfs shows what I mean; see below
  for the reports.

  The test is run on a completely dedicated physical server with the
  pool on a dedicated SSD; the exact same machine and SSD were used for
  the btrfs test.

  The zfs filesystem is configured with these settings:
   - relatime=on
   - sync=disabled
   - xattr=sa

  So it shouldn't be related to pending sync() calls...

  The workload in this case is ultimately 1024 containers running
  busybox as their init system and udhcpc grabbing an IP. The problem
  gets significantly worse if spawning busier containers, say a full
  Ubuntu system.

  === zfs ===
  root@edfu:~# /home/ubuntu/lxd-benchmark spawn --count=1024 --image=images:alpine/edge/amd64 --privileged=true
  Test environment:
    Server backend: lxd
    Server version: 2.0.0.rc8
    Kernel: Linux
    Kernel architecture: x86_64
    Kernel version: 4.4.0-16-generic
    Storage backend: zfs
    Storage version: 5
    Container backend: lxc
    Container version: 2.0.0.rc15

  Test variables:
    Container count: 1024
    Container mode: privileged
    Image: images:alpine/edge/amd64
    Batches: 128
    Batch size: 8
    Remainder: 0

  [Apr  3 06:42:51.170] Importing image into local store: 64192037277800298d8c19473c055868e0288b039349b1c6579971fe99fdbac7
  [Apr  3 06:42:52.657] Starting the test
  [Apr  3 06:42:53.994] Started 8 containers in 1.336s
  [Apr  3 06:42:55.521] Started 16 containers in 2.864s
  [Apr  3 06:42:58.632] Started 32 containers in 5.975s
  [Apr  3 06:43:05.399] Started 64 containers in 12.742s
  [Apr  3 06:43:20.343] Started 128 containers in 27.686s
  [Apr  3 06:43:57.269] Started 256 containers in 64.612s
  [Apr  3 06:46:09.112] Started 512 containers in 196.455s
  [Apr  3 06:58:19.309] Started 1024 containers in 926.652s
  [Apr  3 06:58:19.309] Test completed in 926.652s

  === btrfs ===
  Test environment:
    Server backend: lxd
    Server version: 2.0.0.rc8
    Kernel: Linux
    Kernel architecture: x86_64
    Kernel version: 4.4.0-16-generic
    Storage backend: btrfs
    Storage version: 4.4
    Container backend: lxc
    Container version: 2.0.0.rc15

  Test variables:
    Container count: 1024
    Container mode: privileged
    Image: images:alpine/edge/amd64
    Batches: 128
    Batch size: 8
    Remainder: 0

  [Apr  3 07:42:12.053] Importing image into local store: 64192037277800298d8c19473c055868e0288b039349b1c6579971fe99fdbac7
  [Apr  3 07:42:13.351] Starting the test
  [Apr  3 07:42:14.793] Started 8 containers in 1.442s
  [Apr  3 07:42:16.495] Started 16 containers in 3.144s
  [Apr  3 07:42:19.881] Started 32 containers in 6.530s
  [Apr  3 07:42:26.798] Started 64 

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-09-04 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.6.5.6-0ubuntu18

---
zfs-linux (0.6.5.6-0ubuntu18) xenial; urgency=medium

  * Improve cloning performance for large numbers of clones (LP: #1567557)
Bump zcmd buffer from 16K to 256K.
  * Correctly install libraries such as libnvpair.so.1 to /lib and not
    /usr/lib, this also fixes boot issues (LP: #1596835)

 -- Colin Ian King   Fri, 11 Aug 2017 08:33:23 +0100

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: Fix Committed => Fix Released

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-09-04 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.6.5.9-5ubuntu4.2

---
zfs-linux (0.6.5.9-5ubuntu4.2) zesty; urgency=medium

  * Improve cloning performance for large numbers of clones (LP: #1567557)
Bump zcmd buffer from 16K to 256K.

 -- Colin Ian King   Thu, 10 Aug 2017 13:48:11 +0100

** Changed in: zfs-linux (Ubuntu Zesty)
   Status: Fix Committed => Fix Released

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-09-04 Thread Łukasz Zemczak
As per comment #24, considering the upload to be tested for zesty as
well ^

** Tags added: verification-done-zesty

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-31 Thread Colin Ian King
@Brian, yes indeed, I have performed exhaustive testing of this on
Xenial, Zesty and Artful. A full exhaustive test takes 5 hours, and I
have performed the upgrade steps and checked them out.

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-31 Thread Brian Murray
Has this also been verified with Ubuntu 17.04? While it's unlikely that
people are upgrading from 16.04 to 17.04, it is an upgrade path, and we
should ensure this fix does not regress there and release these updates
together.

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-22 Thread Stéphane Graber
** No longer affects: lxd (Ubuntu Xenial)

** No longer affects: lxd (Ubuntu Zesty)

** No longer affects: lxd (Ubuntu Artful)

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-22 Thread Stéphane Graber
** No longer affects: lxd (Ubuntu)

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-18 Thread Colin Ian King
Tested with zfs 0.6.5.6-0ubuntu18 on Xenial. Passes the zfs smoke
tests. The performance increase on Xenial is not as significant as on
later releases, but I'm seeing a double-digit percentage increase in
performance with a test of 3000 clones.

** Tags added: verification-done-xenial

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-17 Thread Brian Murray
Hello Stéphane, or anyone else affected,

Accepted zfs-linux into xenial-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.6-0ubuntu18
in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-xenial to verification-done-xenial. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-xenial. In either case, details of your
testing will help us make a better decision.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification. Thank you in
advance!

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: In Progress => Fix Committed

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-10 Thread Brian Murray
Hello Stéphane, or anyone else affected,

Accepted zfs-linux into zesty-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/zfs-linux/0.6.5.9-5ubuntu4.2
in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-zesty to verification-done-zesty. If it does not fix
the bug for you, please add a comment stating that, and change the tag
to verification-failed-zesty. In either case, details of your testing
will help us make a better decision.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification. Thank you in
advance!

** Changed in: zfs-linux (Ubuntu Zesty)
   Status: In Progress => Fix Committed

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-10 Thread Colin Ian King
** Description changed:

+ [SRU Justification]
+ 
+ Creating tens of hundreds of clones can be prohibitively slow. The
+ underlying mechanism to gather clone information is using a 16K buffer
+ which limits performance.  Also, the initial assumption is to pass in
+ zero sized buffer to   the underlying ioctl() to get an idea of the size
+ of the buffer required to fetch information back to userspace.  If we
+ bump the initial buffer to a larger size then we reduce the need for two
+ ioctl calls which improves performance.
+ 
+ [Fix]
+ Bump initial buffer size from 16K to 256K
+ 
+ [Regression Potential]
+ This is minimal, as this is just a tweak to the initial buffer size, and 
larger sizes are handled correctly by ZFS since they are normally used on the 
second ioctl() call once we have established the size of the buffer required 
from the first ioctl() call. Larger initial buffers just remove the need for 
the initial size estimation in most cases where the number of clones is less 
than ~5000.  There is a risk that a larger buffer size could lead to an ENOMEM 
issue when allocating the buffer, but the size of the buffer used is still 
trivial for modern large 64-bit servers running ZFS.
+ 
+ [Test case]
+ Create 4000 clones. With the fix this takes 35-40% less time than without 
the fix. See the test.sh script later in this thread for an example of how to 
create this many clones.
+ 
+ 
+ 
+ --
+ 
  I've been running some scale tests for LXD and what I've noticed is that
  "zfs clone" gets slower and slower as the zfs filesystem is getting
  busier.
  
  It feels like "zfs clone" requires some kind of pool-wide lock or
  something and so needs for all operations to complete before it can
  clone a new filesystem.
  
  A basic LXD scale test with btrfs vs zfs shows what I mean, see below
  for the reports.
  
  The test is run on a completely dedicated physical server with the pool
  on a dedicated SSD, the exact same machine and SSD was used for the
  btrfs test.
  
  The zfs filesystem is configured with those settings:
   - relatime=on
   - sync=disabled
   - xattr=sa
  
  So it shouldn't be related to pending sync() calls...
  
  The workload in this case is ultimately 1024 containers running busybox as 
their init system and udhcpc grabbing an IP.
  The problem gets significantly worse if spawning busier containers, say a 
full Ubuntu system.
  
  === zfs ===
  root@edfu:~# /home/ubuntu/lxd-benchmark spawn --count=1024 
--image=images:alpine/edge/amd64 --privileged=true
  Test environment:
    Server backend: lxd
    Server version: 2.0.0.rc8
    Kernel: Linux
    Kernel architecture: x86_64
    Kernel version: 4.4.0-16-generic
    Storage backend: zfs
    Storage version: 5
    Container backend: lxc
    Container version: 2.0.0.rc15
  
  Test variables:
    Container count: 1024
    Container mode: privileged
    Image: images:alpine/edge/amd64
    Batches: 128
    Batch size: 8
    Remainder: 0
  
  [Apr  3 06:42:51.170] Importing image into local store: 
64192037277800298d8c19473c055868e0288b039349b1c6579971fe99fdbac7
  [Apr  3 06:42:52.657] Starting the test
  [Apr  3 06:42:53.994] Started 8 containers in 1.336s
  [Apr  3 06:42:55.521] Started 16 containers in 2.864s
  [Apr  3 06:42:58.632] Started 32 containers in 5.975s
  [Apr  3 06:43:05.399] Started 64 containers in 12.742s
  [Apr  3 06:43:20.343] Started 128 containers in 27.686s
  [Apr  3 06:43:57.269] Started 256 containers in 64.612s
  [Apr  3 06:46:09.112] Started 512 containers in 196.455s
  [Apr  3 06:58:19.309] Started 1024 containers in 926.652s
  [Apr  3 06:58:19.309] Test completed in 926.652s
  
  === btrfs ===
  Test environment:
    Server backend: lxd
    Server version: 2.0.0.rc8
    Kernel: Linux
    Kernel architecture: x86_64
    Kernel version: 4.4.0-16-generic
    Storage backend: btrfs
    Storage version: 4.4
    Container backend: lxc
    Container version: 2.0.0.rc15
  
  Test variables:
    Container count: 1024
    Container mode: privileged
    Image: images:alpine/edge/amd64
    Batches: 128
    Batch size: 8
    Remainder: 0
  
  [Apr  3 07:42:12.053] Importing image into local store: 
64192037277800298d8c19473c055868e0288b039349b1c6579971fe99fdbac7

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-10 Thread Colin Ian King
** Changed in: zfs-linux (Ubuntu Xenial)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu Zesty)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu Zesty)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: zfs-linux (Ubuntu Xenial)
   Importance: Undecided => Low

** Changed in: zfs-linux (Ubuntu Zesty)
   Importance: Undecided => Low



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-08 Thread Launchpad Bug Tracker
This bug was fixed in the package zfs-linux - 0.6.5.11-1ubuntu2

---
zfs-linux (0.6.5.11-1ubuntu2) artful; urgency=medium

  * Improve cloning performance for large numbers of clones (LP: #1567557)
Bump zcmd buffer from 16K to 256K.

 -- Colin Ian King   Tue, 8 Aug 2017 17:19:21
+0100
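
A quick way to double-check which build a system is running (assuming
zfsutils-linux, the usual binary package built from the zfs-linux
source):

```
# show the installed and candidate versions of the ZFS userspace tools
apt-cache policy zfsutils-linux
```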

** Changed in: zfs-linux (Ubuntu Artful)
   Status: Triaged => Fix Released



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-08 Thread Colin Ian King
I've bumped up the zcmd size to reduce the double ioctl() calls and
we're seeing a 25-40% improvement for clone counts above 1000. This is
an interim fix until we get a better fix from upstream.  I'm going to
SRU this once we're OK with it in Artful.  Attached are the new
benchmarks.


** Attachment added: "Updated benchmarks comparing old and new performance 
stats for large numbers of clones"
   
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1567557/+attachment/4928925/+files/zfs-clone-performance.ods



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-08-08 Thread Colin Ian King
** Also affects: zfs-linux (Ubuntu Artful)
   Importance: Low
 Assignee: Colin Ian King (colin-king)
   Status: Triaged

** Also affects: lxd (Ubuntu Artful)
   Importance: High
   Status: Confirmed

** Also affects: zfs-linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: lxd (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: zfs-linux (Ubuntu Zesty)
   Importance: Undecided
   Status: New

** Also affects: lxd (Ubuntu Zesty)
   Importance: Undecided
   Status: New



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone"

2017-07-20 Thread Colin Ian King
** Summary changed:

- Performance degradation of "zfs clone" when under load
+ Performance degradation of "zfs clone"



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-19 Thread Bug Watch Updater
** Changed in: zfs
   Status: Unknown => New



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-19 Thread Colin Ian King
Based on measurements, clone time grows roughly as a cubic (x^3)
polynomial in the number of clones.  Below 2000 clones it's not too bad,
but over 2000 clones the delay in the ioctl() ramps up pretty quickly.
I've got a fairly solid set of stats and drew some estimates based on
the trend line. See attached datasheet.


** Attachment added: "Spreadsheet of clone times based on number of clones"
   
https://bugs.launchpad.net/zfs/+bug/1567557/+attachment/4917705/+files/zfs-clone-performance.ods



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-19 Thread Colin Ian King
Running strace against zfs create I see the following ioctl() taking the
time:

1500475028.118005 ioctl(3, _IOC(0, 0x5a, 0x12, 0x00), 0x7ffc7c2184f0) = -1 ENOMEM (Cannot allocate memory) <0.390093>
1500475028.508153 mmap(NULL, 290816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd487b000 <0.17>
1500475028.508201 ioctl(3, _IOC(0, 0x5a, 0x12, 0x00), 0x7ffc7c2184f0) = 0 <0.382304>

I believe this is an ioctl on /dev/zfs, namely ZFS_IOC_OBJSET_STATS,
which gets stats on all the zfs file systems. This ioctl takes longer as
the number of clones increases; I believe this is the API call causing
the bottleneck.
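
For reference, a hedged sketch of how to reproduce this kind of trace
(dataset names are placeholders; -ttt prints the epoch timestamps and
-T the per-call durations seen above):

```
# trace only the ioctl() calls issued by a single clone operation
strace -ttt -T -e trace=ioctl \
    zfs clone castiana/testzfs@base castiana/testzfs-demo
```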

perf shows that over 99.9% of the zfs clone is indeed performing this
ioctl:

-   99.39%  0.00%  zfs  [kernel.kallsyms]  [k] sys_ioctl
   - sys_ioctl
       do_vfs_ioctl
     - zfsdev_ioctl
       - 99.33% zfs_ioc_objset_stats
         - 99.30% zfs_ioc_objset_stats_impl.part.20
           - 99.21% dmu_objset_stats
             - dsl_dataset_stats
               - 99.18% get_clones_stat
                 - 60.46% fnvlist_add_nvlist
                     nvlist_add_nvlist
                     nvlist_add_common.part.51
                     nvlist_copy_embedded.isra.54
                     nvlist_copy_pairs.isra.52
                   - nvlist_add_common.part.51
                     - 30.35% nvlist_copy_embedded.isra.54
                         nvlist_copy_pairs.isra.52
                       + nvlist_add_common.part.51
                     - 29.62% nvlist_remove_all.part.49
                         strcmp
                 - 31.23% fnvlist_add_boolean
                   - nvlist_add_boolean
                     - nvlist_add_common.part.51
                       - 30.20% nvlist_remove_all.part.49
                           strcmp
                       0.94% strcmp
                 - 6.37% dsl_dataset_hold_obj
                   - 6.28% dmu_bonus_hold
                     - 5.89% dnode_hold
                       - 5.83% dnode_hold_impl
                         - 5.40% dbuf_read
                           - dmu_zfetch

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Colin Ian King
So compound that with the huge amount of CPU suckage I'm seeing with
lxd...



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Stéphane Graber
That's pure ZFS completely outside of LXD.



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Stéphane Graber
(but doing something pretty similar to what LXD does internally as far
as clones and mountpoint handling)



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Stéphane Graber
I'm seeing similar performance degradation on xenial except that the
server I used seems to be pretty slow to start with too:

root@athos:~# sh test.sh 
Creating 100 clones
Took: 11 seconds (9/s)
Creating 200 clones
Took: 46 seconds (4/s)
Creating 400 clones
Took: 297 seconds (1/s)



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Stéphane Graber
Creating 100 clones
Took: 4 seconds (25/s)
Creating 200 clones
Took: 13 seconds (15/s)
Creating 400 clones
Took: 46 seconds (8/s)
Creating 600 clones
Took: 156 seconds (3/s)


```
#!/bin/sh
# clean up any previous run (the destroy errors harmlessly on first use)
zfs destroy -R castiana/testzfs
rm -Rf /tmp/testzfs

zfs create castiana/testzfs -o mountpoint=none
zfs snapshot castiana/testzfs@base
mkdir /tmp/testzfs

# create ${1} clones of the base snapshot and report the elapsed time
clone() {
    echo "Creating ${1} clones"
    BASE=$(date +%s)
    for i in $(seq ${1}); do
        UUID=$(uuidgen)
        zfs clone castiana/testzfs@base castiana/testzfs-${UUID} \
            -o mountpoint=/tmp/testzfs/${UUID}
    done
    END=$(date +%s)
    TOTAL=$((END-BASE))
    PER=$((${1}/${TOTAL}))
    echo "Took: $TOTAL seconds ($PER/s)"
}

clone 100
clone 200
clone 400
clone 600
```

That's on an up-to-date artful system. I'll do a similar test on an
up-to-date xenial system.



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Colin Ian King
Looks like there is a lot of contention on a futex and on the underlying
database too. perf record and perf report show that most of the time is
spent in lxd itself, on database access and futex lock contention. Once
those are resolved, I'll be happy to re-analyze the entire ZFS + lxd
stack, but I believe the main issue is lxd eating cycles, not ZFS per
se.
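
The profile below was captured with perf. For anyone wanting to reproduce it, a minimal sketch using standard perf options follows (the exact invocation used here isn't recorded in the thread; this assumes system-wide sampling while the benchmark runs):

```
# Sample all CPUs with call graphs for 60 seconds while the benchmark runs
sudo perf record -a -g -- sleep 60

# Browse the hottest call chains from the recorded data
sudo perf report
```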

-   27.21%  1.05%  lxd  [kernel.kallsyms]       [k] entry_SYSCALL_64_fastpath
   - 26.16% entry_SYSCALL_64_fastpath
      - 19.28% sys_futex
         - 18.91% do_futex
            - 12.70% futex_wake
               + 9.91% wake_up_q
               + 0.90% _raw_spin_lock
            + 5.00% futex_wait
      + 1.48% sys_select
      + 0.73% sys_newstat
      + 0.57% syscall_return_slowpath
      + 0.54% sys_pread64
-   25.16%  0.00%  lxd  libgolang-1.8-std.so.1  [.] runtime.goexit
   - runtime.goexit
      - 14.98% main.(*operation).Run.func1
         - 13.92% main.createFromImage.func1
            - 13.86% main.containerCreateFromImage
               - 12.26% main.containerCreateInternal
                  - 12.24% main.containerLXCCreate
                     - 11.30% main.networkUpdateStatic
                        - 9.16% main.containerLoadByName
                           - 4.79% main.dbContainerGet
                              + 2.15% main.dbContainerConfig
                              + 1.13% main.dbQueryRowScan
                              + 0.82% main.dbContainerProfiles
                              + 0.59% main.dbDevices
                           + 4.33% main.containerLXCLoad
                        + 1.98% main.(*containerLXC).fillNetworkDevice
                     + 0.86% main.(*containerLXC).ConfigKeySet
                  + 0.70% main.dbImageGet
                    0.67% main.(*storageZfs).ContainerCreateFromImage
         + 1.05% main.containerStatePut.func1
      + 2.37% main.deviceEventListener
      + 2.21% runtime.mcall

[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Colin Ian King
Running health-check against lxd, I see a lot of contention on a futex
and a large number of context switches going on in lxd.
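
For reference, an invocation along these lines would attach health-check to the running daemon (the -p and -d flags are assumptions about the tool's interface, not taken from this thread):

```
# Monitor the lxd daemon for 60 seconds and summarise syscalls,
# context switches and polling behaviour (flags assumed)
sudo health-check -p "$(pidof lxd)" -d 60
```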




** Attachment added: "output from health-check when examining lxd"
   
https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1567557/+attachment/4909747/+files/health-check-lxd-with-zfs.log



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Colin Ian King
I've now compared ZFS with BTRFS and the overall CPU load profile from
lxd in both scenarios is surprisingly similar - lxd is eating a lot of
CPU, and I think the underlying issue is there rather than a
fundamentally broken performance path in ZFS.

See the attached data sheet; the CPU consumption graphs for lxd in both
scenarios show that lxd's CPU consumption grows with the number of
container instances, and this seems to be throttling the ZFS backing
store. Why is lxd eating so much CPU?


** Attachment added: "lxd-zfs-btrfs-comparisons.ods"
   
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1567557/+attachment/4909746/+files/lxd-zfs-btrfs-comparisons.ods

** Also affects: lxd (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: zfs-linux (Ubuntu)
   Status: Incomplete => In Progress



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Colin Ian King
After further load tests, I've found the same anomaly with btrfs. I'm
using an 80GB btrfs raw device, 16GB of memory and 16 CPU threads:

BTRFS:
  Time (s)   Containers
    24.026           32
    54.584           64
   125.520          128
   272.835          256
   809.038          512
  3451.172         1024

So the pinch point happens later than with ZFS, but there seems to be a
scaling issue here too.
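
A minimal sketch of the kind of btrfs reproducer this implies, modelled on the earlier zfs script (the /mnt/btrfs mount point and the counts are assumptions; adjust to the filesystem under test):

```
#!/bin/sh
# Create a base subvolume to snapshot from
btrfs subvolume create /mnt/btrfs/testbtrfs

# Take $1 snapshots of the base subvolume and report the rate
clone() {
    echo "Creating ${1} snapshots"
    BASE=$(date +%s)
    for i in $(seq ${1}); do
        btrfs subvolume snapshot /mnt/btrfs/testbtrfs \
            "/mnt/btrfs/testbtrfs-$(uuidgen)"
    done
    TOTAL=$(($(date +%s)-BASE))
    [ "${TOTAL}" -eq 0 ] && TOTAL=1  # avoid division by zero
    echo "Took: $TOTAL seconds ($((${1}/${TOTAL}))/s)"
}

clone 32
clone 64
clone 128
clone 256
clone 512
clone 1024
```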



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Colin Ian King
Most of the CPU being consumed is in strcmp(). As the trace below shows,
each nvlist_add_common() call goes through nvlist_remove_all(), which
strcmp()-scans the existing pairs to drop any duplicate name, so building
the clones nvlist is roughly quadratic in the number of clones.

-   99.39%  0.00%  zfs  [kernel.kallsyms]  [k] zfsdev_ioctl
   - 99.39% zfsdev_ioctl
      - 99.34% zfs_ioc_objset_stats
         - 99.22% zfs_ioc_objset_stats_impl.part.20
            - 99.01% dmu_objset_stats
               - 99.00% dsl_dataset_stats
                  - 98.98% get_clones_stat
                     - 62.55% fnvlist_add_nvlist
                          nvlist_add_nvlist
                          nvlist_add_common.part.51
                          nvlist_copy_embedded.isra.54
                        - nvlist_copy_pairs.isra.52
                           - 62.52% nvlist_add_common.part.51
                              - 31.25% nvlist_copy_embedded.isra.54
                                 - nvlist_copy_pairs.isra.52
                                    - 31.23% nvlist_add_common.part.51
                                       - 30.78% nvlist_remove_all.part.49
                                            strcmp
                              - 30.86% nvlist_remove_all.part.49
                                   strcmp
                     - 31.73% fnvlist_add_boolean
                        - nvlist_add_boolean
                           - 31.72% nvlist_add_common.part.51
                              - 31.33% nvlist_remove_all.part.49
                                   strcmp
                     + 4.16% dsl_dataset_hold_obj


[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-07-05 Thread Colin Ian King
I've created 4096 zfs clones using the same commands that lxd uses, and
the biggest pinch point I observe is fetching the zfs clone stats: each
fetch has to lock with dsl_dataset_hold_obj, gather the data, and then
unlock with dsl_dataset_rele. Traversing the clones is linear and slow,
taking ~1 second with 4000+ clones.
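
The stats fetch can be exercised in isolation through the snapshot's read-only "clones" property; a small sketch against the dataset from the earlier script (timings will of course vary with clone count):

```
# Fetch the clone list for the base snapshot; ZFS walks every clone
# to build this, which is where the time goes
time zfs get -H -o value clones castiana/testzfs@base
```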

+   99.22%  0.00%  zfs  [kernel.kallsyms]  [k] entry_SYSCALL_64_fastpath
-   99.14%  0.00%  zfs  [kernel.kallsyms]  [k] zfsdev_ioctl
   - zfsdev_ioctl
      - 99.08% zfs_ioc_objset_stats
         - 98.87% zfs_ioc_objset_stats_impl.part.20
            - 98.81% dmu_objset_stats
               - 98.81% dsl_dataset_stats
                  - 98.76% get_clones_stat
                     - 63.56% fnvlist_add_nvlist
                          nvlist_add_nvlist
                          nvlist_add_common.part.51
                          nvlist_copy_embedded.isra.54
                        + nvlist_copy_pairs.isra.52
                     + 32.74% fnvlist_add_boolean
                     + 1.81% dsl_dataset_hold_obj


[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-05-31 Thread Stéphane Graber
I'm trying to remember whether we had to bump any sysctls to actually
reach 1024 containers; I don't think any of the usual suspects come into
play until you reach 2000+ Alpine containers though.

If you do run out of some kernel resources, you can try applying the following 
sysctls to get you past that:
net.ipv6.neigh.default.gc_thresh3 = 8192
net.ipv6.neigh.default.gc_thresh2 = 4096
net.ipv6.neigh.default.gc_thresh1 = 1024
net.ipv6.route.gc_thresh = 8192
kernel.pty.max = 65536
kernel.pid_max = 2097152
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
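
A sketch of how these could be applied, either immediately or persistently across reboots (the drop-in file name is arbitrary, and only a subset of the settings is repeated for brevity):

```
# Apply a single setting immediately
sudo sysctl -w fs.inotify.max_user_instances=8192

# Persist the settings across reboots via a sysctl drop-in
sudo tee /etc/sysctl.d/99-lxd-scale.conf <<'EOF'
net.ipv6.neigh.default.gc_thresh3 = 8192
kernel.pid_max = 2097152
fs.inotify.max_user_instances = 8192
EOF
sudo sysctl --system
```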



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-05-31 Thread Stéphane Graber
Our test machines aren't particularly impressive, just 12GB of RAM or so.
Note that, as can be seen above, we're using Alpine (busybox) images
rather than Ubuntu to limit resource usage and fit a lot more containers
per system.



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-05-31 Thread Colin Ian King
I've tried to reproduce this, but once I pass 256 containers I run out
of resources. What kind of configuration are you using to test this?
Once I can reproduce it, I'll try to see what the performance constraint
is.

** Changed in: zfs-linux (Ubuntu)
   Status: In Progress => Incomplete



[Kernel-packages] [Bug 1567557] Re: Performance degradation of "zfs clone" when under load

2017-04-20 Thread Colin Ian King
** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu)
 Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Status: New => In Progress
