[Bug 1833319] Re: Performance degradation when copying from LVM snapshot backed by NVMe disk

2019-07-24 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-157.185

---
linux (4.4.0-157.185) xenial; urgency=medium

  * linux: 4.4.0-157.185 -proposed tracker (LP: #1837476)

  * systemd 229-4ubuntu21.22 ADT test failure with linux 4.4.0-156.183 (storage)
(LP: #1837235)
- Revert "block/bio: Do not zero user pages"
- Revert "block: Clear kernel memory before copying to user"
- Revert "bio_copy_from_iter(): get rid of copying iov_iter"

linux (4.4.0-156.183) xenial; urgency=medium

  * linux: 4.4.0-156.183 -proposed tracker (LP: #1836880)

  * BCM43602 802.11ac Wireless regression - PCI ID 14e4:43ba (LP: #1836801)
- brcmfmac: add eth_type_trans back for PCIe full dongle

linux (4.4.0-155.182) xenial; urgency=medium

  * linux: 4.4.0-155.182 -proposed tracker (LP: #1834918)

  * Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
- geneve: correctly handle ipv6.disable module parameter

  * Kernel modules generated incorrectly when system is localized to a non-
English language (LP: #1828084)
- scripts: override locale from environment when running recordmcount.pl

  * Handle overflow in proc_get_long of sysctl (LP: #1833935)
- sysctl: handle overflow in proc_get_long

  * Xenial update: 4.4.181 upstream stable release (LP: #1832661)
- x86/speculation/mds: Revert CPU buffer clear on double fault exit
- x86/speculation/mds: Improve CPU buffer clear documentation
- ARM: exynos: Fix a leaked reference by adding missing of_node_put
- crypto: vmx - fix copy-paste error in CTR mode
- crypto: crct10dif-generic - fix use via crypto_shash_digest()
- crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
- ALSA: usb-audio: Fix a memory leak bug
- ALSA: hda/hdmi - Consider eld_valid when reporting jack event
- ALSA: hda/realtek - EAPD turn on later
- ASoC: max98090: Fix restore of DAPM Muxes
- ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
- mm/mincore.c: make mincore() more conservative
- ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
- mfd: da9063: Fix OTP control register names to match datasheets for
  DA9063/63L
- tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
- ext4: actually request zeroing of inode table after grow
- ext4: fix ext4_show_options for file systems w/o journal
- Btrfs: do not start a transaction at iterate_extent_inodes()
- bcache: fix a race between cache register and cacheset unregister
- bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
- ipmi:ssif: compare block number correctly for multi-part return messages
- crypto: gcm - Fix error return code in crypto_gcm_create_common()
- crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
- crypto: chacha20poly1305 - set cra_name correctly
- crypto: salsa20 - don't access already-freed walk.iv
- crypto: arm/aes-neonbs - don't access already-freed walk.iv
- writeback: synchronize sync(2) against cgroup writeback membership
  switches
- fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going
  into workqueue when umount
- ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
- KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
- net: avoid weird emergency message
- net/mlx4_core: Change the error print to info print
- ppp: deflate: Fix possible crash in deflate_init
- tipc: switch order of device registration to fix a crash
- tipc: fix modprobe tipc failed after switch order of device registration
- stm class: Fix channel free in stm output free path
- md: add mddev->pers to avoid potential NULL pointer dereference
- intel_th: msu: Fix single mode with IOMMU
- of: fix clang -Wunsequenced for be32_to_cpu()
- cifs: fix strcat buffer overflow and reduce raciness in
  smb21_set_oplock_level()
- media: ov6650: Fix sensor possibly not detected on probe
- NFS4: Fix v4.0 client state corruption when mount
- clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
- fuse: fix writepages on 32bit
- fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
- iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
- ceph: flush dirty inodes before proceeding with remount
- tracing: Fix partial reading of trace event's id file
- memory: tegra: Fix integer overflow on tick value calculation
- perf intel-pt: Fix instructions sampling rate
- perf intel-pt: Fix improved sample timestamp
- perf intel-pt: Fix sample timestamp wrt non-taken branches
- fbdev: sm712fb: fix brightness control on reboot, don't set SR30
- fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75
- fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F
- fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA
- fbdev: sm712fb: fix crashes during framebuffer writes by correctly 

[Bug 1833319] Re: Performance degradation when copying from LVM snapshot backed by NVMe disk

2019-07-07 Thread Matthew Ruffell
I enabled -proposed and installed 4.4.0-1088-aws #99-Ubuntu, and again
went through the test case on a c5.large instance on aws.

The problem is solved: performance is restored and is the same as that of
a non-snapshot mounted disk.

Again, we can see merging has been enabled with:

$ cat /sys/block/nvme1n1/queue/nomerges
0
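
For completeness, merge activity can also be observed directly while the copy runs. This is only a sketch of an additional check, not part of the original test case; the device name and snapshot mount point are taken from the test case in the bug description:

# In one terminal, watch the read-request merge rate (rrqm/s) and the
# average request size reported for the NVMe device:
$ iostat -x nvme1n1 1

# In another terminal, repeat the snapshot copy from the test case:
$ cat /tmp/mount.backup_OXV/dummy2 1> /dev/null

# On the fixed kernel, rrqm/s should be clearly non-zero and the average
# request size should approach 512 sectors; on the affected kernel both
# stay small.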

The problem is fixed. I am happy to mark the verification status as done.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1833319

Title:
  Performance degradation when copying from LVM snapshot backed by NVMe
  disk

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1833319/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1833319] Re: Performance degradation when copying from LVM snapshot backed by NVMe disk

2019-07-03 Thread Matthew Ruffell
I enabled -proposed and installed 4.4.0-155.182, and went through the
test case on a c5.large instance on aws. Note that I used the -generic kernel,
since the -aws kernel doesn't seem to be ready yet.

The problem is solved and performance is the same as non-snapshot
mounted disks.

We can see that merging has been enabled by looking at the flag:

$ cat /sys/block/nvme1n1/queue/nomerges
0
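
For reference, according to the kernel's block queue sysfs documentation, nomerges takes three values: 0 enables all merge algorithms (the default), 1 attempts only simple one-hit merges, and 2 disables merging entirely. A rough sketch of inspecting and temporarily toggling the flag (the setting does not persist across reboots):

# Show the current merge policy for the device
$ cat /sys/block/nvme1n1/queue/nomerges
0

# Disable merging, e.g. to reproduce the unmerged behaviour on a fixed kernel
$ echo 2 | sudo tee /sys/block/nvme1n1/queue/nomerges

# Restore the default
$ echo 0 | sudo tee /sys/block/nvme1n1/queue/nomerges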

The problem is fixed. Changing tag to verified.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

[Bug 1833319] Re: Performance degradation when copying from LVM snapshot backed by NVMe disk

2019-07-03 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
xenial' to 'verification-done-xenial'. If the problem still exists,
change the tag 'verification-needed-xenial' to 'verification-failed-
xenial'.

If verification is not done within 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!
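
For Xenial, enabling -proposed and installing the kernel under test typically looks something like the minimal sketch below; the wiki page above is the authoritative reference, and the exact package names depend on the flavour being verified (e.g. -generic vs -aws):

# Add the xenial-proposed pocket
$ echo "deb http://archive.ubuntu.com/ubuntu/ xenial-proposed main restricted universe multiverse" | sudo tee /etc/apt/sources.list.d/xenial-proposed.list

# (The wiki also recommends pinning -proposed to a low priority so that a
# routine apt-get upgrade does not pull everything from -proposed.)

$ sudo apt-get update

# Install only the kernel packages from -proposed, e.g. the generic flavour
$ sudo apt-get install -t xenial-proposed linux-image-generic linux-headers-generic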


** Tags added: verification-needed-xenial

[Bug 1833319] Re: Performance degradation when copying from LVM snapshot backed by NVMe disk

2019-06-26 Thread Khaled El Mously
** Changed in: linux (Ubuntu Xenial)
   Status: In Progress => Fix Committed

[Bug 1833319] Re: Performance degradation when copying from LVM snapshot backed by NVMe disk

2019-06-18 Thread Matthew Ruffell
** Description changed:

  BugLink: https://bugs.launchpad.net/bugs/1833319
  
  [Impact]
  When copying files from a mounted LVM snapshot which resides on NVMe storage
  devices, there is a massive performance degradation in the rate at which
  sectors are read from the disk.
  
  The kernel is not merging sector requests; instead it is issuing many small
  sector requests to the NVMe storage controller rather than one larger request.
  
  Experiments have shown a 14x-25x performance degradation in reads: copies
  which used to take seconds now take minutes, and copies which took thirty
  minutes now take many hours.
  
- [Fix]
- 
  The following was found with btrace, running alongside cat (see
  Testing):
- 
- Standard lvm copy:
- $ cat /mnt/dummy1 1> /dev/null
- LVM snapshot copy:
- $ cat /tmp/mount.backup_OXV/dummy2 1> /dev/null
- 
- Tracing:
- # btrace /dev/nvme1n1 > trace.data
- 
- Looking at the "control" case, of copying from /mnt, which is the
- standard lvm volume, we see a trace like:
- 
- 259,01   13 0.002545516  1579  A   R 280576 + 512 <- (252,0) 278528
- 259,01   14 0.002545701  1579  Q   R 280576 + 512 [cat]
- 259,01   15 0.002547020  1579  G   R 280576 + 512 [cat]
- 259,01   16 0.002547631  1579  U   N [cat] 1
- 259,01   17 0.002547775  1579  I  RS 280576 + 512 [cat]
- 259,01   18 0.002551381  1579  D  RS 280576 + 512 [cat]
- 259,01   19 0.004099666 0  C  RS 280576 + 512 [0]
  
  A = IO remapped to different device
  Q = IO handled by request queue
  G = Get request
  U = Unplug request
  I = IO inserted onto request queue
  D = IO issued to driver
  C = IO completion
  
- Firstly, the request is mapped from a different device, from /mnt which
- is dm-1 to the nvme disk. A 512 sector read is placed on the IO request
- queue, where it is then inserted into the driver request queue and then
- the driver is commanded to fetch the data, and then it completes.
- 
- Now, when reading from the LVM snapshot, we see:
+ When reading from the LVM snapshot, we see:
  
  259,01  113 0.001117160  1606  A   R 837872 + 8 <- (252,0) 835824
  259,01  114 0.001117276  1606  Q   R 837872 + 8 [cat]
  259,01  115 0.001117451  1606  G   R 837872 + 8 [cat]
  259,01  116 0.001117979  1606  A   R 837880 + 8 <- (252,0) 835832
  259,01  117 0.001118119  1606  Q   R 837880 + 8 [cat]
  259,01  118 0.001118285  1606  G   R 837880 + 8 [cat]
  259,01  122 0.001121613  1606  I  RS 837640 + 8 [cat]
  259,01  123 0.001121687  1606  I  RS 837648 + 8 [cat]
  259,01  124 0.001121758  1606  I  RS 837656 + 8 [cat]
  ...
  259,01  154 0.001126118   377  D  RS 837648 + 8 [kworker/1:1H]
  259,01  155 0.001126445   377  D  RS 837656 + 8 [kworker/1:1H]
  259,01  156 0.001126871   377  D  RS 837664 + 8 [kworker/1:1H]
  ...
  259,01  183 0.001848512 0  C  RS 837632 + 8 [0]
  
  What is happening here is that the 8 sector read requests are placed onto
  the IO request queue and then inserted one at a time into the driver
  request queue, where each is fetched by the driver individually.
  
- Comparing this behaviour to reading data from a LVM snapshot on 4.6 mainline+
- or the Ubuntu 4.15 HWE kernel:
+ Comparing this behaviour to reading data from a LVM snapshot on 4.6+
+ mainline or the Ubuntu 4.15 HWE kernel:
  
  M = IO back merged with request on queue
  
  259,00  194 0.000532515  1897  A   R 7358960 + 8 <- (253,0) 7356912
  259,00  195 0.000532634  1897  Q   R 7358960 + 8 [cat]
  259,00  196 0.000532810  1897  M   R 7358960 + 8 [cat]
  259,00  197 0.000533864  1897  A   R 7358968 + 8 <- (253,0) 7356920
  259,00  198 0.000533991  1897  Q   R 7358968 + 8 [cat]
  259,00  199 0.000534177  1897  M   R 7358968 + 8 [cat]
  259,00  200 0.000534474  1897 UT   N [cat] 1
  259,00  201 0.000534586  1897  I   R 7358464 + 512 [cat]
  259,00  202 0.000537055  1897  D   R 7358464 + 512 [cat]
  259,00  203 0.002242539 0  C   R 7358464 + 512 [0]
  
  This shows us an 8 sector read is added to the request queue and is then
  [M]erged backward with other requests on the queue until the sum of all of
  those merged requests reaches 512 sectors. From there, the 512 sector read
  is placed onto the IO queue, where it is fetched by the device driver, and
  completes.
  
- The problem is that the 4.4 xenial kernel is not merging 8 sector
- requests.
+ [Fix]
  
- I came across this bugzilla entry,
+ The problem is that the NVMe driver in the 4.4 xenial kernel is not merging
+ 8 sector requests.
  
- https://bugzilla.kernel.org/show_bug.cgi?id=117051
- 
- and we see that merging is controlled by a sysfs entry,
+ Merging is controlled per device by this sysfs entry:
  /sys/block/nvme1n1/queue/nomerges
  
 

[Bug 1833319] Re: Performance degradation when copying from LVM snapshot backed by NVMe disk

2019-06-18 Thread Matthew Ruffell
** Description changed:

- BugLink:
+ BugLink: https://bugs.launchpad.net/bugs/1833319
  
  [Impact]
- When copying files from a mounted LVM snapshot which resides on NVMe storage
- devices, there is a massive performance degradation in the rate sectors are 
- read from the disk.
+ When copying files from a mounted LVM snapshot which resides on NVMe storage
+ devices, there is a massive performance degradation in the rate at which
+ sectors are read from the disk.
  
  The kernel is not merging sector requests; instead it is issuing many small
  sector requests to the NVMe storage controller rather than one larger request.
  
- Experiments have shown a 14x-25x performance degradation in reads, where copies
- used to take seconds, now take minutes, and copies which took thirty minutes
- now take many hours.
- 
+ Experiments have shown a 14x-25x performance degradation in reads: copies
+ which used to take seconds now take minutes, and copies which took thirty
+ minutes now take many hours.
  
  [Fix]
  
  The following was found with btrace, running alongside cat (see
  Testing):
  
  Standard lvm copy:
  $ cat /mnt/dummy1 1> /dev/null
  LVM snapshot copy:
  $ cat /tmp/mount.backup_OXV/dummy2 1> /dev/null
  
  Tracing:
  # btrace /dev/nvme1n1 > trace.data
  
- Looking at the "control" case, of copying from /mnt, which is the standard lvm
- volume, we see a trace like:
+ Looking at the "control" case, of copying from /mnt, which is the
+ standard lvm volume, we see a trace like:
  
  259,01   13 0.002545516  1579  A   R 280576 + 512 <- (252,0) 278528
  259,01   14 0.002545701  1579  Q   R 280576 + 512 [cat]
  259,01   15 0.002547020  1579  G   R 280576 + 512 [cat]
  259,01   16 0.002547631  1579  U   N [cat] 1
  259,01   17 0.002547775  1579  I  RS 280576 + 512 [cat]
  259,01   18 0.002551381  1579  D  RS 280576 + 512 [cat]
  259,01   19 0.004099666 0  C  RS 280576 + 512 [0]
  
  A = IO remapped to different device
  Q = IO handled by request queue
  G = Get request
  U = Unplug request
  I = IO inserted onto request queue
  D = IO issued to driver
  C = IO completion
  
- Firstly, the request is mapped from a different device, from /mnt which is dm-1
- to the nvme disk. A 512 sector read is placed on the IO request queue, where it
- is then inserted into the driver request queue and then the driver is commanded
- to fetch the data, and then it completes.
+ Firstly, the request is mapped from a different device, from /mnt which
+ is dm-1 to the nvme disk. A 512 sector read is placed on the IO request
+ queue, where it is then inserted into the driver request queue and then
+ the driver is commanded to fetch the data, and then it completes.
  
  Now, when reading from the LVM snapshot, we see:
  
  259,01  113 0.001117160  1606  A   R 837872 + 8 <- (252,0) 835824
  259,01  114 0.001117276  1606  Q   R 837872 + 8 [cat]
  259,01  115 0.001117451  1606  G   R 837872 + 8 [cat]
  259,01  116 0.001117979  1606  A   R 837880 + 8 <- (252,0) 835832
  259,01  117 0.001118119  1606  Q   R 837880 + 8 [cat]
  259,01  118 0.001118285  1606  G   R 837880 + 8 [cat]
  259,01  122 0.001121613  1606  I  RS 837640 + 8 [cat]
  259,01  123 0.001121687  1606  I  RS 837648 + 8 [cat]
  259,01  124 0.001121758  1606  I  RS 837656 + 8 [cat]
  ...
  259,01  154 0.001126118   377  D  RS 837648 + 8 [kworker/1:1H]
  259,01  155 0.001126445   377  D  RS 837656 + 8 [kworker/1:1H]
  259,01  156 0.001126871   377  D  RS 837664 + 8 [kworker/1:1H]
  ...
  259,01  183 0.001848512 0  C  RS 837632 + 8 [0]
  
- Now what is happening here, is that a request for 8 sector read is placed onto
- the IO request queue, and is then inserted one at a time to the driver request
- queue and then fetched by the driver.
+ What is happening here is that the 8 sector read requests are placed onto
+ the IO request queue and then inserted one at a time into the driver
+ request queue, where each is fetched by the driver individually.
  
  Comparing this behaviour to reading data from a LVM snapshot on 4.6 mainline+
  or the Ubuntu 4.15 HWE kernel:
  
  M = IO back merged with request on queue
  
  259,00  194 0.000532515  1897  A   R 7358960 + 8 <- (253,0) 7356912
  259,00  195 0.000532634  1897  Q   R 7358960 + 8 [cat]
  259,00  196 0.000532810  1897  M   R 7358960 + 8 [cat]
  259,00  197 0.000533864  1897  A   R 7358968 + 8 <- (253,0) 7356920
  259,00  198 0.000533991  1897  Q   R 7358968 + 8 [cat]
  259,00  199 0.000534177  1897  M   R 7358968 + 8 [cat]
  259,00  200 0.000534474  1897 UT   N [cat] 1
  259,00  201 0.000534586  1897  I   R 7358464 + 512 [cat]
  259,00  202 0.000537055  1897  D   R 7358464 + 512 [cat]
  259,0