[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-12-06 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.8.0-30.32

---
linux (4.8.0-30.32) yakkety; urgency=low

  * CVE-2016-8655 (LP: #1646318)
- packet: fix race condition in packet_set_ring

 -- Brad Figg   Thu, 01 Dec 2016 08:02:53 -0800

** Changed in: linux (Ubuntu Zesty)
   Status: In Progress => Fix Released

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2016-8655

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1638700

Title:
  hio: SSD data corruption under stress test

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released
Status in linux source package in Zesty:
  Fix Released

Bug description:
  {forward from James Troup}:

  Just to follow up on this with a little more information: we have now
  reproduced this in the following scenarios:

   * Ubuntu kernel 4.4 (i.e. 16.04) and kernel 4.8 (i.e. HWE-Y)
   * With and without Bcache involved
   * With both XFS and ext4
   * With HIO driver versions 2.1.0-23 and 2.1.0-25
   * With HIO Firmware 640 and 650
   * With and without the following two patches
- https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=7290fa97b945c288d8dd8eb8f284b98cb495b35b
- https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=901a3142db778ddb9ed6a9000ce8e5b0f66c48ba

  In all cases, we applied the following two patches in order to get hio
  to build at all with a 4.4 or later kernel:


https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=0abbb90372847caeeedeaa9db0f21e05ad8e9c74

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=a0705c5ff3d12fc31f18f5d3c8589eaaed1aa577

  We've confirmed that we can reproduce the corruption on any machine in
  Tele2's Vienna facility.

  We've confirmed that, on all but one machine, the 'hio_info' command
  reports the health as 'OK'.

  Our most common reproducer is one of two scenarios:

   a) http://paste.ubuntu.com/23405150/

   b) http://paste.ubuntu.com/23405234/

  In the last example, it's possible to trigger the corruption faster by
  increasing the 'count' argument to dd, and to avoid it by lowering it.
  E.g. on the machine I'm currently testing on, count=52450 doesn't
  appear to show corruption, but even a count of 53000 shows it
  immediately, every time.

  I hope this helps - please let us know what further information we can
  provide to debug this problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1638700/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-29 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-51.72

---
linux (4.4.0-51.72) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
- LP: #1644611

  * 4.4.0-1037-snapdragon #41: kernel panic on boot (LP: #1644596)
- Revert "dma-mapping: introduce the DMA_ATTR_NO_WARN attribute"
- Revert "powerpc: implement the DMA_ATTR_NO_WARN attribute"
- Revert "nvme: use the DMA_ATTR_NO_WARN attribute"

linux (4.4.0-50.71) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
- LP: #1644169

  * xenial 4.4.0-49.70 kernel breaks LXD userspace (LP: #1644165)
- Revert "UBUNTU: SAUCE: (namespace) fuse: Allow user namespace mounts by
  default"
- Revert "UBUNTU: SAUCE: (namespace) fs: Don't remove suid for CAP_FSETID 
for
  userns root"
- Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Don't remove suid for
  CAP_FSETID in s_user_ns""
- Revert "UBUNTU: SAUCE: (namespace) fs: Allow superblock owner to change
  ownership of inodes"
- Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Allow superblock owner to
  change ownership of inodes with unmappable ids""
- Revert "UBUNTU: SAUCE: (namespace) security/integrity: Harden against
  malformed xattrs"
- Revert "(namespace) Revert "UBUNTU: SAUCE: ima/evm: Allow root in 
s_user_ns
  to set xattrs""
- Revert "(namespace) dquot: For now explicitly don't support filesystems
  outside of init_user_ns"
- Revert "(namespace) quota: Handle quota data stored in s_user_ns in
  quota_setxquota"
- Revert "(namespace) quota: Ensure qids map to the filesystem"
- Revert "(namespace) Revert "UBUNTU: SAUCE: quota: Convert ids relative to
  s_user_ns""
- Revert "(namespace) Revert "UBUNTU: SAUCE: quota: Require that qids passed
  to dqget() be valid and map into s_user_ns""
- Revert "(namespace) vfs: Don't create inodes with a uid or gid unknown to
  the vfs"
- Revert "(namespace) vfs: Don't modify inodes with a uid or gid unknown to
  the vfs"
- Revert "UBUNTU: SAUCE: (namespace) fuse: Translate ids in posix acl 
xattrs"
- Revert "UBUNTU: SAUCE: (namespace) posix_acl: Export
  posix_acl_fix_xattr_userns() to modules"
- Revert "(namespace) vfs: Verify acls are valid within superblock's
  s_user_ns."
- Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Update posix_acl support to
  handle user namespace mounts""
- Revert "(namespace) fs: Refuse uid/gid changes which don't map into
  s_user_ns"
- Revert "(namespace) Revert "UBUNTU: SAUCE: fs: Refuse uid/gid changes 
which
  don't map into s_user_ns""
- Revert "(namespace) mnt: Move the FS_USERNS_MOUNT check into sget_userns"

linux (4.4.0-49.70) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
- LP: #1640921

  * Infiniband driver (kernel module) needed for Azure (LP: #1641139)
- SAUCE: RDMA Infiniband for Windows Azure
- [Config] CONFIG_HYPERV_INFINIBAND_ND=m
- SAUCE: Makefile RDMA infiniband driver for Windows Azure
- [Config] Add hv_network_direct.ko to generic inclusion list
- SAUCE: RDMA Infiniband for Windows Azure is dependent on amd64

linux (4.4.0-48.69) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
- LP: #1640758

  * lxc-attach to malicious container allows access to host (LP: #1639345)
- Revert "UBUNTU: SAUCE: (noup) ptrace: being capable wrt a process requires
  mapped uids/gids"
- (upstream) mm: Add a user_ns owner to mm_struct and fix ptrace permission
  checks

  * take 'P' command from upstream xmon (LP: #1637978)
- powerpc/xmon: Add xmon command to dump process/task similar to ps(1)

  * zfs: importing zpool with vdev on zvol hangs kernel (LP: #1636517)
- SAUCE: (noup) Update zfs to 0.6.5.6-0ubuntu15

  * I2C touchpad does not work on AMD platform (LP: #1612006)
- pinctrl/amd: Configure GPIO register using BIOS settings
- pinctrl/amd: switch to using a bool for level

  * [LTCTest] vfio_pci not loaded on Ubuntu 16.10 by default (LP: #1636733)
- [Config] CONFIG_VFIO_PCI=y for ppc64el

  * QEMU throws failure msg while booting guest with SRIOV VF (LP: #1630554)
- KVM: PPC: Always select KVM_VFIO, plus Makefile cleanup

  * Allow fuse user namespace mounts by default in xenial (LP: #1634964)
- (namespace) mnt: Move the FS_USERNS_MOUNT check into sget_userns
- (namespace) Revert "UBUNTU: SAUCE: fs: Refuse uid/gid changes which don't
  map into s_user_ns"
- (namespace) fs: Refuse uid/gid changes which don't map into s_user_ns
- (namespace) Revert "UBUNTU: SAUCE: fs: Update posix_acl support to handle
  user namespace mounts"
- (namespace) vfs: Verify acls are valid within superblock's s_user_ns.
- SAUCE: (namespace) posix_acl: Export posix_acl_fix_xattr_userns() to modules
- SAUCE: (namespace) fuse: Translate ids in posix acl xattrs
- (namespace) vfs: Don't modify inodes with a uid or gid unknown to the vfs

[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-29 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.8.0-28.30

---
linux (4.8.0-28.30) yakkety; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
- LP: #1641083

  * lxc-attach to malicious container allows access to host (LP: #1639345)
- Revert "UBUNTU: SAUCE: (noup) ptrace: being capable wrt a process requires
  mapped uids/gids"
- (upstream) mm: Add a user_ns owner to mm_struct and fix ptrace permission
  checks

  * [Feature] AVX-512 new instruction sets (avx512_4vnniw, avx512_4fmaps)
(LP: #1637526)
- x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features

  * zfs: importing zpool with vdev on zvol hangs kernel (LP: #1636517)
- SAUCE: (noup) Update zfs to 0.6.5.8-0ubuntu4.1

  * Move some device drivers build from kernel built-in to modules
(LP: #1637303)
- [Config] CONFIG_TIGON3=m for all arches
- [Config] CONFIG_VIRTIO_BLK=m, CONFIG_VIRTIO_NET=m

  * I2C touchpad does not work on AMD platform (LP: #1612006)
- pinctrl/amd: Configure GPIO register using BIOS settings

  * guest experiencing Transmit Timeouts on CX4 (LP: #1636330)
- powerpc/64: Re-fix race condition between going idle and entering guest
- powerpc/64: Fix race condition in setting lock bit in idle/wakeup code

  * QEMU throws failure msg while booting guest with SRIOV VF (LP: #1630554)
- KVM: PPC: Always select KVM_VFIO, plus Makefile cleanup

  * [Feature] KBL - New device ID for Kabypoint(KbP) (LP: #1591618)
- SAUCE: mfd: lpss: Fix Intel Kaby Lake PCH-H properties

  * hio: SSD data corruption under stress test (LP: #1638700)
- SAUCE: hio: set bi_error field to signal an I/O error on a BIO
- SAUCE: hio: splitting bio in the entry of .make_request_fn

  * cleanup primary tree for linux-hwe layering issues (LP: #1637473)
- [Config] switch Vcs-Git: to yakkety repository
- [Packaging] handle both linux-lts* and linux-hwe* as backports
- [Config] linux-tools-common and linux-cloud-tools-common are one per series
- [Config] linux-source-* is in the primary linux namespace
- [Config] linux-tools -- always suggest the base package

  * SRU: sync zfsutils-linux and spl-linux changes to linux (LP: #1635656)
- SAUCE: (noup) Update spl to 0.6.5.8-2, zfs to 0.6.5.8-0ubuntu4 (LP:
  #1635656)

  * [Feature] SKX: perf uncore PMU support (LP: #1591810)
- perf/x86/intel/uncore: Add Skylake server uncore support
- perf/x86/intel/uncore: Remove hard-coded implementation for Node ID mapping
  location
- perf/x86/intel/uncore: Handle non-standard counter offset

  * [Feature] Purley: Memory Protection Keys (LP: #1591804)
- x86/pkeys: Add fault handling for PF_PK page fault bit
- mm: Implement new pkey_mprotect() system call
- x86/pkeys: Make mprotect_key() mask off additional vm_flags
- x86/pkeys: Allocation/free syscalls
- x86: Wire up protection keys system calls
- generic syscalls: Wire up memory protection keys syscalls
- pkeys: Add details of system call use to Documentation/
- x86/pkeys: Default to a restrictive init PKRU
- x86/pkeys: Allow configuration of init_pkru
- x86/pkeys: Add self-tests

  * kernel invalid opcode in intel_powerclamp (LP: #1630774)
- SAUCE: (no-up) thermal/powerclamp: correct cpu support check

  * please include mlx5_core modules in linux-image-generic package
(LP: #1635223)
- [Config] Include mlx5 in main package

  * [LTCTest] vfio_pci not loaded on Ubuntu 16.10 by default (LP: #1636733)
- [Config] CONFIG_VFIO_PCI=y for ppc64el

  * Yakkety update to v4.8.6 stable release (LP: #1638748)
- drm/vc4: Fix races when the CS reads from render targets.
- drm/prime: Pass the right module owner through to dma_buf_export()
- drm/i915/backlight: setup and cache pwm alternate increment value
- drm/i915/backlight: setup backlight pwm alternate increment on backlight
  enable
- drm/amdgpu: fix IB alignment for UVD
- drm/amdgpu/dce10: disable hpd on local panels
- drm/amdgpu/dce8: disable hpd on local panels
- drm/amdgpu/dce11: disable hpd on local panels
- drm/amdgpu/dce11: add missing drm_mode_config_cleanup call
- drm/amdgpu: initialize the context reset_counter in amdgpu_ctx_init
- drm/amdgpu: change vblank_time's calculation method to reduce computational
  error.
- drm/radeon: narrow asic_init for virtualization
- drm/radeon/si/dpm: fix phase shedding setup
- drm/radeon: change vblank_time's calculation method to reduce computational
  error.
- drm/vmwgfx: Limit the user-space command buffer size
- drm/fsl-dcu: fix endian issue when using clk_register_divider
- drm/amd/powerplay: fix mclk not switching back after multi-head was disabled
- HID: add quirk for Akai MIDImix.
- drm/i915/skl: Update plane watermarks atomically during plane updates
- drm/i915: Move CRTC updating in atomic_commit into it's own hook
- drm/i915/skl: Update DDB values 

[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-21 Thread JuanJo Ciarlante
FTR/FYI (as per chatter w/kamal) we're waiting for >= 4.8.0-28
to be available at https://launchpad.net/ubuntu/+source/linux-hwe-edge


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-17 Thread James Troup
Corruption no longer visible with 4.4.0-49-generic:

  https://pastebin.canonical.com/171039/

Testing with Yakkety proper would be messy - what's the timeline for
getting an updated 4.8 kernel into Xenial?  I currently only see
4.8.0-27.29~16.04.1.


** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-16 Thread Luis Henriques
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
yakkety' to 'verification-done-yakkety'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-16 Thread Luis Henriques
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

** Tags added: verification-needed-yakkety


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-04 Thread Kamal Mostafa
** Changed in: linux (Ubuntu Xenial)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Yakkety)
   Status: In Progress => Fix Committed


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-03 Thread Kamal Mostafa
https://lists.ubuntu.com/archives/kernel-team/2016-November/080705.html


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-03 Thread James Troup
** Tags added: canonical-bootstack


Re: [Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread Ming Lei
On Thu, Nov 3, 2016 at 5:42 AM, Kamal Mostafa  wrote:
> Ming Lei, comment #2 says you're the author of this patch to the hio
> driver:
>
> +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,3,0))
> +   blk_queue_split(q, &bio, q->bio_split);
> +#endif
> +
>
> Can you provide us with a short explanation for the git log, and also
> your Signed-off-by line for that patch?

Sure, please see the attachment.

** Patch added: "0001-hio.c-splitting-bio-in-the-entry-of-.make_request_fn.patch"
   https://bugs.launchpad.net/bugs/1638700/+attachment/4771506/+files/0001-hio.c-splitting-bio-in-the-entry-of-.make_request_fn.patch
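
For context, a minimal sketch of where the attached patch places the call,
assuming a 4.4-era .make_request_fn signature; the entry point name
ssd_make_request follows a later message in this thread, and the rest of
the function body is elided. blk_queue_split() may replace 'bio' with its
first fragment and resubmit the remainder, so the driver only ever
processes bios that fit within the queue's limits:

#include <linux/blkdev.h>
#include <linux/version.h>

/* Sketch only -- not the complete hio driver function. */
static blk_qc_t ssd_make_request(struct request_queue *q, struct bio *bio)
{
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,3,0))
        /* Split bios exceeding the queue limits; 'bio' may be replaced by
         * its first fragment, with the remainder re-queued for later. */
        blk_queue_split(q, &bio, q->bio_split);
#endif

        /* ... existing per-bio submission logic operates on 'bio' ... */

        return BLK_QC_T_NONE;
}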


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread Kamal Mostafa
On Wed, Nov 2, 2016 at 2:08 PM, James Troup wrote:

> I've confirmed the corruption is gone with the two fixes above on both
> 4.4 and 4.8.


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread Kamal Mostafa
James, thanks for clarifying that.  In that case, all my references to
"hio driver 2.1.0.26" should actually say "the patch set James is
testing", and the two fixes should still likely both be applied to the
Ubuntu hio driver.


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread Kamal Mostafa
Ming Lei, comment #2 says you're the author of this patch to the hio
driver:

+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,3,0))
+   blk_queue_split(q, &bio, q->bio_split);
+#endif
+

Can you provide us with a short explanation for the git log, and also
your Signed-off-by line for that patch?


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread Kamal Mostafa
Side note about the bug in hio driver 2.1.0.26 (#1: the bio_endio() shim
macro sets the bi_error field). This turns out NOT to be the cause of this
data corruption bug, per James' testing, but it should be corrected (both
in the upstream driver and in Ubuntu).

The idea behind this change is correct (the macro shim should indeed set
bi_error) but the implementation is unsafe:

 #if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,3,0))
-#define bio_endio(bio, errors) bio_endio(bio)
+#define bio_endio(bio, errors) \
+   bio->bi_error = errors; \
+   bio_endio(bio)
 #endif

Instead, the macro body must resolve to a single statement, e.g.:

+#define bio_endio(bio, errors) \
+   do { bio->bi_error = errors; bio_endio(bio); } while (0)
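
For illustration, here is a minimal self-contained C program showing the
hazard (the struct and the __bio_endio() helper below are hypothetical
stand-ins for the kernel's struct bio and bio_endio()): with the
two-statement macro, an unbraced 'if' guards only the assignment, so the
completion call runs unconditionally.

#include <stdio.h>

/* Hypothetical stand-ins for the kernel's struct bio and bio_endio(). */
struct bio { int bi_error; };
static void __bio_endio(struct bio *bio)
{
        printf("endio: bi_error=%d\n", bio->bi_error);
}

/* Unsafe shim, as in hio 2.1.0.26: expands to two statements. */
#define bio_endio_unsafe(bio, errors) \
        (bio)->bi_error = (errors); \
        __bio_endio(bio)

/* Safe shim: do { ... } while (0) makes the expansion a single statement. */
#define bio_endio_safe(bio, errors) \
        do { (bio)->bi_error = (errors); __bio_endio(bio); } while (0)

int main(void)
{
        struct bio b = { 0 };
        int err = 0;    /* no error occurred */

        if (err)
                bio_endio_unsafe(&b, err);  /* BUG: __bio_endio(&b) still runs */

        if (err)
                bio_endio_safe(&b, err);    /* correctly skipped */

        return 0;
}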


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread James Troup
Sorry - for the avoidance of doubt, neither of these changes is in
2.1.0.26 - 2.1.0.26 doesn't even include your patches.  I was testing
2.1.0.26 + our patches + the 2 changes you mention.  #1 does come from
Huawei, #2 comes from Ming Lei of Canonical.


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread James Troup
(Also 2.1.0.26 is not public yet for reasons Huawei are trying to figure
out, 2.1.0.25 is the last publicly available driver.)


[Kernel-packages] [Bug 1638700] Re: hio: SSD data corruption under stress test

2016-11-02 Thread Kamal Mostafa
Huawei's upstream hio driver version 2.1.0.26 introduces two changes
relative to the current Ubuntu version:

#1. bio_endio() shim macro sets the bi_error field {{ but note the bug in its implementation! }}
#2. blk_queue_split() call inserted into ssd_make_request()

James' initial test results indicate that #2 appears to fix the data
corruption.  Both fixes should be applied (though #1 needs correction)
after sufficient testing confirmation.
