[Kernel-packages] [Bug 1842271] Re: Infinite loop in __blkdev_issue_discard() consumes the entire system memory when formatting a raid array

2019-10-03 Thread Po-Hsu Lin
Patch could be found in Bionic-gcp master branch and Eoan, mark this one
as Fix-released.

** Changed in: linux (Ubuntu)
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1842271

Title:
  Infinite loop in __blkdev_issue_discard() consumes the entire system
  memory when formatting a raid array

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1842271

  [Impact]

  There is a regression present in block in 4.15.0-56, which causes the
  machine to consume all system memory and cause the OOM reaper to come
  out, when attempting to format a newly created raid array on a series
  of NVMe cluster with an xfs filesystem.

  The symptoms can be found with all system memory ending up in
  kmalloc-256 slab, with a nondescript call trace being printed.

  The problem is caused by the below two commits:

  commit 3c2f83d8bcbedeb89efcaf55ae64a99dce9d7e34
  Author: Ming Lei 
  Date:   Fri Oct 12 15:53:10 2018 +0800
  Subject: block: don't deal with discard limit in blkdev_issue_discard()
  BugLink: https://bugs.launchpad.net/bugs/1836802
  Upstream-commit: 744889b7cbb56a64f957e65ade7cb65fe3f35714
  Pastebin: https://paste.ubuntu.com/p/WKsmwKgCMc/

  commit b515257f186e532e0668f7deabcb04b5d27505cf
  Author: Ming Lei 
  Date:   Mon Oct 29 20:57:17 2018 +0800
  Subject: block: make sure discard bio is aligned with logical block size
  BugLink: https://bugs.launchpad.net/bugs/1836802
  Upstream-commit: 1adfc5e4136f5967d591c399aff95b3b035f16b7
  Pastebin: https://paste.ubuntu.com/p/TcnTck64PJ/

  Now, the fault was triggered in two stages. Firstly, in "block: don't
  deal with discard limit in blkdev_issue_discard()" a while loop was
  changed such that there is a possibility of an infinite loop if
  __blkdev_issue_discard() is called with nr_sects > 0 and req_sects
  somehow becomes 0:

  int __blkdev_issue_discard(..., sector_t nr_sects, ...)
  {
  ...
  while (nr_sects) {
  unsigned int req_sects = nr_sects;
  sector_t end_sect;

  end_sect = sector + req_sects;
  ...
  nr_sects -= req_sects;
  sector = end_sect;
  ...
  }

  if req_sects is 0, then end_sect is always equal to sector, and the
  most important part, nr_sects is only decremented in one place, by
  req_sects, which if 0, would lead to the infinite loop condition.

  Now, since req_sects is initially equal to nr_sects, the loop would
  never be entered in the first place if nr_sects is 0.

  This is where the second commit, "block: make sure discard bio is
  aligned with logical block size" comes in.

  This commit adds a line to the above loop, to allow req_sects to be
  set to a new value:

  int __blkdev_issue_discard(..., sector_t nr_sects, ...)
  {
  ...
  while (nr_sects) {
  unsigned int req_sects = nr_sects;
  sector_t end_sect;

  req_sects = min(req_sects, bio_allowed_max_sectors(q));

  end_sect = sector + req_sects;
  ...
  nr_sects -= req_sects;
  sector = end_sect;
  ...
  }

  We see that req_sects will now be the minimum of itself and
  bio_allowed_max_sectors(q), a new function introduced by the same
  commit.

  static inline unsigned int bio_allowed_max_sectors(struct request_queue *q)
  {
     return round_down(UINT_MAX, queue_logical_block_size(q)) >> 9;
  }

  queue_logical_block_size(q) queries the hardware for the logical block
  size of the underlying device.

  static inline unsigned short queue_logical_block_size(struct request_queue *q)
  {
  int retval = 512;

  if (q && q->limits.logical_block_size)
  retval = q->limits.logical_block_size;

  return retval;
  }

  if q->limits.logical_block_size is 512 or smaller, then bit shifted
  right by 9 yields 0.

  bio_allowed_max_sectors will return 0, and the min with req_sects ==
  nr_sects will favour the new 0.

  This causes nr_sects to never be decremented since req_sects is 0, and
  req_sects will never change since the min() that takes in itself will
  always favour the 0.

  From there the infinite loop iterates and fills up the kmalloc-256
  slab with newly created bio entries, until all memory is exhausted and
  the OOM reaper comes out and starts killing processes, which is
  ineffective since this is a kernel memory leak.

  [Fix]

  The fix comes in the form of:

  commit a55264933f12c2fdc28a66841c4724021e8c1caf
  Author: Mikulas Patocka 
  Date:   Tue Jul 3 13:34:22 2018 -0400
  Subject: block: fix infinite loop if the device loses discard capability
  BugLink: https://bugs.launchpad.net/bugs/1837257
  Upstream-commit: b88aef36b87c9787a4db724923ec4f57dfd513f3
  Pastebin: https://paste.ubuntu.com/p/bgC9vQnB8x/

  This adds a check right after the min(req_sects,
  bio_allowed_max_sectors(q)); to test if req_sects has been set to 0,
  and if 

[Kernel-packages] [Bug 1842271] Re: Infinite loop in __blkdev_issue_discard() consumes the entire system memory when formatting a raid array

2019-09-08 Thread Matthew Ruffell
This was released in 4.15.0-59-generic and works as intended. Marking
Fix Released. Note, this still needs to land in linux-gcp for Bionic due
to some technical reasons.

** Changed in: linux (Ubuntu Bionic)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1842271

Title:
  Infinite loop in __blkdev_issue_discard() consumes the entire system
  memory when formatting a raid array

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1842271

  [Impact]

  There is a regression present in block in 4.15.0-56, which causes the
  machine to consume all system memory and cause the OOM reaper to come
  out, when attempting to format a newly created raid array on a series
  of NVMe cluster with an xfs filesystem.

  The symptoms can be found with all system memory ending up in
  kmalloc-256 slab, with a nondescript call trace being printed.

  The problem is caused by the below two commits:

  commit 3c2f83d8bcbedeb89efcaf55ae64a99dce9d7e34
  Author: Ming Lei 
  Date:   Fri Oct 12 15:53:10 2018 +0800
  Subject: block: don't deal with discard limit in blkdev_issue_discard()
  BugLink: https://bugs.launchpad.net/bugs/1836802
  Upstream-commit: 744889b7cbb56a64f957e65ade7cb65fe3f35714
  Pastebin: https://paste.ubuntu.com/p/WKsmwKgCMc/

  commit b515257f186e532e0668f7deabcb04b5d27505cf
  Author: Ming Lei 
  Date:   Mon Oct 29 20:57:17 2018 +0800
  Subject: block: make sure discard bio is aligned with logical block size
  BugLink: https://bugs.launchpad.net/bugs/1836802
  Upstream-commit: 1adfc5e4136f5967d591c399aff95b3b035f16b7
  Pastebin: https://paste.ubuntu.com/p/TcnTck64PJ/

  Now, the fault was triggered in two stages. Firstly, in "block: don't
  deal with discard limit in blkdev_issue_discard()" a while loop was
  changed such that there is a possibility of an infinite loop if
  __blkdev_issue_discard() is called with nr_sects > 0 and req_sects
  somehow becomes 0:

  int __blkdev_issue_discard(..., sector_t nr_sects, ...)
  {
  ...
  while (nr_sects) {
  unsigned int req_sects = nr_sects;
  sector_t end_sect;

  end_sect = sector + req_sects;
  ...
  nr_sects -= req_sects;
  sector = end_sect;
  ...
  }

  if req_sects is 0, then end_sect is always equal to sector, and the
  most important part, nr_sects is only decremented in one place, by
  req_sects, which if 0, would lead to the infinite loop condition.

  Now, since req_sects is initially equal to nr_sects, the loop would
  never be entered in the first place if nr_sects is 0.

  This is where the second commit, "block: make sure discard bio is
  aligned with logical block size" comes in.

  This commit adds a line to the above loop, to allow req_sects to be
  set to a new value:

  int __blkdev_issue_discard(..., sector_t nr_sects, ...)
  {
  ...
  while (nr_sects) {
  unsigned int req_sects = nr_sects;
  sector_t end_sect;

  req_sects = min(req_sects, bio_allowed_max_sectors(q));

  end_sect = sector + req_sects;
  ...
  nr_sects -= req_sects;
  sector = end_sect;
  ...
  }

  We see that req_sects will now be the minimum of itself and
  bio_allowed_max_sectors(q), a new function introduced by the same
  commit.

  static inline unsigned int bio_allowed_max_sectors(struct request_queue *q)
  {
     return round_down(UINT_MAX, queue_logical_block_size(q)) >> 9;
  }

  queue_logical_block_size(q) queries the hardware for the logical block
  size of the underlying device.

  static inline unsigned short queue_logical_block_size(struct request_queue *q)
  {
  int retval = 512;

  if (q && q->limits.logical_block_size)
  retval = q->limits.logical_block_size;

  return retval;
  }

  if q->limits.logical_block_size is 512 or smaller, then bit shifted
  right by 9 yields 0.

  bio_allowed_max_sectors will return 0, and the min with req_sects ==
  nr_sects will favour the new 0.

  This causes nr_sects to never be decremented since req_sects is 0, and
  req_sects will never change since the min() that takes in itself will
  always favour the 0.

  From there the infinite loop iterates and fills up the kmalloc-256
  slab with newly created bio entries, until all memory is exhausted and
  the OOM reaper comes out and starts killing processes, which is
  ineffective since this is a kernel memory leak.

  [Fix]

  The fix comes in the form of:

  commit a55264933f12c2fdc28a66841c4724021e8c1caf
  Author: Mikulas Patocka 
  Date:   Tue Jul 3 13:34:22 2018 -0400
  Subject: block: fix infinite loop if the device loses discard capability
  BugLink: https://bugs.launchpad.net/bugs/1837257
  Upstream-commit: b88aef36b87c9787a4db724923ec4f57dfd513f3
  Pastebin: https://paste.ubuntu.com/p/bgC9vQnB8x/

  This adds a check right after the