Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
On Sun, 2017-08-13 at 22:47 -0400, Mikulas Patocka wrote: > > On Wed, 9 Aug 2017, h...@lst.de wrote: > > > Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 > > "block: Fix __blkdev_issue_zeroout loop" fix the issue for you? > > > > -- > > dm-devel mailing list > > dm-devel@redhat.com > > https://www.redhat.com/mailman/listinfo/dm-devel > > I think that patch is incorrect. sector_t may be a 32-bit type and > nr_sects << 9 may overflow. > > static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects) > { >sector_t bytes = (nr_sects << 9) + PAGE_SIZE - 1; > >return min(bytes >> PAGE_SHIFT, (sector_t)BIO_MAX_PAGES); > } > > Mikulas Mikulas, Does the follwing patch fix the problem ? >From 947b3cf41e759b2b23f684e215e651d0c8037f88 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Mon, 14 Aug 2017 13:01:16 +0900 Subject: [PATCH] block: Fix __blkdev_sectors_to_bio_pages() On 32bit systems where sector_t is a 32bits type, the calculation of bytes may overflow. Use the u64 type for the local calculation to avoid overflows. Signed-off-by: Damien Le Moal --- block/blk-lib.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 3fe0aec90597..ccf22dba21f0 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -269,9 +269,9 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev, */ static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects) { - sector_t bytes = (nr_sects << 9) + PAGE_SIZE - 1; + u64 bytes = ((u64)nr_sects << 9) + PAGE_SIZE - 1; - return min(bytes >> PAGE_SHIFT, (sector_t)BIO_MAX_PAGES); + return min(bytes >> PAGE_SHIFT, (u64)BIO_MAX_PAGES); } /** -- 2.13.4 -- Damien Le Moal Western Digital -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
On Wed, 9 Aug 2017, h...@lst.de wrote: > Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 > "block: Fix __blkdev_issue_zeroout loop" fix the issue for you? > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel I think that patch is incorrect. sector_t may be a 32-bit type and nr_sects << 9 may overflow. static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects) { sector_t bytes = (nr_sects << 9) + PAGE_SIZE - 1; return min(bytes >> PAGE_SHIFT, (sector_t)BIO_MAX_PAGES); } Mikulas -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
Never mind, I guess I was wrong, coz bio_add_page() is always called with "offset" = 0 here, so: offset == bv->bv_offset + bv->bv_len is never matched. Hence, if (bio->bi_vcnt >= bio->bi_max_vecs) return 0; will trigger the if-break. So maybe this is after all just a dm-crypt issue on handling multiple bios created with next_bio (a bio chain)? On 10 August 2017 at 07:27, Tom Yan wrote: > I haven't really tested but I was aware of the commit before I send my > last email. It doesn't seem relevant to be honest, because it doesn't > change the fact that the inner loop wil only end if the whole request > has been looped over. So still one big bio. > > There are a few things that seem suspicious to me. First of all, the > inner loop has an if-break at its end that seem to practically do > nothing, especially in terms of making the inner loop end when the bio > reach its expected size (BIO_MAX_PAGES?). > > bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0); > > If you take a look at bio_add_page(): > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/bio.c?h=v4.13-rc4#n820 > > it will basically always return "len", which is "sz" here, unchanged. > > if (bi_size < sz) > break; > > So this pretty much never happens, with two exceptions: > > if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED))) > return 0; > > which is most likely irrelevant in this case, > > if (bio->bi_vcnt >= bio->bi_max_vecs) > return 0; > > which seem to matter in this case. However: > > if (page == bv->bv_page && > offset == bv->bv_offset + bv->bv_len) { > bv->bv_len += len; > goto done; > } > ... > if (bio->bi_vcnt >= bio->bi_max_vecs) > return 0; > ... > bio->bi_vcnt++; > done: > bio->bi_iter.bi_size += len; > return len; > > So if the first condition in the above quote is matched, the exception > would never happen either. > > On 9 August 2017 at 21:38, h...@lst.de wrote: >> Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 >> "block: Fix __blkdev_issue_zeroout loop" fix the issue for you? -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
I haven't really tested but I was aware of the commit before I send my last email. It doesn't seem relevant to be honest, because it doesn't change the fact that the inner loop wil only end if the whole request has been looped over. So still one big bio. There are a few things that seem suspicious to me. First of all, the inner loop has an if-break at its end that seem to practically do nothing, especially in terms of making the inner loop end when the bio reach its expected size (BIO_MAX_PAGES?). bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0); If you take a look at bio_add_page(): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/bio.c?h=v4.13-rc4#n820 it will basically always return "len", which is "sz" here, unchanged. if (bi_size < sz) break; So this pretty much never happens, with two exceptions: if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED))) return 0; which is most likely irrelevant in this case, if (bio->bi_vcnt >= bio->bi_max_vecs) return 0; which seem to matter in this case. However: if (page == bv->bv_page && offset == bv->bv_offset + bv->bv_len) { bv->bv_len += len; goto done; } ... if (bio->bi_vcnt >= bio->bi_max_vecs) return 0; ... bio->bi_vcnt++; done: bio->bi_iter.bi_size += len; return len; So if the first condition in the above quote is matched, the exception would never happen either. On 9 August 2017 at 21:38, h...@lst.de wrote: > Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 > "block: Fix __blkdev_issue_zeroout loop" fix the issue for you? -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
On 08/09/2017 03:38 PM, h...@lst.de wrote: Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 "block: Fix __blkdev_issue_zeroout loop" fix the issue for you? I crashed the 4.13-rc4 with the test above (blkdiscard -z on dm-crypt dev). Unfortunately I hadn't have my VM configured properly so the only info I got before it crashed was basically OOM. I'm going to play with it tomorrow and will let you know. O. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 "block: Fix __blkdev_issue_zeroout loop" fix the issue for you? -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
Hi again, I think I have sort of identified the problem. It appears to me that both the block layer and dm-crypt is defected on handling this. First of all, check out the "fallback" zero out implementation, which is used in this case, here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/blk-lib.c?h=v4.12#n309 >From the outer loop, it seem to imply that this should be done in multiple >"bio"s, if the request (original "nr_sects") is larger than BIO_MAX_PAGES: ... while (nr_sects != 0) { bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES), gfp_mask); ... } ... However, there is a inner loop: ... while (nr_sects != 0) { sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects); bi_size = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0); nr_sects -= bi_size >> 9; sector += bi_size >> 9; if (bi_size < (sz << 9)) break; } ... which apparently would loop over the whole request on its own, making the outer loop a bogus one. The request ends up being done in a single (huge) bio. When the bio is passed on to dm-crypt, it appears that dm-crypt will not split the bio either when it allocates buffer for conversion/encryption: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-crypt.c?h=v4.12#n1694 which leads to possible enormous uptake of memory, causing OOM / kernel panic. There seems to be some measure that is suppose to split large bio though: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-crypt.c?h=v4.12#n2787 Apparently it is called before kcryptd_crypt_write_convert() / crypt_alloc_buffer(). However, I don't really parse dm_accept_partial_bio() (or the comment about it) so I don't really know what it actually does or how it does it. Neither can I see it helps in reality anyway. Here is another test case that shows the problem: https://ptpb.pw/BWWo.png https://ptpb.pw/aRTE.png Regards, Tom Yan From: Tom Yan Sent: Monday, August 7, 2017 9:58 AM To: dm-devel@redhat.com Subject: [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic Hi all, When I do the following: cryptsetup open /dev/sdX[Y] rand --type plain --key-file /dev/random; blkdiscard -z /dev/mapper/rand Some OOM killings and a kernel panic occur. Here is a screenshot: https://gitlab.com/cryptsetup/cryptsetup/uploads/207ffdada52f3172f54a014c67159625/DSC_0129.JPGhttps://gitlab.com/cryptsetup/cryptsetup/uploads/207ffdada52f3172f54a014c67159625/DSC_0129.JPG Similar issue is NOT found in doing `cat /dev/zero > /dev/mapper/rand`, or `blkdiscard -z /dev/sdX[Y]` (without opening a dm-crypt on it), on the same hardware and configuration. Note that `blkdiscard -z` does not trigger the BLKDISCARD ioctl but the BLKZEROOUT ioctl. And the devices I have been testing on are USB devices that does NOT support WRITE SAME or UNMAP. Regards, Tom Yan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
(Sorry for the spam. Having hard time moving the thread from outlook.com to gmail.com for linux-block. Please reply to this, thank you.) Hi again, I think I have sort of identified the problem. It appears to me that both the block layer and dm-crypt is defected on handling this. First of all, check out the "fallback" zero out implementation, which is used in this case, here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/blk-lib.c?h=v4.12#n309 >From the outer loop, it seem to imply that this should be done in multiple "bio"s, if the request (original "nr_sects") is larger than BIO_MAX_PAGES: ... while (nr_sects != 0) { bio = next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES), gfp_mask); ... } ... However, there is a inner loop: ... while (nr_sects != 0) { sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects); bi_size = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0); nr_sects -= bi_size >> 9; sector += bi_size >> 9; if (bi_size < (sz << 9)) break; } ... which apparently would loop over the whole request on its own, making the outer loop a bogus one. The request ends up being done in a single (huge) bio. When the bio is passed on to dm-crypt, it appears that dm-crypt will not split the bio either when it allocates buffer for conversion/encryption: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-crypt.c?h=v4.12#n1694 which leads to possible enormous uptake of memory, causing OOM / kernel panic. There seems to be some measure that is suppose to split large bio though: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/md/dm-crypt.c?h=v4.12#n2787 Apparently it is called before kcryptd_crypt_write_convert() / crypt_alloc_buffer(). However, I don't really parse dm_accept_partial_bio() (or the comment about it) so I don't really know what it actually does or how it does it. Neither can I see it helps in reality anyway. Here is another test case that shows the problem: https://ptpb.pw/BWWo.png https://ptpb.pw/aRTE.png Regards, Tom Yan > From: Tom Yan > Sent: Monday, August 7, 2017 9:58 AM > To: dm-devel@redhat.com > Subject: [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic > > Hi all, > > When I do the following: > > cryptsetup open /dev/sdX[Y] rand --type plain --key-file /dev/random; > blkdiscard -z /dev/mapper/rand > > Some OOM killings and a kernel panic occur. Here is a screenshot: > https://gitlab.com/cryptsetup/cryptsetup/uploads/207ffdada52f3172f54a014c67159625/DSC_0129.JPG > > Similar issue is NOT found in doing `cat /dev/zero > /dev/mapper/rand`, or > `blkdiscard -z /dev/sdX[Y]` (without opening a dm-crypt on it), on the same > hardware and configuration. > > Note that `blkdiscard -z` does not trigger the BLKDISCARD ioctl but the > BLKZEROOUT ioctl. And the devices I have been testing on are USB devices that > does NOT support WRITE SAME or UNMAP. > > Regards, > Tom Yan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
Hi again, I think I have sort of identified the problem. It appears to me that both t= he block layer and dm-crypt is defected on handling this. First of all, check out the "fallback" zero out implementation, which is us= ed in this case, here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/blo= ck/blk-lib.c?h=3Dv4.12#n309 >From the outer loop, it seem to imply that this should be done in multiple = "bio"s, if the request (original "nr_sects") is larger than BIO_MAX_PAGES: ... while (nr_sects !=3D 0) { bio =3D next_bio(bio, min(nr_sects, (sector_t)BIO_MAX_PAGES), gfp_mask); ... } ... However, there is a inner loop: ... while (nr_sects !=3D 0) { sz =3D min((sector_t) PAGE_SIZE >> 9 , nr_sects); bi_size =3D bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0); nr_sects -=3D bi_size >> 9; sector +=3D bi_size >> 9; if (bi_size < (sz << 9)) break; } ... which apparently would loop over the whole request on its own, making the o= uter loop a bogus one. The request ends up being done in a single (huge) bio. When the bio is pass= ed on to dm-crypt, it appears that dm-crypt will not split the bio either w= hen it allocates buffer for conversion/encryption: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/dri= vers/md/dm-crypt.c?h=3Dv4.12#n1694 which leads to possible enormous uptake of memory, causing OOM / kernel pan= ic. There seems to be some measure that is suppose to split large bio though: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/dri= vers/md/dm-crypt.c?h=3Dv4.12#n2787 Apparently it is called before kcryptd_crypt_write_convert() / crypt_alloc_= buffer(). However, I don't really parse dm_accept_partial_bio() (or the com= ment about it) so I don't really know what it actually does or how it does = it. Neither can I see it helps in reality anyway. Here is another test case that shows the problem: https://ptpb.pw/BWWo.png https://ptpb.pw/aRTE.png Regards, Tom Yan > From: Tom Yan > Sent: Monday, August 7, 2017 9:58 AM > To: dm-devel@redhat.com > Subject: [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic > > Hi all, > > When I do the following: > > cryptsetup open /dev/sdX[Y] rand --type plain --key-file /dev/random; > blkdiscard -z /dev/mapper/rand > > Some OOM killings and a kernel panic occur. Here is a screenshot: > https://gitlab.com/cryptsetup/cryptsetup/uploads/207ffdada52f3172f54a014c67= > 159625/DSC_0129.JPGhttps://gitlab.com/cryptsetup/cryptsetup/uploads/207ffda= > da52f3172f54a014c67159625/DSC_0129.JPG=20 > > > Similar issue is NOT found in doing `cat /dev/zero > /dev/mapper/rand`, or = > `blkdiscard -z /dev/sdX[Y]` (without opening a dm-crypt on it), on the same= > hardware and configuration. > > Note that `blkdiscard -z` does not trigger the BLKDISCARD ioctl but the BLK= > ZEROOUT ioctl. And the devices I have been testing on are USB devices that = > does NOT support WRITE SAME or UNMAP. > > Regards, > Tom Yan -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel