Re: [dm-devel] dm-crypt: Fix error with too large bios
On 08/25/2016 12:34 PM, Mikulas Patocka wrote: On Thu, 18 Aug 2016, Eric Wheeler wrote: On Wed, Jun 01 2016 at 9:44am -0400, Christoph Hellwigwrote: be dm-crypt.c. Maybe you've identified some indirect use of BIO_MAX_SIZE? I mean the recently introduced BIO_MAX_SIZE in -next tree: https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486 The crazy bcache bios striking back once again. I really think it's harmful having a _MAX value and then having a minor driver reinterpreting it and sending larger ones. Until we can lift the maximum limit in general nad have common code exercise it we really need to stop bcache from sending these instead of littering the tree with workarounds. The bio_kmalloc function allocates bios with up to 1024 vector entries (as opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector entries). Device mapper is using bio_alloc_bioset with a bio set, so it is limited to 256 vector entries, but other kernel users may use bio_kmalloc and create larger bios. So, if you don't want bios with more than 256 vector entries to exist, you should impose this limit in bio_kmalloc (and fix all the callers that use it). FYI, Kent Overstreet notes this about bcache from the other thread here: https://lkml.org/lkml/2016/8/15/620 [paste] bcache originally had workaround code to split too-large bios when it first went upstream - that was dropped only after the patches to make generic_make_request() handle arbitrary size bios went in. So to do what you're suggesting would mean reverting that bcache patch and bringing that code back, which from my perspective would be a step in the wrong direction. I just want to get this over and done with. re: interactions with other drivers - bio_clone() has already been changed to only clone biovecs that are live for current bi_iter, so there shouldn't be any safety issues. A driver would have to be intentionally doing its own open coded bio cloning that clones all of bi_io_vec, not just the active ones - but if they're doing that, they're already broken because a driver isn't allowed to look at bi_vcnt if it isn't a bio that it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were created by bio_clone_fast). And the cloning and bi_vcnt usage stuff I audited very thoroughly back when I was working on immutable biovecs and such back in the day, and I had to do a fair amount of cleanup/refactoring before that stuff could go in. [/paste] They are making progress in the patch-v3 thread, so perhaps this can be fixed for now in generic_make_request(). -- Eric Wheeler Device mapper can't split the bio in generic_make_request - it frees the md->queue->bio_split bioset, to save one kernel thread per device. Device mapper uses its own splitting mechanism. So what is the final decision? - should device mapper split the big bio or should bcache not submit big bios? I think splitting big bios in the device mapper is better - simply because it is much less code than reworking bcache to split bios internally. BTW. In the device mapper, we have a layer dm-io, that was created to work around bio size limitations - it accepts unlimited I/O request and splits it to several bios. When bio size limitations are gone, we could simplify dm-io too. The patch from Ming Lei was applied for 4.8 the other day. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] dm-crypt: Fix error with too large bios
On Thu, 18 Aug 2016, Eric Wheeler wrote: > > On Wed, Jun 01 2016 at 9:44am -0400, Christoph Hellwig > >wrote: > > > > > > > be dm-crypt.c. Maybe you've identified some indirect use of > > > > > BIO_MAX_SIZE? > > > > > > > > I mean the recently introduced BIO_MAX_SIZE in -next tree: > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/md/dm-crypt.c?id=4ed89c97b0706477b822ea2182827640c0cec486 > > > > > > The crazy bcache bios striking back once again. I really think it's > > > harmful having a _MAX value and then having a minor driver > > > reinterpreting it and sending larger ones. Until we can lift the > > > maximum limit in general nad have common code exercise it we really need > > > to stop bcache from sending these instead of littering the tree with > > > workarounds. > > > > The bio_kmalloc function allocates bios with up to 1024 vector entries (as > > opposed to bio_alloc and bio_alloc_bioset that has a limit of 256 vector > > entries). > > > > Device mapper is using bio_alloc_bioset with a bio set, so it is limited > > to 256 vector entries, but other kernel users may use bio_kmalloc and > > create larger bios. > > > > So, if you don't want bios with more than 256 vector entries to exist, you > > should impose this limit in bio_kmalloc (and fix all the callers that use > > it). > > FYI, Kent Overstreet notes this about bcache from the other thread here: > https://lkml.org/lkml/2016/8/15/620 > > [paste] > >> bcache originally had workaround code to split too-large bios when it > >> first went upstream - that was dropped only after the patches to make > >> generic_make_request() handle arbitrary size bios went in. So to do what > >> you're suggesting would mean reverting that bcache patch and bringing that > >> code back, which from my perspective would be a step in the wrong > >> direction. I just want to get this over and done with. > >> > >> re: interactions with other drivers - bio_clone() has already been changed > >> to only clone biovecs that are live for current bi_iter, so there > >> shouldn't be any safety issues. A driver would have to be intentionally > >> doing its own open coded bio cloning that clones all of bi_io_vec, not > >> just the active ones - but if they're doing that, they're already broken > >> because a driver isn't allowed to look at bi_vcnt if it isn't a bio that > >> it owns - bi_vcnt is 0 on bios that don't own their biovec (i.e. that were > >> created by bio_clone_fast). > >> > >> And the cloning and bi_vcnt usage stuff I audited very thoroughly back > >> when I was working on immutable biovecs and such back in the day, and I > >> had to do a fair amount of cleanup/refactoring before that stuff could go > >> in. > [/paste] > > They are making progress in the patch-v3 thread, so perhaps this can be > fixed for now in generic_make_request(). > > -- > Eric Wheeler Device mapper can't split the bio in generic_make_request - it frees the md->queue->bio_split bioset, to save one kernel thread per device. Device mapper uses its own splitting mechanism. So what is the final decision? - should device mapper split the big bio or should bcache not submit big bios? I think splitting big bios in the device mapper is better - simply because it is much less code than reworking bcache to split bios internally. BTW. In the device mapper, we have a layer dm-io, that was created to work around bio size limitations - it accepts unlimited I/O request and splits it to several bios. When bio size limitations are gone, we could simplify dm-io too. Mikulas -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html