Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, 6 Nov 2013 15:22:36 -0500 Chris Mason wrote: > > Yup - that should actually be safe for all the existing bio_clone() users > > actually, I audited all of them - because normally you're not going to > > complete > > the original bio until the clone finishes. > > I'd say we need an ack from Neil before we can switch to _fast. The > completions look non-trivial and _fast adds new ordering requirements on > free. I think it should be fairly straight forward to use _fast in md. As Kent says, the original bio that is cloned is (almost) never going to have endio called until all the clones have done their thing and finished up. The only exception is the "behind_pages" stuff in raid1.c. We clearly need to be more careful there. If the "ordering requirements" you mention are simply that the owner of the original may free the biovec, so the clone must have finished using it before that can happen, then that would most easily be fix with mbio->bi_io_vec = r1_bio->behind_bvecs; rather than the current loop which copies all the bv_page pointers. NeilBrown signature.asc Description: PGP signature
Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, Nov 06, 2013 at 04:25:45PM -0500, Chris Mason wrote: > Quoting Kent Overstreet (2013-11-06 15:57:34) > > On Wed, Nov 06, 2013 at 03:22:36PM -0500, Chris Mason wrote: > > > Quoting Kent Overstreet (2013-11-06 15:02:22) > > [ ... nods, thanks! ... ] > > > OTOH - with regards to just the ordering requirements, the more I look at > > various code the less accidental the fact that that works seems to be: the > > best > > explanation I've come up with so far is that you already needed to ensure > > that > > the _pages_ the clone points to stick around until the clone completes, and > > if > > you don't own the original bio the only way to do that is to not complete > > the > > original bio until after the clone completes. > > > > So if you're a driver cloning bios that were submitted to you, > > bio_clone_fast() > > introduces no new ordering requirements. > > > > On the third hand - if you're cloning (i.e. splitting) your own bios, in > > that > > case it would be possible to screw up the ordering - I don't know of any > > code in > > the kernel that does this today (except for, sort of, bcache) but my dio > > rewrite > > takes this approach - but if you do the obvious and sane thing with > > bio_chain() > > it's a non issue, it seems to me you'd have to come up with something pretty > > contrived and dumb for this to actually be an issue in practice. > > > > Anyways, I haven't come to any definite conclusions, those are just my > > observations so far. > > I do think you're right. We all seem to have clones doing work on > behalf of the original, and when everyone is done we complete the > original. > > But, btrfs does have silly things like this: > > dio_end_io(dio_bio, err); // end and free the original > bio_put(bio); // free the clone > > It's not a bug yet, but given enough time the space between those two > frees will grow new code that kills us all. Hopefully the DIO situation will get better in the near future, Josef was telling me btrfs was probably going to end up with its own DIO code (replacing a good chunk of direct-io.c) and that should get a lot easier and saner for you guys with my dio rewrite. Need to finish getting that to pass xfstests... > Really though, the new stuff is better, thanks. Thanks! :) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
Quoting Kent Overstreet (2013-11-06 15:57:34) > On Wed, Nov 06, 2013 at 03:22:36PM -0500, Chris Mason wrote: > > Quoting Kent Overstreet (2013-11-06 15:02:22) [ ... nods, thanks! ... ] > OTOH - with regards to just the ordering requirements, the more I look at > various code the less accidental the fact that that works seems to be: the > best > explanation I've come up with so far is that you already needed to ensure that > the _pages_ the clone points to stick around until the clone completes, and if > you don't own the original bio the only way to do that is to not complete the > original bio until after the clone completes. > > So if you're a driver cloning bios that were submitted to you, > bio_clone_fast() > introduces no new ordering requirements. > > On the third hand - if you're cloning (i.e. splitting) your own bios, in that > case it would be possible to screw up the ordering - I don't know of any code > in > the kernel that does this today (except for, sort of, bcache) but my dio > rewrite > takes this approach - but if you do the obvious and sane thing with > bio_chain() > it's a non issue, it seems to me you'd have to come up with something pretty > contrived and dumb for this to actually be an issue in practice. > > Anyways, I haven't come to any definite conclusions, those are just my > observations so far. I do think you're right. We all seem to have clones doing work on behalf of the original, and when everyone is done we complete the original. But, btrfs does have silly things like this: dio_end_io(dio_bio, err); // end and free the original bio_put(bio); // free the clone It's not a bug yet, but given enough time the space between those two frees will grow new code that kills us all. Really though, the new stuff is better, thanks. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, Nov 06, 2013 at 03:22:36PM -0500, Chris Mason wrote: > Quoting Kent Overstreet (2013-11-06 15:02:22) > > Yup - that should actually be safe for all the existing bio_clone() users > > actually, I audited all of them - because normally you're not going to > > complete > > the original bio until the clone finishes. > > I'd say we need an ack from Neil before we can switch to _fast. The > completions look non-trivial and _fast adds new ordering requirements on > free. Agreed, I've already broken md enough lately :P (I'm started to work on a real test suite for the block layer, that'll help with this stuff...) > > > If you look at dm's crypt_free_buffer_pages(), it had similar problems. > > > > Those are fine actually - in both cases, they're bios they allocated, not > > the > > bios that were submitted to them. > > Both are expecting bi_vcnt to be non-zero after a clone, which wasn't > the case before the bi_vcnt patch. With your current patch (adding > _fast) both should now be correct. Yeah, but bi_vcnt being zero was just a symptom - the underlying reason the old code was wrong was that with the clone sharing the parent's biovec, md was modifying a biovec (including bv_page!) that it didn't own - _that_ was what was horribly broken about it. > > What's unclear about it? The rule is just - if you didn't allocate the > > biovec, > > don't modify it or use bio_for_each_segment_all() (probably I didn't quite > > state > > it clearly enough before though) > > That part makes sense. The new rule that scares me is that we can't > free the src of the clone until all the clones are freed. If it works > with today's existing users it feels like it is more by accident than > design. I'm not saying we can't do it, we just need some bigger > flashing warning lights. Yeah, Jens was saying the same thing yesterday, hence this patch. OTOH - with regards to just the ordering requirements, the more I look at various code the less accidental the fact that that works seems to be: the best explanation I've come up with so far is that you already needed to ensure that the _pages_ the clone points to stick around until the clone completes, and if you don't own the original bio the only way to do that is to not complete the original bio until after the clone completes. So if you're a driver cloning bios that were submitted to you, bio_clone_fast() introduces no new ordering requirements. On the third hand - if you're cloning (i.e. splitting) your own bios, in that case it would be possible to screw up the ordering - I don't know of any code in the kernel that does this today (except for, sort of, bcache) but my dio rewrite takes this approach - but if you do the obvious and sane thing with bio_chain() it's a non issue, it seems to me you'd have to come up with something pretty contrived and dumb for this to actually be an issue in practice. Anyways, I haven't come to any definite conclusions, those are just my observations so far. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
Quoting Kent Overstreet (2013-11-06 15:02:22) > On Wed, Nov 06, 2013 at 11:11:30AM -0500, Chris Mason wrote: > > Quoting Kent Overstreet (2013-11-05 22:48:41) > > > This patch reverts the default behaviour introduced by > > > 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger > > > shares the source bio's biovec, cloning the biovec is once again the > > > default. > > > > > > Instead, we add a new bio_clone_biovec_fast(), which creates a clone > > > that shares the source's biovec. This patch changes bcache and md to use > >^ > > dm? > > > > > __bio_clone_biovec_fast() since they're expecting the new behaviour due > > > to other refactoring; most of the other uses of bio_clone() should be > > > same to convert to the _fast() variant but that will be done more > > > incrementally in other patches (bio_split() in particular). > > > > Hi Kent, > > > > I noticed yesterday the _fast variants of bio clone introduce sharing > > between the src and the clone, but without any reference counts: > > > > bio->bi_io_vec = bio_src->bi_io_vec; > > > > Have you audited all of the _fast users to make sure they are not > > freeing the src before the clone? Sorry if this came up already in past > > reviews. > > Yup - that should actually be safe for all the existing bio_clone() users > actually, I audited all of them - because normally you're not going to > complete > the original bio until the clone finishes. I'd say we need an ack from Neil before we can switch to _fast. The completions look non-trivial and _fast adds new ordering requirements on free. > > > > Note that __bio_clone() isn't being readded - the reason being that with > > > immutable biovecs allocating the right number of biovecs for the new > > > clone is no longer trivial so we don't want drivers trying to do that > > > themselves. > > > > > > This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - > > > __bio_clone_fast() should not be setting bi_vcnt for bios that do not > > > own the biovec (see Documentation/block/biovecs.txt for rationale) - in > > > short, > > > > I think I see what you mean with tying bi_vcnt to ownership of the bio, > > but we're not consistent. Looking at bio_for_eaach_segment_all: > > > > * > > * drivers should _never_ use the all version - the bio may have been split > > * before it got to the driver and the driver won't own all of it > > */ > > #define bio_for_each_segment_all(bvl, bio, i) \ > > for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++) > > > > bio_for_each_segment_all still trusts bi_vcnt, so any > > bio_for_each_segment_all operation on a clone will basically be a noop. > > > > Just looking at MD raid1 make_request() > > > > mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); > > ... > > alloc_behind_pages(mbio, r1_bio); -> bio_for_each_segment_all > > ... > > if (r1_bio->behind_bvecs) { > > bio_for_each_segment_all(bvec, mbio, j) > > ... > > > > I didn't test MD without the vcnt fix, but I think any operations in MD > > that duplicate data for raid1 turn into noops. I think we'll end up > > writing garbage (or nothing) to the second mirror. > > > > If you look at dm's crypt_free_buffer_pages(), it had similar problems. > > Those are fine actually - in both cases, they're bios they allocated, not the > bios that were submitted to them. Both are expecting bi_vcnt to be non-zero after a clone, which wasn't the case before the bi_vcnt patch. With your current patch (adding _fast) both should now be correct. > > Though md _definitely_ shouldn't have been > sharing the original bio's biovec, so looks like this patch will fix a bug > there... > > (I remember seeing that code before and I thought I added a bio_clone_biovec() > call to that md code, but apparently that never got commited. Argh.) > > > > > > not setting it might cause bugs in the short term but long term > > > it's likely to hide nastier more subtle bugs, we don't want code looking > > > at bi_vcnt at all for bios it does not own. > > > > I think the concept of bio ownership is still much too weak, at least > > for established users like MD and DM. I don't know how to verify the > > sharing of bi_io_vec without some kind of reference counting on the > > iovec. > > What's unclear about it? The rule is just - if you didn't allocate the biovec, > don't modify it or use bio_for_each_segment_all() (probably I didn't quite > state > it clearly enough before though) That part makes sense. The new rule that scares me is that we can't free the src of the clone until all the clones are freed. If it works with today's existing users it feels like it is more by accident than design. I'm not saying we
Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, Nov 06, 2013 at 11:11:30AM -0500, Chris Mason wrote: > Quoting Kent Overstreet (2013-11-05 22:48:41) > > This patch reverts the default behaviour introduced by > > 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger > > shares the source bio's biovec, cloning the biovec is once again the > > default. > > > > Instead, we add a new bio_clone_biovec_fast(), which creates a clone > > that shares the source's biovec. This patch changes bcache and md to use >^ > dm? > > > __bio_clone_biovec_fast() since they're expecting the new behaviour due > > to other refactoring; most of the other uses of bio_clone() should be > > same to convert to the _fast() variant but that will be done more > > incrementally in other patches (bio_split() in particular). > > Hi Kent, > > I noticed yesterday the _fast variants of bio clone introduce sharing > between the src and the clone, but without any reference counts: > > bio->bi_io_vec = bio_src->bi_io_vec; > > Have you audited all of the _fast users to make sure they are not > freeing the src before the clone? Sorry if this came up already in past > reviews. Yup - that should actually be safe for all the existing bio_clone() users actually, I audited all of them - because normally you're not going to complete the original bio until the clone finishes. > > Note that __bio_clone() isn't being readded - the reason being that with > > immutable biovecs allocating the right number of biovecs for the new > > clone is no longer trivial so we don't want drivers trying to do that > > themselves. > > > > This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - > > __bio_clone_fast() should not be setting bi_vcnt for bios that do not > > own the biovec (see Documentation/block/biovecs.txt for rationale) - in > > short, > > I think I see what you mean with tying bi_vcnt to ownership of the bio, > but we're not consistent. Looking at bio_for_eaach_segment_all: > > * > * drivers should _never_ use the all version - the bio may have been split > * before it got to the driver and the driver won't own all of it > */ > #define bio_for_each_segment_all(bvl, bio, i) \ > for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++) > > bio_for_each_segment_all still trusts bi_vcnt, so any > bio_for_each_segment_all operation on a clone will basically be a noop. > > Just looking at MD raid1 make_request() > > mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); > ... > alloc_behind_pages(mbio, r1_bio); -> bio_for_each_segment_all > ... > if (r1_bio->behind_bvecs) { > bio_for_each_segment_all(bvec, mbio, j) > ... > > I didn't test MD without the vcnt fix, but I think any operations in MD > that duplicate data for raid1 turn into noops. I think we'll end up > writing garbage (or nothing) to the second mirror. > > If you look at dm's crypt_free_buffer_pages(), it had similar problems. Those are fine actually - in both cases, they're bios they allocated, not the bios that were submitted to them. Though md _definitely_ shouldn't have been sharing the original bio's biovec, so looks like this patch will fix a bug there... (I remember seeing that code before and I thought I added a bio_clone_biovec() call to that md code, but apparently that never got commited. Argh.) > > > not setting it might cause bugs in the short term but long term > > it's likely to hide nastier more subtle bugs, we don't want code looking > > at bi_vcnt at all for bios it does not own. > > I think the concept of bio ownership is still much too weak, at least > for established users like MD and DM. I don't know how to verify the > sharing of bi_io_vec without some kind of reference counting on the > iovec. What's unclear about it? The rule is just - if you didn't allocate the biovec, don't modify it or use bio_for_each_segment_all() (probably I didn't quite state it clearly enough before though) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
Quoting Kent Overstreet (2013-11-05 22:48:41) > This patch reverts the default behaviour introduced by > 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger > shares the source bio's biovec, cloning the biovec is once again the > default. > > Instead, we add a new bio_clone_biovec_fast(), which creates a clone > that shares the source's biovec. This patch changes bcache and md to use ^ dm? > __bio_clone_biovec_fast() since they're expecting the new behaviour due > to other refactoring; most of the other uses of bio_clone() should be > same to convert to the _fast() variant but that will be done more > incrementally in other patches (bio_split() in particular). Hi Kent, I noticed yesterday the _fast variants of bio clone introduce sharing between the src and the clone, but without any reference counts: bio->bi_io_vec = bio_src->bi_io_vec; Have you audited all of the _fast users to make sure they are not freeing the src before the clone? Sorry if this came up already in past reviews. > > Note that __bio_clone() isn't being readded - the reason being that with > immutable biovecs allocating the right number of biovecs for the new > clone is no longer trivial so we don't want drivers trying to do that > themselves. > > This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - > __bio_clone_fast() should not be setting bi_vcnt for bios that do not > own the biovec (see Documentation/block/biovecs.txt for rationale) - in > short, I think I see what you mean with tying bi_vcnt to ownership of the bio, but we're not consistent. Looking at bio_for_eaach_segment_all: * * drivers should _never_ use the all version - the bio may have been split * before it got to the driver and the driver won't own all of it */ #define bio_for_each_segment_all(bvl, bio, i) \ for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++) bio_for_each_segment_all still trusts bi_vcnt, so any bio_for_each_segment_all operation on a clone will basically be a noop. Just looking at MD raid1 make_request() mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); ... alloc_behind_pages(mbio, r1_bio); -> bio_for_each_segment_all ... if (r1_bio->behind_bvecs) { bio_for_each_segment_all(bvec, mbio, j) ... I didn't test MD without the vcnt fix, but I think any operations in MD that duplicate data for raid1 turn into noops. I think we'll end up writing garbage (or nothing) to the second mirror. If you look at dm's crypt_free_buffer_pages(), it had similar problems. > not setting it might cause bugs in the short term but long term > it's likely to hide nastier more subtle bugs, we don't want code looking > at bi_vcnt at all for bios it does not own. I think the concept of bio ownership is still much too weak, at least for established users like MD and DM. I don't know how to verify the sharing of bi_io_vec without some kind of reference counting on the iovec. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
On Tue, Nov 5, 2013 at 9:07 PM, Kent Overstreet wrote: > On Tue, Nov 05, 2013 at 09:02:19PM -0800, Olof Johansson wrote: >> Hi, >> >> On Tue, Nov 5, 2013 at 7:48 PM, Kent Overstreet wrote: >> >> > Chris, Olaf, can you two in particular test this? I have tested the bounce >> > buffer code (and bcache), but Jens told me today there was an md bug that I >> > _still_ can't find any emails about so I'm not sure what to test for that. >> >> (I'm guessing you mean me and not some German person here. :-) >> >> What is this expected to apply on top of? It doesn't apply cleanly on >> the 20131105 -next tree which is where I had been seeing the issues. > > Sorry - it's on top of Jens' for-3.13/core branch, > git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git Tested-by: Olof Johansson Note that last night's -next seems to have held up as well. -Olof -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
On Tue, Nov 5, 2013 at 9:07 PM, Kent Overstreet k...@daterainc.com wrote: On Tue, Nov 05, 2013 at 09:02:19PM -0800, Olof Johansson wrote: Hi, On Tue, Nov 5, 2013 at 7:48 PM, Kent Overstreet k...@daterainc.com wrote: Chris, Olaf, can you two in particular test this? I have tested the bounce buffer code (and bcache), but Jens told me today there was an md bug that I _still_ can't find any emails about so I'm not sure what to test for that. (I'm guessing you mean me and not some German person here. :-) What is this expected to apply on top of? It doesn't apply cleanly on the 20131105 -next tree which is where I had been seeing the issues. Sorry - it's on top of Jens' for-3.13/core branch, git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git Tested-by: Olof Johansson o...@lixom.net Note that last night's -next seems to have held up as well. -Olof -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
Quoting Kent Overstreet (2013-11-05 22:48:41) This patch reverts the default behaviour introduced by 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger shares the source bio's biovec, cloning the biovec is once again the default. Instead, we add a new bio_clone_biovec_fast(), which creates a clone that shares the source's biovec. This patch changes bcache and md to use ^ dm? __bio_clone_biovec_fast() since they're expecting the new behaviour due to other refactoring; most of the other uses of bio_clone() should be same to convert to the _fast() variant but that will be done more incrementally in other patches (bio_split() in particular). Hi Kent, I noticed yesterday the _fast variants of bio clone introduce sharing between the src and the clone, but without any reference counts: bio-bi_io_vec = bio_src-bi_io_vec; Have you audited all of the _fast users to make sure they are not freeing the src before the clone? Sorry if this came up already in past reviews. Note that __bio_clone() isn't being readded - the reason being that with immutable biovecs allocating the right number of biovecs for the new clone is no longer trivial so we don't want drivers trying to do that themselves. This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - __bio_clone_fast() should not be setting bi_vcnt for bios that do not own the biovec (see Documentation/block/biovecs.txt for rationale) - in short, I think I see what you mean with tying bi_vcnt to ownership of the bio, but we're not consistent. Looking at bio_for_eaach_segment_all: * * drivers should _never_ use the all version - the bio may have been split * before it got to the driver and the driver won't own all of it */ #define bio_for_each_segment_all(bvl, bio, i) \ for (i = 0, bvl = (bio)-bi_io_vec; i (bio)-bi_vcnt; i++, bvl++) bio_for_each_segment_all still trusts bi_vcnt, so any bio_for_each_segment_all operation on a clone will basically be a noop. Just looking at MD raid1 make_request() mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); ... alloc_behind_pages(mbio, r1_bio); - bio_for_each_segment_all ... if (r1_bio-behind_bvecs) { bio_for_each_segment_all(bvec, mbio, j) ... I didn't test MD without the vcnt fix, but I think any operations in MD that duplicate data for raid1 turn into noops. I think we'll end up writing garbage (or nothing) to the second mirror. If you look at dm's crypt_free_buffer_pages(), it had similar problems. not setting it might cause bugs in the short term but long term it's likely to hide nastier more subtle bugs, we don't want code looking at bi_vcnt at all for bios it does not own. I think the concept of bio ownership is still much too weak, at least for established users like MD and DM. I don't know how to verify the sharing of bi_io_vec without some kind of reference counting on the iovec. -chris -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, Nov 06, 2013 at 11:11:30AM -0500, Chris Mason wrote: Quoting Kent Overstreet (2013-11-05 22:48:41) This patch reverts the default behaviour introduced by 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger shares the source bio's biovec, cloning the biovec is once again the default. Instead, we add a new bio_clone_biovec_fast(), which creates a clone that shares the source's biovec. This patch changes bcache and md to use ^ dm? __bio_clone_biovec_fast() since they're expecting the new behaviour due to other refactoring; most of the other uses of bio_clone() should be same to convert to the _fast() variant but that will be done more incrementally in other patches (bio_split() in particular). Hi Kent, I noticed yesterday the _fast variants of bio clone introduce sharing between the src and the clone, but without any reference counts: bio-bi_io_vec = bio_src-bi_io_vec; Have you audited all of the _fast users to make sure they are not freeing the src before the clone? Sorry if this came up already in past reviews. Yup - that should actually be safe for all the existing bio_clone() users actually, I audited all of them - because normally you're not going to complete the original bio until the clone finishes. Note that __bio_clone() isn't being readded - the reason being that with immutable biovecs allocating the right number of biovecs for the new clone is no longer trivial so we don't want drivers trying to do that themselves. This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - __bio_clone_fast() should not be setting bi_vcnt for bios that do not own the biovec (see Documentation/block/biovecs.txt for rationale) - in short, I think I see what you mean with tying bi_vcnt to ownership of the bio, but we're not consistent. Looking at bio_for_eaach_segment_all: * * drivers should _never_ use the all version - the bio may have been split * before it got to the driver and the driver won't own all of it */ #define bio_for_each_segment_all(bvl, bio, i) \ for (i = 0, bvl = (bio)-bi_io_vec; i (bio)-bi_vcnt; i++, bvl++) bio_for_each_segment_all still trusts bi_vcnt, so any bio_for_each_segment_all operation on a clone will basically be a noop. Just looking at MD raid1 make_request() mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); ... alloc_behind_pages(mbio, r1_bio); - bio_for_each_segment_all ... if (r1_bio-behind_bvecs) { bio_for_each_segment_all(bvec, mbio, j) ... I didn't test MD without the vcnt fix, but I think any operations in MD that duplicate data for raid1 turn into noops. I think we'll end up writing garbage (or nothing) to the second mirror. If you look at dm's crypt_free_buffer_pages(), it had similar problems. Those are fine actually - in both cases, they're bios they allocated, not the bios that were submitted to them. Though md _definitely_ shouldn't have been sharing the original bio's biovec, so looks like this patch will fix a bug there... (I remember seeing that code before and I thought I added a bio_clone_biovec() call to that md code, but apparently that never got commited. Argh.) not setting it might cause bugs in the short term but long term it's likely to hide nastier more subtle bugs, we don't want code looking at bi_vcnt at all for bios it does not own. I think the concept of bio ownership is still much too weak, at least for established users like MD and DM. I don't know how to verify the sharing of bi_io_vec without some kind of reference counting on the iovec. What's unclear about it? The rule is just - if you didn't allocate the biovec, don't modify it or use bio_for_each_segment_all() (probably I didn't quite state it clearly enough before though) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
Quoting Kent Overstreet (2013-11-06 15:02:22) On Wed, Nov 06, 2013 at 11:11:30AM -0500, Chris Mason wrote: Quoting Kent Overstreet (2013-11-05 22:48:41) This patch reverts the default behaviour introduced by 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger shares the source bio's biovec, cloning the biovec is once again the default. Instead, we add a new bio_clone_biovec_fast(), which creates a clone that shares the source's biovec. This patch changes bcache and md to use ^ dm? __bio_clone_biovec_fast() since they're expecting the new behaviour due to other refactoring; most of the other uses of bio_clone() should be same to convert to the _fast() variant but that will be done more incrementally in other patches (bio_split() in particular). Hi Kent, I noticed yesterday the _fast variants of bio clone introduce sharing between the src and the clone, but without any reference counts: bio-bi_io_vec = bio_src-bi_io_vec; Have you audited all of the _fast users to make sure they are not freeing the src before the clone? Sorry if this came up already in past reviews. Yup - that should actually be safe for all the existing bio_clone() users actually, I audited all of them - because normally you're not going to complete the original bio until the clone finishes. I'd say we need an ack from Neil before we can switch to _fast. The completions look non-trivial and _fast adds new ordering requirements on free. Note that __bio_clone() isn't being readded - the reason being that with immutable biovecs allocating the right number of biovecs for the new clone is no longer trivial so we don't want drivers trying to do that themselves. This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - __bio_clone_fast() should not be setting bi_vcnt for bios that do not own the biovec (see Documentation/block/biovecs.txt for rationale) - in short, I think I see what you mean with tying bi_vcnt to ownership of the bio, but we're not consistent. Looking at bio_for_eaach_segment_all: * * drivers should _never_ use the all version - the bio may have been split * before it got to the driver and the driver won't own all of it */ #define bio_for_each_segment_all(bvl, bio, i) \ for (i = 0, bvl = (bio)-bi_io_vec; i (bio)-bi_vcnt; i++, bvl++) bio_for_each_segment_all still trusts bi_vcnt, so any bio_for_each_segment_all operation on a clone will basically be a noop. Just looking at MD raid1 make_request() mbio = bio_clone_mddev(bio, GFP_NOIO, mddev); ... alloc_behind_pages(mbio, r1_bio); - bio_for_each_segment_all ... if (r1_bio-behind_bvecs) { bio_for_each_segment_all(bvec, mbio, j) ... I didn't test MD without the vcnt fix, but I think any operations in MD that duplicate data for raid1 turn into noops. I think we'll end up writing garbage (or nothing) to the second mirror. If you look at dm's crypt_free_buffer_pages(), it had similar problems. Those are fine actually - in both cases, they're bios they allocated, not the bios that were submitted to them. Both are expecting bi_vcnt to be non-zero after a clone, which wasn't the case before the bi_vcnt patch. With your current patch (adding _fast) both should now be correct. Though md _definitely_ shouldn't have been sharing the original bio's biovec, so looks like this patch will fix a bug there... (I remember seeing that code before and I thought I added a bio_clone_biovec() call to that md code, but apparently that never got commited. Argh.) not setting it might cause bugs in the short term but long term it's likely to hide nastier more subtle bugs, we don't want code looking at bi_vcnt at all for bios it does not own. I think the concept of bio ownership is still much too weak, at least for established users like MD and DM. I don't know how to verify the sharing of bi_io_vec without some kind of reference counting on the iovec. What's unclear about it? The rule is just - if you didn't allocate the biovec, don't modify it or use bio_for_each_segment_all() (probably I didn't quite state it clearly enough before though) That part makes sense. The new rule that scares me is that we can't free the src of the clone until all the clones are freed. If it works with today's existing users it feels like it is more by accident than design. I'm not saying we can't do it, we just need some bigger flashing warning lights. -chris -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org
Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, Nov 06, 2013 at 03:22:36PM -0500, Chris Mason wrote: Quoting Kent Overstreet (2013-11-06 15:02:22) Yup - that should actually be safe for all the existing bio_clone() users actually, I audited all of them - because normally you're not going to complete the original bio until the clone finishes. I'd say we need an ack from Neil before we can switch to _fast. The completions look non-trivial and _fast adds new ordering requirements on free. Agreed, I've already broken md enough lately :P (I'm started to work on a real test suite for the block layer, that'll help with this stuff...) If you look at dm's crypt_free_buffer_pages(), it had similar problems. Those are fine actually - in both cases, they're bios they allocated, not the bios that were submitted to them. Both are expecting bi_vcnt to be non-zero after a clone, which wasn't the case before the bi_vcnt patch. With your current patch (adding _fast) both should now be correct. Yeah, but bi_vcnt being zero was just a symptom - the underlying reason the old code was wrong was that with the clone sharing the parent's biovec, md was modifying a biovec (including bv_page!) that it didn't own - _that_ was what was horribly broken about it. What's unclear about it? The rule is just - if you didn't allocate the biovec, don't modify it or use bio_for_each_segment_all() (probably I didn't quite state it clearly enough before though) That part makes sense. The new rule that scares me is that we can't free the src of the clone until all the clones are freed. If it works with today's existing users it feels like it is more by accident than design. I'm not saying we can't do it, we just need some bigger flashing warning lights. Yeah, Jens was saying the same thing yesterday, hence this patch. OTOH - with regards to just the ordering requirements, the more I look at various code the less accidental the fact that that works seems to be: the best explanation I've come up with so far is that you already needed to ensure that the _pages_ the clone points to stick around until the clone completes, and if you don't own the original bio the only way to do that is to not complete the original bio until after the clone completes. So if you're a driver cloning bios that were submitted to you, bio_clone_fast() introduces no new ordering requirements. On the third hand - if you're cloning (i.e. splitting) your own bios, in that case it would be possible to screw up the ordering - I don't know of any code in the kernel that does this today (except for, sort of, bcache) but my dio rewrite takes this approach - but if you do the obvious and sane thing with bio_chain() it's a non issue, it seems to me you'd have to come up with something pretty contrived and dumb for this to actually be an issue in practice. Anyways, I haven't come to any definite conclusions, those are just my observations so far. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
Quoting Kent Overstreet (2013-11-06 15:57:34) On Wed, Nov 06, 2013 at 03:22:36PM -0500, Chris Mason wrote: Quoting Kent Overstreet (2013-11-06 15:02:22) [ ... nods, thanks! ... ] OTOH - with regards to just the ordering requirements, the more I look at various code the less accidental the fact that that works seems to be: the best explanation I've come up with so far is that you already needed to ensure that the _pages_ the clone points to stick around until the clone completes, and if you don't own the original bio the only way to do that is to not complete the original bio until after the clone completes. So if you're a driver cloning bios that were submitted to you, bio_clone_fast() introduces no new ordering requirements. On the third hand - if you're cloning (i.e. splitting) your own bios, in that case it would be possible to screw up the ordering - I don't know of any code in the kernel that does this today (except for, sort of, bcache) but my dio rewrite takes this approach - but if you do the obvious and sane thing with bio_chain() it's a non issue, it seems to me you'd have to come up with something pretty contrived and dumb for this to actually be an issue in practice. Anyways, I haven't come to any definite conclusions, those are just my observations so far. I do think you're right. We all seem to have clones doing work on behalf of the original, and when everyone is done we complete the original. But, btrfs does have silly things like this: dio_end_io(dio_bio, err); // end and free the original bio_put(bio); // free the clone It's not a bug yet, but given enough time the space between those two frees will grow new code that kills us all. Really though, the new stuff is better, thanks. -chris -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, Nov 06, 2013 at 04:25:45PM -0500, Chris Mason wrote: Quoting Kent Overstreet (2013-11-06 15:57:34) On Wed, Nov 06, 2013 at 03:22:36PM -0500, Chris Mason wrote: Quoting Kent Overstreet (2013-11-06 15:02:22) [ ... nods, thanks! ... ] OTOH - with regards to just the ordering requirements, the more I look at various code the less accidental the fact that that works seems to be: the best explanation I've come up with so far is that you already needed to ensure that the _pages_ the clone points to stick around until the clone completes, and if you don't own the original bio the only way to do that is to not complete the original bio until after the clone completes. So if you're a driver cloning bios that were submitted to you, bio_clone_fast() introduces no new ordering requirements. On the third hand - if you're cloning (i.e. splitting) your own bios, in that case it would be possible to screw up the ordering - I don't know of any code in the kernel that does this today (except for, sort of, bcache) but my dio rewrite takes this approach - but if you do the obvious and sane thing with bio_chain() it's a non issue, it seems to me you'd have to come up with something pretty contrived and dumb for this to actually be an issue in practice. Anyways, I haven't come to any definite conclusions, those are just my observations so far. I do think you're right. We all seem to have clones doing work on behalf of the original, and when everyone is done we complete the original. But, btrfs does have silly things like this: dio_end_io(dio_bio, err); // end and free the original bio_put(bio); // free the clone It's not a bug yet, but given enough time the space between those two frees will grow new code that kills us all. Hopefully the DIO situation will get better in the near future, Josef was telling me btrfs was probably going to end up with its own DIO code (replacing a good chunk of direct-io.c) and that should get a lot easier and saner for you guys with my dio rewrite. Need to finish getting that to pass xfstests... Really though, the new stuff is better, thanks. Thanks! :) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
On Wed, 6 Nov 2013 15:22:36 -0500 Chris Mason chris.ma...@fusionio.com wrote: Yup - that should actually be safe for all the existing bio_clone() users actually, I audited all of them - because normally you're not going to complete the original bio until the clone finishes. I'd say we need an ack from Neil before we can switch to _fast. The completions look non-trivial and _fast adds new ordering requirements on free. I think it should be fairly straight forward to use _fast in md. As Kent says, the original bio that is cloned is (almost) never going to have endio called until all the clones have done their thing and finished up. The only exception is the behind_pages stuff in raid1.c. We clearly need to be more careful there. If the ordering requirements you mention are simply that the owner of the original may free the biovec, so the clone must have finished using it before that can happen, then that would most easily be fix with mbio-bi_io_vec = r1_bio-behind_bvecs; rather than the current loop which copies all the bv_page pointers. NeilBrown signature.asc Description: PGP signature
Re: [PATCH] block: Revert bio_clone() default behaviour
On Tue, Nov 05, 2013 at 09:02:19PM -0800, Olof Johansson wrote: > Hi, > > On Tue, Nov 5, 2013 at 7:48 PM, Kent Overstreet wrote: > > > Chris, Olaf, can you two in particular test this? I have tested the bounce > > buffer code (and bcache), but Jens told me today there was an md bug that I > > _still_ can't find any emails about so I'm not sure what to test for that. > > (I'm guessing you mean me and not some German person here. :-) > > What is this expected to apply on top of? It doesn't apply cleanly on > the 20131105 -next tree which is where I had been seeing the issues. Sorry - it's on top of Jens' for-3.13/core branch, git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
Hi, On Tue, Nov 5, 2013 at 7:48 PM, Kent Overstreet wrote: > Chris, Olaf, can you two in particular test this? I have tested the bounce > buffer code (and bcache), but Jens told me today there was an md bug that I > _still_ can't find any emails about so I'm not sure what to test for that. (I'm guessing you mean me and not some German person here. :-) What is this expected to apply on top of? It doesn't apply cleanly on the 20131105 -next tree which is where I had been seeing the issues. -Olof -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] block: Revert bio_clone() default behaviour
This patch reverts the default behaviour introduced by 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger shares the source bio's biovec, cloning the biovec is once again the default. Instead, we add a new bio_clone_biovec_fast(), which creates a clone that shares the source's biovec. This patch changes bcache and md to use __bio_clone_biovec_fast() since they're expecting the new behaviour due to other refactoring; most of the other uses of bio_clone() should be same to convert to the _fast() variant but that will be done more incrementally in other patches (bio_split() in particular). Note that __bio_clone() isn't being readded - the reason being that with immutable biovecs allocating the right number of biovecs for the new clone is no longer trivial so we don't want drivers trying to do that themselves. This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - __bio_clone_fast() should not be setting bi_vcnt for bios that do not own the biovec (see Documentation/block/biovecs.txt for rationale) - in short, not setting it might cause bugs in the short term but long term it's likely to hide nastier more subtle bugs, we don't want code looking at bi_vcnt at all for bios it does not own. However, this patch _shouldn't_ cause any regressions because of this since we're reverting back to the old bio_clone() behaviour. Signed-off-by: Kent Overstreet Cc: Jens Axboe Cc: Chris Mason Cc: Mike Snitzer Cc: NeilBrown Cc: Olof Johansson --- Chris, Olaf, can you two in particular test this? I have tested the bounce buffer code (and bcache), but Jens told me today there was an md bug that I _still_ can't find any emails about so I'm not sure what to test for that. drivers/md/bcache/request.c | 6 +-- drivers/md/dm.c | 4 +- fs/bio.c| 104 +++- include/linux/bio.h | 6 +-- mm/bounce.c | 1 - 5 files changed, 72 insertions(+), 49 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 52a1fef..ef44198 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -681,7 +681,7 @@ static void do_bio_hook(struct search *s) struct bio *bio = >bio.bio; bio_init(bio); - __bio_clone(bio, s->orig_bio); + __bio_clone_fast(bio, s->orig_bio); bio->bi_end_io = request_endio; bio->bi_private = >cl; @@ -969,8 +969,8 @@ static void request_write(struct cached_dev *dc, struct search *s) trace_bcache_write(s->orig_bio, s->writeback, s->op.skip); if (!s->writeback) { - s->op.cache_bio = bio_clone_bioset(bio, GFP_NOIO, - dc->disk.bio_split); + s->op.cache_bio = bio_clone_bioset_fast(bio, GFP_NOIO, + dc->disk.bio_split); closure_bio_submit(bio, cl, s->d); } else { diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 8e6174c..bafe7ed 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1135,7 +1135,7 @@ static void clone_bio(struct dm_target_io *tio, struct bio *bio, { struct bio *clone = >clone; - __bio_clone(clone, bio); + __bio_clone_fast(clone, bio); if (bio_integrity(bio)) bio_integrity_clone(clone, bio, GFP_NOIO); @@ -1177,7 +1177,7 @@ static void __clone_and_map_simple_bio(struct clone_info *ci, * ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush * and discard, so no need for concern about wasted bvec allocations. */ -__bio_clone(clone, ci->bio); +__bio_clone_fast(clone, ci->bio); if (len) bio_setup_sector(clone, ci->sector, len); diff --git a/fs/bio.c b/fs/bio.c index 6046c91..99ff176 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -539,15 +539,17 @@ inline int bio_phys_segments(struct request_queue *q, struct bio *bio) EXPORT_SYMBOL(bio_phys_segments); /** - * __bio_clone - clone a bio + * __bio_clone_fast - clone a bio that shares the original bio's biovec * @bio: destination bio * @bio_src: bio to clone * * Clone a Caller will own the returned bio, but not * the actual data it points to. Reference count of returned * bio will be one. + * + * Caller must ensure that @bio_src is not freed before @bio. */ -void __bio_clone(struct bio *bio, struct bio *bio_src) +void __bio_clone_fast(struct bio *bio, struct bio *bio_src) { BUG_ON(bio->bi_pool && BIO_POOL_IDX(bio) != BIO_POOL_NONE); @@ -560,20 +562,19 @@ void __bio_clone(struct bio *bio, struct bio *bio_src) bio->bi_rw = bio_src->bi_rw; bio->bi_iter = bio_src->bi_iter; bio->bi_io_vec = bio_src->bi_io_vec; - bio->bi_vcnt = bio_src->bi_vcnt; } -EXPORT_SYMBOL(__bio_clone); +EXPORT_SYMBOL(__bio_clone_fast); /** - *
[PATCH] block: Revert bio_clone() default behaviour
This patch reverts the default behaviour introduced by 9fc6286f347d00528adcdcf12396d220f47492ed - bio_clone_biovec() no clonger shares the source bio's biovec, cloning the biovec is once again the default. Instead, we add a new bio_clone_biovec_fast(), which creates a clone that shares the source's biovec. This patch changes bcache and md to use __bio_clone_biovec_fast() since they're expecting the new behaviour due to other refactoring; most of the other uses of bio_clone() should be same to convert to the _fast() variant but that will be done more incrementally in other patches (bio_split() in particular). Note that __bio_clone() isn't being readded - the reason being that with immutable biovecs allocating the right number of biovecs for the new clone is no longer trivial so we don't want drivers trying to do that themselves. This patch also reverts febca1baea1cfe2d7a0271385d89b03d5fb34f94 - __bio_clone_fast() should not be setting bi_vcnt for bios that do not own the biovec (see Documentation/block/biovecs.txt for rationale) - in short, not setting it might cause bugs in the short term but long term it's likely to hide nastier more subtle bugs, we don't want code looking at bi_vcnt at all for bios it does not own. However, this patch _shouldn't_ cause any regressions because of this since we're reverting back to the old bio_clone() behaviour. Signed-off-by: Kent Overstreet k...@daterainc.com Cc: Jens Axboe ax...@kernel.dk Cc: Chris Mason chris.ma...@fusionio.com Cc: Mike Snitzer snit...@redhat.com Cc: NeilBrown ne...@suse.de Cc: Olof Johansson o...@lixom.net --- Chris, Olaf, can you two in particular test this? I have tested the bounce buffer code (and bcache), but Jens told me today there was an md bug that I _still_ can't find any emails about so I'm not sure what to test for that. drivers/md/bcache/request.c | 6 +-- drivers/md/dm.c | 4 +- fs/bio.c| 104 +++- include/linux/bio.h | 6 +-- mm/bounce.c | 1 - 5 files changed, 72 insertions(+), 49 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 52a1fef..ef44198 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -681,7 +681,7 @@ static void do_bio_hook(struct search *s) struct bio *bio = s-bio.bio; bio_init(bio); - __bio_clone(bio, s-orig_bio); + __bio_clone_fast(bio, s-orig_bio); bio-bi_end_io = request_endio; bio-bi_private = s-cl; @@ -969,8 +969,8 @@ static void request_write(struct cached_dev *dc, struct search *s) trace_bcache_write(s-orig_bio, s-writeback, s-op.skip); if (!s-writeback) { - s-op.cache_bio = bio_clone_bioset(bio, GFP_NOIO, - dc-disk.bio_split); + s-op.cache_bio = bio_clone_bioset_fast(bio, GFP_NOIO, + dc-disk.bio_split); closure_bio_submit(bio, cl, s-d); } else { diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 8e6174c..bafe7ed 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1135,7 +1135,7 @@ static void clone_bio(struct dm_target_io *tio, struct bio *bio, { struct bio *clone = tio-clone; - __bio_clone(clone, bio); + __bio_clone_fast(clone, bio); if (bio_integrity(bio)) bio_integrity_clone(clone, bio, GFP_NOIO); @@ -1177,7 +1177,7 @@ static void __clone_and_map_simple_bio(struct clone_info *ci, * ci-bio-bi_max_vecs is BIO_INLINE_VECS anyway, for both flush * and discard, so no need for concern about wasted bvec allocations. */ -__bio_clone(clone, ci-bio); +__bio_clone_fast(clone, ci-bio); if (len) bio_setup_sector(clone, ci-sector, len); diff --git a/fs/bio.c b/fs/bio.c index 6046c91..99ff176 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -539,15 +539,17 @@ inline int bio_phys_segments(struct request_queue *q, struct bio *bio) EXPORT_SYMBOL(bio_phys_segments); /** - * __bio_clone - clone a bio + * __bio_clone_fast - clone a bio that shares the original bio's biovec * @bio: destination bio * @bio_src: bio to clone * * Clone a bio. Caller will own the returned bio, but not * the actual data it points to. Reference count of returned * bio will be one. + * + * Caller must ensure that @bio_src is not freed before @bio. */ -void __bio_clone(struct bio *bio, struct bio *bio_src) +void __bio_clone_fast(struct bio *bio, struct bio *bio_src) { BUG_ON(bio-bi_pool BIO_POOL_IDX(bio) != BIO_POOL_NONE); @@ -560,20 +562,19 @@ void __bio_clone(struct bio *bio, struct bio *bio_src) bio-bi_rw = bio_src-bi_rw; bio-bi_iter = bio_src-bi_iter; bio-bi_io_vec = bio_src-bi_io_vec; - bio-bi_vcnt = bio_src-bi_vcnt; }
Re: [PATCH] block: Revert bio_clone() default behaviour
Hi, On Tue, Nov 5, 2013 at 7:48 PM, Kent Overstreet k...@daterainc.com wrote: Chris, Olaf, can you two in particular test this? I have tested the bounce buffer code (and bcache), but Jens told me today there was an md bug that I _still_ can't find any emails about so I'm not sure what to test for that. (I'm guessing you mean me and not some German person here. :-) What is this expected to apply on top of? It doesn't apply cleanly on the 20131105 -next tree which is where I had been seeing the issues. -Olof -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: Revert bio_clone() default behaviour
On Tue, Nov 05, 2013 at 09:02:19PM -0800, Olof Johansson wrote: Hi, On Tue, Nov 5, 2013 at 7:48 PM, Kent Overstreet k...@daterainc.com wrote: Chris, Olaf, can you two in particular test this? I have tested the bounce buffer code (and bcache), but Jens told me today there was an md bug that I _still_ can't find any emails about so I'm not sure what to test for that. (I'm guessing you mean me and not some German person here. :-) What is this expected to apply on top of? It doesn't apply cleanly on the 20131105 -next tree which is where I had been seeing the issues. Sorry - it's on top of Jens' for-3.13/core branch, git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/