Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs
On Tue, Jan 08, 2013 at 11:15:37AM -0500, Jeff Moyer wrote:
> Kent Overstreet writes:
>
> > On Tue, Jan 08, 2013 at 10:33:18AM -0500, Jeff Moyer wrote:
> >> Kent Overstreet writes:
> >>
> >> >> Is the rbtree really faster than a basic (l)list and a sort before
> >> >> completing them? Would be simpler.
> >> >
> >> > Well, depends. With one or two kioctxs? The list would definitely be
> >> > faster, but I'm loath to use an O(n^2) algorithm anywhere the input
> >> > size isn't strictly controlled, and I know of applications out there
> >> > that use tons of kioctxs.
> >>
> >> Out of curiosity, what applications do you know of that use tons of
> >> kioctx's?
> >
> > "tons" is relative, I suppose, but before this patch series sharing a
> > kioctx between threads was really bad for performance and... you know
> > how people can be with threads.
>
> I wasn't questioning the merits of the patch, I was simply curious to
> know how aio is being (ab)used in the wild. So, is this some internal
> tool, then, or what?

Oh, I didn't think you were, I just never looked for actual numbers.
Yeah, some internal library code is what I was referring to, but from
the story of how it evolved I don't think it's unusual.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs
Kent Overstreet writes:

> On Tue, Jan 08, 2013 at 10:33:18AM -0500, Jeff Moyer wrote:
>> Kent Overstreet writes:
>>
>> >> Is the rbtree really faster than a basic (l)list and a sort before
>> >> completing them? Would be simpler.
>> >
>> > Well, depends. With one or two kioctxs? The list would definitely be
>> > faster, but I'm loath to use an O(n^2) algorithm anywhere the input
>> > size isn't strictly controlled, and I know of applications out there
>> > that use tons of kioctxs.
>>
>> Out of curiosity, what applications do you know of that use tons of
>> kioctx's?
>
> "tons" is relative, I suppose, but before this patch series sharing a
> kioctx between threads was really bad for performance and... you know
> how people can be with threads.

I wasn't questioning the merits of the patch, I was simply curious to
know how aio is being (ab)used in the wild. So, is this some internal
tool, then, or what?

Thanks!
Jeff
Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs
On Tue, Jan 08, 2013 at 10:33:18AM -0500, Jeff Moyer wrote:
> Kent Overstreet writes:
>
> >> Is the rbtree really faster than a basic (l)list and a sort before
> >> completing them? Would be simpler.
> >
> > Well, depends. With one or two kioctxs? The list would definitely be
> > faster, but I'm loath to use an O(n^2) algorithm anywhere the input
> > size isn't strictly controlled, and I know of applications out there
> > that use tons of kioctxs.
>
> Out of curiosity, what applications do you know of that use tons of
> kioctx's?

"tons" is relative, I suppose, but before this patch series sharing a
kioctx between threads was really bad for performance and... you know
how people can be with threads.
Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs
Kent Overstreet writes:

>> Is the rbtree really faster than a basic (l)list and a sort before
>> completing them? Would be simpler.
>
> Well, depends. With one or two kioctxs? The list would definitely be
> faster, but I'm loath to use an O(n^2) algorithm anywhere the input
> size isn't strictly controlled, and I know of applications out there
> that use tons of kioctxs.

Out of curiosity, what applications do you know of that use tons of
kioctx's?

-Jeff
Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs
On Fri, Jan 04, 2013 at 10:22:35AM +0100, Jens Axboe wrote:
> On 2012-12-27 03:00, Kent Overstreet wrote:
> > When completing a kiocb, there's some fixed overhead from touching the
> > kioctx's ring buffer the kiocb belongs to. Some newer high end block
> > devices can complete multiple IOs per interrupt, much like many network
> > interfaces have been for some time.
> >
> > This plumbs through infrastructure so we can take advantage of multiple
> > completions at the interrupt level, and complete multiple kiocbs at the
> > same time.
> >
> > Drivers have to be converted to take advantage of this, but it's a
> > simple change and the next patches will convert a few drivers.
> >
> > To use it, an interrupt handler (or any code that completes bios or
> > requests) declares and initializes a struct batch_complete:
> >
> >	struct batch_complete batch;
> >	batch_complete_init(&batch);
> >
> > Then, instead of calling bio_endio(), it calls
> > bio_endio_batch(bio, err, &batch). This just adds the bio to a list in
> > the batch_complete.
> >
> > At the end, it calls
> >
> >	batch_complete(&batch);
> >
> > This completes all the bios all at once, building up a list of kiocbs;
> > then the list of kiocbs are completed all at once.
> >
> > Also, in order to batch up the kiocbs we have to add a different
> > bio_endio function to struct bio, that takes a pointer to the
> > batch_complete - this patch converts the dio code's bio_endio function.
> > In order to avoid changing every bio_endio function in the kernel (there
> > are many), we currently use a union and a flag to indicate what kind of
> > bio endio function to call. This is admittedly a hack, but should
> > suffice for now.
>
> It is indeed a hack... Famous last words as well, I'm sure that'll stick
> around forever if it goes in! Any ideas on how we can clean this up
> before that?

Well, I wouldn't _really_ mind changing all 200 bi_end_io uses.

On the other hand, the majority of them are either leaf nodes (filesystem
code and whatnot that's not completing anything else that could be
batched), or stuff like the dm and md code where it could be plumbed
through (so we could batch completions through md/dm) but it may take some
thought to do it right.

So I think I'd prefer to do it incrementally, for the moment. I'm always a
bit terrified of doing a cleanup that touches 50+ files and then changing
my mind about something and going back and redoing it.

That said, I haven't forgotten about all the other block layer patches
I've got for you; as soon as I'm less swamped I'm going to finish off that
stuff, so I should be around to revisit it...

> Apart from that, I think the batching makes functional sense. For the
> devices where we do get batches of completions (most of them), it's the
> right thing to do. Would be nice if it were better integrated though, not
> a side hack.
>
> Is the rbtree really faster than a basic (l)list and a sort before
> completing them? Would be simpler.

Well, depends. With one or two kioctxs? The list would definitely be
faster, but I'm loath to use an O(n^2) algorithm anywhere the input size
isn't strictly controlled, and I know of applications out there that use
tons of kioctxs.

> A few small comments below.
>
> > +void bio_endio_batch(struct bio *bio, int error,
> > +		     struct batch_complete *batch)
> > +{
> > +	if (error)
> > +		bio->bi_error = error;
> > +
> > +	if (batch)
> > +		bio_list_add(&batch->bio, bio);
> > +	else
> > +		__bio_endio(bio, batch);
> > +}
>
> Ugh, get rid of this 'batch' checking.

The reason I did it that way is - well, look at the dio code's bi_end_io
function. It's got to be passed a struct batch_complete * to batch kiocbs,
but the driver that calls it may or may not have batch completions plumbed
through.

So unless every single driver gets converted (and I think that'd be silly
for all the ones that can't do any actual batching), something's going to
have to have that check, and better for it to be in generic code than in
every mid layer we plumb it through.

> > +static inline void bio_endio(struct bio *bio, int error)
> > +{
> > +	bio_endio_batch(bio, error, NULL);
> > +}
>
> Just make that __bio_endio().

That one could be changed... I dislike having the

	if (error)
		bio->bi_error = error;

duplicated. Actually, it'd probably make more sense to inline
bio_endio_batch(), because often the compiler is going to either know
whether batch is NULL or be able to lift the check out of a loop.
Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs
On 2012-12-27 03:00, Kent Overstreet wrote:
> When completing a kiocb, there's some fixed overhead from touching the
> kioctx's ring buffer the kiocb belongs to. Some newer high end block
> devices can complete multiple IOs per interrupt, much like many network
> interfaces have been for some time.
>
> This plumbs through infrastructure so we can take advantage of multiple
> completions at the interrupt level, and complete multiple kiocbs at the
> same time.
>
> Drivers have to be converted to take advantage of this, but it's a
> simple change and the next patches will convert a few drivers.
>
> To use it, an interrupt handler (or any code that completes bios or
> requests) declares and initializes a struct batch_complete:
>
>	struct batch_complete batch;
>	batch_complete_init(&batch);
>
> Then, instead of calling bio_endio(), it calls
> bio_endio_batch(bio, err, &batch). This just adds the bio to a list in
> the batch_complete.
>
> At the end, it calls
>
>	batch_complete(&batch);
>
> This completes all the bios all at once, building up a list of kiocbs;
> then the list of kiocbs are completed all at once.
>
> Also, in order to batch up the kiocbs we have to add a different
> bio_endio function to struct bio, that takes a pointer to the
> batch_complete - this patch converts the dio code's bio_endio function.
> In order to avoid changing every bio_endio function in the kernel (there
> are many), we currently use a union and a flag to indicate what kind of
> bio endio function to call. This is admittedly a hack, but should
> suffice for now.

It is indeed a hack... Famous last words as well, I'm sure that'll stick
around forever if it goes in! Any ideas on how we can clean this up
before that?

Apart from that, I think the batching makes functional sense. For the
devices where we do get batches of completions (most of them), it's the
right thing to do. Would be nice if it were better integrated though, not
a side hack.

Is the rbtree really faster than a basic (l)list and a sort before
completing them? Would be simpler.

A few small comments below.

> +void bio_endio_batch(struct bio *bio, int error,
> +		     struct batch_complete *batch)
> +{
> +	if (error)
> +		bio->bi_error = error;
> +
> +	if (batch)
> +		bio_list_add(&batch->bio, bio);
> +	else
> +		__bio_endio(bio, batch);
> +}

Ugh, get rid of this 'batch' checking.

> +static inline void bio_endio(struct bio *bio, int error)
> +{
> +	bio_endio_batch(bio, error, NULL);
> +}

Just make that __bio_endio().

Same thing exists on the rq side, iirc.

--
Jens Axboe
[PATCH 29/32] block, aio: Batch completion for bios/kiocbs
When completing a kiocb, there's some fixed overhead from touching the
kioctx's ring buffer the kiocb belongs to. Some newer high end block
devices can complete multiple IOs per interrupt, much like many network
interfaces have been for some time.

This plumbs through infrastructure so we can take advantage of multiple
completions at the interrupt level, and complete multiple kiocbs at the
same time.

Drivers have to be converted to take advantage of this, but it's a
simple change and the next patches will convert a few drivers.

To use it, an interrupt handler (or any code that completes bios or
requests) declares and initializes a struct batch_complete:

	struct batch_complete batch;
	batch_complete_init(&batch);

Then, instead of calling bio_endio(), it calls
bio_endio_batch(bio, err, &batch). This just adds the bio to a list in
the batch_complete.

At the end, it calls

	batch_complete(&batch);

This completes all the bios all at once, building up a list of kiocbs;
then the list of kiocbs are completed all at once.

Also, in order to batch up the kiocbs we have to add a different
bio_endio function to struct bio, that takes a pointer to the
batch_complete - this patch converts the dio code's bio_endio function.
In order to avoid changing every bio_endio function in the kernel (there
are many), we currently use a union and a flag to indicate what kind of
bio endio function to call. This is admittedly a hack, but should
suffice for now.

For batching to work through, say, md or dm devices, the md/dm bio_endio
functions would have to be converted, much like the dio code. That is
left for future patches.
Signed-off-by: Kent Overstreet <koverstr...@google.com>
---
 block/blk-core.c          |  34 ---
 block/blk-flush.c         |   2 +-
 block/blk.h               |   3 +-
 drivers/block/swim3.c     |   2 +-
 drivers/md/dm.c           |   2 +-
 fs/aio.c                  | 254 +++---
 fs/bio.c                  |  52 ++
 fs/direct-io.c            |  20 ++--
 include/linux/aio.h       |  22 +++-
 include/linux/bio.h       |  36 ++-
 include/linux/blk_types.h |  11 +-
 include/linux/blkdev.h    |  12 ++-
 12 files changed, 311 insertions(+), 139 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3c95c4d..4fac6ddb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -151,7 +151,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 EXPORT_SYMBOL(blk_rq_init);
 
 static void req_bio_endio(struct request *rq, struct bio *bio,
-			  unsigned int nbytes, int error)
+			  unsigned int nbytes, int error,
+			  struct batch_complete *batch)
 {
 	if (error)
 		clear_bit(BIO_UPTODATE, &bio->bi_flags);
@@ -175,7 +176,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
 	/* don't actually finish bio if it's part of flush sequence */
 	if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
-		bio_endio(bio, error);
+		bio_endio_batch(bio, error, batch);
 }
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
@@ -2215,7 +2216,8 @@ EXPORT_SYMBOL(blk_fetch_request);
  *	%false - this request doesn't have any more data
  *	%true  - this request has more data
  **/
-bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
+bool blk_update_request(struct request *req, int error, unsigned int nr_bytes,
+			struct batch_complete *batch)
 {
 	int total_bytes, bio_nbytes, next_idx = 0;
 	struct bio *bio;
@@ -2271,7 +2273,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 		if (nr_bytes >= bio->bi_size) {
 			req->bio = bio->bi_next;
 			nbytes = bio->bi_size;
-			req_bio_endio(req, bio, nbytes, error);
+			req_bio_endio(req, bio, nbytes, error, batch);
 			next_idx = 0;
 			bio_nbytes = 0;
 		} else {
@@ -2333,7 +2335,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 	 * if the request wasn't completed, update state
 	 */
 	if (bio_nbytes) {
-		req_bio_endio(req, bio, bio_nbytes, error);
+		req_bio_endio(req, bio, bio_nbytes, error, batch);
 		bio->bi_idx += next_idx;
 		bio_iovec(bio)->bv_offset += nr_bytes;
 		bio_iovec(bio)->bv_len -= nr_bytes;
@@ -2370,14 +2372,15 @@ EXPORT_SYMBOL_GPL(blk_update_request);
 
 static bool blk_update_bidi_request(struct request *rq, int error,
 				    unsigned int nr_bytes,
-				    unsigned int bidi_bytes)
+				    unsigned int bidi_bytes,
+				    struct batch_complete *batch)
 {
-	if (blk_update_request(rq, error, nr_bytes))
+	if (blk_update_request(rq, error,