Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs

2013-01-08 Thread Kent Overstreet
On Tue, Jan 08, 2013 at 11:15:37AM -0500, Jeff Moyer wrote:
> Kent Overstreet <koverstr...@google.com> writes:
> 
> > On Tue, Jan 08, 2013 at 10:33:18AM -0500, Jeff Moyer wrote:
> >> Kent Overstreet <koverstr...@google.com> writes:
> >> 
> >> >> Is the rbtree really faster than a basic (l)list and a sort before
> >> >> completing them? Would be simpler.
> >> >
> >> > Well, depends. With one or two kioctxs? The list would definitely be
> >> > faster, but I'm loath to use an O(n^2) algorithm anywhere where the
> >> > input size isn't strictly controlled, and I know of applications out
> >> > there that use tons of kioctxs.
> >> 
> >> Out of curiosity, what applications do you know of that use tons of
> >> kioctx's?
> >
> > "tons" is relative I suppose, but before this patch series sharing a
> > kioctx between threads was really bad for performance and... you know
> > how people can be with threads.
> 
> I wasn't questioning the merits of the patch, I was simply curious to
> know how aio is being (ab)used in the wild.  So, is this some internal
> tool, then, or what?

Oh, didn't think you were, I just never looked for actual numbers. Yeah,
some internal library code is what I was referring to, but from the
story of how it evolved I don't think it's unusual.


Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs

2013-01-08 Thread Jeff Moyer
Kent Overstreet <koverstr...@google.com> writes:

> On Tue, Jan 08, 2013 at 10:33:18AM -0500, Jeff Moyer wrote:
>> Kent Overstreet <koverstr...@google.com> writes:
>> 
>> >> Is the rbtree really faster than a basic (l)list and a sort before
>> >> completing them? Would be simpler.
>> >
>> > Well, depends. With one or two kioctxs? The list would definitely be
>> > faster, but I'm loath to use an O(n^2) algorithm anywhere where the
>> > input size isn't strictly controlled, and I know of applications out
>> > there that use tons of kioctxs.
>> 
>> Out of curiosity, what applications do you know of that use tons of
>> kioctx's?
>
> "tons" is relative I suppose, but before this patch series sharing a
> kioctx between threads was really bad for performance and... you know
> how people can be with threads.

I wasn't questioning the merits of the patch, I was simply curious to
know how aio is being (ab)used in the wild.  So, is this some internal
tool, then, or what?

Thanks!
Jeff


Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs

2013-01-08 Thread Kent Overstreet
On Tue, Jan 08, 2013 at 10:33:18AM -0500, Jeff Moyer wrote:
> Kent Overstreet <koverstr...@google.com> writes:
> 
> >> Is the rbtree really faster than a basic (l)list and a sort before
> >> completing them? Would be simpler.
> >
> > Well, depends. With one or two kioctxs? The list would definitely be
> > faster, but I'm loath to use an O(n^2) algorithm anywhere where the
> > input size isn't strictly controlled, and I know of applications out
> > there that use tons of kioctxs.
> 
> Out of curiosity, what applications do you know of that use tons of
> kioctx's?

"tons" is relative I suppose, but before this patch series sharing a
kioctx between threads was really bad for performance and... you know
how people can be with threads.


Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs

2013-01-08 Thread Jeff Moyer
Kent Overstreet <koverstr...@google.com> writes:

>> Is the rbtree really faster than a basic (l)list and a sort before
>> completing them? Would be simpler.
>
> Well, depends. With one or two kioctxs? The list would definitely be
> faster, but I'm loath to use an O(n^2) algorithm anywhere where the
> input size isn't strictly controlled, and I know of applications out
> there that use tons of kioctxs.

Out of curiosity, what applications do you know of that use tons of
kioctx's?

-Jeff


Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs

2013-01-07 Thread Kent Overstreet
On Fri, Jan 04, 2013 at 10:22:35AM +0100, Jens Axboe wrote:
> On 2012-12-27 03:00, Kent Overstreet wrote:
> > When completing a kiocb, there's some fixed overhead from touching the
> > kioctx's ring buffer the kiocb belongs to. Some newer high end block
> > devices can complete multiple IOs per interrupt, much like many network
> > interfaces have been for some time.
> > 
> > This plumbs through infrastructure so we can take advantage of multiple
> > completions at the interrupt level, and complete multiple kiocbs at the
> > same time.
> > 
> > Drivers have to be converted to take advantage of this, but it's a
> > simple change and the next patches will convert a few drivers.
> > 
> > To use it, an interrupt handler (or any code that completes bios or
> > requests) declares and initializes a struct batch_complete:
> > 
> > struct batch_complete batch;
> > batch_complete_init(&batch);
> > 
> > Then, instead of calling bio_endio(), it calls
> > bio_endio_batch(bio, err, &batch). This just adds the bio to a list in
> > the batch_complete.
> > 
> > At the end, it calls
> > 
> > batch_complete(&batch);
> > 
> > This completes all the bios all at once, building up a list of kiocbs;
> > then the list of kiocbs is completed all at once.
> > 
> > Also, in order to batch up the kiocbs we have to add a different
> > bio_endio function to struct bio, that takes a pointer to the
> > batch_complete - this patch converts the dio code's bio_endio function.
> > In order to avoid changing every bio_endio function in the kernel (there
> > are many), we currently use a union and a flag to indicate what kind of
> > bio endio function to call. This is admittedly a hack, but should
> > suffice for now.
> 
> It is indeed a hack... Famous last words as well, I'm sure that'll stick
> around forever if it goes in! Any ideas on how we can clean this up
> before that?

Well, I wouldn't _really_ mind changing all 200 bi_end_io uses. On the
other hand, the majority of them are either leaf nodes (filesystem code
and whatnot that's not completing anything else that could be batched),
or stuff like the dm and md code where it could be plumbed through (so
we could batch completions through md/dm) but it may take some thought
to do it right.

So I think I'd prefer to do it incrementally, for the moment. I'm always
a bit terrified of doing a cleanup that touches 50+ files, and then
changing my mind about something and going back and redoing it.

That said, I haven't forgotten about all the other block layer patches
I've got for you, as soon as I'm less swamped I'm going to finish off
that stuff so I should be around to revisit it...

> Apart from that, I think the batching makes functional sense. For the
> devices where we do get batches of completions (most of them), it's the
> right thing to do. Would be nice if it were better integrated though, not a
> side hack.
> 
> Is the rbtree really faster than a basic (l)list and a sort before
> completing them? Would be simpler.

Well, depends. With one or two kioctxs? The list would definitely be
faster, but I'm loath to use an O(n^2) algorithm anywhere where the
input size isn't strictly controlled, and I know of applications out
there that use tons of kioctxs.
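
(For illustration: per-completion insertion keyed by the kioctx stays O(log n)
with a sketch along these lines. The 'ki_node' member on the kiocb and the
'kiocb' rbtree root on the batch_complete are names assumed for the sketch,
not necessarily what the patch uses.)

static void batch_add_kiocb(struct batch_complete *batch, struct kiocb *req)
{
        struct rb_node **p = &batch->kiocb.rb_node;
        struct rb_node *parent = NULL;

        while (*p) {
                /* Key on the owning kioctx so completions for the same
                 * ring end up adjacent in the tree. */
                struct kiocb *k = rb_entry(*p, struct kiocb, ki_node);

                parent = *p;
                if (req->ki_ctx < k->ki_ctx)
                        p = &(*p)->rb_left;
                else
                        p = &(*p)->rb_right;
        }

        rb_link_node(&req->ki_node, parent, p);
        rb_insert_color(&req->ki_node, &batch->kiocb);
}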

> A few small comments below.
> 
> > +void bio_endio_batch(struct bio *bio, int error, struct batch_complete *batch)
> > +{
> > +        if (error)
> > +                bio->bi_error = error;
> > +
> > +        if (batch)
> > +                bio_list_add(&batch->bio, bio);
> > +        else
> > +                __bio_endio(bio, batch);
> > +
> > +}
> 
> Ugh, get rid of this 'batch' checking.

The reason I did it that way is - well, look at the dio code's bi_end_io
function. It's got to be passed a pointer to a struct batch_complete *
to batch kiocbs, but the driver that calls it may or may not have batch
completions plumbed through.

So unless every single driver gets converted (and I think that'd be
silly for all the ones that can't do any actual batching) something's
going to have to have that check, and better for it to be in generic
code than every mid layer code we plumb it through.

> 
> > +static inline void bio_endio(struct bio *bio, int error)
> > +{
> > +   bio_endio_batch(bio, error, NULL);
> > +}
> > +
> 
> Just make that __bio_endio().

That one could be changed... I dislike having the if (error)
bio->bi_error = error duplicated...

Actually, it'd probably make more sense to inline bio_endio_batch(),
because often the compiler is going to either know whether batch is null
or not or be able to lift it out of a loop.
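
Roughly this, as a sketch (__bio_endio() and bio->bi_error are as introduced
earlier in this series):

static inline void bio_endio_batch(struct bio *bio, int error,
                                   struct batch_complete *batch)
{
        if (error)
                bio->bi_error = error;

        /* With this inline, a constant NULL batch lets the compiler drop
         * the branch and fall straight through to __bio_endio(). */
        if (batch)
                bio_list_add(&batch->bio, bio);
        else
                __bio_endio(bio, batch);
}

static inline void bio_endio(struct bio *bio, int error)
{
        bio_endio_batch(bio, error, NULL);
}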


Re: [PATCH 29/32] block, aio: Batch completion for bios/kiocbs

2013-01-04 Thread Jens Axboe
On 2012-12-27 03:00, Kent Overstreet wrote:
> When completing a kiocb, there's some fixed overhead from touching the
> kioctx's ring buffer the kiocb belongs to. Some newer high end block
> devices can complete multiple IOs per interrupt, much like many network
> interfaces have been for some time.
> 
> This plumbs through infrastructure so we can take advantage of multiple
> completions at the interrupt level, and complete multiple kiocbs at the
> same time.
> 
> Drivers have to be converted to take advantage of this, but it's a
> simple change and the next patches will convert a few drivers.
> 
> To use it, an interrupt handler (or any code that completes bios or
> requests) declares and initializes a struct batch_complete:
> 
> struct batch_complete batch;
> batch_complete_init(&batch);
> 
> Then, instead of calling bio_endio(), it calls
> bio_endio_batch(bio, err, &batch). This just adds the bio to a list in
> the batch_complete.
> 
> At the end, it calls
> 
> batch_complete(&batch);
> 
> This completes all the bios all at once, building up a list of kiocbs;
> then the list of kiocbs is completed all at once.
> 
> Also, in order to batch up the kiocbs we have to add a different
> bio_endio function to struct bio, that takes a pointer to the
> batch_complete - this patch converts the dio code's bio_endio function.
> In order to avoid changing every bio_endio function in the kernel (there
> are many), we currently use a union and a flag to indicate what kind of
> bio endio function to call. This is admittedly a hack, but should
> suffice for now.

It is indeed a hack... Famous last words as well, I'm sure that'll stick
around forever if it goes in! Any ideas on how we can clean this up
before that?

Apart from that, I think the batching makes functional sense. For the
devices where we do get batches of completions (most of them), it's the
right thing to do. Would be nice if it were better integrated though, not a
side hack.

Is the rbtree really faster than a basic (l)list and a sort before
completing them? Would be simpler.
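
Roughly this sort of thing, as an untested sketch using list_sort() (the
'kiocb_list' member on batch_complete and the reuse of the kiocb's ki_list
for it are assumptions here; an llist would first need splicing onto a
regular list):

static int batch_kiocb_cmp(void *priv, struct list_head *a,
                           struct list_head *b)
{
        struct kiocb *ka = list_entry(a, struct kiocb, ki_list);
        struct kiocb *kb = list_entry(b, struct kiocb, ki_list);

        /* Group by kioctx so each ring buffer is still touched only once */
        if (ka->ki_ctx != kb->ki_ctx)
                return ka->ki_ctx < kb->ki_ctx ? -1 : 1;
        return 0;
}

static void batch_sort_kiocbs(struct batch_complete *batch)
{
        list_sort(NULL, &batch->kiocb_list, batch_kiocb_cmp);
}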

A few small comments below.

> +void bio_endio_batch(struct bio *bio, int error, struct batch_complete *batch)
> +{
> +        if (error)
> +                bio->bi_error = error;
> +
> +        if (batch)
> +                bio_list_add(&batch->bio, bio);
> +        else
> +                __bio_endio(bio, batch);
> +
> +}

Ugh, get rid of this 'batch' checking.

> +static inline void bio_endio(struct bio *bio, int error)
> +{
> + bio_endio_batch(bio, error, NULL);
> +}
> +

Just make that __bio_endio().

Same thing exists on the rq side, iirc.

-- 
Jens Axboe



[PATCH 29/32] block, aio: Batch completion for bios/kiocbs

2012-12-26 Thread Kent Overstreet
When completing a kiocb, there's some fixed overhead from touching the
kioctx's ring buffer the kiocb belongs to. Some newer high end block
devices can complete multiple IOs per interrupt, much like many network
interfaces have been for some time.

This plumbs through infrastructure so we can take advantage of multiple
completions at the interrupt level, and complete multiple kiocbs at the
same time.

Drivers have to be converted to take advantage of this, but it's a
simple change and the next patches will convert a few drivers.

To use it, an interrupt handler (or any code that completes bios or
requests) declares and initializes a struct batch_complete:

struct batch_complete batch;
batch_complete_init(&batch);

Then, instead of calling bio_endio(), it calls
bio_endio_batch(bio, err, &batch). This just adds the bio to a list in
the batch_complete.

At the end, it calls

batch_complete(&batch);

This completes all the bios all at once, building up a list of kiocbs;
then the list of kiocbs is completed all at once.

Also, in order to batch up the kiocbs we have to add a different
bio_endio function to struct bio, that takes a pointer to the
batch_complete - this patch converts the dio code's bio_endio function.
In order to avoid changing every bio_endio function in the kernel (there
are many), we currently use a union and a flag to indicate what kind of
bio endio function to call. This is admittedly a hack, but should
suffice for now.

For batching to work through say md or dm devices, the md/dm bio_endio
functions would have to be converted, much like the dio code. That is
left for future patches.
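
As a schematic example, a converted driver's completion path ends up looking
roughly like this (the device structure and its bio iteration are invented
for illustration; only the batch_complete_*() and bio_endio_batch() calls are
the interface described above):

static irqreturn_t example_dev_irq(int irq, void *data)
{
        struct example_dev *dev = data;
        struct batch_complete batch;
        struct bio *bio;

        batch_complete_init(&batch);

        /* pull everything the hardware has finished off the device... */
        while ((bio = example_next_completed_bio(dev)))
                bio_endio_batch(bio, 0, &batch);

        /* ...then complete the bios and their kiocbs in one go */
        batch_complete(&batch);

        return IRQ_HANDLED;
}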

Signed-off-by: Kent Overstreet <koverstr...@google.com>
---
 block/blk-core.c  |  34 ---
 block/blk-flush.c |   2 +-
 block/blk.h   |   3 +-
 drivers/block/swim3.c |   2 +-
 drivers/md/dm.c   |   2 +-
 fs/aio.c  | 254 +++---
 fs/bio.c  |  52 ++
 fs/direct-io.c|  20 ++--
 include/linux/aio.h   |  22 +++-
 include/linux/bio.h   |  36 ++-
 include/linux/blk_types.h |  11 +-
 include/linux/blkdev.h|  12 ++-
 12 files changed, 311 insertions(+), 139 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3c95c4d..4fac6ddb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -151,7 +151,8 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 EXPORT_SYMBOL(blk_rq_init);
 
 static void req_bio_endio(struct request *rq, struct bio *bio,
- unsigned int nbytes, int error)
+ unsigned int nbytes, int error,
+ struct batch_complete *batch)
 {
if (error)
clear_bit(BIO_UPTODATE, >bi_flags);
@@ -175,7 +176,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
/* don't actually finish bio if it's part of flush sequence */
if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
-   bio_endio(bio, error);
+   bio_endio_batch(bio, error, batch);
 }
 
 void blk_dump_rq_flags(struct request *rq, char *msg)
@@ -2215,7 +2216,8 @@ EXPORT_SYMBOL(blk_fetch_request);
  * %false - this request doesn't have any more data
  * %true  - this request has more data
  **/
-bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
+bool blk_update_request(struct request *req, int error, unsigned int nr_bytes,
+   struct batch_complete *batch)
 {
int total_bytes, bio_nbytes, next_idx = 0;
struct bio *bio;
@@ -2271,7 +2273,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
if (nr_bytes >= bio->bi_size) {
req->bio = bio->bi_next;
nbytes = bio->bi_size;
-   req_bio_endio(req, bio, nbytes, error);
+   req_bio_endio(req, bio, nbytes, error, batch);
next_idx = 0;
bio_nbytes = 0;
} else {
@@ -2333,7 +2335,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 * if the request wasn't completed, update state
 */
if (bio_nbytes) {
-   req_bio_endio(req, bio, bio_nbytes, error);
+   req_bio_endio(req, bio, bio_nbytes, error, batch);
bio->bi_idx += next_idx;
bio_iovec(bio)->bv_offset += nr_bytes;
bio_iovec(bio)->bv_len -= nr_bytes;
@@ -2370,14 +2372,15 @@ EXPORT_SYMBOL_GPL(blk_update_request);
 
 static bool blk_update_bidi_request(struct request *rq, int error,
unsigned int nr_bytes,
-   unsigned int bidi_bytes)
+   unsigned int bidi_bytes,
+   struct batch_complete *batch)
 {
-   if (blk_update_request(rq, error, nr_bytes))
+   if 
