Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-03-03 Thread Rusty Russell
Paolo Bonzini  writes:

> Il 27/02/2013 12:21, Rusty Russell ha scritto:
>> Baseline (before add_sgs):
>> 2.84-3.04(2.927292)user
>>
>> After add_sgs:
>> 2.97-3.15(3.053750)user
>>
>> After simplifying add_buf a little:
>> 2.95-3.21(3.081458)user
>>
>> After inlining virtqueue_add/vring_add_indirect:
>> 2.92-3.15(3.026875)user
>>
>> After passing in iteration functions (chained vs unchained):
>> 2.76-2.97(2.883542)user
>> Oops.  This result (and the next) is bogus.  I was playing with -O3, and
>> accidentally left that in :(
>
> Did you check what actually happened that improved speed so much?

No, it was a random aside; I didn't dig into it.  Perhaps we should
revisit using -O3 on the entire kernel, or perhaps grab gcc 4.8 and see
how that performs.

But I'm implementing specialized virtqueue_add_outbuf() and
virtqueue_add_inbuf() which seem to get more improvement anyway (except
occasionally I get hangs in my tests, which I'm debugging now...)
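
For illustration, the simplest form such entry points could take is a thin
wrapper around virtqueue_add_sgs() from this series; the signatures and
bodies below are an assumption for sketching purposes, not the posted code,
and the real versions presumably specialize the fast path further:

#include <linux/scatterlist.h>
#include <linux/virtio.h>

/*
 * Sketch only: thin wrappers around virtqueue_add_sgs().  These signatures
 * and bodies are assumptions for illustration, not the posted code.
 */
static inline int virtqueue_add_outbuf(struct virtqueue *vq,
                                       struct scatterlist *sg, /* terminated */
                                       void *data, gfp_t gfp)
{
        struct scatterlist *sgs[1] = { sg };

        /* One device-readable sg list, no device-writable ones. */
        return virtqueue_add_sgs(vq, sgs, 1, 0, data, gfp);
}

static inline int virtqueue_add_inbuf(struct virtqueue *vq,
                                      struct scatterlist *sg, /* terminated */
                                      void *data, gfp_t gfp)
{
        struct scatterlist *sgs[1] = { sg };

        /* No device-readable sg lists, one device-writable one. */
        return virtqueue_add_sgs(vq, sgs, 0, 1, data, gfp);
}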

Cheers,
Rusty.

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-28 Thread Paolo Bonzini
Il 27/02/2013 12:21, Rusty Russell ha scritto:
>>> Baseline (before add_sgs):
>>> 2.84-3.04(2.927292)user
>>>
>>> After add_sgs:
>>> 2.97-3.15(3.053750)user
>>>
>>> After simplifying add_buf a little:
>>> 2.95-3.21(3.081458)user
>>>
>>> After inlining virtqueue_add/vring_add_indirect:
>>> 2.92-3.15(3.026875)user
>>>
>>> After passing in iteration functions (chained vs unchained):
>>> 2.76-2.97(2.883542)user
> Oops.  This result (and the next) is bogus.  I was playing with -O3, and
> accidentally left that in :(

Did you check what actually happened that improved speed so much?  Can
we do it ourselves, or use a GCC attribute to turn it on?  Looking at
the GCC manual and source, there's just a bunch of optimizations enabled
by -O3:

{ OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },

`-ftree-loop-distribute-patterns'
 This pass distributes the initialization loops and generates a
 call to memset zero.

Doesn't matter.

{ OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },

Also doesn't matter.

{ OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },

Can be done by us at the source level.
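
For example (illustrative C only, not code from this series; the helper name
is invented), the unswitched form looks like:

#include <linux/types.h>
#include <linux/virtio_ring.h>

/*
 * Hand-unswitched loop: the loop-invariant 'write' test is hoisted so each
 * copy of the loop has a branch-free body, which is what -funswitch-loops
 * would do automatically.  set_desc_flags() is invented for illustration.
 */
static void set_desc_flags(struct vring_desc *desc, unsigned int n, bool write)
{
        unsigned int i;

        if (write) {
                for (i = 0; i < n; i++)
                        desc[i].flags = VRING_DESC_F_NEXT | VRING_DESC_F_WRITE;
        } else {
                for (i = 0; i < n; i++)
                        desc[i].flags = VRING_DESC_F_NEXT;
        }
}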

{ OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 },

Probably doesn't matter.

{ OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },

`-fipa-cp-clone'
 Perform function cloning to make interprocedural constant
 propagation stronger.  When enabled, interprocedural constant
 propagation will perform function cloning when externally visible
 function can be called with constant arguments.

Can be done by adding new external APIs or marking functions as
always_inline.
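
Roughly, in source form (a sketch with invented helper names, not the driver
code):

#include <linux/compiler.h>
#include <linux/types.h>
#include <linux/virtio_ring.h>

/*
 * Hand-rolled equivalent of -fipa-cp-clone: one generic helper takes 'write'
 * as a parameter, and callers that pass a constant get the test folded away
 * because the helper is always_inline.  All names are invented for
 * illustration.
 */
static __always_inline int add_one_desc(struct vring_desc *desc, int i,
                                        u64 addr, u32 len, bool write)
{
        desc[i].flags = VRING_DESC_F_NEXT | (write ? VRING_DESC_F_WRITE : 0);
        desc[i].addr = addr;
        desc[i].len = len;
        desc[i].next = i + 1;
        return i + 1;
}

static int add_out_desc(struct vring_desc *desc, int i, u64 addr, u32 len)
{
        return add_one_desc(desc, i, addr, len, false); /* 'write' is constant */
}

static int add_in_desc(struct vring_desc *desc, int i, u64 addr, u32 len)
{
        return add_one_desc(desc, i, addr, len, true);
}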

{ OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },

`-fgcse-after-reload'
 When `-fgcse-after-reload' is enabled, a redundant load elimination
 pass is performed after reload.  The purpose of this pass is to
 cleanup redundant spilling.

Never saw it have any substantial effect.

{ OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },

Can be done by us simply by adding more "inline" keywords.

Plus, -O3 will make *full* loop unrolling a bit more aggressive.  But
full loop unrolling requires compile-time-known loop bounds, so I doubt
this is the case.

Paolo

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-27 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Wed, Feb 27, 2013 at 05:58:37PM +1030, Rusty Russell wrote:
>> Paolo Bonzini  writes:
>> > Il 24/02/2013 23:12, Michael S. Tsirkin ha scritto:
>> >> On Tue, Feb 19, 2013 at 06:26:20PM +1030, Rusty Russell wrote:
>> >>> virtio_scsi can really use this, to avoid the current hack of copying
>> >>> the whole sg array.  Some other things get slightly neater, too.
>> >>>
>> >>> Signed-off-by: Rusty Russell 
>> >> 
>> >> Hmm, this makes add_buf a bit slower. virtio_test results
>> >> (I'll send a patch to update the test shortly):
>> >> 
>> >> Before:
>> >> 0.09user 0.01system 0:00.12elapsed 91%CPU (0avgtext+0avgdata 
>> >> 480maxresident)k
>> >> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
>> >> 
>> >> After:
>> >> 0.11user 0.01system 0:00.13elapsed 90%CPU (0avgtext+0avgdata 
>> >> 480maxresident)k
>> >> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
>> >
>> > Not unexpected at all... :(
>> >
>> > Some of it can be recovered, but if it's 20% I doubt all of it.  So my
>> > patches were not premature optimization; you really can take just two
>> > among speed, flexibility, and having a nice API.
>> 
>> The error bars on this are far too large to say "20%".
>> 
>> Here are my numbers, using 50 runs of:
>> time tools/virtio/vringh_test --indirect --eventidx --parallel and
>> stats --trim-outliers:
>> 
>> Baseline (before add_sgs):
>> 2.84-3.04(2.927292)user
>> 
>> After add_sgs:
>> 2.97-3.15(3.053750)user
>> 
>> After simplifying add_buf a little:
>> 2.95-3.21(3.081458)user
>> 
>> After inlining virtqueue_add/vring_add_indirect:
>> 2.92-3.15(3.026875)user
>> 
>> After passing in iteration functions (chained vs unchained):
>> 2.76-2.97(2.883542)user

Oops.  This result (and the next) is bogus.  I was playing with -O3, and
accidentally left that in :(

The final result was 3.005208, ie. 3% slowdown.  Which almost makes it
worth duplicating the whole set of code :(

>> After removing the now-unnecessary chain-cleaning in add_buf:
>> 2.66-2.83(2.753542)user
>> 
>> Any questions?
>> Rusty.
>
> Sorry, so which patches are included in the last stage?
> Something I didn't make clear: I tested 2/16 (the patch I replied to).

I wanted to tidy them up, add commentary, and integrate your tool cleanup
patches first.  That's when I noticed my screwup.

I'll push them now, but I want to revisit to see if there's something
cleverer I can do...

Cheers,
Rusty.


Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-26 Thread Michael S. Tsirkin
On Wed, Feb 27, 2013 at 05:58:37PM +1030, Rusty Russell wrote:
> Paolo Bonzini  writes:
> > Il 24/02/2013 23:12, Michael S. Tsirkin ha scritto:
> >> On Tue, Feb 19, 2013 at 06:26:20PM +1030, Rusty Russell wrote:
> >>> virtio_scsi can really use this, to avoid the current hack of copying
> >>> the whole sg array.  Some other things get slightly neater, too.
> >>>
> >>> Signed-off-by: Rusty Russell 
> >> 
> >> Hmm, this makes add_buf a bit slower. virtio_test results
> >> (I'll send a patch to update the test shortly):
> >> 
> >> Before:
> >> 0.09user 0.01system 0:00.12elapsed 91%CPU (0avgtext+0avgdata 
> >> 480maxresident)k
> >> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
> >> 
> >> After:
> >> 0.11user 0.01system 0:00.13elapsed 90%CPU (0avgtext+0avgdata 
> >> 480maxresident)k
> >> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
> >
> > Not unexpected at all... :(
> >
> > Some of it can be recovered, but if it's 20% I doubt all of it.  So my
> > patches were not premature optimization; you really can take just two
> > among speed, flexibility, and having a nice API.
> 
> The error bars on this are far too large to say "20%".
> 
> Here are my numbers, using 50 runs of:
> time tools/virtio/vringh_test --indirect --eventidx --parallel and
> stats --trim-outliers:
> 
> Baseline (before add_sgs):
> 2.84-3.04(2.927292)user
> 
> After add_sgs:
> 2.97-3.15(3.053750)user
> 
> After simplifying add_buf a little:
> 2.95-3.21(3.081458)user
> 
> After inlining virtqueue_add/vring_add_indirect:
> 2.92-3.15(3.026875)user
> 
> After passing in iteration functions (chained vs unchained):
> 2.76-2.97(2.883542)user
> 
> After removing the now-unnecessary chain-cleaning in add_buf:
> 2.66-2.83(2.753542)user
> 
> Any questions?
> Rusty.

Sorry, so which patches are included in the last stage?
Something I didn't make clear: I tested 2/16 (the patch I replied to).

-- 
MST

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-26 Thread Rusty Russell
Paolo Bonzini  writes:
> Il 24/02/2013 23:12, Michael S. Tsirkin ha scritto:
>> On Tue, Feb 19, 2013 at 06:26:20PM +1030, Rusty Russell wrote:
>>> virtio_scsi can really use this, to avoid the current hack of copying
>>> the whole sg array.  Some other things get slightly neater, too.
>>>
>>> Signed-off-by: Rusty Russell 
>> 
>> Hmm, this makes add_buf a bit slower. virtio_test results
>> (I'll send a patch to update the test shortly):
>> 
>> Before:
>> 0.09user 0.01system 0:00.12elapsed 91%CPU (0avgtext+0avgdata 480maxresident)k
>> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
>> 
>> After:
>> 0.11user 0.01system 0:00.13elapsed 90%CPU (0avgtext+0avgdata 480maxresident)k
>> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
>
> Not unexpected at all... :(
>
> Some of it can be recovered, but if it's 20% I doubt all of it.  So my
> patches were not premature optimization; you really can take just two
> among speed, flexibility, and having a nice API.

The error bars on this are far too large to say "20%".

Here are my numbers, using 50 runs of:
time tools/virtio/vringh_test --indirect --eventidx --parallel and
stats --trim-outliers:

Baseline (before add_sgs):
2.84-3.04(2.927292)user

After add_sgs:
2.97-3.15(3.053750)user

After simplifying add_buf a little:
2.95-3.21(3.081458)user

After inlining virtqueue_add/vring_add_indirect:
2.92-3.15(3.026875)user

After passing in iteration functions (chained vs unchained):
2.76-2.97(2.883542)user

After removing the now-unnecessary chain-cleaning in add_buf:
2.66-2.83(2.753542)user

Any questions?
Rusty.

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-26 Thread Michael S. Tsirkin
On Tue, Feb 26, 2013 at 03:44:26PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > On Tue, Feb 19, 2013 at 06:26:20PM +1030, Rusty Russell wrote:
> >> virtio_scsi can really use this, to avoid the current hack of copying
> >> the whole sg array.  Some other things get slightly neater, too.
> >> 
> >> Signed-off-by: Rusty Russell 
> >
> > Hmm, this makes add_buf a bit slower. virtio_test results
> > (I'll send a patch to update the test shortly):
> >
> > Before:
> > 0.09user 0.01system 0:00.12elapsed 91%CPU (0avgtext+0avgdata 
> > 480maxresident)k
> > 0inputs+0outputs (0major+145minor)pagefaults 0swaps
> >
> > After:
> > 0.11user 0.01system 0:00.13elapsed 90%CPU (0avgtext+0avgdata 
> > 480maxresident)k
> > 0inputs+0outputs (0major+145minor)pagefaults 0swaps
> 
> Interesting: how much of this is due to the shim in virtqueue_add_buf()
> to clean up the sg arrays?
> 
> (Perhaps we should make virtio_test run for longer, too).
> 
> BTW, you might be interested in:
> https://github.com/rustyrussell/stats.git
> 
> Which provides a useful filter for multiple results.
> 
> Cheers,
> Rusty.

Nifty.

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-25 Thread Paolo Bonzini
Il 24/02/2013 23:12, Michael S. Tsirkin ha scritto:
> On Tue, Feb 19, 2013 at 06:26:20PM +1030, Rusty Russell wrote:
>> virtio_scsi can really use this, to avoid the current hack of copying
>> the whole sg array.  Some other things get slightly neater, too.
>>
>> Signed-off-by: Rusty Russell 
> 
> Hmm, this makes add_buf a bit slower. virtio_test results
> (I'll send a patch to update the test shortly):
> 
> Before:
> 0.09user 0.01system 0:00.12elapsed 91%CPU (0avgtext+0avgdata 480maxresident)k
> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
> 
> After:
> 0.11user 0.01system 0:00.13elapsed 90%CPU (0avgtext+0avgdata 480maxresident)k
> 0inputs+0outputs (0major+145minor)pagefaults 0swaps

Not unexpected at all... :(

Some of it can be recovered, but if it's 20% I doubt all of it.  So my
patches were not premature optimization; you really can take just two
among speed, flexibility, and having a nice API.

Paolo

> 
> 
> 
>> ---
>>  drivers/virtio/virtio_ring.c |  144 
>> ++
>>  include/linux/virtio.h   |7 ++
>>  2 files changed, 109 insertions(+), 42 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index 245177c..27e31d3 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -100,14 +100,16 @@ struct vring_virtqueue
>>  
>>  /* Set up an indirect table of descriptors and add it to the queue. */
>>  static int vring_add_indirect(struct vring_virtqueue *vq,
>> -  struct scatterlist sg[],
>> -  unsigned int out,
>> -  unsigned int in,
>> +  struct scatterlist *sgs[],
>> +  unsigned int total_sg,
>> +  unsigned int out_sgs,
>> +  unsigned int in_sgs,
>>gfp_t gfp)
>>  {
>>  struct vring_desc *desc;
>>  unsigned head;
>> -int i;
>> +struct scatterlist *sg;
>> +int i, n;
>>  
>>  /*
>>   * We require lowmem mappings for the descriptors because
>> @@ -116,25 +118,31 @@ static int vring_add_indirect(struct vring_virtqueue 
>> *vq,
>>   */
>>  gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
>>  
>> -desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
>> +desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
>>  if (!desc)
>>  return -ENOMEM;
>>  
>> -/* Transfer entries from the sg list into the indirect page */
>> -for (i = 0; i < out; i++) {
>> -desc[i].flags = VRING_DESC_F_NEXT;
>> -desc[i].addr = sg_phys(sg);
>> -desc[i].len = sg->length;
>> -desc[i].next = i+1;
>> -sg++;
>> +/* Transfer entries from the sg lists into the indirect page */
>> +i = 0;
>> +for (n = 0; n < out_sgs; n++) {
>> +for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>> +desc[i].flags = VRING_DESC_F_NEXT;
>> +desc[i].addr = sg_phys(sg);
>> +desc[i].len = sg->length;
>> +desc[i].next = i+1;
>> +i++;
>> +}
>>  }
>> -for (; i < (out + in); i++) {
>> -desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
>> -desc[i].addr = sg_phys(sg);
>> -desc[i].len = sg->length;
>> -desc[i].next = i+1;
>> -sg++;
>> +for (; n < (out_sgs + in_sgs); n++) {
>> +for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>> +desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
>> +desc[i].addr = sg_phys(sg);
>> +desc[i].len = sg->length;
>> +desc[i].next = i+1;
>> +i++;
>> +}
>>  }
>> +BUG_ON(i != total_sg);
>>  
>>  /* Last one doesn't continue. */
>>  desc[i-1].flags &= ~VRING_DESC_F_NEXT;
>> @@ -176,8 +184,48 @@ int virtqueue_add_buf(struct virtqueue *_vq,
>>void *data,
>>gfp_t gfp)
>>  {
>> +struct scatterlist *sgs[2];
>> +unsigned int i;
>> +
>> +sgs[0] = sg;
>> +sgs[1] = sg + out;
>> +
>> +/* Workaround until callers pass well-formed sgs. */
>> +for (i = 0; i < out + in; i++)
>> +sg_unmark_end(sg + i);
>> +
>> +sg_mark_end(sg + out + in - 1);
>> +if (out && in)
>> +sg_mark_end(sg + out - 1);
>> +
>> +return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);
>> +}
>> +EXPORT_SYMBOL_GPL(virtqueue_add_buf);
>> +
>> +/**
>> + * virtqueue_add_sgs - expose buffers to other end
>> + * @vq: the struct virtqueue we're talking about.
>> + * @sgs: array of terminated scatterlists.
>> + * @out_num: the number of scatterlists readable by other side
>> + * @in_num: the number of scatterlists which are writable (after readable 
>> ones)
>> + * @data: the token identifying the buffer.
>> + * 

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-25 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Tue, Feb 19, 2013 at 06:26:20PM +1030, Rusty Russell wrote:
>> virtio_scsi can really use this, to avoid the current hack of copying
>> the whole sg array.  Some other things get slightly neater, too.
>> 
>> Signed-off-by: Rusty Russell 
>
> Hmm, this makes add_buf a bit slower. virtio_test results
> (I'll send a patch to update the test shortly):
>
> Before:
> 0.09user 0.01system 0:00.12elapsed 91%CPU (0avgtext+0avgdata 480maxresident)k
> 0inputs+0outputs (0major+145minor)pagefaults 0swaps
>
> After:
> 0.11user 0.01system 0:00.13elapsed 90%CPU (0avgtext+0avgdata 480maxresident)k
> 0inputs+0outputs (0major+145minor)pagefaults 0swaps

Interesting: how much of this is due to the shim in virtqueue_add_buf()
to clean up the sg arrays?

(Perhaps we should make virtio_test run for longer, too).

BTW, you might be interested in:
https://github.com/rustyrussell/stats.git

Which provides a useful filter for multiple results.

Cheers,
Rusty.

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-24 Thread Michael S. Tsirkin
On Tue, Feb 19, 2013 at 06:26:20PM +1030, Rusty Russell wrote:
> virtio_scsi can really use this, to avoid the current hack of copying
> the whole sg array.  Some other things get slightly neater, too.
> 
> Signed-off-by: Rusty Russell 

Hmm, this makes add_buf a bit slower. virtio_test results
(I'll send a patch to update the test shortly):

Before:
0.09user 0.01system 0:00.12elapsed 91%CPU (0avgtext+0avgdata 480maxresident)k
0inputs+0outputs (0major+145minor)pagefaults 0swaps

After:
0.11user 0.01system 0:00.13elapsed 90%CPU (0avgtext+0avgdata 480maxresident)k
0inputs+0outputs (0major+145minor)pagefaults 0swaps



> ---
>  drivers/virtio/virtio_ring.c |  144 
> ++
>  include/linux/virtio.h   |7 ++
>  2 files changed, 109 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 245177c..27e31d3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -100,14 +100,16 @@ struct vring_virtqueue
>  
>  /* Set up an indirect table of descriptors and add it to the queue. */
>  static int vring_add_indirect(struct vring_virtqueue *vq,
> -   struct scatterlist sg[],
> -   unsigned int out,
> -   unsigned int in,
> +   struct scatterlist *sgs[],
> +   unsigned int total_sg,
> +   unsigned int out_sgs,
> +   unsigned int in_sgs,
> gfp_t gfp)
>  {
>   struct vring_desc *desc;
>   unsigned head;
> - int i;
> + struct scatterlist *sg;
> + int i, n;
>  
>   /*
>* We require lowmem mappings for the descriptors because
> @@ -116,25 +118,31 @@ static int vring_add_indirect(struct vring_virtqueue 
> *vq,
>*/
>   gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
>  
> - desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> + desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
>   if (!desc)
>   return -ENOMEM;
>  
> - /* Transfer entries from the sg list into the indirect page */
> - for (i = 0; i < out; i++) {
> - desc[i].flags = VRING_DESC_F_NEXT;
> - desc[i].addr = sg_phys(sg);
> - desc[i].len = sg->length;
> - desc[i].next = i+1;
> - sg++;
> + /* Transfer entries from the sg lists into the indirect page */
> + i = 0;
> + for (n = 0; n < out_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + desc[i].flags = VRING_DESC_F_NEXT;
> + desc[i].addr = sg_phys(sg);
> + desc[i].len = sg->length;
> + desc[i].next = i+1;
> + i++;
> + }
>   }
> - for (; i < (out + in); i++) {
> - desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> - desc[i].addr = sg_phys(sg);
> - desc[i].len = sg->length;
> - desc[i].next = i+1;
> - sg++;
> + for (; n < (out_sgs + in_sgs); n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> + desc[i].addr = sg_phys(sg);
> + desc[i].len = sg->length;
> + desc[i].next = i+1;
> + i++;
> + }
>   }
> + BUG_ON(i != total_sg);
>  
>   /* Last one doesn't continue. */
>   desc[i-1].flags &= ~VRING_DESC_F_NEXT;
> @@ -176,8 +184,48 @@ int virtqueue_add_buf(struct virtqueue *_vq,
> void *data,
> gfp_t gfp)
>  {
> + struct scatterlist *sgs[2];
> + unsigned int i;
> +
> + sgs[0] = sg;
> + sgs[1] = sg + out;
> +
> + /* Workaround until callers pass well-formed sgs. */
> + for (i = 0; i < out + in; i++)
> + sg_unmark_end(sg + i);
> +
> + sg_mark_end(sg + out + in - 1);
> + if (out && in)
> + sg_mark_end(sg + out - 1);
> +
> + return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_add_buf);
> +
> +/**
> + * virtqueue_add_sgs - expose buffers to other end
> + * @vq: the struct virtqueue we're talking about.
> + * @sgs: array of terminated scatterlists.
> + * @out_num: the number of scatterlists readable by other side
> + * @in_num: the number of scatterlists which are writable (after readable 
> ones)
> + * @data: the token identifying the buffer.
> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
> + */
> +int virtqueue_add_sgs(struct virtqueue *_vq,
> +   struct scatterlist *sgs[],
> +   

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-20 Thread Asias He
On 02/19/2013 03:56 PM, Rusty Russell wrote:
> virtio_scsi can really use this, to avoid the current hack of copying
> the whole sg array.  Some other things get slightly neater, too.
> 
> Signed-off-by: Rusty Russell 

This simpler API makes more sense to me.

Reviewed-by: Asias He 
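
As a usage illustration of the new entry point (a hypothetical caller, not
part of the patch): a driver that keeps its command, data and status buffers
in separate, terminated scatterlists can pass them straight through instead
of first flattening them into one sg array:

#include <linux/gfp.h>
#include <linux/scatterlist.h>
#include <linux/virtio.h>

/*
 * Hypothetical caller of virtqueue_add_sgs(); queue_request() and the
 * three-list layout are invented for illustration.  Each scatterlist is
 * assumed to be properly terminated, as the kernel-doc below requires.
 */
static int queue_request(struct virtqueue *vq, void *token,
                         struct scatterlist *cmd,      /* driver -> device */
                         struct scatterlist *data_in,  /* device -> driver */
                         struct scatterlist *status)   /* device -> driver */
{
        struct scatterlist *sgs[3] = { cmd, data_in, status };

        /* One device-readable list followed by two device-writable lists. */
        return virtqueue_add_sgs(vq, sgs, 1, 2, token, GFP_ATOMIC);
}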


> ---
>  drivers/virtio/virtio_ring.c |  144 
> ++
>  include/linux/virtio.h   |7 ++
>  2 files changed, 109 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 245177c..27e31d3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -100,14 +100,16 @@ struct vring_virtqueue
>  
>  /* Set up an indirect table of descriptors and add it to the queue. */
>  static int vring_add_indirect(struct vring_virtqueue *vq,
> -   struct scatterlist sg[],
> -   unsigned int out,
> -   unsigned int in,
> +   struct scatterlist *sgs[],
> +   unsigned int total_sg,
> +   unsigned int out_sgs,
> +   unsigned int in_sgs,
> gfp_t gfp)
>  {
>   struct vring_desc *desc;
>   unsigned head;
> - int i;
> + struct scatterlist *sg;
> + int i, n;
>  
>   /*
>* We require lowmem mappings for the descriptors because
> @@ -116,25 +118,31 @@ static int vring_add_indirect(struct vring_virtqueue 
> *vq,
>*/
>   gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
>  
> - desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> + desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
>   if (!desc)
>   return -ENOMEM;
>  
> - /* Transfer entries from the sg list into the indirect page */
> - for (i = 0; i < out; i++) {
> - desc[i].flags = VRING_DESC_F_NEXT;
> - desc[i].addr = sg_phys(sg);
> - desc[i].len = sg->length;
> - desc[i].next = i+1;
> - sg++;
> + /* Transfer entries from the sg lists into the indirect page */
> + i = 0;
> + for (n = 0; n < out_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + desc[i].flags = VRING_DESC_F_NEXT;
> + desc[i].addr = sg_phys(sg);
> + desc[i].len = sg->length;
> + desc[i].next = i+1;
> + i++;
> + }
>   }
> - for (; i < (out + in); i++) {
> - desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> - desc[i].addr = sg_phys(sg);
> - desc[i].len = sg->length;
> - desc[i].next = i+1;
> - sg++;
> + for (; n < (out_sgs + in_sgs); n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> + desc[i].addr = sg_phys(sg);
> + desc[i].len = sg->length;
> + desc[i].next = i+1;
> + i++;
> + }
>   }
> + BUG_ON(i != total_sg);
>  
>   /* Last one doesn't continue. */
>   desc[i-1].flags &= ~VRING_DESC_F_NEXT;
> @@ -176,8 +184,48 @@ int virtqueue_add_buf(struct virtqueue *_vq,
> void *data,
> gfp_t gfp)
>  {
> + struct scatterlist *sgs[2];
> + unsigned int i;
> +
> + sgs[0] = sg;
> + sgs[1] = sg + out;
> +
> + /* Workaround until callers pass well-formed sgs. */
> + for (i = 0; i < out + in; i++)
> + sg_unmark_end(sg + i);
> +
> + sg_mark_end(sg + out + in - 1);
> + if (out && in)
> + sg_mark_end(sg + out - 1);
> +
> + return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_add_buf);
> +
> +/**
> + * virtqueue_add_sgs - expose buffers to other end
> + * @vq: the struct virtqueue we're talking about.
> + * @sgs: array of terminated scatterlists.
> + * @out_num: the number of scatterlists readable by other side
> + * @in_num: the number of scatterlists which are writable (after readable 
> ones)
> + * @data: the token identifying the buffer.
> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
> + */
> +int virtqueue_add_sgs(struct virtqueue *_vq,
> +   struct scatterlist *sgs[],
> +   unsigned int out_sgs,
> +   unsigned int in_sgs,
> +   void *data,
> +   gfp_t gfp)
> +{
>   struct vring_virtqueue *vq = to_vvq(_vq);
> - unsigned int i, avail, uninitialized_var(prev);
> + struct scatterlist *sg;
> + unsigned int i, n, avail, 

Re: [PATCH 02/16] virtio_ring: virtqueue_add_sgs, to add multiple sgs.

2013-02-19 Thread Wanlong Gao
On 02/19/2013 03:56 PM, Rusty Russell wrote:
> virtio_scsi can really use this, to avoid the current hack of copying
> the whole sg array.  Some other things get slightly neater, too.
> 
> Signed-off-by: Rusty Russell 

I like this simple implementation.

Reviewed-by: Wanlong Gao 

> ---
>  drivers/virtio/virtio_ring.c |  144 
> ++
>  include/linux/virtio.h   |7 ++
>  2 files changed, 109 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 245177c..27e31d3 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -100,14 +100,16 @@ struct vring_virtqueue
>  
>  /* Set up an indirect table of descriptors and add it to the queue. */
>  static int vring_add_indirect(struct vring_virtqueue *vq,
> -   struct scatterlist sg[],
> -   unsigned int out,
> -   unsigned int in,
> +   struct scatterlist *sgs[],
> +   unsigned int total_sg,
> +   unsigned int out_sgs,
> +   unsigned int in_sgs,
> gfp_t gfp)
>  {
>   struct vring_desc *desc;
>   unsigned head;
> - int i;
> + struct scatterlist *sg;
> + int i, n;
>  
>   /*
>* We require lowmem mappings for the descriptors because
> @@ -116,25 +118,31 @@ static int vring_add_indirect(struct vring_virtqueue 
> *vq,
>*/
>   gfp &= ~(__GFP_HIGHMEM | __GFP_HIGH);
>  
> - desc = kmalloc((out + in) * sizeof(struct vring_desc), gfp);
> + desc = kmalloc(total_sg * sizeof(struct vring_desc), gfp);
>   if (!desc)
>   return -ENOMEM;
>  
> - /* Transfer entries from the sg list into the indirect page */
> - for (i = 0; i < out; i++) {
> - desc[i].flags = VRING_DESC_F_NEXT;
> - desc[i].addr = sg_phys(sg);
> - desc[i].len = sg->length;
> - desc[i].next = i+1;
> - sg++;
> + /* Transfer entries from the sg lists into the indirect page */
> + i = 0;
> + for (n = 0; n < out_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + desc[i].flags = VRING_DESC_F_NEXT;
> + desc[i].addr = sg_phys(sg);
> + desc[i].len = sg->length;
> + desc[i].next = i+1;
> + i++;
> + }
>   }
> - for (; i < (out + in); i++) {
> - desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> - desc[i].addr = sg_phys(sg);
> - desc[i].len = sg->length;
> - desc[i].next = i+1;
> - sg++;
> + for (; n < (out_sgs + in_sgs); n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> + desc[i].addr = sg_phys(sg);
> + desc[i].len = sg->length;
> + desc[i].next = i+1;
> + i++;
> + }
>   }
> + BUG_ON(i != total_sg);
>  
>   /* Last one doesn't continue. */
>   desc[i-1].flags &= ~VRING_DESC_F_NEXT;
> @@ -176,8 +184,48 @@ int virtqueue_add_buf(struct virtqueue *_vq,
> void *data,
> gfp_t gfp)
>  {
> + struct scatterlist *sgs[2];
> + unsigned int i;
> +
> + sgs[0] = sg;
> + sgs[1] = sg + out;
> +
> + /* Workaround until callers pass well-formed sgs. */
> + for (i = 0; i < out + in; i++)
> + sg_unmark_end(sg + i);
> +
> + sg_mark_end(sg + out + in - 1);
> + if (out && in)
> + sg_mark_end(sg + out - 1);
> +
> + return virtqueue_add_sgs(_vq, sgs, out ? 1 : 0, in ? 1 : 0, data, gfp);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_add_buf);
> +
> +/**
> + * virtqueue_add_sgs - expose buffers to other end
> + * @vq: the struct virtqueue we're talking about.
> + * @sgs: array of terminated scatterlists.
> + * @out_num: the number of scatterlists readable by other side
> + * @in_num: the number of scatterlists which are writable (after readable 
> ones)
> + * @data: the token identifying the buffer.
> + * @gfp: how to do memory allocations (if necessary).
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error (ie. ENOSPC, ENOMEM).
> + */
> +int virtqueue_add_sgs(struct virtqueue *_vq,
> +   struct scatterlist *sgs[],
> +   unsigned int out_sgs,
> +   unsigned int in_sgs,
> +   void *data,
> +   gfp_t gfp)
> +{
>   struct vring_virtqueue *vq = to_vvq(_vq);
> - unsigned int i, avail, uninitialized_var(prev);
> + struct scatterlist *sg;
> + unsigned int i, n, avail, 
