subject:"\[PATCH 0\/3\] virtio\-net\: inline header support"

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-12 Thread Cornelia Huck

On Fri, 12 Oct 2012 09:07:46 +1030
Rusty Russell  wrote:

> "Michael S. Tsirkin"  writes:
> > On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
> >> OK.  Well, Anthony wants qemu to be robust in this regard, so I am
> >> tempted to rework all the qemu drivers to handle arbitrary layouts.
> >> They could use a good audit anyway.
> >
> > I agree here. Still trying to understand whether we can agree to use
> > a feature bit for this, or not.
> 
> I'd *like* to imply it by the new PCI layout, but if it doesn't work
> we'll add a new feature bit.
> 
> I'm resisting a feature bit, since it constrains future implementations
> which could otherwise assume it.
> 
> >> This would become a glaring exception, but I'm tempted to fix it to 32
> >> bytes at the same time as we get the new pci layout (ie. for the virtio
> >> 1.0 spec).
> >
> > But this isn't a virtio-pci only issue, is it?
> > qemu has s390 bus with same limmitation.
> > How can we tie it to pci layout?
> 
> They can use a transport feature if they need to, of course.  But
> perhaps the timing with ccw will coincide with the fix, in which they
> don't need to, but it might be a bit late.
> 
> Cornelia?

My virtio-ccw host code is still going through a bit of rework, so it
might well go in after the fix.

There's also the existing (non-spec'ed) s390-virtio transport. While it
will likely be deprecated sometime in the future, it should probably
get a feature bit for consistency's sake.

> 
> Cheers,
> Rusty.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-12 Thread Paolo Bonzini

Il 12/10/2012 00:37, Rusty Russell ha scritto:
> "Michael S. Tsirkin"  writes:
>> On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
>>> OK.  Well, Anthony wants qemu to be robust in this regard, so I am
>>> tempted to rework all the qemu drivers to handle arbitrary layouts.
>>> They could use a good audit anyway.
>>
>> I agree here. Still trying to understand whether we can agree to use
>> a feature bit for this, or not.
> 
> I'd *like* to imply it by the new PCI layout, but if it doesn't work
> we'll add a new feature bit.
> 
> I'm resisting a feature bit, since it constrains future implementations
> which could otherwise assume it.

Future implementations may certainly refuse to start if the feature is
not there.  Whether it's a good idea or not, well, that depends on how
much future they are.

Paolo

>>> This would become a glaring exception, but I'm tempted to fix it to 32
>>> bytes at the same time as we get the new pci layout (ie. for the virtio
>>> 1.0 spec).
>>
>> But this isn't a virtio-pci only issue, is it?
>> qemu has s390 bus with same limmitation.
>> How can we tie it to pci layout?
> 
> They can use a transport feature if they need to, of course.  But
> perhaps the timing with ccw will coincide with the fix, in which they
> don't need to, but it might be a bit late.
> 
> Cornelia?
> 
> Cheers,
> Rusty.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-12 Thread Paolo Bonzini

Il 12/10/2012 00:37, Rusty Russell ha scritto:
 Michael S. Tsirkin m...@redhat.com writes:
 On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
 OK.  Well, Anthony wants qemu to be robust in this regard, so I am
 tempted to rework all the qemu drivers to handle arbitrary layouts.
 They could use a good audit anyway.

 I agree here. Still trying to understand whether we can agree to use
 a feature bit for this, or not.
 
 I'd *like* to imply it by the new PCI layout, but if it doesn't work
 we'll add a new feature bit.
 
 I'm resisting a feature bit, since it constrains future implementations
 which could otherwise assume it.

Future implementations may certainly refuse to start if the feature is
not there.  Whether it's a good idea or not, well, that depends on how
much future they are.

Paolo

 This would become a glaring exception, but I'm tempted to fix it to 32
 bytes at the same time as we get the new pci layout (ie. for the virtio
 1.0 spec).

 But this isn't a virtio-pci only issue, is it?
 qemu has s390 bus with same limmitation.
 How can we tie it to pci layout?
 
 They can use a transport feature if they need to, of course.  But
 perhaps the timing with ccw will coincide with the fix, in which they
 don't need to, but it might be a bit late.
 
 Cornelia?
 
 Cheers,
 Rusty.
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-12 Thread Cornelia Huck

On Fri, 12 Oct 2012 09:07:46 +1030
Rusty Russell ru...@rustcorp.com.au wrote:

 Michael S. Tsirkin m...@redhat.com writes:
  On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
  OK.  Well, Anthony wants qemu to be robust in this regard, so I am
  tempted to rework all the qemu drivers to handle arbitrary layouts.
  They could use a good audit anyway.
 
  I agree here. Still trying to understand whether we can agree to use
  a feature bit for this, or not.
 
 I'd *like* to imply it by the new PCI layout, but if it doesn't work
 we'll add a new feature bit.
 
 I'm resisting a feature bit, since it constrains future implementations
 which could otherwise assume it.
 
  This would become a glaring exception, but I'm tempted to fix it to 32
  bytes at the same time as we get the new pci layout (ie. for the virtio
  1.0 spec).
 
  But this isn't a virtio-pci only issue, is it?
  qemu has s390 bus with same limmitation.
  How can we tie it to pci layout?
 
 They can use a transport feature if they need to, of course.  But
 perhaps the timing with ccw will coincide with the fix, in which they
 don't need to, but it might be a bit late.
 
 Cornelia?

My virtio-ccw host code is still going through a bit of rework, so it
might well go in after the fix.

There's also the existing (non-spec'ed) s390-virtio transport. While it
will likely be deprecated sometime in the future, it should probably
get a feature bit for consistency's sake.

 
 Cheers,
 Rusty.
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-11 Thread Rusty Russell

"Michael S. Tsirkin"  writes:
> On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
>> OK.  Well, Anthony wants qemu to be robust in this regard, so I am
>> tempted to rework all the qemu drivers to handle arbitrary layouts.
>> They could use a good audit anyway.
>
> I agree here. Still trying to understand whether we can agree to use
> a feature bit for this, or not.

I'd *like* to imply it by the new PCI layout, but if it doesn't work
we'll add a new feature bit.

I'm resisting a feature bit, since it constrains future implementations
which could otherwise assume it.

>> This would become a glaring exception, but I'm tempted to fix it to 32
>> bytes at the same time as we get the new pci layout (ie. for the virtio
>> 1.0 spec).
>
> But this isn't a virtio-pci only issue, is it?
> qemu has s390 bus with same limmitation.
> How can we tie it to pci layout?

They can use a transport feature if they need to, of course.  But
perhaps the timing with ccw will coincide with the fix, in which they
don't need to, but it might be a bit late.

Cornelia?

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-11 Thread Michael S. Tsirkin

On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
> Paolo Bonzini  writes:
> > Il 09/10/2012 06:59, Rusty Russell ha scritto:
> >> Paolo Bonzini  writes:
> >>> Il 05/10/2012 07:43, Rusty Russell ha scritto:
>  That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
>  said to Anthony, the best rules are "always" and "never", so I'd really
>  rather not have to grandfather that in.
> >>>
> >>> It is, but we can add a rule that if the (transport) flag
> >>> VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
> >>> virtio-blk.
> >> 
> >> Could we do that?  It's the cmd length I'm concerned about; is it always
> >> 32 in practice for some reason?
> >
> > It is always 32 or less except in very obscure cases that are pretty
> > much confined to iSCSI.  We don't care about the obscure cases, and the
> > extra bytes don't hurt.
> >
> > BTW, 32 is the default cdb_size used by virtio-scsi.
> >
> >> Currently qemu does:
> >> 
> >> struct sg_io_hdr hdr;
> >> memset(, 0, sizeof(struct sg_io_hdr));
> >> hdr.interface_id = 'S';
> >> hdr.cmd_len = req->elem.out_sg[1].iov_len;
> >> hdr.cmdp = req->elem.out_sg[1].iov_base;
> >> hdr.dxfer_len = 0;
> >> 
> >> If it's a command which expects more output data, there's no way to
> >> guess where the boundary is between that command and the data.
> >
> > Yep, so I understood the problem right.
> 
> OK.  Well, Anthony wants qemu to be robust in this regard, so I am
> tempted to rework all the qemu drivers to handle arbitrary layouts.
> They could use a good audit anyway.

I agree here. Still trying to understand whether we can agree to use
a feature bit for this, or not.

> This would become a glaring exception, but I'm tempted to fix it to 32
> bytes at the same time as we get the new pci layout (ie. for the virtio
> 1.0 spec).

But this isn't a virtio-pci only issue, is it?
qemu has s390 bus with same limmitation.
How can we tie it to pci layout?

Maybe what you mean is to use a transport feature for this
and tie *that* to new layout in case of pci?



> The Linux driver would carefully be backwards compatible, of
> course, and the spec would document why.
> 
> Cheers,
> Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-11 Thread Michael S. Tsirkin

On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
  Il 09/10/2012 06:59, Rusty Russell ha scritto:
  Paolo Bonzini pbonz...@redhat.com writes:
  Il 05/10/2012 07:43, Rusty Russell ha scritto:
  That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
  said to Anthony, the best rules are always and never, so I'd really
  rather not have to grandfather that in.
 
  It is, but we can add a rule that if the (transport) flag
  VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
  virtio-blk.
  
  Could we do that?  It's the cmd length I'm concerned about; is it always
  32 in practice for some reason?
 
  It is always 32 or less except in very obscure cases that are pretty
  much confined to iSCSI.  We don't care about the obscure cases, and the
  extra bytes don't hurt.
 
  BTW, 32 is the default cdb_size used by virtio-scsi.
 
  Currently qemu does:
  
  struct sg_io_hdr hdr;
  memset(hdr, 0, sizeof(struct sg_io_hdr));
  hdr.interface_id = 'S';
  hdr.cmd_len = req-elem.out_sg[1].iov_len;
  hdr.cmdp = req-elem.out_sg[1].iov_base;
  hdr.dxfer_len = 0;
  
  If it's a command which expects more output data, there's no way to
  guess where the boundary is between that command and the data.
 
  Yep, so I understood the problem right.
 
 OK.  Well, Anthony wants qemu to be robust in this regard, so I am
 tempted to rework all the qemu drivers to handle arbitrary layouts.
 They could use a good audit anyway.

I agree here. Still trying to understand whether we can agree to use
a feature bit for this, or not.

 This would become a glaring exception, but I'm tempted to fix it to 32
 bytes at the same time as we get the new pci layout (ie. for the virtio
 1.0 spec).

But this isn't a virtio-pci only issue, is it?
qemu has s390 bus with same limmitation.
How can we tie it to pci layout?

Maybe what you mean is to use a transport feature for this
and tie *that* to new layout in case of pci?



 The Linux driver would carefully be backwards compatible, of
 course, and the spec would document why.
 
 Cheers,
 Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-11 Thread Rusty Russell

Michael S. Tsirkin m...@redhat.com writes:
 On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
 OK.  Well, Anthony wants qemu to be robust in this regard, so I am
 tempted to rework all the qemu drivers to handle arbitrary layouts.
 They could use a good audit anyway.

 I agree here. Still trying to understand whether we can agree to use
 a feature bit for this, or not.

I'd *like* to imply it by the new PCI layout, but if it doesn't work
we'll add a new feature bit.

I'm resisting a feature bit, since it constrains future implementations
which could otherwise assume it.

 This would become a glaring exception, but I'm tempted to fix it to 32
 bytes at the same time as we get the new pci layout (ie. for the virtio
 1.0 spec).

 But this isn't a virtio-pci only issue, is it?
 qemu has s390 bus with same limmitation.
 How can we tie it to pci layout?

They can use a transport feature if they need to, of course.  But
perhaps the timing with ccw will coincide with the fix, in which they
don't need to, but it might be a bit late.

Cornelia?

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-10 Thread Rusty Russell

Paolo Bonzini  writes:
> Il 09/10/2012 06:59, Rusty Russell ha scritto:
>> Paolo Bonzini  writes:
>>> Il 05/10/2012 07:43, Rusty Russell ha scritto:
 That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
 said to Anthony, the best rules are "always" and "never", so I'd really
 rather not have to grandfather that in.
>>>
>>> It is, but we can add a rule that if the (transport) flag
>>> VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
>>> virtio-blk.
>> 
>> Could we do that?  It's the cmd length I'm concerned about; is it always
>> 32 in practice for some reason?
>
> It is always 32 or less except in very obscure cases that are pretty
> much confined to iSCSI.  We don't care about the obscure cases, and the
> extra bytes don't hurt.
>
> BTW, 32 is the default cdb_size used by virtio-scsi.
>
>> Currently qemu does:
>> 
>> struct sg_io_hdr hdr;
>> memset(, 0, sizeof(struct sg_io_hdr));
>> hdr.interface_id = 'S';
>> hdr.cmd_len = req->elem.out_sg[1].iov_len;
>> hdr.cmdp = req->elem.out_sg[1].iov_base;
>> hdr.dxfer_len = 0;
>> 
>> If it's a command which expects more output data, there's no way to
>> guess where the boundary is between that command and the data.
>
> Yep, so I understood the problem right.

OK.  Well, Anthony wants qemu to be robust in this regard, so I am
tempted to rework all the qemu drivers to handle arbitrary layouts.
They could use a good audit anyway.

This would become a glaring exception, but I'm tempted to fix it to 32
bytes at the same time as we get the new pci layout (ie. for the virtio
1.0 spec).  The Linux driver would carefully be backwards compatible, of
course, and the spec would document why.

Cheers,
Rusty.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-10 Thread Rusty Russell

Paolo Bonzini pbonz...@redhat.com writes:
 Il 09/10/2012 06:59, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 Il 05/10/2012 07:43, Rusty Russell ha scritto:
 That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
 said to Anthony, the best rules are always and never, so I'd really
 rather not have to grandfather that in.

 It is, but we can add a rule that if the (transport) flag
 VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
 virtio-blk.
 
 Could we do that?  It's the cmd length I'm concerned about; is it always
 32 in practice for some reason?

 It is always 32 or less except in very obscure cases that are pretty
 much confined to iSCSI.  We don't care about the obscure cases, and the
 extra bytes don't hurt.

 BTW, 32 is the default cdb_size used by virtio-scsi.

 Currently qemu does:
 
 struct sg_io_hdr hdr;
 memset(hdr, 0, sizeof(struct sg_io_hdr));
 hdr.interface_id = 'S';
 hdr.cmd_len = req-elem.out_sg[1].iov_len;
 hdr.cmdp = req-elem.out_sg[1].iov_base;
 hdr.dxfer_len = 0;
 
 If it's a command which expects more output data, there's no way to
 guess where the boundary is between that command and the data.

 Yep, so I understood the problem right.

OK.  Well, Anthony wants qemu to be robust in this regard, so I am
tempted to rework all the qemu drivers to handle arbitrary layouts.
They could use a good audit anyway.

This would become a glaring exception, but I'm tempted to fix it to 32
bytes at the same time as we get the new pci layout (ie. for the virtio
1.0 spec).  The Linux driver would carefully be backwards compatible, of
course, and the spec would document why.

Cheers,
Rusty.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-09 Thread Paolo Bonzini

Il 09/10/2012 06:59, Rusty Russell ha scritto:
> Paolo Bonzini  writes:
>> Il 05/10/2012 07:43, Rusty Russell ha scritto:
>>> That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
>>> said to Anthony, the best rules are "always" and "never", so I'd really
>>> rather not have to grandfather that in.
>>
>> It is, but we can add a rule that if the (transport) flag
>> VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
>> virtio-blk.
> 
> Could we do that?  It's the cmd length I'm concerned about; is it always
> 32 in practice for some reason?

It is always 32 or less except in very obscure cases that are pretty
much confined to iSCSI.  We don't care about the obscure cases, and the
extra bytes don't hurt.

BTW, 32 is the default cdb_size used by virtio-scsi.

> Currently qemu does:
> 
> struct sg_io_hdr hdr;
> memset(, 0, sizeof(struct sg_io_hdr));
> hdr.interface_id = 'S';
> hdr.cmd_len = req->elem.out_sg[1].iov_len;
> hdr.cmdp = req->elem.out_sg[1].iov_base;
> hdr.dxfer_len = 0;
> 
> If it's a command which expects more output data, there's no way to
> guess where the boundary is between that command and the data.

Yep, so I understood the problem right.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-09 Thread Paolo Bonzini

Il 09/10/2012 06:59, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 Il 05/10/2012 07:43, Rusty Russell ha scritto:
 That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
 said to Anthony, the best rules are always and never, so I'd really
 rather not have to grandfather that in.

 It is, but we can add a rule that if the (transport) flag
 VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
 virtio-blk.
 
 Could we do that?  It's the cmd length I'm concerned about; is it always
 32 in practice for some reason?

It is always 32 or less except in very obscure cases that are pretty
much confined to iSCSI.  We don't care about the obscure cases, and the
extra bytes don't hurt.

BTW, 32 is the default cdb_size used by virtio-scsi.

 Currently qemu does:
 
 struct sg_io_hdr hdr;
 memset(hdr, 0, sizeof(struct sg_io_hdr));
 hdr.interface_id = 'S';
 hdr.cmd_len = req-elem.out_sg[1].iov_len;
 hdr.cmdp = req-elem.out_sg[1].iov_base;
 hdr.dxfer_len = 0;
 
 If it's a command which expects more output data, there's no way to
 guess where the boundary is between that command and the data.

Yep, so I understood the problem right.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Rusty Russell

Paolo Bonzini  writes:
> Il 05/10/2012 07:43, Rusty Russell ha scritto:
>> That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
>> said to Anthony, the best rules are "always" and "never", so I'd really
>> rather not have to grandfather that in.
>
> It is, but we can add a rule that if the (transport) flag
> VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
> virtio-blk.

Could we do that?  It's the cmd length I'm concerned about; is it always
32 in practice for some reason?

Currently qemu does:

struct sg_io_hdr hdr;
memset(, 0, sizeof(struct sg_io_hdr));
hdr.interface_id = 'S';
hdr.cmd_len = req->elem.out_sg[1].iov_len;
hdr.cmdp = req->elem.out_sg[1].iov_base;
hdr.dxfer_len = 0;

If it's a command which expects more output data, there's no way to
guess where the boundary is between that command and the data.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Michael S. Tsirkin

On Thu, Oct 04, 2012 at 01:04:56PM +0930, Rusty Russell wrote:
> Anthony Liguori  writes:
> > Rusty Russell  writes:
> >
> >> "Michael S. Tsirkin"  writes:
> >>
> >>> Thinking about Sasha's patches, we can reduce ring usage
> >>> for virtio net small packets dramatically if we put
> >>> virtio net header inline with the data.
> >>> This can be done for free in case guest net stack allocated
> >>> extra head room for the packet, and I don't see
> >>> why would this have any downsides.
> >>
> >> I've been wanting to do this for the longest time... but...
> >>
> >>> Even though with my recent patches qemu
> >>> no longer requires header to be the first s/g element,
> >>> we need a new feature bit to detect this.
> >>> A trivial qemu patch will be sent separately.
> >>
> >> There's a reason I haven't done this.  I really, really dislike "my
> >> implemention isn't broken" feature bits.  We could have an infinite
> >> number of them, for each bug in each device.
> >
> > This is a bug in the specification.
> >
> > The QEMU implementation pre-dates the specification.  All of the actual
> > implementations of virtio relied on the semantics of s/g elements and
> > still do.
> 
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

If you look at my patch you'll notice there's also a
comment in virtio_net.h that seems to be broken in this respect:

/* This is the first element of the scatter-gather list.  If you don't
 * specify GSO or CSUM features, you can simply ignore the header. */

There is a similar comment in virtio-blk.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Michael S. Tsirkin

On Wed, Oct 03, 2012 at 04:14:17PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> 
> > Thinking about Sasha's patches, we can reduce ring usage
> > for virtio net small packets dramatically if we put
> > virtio net header inline with the data.
> > This can be done for free in case guest net stack allocated
> > extra head room for the packet, and I don't see
> > why would this have any downsides.
> 
> I've been wanting to do this for the longest time... but...
> 
> > Even though with my recent patches qemu
> > no longer requires header to be the first s/g element,
> > we need a new feature bit to detect this.
> > A trivial qemu patch will be sent separately.
> 
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.
> 
> So my plan was to tie this assumption to the new PCI layout.

I don't object but old qemu has this limitation for s390 as well,
and that's not using PCI, right? So how do we detect
new hypervisor there?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Michael S. Tsirkin

On Wed, Oct 03, 2012 at 04:14:17PM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  Thinking about Sasha's patches, we can reduce ring usage
  for virtio net small packets dramatically if we put
  virtio net header inline with the data.
  This can be done for free in case guest net stack allocated
  extra head room for the packet, and I don't see
  why would this have any downsides.
 
 I've been wanting to do this for the longest time... but...
 
  Even though with my recent patches qemu
  no longer requires header to be the first s/g element,
  we need a new feature bit to detect this.
  A trivial qemu patch will be sent separately.
 
 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.
 
 So my plan was to tie this assumption to the new PCI layout.

I don't object but old qemu has this limitation for s390 as well,
and that's not using PCI, right? So how do we detect
new hypervisor there?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Michael S. Tsirkin

On Thu, Oct 04, 2012 at 01:04:56PM +0930, Rusty Russell wrote:
 Anthony Liguori anth...@codemonkey.ws writes:
  Rusty Russell ru...@rustcorp.com.au writes:
 
  Michael S. Tsirkin m...@redhat.com writes:
 
  Thinking about Sasha's patches, we can reduce ring usage
  for virtio net small packets dramatically if we put
  virtio net header inline with the data.
  This can be done for free in case guest net stack allocated
  extra head room for the packet, and I don't see
  why would this have any downsides.
 
  I've been wanting to do this for the longest time... but...
 
  Even though with my recent patches qemu
  no longer requires header to be the first s/g element,
  we need a new feature bit to detect this.
  A trivial qemu patch will be sent separately.
 
  There's a reason I haven't done this.  I really, really dislike my
  implemention isn't broken feature bits.  We could have an infinite
  number of them, for each bug in each device.
 
  This is a bug in the specification.
 
  The QEMU implementation pre-dates the specification.  All of the actual
  implementations of virtio relied on the semantics of s/g elements and
  still do.
 
 lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
 ever going to be merged, so I'm not sure what its status is?  But I'm
 determined to fix qemu, and hence my torture patch to make sure this
 doesn't creep in again.

If you look at my patch you'll notice there's also a
comment in virtio_net.h that seems to be broken in this respect:

/* This is the first element of the scatter-gather list.  If you don't
 * specify GSO or CSUM features, you can simply ignore the header. */

There is a similar comment in virtio-blk.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-08 Thread Rusty Russell

Paolo Bonzini pbonz...@redhat.com writes:
 Il 05/10/2012 07:43, Rusty Russell ha scritto:
 That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
 said to Anthony, the best rules are always and never, so I'd really
 rather not have to grandfather that in.

 It is, but we can add a rule that if the (transport) flag
 VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
 virtio-blk.

Could we do that?  It's the cmd length I'm concerned about; is it always
32 in practice for some reason?

Currently qemu does:

struct sg_io_hdr hdr;
memset(hdr, 0, sizeof(struct sg_io_hdr));
hdr.interface_id = 'S';
hdr.cmd_len = req-elem.out_sg[1].iov_len;
hdr.cmdp = req-elem.out_sg[1].iov_base;
hdr.dxfer_len = 0;

If it's a command which expects more output data, there's no way to
guess where the boundary is between that command and the data.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-06 Thread Paolo Bonzini

Il 05/10/2012 07:43, Rusty Russell ha scritto:
>> > struct virtio_scsi_req_cmd {
>> > // Read-only
>> > u8 lun[8];
>> > u64 id;
>> > u8 task_attr;
>> > u8 prio;
>> > u8 crn;
>> > char cdb[cdb_size];
>> > char dataout[];
>> > // Write-only part
>> > u32 sense_len;
>> > u32 residual;
>> > u16 status_qualifier;
>> > u8 status;
>> > u8 response;
>> > u8 sense[sense_size];
>> > char datain[];
>> > };
>> >
>> > where cdb_size and sense_size come from configuration space.  The device
>> > right now expects everything before dataout/datain to be in a single
>> > descriptor, but that's in no way part of the spec.  Am I missing
>> > something egregious?
> Since you wrote it, I hope not :)

Yeah, I guess the confusion came from cdb_size and sense_size being in
configuration space.

> That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
> said to Anthony, the best rules are "always" and "never", so I'd really
> rather not have to grandfather that in.

It is, but we can add a rule that if the (transport) flag
VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
virtio-blk.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-06 Thread Rusty Russell

Paolo Bonzini  writes:

> Il 04/10/2012 14:51, Rusty Russell ha scritto:
>> Paolo Bonzini  writes:
>> 
>>> Il 04/10/2012 02:11, Rusty Russell ha scritto:
>> There's a reason I haven't done this.  I really, really dislike "my
>> implemention isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>
> However, this bug affects (almost) all implementations and (almost) all
> devices.  It even makes sense to reserve a transport feature bit for it
> instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!
>>>
>>> Yes. :)  Isn't that what mst's patch does?
>>>
 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...
>>>
>>> I can look at virtio-scsi.
>> 
>> Actually, you can't, see my reply to Anthony...
>> 
>> Message-ID: <87lifm1y1n@rustcorp.com.au>
>
> struct virtio_scsi_req_cmd {
> // Read-only
> u8 lun[8];
> u64 id;
> u8 task_attr;
> u8 prio;
> u8 crn;
> char cdb[cdb_size];
> char dataout[];
> // Write-only part
> u32 sense_len;
> u32 residual;
> u16 status_qualifier;
> u8 status;
> u8 response;
> u8 sense[sense_size];
> char datain[];
> };
>
> where cdb_size and sense_size come from configuration space.  The device
> right now expects everything before dataout/datain to be in a single
> descriptor, but that's in no way part of the spec.  Am I missing
> something egregious?

Since you wrote it, I hope not :)

That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
said to Anthony, the best rules are "always" and "never", so I'd really
rather not have to grandfather that in.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-06 Thread Rusty Russell

Paolo Bonzini pbonz...@redhat.com writes:

 Il 04/10/2012 14:51, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 
 Il 04/10/2012 02:11, Rusty Russell ha scritto:
 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 However, this bug affects (almost) all implementations and (almost) all
 devices.  It even makes sense to reserve a transport feature bit for it
 instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!

 Yes. :)  Isn't that what mst's patch does?

 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...

 I can look at virtio-scsi.
 
 Actually, you can't, see my reply to Anthony...
 
 Message-ID: 87lifm1y1n@rustcorp.com.au

 struct virtio_scsi_req_cmd {
 // Read-only
 u8 lun[8];
 u64 id;
 u8 task_attr;
 u8 prio;
 u8 crn;
 char cdb[cdb_size];
 char dataout[];
 // Write-only part
 u32 sense_len;
 u32 residual;
 u16 status_qualifier;
 u8 status;
 u8 response;
 u8 sense[sense_size];
 char datain[];
 };

 where cdb_size and sense_size come from configuration space.  The device
 right now expects everything before dataout/datain to be in a single
 descriptor, but that's in no way part of the spec.  Am I missing
 something egregious?

Since you wrote it, I hope not :)

That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
said to Anthony, the best rules are always and never, so I'd really
rather not have to grandfather that in.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-06 Thread Paolo Bonzini

Il 05/10/2012 07:43, Rusty Russell ha scritto:
  struct virtio_scsi_req_cmd {
  // Read-only
  u8 lun[8];
  u64 id;
  u8 task_attr;
  u8 prio;
  u8 crn;
  char cdb[cdb_size];
  char dataout[];
  // Write-only part
  u32 sense_len;
  u32 residual;
  u16 status_qualifier;
  u8 status;
  u8 response;
  u8 sense[sense_size];
  char datain[];
  };
 
  where cdb_size and sense_size come from configuration space.  The device
  right now expects everything before dataout/datain to be in a single
  descriptor, but that's in no way part of the spec.  Am I missing
  something egregious?
 Since you wrote it, I hope not :)

Yeah, I guess the confusion came from cdb_size and sense_size being in
configuration space.

 That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
 said to Anthony, the best rules are always and never, so I'd really
 rather not have to grandfather that in.

It is, but we can add a rule that if the (transport) flag
VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
virtio-blk.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-05 Thread Paolo Bonzini

Il 04/10/2012 09:44, Rusty Russell ha scritto:
> -In particular, no implementation should use the descriptor
> -boundaries to determine the size of any header in a request.[footnote:
> -The current qemu device implementations mistakenly insist that
> -the first descriptor cover the header in these cases exactly, so
> -a cautious driver should arrange it so.
> +[footnote:
> +It was previously asserted that framing should be independent of
> +message contents, yet invariably drivers layed out messages in
> +reliable ways and devices assumed it. In addition, the
> +specifications for virtio_blk and virtio_scsi require intuiting
> +field lengths from frame boundaries.
>  ]

Not true for virtio_scsi...

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-05 Thread Paolo Bonzini

Il 04/10/2012 09:44, Rusty Russell ha scritto:
 -In particular, no implementation should use the descriptor
 -boundaries to determine the size of any header in a request.[footnote:
 -The current qemu device implementations mistakenly insist that
 -the first descriptor cover the header in these cases exactly, so
 -a cautious driver should arrange it so.
 +[footnote:
 +It was previously asserted that framing should be independent of
 +message contents, yet invariably drivers layed out messages in
 +reliable ways and devices assumed it. In addition, the
 +specifications for virtio_blk and virtio_scsi require intuiting
 +field lengths from frame boundaries.
  ]

Not true for virtio_scsi...

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 14:51, Rusty Russell ha scritto:
> Paolo Bonzini  writes:
> 
>> Il 04/10/2012 02:11, Rusty Russell ha scritto:
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.

 However, this bug affects (almost) all implementations and (almost) all
 devices.  It even makes sense to reserve a transport feature bit for it
 instead of a device feature bit.
>>>
>>> Perhaps, but we have to fix the bugs first!
>>
>> Yes. :)  Isn't that what mst's patch does?
>>
>>> As I said, my torture patch broke qemu immediately.  Since noone has
>>> leapt onto fixing that, I'll take a look now...
>>
>> I can look at virtio-scsi.
> 
> Actually, you can't, see my reply to Anthony...
> 
> Message-ID: <87lifm1y1n@rustcorp.com.au>

struct virtio_scsi_req_cmd {
// Read-only
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
char cdb[cdb_size];
char dataout[];
// Write-only part
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
u8 sense[sense_size];
char datain[];
};

where cdb_size and sense_size come from configuration space.  The device
right now expects everything before dataout/datain to be in a single
descriptor, but that's in no way part of the spec.  Am I missing
something egregious?

Paolo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Paolo Bonzini  writes:

> Il 04/10/2012 02:11, Rusty Russell ha scritto:
>> > > There's a reason I haven't done this.  I really, really dislike "my
>> > > implemention isn't broken" feature bits.  We could have an infinite
>> > > number of them, for each bug in each device.
>> >
>> > However, this bug affects (almost) all implementations and (almost) all
>> > devices.  It even makes sense to reserve a transport feature bit for it
>> > instead of a device feature bit.
>>
>> Perhaps, but we have to fix the bugs first!
>
> Yes. :)  Isn't that what mst's patch does?
>
>> As I said, my torture patch broke qemu immediately.  Since noone has
>> leapt onto fixing that, I'll take a look now...
>
> I can look at virtio-scsi.

Actually, you can't, see my reply to Anthony...

Message-ID: <87lifm1y1n@rustcorp.com.au>

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Anthony Liguori  writes:
>> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
>> ever going to be merged, so I'm not sure what its status is?  But I'm
>> determined to fix qemu, and hence my torture patch to make sure this
>> doesn't creep in again.
>
> There are even more implementations out there and I'd wager they all
> rely on framing.

Worse, both virtio_blk (for scsi commands) and virtio_scsi explicitly
and inescapably rely on framing.  The spec conflicts clearly with
itself.

Such layering violations are always a mistake, but I can't blame anyone
else for my lack of attention :(

Here's the spec change:
commit 7e74459bb966ccbaad9e4bf361d1178b7f400b79
Author: Rusty Russell 
Date:   Thu Oct 4 17:11:27 2012 +0930

No longer assume framing is independent of messages.  *sniff*

Signed-off-by: Rusty Russell 

--- virtio-spec.txt 2012-10-04 17:13:04.988259234 +0930
+++ virtio-spec.txt.new 2012-10-04 17:12:54.624258969 +0930
@@ -880,19 +880,19 @@
 
   Message Framing
 
-The descriptors used for a buffer should not effect the semantics
-of the message, except for the total length of the buffer. For
-example, a network buffer consists of a 10 byte header followed
-by the network packet. Whether this is presented in the ring
-descriptor chain as (say) a 10 byte buffer and a 1514 byte
-buffer, or a single 1524 byte buffer, or even three buffers,
-should have no effect.
+Unless stated otherwise, it is expected that headers within a
+message are contained within their own descriptors. For example,
+a network buffer consists of a 10 or 12 byte header followed by
+the network packet. An implementation should expect that this
+header will be within the first descriptor, and that the
+remainder of the data will begin on the second descriptor.
 
-In particular, no implementation should use the descriptor
-boundaries to determine the size of any header in a request.[footnote:
-The current qemu device implementations mistakenly insist that
-the first descriptor cover the header in these cases exactly, so
-a cautious driver should arrange it so.
+[footnote:
+It was previously asserted that framing should be independent of
+message contents, yet invariably drivers layed out messages in
+reliable ways and devices assumed it. In addition, the
+specifications for virtio_blk and virtio_scsi require intuiting
+field lengths from frame boundaries.
 ]
 
   Device Improvements

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 02:11, Rusty Russell ha scritto:
> > > There's a reason I haven't done this.  I really, really dislike "my
> > > implemention isn't broken" feature bits.  We could have an infinite
> > > number of them, for each bug in each device.
> >
> > However, this bug affects (almost) all implementations and (almost) all
> > devices.  It even makes sense to reserve a transport feature bit for it
> > instead of a device feature bit.
>
> Perhaps, but we have to fix the bugs first!

Yes. :)  Isn't that what mst's patch does?

> As I said, my torture patch broke qemu immediately.  Since noone has
> leapt onto fixing that, I'll take a look now...

I can look at virtio-scsi.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 02:11, Rusty Russell ha scritto:
   There's a reason I haven't done this.  I really, really dislike my
   implemention isn't broken feature bits.  We could have an infinite
   number of them, for each bug in each device.
 
  However, this bug affects (almost) all implementations and (almost) all
  devices.  It even makes sense to reserve a transport feature bit for it
  instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!

Yes. :)  Isn't that what mst's patch does?

 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...

I can look at virtio-scsi.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Anthony Liguori anth...@codemonkey.ws writes:
 lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
 ever going to be merged, so I'm not sure what its status is?  But I'm
 determined to fix qemu, and hence my torture patch to make sure this
 doesn't creep in again.

 There are even more implementations out there and I'd wager they all
 rely on framing.

Worse, both virtio_blk (for scsi commands) and virtio_scsi explicitly
and inescapably rely on framing.  The spec conflicts clearly with
itself.

Such layering violations are always a mistake, but I can't blame anyone
else for my lack of attention :(

Here's the spec change:
commit 7e74459bb966ccbaad9e4bf361d1178b7f400b79
Author: Rusty Russell ru...@rustcorp.com.au
Date:   Thu Oct 4 17:11:27 2012 +0930

No longer assume framing is independent of messages.  *sniff*

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

--- virtio-spec.txt 2012-10-04 17:13:04.988259234 +0930
+++ virtio-spec.txt.new 2012-10-04 17:12:54.624258969 +0930
@@ -880,19 +880,19 @@
 
   Message Framing
 
-The descriptors used for a buffer should not effect the semantics
-of the message, except for the total length of the buffer. For
-example, a network buffer consists of a 10 byte header followed
-by the network packet. Whether this is presented in the ring
-descriptor chain as (say) a 10 byte buffer and a 1514 byte
-buffer, or a single 1524 byte buffer, or even three buffers,
-should have no effect.
+Unless stated otherwise, it is expected that headers within a
+message are contained within their own descriptors. For example,
+a network buffer consists of a 10 or 12 byte header followed by
+the network packet. An implementation should expect that this
+header will be within the first descriptor, and that the
+remainder of the data will begin on the second descriptor.
 
-In particular, no implementation should use the descriptor
-boundaries to determine the size of any header in a request.[footnote:
-The current qemu device implementations mistakenly insist that
-the first descriptor cover the header in these cases exactly, so
-a cautious driver should arrange it so.
+[footnote:
+It was previously asserted that framing should be independent of
+message contents, yet invariably drivers layed out messages in
+reliable ways and devices assumed it. In addition, the
+specifications for virtio_blk and virtio_scsi require intuiting
+field lengths from frame boundaries.
 ]
 
   Device Improvements

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Rusty Russell

Paolo Bonzini pbonz...@redhat.com writes:

 Il 04/10/2012 02:11, Rusty Russell ha scritto:
   There's a reason I haven't done this.  I really, really dislike my
   implemention isn't broken feature bits.  We could have an infinite
   number of them, for each bug in each device.
 
  However, this bug affects (almost) all implementations and (almost) all
  devices.  It even makes sense to reserve a transport feature bit for it
  instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!

 Yes. :)  Isn't that what mst's patch does?

 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...

 I can look at virtio-scsi.

Actually, you can't, see my reply to Anthony...

Message-ID: 87lifm1y1n@rustcorp.com.au

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-04 Thread Paolo Bonzini

Il 04/10/2012 14:51, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 
 Il 04/10/2012 02:11, Rusty Russell ha scritto:
 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 However, this bug affects (almost) all implementations and (almost) all
 devices.  It even makes sense to reserve a transport feature bit for it
 instead of a device feature bit.

 Perhaps, but we have to fix the bugs first!

 Yes. :)  Isn't that what mst's patch does?

 As I said, my torture patch broke qemu immediately.  Since noone has
 leapt onto fixing that, I'll take a look now...

 I can look at virtio-scsi.
 
 Actually, you can't, see my reply to Anthony...
 
 Message-ID: 87lifm1y1n@rustcorp.com.au

struct virtio_scsi_req_cmd {
// Read-only
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
char cdb[cdb_size];
char dataout[];
// Write-only part
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
u8 sense[sense_size];
char datain[];
};

where cdb_size and sense_size come from configuration space.  The device
right now expects everything before dataout/datain to be in a single
descriptor, but that's in no way part of the spec.  Am I missing
something egregious?

Paolo

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Anthony Liguori  writes:

> Rusty Russell  writes:
>
>> "Michael S. Tsirkin"  writes:
>>
>> There's a reason I haven't done this.  I really, really dislike "my
>> implemention isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>>
>> So my plan was to tie this assumption to the new PCI layout.  And have a
>> stress-testing patch like the one below in the kernel (see my virtio-wip
>> branch for stuff like this).  Turn it on at boot with
>> "virtio_ring.torture" on the kernel commandline.
>>
>> BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
>> is too old.  Building the latest git now...
>>
>> Cheers,
>> Rusty.
>>
>> Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE
>>
>> Virtio devices are not supposed to depend on the framing of the 
>> scatter-gather
>> lists, but various implementations did.  Safeguard this in future by adding
>> an option to deliberately create perverse descriptors.
>>
>> Signed-off-by: Rusty Russell 
>
> Ignore framing is really a bad idea.  You want backends to enforce
> reasonable framing because guest's shouldn't do silly things with framing.
>
> For instance, with virtio-blk, if you want decent performance, you
> absolutely want to avoid bouncing the data.  If you're using O_DIRECT in
> the host to submit I/O requests, then it's critical that all of the s/g
> elements are aligned to a sector boundary and sized to a sector
> boundary.
>
> Yes, QEMU can handle if that's not the case, but it would be insanely
> stupid for a guest not to do this.  This is the sort of thing that ought
> to be enforced in the specification because a guest cannot perform well
> if it doesn't follow these rules.

Lack of imagination is what got us into trouble in the first place; when
presented with one counter-example, it's useful to look for others.

That's our job, not to dismiss them a "insanely stupid".

For example:
1) Perhaps the guest isn't trying to perform well, it's trying to be a
   tiny bootloader?
2) Perhaps the guest is the direct consumer, and aligning buffers is
   redundant.

> A spec isn't terribly useful if the result is guest drivers that are
> slow.  There's very little to gain by not enforcing rules around framing
> and there's a lot to lose if a guest frames incorrectly.

The guest has the flexibility, and gets to decide.  The spec is not
forcing them to perform badly.

> In the rare case where we want to make a framing change, we should use
> feature bits like Michael is proposing.
>
> In this case, we should simply say that with the feature bit, the vnet
> header can be in the same element as the data but not allow the header
> to be spread across multiple elements.

I'd love to split struct virtio_net_hdr_mrg_rxbuf, so the num_buffers
ends up somewhere else.

The simplest rules are "never" or "always".

Cheers,
Rusty.
PS.  Inserting zero-length buffers is something I'd be prepared to rule
 out, my current patch does it just for yuks...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Anthony Liguori

Rusty Russell  writes:

> Anthony Liguori  writes:
>> Rusty Russell  writes:
>>
>>> "Michael S. Tsirkin"  writes:
>>>
 Thinking about Sasha's patches, we can reduce ring usage
 for virtio net small packets dramatically if we put
 virtio net header inline with the data.
 This can be done for free in case guest net stack allocated
 extra head room for the packet, and I don't see
 why would this have any downsides.
>>>
>>> I've been wanting to do this for the longest time... but...
>>>
 Even though with my recent patches qemu
 no longer requires header to be the first s/g element,
 we need a new feature bit to detect this.
 A trivial qemu patch will be sent separately.
>>>
>>> There's a reason I haven't done this.  I really, really dislike "my
>>> implemention isn't broken" feature bits.  We could have an infinite
>>> number of them, for each bug in each device.
>>
>> This is a bug in the specification.
>>
>> The QEMU implementation pre-dates the specification.  All of the actual
>> implementations of virtio relied on the semantics of s/g elements and
>> still do.
>
> lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
> ever going to be merged, so I'm not sure what its status is?  But I'm
> determined to fix qemu, and hence my torture patch to make sure this
> doesn't creep in again.

There are even more implementations out there and I'd wager they all
rely on framing.

>> What's in the specification really doesn't matter when it doesn't agree
>> with all of the existing implementations.
>>
>> Users use implementations, not specifications.  The specification really
>> ought to be changed here.
>
> I'm sorely tempted, except that we're losing a real optimization because
> of this :(

What optimizations?  What Michael is proposing is still achievable with
a device feature.  Are there other optimizations that can be achieved by
changing framing that we can't achieve with feature bits?

As I mentioned in another note, bad framing decisions can cause
performance issues too...

> The specification has long contained the footnote:
>
> The current qemu device implementations mistakenly insist that
> the first descriptor cover the header in these cases exactly, so
> a cautious driver should arrange it so.

I seem to recall this being a compromise between you and I..  I think
I objected strongly to this back when you first wrote the spec and you
added this to appease me ;-)

Regards,

Anthony Liguori

>
> I'd like to tie this caveat to the PCI capability change, so this note
> will move to the appendix with the old PCI layout.
>
> Cheers,
> Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Anthony Liguori  writes:
> Rusty Russell  writes:
>
>> "Michael S. Tsirkin"  writes:
>>
>>> Thinking about Sasha's patches, we can reduce ring usage
>>> for virtio net small packets dramatically if we put
>>> virtio net header inline with the data.
>>> This can be done for free in case guest net stack allocated
>>> extra head room for the packet, and I don't see
>>> why would this have any downsides.
>>
>> I've been wanting to do this for the longest time... but...
>>
>>> Even though with my recent patches qemu
>>> no longer requires header to be the first s/g element,
>>> we need a new feature bit to detect this.
>>> A trivial qemu patch will be sent separately.
>>
>> There's a reason I haven't done this.  I really, really dislike "my
>> implemention isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>
> This is a bug in the specification.
>
> The QEMU implementation pre-dates the specification.  All of the actual
> implementations of virtio relied on the semantics of s/g elements and
> still do.

lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
ever going to be merged, so I'm not sure what its status is?  But I'm
determined to fix qemu, and hence my torture patch to make sure this
doesn't creep in again.

> What's in the specification really doesn't matter when it doesn't agree
> with all of the existing implementations.
>
> Users use implementations, not specifications.  The specification really
> ought to be changed here.

I'm sorely tempted, except that we're losing a real optimization because
of this :(

The specification has long contained the footnote:

The current qemu device implementations mistakenly insist that
the first descriptor cover the header in these cases exactly, so
a cautious driver should arrange it so.

I'd like to tie this caveat to the PCI capability change, so this note
will move to the appendix with the old PCI layout.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Anthony Liguori

Rusty Russell  writes:

> "Michael S. Tsirkin"  writes:
>
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.
>
> So my plan was to tie this assumption to the new PCI layout.  And have a
> stress-testing patch like the one below in the kernel (see my virtio-wip
> branch for stuff like this).  Turn it on at boot with
> "virtio_ring.torture" on the kernel commandline.
>
> BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
> is too old.  Building the latest git now...
>
> Cheers,
> Rusty.
>
> Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE
>
> Virtio devices are not supposed to depend on the framing of the scatter-gather
> lists, but various implementations did.  Safeguard this in future by adding
> an option to deliberately create perverse descriptors.
>
> Signed-off-by: Rusty Russell 

Ignore framing is really a bad idea.  You want backends to enforce
reasonable framing because guest's shouldn't do silly things with framing.

For instance, with virtio-blk, if you want decent performance, you
absolutely want to avoid bouncing the data.  If you're using O_DIRECT in
the host to submit I/O requests, then it's critical that all of the s/g
elements are aligned to a sector boundary and sized to a sector
boundary.

Yes, QEMU can handle if that's not the case, but it would be insanely
stupid for a guest not to do this.  This is the sort of thing that ought
to be enforced in the specification because a guest cannot perform well
if it doesn't follow these rules.

A spec isn't terribly useful if the result is guest drivers that are
slow.  There's very little to gain by not enforcing rules around framing
and there's a lot to lose if a guest frames incorrectly.

In the rare case where we want to make a framing change, we should use
feature bits like Michael is proposing.

In this case, we should simply say that with the feature bit, the vnet
header can be in the same element as the data but not allow the header
to be spread across multiple elements.

Regards,

Anthony Liguori

>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 8d5bddb..930a4ea 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -5,6 +5,15 @@ config VIRTIO
> bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
> CONFIG_RPMSG or CONFIG_S390_GUEST.
>  
> +config VIRTIO_DEVICE_TORTURE
> + bool "Virtio device torture tests"
> + depends on VIRTIO && DEBUG_KERNEL
> + help
> +   This makes the virtio_ring implementation creatively change
> +   the format of requests to make sure that devices are
> +   properly implemented.  This will make your virtual machine
> +   slow *and* unreliable!  Say N.
> +
>  menu "Virtio drivers"
>  
>  config VIRTIO_PCI
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e639584..8893753 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -124,6 +124,149 @@ struct vring_virtqueue
>  
>  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>  
> +#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
> +static bool torture;
> +module_param(torture, bool, 0644);
> +
> +struct torture {
> + unsigned int orig_out, orig_in;
> + void *orig_data;
> + struct scatterlist sg[4];
> + struct scatterlist orig_sg[];
> +};
> +
> +static size_t tot_len(struct scatterlist sg[], unsigned num)
> +{
> + size_t len, i;
> +
> + for (len = 0, i = 0; i < num; i++)
> + len += sg[i].length;
> +
> + return len;
> +}
> +
> +static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
> +  const struct scatterlist *src, unsigned snum)
> +{
> + unsigned len;
> + struct scatterlist s, d;
> +
> + s = *src;
> + d = *dst;
> +
> + while (snum && dnum) {
> + len = min(s.length, d.length);
> + memcpy(sg_virt(), sg_virt(), len);
> + d.offset += len;
> + d.length -= len;
> + s.offset += len;
> + s.length -= len;
> + if (!s.length) {
> + BUG_ON(snum == 0);
> + src++;
> + snum--;
> + s = *src;
> + }
> + if (!d.length) {
> + BUG_ON(dnum == 0);
> + dst++;
> + dnum--;
> + d = *dst;
> + }
> + }
> +}
> +
> +static bool torture_replace(struct scatterlist **sg,
> +  unsigned int *out,
> +  unsigned int *in,
> +  void **data,
> +  gfp_t gfp)
> +{
> + static size_t seed;
> + struct torture *t;
> + size_t outlen, inlen, ourseed, len1;
> + void *buf;
> +
> + if (!torture)
> + return

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Anthony Liguori

Rusty Russell  writes:

> "Michael S. Tsirkin"  writes:
>
>> Thinking about Sasha's patches, we can reduce ring usage
>> for virtio net small packets dramatically if we put
>> virtio net header inline with the data.
>> This can be done for free in case guest net stack allocated
>> extra head room for the packet, and I don't see
>> why would this have any downsides.
>
> I've been wanting to do this for the longest time... but...
>
>> Even though with my recent patches qemu
>> no longer requires header to be the first s/g element,
>> we need a new feature bit to detect this.
>> A trivial qemu patch will be sent separately.
>
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.

This is a bug in the specification.

The QEMU implementation pre-dates the specification.  All of the actual
implementations of virtio relied on the semantics of s/g elements and
still do.

What's in the specification really doesn't matter when it doesn't agree
with all of the existing implementations.

Users use implementations, not specifications.  The specification really
ought to be changed here.

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Paolo Bonzini  writes:

> Il 03/10/2012 08:44, Rusty Russell ha scritto:
>> There's a reason I haven't done this.  I really, really dislike "my
>> implemention isn't broken" feature bits.  We could have an infinite
>> number of them, for each bug in each device.
>
> However, this bug affects (almost) all implementations and (almost) all
> devices.  It even makes sense to reserve a transport feature bit for it
> instead of a device feature bit.
>
> Paolo

Perhaps, but we have to fix the bugs first!

As I said, my torture patch broke qemu immediately.  Since noone has
leapt onto fixing that, I'll take a look now...

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Rusty Russell  writes:

> "Michael S. Tsirkin"  writes:
>
>> Thinking about Sasha's patches, we can reduce ring usage
>> for virtio net small packets dramatically if we put
>> virtio net header inline with the data.
>> This can be done for free in case guest net stack allocated
>> extra head room for the packet, and I don't see
>> why would this have any downsides.
>
> I've been wanting to do this for the longest time... but...
>
>> Even though with my recent patches qemu
>> no longer requires header to be the first s/g element,

Breaks for me; see why I hate bug features?  Now we'd need another
one...

qemu-system-i386: virtio: trying to map MMIO memory

Please try my patch.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Paolo Bonzini

Il 03/10/2012 08:44, Rusty Russell ha scritto:
> There's a reason I haven't done this.  I really, really dislike "my
> implemention isn't broken" feature bits.  We could have an infinite
> number of them, for each bug in each device.

However, this bug affects (almost) all implementations and (almost) all
devices.  It even makes sense to reserve a transport feature bit for it
instead of a device feature bit.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

"Michael S. Tsirkin"  writes:

> Thinking about Sasha's patches, we can reduce ring usage
> for virtio net small packets dramatically if we put
> virtio net header inline with the data.
> This can be done for free in case guest net stack allocated
> extra head room for the packet, and I don't see
> why would this have any downsides.

I've been wanting to do this for the longest time... but...

> Even though with my recent patches qemu
> no longer requires header to be the first s/g element,
> we need a new feature bit to detect this.
> A trivial qemu patch will be sent separately.

There's a reason I haven't done this.  I really, really dislike "my
implemention isn't broken" feature bits.  We could have an infinite
number of them, for each bug in each device.

So my plan was to tie this assumption to the new PCI layout.  And have a
stress-testing patch like the one below in the kernel (see my virtio-wip
branch for stuff like this).  Turn it on at boot with
"virtio_ring.torture" on the kernel commandline.

BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
is too old.  Building the latest git now...

Cheers,
Rusty.

Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE

Virtio devices are not supposed to depend on the framing of the scatter-gather
lists, but various implementations did.  Safeguard this in future by adding
an option to deliberately create perverse descriptors.

Signed-off-by: Rusty Russell 

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..930a4ea 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,15 @@ config VIRTIO
  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VIRTIO_DEVICE_TORTURE
+   bool "Virtio device torture tests"
+   depends on VIRTIO && DEBUG_KERNEL
+   help
+ This makes the virtio_ring implementation creatively change
+ the format of requests to make sure that devices are
+ properly implemented.  This will make your virtual machine
+ slow *and* unreliable!  Say N.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e639584..8893753 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -124,6 +124,149 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
+static bool torture;
+module_param(torture, bool, 0644);
+
+struct torture {
+   unsigned int orig_out, orig_in;
+   void *orig_data;
+   struct scatterlist sg[4];
+   struct scatterlist orig_sg[];
+};
+
+static size_t tot_len(struct scatterlist sg[], unsigned num)
+{
+   size_t len, i;
+
+   for (len = 0, i = 0; i < num; i++)
+   len += sg[i].length;
+
+   return len;
+}
+
+static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
+const struct scatterlist *src, unsigned snum)
+{
+   unsigned len;
+   struct scatterlist s, d;
+
+   s = *src;
+   d = *dst;
+
+   while (snum && dnum) {
+   len = min(s.length, d.length);
+   memcpy(sg_virt(), sg_virt(), len);
+   d.offset += len;
+   d.length -= len;
+   s.offset += len;
+   s.length -= len;
+   if (!s.length) {
+   BUG_ON(snum == 0);
+   src++;
+   snum--;
+   s = *src;
+   }
+   if (!d.length) {
+   BUG_ON(dnum == 0);
+   dst++;
+   dnum--;
+   d = *dst;
+   }
+   }
+}
+
+static bool torture_replace(struct scatterlist **sg,
+unsigned int *out,
+unsigned int *in,
+void **data,
+gfp_t gfp)
+{
+   static size_t seed;
+   struct torture *t;
+   size_t outlen, inlen, ourseed, len1;
+   void *buf;
+
+   if (!torture)
+   return true;
+
+   outlen = tot_len(*sg, *out);
+   inlen = tot_len(*sg + *out, *in);
+
+   /* This will break horribly on large block requests. */
+   t = kmalloc(sizeof(*t) + (*out + *in) * sizeof(t->orig_sg[1])
+   + outlen + 1 + inlen + 1, gfp);
+   if (!t)
+   return false;
+
+   sg_init_table(t->sg, 4);
+   buf = >orig_sg[*out + *in];
+
+   memcpy(t->orig_sg, *sg, sizeof(**sg) * (*out + *in));
+   t->orig_out = *out;
+   t->orig_in = *in;
+   t->orig_data = *data;
+   *data = t;
+
+   ourseed = ACCESS_ONCE(seed);
+   seed++;
+
+   *sg = t->sg;
+   if (outlen) {
+   /* Split outbuf into two parts, one byte apart. */
+   *out = 2;
+   len1 = ourseed % (outlen

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Michael S. Tsirkin m...@redhat.com writes:

 Thinking about Sasha's patches, we can reduce ring usage
 for virtio net small packets dramatically if we put
 virtio net header inline with the data.
 This can be done for free in case guest net stack allocated
 extra head room for the packet, and I don't see
 why would this have any downsides.

I've been wanting to do this for the longest time... but...

 Even though with my recent patches qemu
 no longer requires header to be the first s/g element,
 we need a new feature bit to detect this.
 A trivial qemu patch will be sent separately.

There's a reason I haven't done this.  I really, really dislike my
implemention isn't broken feature bits.  We could have an infinite
number of them, for each bug in each device.

So my plan was to tie this assumption to the new PCI layout.  And have a
stress-testing patch like the one below in the kernel (see my virtio-wip
branch for stuff like this).  Turn it on at boot with
virtio_ring.torture on the kernel commandline.

BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
is too old.  Building the latest git now...

Cheers,
Rusty.

Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE

Virtio devices are not supposed to depend on the framing of the scatter-gather
lists, but various implementations did.  Safeguard this in future by adding
an option to deliberately create perverse descriptors.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..930a4ea 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,15 @@ config VIRTIO
  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VIRTIO_DEVICE_TORTURE
+   bool Virtio device torture tests
+   depends on VIRTIO  DEBUG_KERNEL
+   help
+ This makes the virtio_ring implementation creatively change
+ the format of requests to make sure that devices are
+ properly implemented.  This will make your virtual machine
+ slow *and* unreliable!  Say N.
+
 menu Virtio drivers
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e639584..8893753 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -124,6 +124,149 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
+static bool torture;
+module_param(torture, bool, 0644);
+
+struct torture {
+   unsigned int orig_out, orig_in;
+   void *orig_data;
+   struct scatterlist sg[4];
+   struct scatterlist orig_sg[];
+};
+
+static size_t tot_len(struct scatterlist sg[], unsigned num)
+{
+   size_t len, i;
+
+   for (len = 0, i = 0; i  num; i++)
+   len += sg[i].length;
+
+   return len;
+}
+
+static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
+const struct scatterlist *src, unsigned snum)
+{
+   unsigned len;
+   struct scatterlist s, d;
+
+   s = *src;
+   d = *dst;
+
+   while (snum  dnum) {
+   len = min(s.length, d.length);
+   memcpy(sg_virt(d), sg_virt(s), len);
+   d.offset += len;
+   d.length -= len;
+   s.offset += len;
+   s.length -= len;
+   if (!s.length) {
+   BUG_ON(snum == 0);
+   src++;
+   snum--;
+   s = *src;
+   }
+   if (!d.length) {
+   BUG_ON(dnum == 0);
+   dst++;
+   dnum--;
+   d = *dst;
+   }
+   }
+}
+
+static bool torture_replace(struct scatterlist **sg,
+unsigned int *out,
+unsigned int *in,
+void **data,
+gfp_t gfp)
+{
+   static size_t seed;
+   struct torture *t;
+   size_t outlen, inlen, ourseed, len1;
+   void *buf;
+
+   if (!torture)
+   return true;
+
+   outlen = tot_len(*sg, *out);
+   inlen = tot_len(*sg + *out, *in);
+
+   /* This will break horribly on large block requests. */
+   t = kmalloc(sizeof(*t) + (*out + *in) * sizeof(t-orig_sg[1])
+   + outlen + 1 + inlen + 1, gfp);
+   if (!t)
+   return false;
+
+   sg_init_table(t-sg, 4);
+   buf = t-orig_sg[*out + *in];
+
+   memcpy(t-orig_sg, *sg, sizeof(**sg) * (*out + *in));
+   t-orig_out = *out;
+   t-orig_in = *in;
+   t-orig_data = *data;
+   *data = t;
+
+   ourseed = ACCESS_ONCE(seed);
+   seed++;
+
+   *sg = t-sg;
+   if (outlen) {
+   /* Split outbuf into two parts, one byte apart. */
+   *out = 2;
+   len1 = ourseed %

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Paolo Bonzini

Il 03/10/2012 08:44, Rusty Russell ha scritto:
 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

However, this bug affects (almost) all implementations and (almost) all
devices.  It even makes sense to reserve a transport feature bit for it
instead of a device feature bit.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:

 Thinking about Sasha's patches, we can reduce ring usage
 for virtio net small packets dramatically if we put
 virtio net header inline with the data.
 This can be done for free in case guest net stack allocated
 extra head room for the packet, and I don't see
 why would this have any downsides.

 I've been wanting to do this for the longest time... but...

 Even though with my recent patches qemu
 no longer requires header to be the first s/g element,

Breaks for me; see why I hate bug features?  Now we'd need another
one...

qemu-system-i386: virtio: trying to map MMIO memory

Please try my patch.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Paolo Bonzini pbonz...@redhat.com writes:

 Il 03/10/2012 08:44, Rusty Russell ha scritto:
 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 However, this bug affects (almost) all implementations and (almost) all
 devices.  It even makes sense to reserve a transport feature bit for it
 instead of a device feature bit.

 Paolo

Perhaps, but we have to fix the bugs first!

As I said, my torture patch broke qemu immediately.  Since noone has
leapt onto fixing that, I'll take a look now...

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Anthony Liguori

Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:

 Thinking about Sasha's patches, we can reduce ring usage
 for virtio net small packets dramatically if we put
 virtio net header inline with the data.
 This can be done for free in case guest net stack allocated
 extra head room for the packet, and I don't see
 why would this have any downsides.

 I've been wanting to do this for the longest time... but...

 Even though with my recent patches qemu
 no longer requires header to be the first s/g element,
 we need a new feature bit to detect this.
 A trivial qemu patch will be sent separately.

 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

This is a bug in the specification.

The QEMU implementation pre-dates the specification.  All of the actual
implementations of virtio relied on the semantics of s/g elements and
still do.

What's in the specification really doesn't matter when it doesn't agree
with all of the existing implementations.

Users use implementations, not specifications.  The specification really
ought to be changed here.

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Anthony Liguori

Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:

 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 So my plan was to tie this assumption to the new PCI layout.  And have a
 stress-testing patch like the one below in the kernel (see my virtio-wip
 branch for stuff like this).  Turn it on at boot with
 virtio_ring.torture on the kernel commandline.

 BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
 is too old.  Building the latest git now...

 Cheers,
 Rusty.

 Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE

 Virtio devices are not supposed to depend on the framing of the scatter-gather
 lists, but various implementations did.  Safeguard this in future by adding
 an option to deliberately create perverse descriptors.

 Signed-off-by: Rusty Russell ru...@rustcorp.com.au

Ignore framing is really a bad idea.  You want backends to enforce
reasonable framing because guest's shouldn't do silly things with framing.

For instance, with virtio-blk, if you want decent performance, you
absolutely want to avoid bouncing the data.  If you're using O_DIRECT in
the host to submit I/O requests, then it's critical that all of the s/g
elements are aligned to a sector boundary and sized to a sector
boundary.

Yes, QEMU can handle if that's not the case, but it would be insanely
stupid for a guest not to do this.  This is the sort of thing that ought
to be enforced in the specification because a guest cannot perform well
if it doesn't follow these rules.

A spec isn't terribly useful if the result is guest drivers that are
slow.  There's very little to gain by not enforcing rules around framing
and there's a lot to lose if a guest frames incorrectly.

In the rare case where we want to make a framing change, we should use
feature bits like Michael is proposing.

In this case, we should simply say that with the feature bit, the vnet
header can be in the same element as the data but not allow the header
to be spread across multiple elements.

Regards,

Anthony Liguori


 diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
 index 8d5bddb..930a4ea 100644
 --- a/drivers/virtio/Kconfig
 +++ b/drivers/virtio/Kconfig
 @@ -5,6 +5,15 @@ config VIRTIO
 bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
 CONFIG_RPMSG or CONFIG_S390_GUEST.
  
 +config VIRTIO_DEVICE_TORTURE
 + bool Virtio device torture tests
 + depends on VIRTIO  DEBUG_KERNEL
 + help
 +   This makes the virtio_ring implementation creatively change
 +   the format of requests to make sure that devices are
 +   properly implemented.  This will make your virtual machine
 +   slow *and* unreliable!  Say N.
 +
  menu Virtio drivers
  
  config VIRTIO_PCI
 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index e639584..8893753 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -124,6 +124,149 @@ struct vring_virtqueue
  
  #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
  
 +#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
 +static bool torture;
 +module_param(torture, bool, 0644);
 +
 +struct torture {
 + unsigned int orig_out, orig_in;
 + void *orig_data;
 + struct scatterlist sg[4];
 + struct scatterlist orig_sg[];
 +};
 +
 +static size_t tot_len(struct scatterlist sg[], unsigned num)
 +{
 + size_t len, i;
 +
 + for (len = 0, i = 0; i  num; i++)
 + len += sg[i].length;
 +
 + return len;
 +}
 +
 +static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
 +  const struct scatterlist *src, unsigned snum)
 +{
 + unsigned len;
 + struct scatterlist s, d;
 +
 + s = *src;
 + d = *dst;
 +
 + while (snum  dnum) {
 + len = min(s.length, d.length);
 + memcpy(sg_virt(d), sg_virt(s), len);
 + d.offset += len;
 + d.length -= len;
 + s.offset += len;
 + s.length -= len;
 + if (!s.length) {
 + BUG_ON(snum == 0);
 + src++;
 + snum--;
 + s = *src;
 + }
 + if (!d.length) {
 + BUG_ON(dnum == 0);
 + dst++;
 + dnum--;
 + d = *dst;
 + }
 + }
 +}
 +
 +static bool torture_replace(struct scatterlist **sg,
 +  unsigned int *out,
 +  unsigned int *in,
 +  void **data,
 +  gfp_t gfp)
 +{
 + static size_t seed;
 + struct torture *t;
 + size_t outlen, inlen, ourseed, len1;
 + void *buf;
 +
 + if (!torture)
 + return true;
 +
 + outlen = tot_len(*sg, *out);
 + inlen = tot_len(*sg +

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Anthony Liguori anth...@codemonkey.ws writes:
 Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:

 Thinking about Sasha's patches, we can reduce ring usage
 for virtio net small packets dramatically if we put
 virtio net header inline with the data.
 This can be done for free in case guest net stack allocated
 extra head room for the packet, and I don't see
 why would this have any downsides.

 I've been wanting to do this for the longest time... but...

 Even though with my recent patches qemu
 no longer requires header to be the first s/g element,
 we need a new feature bit to detect this.
 A trivial qemu patch will be sent separately.

 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 This is a bug in the specification.

 The QEMU implementation pre-dates the specification.  All of the actual
 implementations of virtio relied on the semantics of s/g elements and
 still do.

lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
ever going to be merged, so I'm not sure what its status is?  But I'm
determined to fix qemu, and hence my torture patch to make sure this
doesn't creep in again.

 What's in the specification really doesn't matter when it doesn't agree
 with all of the existing implementations.

 Users use implementations, not specifications.  The specification really
 ought to be changed here.

I'm sorely tempted, except that we're losing a real optimization because
of this :(

The specification has long contained the footnote:

The current qemu device implementations mistakenly insist that
the first descriptor cover the header in these cases exactly, so
a cautious driver should arrange it so.

I'd like to tie this caveat to the PCI capability change, so this note
will move to the appendix with the old PCI layout.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Anthony Liguori

Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori anth...@codemonkey.ws writes:
 Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:

 Thinking about Sasha's patches, we can reduce ring usage
 for virtio net small packets dramatically if we put
 virtio net header inline with the data.
 This can be done for free in case guest net stack allocated
 extra head room for the packet, and I don't see
 why would this have any downsides.

 I've been wanting to do this for the longest time... but...

 Even though with my recent patches qemu
 no longer requires header to be the first s/g element,
 we need a new feature bit to detect this.
 A trivial qemu patch will be sent separately.

 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 This is a bug in the specification.

 The QEMU implementation pre-dates the specification.  All of the actual
 implementations of virtio relied on the semantics of s/g elements and
 still do.

 lguest fix is pending in my queue.  lkvm and qemu are broken; lkvm isn't
 ever going to be merged, so I'm not sure what its status is?  But I'm
 determined to fix qemu, and hence my torture patch to make sure this
 doesn't creep in again.

There are even more implementations out there and I'd wager they all
rely on framing.

 What's in the specification really doesn't matter when it doesn't agree
 with all of the existing implementations.

 Users use implementations, not specifications.  The specification really
 ought to be changed here.

 I'm sorely tempted, except that we're losing a real optimization because
 of this :(

What optimizations?  What Michael is proposing is still achievable with
a device feature.  Are there other optimizations that can be achieved by
changing framing that we can't achieve with feature bits?

As I mentioned in another note, bad framing decisions can cause
performance issues too...

 The specification has long contained the footnote:

 The current qemu device implementations mistakenly insist that
 the first descriptor cover the header in these cases exactly, so
 a cautious driver should arrange it so.

I seem to recall this being a compromise between you and I..  I think
I objected strongly to this back when you first wrote the spec and you
added this to appease me ;-)

Regards,

Anthony Liguori


 I'd like to tie this caveat to the PCI capability change, so this note
 will move to the appendix with the old PCI layout.

 Cheers,
 Rusty.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] virtio-net: inline header support

2012-10-03 Thread Rusty Russell

Anthony Liguori anth...@codemonkey.ws writes:

 Rusty Russell ru...@rustcorp.com.au writes:

 Michael S. Tsirkin m...@redhat.com writes:

 There's a reason I haven't done this.  I really, really dislike my
 implemention isn't broken feature bits.  We could have an infinite
 number of them, for each bug in each device.

 So my plan was to tie this assumption to the new PCI layout.  And have a
 stress-testing patch like the one below in the kernel (see my virtio-wip
 branch for stuff like this).  Turn it on at boot with
 virtio_ring.torture on the kernel commandline.

 BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0)
 is too old.  Building the latest git now...

 Cheers,
 Rusty.

 Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE

 Virtio devices are not supposed to depend on the framing of the 
 scatter-gather
 lists, but various implementations did.  Safeguard this in future by adding
 an option to deliberately create perverse descriptors.

 Signed-off-by: Rusty Russell ru...@rustcorp.com.au

 Ignore framing is really a bad idea.  You want backends to enforce
 reasonable framing because guest's shouldn't do silly things with framing.

 For instance, with virtio-blk, if you want decent performance, you
 absolutely want to avoid bouncing the data.  If you're using O_DIRECT in
 the host to submit I/O requests, then it's critical that all of the s/g
 elements are aligned to a sector boundary and sized to a sector
 boundary.

 Yes, QEMU can handle if that's not the case, but it would be insanely
 stupid for a guest not to do this.  This is the sort of thing that ought
 to be enforced in the specification because a guest cannot perform well
 if it doesn't follow these rules.

Lack of imagination is what got us into trouble in the first place; when
presented with one counter-example, it's useful to look for others.

That's our job, not to dismiss them a insanely stupid.

For example:
1) Perhaps the guest isn't trying to perform well, it's trying to be a
   tiny bootloader?
2) Perhaps the guest is the direct consumer, and aligning buffers is
   redundant.

 A spec isn't terribly useful if the result is guest drivers that are
 slow.  There's very little to gain by not enforcing rules around framing
 and there's a lot to lose if a guest frames incorrectly.

The guest has the flexibility, and gets to decide.  The spec is not
forcing them to perform badly.

 In the rare case where we want to make a framing change, we should use
 feature bits like Michael is proposing.

 In this case, we should simply say that with the feature bit, the vnet
 header can be in the same element as the data but not allow the header
 to be spread across multiple elements.

I'd love to split struct virtio_net_hdr_mrg_rxbuf, so the num_buffers
ends up somewhere else.

The simplest rules are never or always.

Cheers,
Rusty.
PS.  Inserting zero-length buffers is something I'd be prepared to rule
 out, my current patch does it just for yuks...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/3] virtio-net: inline header support

2012-09-28 Thread Michael S. Tsirkin

Thinking about Sasha's patches, we can reduce ring usage
for virtio net small packets dramatically if we put
virtio net header inline with the data.
This can be done for free in case guest net stack allocated
extra head room for the packet, and I don't see
why would this have any downsides.

Even though with my recent patches qemu
no longer requires header to be the first s/g element,
we need a new feature bit to detect this.
A trivial qemu patch will be sent separately.

We could get rid of an extra s/g for big packets too,
but since in practice everyone enables mergeable buffers,
I don't see much of a point.

Rusty, if you decide to pick this up I'll send a
(rather trivial) spec patch shortly afterwards, but holidays
are beginning here. Considering how simple
the guest patch is, I hope it can make it in 3.7?

Also note that patch 1 and 2 are IMO a good
idea without patch 3. If you decide to defer patch 3
pls consider 1/2 separately.

Before:
[root@virtlab203 qemu]# ssh robin ./netperf/bin/netperf -t TCP_RR -H
11.0.0.4
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  87380  11   10.002992.88   
16384  87380 

After:
[root@virtlab203 qemu]# ssh robin ./netperf/bin/netperf -t TCP_RR -H
11.0.0.4
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  87380  11   10.003195.57   
16384  87380 

Michael S. Tsirkin (3):
  virtio: add API to query ring capacity
  virtio-net: correct capacity math on ring full
  virtio-net: put virtio net header inline with data

 drivers/net/virtio_net.c | 57 +++-
 drivers/virtio/virtio_ring.c | 19 +++
 include/linux/virtio.h   |  2 ++
 include/linux/virtio_net.h   |  5 +++-
 4 files changed, 66 insertions(+), 17 deletions(-)

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/3] virtio-net: inline header support

2012-09-28 Thread Michael S. Tsirkin

Thinking about Sasha's patches, we can reduce ring usage
for virtio net small packets dramatically if we put
virtio net header inline with the data.
This can be done for free in case guest net stack allocated
extra head room for the packet, and I don't see
why would this have any downsides.

Even though with my recent patches qemu
no longer requires header to be the first s/g element,
we need a new feature bit to detect this.
A trivial qemu patch will be sent separately.

We could get rid of an extra s/g for big packets too,
but since in practice everyone enables mergeable buffers,
I don't see much of a point.

Rusty, if you decide to pick this up I'll send a
(rather trivial) spec patch shortly afterwards, but holidays
are beginning here. Considering how simple
the guest patch is, I hope it can make it in 3.7?

Also note that patch 1 and 2 are IMO a good
idea without patch 3. If you decide to defer patch 3
pls consider 1/2 separately.

Before:
[root@virtlab203 qemu]# ssh robin ./netperf/bin/netperf -t TCP_RR -H
11.0.0.4
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  87380  11   10.002992.88   
16384  87380 

After:
[root@virtlab203 qemu]# ssh robin ./netperf/bin/netperf -t TCP_RR -H
11.0.0.4
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  87380  11   10.003195.57   
16384  87380 

Michael S. Tsirkin (3):
  virtio: add API to query ring capacity
  virtio-net: correct capacity math on ring full
  virtio-net: put virtio net header inline with data

 drivers/net/virtio_net.c | 57 +++-
 drivers/virtio/virtio_ring.c | 19 +++
 include/linux/virtio.h   |  2 ++
 include/linux/virtio_net.h   |  5 +++-
 4 files changed, 66 insertions(+), 17 deletions(-)

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

52 matches

Mail list logo