Re: [PATCH] virtio config_ops refactoring

2007-11-10 Thread Anthony Liguori

Rusty Russell wrote:

On Saturday 10 November 2007 10:45:38 Anthony Liguori wrote:
  

The problem is the ABI.  We can either require that PCI configuration
values are accessed with natural instructions, or it makes very little
sense to use the PCI configuration space for virtio configuration
information.



To me it seems logical and simplest to allow any accesses, lay out your 
structure and emulate it in the obvious way.
  


Okay, I've got some updates that I'm going to send out now to the PCI 
virtio driver but I'll also change it to switch over to a memory 
layout.  It's not the best PCI ABI but I can certainly live with it.


You can put the configuration information somewhere else, but the PCI config 
space seems the logical place.
  


If we're treating the PCI config space as an opaque memory blob, instead 
of as distinct registers, I think it makes more sense to just put it in 
memory.  In the backend, I have to treat it as a memory blob anyway and 
using the PCI config space just means that I have to write all the 
various PIO handlers for the different access sizes.  It's much more 
elegant in my mind just to have the driver provide some memory that the 
host fills out.


Thanks for all the review,

Anthony Liguori


Either virtio config looks like a shared memory area (as lguest
currently implements it), or it looks like hardware registers (like
virtio-pci implements it).  After thinking about it for a while, I don't
think the two can be reconciled.  There are subtle differences between
the two that can't be hidden in the virtio interface.  For instance, in
the PCI model, you get notified when values are read/written whereas in
the lguest model, you don't and need explicit status bits.



No.  You need those status bits even if you have your register model, 
otherwise you can't tell when the configuration as a whole is stable.  
Consider the feature bits.  Worse, consider extending the feature bits beyond 
32 bits.


(We will have to deal with dynamic configuration changes in future; I was 
planning on using some status bits.  But it's pretty clear that it's going to 
require an explicit ack of some form.)


Hope that clarifies,
Rusty.

  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-10 Thread Anthony Liguori

Rusty Russell wrote:

On Saturday 10 November 2007 10:45:38 Anthony Liguori wrote:
  

The problem is the ABI.  We can either require that PCI configuration
values are accessed with natural instructions, or it makes very little
sense to use the PCI configuration space for virtio configuration
information.



To me it seems logical and simplest to allow any accesses, lay out your 
structure and emulate it in the obvious way.
  


Okay, I've got some updates that I'm going to send out now to the PCI 
virtio driver but I'll also change it to switch over to a memory 
layout.  It's not the best PCI ABI but I can certainly live with it.


You can put the configuration information somewhere else, but the PCI config 
space seems the logical place.
  


If we're treating the PCI config space as an opaque memory blob, instead 
of as distinct registers, I think it makes more sense to just put it in 
memory.  In the backend, I have to treat it as a memory blob anyway and 
using the PCI config space just means that I have to write all the 
various PIO handlers for the different access sizes.  It's much more 
elegant in my mind just to have the driver provide some memory that the 
host fills out.


Thanks for all the review,

Anthony Liguori


Either virtio config looks like a shared memory area (as lguest
currently implements it), or it looks like hardware registers (like
virtio-pci implements it).  After thinking about it for a while, I don't
think the two can be reconciled.  There are subtle differences between
the two that can't be hidden in the virtio interface.  For instance, in
the PCI model, you get notified when values are read/written whereas in
the lguest model, you don't and need explicit status bits.



No.  You need those status bits even if you have your register model, 
otherwise you can't tell when the configuration as a whole is stable.  
Consider the feature bits.  Worse, consider extending the feature bits beyond 
32 bits.


(We will have to deal with dynamic configuration changes in future; I was 
planning on using some status bits.  But it's pretty clear that it's going to 
require an explicit ack of some form.)


Hope that clarifies,
Rusty.

  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-09 Thread Rusty Russell
On Saturday 10 November 2007 10:45:38 Anthony Liguori wrote:
> The problem is the ABI.  We can either require that PCI configuration
> values are accessed with natural instructions, or it makes very little
> sense to use the PCI configuration space for virtio configuration
> information.

To me it seems logical and simplest to allow any accesses, lay out your 
structure and emulate it in the obvious way.

You can put the configuration information somewhere else, but the PCI config 
space seems the logical place.

> Either virtio config looks like a shared memory area (as lguest
> currently implements it), or it looks like hardware registers (like
> virtio-pci implements it).  After thinking about it for a while, I don't
> think the two can be reconciled.  There are subtle differences between
> the two that can't be hidden in the virtio interface.  For instance, in
> the PCI model, you get notified when values are read/written whereas in
> the lguest model, you don't and need explicit status bits.

No.  You need those status bits even if you have your register model, 
otherwise you can't tell when the configuration as a whole is stable.  
Consider the feature bits.  Worse, consider extending the feature bits beyond 
32 bits.

(We will have to deal with dynamic configuration changes in future; I was 
planning on using some status bits.  But it's pretty clear that it's going to 
require an explicit ack of some form.)

Hope that clarifies,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-09 Thread Anthony Liguori

Rusty Russell wrote:

On Friday 09 November 2007 09:33:04 Anthony Liguori wrote:
  

switch (addr) {
case VIRTIO_BLK_CONFIG_MAX_SEG:
   return vdev->max_seg & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 1:
   return (vdev->max_seg >> 8) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 2:
   return (vdev->max_seg >> 16) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 3:
   return (vdev->max_seg >> 24) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE:
   return vdev->max_size & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 1:
   return (vdev->max_size >> 8) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 2:
   return (vdev->max_size >> 16) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 3:
   return (vdev->max_size >> 24) & 0xFF;
...



struct virtio_blk_config
{
uint32_t max_seg, max_size;
};

...
struct virtio_blk_config conf = { vdev->max_seg, vdev->max_size };

return ((unsigned char *))[addr];

(Which strongly implies our headers should expose that nominal struct, rather 
than numerical constants).
  


The problem is the ABI.  We can either require that PCI configuration 
values are accessed with natural instructions, or it makes very little 
sense to use the PCI configuration space for virtio configuration 
information.  If we really can't find a way to do this (and I think my 
current implementation is the best compromise since it hides this from 
everything else), then I think I'll switch over to just writing a PFN 
into a PCI configuration slot and then have that page store the virtio 
configuration information (much like is done with lguest).


Either virtio config looks like a shared memory area (as lguest 
currently implements it), or it looks like hardware registers (like 
virtio-pci implements it).  After thinking about it for a while, I don't 
think the two can be reconciled.  There are subtle differences between 
the two that can't be hidden in the virtio interface.  For instance, in 
the PCI model, you get notified when values are read/written whereas in 
the lguest model, you don't and need explicit status bits.


If you're very against the switch() magic, then I'll switch over to just 
using a shared memory area.


Regards,

Anthony Liguori


Rusty.

  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-09 Thread Rusty Russell
On Friday 09 November 2007 09:33:04 Anthony Liguori wrote:
> I really want to make sure that if a guest tries
> to read a 4-byte PCI config field, that it does so using an "outl"
> instruction so that in my QEMU backend

So you want to enforce PCI requirements onto virtio config accesses.  This 
doesn't seem very nice: the fact that PCI accesses use different namespaces 
for different sizes makes sense from a primitive hardware point of view, but 
sucks for software.  Fortunately, if you insist on byte-at-a-time they're the 
same.

> switch (addr) {
> case VIRTIO_BLK_CONFIG_MAX_SEG:
>return vdev->max_seg & 0xFF;
> case VIRTIO_BLK_CONFIG_MAX_SEG + 1:
>return (vdev->max_seg >> 8) & 0xFF;
> case VIRTIO_BLK_CONFIG_MAX_SEG + 2:
>return (vdev->max_seg >> 16) & 0xFF;
> case VIRTIO_BLK_CONFIG_MAX_SEG + 3:
>return (vdev->max_seg >> 24) & 0xFF;
> case VIRTIO_BLK_CONFIG_MAX_SIZE:
>return vdev->max_size & 0xFF;
> case VIRTIO_BLK_CONFIG_MAX_SIZE + 1:
>return (vdev->max_size >> 8) & 0xFF;
> case VIRTIO_BLK_CONFIG_MAX_SIZE + 2:
>return (vdev->max_size >> 16) & 0xFF;
> case VIRTIO_BLK_CONFIG_MAX_SIZE + 3:
>return (vdev->max_size >> 24) & 0xFF;
> ...

struct virtio_blk_config
{
uint32_t max_seg, max_size;
};

...
struct virtio_blk_config conf = { vdev->max_seg, vdev->max_size };

return ((unsigned char *))[addr];

(Which strongly implies our headers should expose that nominal struct, rather 
than numerical constants).

Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-09 Thread Rusty Russell
On Saturday 10 November 2007 10:45:38 Anthony Liguori wrote:
 The problem is the ABI.  We can either require that PCI configuration
 values are accessed with natural instructions, or it makes very little
 sense to use the PCI configuration space for virtio configuration
 information.

To me it seems logical and simplest to allow any accesses, lay out your 
structure and emulate it in the obvious way.

You can put the configuration information somewhere else, but the PCI config 
space seems the logical place.

 Either virtio config looks like a shared memory area (as lguest
 currently implements it), or it looks like hardware registers (like
 virtio-pci implements it).  After thinking about it for a while, I don't
 think the two can be reconciled.  There are subtle differences between
 the two that can't be hidden in the virtio interface.  For instance, in
 the PCI model, you get notified when values are read/written whereas in
 the lguest model, you don't and need explicit status bits.

No.  You need those status bits even if you have your register model, 
otherwise you can't tell when the configuration as a whole is stable.  
Consider the feature bits.  Worse, consider extending the feature bits beyond 
32 bits.

(We will have to deal with dynamic configuration changes in future; I was 
planning on using some status bits.  But it's pretty clear that it's going to 
require an explicit ack of some form.)

Hope that clarifies,
Rusty.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-09 Thread Rusty Russell
On Friday 09 November 2007 09:33:04 Anthony Liguori wrote:
 I really want to make sure that if a guest tries
 to read a 4-byte PCI config field, that it does so using an outl
 instruction so that in my QEMU backend

So you want to enforce PCI requirements onto virtio config accesses.  This 
doesn't seem very nice: the fact that PCI accesses use different namespaces 
for different sizes makes sense from a primitive hardware point of view, but 
sucks for software.  Fortunately, if you insist on byte-at-a-time they're the 
same.

 switch (addr) {
 case VIRTIO_BLK_CONFIG_MAX_SEG:
return vdev-max_seg  0xFF;
 case VIRTIO_BLK_CONFIG_MAX_SEG + 1:
return (vdev-max_seg  8)  0xFF;
 case VIRTIO_BLK_CONFIG_MAX_SEG + 2:
return (vdev-max_seg  16)  0xFF;
 case VIRTIO_BLK_CONFIG_MAX_SEG + 3:
return (vdev-max_seg  24)  0xFF;
 case VIRTIO_BLK_CONFIG_MAX_SIZE:
return vdev-max_size  0xFF;
 case VIRTIO_BLK_CONFIG_MAX_SIZE + 1:
return (vdev-max_size  8)  0xFF;
 case VIRTIO_BLK_CONFIG_MAX_SIZE + 2:
return (vdev-max_size  16)  0xFF;
 case VIRTIO_BLK_CONFIG_MAX_SIZE + 3:
return (vdev-max_size  24)  0xFF;
 ...

struct virtio_blk_config
{
uint32_t max_seg, max_size;
};

...
struct virtio_blk_config conf = { vdev-max_seg, vdev-max_size };

return ((unsigned char *)conf)[addr];

(Which strongly implies our headers should expose that nominal struct, rather 
than numerical constants).

Rusty.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-09 Thread Anthony Liguori

Rusty Russell wrote:

On Friday 09 November 2007 09:33:04 Anthony Liguori wrote:
  

switch (addr) {
case VIRTIO_BLK_CONFIG_MAX_SEG:
   return vdev-max_seg  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 1:
   return (vdev-max_seg  8)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 2:
   return (vdev-max_seg  16)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 3:
   return (vdev-max_seg  24)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE:
   return vdev-max_size  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 1:
   return (vdev-max_size  8)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 2:
   return (vdev-max_size  16)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 3:
   return (vdev-max_size  24)  0xFF;
...



struct virtio_blk_config
{
uint32_t max_seg, max_size;
};

...
struct virtio_blk_config conf = { vdev-max_seg, vdev-max_size };

return ((unsigned char *)conf)[addr];

(Which strongly implies our headers should expose that nominal struct, rather 
than numerical constants).
  


The problem is the ABI.  We can either require that PCI configuration 
values are accessed with natural instructions, or it makes very little 
sense to use the PCI configuration space for virtio configuration 
information.  If we really can't find a way to do this (and I think my 
current implementation is the best compromise since it hides this from 
everything else), then I think I'll switch over to just writing a PFN 
into a PCI configuration slot and then have that page store the virtio 
configuration information (much like is done with lguest).


Either virtio config looks like a shared memory area (as lguest 
currently implements it), or it looks like hardware registers (like 
virtio-pci implements it).  After thinking about it for a while, I don't 
think the two can be reconciled.  There are subtle differences between 
the two that can't be hidden in the virtio interface.  For instance, in 
the PCI model, you get notified when values are read/written whereas in 
the lguest model, you don't and need explicit status bits.


If you're very against the switch() magic, then I'll switch over to just 
using a shared memory area.


Regards,

Anthony Liguori


Rusty.

  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lguest] [PATCH] virtio config_ops refactoring

2007-11-08 Thread Anthony Liguori

Dor Laor wrote:

ron minnich wrote:


Hi, I'm sorry, I've been stuck on other things (NFS RDMA anyone?) and
missed part of this discussion.

Is it really the case that operations on virtio devices will involve
outl/inl etc.?


What's the problem with them?
Except for the kick event it's not performance critical and the 
difference between pio vmexit

and hypercall exit is very small.


If you're a nutty guy who's interesting in creating the most absolute 
minimal VMM to run exotic paravirtual guests on massive clusters, then 
requiring PIO implies that you have to have an instruction decoder which 
is goes against the earlier goal ;-)


Regards,

Anthony Liguori


I don't know about problems in other architectures, maybe mmio is better?
Dor.



Apologies in advance for my failure to pay attention.

thanks

ron
___
Lguest mailing list
[EMAIL PROTECTED]
https://ozlabs.org/mailman/listinfo/lguest





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lguest] [PATCH] virtio config_ops refactoring

2007-11-08 Thread Anthony Liguori

ron minnich wrote:

Hi, I'm sorry, I've been stuck on other things (NFS RDMA anyone?) and
missed part of this discussion.

Is it really the case that operations on virtio devices will involve
outl/inl etc.?
  


No, this is just for the PCI virtio transport.  lguest's virtio 
transport uses hypercalls and shared memory.


Regards,

Anthony Liguori


Apologies in advance for my failure to pay attention.

thanks

ron
  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lguest] [PATCH] virtio config_ops refactoring

2007-11-08 Thread ron minnich
Hi, I'm sorry, I've been stuck on other things (NFS RDMA anyone?) and
missed part of this discussion.

Is it really the case that operations on virtio devices will involve
outl/inl etc.?

Apologies in advance for my failure to pay attention.

thanks

ron
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-08 Thread Anthony Liguori

Rusty Russell wrote:

On Thursday 08 November 2007 13:41:16 Anthony Liguori wrote:
  

Rusty Russell wrote:


On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:
  

I would prefer that the virtio API not expose a little endian standard.
I'm currently converting config->get() ops to ioreadXX depending on the
size which already does the endianness conversion for me so this just
messes things up.  I think it's better to let the backend deal with
endianness since it's trivial to handle for both the PCI backend and the
lguest backend (lguest doesn't need to do any endianness conversion).


-ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this
isn't a high-speed interface, so let's not) or not "sometimes" convert
endianness. Getting surprises because a field happens to be packed into 4
bytes is counter-intuitive.
  

Then I think it's necessary to expose the XX interfaces.  Otherwise, the
backend has to deal with doing all register operations at a per-byte
granularity which adds a whole lot of complexity on a per-device basis
(as opposed to a little complexity once in the transport layer).



Huh?  Take a look at the drivers, this simply isn't true.  Do you have 
evidence that it will be true later?
  


I'm a bit confused.  So right now, the higher level virtio functions do 
endianness conversion.  I really want to make sure that if a guest tries 
to read a 4-byte PCI config field, that it does so using an "outl" 
instruction so that in my QEMU backend, I don't have to deal with a 
guest reading/writing a single byte within a 4-byte configuration 
field.  It's the difference between having in the PIO handler:


switch (addr) {
case VIRTIO_BLK_CONFIG_MAX_SEG:
   return vdev->max_seg;
case VIRTIO_BLK_CONFIG_MAX_SIZE:
   return vdev->max_size;

}

and:

switch (addr) {
case VIRTIO_BLK_CONFIG_MAX_SEG:
  return vdev->max_seg & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 1:
  return (vdev->max_seg >> 8) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 2:
  return (vdev->max_seg >> 16) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 3:
  return (vdev->max_seg >> 24) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE:
  return vdev->max_size & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 1:
  return (vdev->max_size >> 8) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 2:
  return (vdev->max_size >> 16) & 0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 3:
  return (vdev->max_size >> 24) & 0xFF;
...
}


It's the host-side code I'm concerned about, not the guest-side code.  
I'm happy to just ignore the whole endianness conversion thing and 
always pass values through in the CPU bitness but it's very important to 
me that the PCI config registers are accessed with their natural sized 
instructions (just as they would with a real PCI device).


Regards,

Anthony Liguori

Plus your code will be smaller doing a single writeb/readb loop than trying to 
do a switch statement.


  

You really want to be able to rely on multi-byte atomic operations too
when setting values.  Otherwise, you need another register to just to
signal when it's okay for the device to examine any given register.



You already do; the status register fills this role.  For example, you can't 
tell what features a guest understands until it updates the status register.


Hope that clarifies,
Rusty.

  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-08 Thread Rusty Russell
On Thursday 08 November 2007 13:41:16 Anthony Liguori wrote:
> Rusty Russell wrote:
> > On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:
> >> I would prefer that the virtio API not expose a little endian standard.
> >> I'm currently converting config->get() ops to ioreadXX depending on the
> >> size which already does the endianness conversion for me so this just
> >> messes things up.  I think it's better to let the backend deal with
> >> endianness since it's trivial to handle for both the PCI backend and the
> >> lguest backend (lguest doesn't need to do any endianness conversion).
> >
> > -ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this
> > isn't a high-speed interface, so let's not) or not "sometimes" convert
> > endianness. Getting surprises because a field happens to be packed into 4
> > bytes is counter-intuitive.
>
> Then I think it's necessary to expose the XX interfaces.  Otherwise, the
> backend has to deal with doing all register operations at a per-byte
> granularity which adds a whole lot of complexity on a per-device basis
> (as opposed to a little complexity once in the transport layer).

Huh?  Take a look at the drivers, this simply isn't true.  Do you have 
evidence that it will be true later?

Plus your code will be smaller doing a single writeb/readb loop than trying to 
do a switch statement.

> You really want to be able to rely on multi-byte atomic operations too
> when setting values.  Otherwise, you need another register to just to
> signal when it's okay for the device to examine any given register.

You already do; the status register fills this role.  For example, you can't 
tell what features a guest understands until it updates the status register.

Hope that clarifies,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lguest] [PATCH] virtio config_ops refactoring

2007-11-08 Thread Anthony Liguori

ron minnich wrote:

Hi, I'm sorry, I've been stuck on other things (NFS RDMA anyone?) and
missed part of this discussion.

Is it really the case that operations on virtio devices will involve
outl/inl etc.?
  


No, this is just for the PCI virtio transport.  lguest's virtio 
transport uses hypercalls and shared memory.


Regards,

Anthony Liguori


Apologies in advance for my failure to pay attention.

thanks

ron
  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lguest] [PATCH] virtio config_ops refactoring

2007-11-08 Thread ron minnich
Hi, I'm sorry, I've been stuck on other things (NFS RDMA anyone?) and
missed part of this discussion.

Is it really the case that operations on virtio devices will involve
outl/inl etc.?

Apologies in advance for my failure to pay attention.

thanks

ron
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-08 Thread Rusty Russell
On Thursday 08 November 2007 13:41:16 Anthony Liguori wrote:
 Rusty Russell wrote:
  On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:
  I would prefer that the virtio API not expose a little endian standard.
  I'm currently converting config-get() ops to ioreadXX depending on the
  size which already does the endianness conversion for me so this just
  messes things up.  I think it's better to let the backend deal with
  endianness since it's trivial to handle for both the PCI backend and the
  lguest backend (lguest doesn't need to do any endianness conversion).
 
  -ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this
  isn't a high-speed interface, so let's not) or not sometimes convert
  endianness. Getting surprises because a field happens to be packed into 4
  bytes is counter-intuitive.

 Then I think it's necessary to expose the XX interfaces.  Otherwise, the
 backend has to deal with doing all register operations at a per-byte
 granularity which adds a whole lot of complexity on a per-device basis
 (as opposed to a little complexity once in the transport layer).

Huh?  Take a look at the drivers, this simply isn't true.  Do you have 
evidence that it will be true later?

Plus your code will be smaller doing a single writeb/readb loop than trying to 
do a switch statement.

 You really want to be able to rely on multi-byte atomic operations too
 when setting values.  Otherwise, you need another register to just to
 signal when it's okay for the device to examine any given register.

You already do; the status register fills this role.  For example, you can't 
tell what features a guest understands until it updates the status register.

Hope that clarifies,
Rusty.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-08 Thread Anthony Liguori

Rusty Russell wrote:

On Thursday 08 November 2007 13:41:16 Anthony Liguori wrote:
  

Rusty Russell wrote:


On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:
  

I would prefer that the virtio API not expose a little endian standard.
I'm currently converting config-get() ops to ioreadXX depending on the
size which already does the endianness conversion for me so this just
messes things up.  I think it's better to let the backend deal with
endianness since it's trivial to handle for both the PCI backend and the
lguest backend (lguest doesn't need to do any endianness conversion).


-ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this
isn't a high-speed interface, so let's not) or not sometimes convert
endianness. Getting surprises because a field happens to be packed into 4
bytes is counter-intuitive.
  

Then I think it's necessary to expose the XX interfaces.  Otherwise, the
backend has to deal with doing all register operations at a per-byte
granularity which adds a whole lot of complexity on a per-device basis
(as opposed to a little complexity once in the transport layer).



Huh?  Take a look at the drivers, this simply isn't true.  Do you have 
evidence that it will be true later?
  


I'm a bit confused.  So right now, the higher level virtio functions do 
endianness conversion.  I really want to make sure that if a guest tries 
to read a 4-byte PCI config field, that it does so using an outl 
instruction so that in my QEMU backend, I don't have to deal with a 
guest reading/writing a single byte within a 4-byte configuration 
field.  It's the difference between having in the PIO handler:


switch (addr) {
case VIRTIO_BLK_CONFIG_MAX_SEG:
   return vdev-max_seg;
case VIRTIO_BLK_CONFIG_MAX_SIZE:
   return vdev-max_size;

}

and:

switch (addr) {
case VIRTIO_BLK_CONFIG_MAX_SEG:
  return vdev-max_seg  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 1:
  return (vdev-max_seg  8)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 2:
  return (vdev-max_seg  16)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SEG + 3:
  return (vdev-max_seg  24)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE:
  return vdev-max_size  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 1:
  return (vdev-max_size  8)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 2:
  return (vdev-max_size  16)  0xFF;
case VIRTIO_BLK_CONFIG_MAX_SIZE + 3:
  return (vdev-max_size  24)  0xFF;
...
}


It's the host-side code I'm concerned about, not the guest-side code.  
I'm happy to just ignore the whole endianness conversion thing and 
always pass values through in the CPU bitness but it's very important to 
me that the PCI config registers are accessed with their natural sized 
instructions (just as they would with a real PCI device).


Regards,

Anthony Liguori

Plus your code will be smaller doing a single writeb/readb loop than trying to 
do a switch statement.


  

You really want to be able to rely on multi-byte atomic operations too
when setting values.  Otherwise, you need another register to just to
signal when it's okay for the device to examine any given register.



You already do; the status register fills this role.  For example, you can't 
tell what features a guest understands until it updates the status register.


Hope that clarifies,
Rusty.

  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Lguest] [PATCH] virtio config_ops refactoring

2007-11-08 Thread Anthony Liguori

Dor Laor wrote:

ron minnich wrote:


Hi, I'm sorry, I've been stuck on other things (NFS RDMA anyone?) and
missed part of this discussion.

Is it really the case that operations on virtio devices will involve
outl/inl etc.?


What's the problem with them?
Except for the kick event it's not performance critical and the 
difference between pio vmexit

and hypercall exit is very small.


If you're a nutty guy who's interesting in creating the most absolute 
minimal VMM to run exotic paravirtual guests on massive clusters, then 
requiring PIO implies that you have to have an instruction decoder which 
is goes against the earlier goal ;-)


Regards,

Anthony Liguori


I don't know about problems in other architectures, maybe mmio is better?
Dor.



Apologies in advance for my failure to pay attention.

thanks

ron
___
Lguest mailing list
[EMAIL PROTECTED]
https://ozlabs.org/mailman/listinfo/lguest





-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-07 Thread Anthony Liguori

Rusty Russell wrote:

On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:

I would prefer that the virtio API not expose a little endian standard.
I'm currently converting config->get() ops to ioreadXX depending on the
size which already does the endianness conversion for me so this just
messes things up.  I think it's better to let the backend deal with
endianness since it's trivial to handle for both the PCI backend and the
lguest backend (lguest doesn't need to do any endianness conversion).


-ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this isn't 
a high-speed interface, so let's not) or not "sometimes" convert endianness.  
Getting surprises because a field happens to be packed into 4 bytes is 
counter-intuitive.


Then I think it's necessary to expose the XX interfaces.  Otherwise, the 
backend has to deal with doing all register operations at a per-byte 
granularity which adds a whole lot of complexity on a per-device basis 
(as opposed to a little complexity once in the transport layer).


You really want to be able to rely on multi-byte atomic operations too 
when setting values.  Otherwise, you need another register to just to 
signal when it's okay for the device to examine any given register.


Regards,

Anthony Liguori

Since your most trivial implementation is to do a byte at a time, I don't 
think you have a good argument on that basis either.


Cheers,
Rusty.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-07 Thread Rusty Russell
On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:
> I would prefer that the virtio API not expose a little endian standard.
> I'm currently converting config->get() ops to ioreadXX depending on the
> size which already does the endianness conversion for me so this just
> messes things up.  I think it's better to let the backend deal with
> endianness since it's trivial to handle for both the PCI backend and the
> lguest backend (lguest doesn't need to do any endianness conversion).

-ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this isn't 
a high-speed interface, so let's not) or not "sometimes" convert endianness.  
Getting surprises because a field happens to be packed into 4 bytes is 
counter-intuitive.

Since your most trivial implementation is to do a byte at a time, I don't 
think you have a good argument on that basis either.

Cheers,
Rusty.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-07 Thread Anthony Liguori

Rusty Russell wrote:

After discussion with Anthony, the virtio config has been simplified.  We
lose some minor features (the virtio_net address must now be 6 bytes) but
it turns out to be a wash in terms of complexity, while simplifying PCI.
  


Hi Rusty,

Thanks for posting this!  It's really simplified things for me.


-
 /**
- * virtio_use_bit - helper to use a feature bit in a bitfield value.
- * @dev: the virtio device
- * @token: the token as returned from vdev->config->find().
- * @len: the length of the field.
- * @bitnum: the bit to test.
+ * __virtio_config_val - get a single virtio config without feature check.
+ * @vdev: the virtio device
+ * @offset: the type to search for.
+ * @val: a pointer to the value to fill in.
  *
- * If handed a NULL token, it returns false, otherwise returns bit status.
- * If it's one, it sets the mirroring acknowledgement bit. */
-int virtio_use_bit(struct virtio_device *vdev,
-  void *token, unsigned int len, unsigned int bitnum);
+ * The value is endian-corrected and returned in v. */
+#define __virtio_config_val(vdev, offset, v) do {  \
+   BUILD_BUG_ON(sizeof(*(v)) != 1 && sizeof(*(v)) != 2 \
+&& sizeof(*(v)) != 4 && sizeof(*(v)) != 8);\
+   (vdev)->config->get((vdev), (offset), (v), sizeof(*(v))); \
+   switch (sizeof(*(v))) { \
+   case 2: le16_to_cpus((__u16 *) v); break;   \
+   case 4: le32_to_cpus((__u32 *) v); break;   \
+   case 8: le64_to_cpus((__u64 *) v); break;   \
+   }   \
+} while(0)
  


I would prefer that the virtio API not expose a little endian standard.  
I'm currently converting config->get() ops to ioreadXX depending on the 
size which already does the endianness conversion for me so this just 
messes things up.  I think it's better to let the backend deal with 
endianness since it's trivial to handle for both the PCI backend and the 
lguest backend (lguest doesn't need to do any endianness conversion).


Regards,

Anthony Liguori


 #endif /* __KERNEL__ */
 #endif /* _LINUX_VIRTIO_CONFIG_H */
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index ae469ae..8bf1602 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -5,16 +5,16 @@
 /* The ID for virtio_net */
 #define VIRTIO_ID_NET  1

-/* The bitmap of config for virtio net */
-#define VIRTIO_CONFIG_NET_F0x40
+/* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_NO_CSUM   0
 #define VIRTIO_NET_F_TSO4  1
 #define VIRTIO_NET_F_UFO   2
 #define VIRTIO_NET_F_TSO4_ECN  3
 #define VIRTIO_NET_F_TSO6  4
+#define VIRTIO_NET_F_MAC   5

-/* The config defining mac address. */
-#define VIRTIO_CONFIG_NET_MAC_F0x41
+/* The config defining mac address (6 bytes) */
+#define VIRTIO_CONFIG_NET_MAC_F0

 /* This is the first element of the scatter-gather list.  If you don't
  * specify GSO or CSUM features, you can simply ignore the header. */
  


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-07 Thread Anthony Liguori

Rusty Russell wrote:

After discussion with Anthony, the virtio config has been simplified.  We
lose some minor features (the virtio_net address must now be 6 bytes) but
it turns out to be a wash in terms of complexity, while simplifying PCI.
  


Hi Rusty,

Thanks for posting this!  It's really simplified things for me.


-
 /**
- * virtio_use_bit - helper to use a feature bit in a bitfield value.
- * @dev: the virtio device
- * @token: the token as returned from vdev-config-find().
- * @len: the length of the field.
- * @bitnum: the bit to test.
+ * __virtio_config_val - get a single virtio config without feature check.
+ * @vdev: the virtio device
+ * @offset: the type to search for.
+ * @val: a pointer to the value to fill in.
  *
- * If handed a NULL token, it returns false, otherwise returns bit status.
- * If it's one, it sets the mirroring acknowledgement bit. */
-int virtio_use_bit(struct virtio_device *vdev,
-  void *token, unsigned int len, unsigned int bitnum);
+ * The value is endian-corrected and returned in v. */
+#define __virtio_config_val(vdev, offset, v) do {  \
+   BUILD_BUG_ON(sizeof(*(v)) != 1  sizeof(*(v)) != 2 \
+ sizeof(*(v)) != 4  sizeof(*(v)) != 8);\
+   (vdev)-config-get((vdev), (offset), (v), sizeof(*(v))); \
+   switch (sizeof(*(v))) { \
+   case 2: le16_to_cpus((__u16 *) v); break;   \
+   case 4: le32_to_cpus((__u32 *) v); break;   \
+   case 8: le64_to_cpus((__u64 *) v); break;   \
+   }   \
+} while(0)
  


I would prefer that the virtio API not expose a little endian standard.  
I'm currently converting config-get() ops to ioreadXX depending on the 
size which already does the endianness conversion for me so this just 
messes things up.  I think it's better to let the backend deal with 
endianness since it's trivial to handle for both the PCI backend and the 
lguest backend (lguest doesn't need to do any endianness conversion).


Regards,

Anthony Liguori


 #endif /* __KERNEL__ */
 #endif /* _LINUX_VIRTIO_CONFIG_H */
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index ae469ae..8bf1602 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -5,16 +5,16 @@
 /* The ID for virtio_net */
 #define VIRTIO_ID_NET  1

-/* The bitmap of config for virtio net */
-#define VIRTIO_CONFIG_NET_F0x40
+/* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_NO_CSUM   0
 #define VIRTIO_NET_F_TSO4  1
 #define VIRTIO_NET_F_UFO   2
 #define VIRTIO_NET_F_TSO4_ECN  3
 #define VIRTIO_NET_F_TSO6  4
+#define VIRTIO_NET_F_MAC   5

-/* The config defining mac address. */
-#define VIRTIO_CONFIG_NET_MAC_F0x41
+/* The config defining mac address (6 bytes) */
+#define VIRTIO_CONFIG_NET_MAC_F0

 /* This is the first element of the scatter-gather list.  If you don't
  * specify GSO or CSUM features, you can simply ignore the header. */
  


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-07 Thread Rusty Russell
On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:
 I would prefer that the virtio API not expose a little endian standard.
 I'm currently converting config-get() ops to ioreadXX depending on the
 size which already does the endianness conversion for me so this just
 messes things up.  I think it's better to let the backend deal with
 endianness since it's trivial to handle for both the PCI backend and the
 lguest backend (lguest doesn't need to do any endianness conversion).

-ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this isn't 
a high-speed interface, so let's not) or not sometimes convert endianness.  
Getting surprises because a field happens to be packed into 4 bytes is 
counter-intuitive.

Since your most trivial implementation is to do a byte at a time, I don't 
think you have a good argument on that basis either.

Cheers,
Rusty.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] virtio config_ops refactoring

2007-11-07 Thread Anthony Liguori

Rusty Russell wrote:

On Thursday 08 November 2007 04:30:50 Anthony Liguori wrote:

I would prefer that the virtio API not expose a little endian standard.
I'm currently converting config-get() ops to ioreadXX depending on the
size which already does the endianness conversion for me so this just
messes things up.  I think it's better to let the backend deal with
endianness since it's trivial to handle for both the PCI backend and the
lguest backend (lguest doesn't need to do any endianness conversion).


-ETOOMUCHMAGIC.  We should either expose all the XX interfaces (but this isn't 
a high-speed interface, so let's not) or not sometimes convert endianness.  
Getting surprises because a field happens to be packed into 4 bytes is 
counter-intuitive.


Then I think it's necessary to expose the XX interfaces.  Otherwise, the 
backend has to deal with doing all register operations at a per-byte 
granularity which adds a whole lot of complexity on a per-device basis 
(as opposed to a little complexity once in the transport layer).


You really want to be able to rely on multi-byte atomic operations too 
when setting values.  Otherwise, you need another register to just to 
signal when it's okay for the device to examine any given register.


Regards,

Anthony Liguori

Since your most trivial implementation is to do a byte at a time, I don't 
think you have a good argument on that basis either.


Cheers,
Rusty.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] virtio config_ops refactoring

2007-11-06 Thread Rusty Russell
After discussion with Anthony, the virtio config has been simplified.  We
lose some minor features (the virtio_net address must now be 6 bytes) but
it turns out to be a wash in terms of complexity, while simplifying PCI.

This can be found in the new virtio git tree, in the "patches/1" branch
(new branches will be created as I rebase to keep up with Linus).

git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtio.git

I've also posted below, for easy commentry.
Cheers,
Rusty.

===
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply use an agreed offset within the config
space and assume everyone knows the size of any entry it is interested
in (the config space can now only be extended at the end).

The main driver-visible change is that we indicate what fields are
present with an explicit feature bit.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index f266839..e5e5890 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "linux/lguest_launcher.h"
 #include "linux/virtio_config.h"
 #include "linux/virtio_net.h"
@@ -96,13 +97,11 @@ struct device_list
/* The descriptor page for the devices. */
u8 *descpage;
 
-   /* The tail of the last descriptor. */
-   unsigned int desc_used;
-
/* A single linked list of devices. */
struct device *dev;
-   /* ... And an end pointer so we can easily append new devices */
-   struct device **lastdev;
+   /* And a pointer to the last device for easy append and also for
+* configuration appending. */
+   struct device *lastdev;
 };
 
 /* The list of Guest devices, based on command line arguments. */
@@ -980,54 +979,44 @@ static void handle_input(int fd)
  *
  * All devices need a descriptor so the Guest knows it exists, and a "struct
  * device" so the Launcher can keep track of it.  We have common helper
- * routines to allocate them.
- *
- * This routine allocates a new "struct lguest_device_desc" from descriptor
- * table just above the Guest's normal memory.  It returns a pointer to that
- * descriptor. */
-static struct lguest_device_desc *new_dev_desc(u16 type)
-{
-   struct lguest_device_desc *d;
-
-   /* We only have one page for all the descriptors. */
-   if (devices.desc_used + sizeof(*d) > getpagesize())
-   errx(1, "Too many devices");
-
-   /* We don't need to set config_len or status: page is 0 already. */
-   d = (void *)devices.descpage + devices.desc_used;
-   d->type = type;
-   devices.desc_used += sizeof(*d);
+ * routines to allocate and manage them. */
 
-   return d;
+/* The layout of the device page is a "struct lguest_device_desc" followed by a
+ * number of virtqueue descriptors, then two sets of feature bits, then an
+ * array of configuration bytes.  This routine returns the configuration
+ * pointer. */
+static void *device_config(const struct device *dev)
+{
+   return (void *)(dev->desc + 1)
+   + dev->desc->num_vq * sizeof(struct lguest_vqconfig)
+   + dev->desc->feature_len * 2;
 }
 
-/* Each device descriptor is followed by some configuration information.
- * Each configuration field looks like: u8 type, u8 len, [... len bytes...].
- *
- * This routine adds a new field to an existing device's descriptor.  It only
- * works for the last device, but that's OK because that's how we use it. */
-static void add_desc_field(struct device *dev, u8 type, u8 len, const void *c)
+/* This routine allocates a new "struct lguest_device_desc" from descriptor
+ * table page just above the Guest's normal memory.  It returns a pointer to
+ * that descriptor. */
+static struct lguest_device_desc *new_dev_desc(u16 type)
 {
-   /* This is the last descriptor, right? */
-   assert(devices.descpage + devices.desc_used
-  == (u8 *)(dev->desc + 1) + dev->desc->config_len);
+   struct lguest_device_desc d = { .type = type }; 
+   void *p;
 
-   /* We only have one page of device descriptions. */
-   if (devices.desc_used + 2 + len > getpagesize())
-   errx(1, "Too many devices");
+   /* Figure out where the next device config is, based on the last one. */
+   if (devices.lastdev)
+   p = device_config(devices.lastdev)
+   + devices.lastdev->desc->config_len;
+   else
+   p = devices.descpage;
 
-   /* Copy in the new config header: type then length. */
-   devices.descpage[devices.desc_used++] = type;
-   devices.descpage[devices.desc_used++] = len;
-   memcpy(devices.descpage + devices.desc_used, c, len);
-   devices.desc_used += len;
+   /* We only have one page for all the descriptors. */
+   if (p + sizeof(d) > (void *)devices.descpage + getpagesize())
+  

[PATCH] virtio config_ops refactoring

2007-11-06 Thread Rusty Russell
After discussion with Anthony, the virtio config has been simplified.  We
lose some minor features (the virtio_net address must now be 6 bytes) but
it turns out to be a wash in terms of complexity, while simplifying PCI.

This can be found in the new virtio git tree, in the patches/1 branch
(new branches will be created as I rebase to keep up with Linus).

git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtio.git

I've also posted below, for easy commentry.
Cheers,
Rusty.

===
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply use an agreed offset within the config
space and assume everyone knows the size of any entry it is interested
in (the config space can now only be extended at the end).

The main driver-visible change is that we indicate what fields are
present with an explicit feature bit.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]

diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index f266839..e5e5890 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -34,6 +34,7 @@
 #include zlib.h
 #include assert.h
 #include sched.h
+#include limits.h
 #include linux/lguest_launcher.h
 #include linux/virtio_config.h
 #include linux/virtio_net.h
@@ -96,13 +97,11 @@ struct device_list
/* The descriptor page for the devices. */
u8 *descpage;
 
-   /* The tail of the last descriptor. */
-   unsigned int desc_used;
-
/* A single linked list of devices. */
struct device *dev;
-   /* ... And an end pointer so we can easily append new devices */
-   struct device **lastdev;
+   /* And a pointer to the last device for easy append and also for
+* configuration appending. */
+   struct device *lastdev;
 };
 
 /* The list of Guest devices, based on command line arguments. */
@@ -980,54 +979,44 @@ static void handle_input(int fd)
  *
  * All devices need a descriptor so the Guest knows it exists, and a struct
  * device so the Launcher can keep track of it.  We have common helper
- * routines to allocate them.
- *
- * This routine allocates a new struct lguest_device_desc from descriptor
- * table just above the Guest's normal memory.  It returns a pointer to that
- * descriptor. */
-static struct lguest_device_desc *new_dev_desc(u16 type)
-{
-   struct lguest_device_desc *d;
-
-   /* We only have one page for all the descriptors. */
-   if (devices.desc_used + sizeof(*d)  getpagesize())
-   errx(1, Too many devices);
-
-   /* We don't need to set config_len or status: page is 0 already. */
-   d = (void *)devices.descpage + devices.desc_used;
-   d-type = type;
-   devices.desc_used += sizeof(*d);
+ * routines to allocate and manage them. */
 
-   return d;
+/* The layout of the device page is a struct lguest_device_desc followed by a
+ * number of virtqueue descriptors, then two sets of feature bits, then an
+ * array of configuration bytes.  This routine returns the configuration
+ * pointer. */
+static void *device_config(const struct device *dev)
+{
+   return (void *)(dev-desc + 1)
+   + dev-desc-num_vq * sizeof(struct lguest_vqconfig)
+   + dev-desc-feature_len * 2;
 }
 
-/* Each device descriptor is followed by some configuration information.
- * Each configuration field looks like: u8 type, u8 len, [... len bytes...].
- *
- * This routine adds a new field to an existing device's descriptor.  It only
- * works for the last device, but that's OK because that's how we use it. */
-static void add_desc_field(struct device *dev, u8 type, u8 len, const void *c)
+/* This routine allocates a new struct lguest_device_desc from descriptor
+ * table page just above the Guest's normal memory.  It returns a pointer to
+ * that descriptor. */
+static struct lguest_device_desc *new_dev_desc(u16 type)
 {
-   /* This is the last descriptor, right? */
-   assert(devices.descpage + devices.desc_used
-  == (u8 *)(dev-desc + 1) + dev-desc-config_len);
+   struct lguest_device_desc d = { .type = type }; 
+   void *p;
 
-   /* We only have one page of device descriptions. */
-   if (devices.desc_used + 2 + len  getpagesize())
-   errx(1, Too many devices);
+   /* Figure out where the next device config is, based on the last one. */
+   if (devices.lastdev)
+   p = device_config(devices.lastdev)
+   + devices.lastdev-desc-config_len;
+   else
+   p = devices.descpage;
 
-   /* Copy in the new config header: type then length. */
-   devices.descpage[devices.desc_used++] = type;
-   devices.descpage[devices.desc_used++] = len;
-   memcpy(devices.descpage + devices.desc_used, c, len);
-   devices.desc_used += len;
+   /* We only have one page for all the descriptors. */
+   if (p + sizeof(d)  (void *)devices.descpage + getpagesize())
+