Re: [virtio-comment] Re: [RFC] ivshmem v2: Shared memory device specification

2020-07-27 Thread Michael S. Tsirkin
On Mon, Jul 27, 2020 at 04:17:06PM +0200, Jan Kiszka wrote:
> On 27.07.20 15:56, Michael S. Tsirkin wrote:
> > On Mon, Jul 27, 2020 at 03:39:32PM +0200, Jan Kiszka wrote:
> > > On 27.07.20 15:20, Michael S. Tsirkin wrote:
> > > > On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:
> > > > >  Vendor Specific Capability (ID 09h)
> > > > > 
> > > > > This capability must always be present.
> > > > > 
> > > > > | Offset | Register            | Content                                         |
> > > > > |-------:|:--------------------|:------------------------------------------------|
> > > > > |    00h | ID                  | 09h                                             |
> > > > > |    01h | Next Capability     | Pointer to next capability or 00h               |
> > > > > |    02h | Length              | 20h if Base Address is present, 18h otherwise   |
> > > > > |    03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode     |
> > > > > |        |                     | Bits 1-7: Reserved (0 on read, writes ignored)  |
> > > > > |    04h | State Table Size    | 32-bit size of read-only State Table            |
> > > > > |    08h | R/W Section Size    | 64-bit size of common read/write section        |
> > > > > |    10h | Output Section Size | 64-bit size of output sections                  |
> > > > > |    18h | Base Address        | optional: 64-bit base address of shared memory  |
> > > > > 
> > > > > All registers are read-only. Writes are ignored, except to bit 0 of
> > > > > the Privileged Control register.
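
A minimal C view of the register block quoted above may help readers follow
the discussion. This is a non-normative sketch: only the offsets and widths
come from the table, the field names are illustrative.

    #include <stdint.h>

    struct ivshmem_vndr_cap {
        uint8_t  id;              /* 00h: 09h (vendor-specific capability)      */
        uint8_t  next;            /* 01h: pointer to next capability or 00h     */
        uint8_t  len;             /* 02h: 20h with Base Address, 18h otherwise  */
        uint8_t  priv_ctrl;       /* 03h: bit 0 = one-shot interrupt mode (r/w) */
        uint32_t state_table_sz;  /* 04h: size of the read-only State Table     */
        uint64_t rw_section_sz;   /* 08h: size of the common read/write section */
        uint64_t output_sect_sz;  /* 10h: size of each output section           */
        uint64_t base_addr;       /* 18h: optional shared memory base address   */
    } __attribute__((packed));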
> > > > 
> > > > 
> > > > Is there value in making this follow the virtio vendor-specific
> > > > capability format? That will cost several extra bytes - do you envision
> > > > having many of these in the config space?
> > > 
> > > Of course, this could be modeled via virtio_pci_cap as well. That would
> > > add 12 unused bytes and one type byte. It might help to make the device
> > > look more virtio'ish, but I'm afraid there are more differences at the
> > > PCI level.
> > 
> > I guess it will be useful if we ever find it handy to make an ivshmem
> > device also be a virtio device. Can't say why yet but if we don't care
> > it vaguely seems kind of like a good idea. I guess it will also be handy
> > if you ever need another vendor specific cap: you already get a way to
> > identify it without breaking drivers.
> > 
> 
> I can look into that. Those 12 wasted bytes are a bit ugly, but so far we
> are not short on config space, even in the non-extended range.
> 
> More problematic is that the existing specification of virtio_pci_cap
> assumes that the capability describes a structure located in a PCI resource,
> rather than carrying that data itself, let alone containing a register
> (Privileged Control).
> 
> Would it be possible to split the types into two ranges, one for the
> existing structure, one for others - like ivshmem - that will only share the
> cfg_type field?

Sure.
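
For reference, this is the virtio_pci_cap layout that such a split would
apply to, as defined in the virtio 1.1 specification. Under the idea above,
an ivshmem-style capability would share only the header bytes up to
cfg_type, leaving the remaining 12 bytes (bar, padding, offset, length)
unused; the typedefs are only there to make the sketch self-contained.

    #include <stdint.h>

    typedef uint8_t  u8;    /* notation as used in the virtio specification */
    typedef uint32_t le32;  /* little-endian 32-bit field                   */

    struct virtio_pci_cap {
        u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR (09h)       */
        u8 cap_next;    /* Generic PCI field: next capability pointer     */
        u8 cap_len;     /* Generic PCI field: capability length           */
        u8 cfg_type;    /* Identifies the structure - the shared field    */
        u8 bar;         /* Which BAR holds the described structure        */
        u8 padding[3];  /* Pad to a full dword                            */
        le32 offset;    /* Offset of the structure within the BAR         */
        le32 length;    /* Length of the structure, in bytes              */
    };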

> > 
> > > I do not see a use case for having multiple of those caps above per
> > > device. If someone comes around with a valid use case for having multiple,
> > > non-contiguous shared memory regions for one device, we would need to add
> > > registers for them. But that would also only work for non-BAR regions due
> > > to the limited number of BARs.
> > 
> > 
> > OK, I guess this answers the below too.
> > 
> > > > Also, do we want to define an extended capability format in case this
> > > > is a pci extended capability?
> > > > 
> > > 
> > > What would be the practical benefit? Do you see PCIe caps that could 
> > > become
> > > useful in virtual setups?
> > 
> > So if we ever have a huge number of these caps, PCIe allows many more
> > caps.
> > 
> > > We don't do that for regular virtio devices
> > > either, do we?
> > 
> > We don't, there's a small number of these so we don't run out of config
> > space.
> 
> Right. But then it would not be a problem to add PCIe (right before adding
> it becomes impossible) and push new caps into the extended space. And all
> that without breaking existing drivers. It's just a cap, and the spec so far
> does not state that there must be no other cap, neither in current virtio
> nor in this ivshmem device.
> 
> Jan

Right.
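
For context on the "extended space" mentioned here: PCI Express extended
capabilities live at config offset 100h and upward and use a single 32-bit
header instead of the two header bytes of a conventional capability. A
sketch of that header layout, taken from general PCIe knowledge and not
part of the ivshmem proposal itself:

    #include <stdint.h>

    /* PCI Express extended capability header: one 32-bit dword at the start
     * of each extended capability; the list begins at config offset 100h and
     * a next-offset of 0 terminates it. */
    static inline uint16_t pcie_ext_cap_id(uint32_t hdr)      { return hdr & 0xffff; }
    static inline uint8_t  pcie_ext_cap_version(uint32_t hdr) { return (hdr >> 16) & 0xf; }
    static inline uint16_t pcie_ext_cap_next(uint32_t hdr)    { return (hdr >> 20) & 0xfff; }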


> -- 
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux




Re: [virtio-comment] Re: [RFC] ivshmem v2: Shared memory device specification

2020-07-27 Thread Jan Kiszka

On 27.07.20 15:56, Michael S. Tsirkin wrote:

On Mon, Jul 27, 2020 at 03:39:32PM +0200, Jan Kiszka wrote:

On 27.07.20 15:20, Michael S. Tsirkin wrote:

On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:

 Vendor Specific Capability (ID 09h)

This capability must always be present.

| Offset | Register            | Content                                         |
|-------:|:--------------------|:------------------------------------------------|
|    00h | ID                  | 09h                                             |
|    01h | Next Capability     | Pointer to next capability or 00h               |
|    02h | Length              | 20h if Base Address is present, 18h otherwise   |
|    03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode     |
|        |                     | Bits 1-7: Reserved (0 on read, writes ignored)  |
|    04h | State Table Size    | 32-bit size of read-only State Table            |
|    08h | R/W Section Size    | 64-bit size of common read/write section        |
|    10h | Output Section Size | 64-bit size of output sections                  |
|    18h | Base Address        | optional: 64-bit base address of shared memory  |

All registers are read-only. Writes are ignored, except to bit 0 of
the Privileged Control register.



Is there value in making this follow the virtio vendor-specific
capability format? That will cost several extra bytes - do you envision
having many of these in the config space?


Of course, this could be modeled via virtio_pci_cap as well. That would add
12 unused bytes and one type byte. It might help to make the device look
more virtio'ish, but I'm afraid there are more differences at the PCI level.


I guess it will be useful if we ever find it handy to make an ivshmem
device also be a virtio device. Can't say why yet but if we don't care
it vaguely seems kind of like a good idea. I guess it will also be handy
if you ever need another vendor specific cap: you already get a way to
identify it without breaking drivers.



I can look into that. Those 12 wasted bytes are a bit ugly, but so far 
we are not short on config space, even in the non-extended range.


More problematic is that the existing specification of virtio_pci_cap
assumes that the capability describes a structure located in a PCI resource,
rather than carrying that data itself, let alone containing a register
(Privileged Control).


Would it be possible to split the types into two ranges, one for the 
existing structure, one for others - like ivshmem - that will only share 
the cfg_type field?





I do not see a use case for having multiple of those caps above per device.
If someone comes around with a valid use case for having multiple,
non-contiguous shared memory regions for one device, we would need to add
registers for them. But that would also only work for non-BAR regions due to
the limited number of BARs.



OK, I guess this answers the below too.


Also, do we want to define an extended capability format in case this
is a pci extended capability?



What would be the practical benefit? Do you see PCIe caps that could become
useful in virtual setups?


So if we ever have a huge number of these caps, PCIe allows many more
caps.


We don't do that for regular virtio devices
either, do we?


We don't, there's a small number of these so we don't run out of config
space.


Right. But then it would not be a problem to add PCIe (right before
adding it becomes impossible) and push new caps into the extended space.
And all that without breaking existing drivers. It's just a cap, and the
spec so far does not state that there must be no other cap, neither in
current virtio nor in this ivshmem device.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: [virtio-comment] Re: [RFC] ivshmem v2: Shared memory device specification

2020-07-27 Thread Michael S. Tsirkin
On Mon, Jul 27, 2020 at 03:39:32PM +0200, Jan Kiszka wrote:
> On 27.07.20 15:20, Michael S. Tsirkin wrote:
> > On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:
> > >  Vendor Specific Capability (ID 09h)
> > > 
> > > This capability must always be present.
> > > 
> > > | Offset | Register            | Content                                         |
> > > |-------:|:--------------------|:------------------------------------------------|
> > > |    00h | ID                  | 09h                                             |
> > > |    01h | Next Capability     | Pointer to next capability or 00h               |
> > > |    02h | Length              | 20h if Base Address is present, 18h otherwise   |
> > > |    03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode     |
> > > |        |                     | Bits 1-7: Reserved (0 on read, writes ignored)  |
> > > |    04h | State Table Size    | 32-bit size of read-only State Table            |
> > > |    08h | R/W Section Size    | 64-bit size of common read/write section        |
> > > |    10h | Output Section Size | 64-bit size of output sections                  |
> > > |    18h | Base Address        | optional: 64-bit base address of shared memory  |
> > > 
> > > All registers are read-only. Writes are ignored, except to bit 0 of
> > > the Privileged Control register.
> > 
> > 
> > Is there value in making this follow the virtio vendor-specific
> > capability format? That will cost several extra bytes - do you envision
> > having many of these in the config space?
> 
> Of course, this could be modeled via virtio_pci_cap as well. That would add
> 12 unused bytes and one type byte. It might help to make the device look
> more virtio'ish, but I'm afraid there are more differences at the PCI level.

I guess it will be useful if we ever find it handy to make an ivshmem
device also be a virtio device. Can't say why yet but if we don't care
it vaguely seems kind of like a good idea. I guess it will also be handy
if you ever need another vendor specific cap: you already get a way to
identify it without breaking drivers.


> I do not see a use case for having multiple of those caps above per device.
> If someone comes around with a valid use case for having multiple,
> non-contiguous shared memory regions for one device, we would need to add
> registers for them. But that would also only work for non-BAR regions due to
> the limited number of BARs.


OK, I guess this answers the below too.

> > Also, do we want to define an extended capability format in case this
> > is a pci extended capability?
> > 
> 
> What would be the practical benefit? Do you see PCIe caps that could become
> useful in virtual setups?

So if we ever have a huge number of these caps, PCIe allows many more
caps.

> We don't do that for regular virtio devices
> either, do we?

We don't, there's a small number of these so we don't run out of config
space.

> 
> Thanks,
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux




Re: [virtio-comment] Re: [RFC] ivshmem v2: Shared memory device specification

2020-07-27 Thread Jan Kiszka

On 27.07.20 15:20, Michael S. Tsirkin wrote:

On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:

 Vendor Specific Capability (ID 09h)

This capability must always be present.

| Offset | Register            | Content                                         |
|-------:|:--------------------|:------------------------------------------------|
|    00h | ID                  | 09h                                             |
|    01h | Next Capability     | Pointer to next capability or 00h               |
|    02h | Length              | 20h if Base Address is present, 18h otherwise   |
|    03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode     |
|        |                     | Bits 1-7: Reserved (0 on read, writes ignored)  |
|    04h | State Table Size    | 32-bit size of read-only State Table            |
|    08h | R/W Section Size    | 64-bit size of common read/write section        |
|    10h | Output Section Size | 64-bit size of output sections                  |
|    18h | Base Address        | optional: 64-bit base address of shared memory  |

All registers are read-only. Writes are ignored, except to bit 0 of
the Privileged Control register.



Is there value in making this follow the virtio vendor-specific
capability format? That will cost several extra bytes - do you envision
having many of these in the config space?


Of course, this could be modeled via virtio_pci_cap as well. That would
add 12 unused bytes and one type byte. It might help to make the device
look more virtio'ish, but I'm afraid there are more differences at the PCI
level.


I do not see a use case for having multiple of those caps above per
device. If someone comes around with a valid use case for having
multiple, non-contiguous shared memory regions for one device, we
would need to add registers for them. But that would also only work for
non-BAR regions due to the limited number of BARs.



Also, do we want to define an extended capability format in case this
is a pci extended capability?



What would be the practical benefit? Do you see PCIe caps that could 
become useful in virtual setups? We don't do that for regular virtio 
devices either, do we?


Thanks,
Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: [RFC] ivshmem v2: Shared memory device specification

2020-07-27 Thread Michael S. Tsirkin
On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:
>  Vendor Specific Capability (ID 09h)
> 
> This capability must always be present.
> 
> | Offset | Register            | Content                                         |
> |-------:|:--------------------|:------------------------------------------------|
> |    00h | ID                  | 09h                                             |
> |    01h | Next Capability     | Pointer to next capability or 00h               |
> |    02h | Length              | 20h if Base Address is present, 18h otherwise   |
> |    03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode     |
> |        |                     | Bits 1-7: Reserved (0 on read, writes ignored)  |
> |    04h | State Table Size    | 32-bit size of read-only State Table            |
> |    08h | R/W Section Size    | 64-bit size of common read/write section        |
> |    10h | Output Section Size | 64-bit size of output sections                  |
> |    18h | Base Address        | optional: 64-bit base address of shared memory  |
> 
> All registers are read-only. Writes are ignored, except to bit 0 of
> the Privileged Control register.


Is there value in making this follow the virtio vendor-specific
capability format? That will cost several extra bytes - do you envision
having many of these in the config space?
Also, do we want to define an extended capability format in case this
is a pci extended capability?

-- 
MST




Re: [virtio-comment] [RFC] ivshmem v2: Shared memory device specification

2020-07-27 Thread Stefan Hajnoczi
On Thu, Jul 23, 2020 at 09:02:09AM +0200, Jan Kiszka wrote:
> On 23.07.20 08:54, Stefan Hajnoczi wrote:
> > On Fri, Jul 17, 2020 at 06:15:58PM +0200, Jan Kiszka wrote:
> > > On 15.07.20 15:27, Stefan Hajnoczi wrote:
> > > > On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:
> > 
> > Thanks for the responses. It would be great to update the spec with
> > these clarifications.
> > 
> > > > > If BAR 2 is not present, the shared memory region is not relocatable
> > > > > by the user. In that case, the hypervisor has to implement the Base
> > > > > Address register in the vendor-specific capability.
> > > > 
> > > > What does relocatable mean in this context?
> > > 
> > > That the guest can decide (via BAR) where the resource should show up in 
> > > the
> > > physical guest address space. We do not want to support this in setups 
> > > like static partitioning hypervisors, and in that case we use that
> > > side-channel, read-only configuration.
> > 
> > I see. I'm not sure what is vendor-specific about non-relocatable shared
> > memory. I guess it could be added to the spec too?
> 
> That "vendor-specific" comes from the PCI spec which - to my understanding -
> provides us no other means to introduce registers to the config space that
> are outside of the PCI spec. I could introduce a name for the ivshmem vendor
> cap and use that name here - would that be better?

Sounds good.
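
To make the distinction concrete, here is a hedged guest-side sketch of how
the shared memory base might be located. pci_cfg_read8()/pci_cfg_read32()
are hypothetical config-space accessors, vndr_cap is the config-space offset
of the Vendor Specific Capability (ID 09h), and BAR 2 is assumed to be a
64-bit memory BAR; none of this is normative spec text.

    #include <stdint.h>

    extern uint8_t  pci_cfg_read8(unsigned int off);   /* hypothetical helpers */
    extern uint32_t pci_cfg_read32(unsigned int off);

    uint64_t ivshmem_shmem_base(uint8_t vndr_cap)
    {
        /* Length register at cap offset 02h: 20h means the Base Address
         * register (cap offset 18h) is implemented. */
        if (pci_cfg_read8(vndr_cap + 0x02) == 0x20) {
            uint64_t base = pci_cfg_read32(vndr_cap + 0x18);
            base |= (uint64_t)pci_cfg_read32(vndr_cap + 0x1c) << 32;
            return base;    /* non-relocatable: fixed address from the hypervisor */
        }

        /* Otherwise the region is relocatable and exposed via BAR 2
         * (assumed 64-bit, so it spans the BAR 2 and BAR 3 slots). */
        uint64_t bar = pci_cfg_read32(0x10 + 2 * 4) & ~(uint64_t)0xf;
        bar |= (uint64_t)pci_cfg_read32(0x10 + 3 * 4) << 32;
        return bar;
    }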




Re: [virtio-comment] [RFC] ivshmem v2: Shared memory device specification

2020-07-23 Thread Jan Kiszka

On 23.07.20 08:54, Stefan Hajnoczi wrote:

On Fri, Jul 17, 2020 at 06:15:58PM +0200, Jan Kiszka wrote:

On 15.07.20 15:27, Stefan Hajnoczi wrote:

On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:


Thanks for the responses. It would be great to update the spec with
these clarifications.


If BAR 2 is not present, the shared memory region is not relocatable
by the user. In that case, the hypervisor has to implement the Base
Address register in the vendor-specific capability.


What does relocatable mean in this context?


That the guest can decide (via BAR) where the resource should show up in the
physical guest address space. We do not want to support this in setups like
static partitioning hypervisors, and in that case we use that side-channel,
read-only configuration.


I see. I'm not sure what is vendor-specific about non-relocatable shared
memory. I guess it could be added to the spec too?


That "vendor-specific" comes from the PCI spec which - to my 
understanding - provides us no other means to introduce registers to the 
config space that are outside of the PCI spec. I could introduce a name 
for the ivshmem vendor cap and use that name here - would that be better?




In any case, since "relocatable" hasn't been fully defined, I suggest
making the statement more general:

   If BAR 2 is not present the hypervisor has to implement the Base
   Address Register in the vendor-specific capability. This can be used
   for vendor-specific shared memory functionality.



Will integrate this.

Thanks,
Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



Re: [virtio-comment] [RFC] ivshmem v2: Shared memory device specification

2020-07-23 Thread Stefan Hajnoczi
On Fri, Jul 17, 2020 at 06:15:58PM +0200, Jan Kiszka wrote:
> On 15.07.20 15:27, Stefan Hajnoczi wrote:
> > On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:

Thanks for the responses. It would be great to update the spec with
these clarifications.

> > > If BAR 2 is not present, the shared memory region is not relocatable
> > > by the user. In that case, the hypervisor has to implement the Base
> > > Address register in the vendor-specific capability.
> > 
> > What does relocatable mean in this context?
> 
> That the guest can decide (via BAR) where the resource should show up in the
> physical guest address space. We do not want to support this in setups like
> static partitioning hypervisors, and in that case we use that side-channel,
> read-only configuration.

I see. I'm not sure what is vendor-specific about non-relocatable shared
memory. I guess it could be added to the spec too?

In any case, since "relocatable" hasn't been fully defined, I suggest
making the statement more general:

  If BAR 2 is not present the hypervisor has to implement the Base
  Address Register in the vendor-specific capability. This can be used
  for vendor-specific shared memory functionality.




Re: [virtio-comment] [RFC] ivshmem v2: Shared memory device specification

2020-07-17 Thread Jan Kiszka

On 15.07.20 15:27, Stefan Hajnoczi wrote:

On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:

IVSHMEM Device Specification


** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! **


Hi Jan,
Thanks for posting this! I have posted comments where I wasn't sure
what the spec meant.


The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to
define the minimally needed building blocks a hypervisor has to
provide for enabling guest-to-guest communication. The details of
communication protocols shall remain opaque to the hypervisor so that
guests are free to define them as simple or sophisticated as they
need.

For that purpose, the IVSHMEM provides the following features to its
users:

- Interconnection between up to 65536 peers

- Multi-purpose shared memory region

 - common read/writable section

 - output sections that are read/writable for one peer and only
   readable for the others

 - section with peer states

- Event signaling via interrupt to remote sides

- Support for life-cycle management via state value exchange and
   interrupt notification on changes, backed by a shared memory
   section

- Free choice of protocol to be used on top

- Protocol type declaration

- Registers can be implemented either memory-mapped or via I/O,
   depending on platform support and lower VM-exit costs

- Unprivileged access to memory-mapped or I/O registers feasible

- Single discovery and configuration via standard PCI, no additional
   complexity from defining a platform device model


Hypervisor Model


In order to provide a consistent link between peers, all connected
instances of IVSHMEM devices need to be configured, created and run
by the hypervisor according to the following requirements:

- The instances of the device shall appear as a PCI device to their
   users.

- The read/write shared memory section has to be of the same size for
   all peers. The size can be zero.

- If shared memory output sections are present (non-zero section
   size), there must be one reserved for each peer with exclusive
   write access. All output sections must have the same size and must
   be readable for all peers.

- The State Table must have the same size for all peers, must be
   large enough to hold the state values of all peers, and must be
   read-only for the user.


Who/what is the "user"? I guess this simply means that the State Table
is read-only and only the hypervisor can update the table entries?


Read-only for the guest, right. I used the term "user" here and
elsewhere; switching to "guest" might be better.





- State register changes (explicit writes, peer resets) have to be
   propagated to the other peers by updating the corresponding State
   Table entry and issuing an interrupt to all other peers if they
   enabled reception.

- Interrupt events triggered by a peer have to be delivered to the
   target peer, provided the receiving side is valid and has enabled
   reception.

- All peers must have the same interrupt delivery features available,
   i.e. MSI-X with the same maximum number of vectors on platforms
   supporting this mechanism, otherwise INTx with one vector.


Guest-side Programming Model


An IVSHMEM device appears as a PCI device to its users. Unless
otherwise noted, it conforms to the PCI Local Bus Specification,
Revision 3.0. As such, it is discoverable via the PCI configuration
space and provides a number of standard and custom PCI configuration
registers.
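
As a hedged illustration of that discovery step, a guest might locate the
device's Vendor Specific Capability (ID 09h) by walking the standard PCI
capability list. pci_cfg_read8() is a hypothetical config-space accessor,
and the status-register capability-list check is omitted for brevity.

    #include <stdint.h>

    extern uint8_t pci_cfg_read8(unsigned int off);  /* hypothetical helper */

    #define PCI_CAPABILITY_LIST 0x34  /* pointer to the first capability */
    #define PCI_CAP_ID_VNDR     0x09  /* vendor-specific capability ID   */

    /* Returns the config-space offset of the ivshmem capability, or 0. */
    uint8_t ivshmem_find_vndr_cap(void)
    {
        uint8_t pos = pci_cfg_read8(PCI_CAPABILITY_LIST);

        while (pos != 0) {
            if (pci_cfg_read8(pos) == PCI_CAP_ID_VNDR)
                return pos;
            pos = pci_cfg_read8(pos + 0x01);  /* Next Capability field */
        }
        return 0;
    }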

### Shared Memory Region Layout

The shared memory region is divided into several sections.

 +-----------------------------+   -
 |                             |   :
 | Output Section for peer n-1 |   : Output Section Size
 | (n = Maximum Peers)         |   :
 +-----------------------------+   -
 :                             :
 :                             :
 :                             :
 +-----------------------------+   -
 |                             |   :
 |  Output Section for peer 1  |   : Output Section Size
 |                             |   :
 +-----------------------------+   -
 |                             |   :
 |  Output Section for peer 0  |   : Output Section Size
 |                             |   :
 +-----------------------------+   -
 |                             |   :
 |     Read/Write Section      |   : R/W Section Size
 |                             |   :
 +-----------------------------+   -
 |                             |   :
 |         State Table         |   : State Table Size
 |                             |   :
 +-----------------------------+   <-- Shared memory base address

The first section consists of the mandatory State Table. Its size is
defined by the State Table Size register and cannot be zero. This
section is read-only for all peers.
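
The layout above translates into simple offset arithmetic. A non-normative
sketch, with field names that are illustrative rather than taken from any
driver:

    #include <stdint.h>

    struct ivshmem_layout {
        uint32_t state_table_size;  /* State Table Size register    */
        uint64_t rw_size;           /* R/W Section Size register    */
        uint64_t output_size;       /* Output Section Size register */
    };

    /* The State Table starts at the shared memory base address. */
    static uint64_t state_table_offset(const struct ivshmem_layout *l)
    {
        (void)l;
        return 0;
    }

    /* The common read/write section follows the State Table. */
    static uint64_t rw_section_offset(const struct ivshmem_layout *l)
    {
        return l->state_table_size;
    }

    /* The output section of peer i follows the read/write section. */
    static uint64_t output_section_offset(const struct ivshmem_layout *l,
                                          unsigned int peer)
    {
        return l->state_table_size + l->rw_size +
               (uint64_t)peer * l->output_size;
    }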

The second 

Re: [virtio-comment] [RFC] ivshmem v2: Shared memory device specification

2020-07-15 Thread Stefan Hajnoczi
On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:
> IVSHMEM Device Specification
> 
> 
> ** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! 
> **

Hi Jan,
Thanks for posting this! I have posted comments where I wasn't sure
what the spec meant.

> The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to
> define the minimally needed building blocks a hypervisor has to
> provide for enabling guest-to-guest communication. The details of
> communication protocols shall remain opaque to the hypervisor so that
> guests are free to define them as simple or sophisticated as they
> need.
> 
> For that purpose, the IVSHMEM provides the following features to its
> users:
> 
> - Interconnection between up to 65536 peers
> 
> - Multi-purpose shared memory region
> 
> - common read/writable section
> 
> - output sections that are read/writable for one peer and only
>   readable for the others
> 
> - section with peer states
> 
> - Event signaling via interrupt to remote sides
> 
> - Support for life-cycle management via state value exchange and
>   interrupt notification on changes, backed by a shared memory
>   section
> 
> - Free choice of protocol to be used on top
> 
> - Protocol type declaration
> 
> - Registers can be implemented either memory-mapped or via I/O,
>   depending on platform support and lower VM-exit costs
> 
> - Unprivileged access to memory-mapped or I/O registers feasible
> 
> - Single discovery and configuration via standard PCI, no additional
>   complexity from defining a platform device model
> 
> 
> Hypervisor Model
> 
> 
> In order to provide a consistent link between peers, all connected
> instances of IVSHMEM devices need to be configured, created and run
> by the hypervisor according to the following requirements:
> 
> - The instances of the device shall appear as a PCI device to their
>   users.
> 
> - The read/write shared memory section has to be of the same size for
>   all peers. The size can be zero.
> 
> - If shared memory output sections are present (non-zero section
>   size), there must be one reserved for each peer with exclusive
>   write access. All output sections must have the same size and must
>   be readable for all peers.
> 
> - The State Table must have the same size for all peers, must be
>   large enough to hold the state values of all peers, and must be
>   read-only for the user.

Who/what is the "user"? I guess this simply means that the State Table
is read-only and only the hypervisor can update the table entries?

> - State register changes (explicit writes, peer resets) have to be
>   propagated to the other peers by updating the corresponding State
>   Table entry and issuing an interrupt to all other peers if they
>   enabled reception.
> 
> - Interrupt events triggered by a peer have to be delivered to the
>   target peer, provided the receiving side is valid and has enabled
>   reception.
> 
> - All peers must have the same interrupt delivery features available,
>   i.e. MSI-X with the same maximum number of vectors on platforms
>   supporting this mechanism, otherwise INTx with one vector.
> 
> 
> Guest-side Programming Model
> 
> 
> An IVSHMEM device appears as a PCI device to its users. Unless
> otherwise noted, it conforms to the PCI Local Bus Specification,
> Revision 3.0. As such, it is discoverable via the PCI configuration
> space and provides a number of standard and custom PCI configuration
> registers.
> 
> ### Shared Memory Region Layout
> 
> The shared memory region is divided into several sections.
> 
> +-----------------------------+   -
> |                             |   :
> | Output Section for peer n-1 |   : Output Section Size
> | (n = Maximum Peers)         |   :
> +-----------------------------+   -
> :                             :
> :                             :
> :                             :
> +-----------------------------+   -
> |                             |   :
> |  Output Section for peer 1  |   : Output Section Size
> |                             |   :
> +-----------------------------+   -
> |                             |   :
> |  Output Section for peer 0  |   : Output Section Size
> |                             |   :
> +-----------------------------+   -
> |                             |   :
> |     Read/Write Section      |   : R/W Section Size
> |                             |   :
> +-----------------------------+   -
> |                             |   :
> |         State Table         |   : State Table Size
> |                             |   :
> +-----------------------------+   <-- Shared memory base address
> 
> The first section consists of the mandatory State Table. Its size is
> defined by the State Table Size register and cannot be zero. This
> section is read-only for 

Re: [RFC] ivshmem v2: Shared memory device specification

2020-06-17 Thread Jan Kiszka
On 17.06.20 17:32, Alex Bennée wrote:
> 
> Jan Kiszka  writes:
> 
>> Hi all,
>>
>> as requested by Michael, find below the current version of the Inter-VM 
>> Shared Memory device specification version 2 (as version 1 could be 
>> considered what is currently in QEMU).
>>
>> This posting is intended to collect feedback from the virtio community 
>> before officially proposing it to become part of the spec. As you can 
>> see, the spec format is not yet integrated with the virtio documents at 
>> all.
>>
>> IVSHMEM has value of its own, allowing unprivileged applications to 
>> establish links to other applications in VMs connected via this 
>> transport. In addition, and that is where virtio comes into play even 
>> more, it can be used to build virtio backend-frontend connections 
>> between two VMs. Prototypes have been developed (see e.g. [1]); specifying
>> that transport is still a to-do. I will try to reserve a few cycles in the
>> upcoming weeks for a first draft.
>>
>> My current strategy for these two pieces, ivshmem2 and virtio-shmem, is to
>> propose them both to virtio while allowing virtio-shmem to also be mapped
>> onto other shared memory channels for virtual machines.
>>
>> The ivshmem2 device model is fairly stable now, having also been in use in
>> Jailhouse for quite a while. Still, there are some aspects that could be
>> worth discussing in particular:
>>
>>  - Do we also need a platform device model, in addition to PCI? My
>>    feelings are negative, but there has been at least one request.
>>
>>  - Should we add some feature flags, or is using the PCI revision ID
>>    sufficient (...to do that later)? Currently, there is no need for
>>    communicating features this way.
>>
>>  - Should we try to model the doorbell interface more flexibly, in a way
>>    that may allow mapping it onto hardware-provided mailboxes (i.e. VM-
>>    exit-free channels)? Unfortunately, I'm not yet aware of any hardware
>>    that could provide this feature and, thus, could act as a use case to
>>    design against.
>>
>> Thanks,
>> Jan
>>
>> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg668749.html
>>
>> ---
>>
>> IVSHMEM Device Specification
>> 
>>
>> ** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! 
>> **
>>
>> The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to
>> define the minimally needed building blocks a hypervisor has to
>> provide for enabling guest-to-guest communication. The details of
>> communication protocols shall remain opaque to the hypervisor so that
>> guests are free to define them as simple or sophisticated as they
>> need.
>>
>> For that purpose, the IVSHMEM provides the following features to its
>> users:
>>
>> - Interconnection between up to 65536 peers
>>
>> - Multi-purpose shared memory region
>>
>> - common read/writable section
>>
>> - output sections that are read/writable for one peer and only
>>   readable for the others
>>
>> - section with peer states
>>
>> - Event signaling via interrupt to remote sides
>>
>> - Support for life-cycle management via state value exchange and
>>   interrupt notification on changes, backed by a shared memory
>>   section
>>
>> - Free choice of protocol to be used on top
>>
>> - Protocol type declaration
>>
>> - Registers can be implemented either memory-mapped or via I/O,
>>   depending on platform support and lower VM-exit costs
>>
>> - Unprivileged access to memory-mapped or I/O registers feasible
>>
>> - Single discovery and configuration via standard PCI, no additional
>>   complexity from defining a platform device model
>>
>>
>> Hypervisor Model
>> 
>>
>> In order to provide a consistent link between peers, all connected
>> instances of IVSHMEM devices need to be configured, created and run
>> by the hypervisor according to the following requirements:
>>
>> - The instances of the device shall appear as a PCI device to their
>>   users.
> 
> Would there ever be scope for a plain MMIO style device to be reported?

As pointed out above, there has already been a question targeting an
MMIO binding.

> While I agree PCI is pretty useful from the point of view of easy
> discovery, I note that a number of machine types prefer device-tree-reported
> devices because they are faster to bring up for cloudy fast-VM purposes
> (cf. microvm).

Can't you avoid that probing delay by listing devices explicitly in the
device tree? I have no data on that, just heard about it. Maybe the same
works with ACPI.

What I heard so far is that the concerns were more about the complexity of
PCI driver stacks in the guests. But that complexity also grows quickly
on the MMIO platform side once it reaches a feature level similar to
PCI.

> 
>> - The read/write shared memory section has to be of the same size for
>>   all peers. The size can be zero.
>>
>> - If shared memory output sections are present (non-zero section
>>   size), there must be one reserved for each peer 

Re: [RFC] ivshmem v2: Shared memory device specification

2020-06-17 Thread Alex Bennée


Jan Kiszka  writes:

> Hi all,
>
> as requested by Michael, find below the current version of the Inter-VM 
> Shared Memory device specification version 2 (as version 1 could be 
> considered what is currently in QEMU).
>
> This posting is intended to collect feedback from the virtio community 
> before officially proposing it to become part of the spec. As you can 
> see, the spec format is not yet integrated with the virtio documents at 
> all.
>
> IVSHMEM has value of its own, allowing unprivileged applications to 
> establish links to other applications in VMs connected via this 
> transport. In addition, and that is where virtio comes into play even 
> more, it can be used to build virtio backend-frontend connections 
> between two VMs. Prototypes have been developed (see e.g. [1]); specifying
> that transport is still a to-do. I will try to reserve a few cycles in the
> upcoming weeks for a first draft.
>
> My current strategy for these two pieces, ivshmem2 and virtio-shmem, is to
> propose them both to virtio while allowing virtio-shmem to also be mapped
> onto other shared memory channels for virtual machines.
>
> The ivshmem2 device model is fairly stable now, having also been in use in
> Jailhouse for quite a while. Still, there are some aspects that could be
> worth discussing in particular:
>
>  - Do we also need a platform device model, in addition to PCI? My
>    feelings are negative, but there has been at least one request.
>
>  - Should we add some feature flags, or is using the PCI revision ID
>    sufficient (...to do that later)? Currently, there is no need for
>    communicating features this way.
>
>  - Should we try to model the doorbell interface more flexibly, in a way
>    that may allow mapping it onto hardware-provided mailboxes (i.e. VM-
>    exit-free channels)? Unfortunately, I'm not yet aware of any hardware
>    that could provide this feature and, thus, could act as a use case to
>    design against.
>
> Thanks,
> Jan
>
> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg668749.html
>
> ---
>
> IVSHMEM Device Specification
> 
>
> ** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! 
> **
>
> The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to
> define the minimally needed building blocks a hypervisor has to
> provide for enabling guest-to-guest communication. The details of
> communication protocols shall remain opaque to the hypervisor so that
> guests are free to define them as simple or sophisticated as they
> need.
>
> For that purpose, the IVSHMEM provides the following features to its
> users:
>
> - Interconnection between up to 65536 peers
>
> - Multi-purpose shared memory region
>
> - common read/writable section
>
> - output sections that are read/writable for one peer and only
>   readable for the others
>
> - section with peer states
>
> - Event signaling via interrupt to remote sides
>
> - Support for life-cycle management via state value exchange and
>   interrupt notification on changes, backed by a shared memory
>   section
>
> - Free choice of protocol to be used on top
>
> - Protocol type declaration
>
> - Registers can be implemented either memory-mapped or via I/O,
>   depending on platform support and lower VM-exit costs
>
> - Unprivileged access to memory-mapped or I/O registers feasible
>
> - Single discovery and configuration via standard PCI, no additional
>   complexity from defining a platform device model
>
>
> Hypervisor Model
> 
>
> In order to provide a consistent link between peers, all connected
> instances of IVSHMEM devices need to be configured, created and run
> by the hypervisor according to the following requirements:
>
> - The instances of the device shall appear as a PCI device to their
>   users.

Would there ever be scope for a plain MMIO style device to be reported?
While I agree PCI is pretty useful from the point of view of easy
discovery, I note that a number of machine types prefer device-tree-reported
devices because they are faster to bring up for cloudy fast-VM purposes
(cf. microvm).

> - The read/write shared memory section has to be of the same size for
>   all peers. The size can be zero.
>
> - If shared memory output sections are present (non-zero section
>   size), there must be one reserved for each peer with exclusive
>   write access. All output sections must have the same size and must
>   be readable for all peers.
>
> - The State Table must have the same size for all peers, must be
>   large enough to hold the state values of all peers, and must be
>   read-only for the user.
>
> - State register changes (explicit writes, peer resets) have to be
>   propagated to the other peers by updating the corresponding State
>   Table entry and issuing an interrupt to all other peers if they
>   enabled reception.
>
> - Interrupt events triggered by a peer have to be delivered to the
>   target peer, provided the receiving side is 

[RFC] ivshmem v2: Shared memory device specification

2020-05-25 Thread Jan Kiszka
Hi all,

as requested by Michael, find below the current version of the Inter-VM 
Shared Memory device specification version 2 (as version 1 could be 
considered what is currently in QEMU).

This posting is intended to collect feedback from the virtio community 
before officially proposing it to become part of the spec. As you can 
see, the spec format is not yet integrated with the virtio documents at 
all.

IVSHMEM has value of its own, allowing unprivileged applications to 
establish links to other applications in VMs connected via this 
transport. In addition, and that is where virtio comes into play even 
more, it can be used to build virtio backend-frontend connections 
between two VMs. Prototypes have been developed (see e.g. [1]); specifying
that transport is still a to-do. I will try to reserve a few cycles in the
upcoming weeks for a first draft.

My current strategy for these two pieces, ivshmem2 and virtio-shmem, is to
propose them both to virtio while allowing virtio-shmem to also be mapped
onto other shared memory channels for virtual machines.

The ivshmem2 device model is fairly stable now, having also been in use in
Jailhouse for quite a while. Still, there are some aspects that could be
worth discussing in particular:

 - Do we also need a platform device model, in addition to PCI? My
   feelings are negative, but there has been at least one request.

 - Should we add some feature flags, or is using the PCI revision ID
   sufficient (...to do that later)? Currently, there is no need for
   communicating features this way.

 - Should we try to model the doorbell interface more flexibly, in a way
   that may allow mapping it onto hardware-provided mailboxes (i.e. VM-
   exit-free channels)? Unfortunately, I'm not yet aware of any hardware
   that could provide this feature and, thus, could act as a use case to
   design against.

Thanks,
Jan

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg668749.html

---

IVSHMEM Device Specification


** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! **

The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to
define the minimally needed building blocks a hypervisor has to
provide for enabling guest-to-guest communication. The details of
communication protocols shall remain opaque to the hypervisor so that
guests are free to define them as simple or sophisticated as they
need.

For that purpose, the IVSHMEM provides the following features to its
users:

- Interconnection between up to 65536 peers

- Multi-purpose shared memory region

 - common read/writable section

 - output sections that are read/writable for one peer and only
   readable for the others

 - section with peer states

- Event signaling via interrupt to remote sides

- Support for life-cycle management via state value exchange and
  interrupt notification on changes, backed by a shared memory
  section

- Free choice of protocol to be used on top

- Protocol type declaration

- Registers can be implemented either memory-mapped or via I/O,
  depending on platform support and lower VM-exit costs

- Unprivileged access to memory-mapped or I/O registers feasible

- Single discovery and configuration via standard PCI, no additional
  complexity from defining a platform device model


Hypervisor Model


In order to provide a consistent link between peers, all connected
instances of IVSHMEM devices need to be configured, created and run
by the hypervisor according to the following requirements:

- The instances of the device shall appear as a PCI device to their
  users.

- The read/write shared memory section has to be of the same size for
  all peers. The size can be zero.

- If shared memory output sections are present (non-zero section
  size), there must be one reserved for each peer with exclusive
  write access. All output sections must have the same size and must
  be readable for all peers.

- The State Table must have the same size for all peers, must be
  large enough to hold the state values of all peers, and must be
  read-only for the user.

- State register changes (explicit writes, peer resets) have to be
  propagated to the other peers by updating the corresponding State
  Table entry and issuing an interrupt to all other peers if they
  enabled reception (see the sketch after this list).

- Interrupt events triggered by a peer have to be delivered to the
  target peer, provided the receiving side is valid and has enabled
  reception.

- All peers must have the same interrupt delivery features available,
  i.e. MSI-X with the same maximum number of vectors on platforms
  supporting this mechanism, otherwise INTx with one vector.
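
The state-propagation rule above can be sketched from the hypervisor's side
as follows. This is illustrative only: the 32-bit state width, the link
structure and deliver_interrupt() are assumptions, not spec text.

    #include <stdbool.h>
    #include <stdint.h>

    #define IVSHMEM_MAX_PEERS 65536

    struct ivshmem_link {
        unsigned int num_peers;
        volatile uint32_t *state_table;  /* mapped read-only into every guest */
        bool state_irq_enabled[IVSHMEM_MAX_PEERS];
    };

    extern void deliver_interrupt(struct ivshmem_link *link, unsigned int peer);

    /* Called on an explicit state write or on a peer reset. */
    void ivshmem_propagate_state(struct ivshmem_link *link,
                                 unsigned int src_peer, uint32_t new_state)
    {
        link->state_table[src_peer] = new_state;

        for (unsigned int p = 0; p < link->num_peers; p++)
            if (p != src_peer && link->state_irq_enabled[p])
                deliver_interrupt(link, p);
    }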


Guest-side Programming Model


An IVSHMEM device appears as a PCI device to its users. Unless
otherwise noted, it conforms to the PCI Local Bus Specification,
Revision 3.0. As such, it is discoverable via the PCI configuration
space and provides a number of