Re: [Xen-devel] [DOC v4] Xen transport for 9pfs

2017-02-10 Thread Konrad Rzeszutek Wilk
On Thu, Feb 09, 2017 at 05:31:46PM -0800, Stefano Stabellini wrote:
> On Wed, 8 Feb 2017, Konrad Rzeszutek Wilk wrote:
> > > ## Ring Setup
> > > 
> > > The shared page has the following layout:
> > > 
> > > typedef uint32_t XEN_9PFS_RING_IDX;
> > > 
> > > struct xen_9pfs_intf {
> > >   XEN_9PFS_RING_IDX in_cons, in_prod;
> > >   uint8_t pad[56];
> > >   XEN_9PFS_RING_IDX out_cons, out_prod;
> > > 
> > >   uint32_t ring_order;
> > > /* this is an array of (1 << ring_order) elements */
> > >   grant_ref_t ref[1];
> > > };
> > > 
> > > /* not actually C compliant (ring_order changes from ring to ring) */
> > > struct ring_data {
> > > char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > };
> > > 
> > 
> > This is the same comment as for the PV Calls structure.
> > 
> > Would it make sense to add the 'in_events' and 'out_events'
> > as a notification mechanism?
> 
> As I wrote in the case of PV Calls, given that it's just an optimization
> and increases complexity, what if we add some padding right after
> 
>   XEN_9PFS_RING_IDX out_cons, out_prod;
> 
> so that if we want to add it in the future, we can just place it there,
> instead of the first 4 bytes of the padding array?

Yeah. Padding makes me sleep easy at night :-)

> 
> struct xen_9pfs_intf {
>   XEN_9PFS_RING_IDX in_cons, in_prod;
>   uint8_t pad[56];
>   XEN_9PFS_RING_IDX out_cons, out_prod;
>   uint8_t pad[56];
> 
>   uint32_t ring_order;
> /* this is an array of (1 << ring_order) elements */
>   grant_ref_t ref[1];
> };
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [DOC v4] Xen transport for 9pfs

2017-02-09 Thread Stefano Stabellini
On Wed, 8 Feb 2017, Konrad Rzeszutek Wilk wrote:
> > ## Ring Setup
> > 
> > The shared page has the following layout:
> > 
> > typedef uint32_t XEN_9PFS_RING_IDX;
> > 
> > struct xen_9pfs_intf {
> > XEN_9PFS_RING_IDX in_cons, in_prod;
> > uint8_t pad[56];
> > XEN_9PFS_RING_IDX out_cons, out_prod;
> > 
> > uint32_t ring_order;
> > /* this is an array of (1 << ring_order) elements */
> > grant_ref_t ref[1];
> > };
> > 
> > /* not actually C compliant (ring_order changes from ring to ring) */
> > struct ring_data {
> > char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> > char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> > };
> > 
> 
> This is the same comment as for the PV Calls structure.
> 
> Would it make sense to add the 'in_events' and 'out_events'
> as a notification mechanism?

As I wrote in the case of PV Calls, given that it's just an optimization
and increases complexity, what if we add some padding right after

  XEN_9PFS_RING_IDX out_cons, out_prod;

so that if we want to add it in the future, we can just place it there,
instead of the first 4 bytes of the padding array?

struct xen_9pfs_intf {
XEN_9PFS_RING_IDX in_cons, in_prod;
uint8_t pad[56];
XEN_9PFS_RING_IDX out_cons, out_prod;
uint8_t pad[56];

uint32_t ring_order;
/* this is an array of (1 << ring_order) elements */
grant_ref_t ref[1];
};
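
For illustration, a compilable sketch of the layout proposed above. C does not allow two members with the same name, so the padding arrays are called pad1 and pad2 here (hypothetical names), and grant_ref_t is assumed to be a 32-bit integer. Under those assumptions, the static assertions check that the out indexes and ring_order each start on a 64-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t XEN_9PFS_RING_IDX;
typedef uint32_t grant_ref_t;      /* assumption: 32-bit grant reference */

struct xen_9pfs_intf {
    XEN_9PFS_RING_IDX in_cons, in_prod;
    uint8_t pad1[56];              /* hypothetical name; reserved space */
    XEN_9PFS_RING_IDX out_cons, out_prod;
    uint8_t pad2[56];              /* hypothetical name; reserved space */

    uint32_t ring_order;
    /* this is an array of (1 << ring_order) elements */
    grant_ref_t ref[1];
};

/* 2 * 4 bytes of indexes + 56 bytes of padding = 64 bytes per group */
static_assert(offsetof(struct xen_9pfs_intf, out_cons) == 64,
              "out indexes should start at byte 64");
static_assert(offsetof(struct xen_9pfs_intf, ring_order) == 128,
              "ring_order should start at byte 128");
```

With this layout, a field added later to either padding array would not move any of the existing fields.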




Re: [Xen-devel] [DOC v4] Xen transport for 9pfs

2017-02-08 Thread Konrad Rzeszutek Wilk
> ## Ring Setup
> 
> The shared page has the following layout:
> 
> typedef uint32_t XEN_9PFS_RING_IDX;
> 
> struct xen_9pfs_intf {
>   XEN_9PFS_RING_IDX in_cons, in_prod;
>   uint8_t pad[56];
>   XEN_9PFS_RING_IDX out_cons, out_prod;
> 
>   uint32_t ring_order;
> /* this is an array of (1 << ring_order) elements */
>   grant_ref_t ref[1];
> };
> 
> /* not actually C compliant (ring_order changes from ring to ring) */
> struct ring_data {
> char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> };
> 

This is the same comment as for the PV Calls structure.

Would it make sense to add the 'in_events' and 'out_events'
as a notification mechanism?




[Xen-devel] [DOC v4] Xen transport for 9pfs

2017-02-07 Thread Stefano Stabellini
Changes in v4:
- Backend XenBus Nodes first
- add version negotiation
- remove complex optimization to avoid unnecessary event notifications
- move out indexes to separate cacheline
- many clarifications

Changes in v3:
- clarify a few statements
- rename port- to event-channel-
- use grant_ref_t ref[1] instead of ref[]

Changes in v2:
- fix copy/paste error
- rename ring-ref- to ring-ref
- fix memory barriers
- add "verify prod/cons against local copy"
- add a paragraph on high level design
- add a note on the maximum possible max-ring-page-order value
- add mechanisms to avoid unnecessary evtchn notifications

---

# Xen transport for 9pfs version 1 

## Background

9pfs is a network filesystem protocol developed for Plan 9. 9pfs is very
simple and describes a series of commands and responses. It is
completely independent of the communication channel; in fact, many
clients and servers support multiple channels, usually called
"transports". For example, the Linux client supports tcp and unix
sockets, fds, virtio and rdma.


### 9pfs protocol

This document won't cover the full 9pfs specification. Please refer to
this [paper] and this [website] for a detailed description of it.
However, it is useful to know that each 9pfs request and response has
the following header:

struct header {
uint32_t size;
uint8_t id;
uint16_t tag;
} __attribute__((packed));

0         4  5    7
+---------+--+----+
|  size   |id|tag |
+---------+--+----+

- *size*
The size of the request or response.

- *id*
The 9pfs request or response operation.

- *tag*
Unique id that identifies a specific request/response pair. It is used
to multiplex operations on a single channel.

It is possible to have multiple requests in-flight at any given time.
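
As a minimal sketch of how a transport might read this header from a byte buffer: the function and type names below (p9_parse_header, struct p9_header, P9_HDR_SIZE) are hypothetical and not part of this specification. The fields are decoded byte by byte as little-endian, which is the byte order 9P uses on the wire, so the code does not depend on struct packing or host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P9_HDR_SIZE 7   /* 4 bytes size + 1 byte id + 2 bytes tag */

/* Decoded (unpacked) view of the 7-byte wire header. */
struct p9_header {
    uint32_t size;
    uint8_t  id;
    uint16_t tag;
};

/* Decode the header at the start of buf.
 * Returns 0 on success, -1 if fewer than P9_HDR_SIZE bytes are available. */
static int p9_parse_header(const uint8_t *buf, size_t len, struct p9_header *h)
{
    if (len < P9_HDR_SIZE)
        return -1;

    h->size = (uint32_t)buf[0] |
              ((uint32_t)buf[1] << 8) |
              ((uint32_t)buf[2] << 16) |
              ((uint32_t)buf[3] << 24);
    h->id  = buf[4];
    h->tag = (uint16_t)(buf[5] | (buf[6] << 8));
    return 0;
}
```

A receiver could use the decoded tag to match each response to its outstanding request, which is what allows several requests to be in flight on one channel.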


## Rationale

This document describes a Xen-based transport for 9pfs, in the
traditional PV frontend and backend format. The PV frontend is used by
the client to send commands to the server. The PV backend is used by the
9pfs server to receive commands from clients and send back responses.

The transport protocol supports multiple rings, up to the maximum
supported by the backend. The size of every ring is also configurable
and can span multiple pages, up to the maximum supported by the backend
(although it cannot be more than 2MB). The design aims to exploit
parallelism at the vCPU level and to support multiple outstanding
requests simultaneously.
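
To make the size limit concrete, here is a small sketch of the arithmetic. It assumes 4 KiB pages (a page shift of 12) and that each ring is split evenly between the in and out directions, as in the ring_data layout discussed in this thread. The names ring_total_bytes, ring_half_bytes and the two macros are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                 /* assumption: 4 KiB pages */
#define SKETCH_MAX_RING_BYTES (2u << 20)     /* the 2MB limit from the text */

/* Total ring size in bytes for a given order, or 0 if the order would
 * exceed the 2MB limit. */
static size_t ring_total_bytes(uint32_t ring_order)
{
    size_t total;

    if (ring_order > 31 - SKETCH_PAGE_SHIFT)  /* keep the shift in range */
        return 0;
    total = (size_t)1 << (ring_order + SKETCH_PAGE_SHIFT);
    return total <= SKETCH_MAX_RING_BYTES ? total : 0;
}

/* Size of the "in" array, which equals the size of the "out" array. */
static size_t ring_half_bytes(uint32_t ring_order)
{
    return ring_total_bytes(ring_order) / 2;
}
```

Under these assumptions, an order of 9 (512 pages) is the largest that fits within 2MB.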

This document does not cover the 9pfs client/server design or
implementation, only the transport for it.


## Xenstore

The frontend and the backend connect via xenstore to exchange
information. The toolstack creates front and back nodes with state
[XenbusStateInitialising]. The protocol node name is **9pfs**.

Multiple rings are supported for each frontend and backend connection.

### Backend XenBus Nodes

Backend specific properties, written by the backend, read by the
frontend:

versions
 Values: 

 Comma-separated list of protocol versions supported by the backend.
 For example "1,2,3". Currently the value is just "1", as there is
 only one version. N.B.: this is the version of the Xen transport
 protocol, not the version of 9pfs supported by the server.

max-rings
 Values: 

 The maximum supported number of rings per frontend.

max-ring-page-order
 Values: 

 The maximum supported size of a memory allocation in units of
 log2(number of machine pages), e.g. 0 == 1 page, 1 == 2 pages,
 2 == 4 pages, etc.
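
As an illustration of how a frontend might consume the **versions** node described above, the sketch below parses the comma-separated list and picks a version. The function name pick_version is hypothetical, and so is the policy: it assumes the frontend supports every version from 1 up to frontend_max and prefers the highest one the backend also lists. The specification only requires that the chosen version be one of those advertised by the backend.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the highest version listed in `versions` (e.g. "1,2,3") that is
 * between 1 and frontend_max, or 0 if there is none or the list is
 * malformed. */
static unsigned pick_version(const char *versions, unsigned frontend_max)
{
    unsigned best = 0;
    const char *p = versions;

    while (*p != '\0') {
        char *end;
        unsigned long v = strtoul(p, &end, 10);

        if (end == p)               /* no digits where a number was expected */
            return 0;
        if (v >= 1 && v <= frontend_max && v > best)
            best = (unsigned)v;

        if (*end == ',')
            p = end + 1;            /* move to the next list element */
        else if (*end == '\0')
            p = end;                /* end of list */
        else
            return 0;               /* unexpected separator */
    }
    return best;
}
```

A return value of 0 would mean the two ends have no version in common, in which case the frontend should not proceed with the connection.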

Backend configuration nodes, written by the toolstack, read by the
backend:

path
 Values: 

 Host filesystem path to share.

tag
 Values: 

 Alphanumeric tag that identifies the 9pfs share. The client needs
 to know the tag to be able to mount it.

security-model
 Values: "none"

 *none*: files are stored using the same credentials with which they
 are created on the guest (no user ownership squash or remap).
 Only "none" is supported in this version of the protocol.

### Frontend XenBus Nodes

version
 Values: 

 Protocol version, chosen among the ones supported by the backend
 (see **versions** under [Backend XenBus Nodes]). Currently the
 value must be "1".

num-rings
 Values: 

 Number of rings. It must be less than or equal to max-rings.

event-channel- (event-channel-0, event-channel-1, etc)
 Values: 

 The identifier of the Xen event channel used to signal activity
 in the ring buffer. One for each ring.

ring-ref (ring-ref0, ring-ref1, etc)
 Values: 

 The Xen grant reference granting permission for the backend to
 map a page with information to set up a shared ring. One for