On 27.04.20 09:53, Paul Durrant wrote:
From: Paul Durrant <pdurr...@amazon.com>
... to specify a separate migration stream that will also be suitable for
live update.
The original scope of the document was to support non-cooperative migration
of guests [1] but, since then, live update of xenstored has been brought into
scope. Thus it makes more sense to define a separate image format for
serializing xenstore state that is suitable for both purposes.
The document has been limited to specifying a new image format. The mechanism
for acquiring the image for live update or migration is not covered as that
is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
also expected that, when the first implementation of live update or migration
making use of this specification is committed, that the document is moved from
docs/designs into docs/specs.
[1] See
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
Signed-off-by: Paul Durrant <pdurr...@amazon.com>
---
Juergen Gross <jgr...@suse.com>
Andrew Cooper <andrew.coop...@citrix.com>
George Dunlap <george.dun...@citrix.com>
Ian Jackson <ian.jack...@eu.citrix.com>
Jan Beulich <jbeul...@suse.com>
Julien Grall <jul...@xen.org>
Stefano Stabellini <sstabell...@kernel.org>
Wei Liu <w...@xen.org>
Mind adding CC: before those mail addresses in order to let git add
those to the recipients list?
v2:
- Address comments from Juergen
Not all unfortunately. :-(
+### CONNECTION_DATA
-Each WATCH_DATA record specifies a registered watch and is formatted as
-follows:
+For live update the image format will contain a `CONNECTION_DATA` record for
+each connection to xenstore. For migration it will only contain a record for
+the domain being migrated.
```
- 0 1 2 3 octet
-+-------+-------+-------+-------+
-| WATCH_DATA |
-+-------------------------------+
-| wpath length |
-+-------------------------------+
-| wpath data |
-...
-| pad (0 to 3 octets) |
-+-------------------------------+
+ 0 1 2 3 4 5 6 7 octet
++-------+-------+-------+-------+-------+-------+-------+-------+
+| conn-id | pad |
++---------------+-----------------------------------------------+
+| conn-type | conn-spec
...
I asked whether it wouldn't be better to drop the pad and move conn-type
and a 2-byte (unified) flag field at its position. This together ...
++-------------------------------+-------------------------------+
+| data-len | data
+-------------------------------+
-| token length |
-+-------------------------------+
-| token data |
...
-| pad (0 to 3 octets) |
-+-------------------------------+
```
-wpath length and token length are specified in octets (excluding the NUL
-terminator). The wpath should be as described for the `WATCH` operation in
-[2]. The token is an arbitrary string of octets not containing any NUL
-values.
+| Field | Description |
+|-------------|-------------------------------------------------|
+| `conn-id` | A non-zero number used to identify this |
+| | connection in subsequent connection-specific |
+| | records |
+| | |
+| `conn-type` | 0x0000: shared ring |
+| | 0x0001: socket |
+| | |
+| `conn-spec` | See below |
+| | |
+| `data-len` | The length (in octets) of any pending data not |
+| | yet written to the connection |
+| | |
+| `data` | Pending data (may be empty) |
-**TRANSACTION_DATA**
+The format of `conn-spec` is dependent upon `conn-type`.
+\pagebreak
-Each TRANSACTION_DATA record specifies an open transaction and is formatted
-as follows:
+For `shared ring` connections it is as follows:
```
- 0 1 2 3 octet
-+-------+-------+-------+-------+
-| TRANSACTION_DATA |
-+-------------------------------+
-| tx_id |
-+-------------------------------+
+ 0 1 2 3 4 5 6 7 octet
+ +-------+-------+-------+-------+-------+-------+
+ | domid | tdomid | flags |
++---------------+---------------+---------------+---------------+
+| revtchn | levtchn |
++-------------------------------+-------------------------------+
+| mfn |
++---------------------------------------------------------------+
... with dropping levtchn (which isn't needed IMO) will make it much
easier to have a union in C (which needs to be aligned to 8 bytes
and have a length of a multiple of 8 bytes due to mfn).
So something like:
struct xs_state_connection {
uint32_t conn_id;
uint16_t conn_type;
#define XS_STATE_CONN_TYPE_RING 0
#define XS_STATE_CONN_TYPE_SOCKET 1
uint16_t flags;
#define XS_STATE_CONN_INTRODUCED 0x0001
#define XS_STATE_CONN_RELEASED 0x0002
#define XS_STATE_CONN_READONLY 0x0004
union {
struct {
uint16_t domid;
uint16_t tdomid;
#define XS_STATE_DOMID_INVALID 0xffffU
uint32_t evtchn;
uint64_t mfn;
#define XS_STATE_MFN_INVALID 0xffffffffffffffffUL
} ring;
int32_t socket_fd;
} spec;
uint32_t data_out_len;
uint8_t data[];
};
```
-where tx_id is the non-zero identifier values of an open transaction.
-
-### Protocol Extension
+| Field | Description |
+|------------|--------------------------------------------------|
+| `domid` | The domain-id that owns the shared page |
+| | |
+| `tdomid` | The domain-id that `domid` acts on behalf of if |
+| | it has been subject to an SET_TARGET |
+| | operation [2] or DOMID_INVALID otherwise |
DOMID_INVALID needs to be defined (or we need a reference where it is
coming from).
+| | |
+| `flags` | A bit-wise OR of: |
+| | 0x0001: INTRODUCE has been issued |
+| | 0x0002: RELEASE has been issued |
+| | |
+| `revtchn` | The port number of the interdomain channel used |
+| | by `domid` to communicate with xenstored |
+| | |
+| `levtchn` | For a live update this will be the port number |
+| | of the interdomain channel used by xenstored |
+| | itself otherwise, for migration, it will be -1 |
+| | |
+| `mfn` | The MFN of the shared page for a live update or |
+| | INVALID_MFN otherwise |
INVALID_MFN is an internal detail of the hypervisor. We should have a
local definition for it here (as in my example above).
+
+Since the ABI guarantees that entry 1 in `domid`'s grant table will always
+contain the GFN of the shared page, so for a live update `mfn` can be used to
+give confidence that `domid` has not been re-cycled during the update.
+
+
+For `socket` connections it is as follows:
-Before xenstore state is migrated it is necessary to wait for any pending
-reads, writes, watch registrations etc. to complete, and also to make sure
-that xenstored does not start processing any new requests (so that new
-requests remain pending on the shared ring for subsequent processing on the
-new host). Hence the following operation is needed:
```
-QUIESCE <domid>|
-
-Complete processing of any request issued by the specified domain, and
-do not process any further requests from the shared ring.
+ 0 1 2 3 4 5 6 7 octet
+ +-------+-------+-------+-------+-------+-------+
+ | flags | socket-fd |
+ +---------------+-------------------------------+
-START_DOMAIN_TRANSACTION <domid>|<transid>|
+ 0 1 2 3 octet
++-------+-------+-------+-------+
+| conn-id |
++-------------------------------+
+| tx-id |
++---------------+---------------+
+| access | perm-count |
++---------------+---------------+
+| perm1 |
++-------------------------------+
+...
++-------------------------------+
+| permN |
++---------------+---------------+
+| path-len | value-len |
++---------------+---------------+
+| path
+...
+| value
+...
+```
+
+
+| Field | Description |
+|--------------|------------------------------------------------|
+| `conn-id` | If this value is non-zero then this record |
+| | related to a pending transaction |
+| | |
+| `tx-id` | This value should be ignored if `conn-id` is |
+| | zero. Otherwise it specifies the id of the |
+| | pending transaction |
+| | |
+| `access` | This value should be ignored if this record |
+| | does not relate to a pending transaction, |
+| | otherwise it specifies the accesses made to |
+| | the node and hence is a bitwise OR of: |
+| | |
+| | 0x0001: read |
+| | 0x0002: written |
+| | |
+| | The value will be zero for a deleted node |
+| | |
+| `perm-count` | The number (N) of node permission specifiers |
+| | (which will be 0 for a node deleted in a |
+| | pending transaction) |
+| | |
+| `perm1..N` | A list of zero or more node permission |
+| | specifiers (see below) |
+| | |
+| `path-len` | The length (in octets) of `path` including the |
+| | NUL terminator |
+| | |
+| `value-len` | The length (in octets) of `value` (which will |
+| | be zero for a deleted node) |
I asked you to put the path-len and value-len fields before the perm
array.
Juergen