Hi,
This discussion is meant to clarify the process of VM migration (to the
extent needed) and its consequences for address mappings. It is for everyone
(mostly me) to understand the issues; not all of what's discussed here
will be put in the draft. I've tried to incorporate what David has said so
far.
Situation: VM X is running on server S connected to lNVE L. X will move to
server S' connected to lNVE L'.
(For now, we'll assume a push model where addressing information is sent
out "eagerly" rather than "lazily".)
A high level picture of what the VM orchestration system does: (please let
me know if I have some significant details wrong)
M1a) tell S' that VM X will move from S to S'. S' prepares to receive a
copy of the VM image, memory, etc.
M1b) tell S to begin copying X from S to S'.
M2a) tell S to pause X
M2b) tell S' to start X
M3a) if X runs fine on S', tell S to destroy X (migration successful)
M3b) if not, tell S' to destroy X, and tell S to continue running X
(migration aborted)
The M1x steps happen before the M2x steps, which happen before the M3x
steps. The order of orchestration steps cannot be assumed to be reflected in
the order of the resulting signaling messages.
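To make the orchestration steps above concrete, here is a minimal sketch of the sequence as a plain list of (step, target server, instruction) tuples. The function name migrate and its parameters are hypothetical, made up for illustration; a real orchestration system will look quite different.

```python
def migrate(vm, src, dst, runs_ok):
    """Return the orchestration steps for moving `vm` from `src` to `dst`.

    `runs_ok` says whether the VM ran fine on `dst` after M2b.
    All names here are illustrative assumptions, not a real API.
    """
    steps = [
        ("M1a", dst, f"prepare to receive {vm} from {src}"),
        ("M1b", src, f"begin copying {vm} to {dst}"),
        ("M2a", src, f"pause {vm}"),
        ("M2b", dst, f"start {vm}"),
    ]
    if runs_ok:
        # Migration successful: the old copy goes away.
        steps.append(("M3a", src, f"destroy {vm}"))
    else:
        # Migration aborted: the new copy goes away, the old one resumes.
        steps.append(("M3b", dst, f"destroy {vm}"))
        steps.append(("M3b", src, f"continue running {vm}"))
    return steps


for step, server, action in migrate("X", "S", "S'", runs_ok=True):
    print(f"{step}: tell {server} to {action}")
```

The list only captures the order in which the orchestrator issues instructions; it says nothing about the order in which the resulting signaling messages are seen.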
----
M1a results in N1:
N1) S' sends a "pre-associate" message to L', which in turn sends out an
update that "X will move to L' ".
(This can be done by an address update with a lower preference, such that
other VMs in X's VN still use the old location L, but those that want to
know where X is moving to (like L) can find out.)
M2a results in N2a:
N2a) S sends a dissociate message to L with a timeout. When L receives
this, it points its route for X to L' (if it has already learned this from
L') or, failing that, to discard.
M2b results in N2b:
N2b) S' sends an associate message to L', whereupon L' makes itself the
"official" nexthop for X (normal preference; VM moved).
M3b (error condition) will be left for future discussion.
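As a rough illustration of the preference idea in N1 and N2b, here is a minimal sketch of the mapping table a remote NVE might keep. The class RouteTable, its methods, and the preference values are all hypothetical. The sketch also assumes that L's entry is withdrawn once the VM has moved, which the steps above do not spell out.

```python
from dataclasses import dataclass, field

# Hypothetical preference values; the higher one wins.
NORMAL, LOW = 100, 50


@dataclass
class RouteTable:
    # Maps VM name -> {NVE name: preference}
    routes: dict = field(default_factory=dict)

    def advertise(self, vm, nve, pref):
        """Record (or update) an address update for `vm` from `nve`."""
        self.routes.setdefault(vm, {})[nve] = pref

    def withdraw(self, vm, nve):
        """Remove `nve` as a location for `vm`, if present."""
        self.routes.get(vm, {}).pop(nve, None)

    def best(self, vm):
        """NVE to send traffic for `vm` to, or None if unknown."""
        cands = self.routes.get(vm, {})
        return max(cands, key=cands.get) if cands else None

    def pending_move(self, vm):
        """NVEs that have announced `vm` at LOW preference (N1)."""
        return [n for n, p in self.routes.get(vm, {}).items() if p == LOW]


table = RouteTable()
table.advertise("X", "L", NORMAL)   # X currently lives behind L
table.advertise("X", "L'", LOW)     # N1: "X will move to L'"
print(table.best("X"))              # L  (traffic still goes to L)
print(table.pending_move("X"))      # ["L'"]

table.advertise("X", "L'", NORMAL)  # N2b: L' is now the official nexthop
table.withdraw("X", "L")            # assumption: L's entry goes away
print(table.best("X"))              # L'
```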
----
Race conditions:
R1) if N1 BEFORE N2a => traffic to X still arriving at L gets redirected to
L'
If not, traffic to X arriving at L gets dropped
R2) if N2a BEFORE N2b => traffic redirected from L to L' gets dropped until
N2b occurs (and is processed).
R3) N2a times out => redirection at L stops. Traffic sent to L from NVEs
that still haven't received the control plane update from N2b (i.e., VM
moved) will be dropped.
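The three races above can be checked with a small sketch that replays an event order and reports what happens to traffic for X arriving at L. The function name fate_at_L and the event labels are made up for illustration. The sketch follows N2a literally, so L picks redirect or discard at the moment it processes N2a; whether a late N1 should upgrade a discard route to a redirect is left open here.

```python
def fate_at_L(events):
    """Fate of traffic for X arriving at L after `events` are processed.

    `events` is a list drawn from "N1", "N2a", "N2b", "timeout",
    in the order in which they take effect.
    """
    knows_new = False    # L has processed the N1 update from L'
    route = "local"      # L's route for X: local | redirect | discard
    associated = False   # L' has processed the associate message (N2b)

    for ev in events:
        if ev == "N1":
            knows_new = True
        elif ev == "N2a":
            # Redirect only if L already learned L' (R1).
            route = "redirect" if knows_new else "discard"
        elif ev == "N2b":
            associated = True
        elif ev == "timeout":
            # R3: redirection at L stops when the N2a timeout expires.
            if route == "redirect":
                route = "discard"
        else:
            raise ValueError(f"unknown event: {ev}")

    if route == "local":
        return "delivered by L"
    if route == "discard":
        return "dropped at L"
    # Redirected to L'; delivery depends on N2b having happened (R2).
    return "delivered by L'" if associated else "dropped at L'"


print(fate_at_L(["N1", "N2a", "N2b"]))             # delivered by L'
print(fate_at_L(["N2a", "N1", "N2b"]))             # dropped at L   (R1)
print(fate_at_L(["N1", "N2a"]))                    # dropped at L'  (R2)
print(fate_at_L(["N1", "N2a", "N2b", "timeout"]))  # dropped at L   (R3)
```

Other orderings can be tried by changing the event list.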
Please let me know of other interesting race conditions!
--
Kireeti