Hi all,

Aldrin wrote about standardized MESSAGING: http://www.ietf.org/mail-archive/web/nvo3/current/msg01580.html
I totally agree with Aldrin, but there is more to the whole picture than MESSAGING packets. If we want to automate things between the server operational domain and the network operational domain, more than MESSAGING packets are needed, i.e. not only a location mapping database but also some resource database(s). I'm biased by SIP, and in this post I'm going to use the MESSAGING methods from SIP, together with a reference architecture and use cases, to explain my vision of an NVO3 architecture.

A SIP proxy is usually stateful, i.e. it can keep track of sessions, and you have a centralized place from which you can collect data to generate an overview of your system. SIP leverages distributed databases: you don't need to have all data in one single database, and the SIP UA uses the pull model to fetch *desired* location and service related information from the databases. BGP is sort of one single database; it is distributed to all peers and each peer decides locally what to do with the data - that is the push model, IMHO.

It is also easier to use the SIP RFCs to describe MESSAGING methods in this post. There is a lot to consider in MESSAGING methods, and this doesn't mean that the syntax should be like SIP's, but you can get an overview of the MESSAGING methods in the SIP RFCs, and there are years of experience behind those RFCs. So it is worthwhile to at least read the abstract/introduction of the RFCs mentioned in this post. For example, Sunny wrote, http://www.ietf.org/mail-archive/web/nvo3/current/msg01596.html , "The issue here is how to push endpoint updates to the NVE when endpoints move" - this is the REFER method (if the endpoint in this context is the NVE) or the INFO method (if the endpoint is the VM). Remember, there is no need to use the SIP syntax; instead focus on the MESSAGING methods, i.e. how to set up, move and delete sessions (tunnels).
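To make the method set concrete, here is a minimal Python sketch of the four SIP-inspired MESSAGING methods and of a pull-model lookup against a distributed location database. All names (Method, Message, lookup, the example database keys) are illustrative assumptions only, not a proposed wire format or syntax:

```python
from dataclasses import dataclass, field
from enum import Enum

# The SIP-inspired method set discussed above (semantics, not SIP syntax).
class Method(Enum):
    REGISTER = "register"  # an NVE announces itself to its proxy (cf. RFC 3261, s.10)
    INVITE = "invite"      # request connectivity/tunnel setup for a new VM
    INFO = "info"          # update state within an existing session (cf. RFC 6086)
    REFER = "refer"        # gracefully move an existing session (cf. RFC 3515)

@dataclass
class Message:
    method: Method
    sender: str                                 # e.g. an hNVE or proxy identifier
    attrs: dict = field(default_factory=dict)   # method-specific parameters

# Pull model, as in SIP: the NVE fetches only the location record it needs
# on demand, instead of having the full table pushed to every peer
# (the BGP-style push model).
def lookup(db: dict, key: str):
    return db.get(key)

location_db = {"00:11:22:33:44:55": "hNVE-7"}   # MAC -> NVE location (illustrative)
print(lookup(location_db, "00:11:22:33:44:55"))  # -> hNVE-7
```

The point of the sketch is only the separation: the method set defines *how* to set up, move and delete sessions, while the distributed databases define *where* the location and resource data lives.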
Sorry for the long e-mail; I don't have time to write a draft (I need to focus on my customer cases soon), and for sure there are holes in the flow of the use cases, but hopefully you can get an idea of what I'm looking for.

The reference architecture:
- an L2 pod where all 4000 VLANs are enabled in the local network, which is attached via two L3 ToRs to the DC L3 backbone
- L3 routing services are provided either by the ToRs (attached to the L2 pod) or "behind" the DC L3 backbone
- hypervisors from several vendors are leveraged, with a standardized NVE stack
- hypervisor-based NVEs (hNVE) and ASIC-based NVEs (aNVE) are deployed in the NVO3 network
- for high-bandwidth multicast streams the underlying network supports RFC 6513
- two types of NVO proxies (just as an example, and to deal with the organizational silos):
  * tenant NVO (tNVE) proxy, keeping track of tenant-related information (in the server operational domain)
  * network NVO (nNVE) proxy, keeping track of network resources, e.g. NVO3 gateways, NG-MVPN multicast groups etc. (in the network operational domain)

Use cases for a tenant:

1. Provisioning of tenant information
The server administrator configures the following tenant information at the tNVE proxy:
- Customer ID (CID)
- IP subnet to be used for the servers (to keep this study simple, only one subnet; of course there will be several subnets if the web-app-db tiers are used)
- IP default gateway for that subnet
- option to support high-bandwidth multicast (yes/no)

2. Provisioning of network resources
The network administrator configures the following network information at the nNVE proxy:
- location of the NVO gateways (for WAN connectivity and legacy non-virtualized hosts, i.e. ToR switches attached to the L2 pods) and their locally assigned VLANs
- list of locally assigned VLANs at each L2 pod
- location of NVO gateways capable of producing NG-MVPN services, their locally assigned VLANs and assigned multicast groups
- Customer ID mapping to VNI, MPLS VRF instances (name, route distinguisher, route target) etc.

3. Provisioning of an NVE
The server administrator configures a new hNVE at the hypervisor; the admin must specify which tNVE proxy shall be used, and there might be some authentication solution that must be obeyed. The hNVE leverages a MESSAGING solution to register itself with the assigned tNVE proxy. Ditto for an aNVE, where the network admin chooses the appropriate nNVE proxy. Note that the standardized MESSAGING method (the register method) is applied by the NVEs themselves, not by e.g. the centralized console of a hypervisor solution. In SIP the REGISTER method is leveraged; see RFC 3261, section 10: http://tools.ietf.org/html/rfc3261#page-56

4. Provisioning of a VM
The server admin assigns a VM to the correct CID; the hNVE notices that a new server is deployed and leverages a MESSAGING method to fetch connectivity services from the assigned tNVE proxy. In SIP the INVITE method is applied; see RFC 3261, section 4: http://tools.ietf.org/html/rfc3261#page-10 I'll switch partly to SIP lingo: the INVITE is sent to the assigned tNVE proxy together with the MAC address of the VM and the CID.
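The registration and INVITE steps of use cases 3 and 4 could be sketched as follows in Python. The proxy class, its method names and all identifiers are hypothetical illustrations of the stateful-proxy idea, not a proposed interface:

```python
from dataclasses import dataclass, field

@dataclass
class TNVEProxy:
    registered: dict = field(default_factory=dict)  # NVE id -> contact info
    sessions: list = field(default_factory=list)    # stateful session records

    def register(self, nve_id: str, contact: str) -> None:
        # Use case 3: the NVE itself (not a central hypervisor console)
        # registers with its assigned proxy (cf. SIP REGISTER, RFC 3261 s.10).
        self.registered[nve_id] = contact

    def invite(self, nve_id: str, vm_mac: str, cid: str) -> dict:
        # Use case 4: the INVITE carries the VM's MAC address and the
        # Customer ID. The proxy keeps session state, giving a centralized
        # place from which to build an overview of the system.
        if nve_id not in self.registered:
            raise PermissionError("NVE must REGISTER before sending INVITE")
        session = {"nve": nve_id, "mac": vm_mac, "cid": cid}
        self.sessions.append(session)
        return session

proxy = TNVEProxy()
proxy.register("hNVE-7", "10.0.0.7")
s = proxy.invite("hNVE-7", "00:11:22:33:44:55", "CID-42")
print(s["cid"])  # -> CID-42
```

In the full flow described next, the tNVE proxy would enrich this session record (subnet, default gateway, encapsulation schemes) before forwarding it onward.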
The tNVE proxy forwards this INVITE to the nNVE proxy (if there are several, it forks the INVITE to all of them), and the tNVE proxy adds the following data to the INVITE:
- IP subnet
- IP default gateway
- whether high-bandwidth multicast is requested
- supported encapsulation schemes, with priority

Next the nNVE proxy processes the INVITE, assigning the following in the 200 (OK) message:
- location of the NVO WAN GW, including supported encapsulation schemes
- location of the NVO ToR GW used to reach legacy hosts, including supported encapsulation schemes
- a VLAN to be used for the VM (one VLAN ID shall be used that is locally available to all three parties, i.e. the WAN GW, the ToR GW and the L2 pod where the initiating hNVE is located)
- VNI
- NG-MVPN multicast group

The hNVE enables the VLAN and starts to bring up the two tunnels to the WAN GW and the ToR GW with the provided parameters. Ditto at the two aNVEs, but since these are routers some other mechanism can be used, for example XMPP, NETCONF etc., to configure all required network parameters on the routers, including adding the MAC address of the new VM at the two aNVEs. If successful, two new unicast tunnels have been established and we have stateful information about the tunnels at both proxies, including MAC information of the VM - one with focus on the tenant and one with focus on the network resources. The two ToRs attached to the L2 pod of the hNVE are assigned the new VLAN for the VM and are added to the NG-MVPN multicast group; the same goes for the WAN GW and the ToR GW.

5. Adding/deleting VMs on an hNVE with existing tunnels
The server admin adds/deletes a VM at the hNVE discussed in use case 4. A MESSAGING method is needed to update the MAC tables on all NVEs belonging to the VNI; no changes to the tunnels. In SIP the INFO method can be used, see https://tools.ietf.org/html/rfc6086 I think there are other similar methods that could be used, but I'm not sure.

6. The hNVE is moved to another host
If the make-before-break approach is preferred, the existing tunnels should be gracefully moved from the existing host to the other host. A MESSAGING method is needed for this. In SIP the REFER method is leveraged, see http://tools.ietf.org/html/rfc3515 There are other similar methods that could be used.

This is a very rough overview, but hopefully you can see what could be achieved with a standardized MESSAGING solution, including distributed location databases and network resource databases. NVO3 has so much more potential than which/how many encapsulation schemes should be used etc.; here is an opportunity to bridge the gap between the DC silos and automate provisioning of resources.

Best regards,
Patrick

_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
