Re: [nvo3] Push or pull?

Jon Hudson Thu, 27 Sep 2012 10:39:14 -0700

A few comments inline...

On Sep 27, 2012, at 7:27 AM, Lucy yong <[email protected]> wrote:


> Let me try to explain more. RT lets an SP operator control routes or traffic
> in a VPN; it is very useful for VPN services because CEs are typically
> network sites or LANs. As you said, RT provides constraint capability.
> 
> However, in a data center, a closed user group has many VMs or servers. 
> Although an NVO instance provides communication for the VMs in the group, 
> some VMs may not need to communicate with some other VMs.

True, and desirable. If VM_A_01 only needs to talk to VM_A_02-07, then I don't 
want it to be able to touch VM_A_08 or VM_B_x.

> Who needs to communicate with whom depends on the application running on the
> VMs. The DC operator may not have any clue ahead of time.

Not likely. All the policies should have been set up when the application was 
first set up, and all traffic is deliberate and known down to 
src_ip_port<->dst_ip_port. 

In the case of a customer in a large public cloud, those policies may be more 
open-ended as given from customer -> public_cloud_operator. There will still be 
well-defined enough policies to know what communication flows need to be open 
if you move VM_A_01 from DC01:POD02:RM05:RW02:RK08:pSERV23 ---> 
DC01:POD02:RM02:RW07:RK11:pSERV42.
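
To make that concrete, here is a rough sketch (Python, purely illustrative) of 
policy keyed on the VM's fixed address rather than on where it happens to run. 
The addresses, ports, and helper names are all invented for the example; none 
of them come from a real product or from any NVO3 draft.

```python
# Hypothetical policy table: (src_ip, dst_ip, dst_port) tuples agreed when
# the application was first set up. All values are invented for illustration.
ALLOWED_FLOWS = {
    ("10.0.1.1", "10.0.1.2", 443),
    ("10.0.1.1", "10.0.1.3", 5432),
    ("10.0.1.9", "10.0.1.1", 22),
    ("10.0.2.5", "10.0.2.6", 80),
}

# Where each VM currently runs; this is the only thing a move changes.
vm_location = {"10.0.1.1": "DC01:POD02:RM05:RW02:RK08:pSERV23"}


def flows_for(vm_ip):
    """Return every permitted flow that has vm_ip as source or destination."""
    return sorted(f for f in ALLOWED_FLOWS if vm_ip in (f[0], f[1]))


def move_vm(vm_ip, new_location):
    """Record the new location and return the flows to open there."""
    vm_location[vm_ip] = new_location
    return flows_for(vm_ip)
```

Calling move_vm("10.0.1.1", "DC01:POD02:RM02:RW07:RK11:pSERV42") returns the 
same flow list as before the move; only the location entry changes.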

> Thus, how can they use RT to control or constrain the endpoint routes?
> 
> Stand in the host's shoes for a moment: why was ARP designed as a host
> protocol long ago? It is because a host connecting to a network does not need
> to know all other hosts on the same network. The protocol lets a host query
> for whichever host it wants to reach at any time. This is a difference
> between a host and a switch. 

You _may_ see a policy as open-ended as: Cust_BA has VLANs 11-42, any<->any 
is permitted between those VLANs, and so as new VMs are instantiated a new 
learning sequence is needed. However, firewall, access, backup, etc. policies 
will likely need to be set up for the VM with a fixed ipaddr, and so VMs are 
not typically just appearing willy-nilly like Tribbles.

> 
> When the NVE is on a server, there will be just a small set of VMs attached
> to it, although it is a switch.

We can't assume this. I can easily buy systems today with 32 cores and up to 
1TB of memory. For many folks that's 16 VMs minimum and 128 possible.

> When these VMs only need to communicate with some small set of other VMs in 
> the closed user group, why does the NVE need to have all the VM locations in
> the group? Furthermore, depending on the application on the VMs, the VM
> communication pattern may change dynamically; how can the DC operator use RT
> to accomplish this? 

Perhaps with a new unknown application. But if VM_A_01 has been talking to 
VM_A_[04,07,11,15] for weeks or months and suddenly starts talking to 
VM_A_22, your VM may be possessed. 

In a Datacenter, surprise is a bad thing. Boring is your goal.
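
A minimal sketch of that "no surprises" check, assuming you keep a baseline of 
known peers per VM. The function name and the baseline structure are 
hypothetical, just enough to show the idea:

```python
# Hypothetical baseline: peers each VM has been observed talking to over
# the past weeks or months. Names follow the examples in this mail.
BASELINE = {
    "VM_A_01": {"VM_A_04", "VM_A_07", "VM_A_11", "VM_A_15"},
}


def unexpected_peers(vm, observed_peers, baseline=BASELINE):
    """Return observed peers that are not in the VM's known baseline."""
    known = baseline.get(vm, set())
    return sorted(set(observed_peers) - known)


# Anything returned here is a surprise worth looking at.
surprises = unexpected_peers("VM_A_01", ["VM_A_04", "VM_A_11", "VM_A_22"])
print(surprises)
```

An empty list is the boring outcome you want; a VM with no baseline at all 
reports every peer as unexpected, which seems like the right default.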

> 
> I have no doubt that SP VPN is a very mature technology and has plenty of 
> features for various purposes. However, this does not necessarily mean that
> all of these apply well to the DC virtual overlay. The NVO3 application is
> quite different from the SP VPN application. This is why we wrote the
> vpn-gap-analysis draft to show what is in common and where the gaps are:
> draft-hy-nvo3-vpn-gap-analysis. 
> 
> To be clear, this does not mean I suggest we should develop a 
> new solution for NVO3. In fact, I think we can extend and/or simplify the VPN 
> solution for NVO3.
> 
> Hope this helps.
> 
> Lucy  
> 
> -----Original Message-----
> From: Luyuan Fang (lufang) [mailto:[email protected]] 
> Sent: Wednesday, September 26, 2012 4:33 PM
> To: Lucy yong; Shah, Himanshu; Thomas Narten; Kireeti Kompella
> Cc: [email protected]
> Subject: RE: [nvo3] Push or pull?
> 
> Not clear what your interpretation has to do with changing RT when a VM moves...
> Anyway, if a NVE does not even know what RTs/VPNs he has, maybe he should be 
> fired? :-)
> Luyuan
> 
>> -----Original Message-----
>> From: Lucy yong [mailto:[email protected]]
>> Sent: Wednesday, September 26, 2012 5:22 PM
>> To: Luyuan Fang (lufang); Shah, Himanshu; Thomas Narten; Kireeti
>> Kompella
>> Cc: [email protected]
>> Subject: RE: [nvo3] Push or pull?
>> 
>> OK, I see we have different interpretations of an NVE having all the
>> endpoint routes. My interpretation is that although an EVI provides
>> communication among the VMs in a closed user group, it is not necessary
>> for a VM to communicate with all other VMs in the group at one time or all
>> the time; therefore having each NVE maintain all the endpoint routes in
>> an EVI is not necessary. This is the concern I got from Sunny's mail.
>> 
>> It seems that you have a different interpretation.
>> Lucy
>> 
>> -----Original Message-----
>> From: Luyuan Fang (lufang) [mailto:[email protected]]
>> Sent: Wednesday, September 26, 2012 4:00 PM
>> To: Lucy yong; Shah, Himanshu; Thomas Narten; Kireeti Kompella
>> Cc: [email protected]
>> Subject: RE: [nvo3] Push or pull?
>> 
>> Why does the VM need to change RT when it moves? Isn't the VM supposed
>> to stay in the same VPN and only change location?
>> The VM should keep the same RT in order to maintain the membership of
>> that VPN it belongs to.
>> 
>> Luyuan
>> 
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of
>>> Lucy yong
>>> Sent: Wednesday, September 26, 2012 4:48 PM
>>> To: Luyuan Fang (lufang); Shah, Himanshu; Thomas Narten; Kireeti
>>> Kompella
>>> Cc: [email protected]
>>> Subject: Re: [nvo3] Push or pull?
>>> 
>>> 
>>> 
>>> Why do you want to change RT on the fly? Would not that create security
>>> issues? (And peer group or ORF don't change RT on the fly anyway).
>>> [[LY]] to support VM mobility. What if an interested endpoint moves
>>> from one NVE to another?
>>> lucy
>>> 
>>> RT-rewrite can be done when inter-connecting the VPN across two ASes
>>> which are (or were) administered by different administrative domains,
>>> but they are carefully controlled/designed and configured by the
>>> operators on the ASBRs or RRs. No RT change on the fly as far as I know.
>>> 
>>> Luyuan
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Lucy yong [mailto:[email protected]]
>>>> Sent: Wednesday, September 26, 2012 4:13 PM
>>>> To: Luyuan Fang (lufang); Shah, Himanshu; Thomas Narten; Kireeti
>>>> Kompella
>>>> Cc: [email protected]
>>>> Subject: RE: [nvo3] Push or pull?
>>>> 
>>>> RT makes it easy to define different topologies, but is not flexible for
>>>> changes on the fly. Peer group or ORF is more about filter settings; it
>>>> may fit better here.
>>>> Lucy
>>>> 
>>>> -----Original Message-----
>>>> From: Luyuan Fang (lufang) [mailto:[email protected]]
>>>> Sent: Wednesday, September 26, 2012 3:10 PM
>>>> To: Lucy yong; Shah, Himanshu; Thomas Narten; Kireeti Kompella
>>>> Cc: [email protected]
>>>> Subject: RE: [nvo3] Push or pull?
>>>> 
>>>>> in push model, BGP peer group or ORF may be used to avoid every NVE
>>>>> to have all endpoint routes;
>>>> 
>>>> In the BGP VPN case, it is most efficient to use RT Constraint [RFC 4684]
>>>> for selective route distribution - only send the VPN routes to the peer
>>>> who has the relevant VPNs.
>>>> Luyuan
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of
>>>>> Lucy yong
>>>>> Sent: Wednesday, September 26, 2012 4:03 PM
>>>>> To: Shah, Himanshu; Thomas Narten; Kireeti Kompella
>>>>> Cc: [email protected]
>>>>> Subject: Re: [nvo3] Push or pull?
>>>>> 
>>>>> I agree with Thomas. Both "push" and "pull" models have their
>>>>> application space. To add two points: in the push model, BGP peer group
>>>>> or ORF may be used to avoid every NVE having all endpoint routes; in
>>>>> the pull model, an NVE will have temporary caching to reduce the number
>>>>> of queries.
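
For what it's worth, the "temporary caching" in the pull model could look 
something like the sketch below: the NVE asks a mapping service only on a miss 
or after an entry ages out. This is an assumption-laden illustration, not 
anything from the drafts; the class name, the lookup callable, and the 
30-second lifetime are all made up.

```python
import time


class PullCache:
    """Cache VM -> NVE mappings so repeated lookups do not re-query."""

    def __init__(self, lookup, ttl_seconds=30.0, clock=time.monotonic):
        self._lookup = lookup      # hypothetical query to a mapping service
        self._ttl = ttl_seconds    # arbitrary lifetime chosen for the sketch
        self._clock = clock
        self._entries = {}         # vm_addr -> (nve, expiry_time)
        self.queries = 0           # number of pulls actually sent

    def resolve(self, vm_addr):
        now = self._clock()
        hit = self._entries.get(vm_addr)
        if hit is not None and hit[1] > now:
            return hit[0]          # fresh entry: no query needed
        self.queries += 1          # miss or expired: pull from the service
        nve = self._lookup(vm_addr)
        self._entries[vm_addr] = (nve, now + self._ttl)
        return nve
```

The obvious trade-off is staleness: if the VM moves inside the lifetime, the 
cached entry points at the old NVE until it expires or is invalidated some 
other way.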
>>>>> 
>>>>> Lucy
>>>>> 
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of
>>>>> Shah, Himanshu
>>>>> Sent: Wednesday, September 26, 2012 2:46 PM
>>>>> To: Thomas Narten; Kireeti Kompella
>>>>> Cc: [email protected]
>>>>> Subject: Re: [nvo3] Push or pull?
>>>>> 
>>>>> I kind of agree with Thomas.
>>>>> 
>>>>> Cisco gave a LISP (pull-based) presentation, which is a working model,
>>>>> during the NVO3 interim.
>>>>> I believe there are several ways to skin a cat and we should not limit
>>>>> our options.
>>>>> Besides, I also got the impression from the chairs that discussing
>>>>> preference of one solution over another is
>>>>> rather premature based on where NVO3 is.
>>>>> 
>>>>> Regards,
>>>>> himanshu
>>>>> 
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf Of
>>>>> Thomas Narten
>>>>> Sent: Wednesday, September 26, 2012 2:04 PM
>>>>> To: Kireeti Kompella
>>>>> Cc: [email protected]
>>>>> Subject: Re: [nvo3] Push or pull?
>>>>> 
>>>>> Hi Kireeti.
>>>>> 
>>>>> Kireeti Kompella <[email protected]> writes:
>>>>> 
>>>>>> I'm glad you brought this up. Actually, this conversation has
>>>>>> happened several times, to my knowledge, without a firm conclusion. I
>>>>>> doubt we can close it, but at least, let's air it.
>>>>> 
>>>>>> Push: send route updates to everyone (first see Aldrin's comment about
>>>>>> RT Constraint) as soon as you (the AUTHORITY/ORACLE) get them.
>>>>> 
>>>>>> Pull: sit on updates you get until someone asks for them.
>>>>> 
>>>>>> I could try to convince you what a terrible idea Pull is. I could
>>>>>> refer to the Internet, which is all Push, and scales reasonably well.
>>>>> 
>>>>> You mean like DNS or ARP?
>>>>> 
>>>>> I do not think we should say "push is good, pull is bad". That is just
>>>>> too categorical a statement.
>>>>> 
>>>>>> I could ask you what happens to packets while the Pull is being
>>>>>> responded to, or a bunch of related questions. I won't.
>>>>> 
>>>>> They get queued. Or dropped. Or possibly something else. Yes, there are
>>>>> implications to that. But not necessarily a show stopper either.
>>>>> 
>>>>>>> In my view, this puts an unnecessary load on NVEs.
>>>>> 
>>>>>> Let's talk instead about the "unnecessary load". Can someone quantify
>>>>>> this?  Is it CPU? memory? messaging? What's the bottleneck or pain
>>>>>> point?
>>>>> 
>>>>> Some or all of the above.
>>>>> 
>>>>> If typical VNs are smallish, I agree that an NVE can preload full
>>>>> tables with no problem. But what about for very large VNs? Should the
>>>>> architecture *force* such preloading of full tables, even if the
>>>>> working set of routes is actually very small?
>>>>> 
>>>>> And what about for very large VNs where there is a lot of VM mobility?
>>>>> Should all NVEs be required to get update info even for destinations
>>>>> they don't care about?
>>>>> 
>>>>>> Here's my back-of-the-envelope calculation for memory, normalized to a
>>>>>> VM. Let's say a VM has 10,000 friends in the DC that it might possibly
>>>>>> want to talk to, but only one that it really wants to talk to. Let's
>>>>>> say that a FIB route entry takes 100 bytes. That adds up to a possible
>>>>>> total of 1MB vs. an actual of 100 bytes. Is 1MB really something one
>>>>>> should optimize, especially considering that the VM has probably been
>>>>>> allocated 4GB?
>>>>> 
>>>>> Are you really arguing that the difference between 1MB and 100 bytes
>>>>> is just noise? And who says this is in conventional memory on a host?
>>>>> I could see this being done in the ASIC...
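
The arithmetic being argued over above can be written out as a quick sanity 
check. The three inputs are the figures quoted in the thread; reading "4GB" as 
decimal gigabytes is my assumption, and the helper name is invented.

```python
# Figures quoted in the thread.
FRIENDS_PER_VM = 10_000        # possible peers per VM
BYTES_PER_FIB_ENTRY = 100      # assumed size of one FIB route entry
VM_MEMORY_BYTES = 4 * 10**9    # "4GB", read here as decimal (an assumption)


def table_bytes(entries, bytes_per_entry=BYTES_PER_FIB_ENTRY):
    """Memory needed to hold the given number of route entries."""
    return entries * bytes_per_entry


full_table = table_bytes(FRIENDS_PER_VM)   # push: every possible peer, 1,000,000
working_set = table_bytes(1)               # pull: only the peer in use, 100
ratio = full_table // working_set          # 10,000x difference

print(full_table, working_set, ratio)
print(f"{full_table / VM_MEMORY_BYTES:.3%} of the VM's memory")
```

So both sides have their numbers right: the full table is 10,000 times the 
working set, and it is also a tiny fraction of a 4GB VM. Whether that fraction 
is the relevant comparison when the table lives in an ASIC rather than host 
memory is exactly the open question.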
>>>>> 
>>>>>> Maybe there is a dimension to this that really is an issue. I would
>>>>>> love to know, especially with numbers backing it up. But let's first
>>>>>> convince ourselves that this is a problem worth solving before
>>>>>> spending cycles solving it.
>>>>> 
>>>>> I do not think we should today require that the NVO3 architecture (in a
>>>>> MUST sense) support only push. I think we should allow for either push
>>>>> or pull, or some combination. I can see benefits with both approaches.
>>>>> 
>>>>> Note also that we may be looking at the problem from different
>>>>> perspectives. For example, in a single data center, I can imagine a
>>>>> centralized directory service holding the complete address mapping
>>>>> information for all the VNs in the DC. An NVE in such cases can query
>>>>> such a mapping system with very very low latency.
>>>>> 
>>>>> Thomas
>>>>> 
>>>>> _______________________________________________
>>>>> nvo3 mailing list
>>>>> [email protected]
>>>>> https://www.ietf.org/mailman/listinfo/nvo3
