From: [email protected] [mailto:[email protected]] On 
Behalf Of Thomas F Herbert
Sent: Tuesday, April 25, 2017 10:01 PM
To: [email protected]
Subject: Re: [vpp-dev] Requirement on Load Balancer plugin for VPP




On 04/25/2017 04:45 AM, Zhou, Danny wrote:
Thanks Pierre, comments inline.

From: Pierre Pfister (ppfister) [mailto:[email protected]]
Sent: Tuesday, April 25, 2017 4:11 PM
To: Ni, Hongjun <[email protected]><mailto:[email protected]>
Cc: Zhou, Danny <[email protected]><mailto:[email protected]>; Ed 
Warnicke <[email protected]><mailto:[email protected]>; Li, Johnson 
<[email protected]><mailto:[email protected]>; 
[email protected]<mailto:[email protected]>
Subject: Re: [vpp-dev] Requirement on Load Balancer plugin for VPP


On 25 Apr 2017, at 09:52, Ni, Hongjun <[email protected]> wrote:

Hi Pierre,

For the LB distribution case, I think we could assign a node IP to each LB box.
When packets are received from a client, the LB will do both SNAT and DNAT, i.e. source
IP -> LB's node IP, destination IP -> AS's IP.
When packets return from the AS, the LB also does both DNAT and SNAT, i.e. source IP ->
VIP, destination IP -> client's IP.
Does NSH solve this problem of transparently forwarding the traffic?
[Zhou, Danny]  No, this has nothing to do with NSH. We are trying to use VPP to
replace the in-kernel iptables/Netfilter-based distributed load balancer
(controlled by kube-proxy) for high-performance container networking in NFV
environments. And our learning from the NSH work is that even though VPP's VTEP
implementation has much higher performance than the in-kernel VTEP, it still brings
a significant negative performance impact in comparison to processing non-tunneled
packets (as you can see from the published CSIT performance reports), so the legacy
DNAT/SNAT-based approach still has its unique benefits when processing small
packets.


I see.
Doing so, you completely hide the client's source address from the application.
You also require per-connection binding at the load balancer (Maglev does
per-connection binding, but in a way that tolerates hash collisions, because
it is not a big deal if two flows use the same entry in the hash table. This
allows for a smaller, fixed-size hash table, which also gives Maglev a performance
advantage).
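For reference, the "per-connection binding that tolerates collisions" idea can be sketched with the published Maglev table-population algorithm. This is a toy Python illustration, not VPP code: the table size, hash functions, and backend names are all made up, and a real deployment would use a large prime table size.

```python
# Toy sketch of Maglev-style consistent hashing: each backend owns a
# permutation of table slots and claims its next preferred empty slot
# in round-robin order, until the fixed-size table is full.
import hashlib

M = 13  # table size; Maglev uses a large prime (e.g. 65537)

def _h(s: str, seed: str) -> int:
    # Illustrative hash; Maglev's actual hash functions differ.
    return int(hashlib.md5((seed + s).encode()).hexdigest(), 16)

def populate(backends):
    """Fill the fixed-size lookup table from the backends' permutations."""
    offset = {b: _h(b, "o") % M for b in backends}
    skip = {b: _h(b, "s") % (M - 1) + 1 for b in backends}
    nxt = {b: 0 for b in backends}
    table = [None] * M
    filled = 0
    while filled < M:
        for b in backends:
            # Walk this backend's permutation to its next empty slot.
            while True:
                slot = (offset[b] + nxt[b] * skip[b]) % M
                nxt[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == M:
                break
    return table

def lookup(table, flow_5tuple: str) -> str:
    # Collisions are fine: two flows hashing to the same slot simply
    # share a backend, so the table can stay small and fixed-size.
    return table[_h(flow_5tuple, "f") % M]
```

Because the table is fixed-size and collisions are tolerated, no per-flow state needs to be allocated on the fast path, which is where the performance advantage comes from.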

In my humble opinion, using SNAT+DNAT is a terribly bad idea, so I would advise
you to reconsider and find a way to either:
- Enable some type of packet tunneling protocol in your ASs (IPinIP, L2TP,
whatever-other-protocol, and extend VPP's LB plugin with the one you pick).
- Put some box closer to the ASs (bump in the wire) for decap.
- If your routers support MPLS, you could also use it as encap.
[Zhou, Danny] In a cloud environment where hundreds or thousands of ASs are
dynamically deployed in VMs or containers, it is not easy for the orchestrator
(which has the global view) to find close-enough boxes that can be configured
automatically to offload the encap/decap work. Most likely, it will still be
software doing the encap/decap. Secondly, if we are targeting small-packet
line-rate performance, adding tunnel headers increases the total packet size,
which decreases packet efficiency and causes packet loss. I would consider
adding GRE tunnels for LB an abuse of the tunneling protocol, as those tunneling
protocols were not designed for this case. SNAT + DNAT has its own disadvantages,
but it is widely used in software-centric cloud environments orchestrated by
OpenStack or Kubernetes.

If you really want to use SNAT+DNAT (god forbid), and are willing to suffer (or
somehow like suffering), you may try to:
- Use VPP's SNAT on the client-facing interface. The SNAT will just change the
clients' source addresses to one of the LB's source addresses.
- Extend VPP's LB plugin to support DNAT "encap".
- Extend VPP's LB plugin to support return traffic and stateless SNAT based on the
LB flow table (and find a way to make that work on multiple cores...).
The client->AS traffic, in VPP, would go: client-facing-iface --> SNAT -->
LB (DNAT) --> AS-facing-iface
The AS->client traffic, in VPP, would go: AS-facing-iface --> LB (stateless
SNAT) --> SNAT plugin (doing DNAT-back) --> client-facing-iface
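The two directions can be modeled with a toy flow table. This is a pure-Python sketch, not VPP code: the addresses, port allocation, and hash-based AS selection are invented for illustration, and the shared dictionaries stand in for state that a real multi-core dataplane would have to partition per core or protect with locks.

```python
# Toy model of the SNAT+DNAT pipeline: forward traffic is SNATed behind
# the LB and DNATed to a chosen AS; return traffic is SNATed back to the
# VIP and DNATed back to the original client via the flow table.
VIP, LB_SNAT_IP = "203.0.113.10", "203.0.113.1"

nat = {}          # lb_port -> (client_ip, client_port)
flows = {}        # lb_port -> chosen AS IP
next_port = [10000]  # naive SNAT port allocator

def client_to_as(pkt, backends):
    """Client -> AS direction: SNAT the client, DNAT the VIP to an AS."""
    lb_port = next((p for p, c in nat.items()
                    if c == (pkt["src"], pkt["sport"])), None)
    if lb_port is None:
        # New flow: allocate an SNAT port and pick an AS (sticky per flow).
        lb_port = next_port[0]; next_port[0] += 1
        nat[lb_port] = (pkt["src"], pkt["sport"])
        flows[lb_port] = backends[lb_port % len(backends)]
    return {**pkt, "src": LB_SNAT_IP, "sport": lb_port,
            "dst": flows[lb_port]}

def as_to_client(pkt):
    """AS -> client direction: SNAT back to the VIP, DNAT back to the client."""
    client_ip, client_port = nat[pkt["dport"]]
    return {**pkt, "src": VIP, "dst": client_ip, "dport": client_port}
```

The linear scan in `client_to_as` is what a real implementation would replace with a hash lookup; it is kept here only to make the state explicit.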

Now the choice is all yours.
But don't say I didn't warn you.

Cheers,

- Pierre




Thanks,
Hongjun

From: Pierre Pfister (ppfister) [mailto:[email protected]]
Sent: Tuesday, April 25, 2017 3:12 PM
To: Zhou, Danny <[email protected]>
Cc: Ni, Hongjun <[email protected]>; Ed Warnicke <[email protected]>; Li, Johnson <[email protected]>; [email protected]
Subject: Re: [vpp-dev] Requirement on Load Balancer plugin for VPP

Hello all,

As mentioned by Ed, introducing return traffic would dramatically reduce the 
performance of the solution.
-> Return traffic typically consists of data packets, whereas forward traffic
mostly consists of ACKs. So you will have to have significantly more LB boxes
if you want to support all your return traffic.
-> Having to deal with return traffic also means that we need to either make
sure return traffic goes through the same core, add locks to the structures
(for now, everything is lockless and per-core), or steer traffic from core to core.
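One common way to keep both directions of a flow on the same core is a symmetric flow hash, which is what symmetric RSS keys achieve in NIC hardware. A rough Python illustration (the function name and hash choice are mine, not VPP's, and this ignores the complication that SNAT changes the addresses on the return path):

```python
# Sketch: hash the unordered pair of endpoints so that (a -> b) and
# (b -> a) land on the same core, avoiding cross-core locking.
import hashlib

def symmetric_flow_hash(src, sport, dst, dport, n_cores):
    # Sort the two endpoints so direction does not affect the hash.
    a, b = sorted([(src, sport), (dst, dport)])
    digest = hashlib.md5(repr((a, b)).encode()).hexdigest()
    return int(digest, 16) % n_cores
```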

There is also something I am not sure I understand. You mentioned DNAT in
order to steer the traffic to the AS, but how do you make sure the return
traffic goes back to the LB? My guess is that all the traffic coming out of
the ASs is routed toward one LB, is that right? How do you make sure the
return traffic is evenly distributed between LBs?

It's a pretty interesting requirement that you have, but I am quite sure the
solution will have to be quite far from Maglev's design, and probably less
efficient.

- Pierre


On 25 Apr 2017, at 05:11, Zhou, Danny <[email protected]> wrote:

Sharing my two cents as well:

Firstly, introducing GRE or any other tunneling protocol to the LB adds
performance overhead (for encap and decap) to both the load balancer and
the network service. Secondly, some mechanism on the network-service node not
only needs to decap the GRE but also needs to perform a DNAT operation in order
to change the destination IP of the original frame from the LB's IP to the service
entity's IP, which adds complexity to the network service.

Existing well-known load balancers such as Netfilter or Nginx do not adopt this
tunneling approach; they simply do a service-node selection followed by a
NAT operation.

-Danny

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Ni, Hongjun
Sent: Tuesday, April 25, 2017 11:05 AM
To: Ed Warnicke <[email protected]<mailto:[email protected]>>
Cc: Li, Johnson <[email protected]<mailto:[email protected]>>; 
[email protected]<mailto:[email protected]>
Subject: Re: [vpp-dev] Requirement on Load Balancer plugin for VPP

Hi Ed,

Thanks for your prompt response.

This item is required to handle legacy ASs, because some legacy ASs do not want
to change their underlay forwarding infrastructure.

Besides, some AS IPs are private and invisible outside the AS cluster domain,
and are not allowed to be exposed to the external network.

Thanks,
Hongjun

From: Ed Warnicke [mailto:[email protected]]
Sent: Tuesday, April 25, 2017 10:44 AM
To: Ni, Hongjun <[email protected]>
Cc: [email protected]; Li, Johnson <[email protected]>
Subject: Re: [vpp-dev] Requirement on Load Balancer plugin for VPP

Hongjun,

I can see this point of view, but it radically reduces the scalability of the
whole system.
Wouldn't it make more sense to run VPP or some other mechanism to decap the GRE
on whatever is running the AS and feed whatever we are
load balancing to? Forcing return traffic through the central load balancer
radically reduces scalability (which is why
Maglev, which inspired what we are doing here, doesn't do it that way either).

Ed

On Mon, Apr 24, 2017 at 7:18 PM, Ni, Hongjun <[email protected]> wrote:
Hey,

Currently, traffic received for a given VIP (or VIP prefix) is tunneled using 
GRE towards
the different ASs in a way that (tries to) ensure that a given session will
always be tunneled to the same AS.

But in real environments, many application servers do not support GRE.
So we raise a requirement for the LB in VPP:
(1) When traffic is received for a VIP, the LB needs to load-balance it, then do
DNAT to change the traffic's destination IP from the VIP to the AS's IP.
(2) When traffic returns from the AS, the LB will first do SNAT to change the
traffic's source IP from the AS's IP to the VIP, then match the load-balance
session, and then send the traffic to the client.

Any comments about this requirement are welcome.

Thanks a lot,
Hongjun


_______________________________________________
vpp-dev mailing list
[email protected]
https://lists.fd.io/mailman/listinfo/vpp-dev







--
Thomas F Herbert
Fast Data Planes
Office of Technology
Red Hat
