NTT engineer in the wings?

2018-07-15 Thread JASON BOTHE via NANOG


If there is someone listening from NTT engineering, would you kindly write 
back? 

The IP NOC is unable to locate anyone because it’s Sunday so I thought I might 
try here. 

Thanks!

J~



Re: Linux BNG

2018-07-15 Thread Raymond Burkholder

On 07/15/2018 10:56 AM, Denys Fedoryshchenko wrote:

On 2018-07-15 19:00, Raymond Burkholder wrote:

On 07/15/2018 09:03 AM, Denys Fedoryshchenko wrote:

On 2018-07-14 22:05, Baldur Norddahl wrote:
About OVS, I didn't look much at it, as I thought it was not suitable
for BNG purposes, such as terminating tens of thousands of users; I
thought it was more about high-speed switching for tens of VMs.


I would call it more of a generic all-purpose tool for customized
L2/L3/L4/L5 packet forwarding.  It works well for datacenter as well as
ISP-related scenarios, due to the wide variety of rule matching, the
encapsulations supported, and the ability to attach a customized
controller for specialized packet handling.



On edge-based translations, is hardware-based forwarding actually
necessary, since there are so many software functions being performed
anyway?
IMO, at the current moment, 20-40G on a single box is the boundary point
where packet forwarding is preferable (but still not necessary) to do in
hardware, as passing packets through the whole Linux stack is really not
the best option. But it works.
I'm trying to find an alternative solution, bypassing the full stack
using XDP, so I can go beyond 40G.


Tied to XDP is eBPF (the extended successor of the classic BPF that
makes tcpdump fast).

Another tool is P4, which provides a way to build customized SW/HW
forwarders.  But I'm not sure how applicable it is to BNG.
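For anyone who has not looked at XDP: an XDP program is a small
restricted-C (eBPF) function that the driver runs on every received
frame before the rest of the Linux stack is involved, which is where the
per-packet savings come from. A minimal illustrative sketch (not from
this thread; compile with clang -O2 -target bpf) that only counts frames
in a per-CPU map and passes them on:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int xdp_count(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&pkt_count, &key);

    if (val)
        (*val)++;        /* per-CPU slot, so no atomics needed */

    return XDP_PASS;     /* hand the frame on to the normal stack */
}

char _license[] SEC("license") = "GPL";

A real forwarding path would return XDP_TX, XDP_REDIRECT or XDP_DROP
instead of XDP_PASS, which is what keeps packets out of the kernel stack
entirely.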

--
Raymond Burkholder
r...@oneunified.net
https://blog.raymond.burkholder.net


Re: Linux BNG

2018-07-15 Thread Baldur Norddahl
On Sun, 15 Jul 2018 at 18:57, Denys Fedoryshchenko wrote:

>
> OpenFlow IMO is by nature built to do complex matching; for example, for
> a typical 12-tuple match it is 750-4000 entries max in switches. But if
> you go to L2-only matching - which, at the moment I tested, was in my
> experience possible only on the PF5820 - you can do L2-entries-only
> matching, and then it can go to 80k flows.
> But again, sticking to a specific vendor is not recommended.
>

It would be possible to implement a general forward-to-controller policy
and then upload matching on MAC address only as an offload strategy. You
would have a different device doing the layer 3 stuff. The OpenFlow switch
just adds and removes VLAN tagging based on MAC matching.
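Purely as an illustration of that split (nothing here is from the thread,
and none of it is OpenFlow API - it is just the shape of the state
involved), the controller would keep one entry per learned subscriber MAC
and push it down, so the switch only has to match the MAC and push/pop
the S-TAG/C-TAG:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One offloaded entry per subscriber CPE, keyed on its MAC address. */
struct mac_offload_entry {
    uint8_t  mac[6];        /* learned via the punt-to-controller path */
    uint16_t s_vlan;        /* outer (service) tag */
    uint16_t c_vlan;        /* inner (customer) tag */
    uint32_t access_port;   /* tagged side, towards the customer */
};

/* Downstream direction: an untagged frame arrives from the L3 device;
 * find the entry for its destination MAC so the switch knows which
 * S-TAG/C-TAG to push and which port to send it out of. */
static bool lookup_tags(const struct mac_offload_entry *table, size_t n,
                        const uint8_t dst_mac[6],
                        uint16_t *s_vlan, uint16_t *c_vlan,
                        uint32_t *out_port)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(table[i].mac, dst_mac, 6) == 0) {
            *s_vlan   = table[i].s_vlan;
            *c_vlan   = table[i].c_vlan;
            *out_port = table[i].access_port;
            return true;
        }
    }
    return false;   /* no entry yet: fall back to punting to the controller */
}

Upstream is the mirror image: match the source MAC plus both tags, strip
them, and forward towards the layer 3 device.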

Regards


Re: Linux BNG

2018-07-15 Thread Denys Fedoryshchenko

On 2018-07-15 19:00, Raymond Burkholder wrote:

On 07/15/2018 09:03 AM, Denys Fedoryshchenko wrote:

On 2018-07-14 22:05, Baldur Norddahl wrote:
I have considered OpenFlow and might do that. We have OpenFlow capable
switches and I may be able to offload the work to the switch hardware.
But I also consider this solution harder to get right than the idea of
using Linux with tap devices. Also it appears the Openvswitch
implements a different flavour of OpenFlow than the hardware switch
(the hardware is limited to some fixed tables that Broadcom made up),
so I might not be able to start with the software and then move on to
hardware.


AFAIK OpenFlow is suitable for datacenters, but it doesn't scale well
for user-termination purposes.

You will run out of TCAM much sooner than you expect.


Denys, could you expand on this?  In a Linux-based solution (say with
OVS), TCAM is memory/software based, and in following their dev
threads, they have been optimizing flow caches continuously for
various types of flows: megaflow, tiny flows, flow quantity and
variety, caching, ...

When you mate OVS with something like a Mellanox Spectrum switch (via
SwitchDev) for hardware-based forwarding, I could see certain hardware
limitations applying, but I don't have first-hand experience with that.

But I suppose you will see these TCAM issues on hardware-only,
specialized OpenFlow switches.
Yes, definitely only on hardware switches, and the biggest issue is that
it is vendor- and hardware-dependent. This means that if I find the
"right" switch and make your solution depend on it, and the vendor
decides to issue a new revision, or even new firmware, there is no
guarantee the "unusual" setup will keep working.
That is what makes many people afraid to use it.

OpenFlow IMO is by nature built to do complex matching; for example, for
a typical 12-tuple match it is 750-4000 entries max in switches. But if
you go to L2-only matching - which, at the moment I tested, was in my
experience possible only on the PF5820 - you can do L2-entries-only
matching, and then it can go to 80k flows.
But again, sticking to a specific vendor is not recommended.

About OVS, I didn't look much at it, as I thought it was not suitable
for BNG purposes, such as terminating tens of thousands of users; I
thought it was more about high-speed switching for tens of VMs.




On edge-based translations, is hardware-based forwarding actually
necessary, since there are so many software functions being performed
anyway?
IMO, at the current moment, 20-40G on a single box is the boundary point
where packet forwarding is preferable (but still not necessary) to do in
hardware, as passing packets through the whole Linux stack is really not
the best option. But it works.
I'm trying to find an alternative solution, bypassing the full stack
using XDP, so I can go beyond 40G.



But then, it may be conceivable that buying a number of servers and
load-spreading across them will provide some resiliency and come in at
a lower cost than putting in 'big iron' anyway.

Because then there are some additional benefits:  you can run Network
Function Virtualization at the edge and provide additional services to
customers.

+1

For IPoE/PPPoE, servers scale very well, while on "hardware" you will
eventually hit a limit on how many line cards you can put in a chassis,
and then you need to buy a new chassis. And that is not to mention that
some chassis have countless non-obvious internal limitations you might
hit (the fairly old Cisco 6500/7600, which is not EOL, is a nightmare
in this respect).

If an ISP has a big enough chassis, it needs to remember that it needs a
second one at the same place, preferably with the same number of line
cards, while with servers you are more resilient even with N+M
redundancy (where M is, for example, N/4).

Also, when premium customers ask me for unusual things, it is much
easier to move them to separate termination nodes with extended options,
where I can implement their demands with a custom vCPE.


Re: Linux BNG

2018-07-15 Thread Baldur Norddahl




On 15/07/2018 at 18.00, Raymond Burkholder wrote:
But I think a clarification of Baldur's speed requirements is needed.
He indicates that there are a bunch of locations:  does each of the
locations require 10G throughput, or was the throughput defined for
all sites in aggregate?  If the sites individually have smaller
throughput, the software-based boxes might do, but if that is at each
site, then software-only boxes may not handle the throughput.


We have considerably more than 10G of total traffic. We are currently 
transporting it all to one of two locations before doing the BNG 
function. We then have VRRP to enable failover to the other location. 
Transport is by MPLS and L2VPN.


I set the goal post at 10G per server. To handle more traffic we will
have multiple servers. Load balancing does not need to be dynamic. We
would just distribute the customers so each customer is always handled
by the same server. 10G per server translates to approximately 5000
customers per server (in 2018 - this number is expected to drop over
time).
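(For scale, taking those two numbers together: 10 Gbit/s / 5000
customers ≈ 2 Mbit/s average per customer, so this sizing is about
average utilisation rather than per-customer peak rates.)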


I am wondering if we could make an open source system (does not strictly 
have to be Linux) that could do the BNG function at 10G per server, with 
a server in the price range of 1k - 2k USD. For many sizes of ISP this 
would be far far cheaper than any of the solutions from Cisco, Juniper 
et al. Even if you had to get 10 servers to handle 100G you would likely 
still come out ahead of the big iron solution. And for a startup (like 
us) it is great to be able to start out with little investment and then 
let the solution grow with the business.


Regards,

Baldur



Re: Linux BNG

2018-07-15 Thread Raymond Burkholder

On 07/15/2018 09:03 AM, Denys Fedoryshchenko wrote:

On 2018-07-14 22:05, Baldur Norddahl wrote:

I have considered OpenFlow and might do that. We have OpenFlow capable
switches and I may be able to offload the work to the switch hardware.
But I also consider this solution harder to get right than the idea of
using Linux with tap devices. Also it appears the Openvswitch
implements a different flavour of OpenFlow than the hardware switch
(the hardware is limited to some fixed tables that Broadcom made up),
so I might not be able to start with the software and then move on to
hardware.


AFAIK OpenFlow is suitable for datacenters, but it doesn't scale well
for user-termination purposes.

You will run out of TCAM much sooner than you expect.


Denys, could you expand on this?  In a Linux-based solution (say with
OVS), TCAM is memory/software based, and in following their dev threads,
they have been optimizing flow caches continuously for various types of
flows: megaflow, tiny flows, flow quantity and variety, caching, ...

When you mate OVS with something like a Mellanox Spectrum switch (via
SwitchDev) for hardware-based forwarding, I could see certain hardware
limitations applying, but I don't have first-hand experience with that.

But I suppose you will see these TCAM issues on hardware-only,
specialized OpenFlow switches.


On edge-based translations, is hardware-based forwarding actually
necessary, since there are so many software functions being performed
anyway?


But I think a clarification of Baldur's speed requirements is needed.
He indicates that there are a bunch of locations:  does each of the
locations require 10G throughput, or was the throughput defined for all
sites in aggregate?  If the sites individually have smaller throughput,
the software-based boxes might do, but if that is at each site, then
software-only boxes may not handle the throughput.


But then, it may be conceivable that buying a number of servers and
load-spreading across them will provide some resiliency and come in at a
lower cost than putting in 'big iron' anyway.


Because then there are some additional benefits:  you can run Network 
Function Virtualization at the edge and provide additional services to 
customers.


I forgot to mention this in the earlier thread, but there are some
companies out there which provide devices with many ports on them and
provide compute at the same time.  So software-based Linux switches are
possible, without reverting to a combination of physical switch and
separate compute box.  In a Linux-based switch, by using IRQ affinity,
traffic from ports can be balanced across CPUs.  So by collapsing switch
and compute, additional savings might be realized.


As a couple of side notes:  1) the DPDK people support a user-space
dataplane version of OVS/OpenFlow, and 2) an eBPF version of the OVS
dataplane is being worked on.  In summary, OVS supports three current
dataplanes with a fourth on the way: 1) native kernel, 2) hardware
offload via TC (SwitchDev), 3) DPDK, 4) eBPF.


The Linux tap device has very high overhead; it is suited to no more
than working as a hotspot gateway for hundreds of users.


As does the 'veth' construct.





--
Raymond Burkholder
r...@oneunified.net
https://blog.raymond.burkholder.net


Re: Linux BNG

2018-07-15 Thread Ahad Aboss
Hi Baldur,



Based on the information you provided, the CPE connects to the POI via a
different service provider (access network provider / middleman) before
it reaches your network/POP.

With this construct, you are typically responsible for IP allocation and
session authentication via DHCP (option 82) with AAA, or via RADIUS for
PPPoE.  You may also have to deal with the S-TAG and C-TAG at BNG level.
Here are some options to consider:



Option 1.



Use RADIUS for session authentication and IP/DNS allocation to the CPE.
You can configure a BBA-GROUP on the BNG to overcome the 409x VLAN
limitation as well as to handle the S-TAG and C-TAG. A BBA-GROUP can
handle multiple sessions and is also a well-supported feature.



Here is an example of the config for your BNG (Cisco router):

===

bba-group pppoe NAME-1
 virtual-template 1
 sessions per-mac limit 2
!
bba-group pppoe NAME-2
 virtual-template 2
 sessions per-mac limit 2
!
interface GigabitEthernet1/3.100
 encapsulation dot1Q 100 second-dot1q 500-4094
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip multicast boundary 30
 pppoe enable group NAME-1
 no cdp enable
!
interface GigabitEthernet1/3.200
 encapsulation dot1Q 200 second-dot1q 200-300
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip multicast boundary 30
 pppoe enable group NAME-2
 no cdp enable

Configure Virtual templates too.

===

Option 2.



You can deploy a DHCP server using DHCP option 82 to handle all IP/IPoE
sessions.

DHCP option 82 provides you with additional flexibility that can scale as
your customer base grows. You can perform authentication using a
combination of Circuit-ID, Remote-ID, the CPE MAC address, etc.
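For reference, option 82 is just a list of TLV sub-options inside the
DHCP packet (RFC 3046), with Circuit-ID as sub-option 1 and Remote-ID as
sub-option 2. A rough sketch of extracting them (illustrative only - the
function is made up, a real BNG would hand these values to RADIUS/AAA
rather than print them, and Circuit-ID is not always printable text):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 'opt82' points at the payload of DHCP option 82 (after its tag and
 * length bytes); 'len' is that option's length. */
static void parse_option82(const uint8_t *opt82, size_t len)
{
    size_t i = 0;

    while (i + 2 <= len) {
        uint8_t sub_code = opt82[i];
        uint8_t sub_len  = opt82[i + 1];

        if (i + 2 + sub_len > len)
            break;                          /* malformed, stop parsing */

        const uint8_t *val = &opt82[i + 2];

        if (sub_code == 1)                  /* Agent Circuit ID */
            printf("circuit-id: %.*s\n", sub_len, (const char *)val);
        else if (sub_code == 2)             /* Agent Remote ID */
            printf("remote-id:  %.*s\n", sub_len, (const char *)val);

        i += 2 + (size_t)sub_len;
    }
}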



I hope this information helps.



Cheers,

Ahad


On Sat, Jul 14, 2018 at 10:13 PM, Baldur Norddahl  wrote:

> Hello
>
> I am investigating Linux as a BNG. The BNG (Broadband Network Gateway)
> being the thing that acts as default gateway for our customers.
>
> The setup is one VLAN per customer. Because 4095 VLANs is not enough, we
> have QinQ with double VLAN tagging on the customers. The customers can use
> DHCP or static configuration. DHCP packets need to be option82 tagged and
> forwarded to a DHCP server. Every customer has one or more static IP
> addresses.
>
> IPv4 subnets need to be shared among multiple customers to conserve
> address space. We are currently using /26 IPv4 subnets with 60 customers
> sharing the same default gateway and netmask. In Linux terms this means 60
> VLAN interfaces per bridge interface.
>
> However Linux is not quite ready for the task. The primary problem being
> that the system does not scale to thousands of VLAN interfaces.
>
> We do not want customers to be able to send non routed packets directly to
> each other (needs proxy arp). Also customers should not be able to steal
> another customers IP address. We want to hard code the relation between IP
> address and VLAN tagging. This can be implemented using ebtables, but we
> are unsure that it could scale to thousands of customers.
>
> I am considering writing a small program or kernel module. This would
> create two TAP devices (tap0 and tap1). Traffic received on tap0 with VLAN
> tagging, will be stripped of VLAN tagging and delivered on tap1. Traffic
> received on tap1 without VLAN tagging, will be tagged according to a lookup
> table using the destination IP address and then delivered on tap0. ARP and
> DHCP would need some special handling.
>
> This would be completely stateless for the IPv4 implementation. The IPv6
> implementation would be harder, because Link Local addressing needs to be
> supported and that can not be stateless. The customer CPE will make up its
> own Link Local address based on its MAC address and we do not know what
> that is in advance.
>
> The goal is to support traffic of minimum of 10 Gbit/s per server. Ideally
> I would have a server with 4x 10 Gbit/s interfaces combined into two 20
> Gbit/s channels using bonding (LACP). One channel each for upstream and
> downstream (customer facing). The upstream would be layer 3 untagged and
> routed traffic to our transit routers.
>
> I am looking for comments, ideas or alternatives. Right now I am
> considering what kind of CPU would be best for this. Unless I take steps to
> mitigate, the workload would probably go to one CPU core only and be
> limited to things like CPU cache and PCI bus bandwidth.
>
> Regards,
>
> Baldur
>
>


-- 
Regards,

Ahad
Swiftel Networks
"*Where the best is good enough*"


Re: Linux BNG

2018-07-15 Thread Denys Fedoryshchenko

On 2018-07-14 22:05, Baldur Norddahl wrote:

I have considered OpenFlow and might do that. We have OpenFlow capable
switches and I may be able to offload the work to the switch hardware.
But I also consider this solution harder to get right than the idea of
using Linux with tap devices. Also it appears the Openvswitch
implements a different flavour of OpenFlow than the hardware switch
(the hardware is limited to some fixed tables that Broadcom made up),
so I might not be able to start with the software and then move on to
hardware.
AFAIK OpenFlow is suitable for datacenters, but it doesn't scale well
for user-termination purposes.

You will run out of TCAM much sooner than you expect.
The Linux tap device has very high overhead; it is suited to no more
than working as a hotspot gateway for hundreds of users.


Regards,

Baldur


Re: Linux BNG

2018-07-15 Thread Denys Fedoryshchenko

On 2018-07-15 06:09, Jérôme Nicolle wrote:

Hi Baldur,

On 14/07/2018 at 14:13, Baldur Norddahl wrote:

I am investigating Linux as a BNG


As we say in France, it's like you're trying to buttfuck flies (a local
saying standing for "reinventing the wheel for no practical reason").

You can say that about the whole open-source ecosystem: why bother, if
*proprietary solution name* exists? It is an endless flamewar topic.



Linux' kernel networking stack is not made for this kind of job. 6WIND
or fd.io may be right on the spot, but it's still a lot of dark magic
for something that has been done over and over for the past 20 years by
most vendors.

And it just works.

Linux developers are continuously working to improve this; for example,
the latest feature, XDP, is able to process several Mpps on a <$1000
server. Ask yourself why Cloudflare "buttfucks flies" instead of buying
from some proprietary vendor that has been doing filtering in hardware
for 20 years:
https://blog.cloudflare.com/how-to-drop-10-million-packets/
I am doing experiments with XDP as well, to terminate PPPoE, and it is
handling that quite well.
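For a sense of what that involves at the packet level, the first step is
just recognizing the PPPoE session header before the kernel stack ever
sees the frame. This sketch is mine, not Denys's code (the struct and the
drop/pass policy are only illustrative):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct pppoe_hdr {
    __u8   ver_type;     /* version (4 bits) + type (4 bits), normally 0x11 */
    __u8   code;         /* 0x00 for session data */
    __be16 session_id;
    __be16 length;
    __be16 ppp_proto;    /* 0x0021 = IPv4, 0x0057 = IPv6 */
};

SEC("xdp")
int xdp_pppoe(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;                    /* truncated frame */

    if (eth->h_proto != bpf_htons(ETH_P_PPP_SES))
        return XDP_PASS;                    /* not PPPoE session traffic */

    struct pppoe_hdr *pppoe = (void *)(eth + 1);
    if ((void *)(pppoe + 1) > data_end)
        return XDP_DROP;

    __u16 session_id = bpf_ntohs(pppoe->session_id);

    /* A real BNG would look session_id up in a BPF map here, then either
     * decapsulate and redirect the inner IP packet or drop the frame. */
    return session_id ? XDP_PASS : XDP_DROP;
}

char _license[] SEC("license") = "GPL";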



DHCP (implying straight L2 from the CPE to the BNG) may be an option,
but most codebases are still young. PPP, on the other hand, is
field-tested for extremely large scale deployments with most vendors.

DHCP has been here at least since RFC 2131 came out in March 1997.
Quite old, isn't it?
When you stick to PPPoE, you tie yourself to extra layers of
encapsulation/decapsulation, and this seriously degrades performance, at
the _user_ level at least. With some experience developing firmware for
routers, I can tell you that hardware offloading of IPv4 routing (DHCP)
is obviously much easier and cheaper than offloading PPPoE encap/decap
plus IPv4 routing.

Also, vendors keep screwing up PPP in their routers; for example, one of
them failed to process PADO properly in its newest firmware revision.
Another problem: with PPPoE you sign up for the headache called reduced
MTU, which will also give ISP support a lot of unpleasant hours.



If I were in your shoes, and I don't say I'd want to be (my BNGs are
scaled to less than a few thousand subscribers, with 1-4 concurrent
sessions each), I'd stick to the plain old bitstream (PPP) model, with a
decent subscriber framework on my BNGs (I mostly use Juniper MXs, but I
also like Nokia's and Cisco's for some features).

I consult for operators ranging from a few hundred subscribers to
hundreds of thousands.
It is very rare that a Linux BNG doesn't suit them.



But let's say we would want to go forward and ditch legacy / proprietary
code to surf on the NFV bullshit-wave. What would you actually need?

Linux does soft-recirculation at every encapsulation level by memory
copy. You can't scale anything with that. You need to streamline
decapsulation with 6wind's turborouter or fd.io frameworks. It'll cost
you a few thousand man-hours to implement your first prototype.

6wind/fd.io are great solutions, but not suitable for the mentioned
task. They are mostly created for very tailor-made tasks, or even as the
core of some vendor's solution. Implementing your BNG on such frameworks,
or on DPDK, is really reinventing the wheel, unless you will sell it or
can save millions of US$ by doing so.



Let's say you got a working framework to treat subsequent headers on the
fly (because decapsulation is not really needed; what you want is just
to forward the payload, right?)… Well, you'd need to address
provisioning protocols on the same layers. Who would want to rebase a
DHCP server with alien packet forms incoming? I guess no one.
accel-ppp does all of that, exactly for IPoE termination, and there is
no black magic there.



Well, I could go on about the topic for hours, because I've already
spent months addressing such design issues in scalable ISP networks, and
the conclusion is:

- PPPoE is simple and proven. Its rigid structure alleviates most of the
dual-stack issues. It is well supported and largely deployed.

PPPoE has VERY serious flaws.
1) The security of PPPoE sucks big time. Anybody who runs a rogue PPPoE
server in your network will create a significant headache for you, while
with DHCP you at least have "DHCP snooping". DHCP snooping is supported
in very many vendors' switches, while for PPPoE most of them have
nothing, unless... you stick each user in his own VLAN.
Why PPPoX them then?
2) DHCP can send circuit information in Option 82; this is very useful
for billing and very cost-efficient at the last stage of access switches.
3) Modern FTTx (GPON) solutions are built with QinQ in mind, so IPoE fits
there flawlessly.


- DHCP requires hacks (in the form of undocumented options from several
vendors) to seemingly work on IPv4, but the multicast boundaries for NDP
are a PITA to handle, so no one has implemented that properly yet. So it
is to be avoided for now.
While you can do multicast (mostly for IPTV; yes, it is not easy, and it
needs some vendor magic) on the "native" layer (DHCP), with PPP you can
forget about multicast entirely.


Re: Linux BNG

2018-07-15 Thread James Bensley
Hi Baldur,

These guys made a PPPoE client for VPP - you could probably extend
that into a PPP server:

https://lists.fd.io/g/vpp-dev/message/9181
https://github.com/raydonetworks/vpp-pppoeclient

Although, I would agree that deploying PPP now is a bit of a step
backwards and IPoE is the way to be doing this in 2018.

If you want subscribers with an S-TAG/C-TAG landing in unique virtual
interfaces with a shared gateway etc., IPv4 + IPv6 (DHCP/v6), and you
were deploying this on "real service provider networking kit" [1], then
the way to do this is with pseudowire headend termination (PWHE/PWHT).
However, you're going to struggle to implement something like PWHT on
the native Linux networking stack. Many of the features you want exist
in Linux, like DHCP/v6, IPv4/6, MPLS, LDP, pseudowires etc., but not all
together as a combined service offering. My two pence would be to buy
kit from someone like Cisco or Juniper, as I don't think the open source
world is quite there yet. Alternatively, if it *must* be Linux, look at
adding the code to https://wiki.fd.io/view/VPP/Features as it has all
the constituent parts (DHCP, IP, MPLS, bridges etc.) but not glued
together. VPP is an order of magnitude faster than the native kernel
networking stack. I'd be shocked if you could do all that you wanted to
do at 10Gbps line rate with one CPU core.

Cheers,
James.

[1] Which means the expensive stuff big name vendors like Cisco and Juniper sell


Re: (perhaps off topic, but) Microwave Towers

2018-07-15 Thread Wayne Bouchard
I was going to say... in my experience (I've been to a lot of the
Arizona electronics sites, having grown up around broadcasting) that
most of the microwave equipment in use was for Bell. That was by far
the most populous tower on any mountain top. The broadcasters don't
send their signals anywhere except from downtown to the transmitter,
or in some cases from the big town to a small town to feed a local
low-power transmitter (like 5 kW VHF as opposed to the normal 100 kW).
Anything else was satellite. I know the railroad did some wireless
(Sprint's towers were also quite densely packed with directional
horns), but a lot of their communication for rail signaling was
hardwired as far as I was aware.

-Wayne

On Sat, Jul 14, 2018 at 12:20:34PM -0500, frnk...@iname.com wrote:
> Is it possibly AT&T's old network?
> https://99percentinvisible.org/article/vintage-skynet-atts-abandoned-long-lines-microwave-tower-network/
> http://long-lines.net/places-routes/
> 
> This network runs through our service territory, too.  The horns are 
> distinctive.  
> 
> Frank
> 
> -Original Message-
> From: NANOG  On Behalf Of Miles Fidelman
> Sent: Saturday, July 14, 2018 9:54 AM
> To: nanog@nanog.org
> Subject: (perhaps off topic, but) Microwave Towers
> 
> Hi Folks,
> 
> I find myself driving down Route 66.  On our way through Arizona, I was 
> surprised by what look like a lot of old-style microwave links.  They 
> pretty much follow the East-West rail line - where I'd expect there's a 
> lot of fiber buried.
> 
> Struck me as somewhat interesting.
> 
> It also struck me that folks here might have some comments.
> 
> Miles Fidelman
> 
> -- 
> In theory, there is no difference between theory and practice.
> In practice, there is.   Yogi Berra
> 
> 
> 

---
Wayne Bouchard
w...@typo.org
Network Dude
http://www.typo.org/~web/


Re: (perhaps off topic, but) Microwave Towers

2018-07-15 Thread Radu-Adrian Feurdean
On Sat, Jul 14, 2018, at 17:07, Keith Stokes wrote:
> There’s a lot less backhoe fade with microwave. ;-)
> 
> Kidding aside, I’m sure there are plenty of scenarios where microwave 
> makes better sense than fiber especially since it’s a lot easier to 

HFT or any low-latency app is such a scenario (0.999c through air being
50% faster than 0.66c through fiber), but that region doesn't fit for
that use.
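(As a quick check on that figure: 0.999c / 0.66c ≈ 1.5, so over the same
path length the signal in fiber takes roughly 50% longer than the signal
travelling through the air.)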