Re: Sunday traffic curiosity

2020-03-22 Thread Mark Tinka



On 23/Mar/20 05:05, Aaron Gould wrote:
> I can see it now... Business driver that moved the world towards multicast: 
> 2020 Coronavirus

Hehe, the Coronavirus has only accelerated and amplified what was
already coming - the new economy.

You're constantly hearing about "changing business models" or "needing
to be agile" in the face of the Coronavirus. A few businesses saw this
and switched from product to value, years ago. Many feel it in their
bones but are too afraid to make the switch. The rest don't see it at
all and are wondering what the heck is going on. I empathize.


> Also, I wonder how much money would be lost by big pipe providers with 
> multicast working everywhere

Money is going to be lost, for sure, but not to Multicast. That tech is
dead & gone.

Mark.


Re: Sunday traffic curiosity

2020-03-22 Thread Hugo Slabbert
>
> But that’s already happening. All big content providers are doing just
> that. They even sponsor you the appliance(s) to make more money and save on
> transit costs ;)


Noted; this was a comment on what's already the case, not a proposal for
how to address it instead.  Apologies as I used poor phrasing here.

-- 
Hugo Slabbert   | email, xmpp/jabber: h...@slabnet.com
pgp key: B178313E   | also on Signal


On Sun, Mar 22, 2020 at 6:45 PM Łukasz Bromirski 
wrote:

> Hugo,
>
> > On 23 Mar 2020, at 01:32, Hugo Slabbert  wrote:
> >
> > I think that's the thing:
> > Drop cache boxes inside eyeball networks; fill the caches during
> off-peak; unicast from the cache boxes inside the eyeball provider's
> network to subscribers.  Do a single stream from source to each
> "replication point" (cache box) rather than a stream per ultimate receiver
> from the source, then a unicast stream per ultimate receiver from their
> selected "replication point".  You solve the administrative control problem
> since the "replication point" is an appliance just getting power &
> connectivity from the connectivity service provider, with the appliance
> remaining under the administrative control of the content provider.
>
> But that’s already happening. All big content providers are doing just
> that. They even sponsor you the appliance(s) to make more money and save on
> transit costs ;)
>
> —
> ./


Re: Sunday traffic curiosity

2020-03-22 Thread Owen DeLong



> On Mar 22, 2020, at 15:49 , Mark Tinka  wrote:
> 
> 
> 
> On 23/Mar/20 00:19, Randy Bush wrote:
> 
>> 
>> add to that it is the TV model in a VOD world.  works for sports, maybe,
>> not for netflix
> 
> Agreed - on-demand is the new economy, and sport is the single thing
> still propping up the old economy.
> 
> When sport eventually makes it into the new world, linear TV will have
> lost its last legs.

How do you see that happening? Are people going to stop wanting to watch live,
or are teams going to somehow play asynchronously (e.g. Lakers vs. Celtics,
the Lakers play on November 5 at 6 PM and the Celtics play on November 8
at 11 AM)?

Further, it would be more accurate to say that events with large live audiences
are the only thing propping up the “old economy” and sport is probably by far
the largest current application of live streaming.

Remember, this discussion started with a question about live-streaming church
services.

In the “new normal” of a COVID lockdown world, with the huge increase in
teleconferencing, etc. there may well be additional audiences for many-to-many
multicast that aren’t currently implemented.

IMO, the only sane way to do this also helps solve the v4/v6 conferencing
question.

Local Aggregation Points (LAPs) are anycast customer terminations. The backbone
between LAPs is IPv6-only and supports IPv6 multicast (intra-domain only). LAPs
do not share routing table space with backbone routers. Likely some tunnel
mechanism is used to link the LAPs to each other, to shield backbone routers
from multicast state information.

Each “session” (whether an individual chat, group chat, etc.) gets a unique
IPv6 multicast group. Each LAP with at least one user logged into a given
session will join that multicast group across the backbone. Users connect to
LAPs via unicast. If voice, video, slide, and chat streams need to be
separated, use different port numbers to do that.
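
As an illustrative aside (not part of Owen's proposal), here is a minimal
sketch of how each session could be mapped deterministically onto a unique
IPv6 multicast group; the ff15::/16 transient-scope prefix, the SHA-256
derivation, and the session_group() helper are assumptions for illustration
only.

# Sketch only: derive a stable IPv6 multicast group per conferencing "session"
# so every LAP with a logged-in user can join the same group over the backbone.
import hashlib
import ipaddress

MCAST_PREFIX = int(ipaddress.IPv6Address("ff15::"))  # assumed transient scope

def session_group(session_id: str) -> ipaddress.IPv6Address:
    """Map a session identifier to a stable 112-bit group ID under ff15::/16."""
    digest = hashlib.sha256(session_id.encode()).digest()
    group_id = int.from_bytes(digest[:14], "big")  # 112 bits of group ID
    return ipaddress.IPv6Address(MCAST_PREFIX | group_id)

# Each LAP with at least one participant in "board-meeting-42" would join this
# group across the IPv6-only backbone; users still reach their LAP via unicast.
print(session_group("board-meeting-42"))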

IPv4 NAT traversal for the IPv4-only clients is left as an exercise for the
reader.

Owen



Re: Sunday traffic curiosity

2020-03-22 Thread Owen DeLong



> On Mar 22, 2020, at 13:41 , Alexandre Petrescu  
> wrote:
> 
> 
> Le 22/03/2020 à 21:31, Nick Hilliard a écrit :
>> Grant Taylor via NANOG wrote on 22/03/2020 19:17:
>>> What was wrong with Internet scale multicast?  Why did it get abandoned?
>> 
>> there wasn't any problem with inter-domain multicast that couldn't be 
>> resolved by handing over to level 3 engineering and the vendor's support 
>> escalation team.
>> 
>> But then again, there weren't many problems with inter-domain multicast that 
>> could be resolved without handing over to level 3 engineering and the 
>> vendor's support escalation team.
>> 
>> Nick
> 
> For my part, I speculate multicast did not take off at any level (inter-domain,
> intra-domain) because pipes grew larger (more bandwidth) faster than usage ever
> needed.  Even now, I don't hear about bandwidth problems from some end users,
> like friends using Netflix.  I do hear in the media that there _might_ be a
> capacity issue, but I did not hear that from end users.
>
> On the other hand, link-local multicast does seem to work ok, at least with
> IPv6.  The problem it solves there is not related to the width of the pipe,
> but more to resistance against the 'storms' that were witnessed during ARP
> storms.  I could guess that Ethernet pipes are now so large they could
> accommodate many forms of ARP storms, but for one reason or another IPv6 ND
> has multicast and no broadcast.  It might even be a problem in the name, in
> that it is named 'IPv6 multicast ND' but the underlying mechanism is often
> implemented with pure broadcast and local filters.

In most cases, though they are “local” filters, they are filters at the
hardware interface level and don’t bother the OS, so… WIN!

Also, in most cases, the solicited-node address will likely be representative
of an extremely small number of nodes on the network (very likely 1), so the
number of CPUs that have to look at each NS packet is greatly reduced… WIN!
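
For reference, a small sketch of the solicited-node derivation from RFC 4291:
only the low 24 bits of the unicast address select the ff02::1:ffXX:XXXX group,
which is why a given group usually maps to a single node on the link. The
example address and the solicited_node() helper are made up for illustration.

# Sketch: derive the IPv6 solicited-node multicast group (ff02::1:ffXX:XXXX)
# from a unicast address, per RFC 4291.
import ipaddress

def solicited_node(addr: str) -> ipaddress.IPv6Address:
    unicast = int(ipaddress.IPv6Address(addr))
    low24 = unicast & 0xFFFFFF                       # last 24 bits of the address
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

print(solicited_node("2001:db8::1:2345:6789"))       # -> ff02::1:ff45:6789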

Reducing the network traffic to just the ports that need to receive it is a
pretty small win, but reducing the number of CPUs that have to look at it and
determine “Nope, not for me.” is a relatively larger win. If the host properly
implements IGMP joins and the switch does correct IGMP snooping, we get both.
If not, we still get the CPU win.

> If the capacity is reached and end users need more, then there are two
> alternative solutions: grow unicast capacity (e.g. 1 Tb/s Ethernet) or
> multicast; it's useless to do both.  If we can't do 1 Tb/s Ethernet (the
> 'apocalypse', as some called it?), then we'll do multicast.

My inter-domain multicast comment was largely tongue-in-cheek and was never
intended as a serious proposal.

There are a plethora of issues with inter-domain multicast, including, but not
limited to, the fact that it’s a great way to invite smurf-style attacks (after
all, smurfing is what mcast groups are intended to do).

Owen



Re: Sunday traffic curiosity

2020-03-22 Thread james jones
I know Facebook Live had some congestion/capacity issues in some geographical
regions this AM.

Sent from my iPhone

> On Mar 22, 2020, at 2:59 PM, Andy Ringsmuth  wrote:
> 
> Fellow NANOGers,
> 
> Not a big deal by any means, but for those of you who have traffic data, I’m 
> curious what Sunday morning looked like as compared to other Sundays. Sure, 
> Netflix and similar companies have no doubt seen traffic increase, but I’m 
> wondering if an influx of church service streaming was substantial enough to 
> cause a noticeable traffic increase.
> 
> We livestream our services and have been for about a year or so, but normally 
> average just a handful of viewers. Today, we were around 150 watching live.
> 
> 
> Andy Ringsmuth
> a...@andyring.com
> 


RE: Sunday traffic curiosity

2020-03-22 Thread Aaron Gould
I can see it now... Business driver that moved the world towards multicast: 
2020 Coronavirus

Also, I wonder how much money would be lost by big pipe providers with 
multicast working everywhere

-Aaron

-Original Message-
From: NANOG [mailto:nanog-boun...@nanog.org] On Behalf Of Alexandre Petrescu
Sent: Sunday, March 22, 2020 3:41 PM
To: nanog@nanog.org
Subject: Re: Sunday traffic curiosity


Le 22/03/2020 à 21:31, Nick Hilliard a écrit :
> Grant Taylor via NANOG wrote on 22/03/2020 19:17:
>> What was wrong with Internet scale multicast?  Why did it get abandoned?
>
> there wasn't any problem with inter-domain multicast that couldn't be 
> resolved by handing over to level 3 engineering and the vendor's 
> support escalation team.
>
> But then again, there weren't many problems with inter-domain 
> multicast that could be resolved without handing over to level 3 
> engineering and the vendor's support escalation team.
>
> Nick

For my part, I speculate multicast did not take off at any level (inter-domain,
intra-domain) because pipes grew larger (more bandwidth) faster than usage
ever needed.  Even now, I don't hear about bandwidth problems from some end
users, like friends using Netflix.  I do hear in the media that there _might_
be a capacity issue, but I did not hear that from end users.

On the other hand, link-local multicast does seem to work ok, at least with
IPv6.  The problem it solves there is not related to the width of the pipe,
but more to resistance against the 'storms' that were witnessed during ARP
storms.  I could guess that Ethernet pipes are now so large they could
accommodate many forms of ARP storms, but for one reason or another IPv6 ND
has multicast and no broadcast.  It might even be a problem in the name, in
that it is named 'IPv6 multicast ND' but the underlying mechanism is often
implemented with pure broadcast and local filters.

If the capacity is reached and end users need more, then there are two
alternative solutions: grow unicast capacity (e.g. 1 Tb/s Ethernet) or
multicast; it's useless to do both.  If we can't do 1 Tb/s Ethernet (the
'apocalypse', as some called it?), then we'll do multicast.

I think,

Alex, LF/HF 3




Re: Sunday traffic curiosity

2020-03-22 Thread Łukasz Bromirski
Hugo,

> On 23 Mar 2020, at 01:32, Hugo Slabbert  wrote:
> 
> I think that's the thing:
> Drop cache boxes inside eyeball networks; fill the caches during off-peak; 
> unicast from the cache boxes inside the eyeball provider's network to 
> subscribers.  Do a single stream from source to each "replication point" 
> (cache box) rather than a stream per ultimate receiver from the source, then 
> a unicast stream per ultimate receiver from their selected "replication 
> point".  You solve the administrative control problem since the "replication 
> point" is an appliance just getting power & connectivity from the 
> connectivity service provider, with the appliance remaining under the 
> administrative control of the content provider.

But that’s already happening. All big content providers are doing just that. 
They even sponsor you the appliance(s) to make more money and save on transit 
costs ;)

— 
./

Frontier Pennsylvania

2020-03-22 Thread Matt Hoppes
Does anyone have a Frontier Central PA OSP contact?

There is a line that has been down for over 8 months that I have been unable to 
get them to hang. 

It is across a driveway and roadway. 

Re: Sunday traffic curiosity

2020-03-22 Thread Hugo Slabbert
I think that's the thing:
Drop cache boxes inside eyeball networks; fill the caches during off-peak;
unicast from the cache boxes inside the eyeball provider's network to
subscribers.  Do a single stream from source to each "replication point"
(cache box) rather than a stream per ultimate receiver from the source,
then a unicast stream per ultimate receiver from their selected
"replication point".  You solve the administrative control problem since
the "replication point" is an appliance just getting power & connectivity
from the connectivity service provider, with the appliance remaining under
the administrative control of the content provider.
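
As an illustrative aside (made-up numbers, not Hugo's figures), a quick
back-of-envelope of the stream-count reduction this replication model buys:

# Sketch only, with invented numbers: compare source egress with and without
# in-network caches ("replication points").
viewers_per_eyeball_network = 50_000
eyeball_networks = 200
stream_mbps = 5

# Without caches: every viewer pulls a unicast stream all the way from the source.
source_streams_no_cache = viewers_per_eyeball_network * eyeball_networks
source_gbps_no_cache = source_streams_no_cache * stream_mbps / 1000

# With caches: the source fills one replication point per eyeball network
# (off-peak for VOD); viewers are then served unicast from inside that network.
source_streams_with_cache = eyeball_networks
source_gbps_with_cache = source_streams_with_cache * stream_mbps / 1000

print(f"source egress without caches: {source_gbps_no_cache:,.0f} Gbps")
print(f"source egress with caches:    {source_gbps_with_cache:,.0f} Gbps")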

It seems to be good enough to support business models pulling in billions
of dollars a year.

This does require the consumption of the media to be decoupled from the
original distribution of the content to the cache, obviously, hence the
live sports mismatch.  But it seems this catches enough of the use cases
and bandwidth demands, and to have won the "good enough" battle vs.
inter-domain multicast.

I would venture there are large percentage increases now in realtime use
cases as Zoom & friends take off more, but the bulk of the anecdotal
evidence thus far seems to indicate that absolute traffic levels largely
remain below historical peaks from exceptional events (large international
content distribution events).

-- 
Hugo Slabbert   | email, xmpp/jabber: h...@slabnet.com
pgp key: B178313E   | also on Signal


On Sun, Mar 22, 2020 at 3:51 PM Mark Tinka  wrote:

>
>
> On 23/Mar/20 00:19, Randy Bush wrote:
>
> >
> > add to that it is the TV model in a VOD world.  works for sports, maybe,
> > not for netflix
>
> Agreed - on-demand is the new economy, and sport is the single thing
> still propping up the old economy.
>
> When sport eventually makes it into the new world, linear TV will have
> lost its last legs.
>
> Mark.
>


Re: Sunday traffic curiosity

2020-03-22 Thread Mark Tinka



On 23/Mar/20 00:19, Randy Bush wrote:

>
> add to that it is the TV model in a VOD world.  works for sports, maybe,
> not for netflix

Agreed - on-demand is the new economy, and sport is the single thing
still propping up the old economy.

When sport eventually makes it into the new world, linear TV will have
lost its last legs.

Mark.


Re: Sunday traffic curiosity

2020-03-22 Thread Mark Tinka


On 22/Mar/20 23:36, Valdis Klētnieks wrote:

> It failed to scale for some of the exact same reasons QoS failed to scale -
> what works inside one administrative domain doesn't work once it crosses 
> domain
> boundaries.

This, for me, is one of the biggest reasons I feel inter-AS Multicast
does not work. Can you imagine trying to troubleshoot issues between two
or more separate networks?

At $previous_job, we carried and delivered IPTV streams from a head-end
that was under the domain of the broadcasting company. Coordination of
feed ingestion, etc. got so complicated that we ended up agreeing to
take full management of the CE router. That isn't something you can
always expect; it worked for us because this was the first time it was
being done in the country.


>
> Plus, there's a lot more state to keep - if you think spanning tree gets ugly
> if the tree gets too big, think about what happens when the multicast covers
> 3,000 people in 117 ASN's, with people from multiple ASN's joining and leaving
> every few seconds.

We ran NG-MVPN which created plenty of RSVP-TE state in the core.

The next move was to migrate to mLDP, just to simplify state management.
I'm not sure if the company ever did, as I had to leave.

Mark.





Re: Sunday traffic curiosity

2020-03-22 Thread Randy Bush
> It failed to scale for some of the exact same reasons QoS failed to
> scale - what works inside one administrative domain doesn't work once
> it crosses domain boundaries.
> 
> Plus, there's a lot more state to keep - if you think spanning tree
> gets ugly if the tree gets too big, think about what happens when the
> multicast covers 3,000 people in 117 ASN's, with people from multiple
> ASN's joining and leaving every few seconds.

add to that it is the TV model in a VOD world.  works for sports, maybe,
not for netflix

randy


Re: Sunday traffic curiosity

2020-03-22 Thread Valdis Klētnieks
On Sun, 22 Mar 2020 13:17:59 -0600, Grant Taylor via NANOG said:

> As someone who 1) wasn't around during the last Internet-scale foray
> into multicast and 2) is working with multicast in a closed environment,
> I'm curious:
>
> What was wrong with Internet scale multicast?  Why did it get abandoned?

It failed to scale for some of the exact same reasons QoS failed to scale -
what works inside one administrative domain doesn't work once it crosses domain
boundaries.

Plus, there's a lot more state to keep - if you think spanning tree gets ugly
if the tree gets too big, think about what happens when the multicast covers
3,000 people in 117 ASN's, with people from multiple ASN's joining and leaving
every few seconds.





Re: Sunday traffic curiosity

2020-03-22 Thread Matthias Waehlisch


On Sun, 22 Mar 2020, John Kristoff wrote:

> On Sun, 22 Mar 2020 19:17:59 +
> Grant Taylor via NANOG  wrote:
> 
> > What was wrong with Internet scale multicast?  Why did it get abandoned?
> 
> There are about 20 years of archives to weed through, 
>
  most of the challenges, in particular incentive aspects, have been 
nicely discussed in "Deployment issues for the IP multicast service and 
architecture," IEEE Network 2000: 
https://www.cl.cam.ac.uk/teaching/1314/R02/papers/multicastdeploymentissues.pdf


Cheers
  matthias

-- 
Matthias Waehlisch
.  Freie Universitaet Berlin, Computer Science
.. http://www.cs.fu-berlin.de/~waehl


Re: Sunday traffic curiosity

2020-03-22 Thread Saku Ytti
On Sun, 22 Mar 2020 at 22:43, Alexandre Petrescu
 wrote:

> On the other hand, link-local multicast does seem to work ok, at least
> with IPv6.  The problem it solves there is not related to the width of
> the pipe, but more to resistance against 'storms' that were witnessed
> during ARP storms.  I could guess that Ethernet pipes are now so large

This is a case where the cure is far worse than the poison. People do
not run IPv6 ND like this, because you can't scale it. It would be
trivial for anyone in the LAN to exhaust the multicast state on the L2
switch. It is entirely uneconomical to build an L2 switch that could
support all the mcast groups ND could need, so those do not exist
today; the defensive configuration floods the ND frames, just the same
as ARP.

You also cannot scale interdomain multicast (BIER is trying to solve
this), because every (S,G) flow needs to be programmed in HW with a
list of egress entries; this is very expensive to store and very
expensive to look up, as it is flow routing. Lookup speeds today are
already limited not by silicon but by memory access, and the scale of
the problem is much, much smaller (and bounded) in ucast.
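
As an illustrative aside (a toy model, not router code), the state asymmetry
described above can be sketched like this; the interface names and counts are
invented:

# Multicast forwarding state is per-flow: every (S,G) carries its own egress
# list, whereas one unicast prefix entry covers any number of flows.
from collections import defaultdict

# Unicast FIB: prefix -> one next-hop, shared by all flows towards that prefix.
unicast_fib = {"2001:db8::/32": "et-0/0/0"}

# Multicast state: (S,G) -> set of egress interfaces, one entry per active flow.
mcast_state = defaultdict(set)

def mcast_join(source, group, egress_interface):
    mcast_state[(source, group)].add(egress_interface)

for s in range(100):          # 100 sources...
    for g in range(1_000):    # ...sending to 1,000 groups each
        mcast_join(f"src{s}", f"ff3e::{g:x}", f"et-0/0/{g % 48}")

print(f"unicast FIB entries: {len(unicast_fib)}, "
      f"multicast (S,G) entries: {len(mcast_state)}")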

-- 
  ++ytti


Re: Sunday traffic curiosity

2020-03-22 Thread Alexandre Petrescu



Le 22/03/2020 à 21:31, Nick Hilliard a écrit :

Grant Taylor via NANOG wrote on 22/03/2020 19:17:

What was wrong with Internet scale multicast?  Why did it get abandoned?


there wasn't any problem with inter-domain multicast that couldn't be 
resolved by handing over to level 3 engineering and the vendor's 
support escalation team.


But then again, there weren't many problems with inter-domain 
multicast that could be resolved without handing over to level 3 
engineering and the vendor's support escalation team.


Nick


For my part, I speculate multicast did not take off at any level (inter-domain,
intra-domain) because pipes grew larger (more bandwidth) faster than usage
ever needed.  Even now, I don't hear about bandwidth problems from some end
users, like friends using Netflix.  I do hear in the media that there _might_
be a capacity issue, but I did not hear that from end users.


On the other hand, link-local multicast does seem to work ok, at least with
IPv6.  The problem it solves there is not related to the width of the pipe,
but more to resistance against the 'storms' that were witnessed during ARP
storms.  I could guess that Ethernet pipes are now so large they could
accommodate many forms of ARP storms, but for one reason or another IPv6 ND
has multicast and no broadcast.  It might even be a problem in the name, in
that it is named 'IPv6 multicast ND' but the underlying mechanism is often
implemented with pure broadcast and local filters.


If the capacity is reached and end users need more, then there are two
alternative solutions: grow unicast capacity (e.g. 1 Tb/s Ethernet) or
multicast; it's useless to do both.  If we can't do 1 Tb/s Ethernet (the
'apocalypse', as some called it?), then we'll do multicast.


I think,

Alex, LF/HF 3



Re: Sunday traffic curiosity

2020-03-22 Thread Nick Hilliard

Grant Taylor via NANOG wrote on 22/03/2020 19:17:

What was wrong with Internet scale multicast?  Why did it get abandoned?


there wasn't any problem with inter-domain multicast that couldn't be 
resolved by handing over to level 3 engineering and the vendor's support 
escalation team.


But then again, there weren't many problems with inter-domain multicast 
that could be resolved without handing over to level 3 engineering and 
the vendor's support escalation team.


Nick


Re: Sunday traffic curiosity

2020-03-22 Thread John Kristoff
On Sun, 22 Mar 2020 19:17:59 +
Grant Taylor via NANOG  wrote:

> What was wrong with Internet scale multicast?  Why did it get abandoned?

There are about 20 years of archives to weed through, and some of our
friends are still trying to make this happen.  I expect someone (Hi
Lenny) to appear any moment and mention AMT.  So my take isn't
universally accepted, but it won't be too far from what you'll hear
from many. Brief summary off the top of my head:

1. Complexity.  Both in protocol mechanisms and the requirements in
network devices (i.e. snooping, state, troubleshooting).

2. Security. Driven in part by #1, threats abound.  SSM can eliminate
some of this, and you can design a receiver-only model that removes most
of the remaining problems - congratulations, you just reinvented over-the-air
broadcast TV.  Even if you don't do interdomain IP multicast,
you still may be at risk:

  

3. Need and business drivers.  Still far from compelling to build and
support all this to make it worthwhile for all but a few niche
environments.

Support and expertise in this area is also very thin.  Your inquiry
demonstrates this.  I stopped teaching it to students.  What remains is
becoming even less well supported than it has been.  There is almost no
interdomain IP multicast monitoring being done anymore.  There is scant
actual content being delivered; all the once-popular stuff is gone.
The number of engineers who know this stuff is dwindling, and some who
do know something about it are removing at least some parts of it:

  

John


Re: Sunday traffic curiosity

2020-03-22 Thread Mark Tinka



On 22/Mar/20 20:57, Andy Ringsmuth wrote:
> Fellow NANOGers,
>
> Not a big deal by any means, but for those of you who have traffic data, I’m 
> curious what Sunday morning looked like as compared to other Sundays. Sure, 
> Netflix and similar companies have no doubt seen traffic increase, but I’m 
> wondering if an influx of church service streaming was substantial enough to 
> cause a noticeable traffic increase.
>
> We livestream our services and have been for about a year or so, but normally 
> average just a handful of viewers. Today, we were around 150 watching live.

Our Sunday morning, today, was not our highest peak since the 17th. Our
highest peak since the 17th was yesterday (Saturday morning), at around
0900hrs UTC.

Peak increase since the 17th went to 15%. Saturday morning was at 17.5%.

Mark.


Re: Sunday traffic curiosity

2020-03-22 Thread Matt Hoppes

We didn't really see a noticeable inbound or outbound traffic change.

But we also streamed and had 80+ people watching online, so there was 
absolutely a traffic shift.


Still, Sunday mornings are normally low-traffic periods anyway, so the
overall traffic "dent" was minimal.


Re: Sunday traffic curiosity

2020-03-22 Thread Saku Ytti
On Sun, 22 Mar 2020 at 21:20, Grant Taylor via NANOG  wrote:

> What was wrong with Internet scale multicast?  Why did it get abandoned?

It is flow-based routing; we do not have a solution to store and
look up large amounts of flows.

-- 
  ++ytti


Re: Sunday traffic curiosity

2020-03-22 Thread Grant Taylor via NANOG

On 3/22/20 1:11 PM, John Kristoff wrote:

Owen DeLong  wrote:

Maybe it’s time to revisit inter-domain multicast?


Uhmm... no thank you.  :-)


As someone who 1) wasn't around during the last Internet-scale foray 
into multicast and 2) is working with multicast in a closed environment, 
I'm curious:


What was wrong with Internet scale multicast?  Why did it get abandoned?



--
Grant. . . .
unix || die





Re: Sunday traffic curiosity

2020-03-22 Thread nanog
We are still too far away from the apocalypse to realistically think about
inter-domain multicast.
And even if we were ..


On 3/22/20 8:08 PM, Owen DeLong wrote:
> Maybe it’s time to revisit inter-domain multicast?
> 
> Owen
> 
> 
>> On Mar 22, 2020, at 11:57 , Andy Ringsmuth  wrote:
>>
>> Fellow NANOGers,
>>
>> Not a big deal by any means, but for those of you who have traffic data, I’m 
>> curious what Sunday morning looked like as compared to other Sundays. Sure, 
>> Netflix and similar companies have no doubt seen traffic increase, but I’m 
>> wondering if an influx of church service streaming was substantial enough to 
>> cause a noticeable traffic increase.
>>
>> We livestream our services and have been for about a year or so, but 
>> normally average just a handful of viewers. Today, we were around 150 
>> watching live.
>>
>> 
>> Andy Ringsmuth
>> a...@andyring.com
>>
> 



Re: Sunday traffic curiosity

2020-03-22 Thread John Kristoff
On Sun, 22 Mar 2020 19:08:24 +
Owen DeLong  wrote:

> Maybe it’s time to revisit inter-domain multicast?

Uhmm... no thank you.  :-)

John


Re: Sunday traffic curiosity

2020-03-22 Thread Owen DeLong
Maybe it’s time to revisit inter-domain multicast?

Owen


> On Mar 22, 2020, at 11:57 , Andy Ringsmuth  wrote:
> 
> Fellow NANOGers,
> 
> Not a big deal by any means, but for those of you who have traffic data, I’m 
> curious what Sunday morning looked like as compared to other Sundays. Sure, 
> Netflix and similar companies have no doubt seen traffic increase, but I’m 
> wondering if an influx of church service streaming was substantial enough to 
> cause a noticeable traffic increase.
> 
> We livestream our services and have been for about a year or so, but normally 
> average just a handful of viewers. Today, we were around 150 watching live.
> 
> 
> Andy Ringsmuth
> a...@andyring.com
> 



Sunday traffic curiosity

2020-03-22 Thread Andy Ringsmuth
Fellow NANOGers,

Not a big deal by any means, but for those of you who have traffic data, I’m 
curious what Sunday morning looked like as compared to other Sundays. Sure, 
Netflix and similar companies have no doubt seen traffic increase, but I’m 
wondering if an influx of church service streaming was substantial enough to 
cause a noticeable traffic increase.

We livestream our services and have been for about a year or so, but normally 
average just a handful of viewers. Today, we were around 150 watching live.


Andy Ringsmuth
a...@andyring.com



Re: interesting troubleshooting

2020-03-22 Thread Mark Tinka



On 22/Mar/20 19:17, Saku Ytti wrote:

> You don't need both. My rule of thumb, green field, go with entropy
> and get all the services in one go. Brown field, go FAT, and target
> just PW, ensure you also have CW, then let transit LSR balance
> MPLS-IP. With entropy label you can entirely disable transit LSR
> payload heuristics.

We moved to our current strategy back in 2015/2016, after running
through multiple combinations of FAT and entropy.

I'm curious to give it another go in 2020, but if I'm honest, I'm
pleased with the simplicity of our current setup.

Mark.


Re: interesting troubleshooting

2020-03-22 Thread Saku Ytti
On Sun, 22 Mar 2020 at 16:25, Mark Tinka  wrote:

> So the latter. We used both FAT + entropy to provide even load balancing
> of l2vpn payloads in the edge and core, with little success.

You don't need both. My rule of thumb, green field, go with entropy
and get all the services in one go. Brown field, go FAT, and target
just PW, ensure you also have CW, then let transit LSR balance
MPLS-IP. With entropy label you can entirely disable transit LSR
payload heuristics.

-- 
  ++ytti


Re: interesting troubleshooting

2020-03-22 Thread Mark Tinka



On 22/Mar/20 11:52, Saku Ytti wrote:

> So you're not even talking about multivendor, as both ends are JNPR?
> Or are you confusing entropy label with FAT?

Some cases were MX480 to ASR920, but most were MX480 to MX480, either
transiting CRS.


>
> Transit doesn't know anything about FAT, FAT is PW specific and is
> only signalled between end-points. Entropy label applies to all
> services and is signalled to adjacent device. Transit just sees 1
> label longer label stack, with hope (not promise) that transit uses
> the additional label for hashing.

So the latter. We used both FAT + entropy to provide even load balancing
of l2vpn payloads in the edge and core, with little success.



> You really should be doing CW+FAT.

Yeah - just going back to basics with ECMP worked well, and I'd prefer
to use solutions that are as unexotic as possible.


> And looking at your other email, dear
> god, don't do per-packet outside some unique application where you
> control the TCP stack :). Modern Windows, Linux, and MacOS TCP stacks
> consider out-of-order delivery as packet loss. This is not inherent to TCP;
> if you can change the TCP congestion control, you can make reordering
> entirely irrelevant to TCP. But in most cases we of course do not
> control the TCP algo, so per-packet will not work one bit.

Like I said, that was 2014. We tested it for a couple of months, mucked
around as much as we could, and decided it wasn't worth the bother.


>
> Like OP, you should enable adaptive.

That's what I said we are doing since 2014, unless I wasn't clear.

Mark.



Re: interesting troubleshooting

2020-03-22 Thread Saku Ytti
On Sun, 22 Mar 2020 at 09:41, Mark Tinka  wrote:

> We weren't as successful (MX480 ingress/egress devices transiting a CRS
> core).

So you're not even talking about multivendor, as both ends are JNPR?
Or are you confusing entropy label with FAT?

Transit doesn't know anything about FAT, FAT is PW specific and is
only signalled between end-points. Entropy label applies to all
services and is signalled to adjacent device. Transit just sees 1
label longer label stack, with hope (not promise) that transit uses
the additional label for hashing.

> In the end, we updated our policy to avoid running LAG's in the
> backbone, and going ECMP instead. Even with l2vpn payloads, that spreads
> a lot more evenly.

You really should be doing CW+FAT. And looking at your other email, dear
god, don't do per-packet outside some unique application where you
control the TCP stack :). Modern Windows, Linux, and MacOS TCP stacks
consider out-of-order delivery as packet loss. This is not inherent to TCP;
if you can change the TCP congestion control, you can make reordering
entirely irrelevant to TCP. But in most cases we of course do not
control the TCP algo, so per-packet will not work one bit.
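
As an illustrative aside (a toy model, not what any line card actually runs),
per-flow balancing just means the egress member is picked from a hash of the
flow keys, so a given flow always rides one member and ordering is preserved;
the member names are invented:

# Sketch: hash the 5-tuple so every packet of a flow maps to the same member.
from hashlib import blake2b

LAG_MEMBERS = ["xe-0/0/0", "xe-0/0/1", "xe-0/0/2", "xe-0/0/3"]

def pick_member(src_ip, dst_ip, proto, src_port, dst_port):
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    h = int.from_bytes(blake2b(key, digest_size=8).digest(), "big")
    return LAG_MEMBERS[h % len(LAG_MEMBERS)]

# All packets of this TCP flow ride the same member, so no reordering;
# a different flow may land on a different member.
print(pick_member("192.0.2.1", "198.51.100.7", "tcp", 51512, 443))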

Like OP, you should enable adaptive. This thread is conflating a few
different balancing issues, so I'll take the opportunity to classify
them.

1. Bad hashing implementation
1.1 Insufficient amount of hash-results
Think, say, 6500/7600: what if you only have 8 hash-results and
7 interfaces? You will inherently have 2x more traffic on one
interface.
1.2 Bad algorithm
Different hashes have different use-cases, and we often reach for a
golden hammer (like we tend to use bad hashes for password hashing,
such as SHA, when the goal of SHA is to be fast in HW, which is the
opposite of the goal of a password hash, where you want it to be
slow). Equally, since day 1 of Ethernet silicon we've had CRC in the
silicon, and it has since been grandfathered in as the load-balancing
hash. But CRC goals are completely different from hash-algo goals: CRC
does not try to have, and does not need, good diffusion quality, while
a load-balancing hash only needs perfect diffusion; nothing else
matters. CRC has terrible diffusion quality, so instead of
implementing a specific good-diffusion hash in silicon, vendors do
stuff like rot(crcN(x), crcM(x)), which greatly improves diffusion but
is still very bad compared to hash algos designed for perfect
diffusion. Poor diffusion means you get different flow counts on the
egressInts. As I can't do the math, I did a Monte Carlo simulation to
see what type of bias we should expect even with _perfect_ diffusion:

- Here we have 3 egressInts and we run the Monte Carlo until we stop
getting a worse bias (of course, if we wait for the heat death of the
universe, we will eventually see every flow on a single int, even with
perfect diffusion). But in a normal situation, if you see worse bias
than this, you should blame the poor diffusion quality of the vendor's
algo; if you see this bias or lower, it's probably not diffusion you
should blame:

Flows | MaxBias | Example Flow Count per Int
   1k |    6.9% | 395, 341, 264
  10k |    2.2% | 3490, 3396, 3114
 100k |    0.6% | 33655, 32702, 33643
   1M |    0.2% | 334969, 332424, 332607
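
As a rough re-creation (my sketch, not Saku's original simulation) of the
Monte Carlo above, using a uniform random pick as a stand-in for perfect
diffusion; the bias it reports lands in the same ballpark as the table,
depending on how long you let it run:

# Spread N flows over 3 egress interfaces with perfect diffusion and report
# the worst deviation of the busiest interface's share from the ideal 1/3.
import random
from collections import Counter

def max_bias(flow_count: int, egress_ints: int = 3, trials: int = 100) -> float:
    worst = 0.0
    for _ in range(trials):
        counts = Counter(random.choices(range(egress_ints), k=flow_count))
        worst = max(worst, max(counts.values()) / flow_count - 1 / egress_ints)
    return worst

for flows in (1_000, 10_000, 100_000):
    print(f"{flows:>7} flows: max bias ~{max_bias(flows):.1%}")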


2. Elephant flows
Even if we assume perfect diffusion, so that each egressInt gets
exactly the same number of flows, the flows may still be wildly
different in bps, and there is nothing we can do by tuning the hash
algo to fix this. The prudent fix here is to have a mapping table
between hash-result and egressInt, so that we can inject bias: not a
fair distribution between hash-results and egressInts, but fewer
hash-results pointing at the congested egressInt. This is easy and
~free to implement in HW. JNPR does it, and NOK is happy to implement
it should a customer want it. This of course also fixes bad
algorithmic diffusion, so it's a really, really great tool to have in
your toolbox and I think everyone should be running this feature.
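
As an illustrative aside (my sketch, not any vendor's implementation), the
mapping-table idea looks roughly like this; the bucket count, member names
and loads are invented:

# Hash results index a small table of egress members; rebalancing re-points a
# few buckets away from the busiest member instead of touching the hash itself.
HASH_BUCKETS = 64
MEMBERS = ["member-0", "member-1", "member-2"]

# Fair starting table: buckets dealt round-robin across the LAG members.
bucket_to_member = [MEMBERS[b % len(MEMBERS)] for b in range(HASH_BUCKETS)]

def rebalance(table, load_gbps, buckets_to_move=4):
    """Re-point a few buckets from the busiest member to the quietest one."""
    busiest = max(load_gbps, key=load_gbps.get)
    quietest = min(load_gbps, key=load_gbps.get)
    moved = 0
    for b, member in enumerate(table):
        if member == busiest and moved < buckets_to_move:
            table[b] = quietest
            moved += 1
    return table

# Example: member-0 is congested by an elephant flow, the others are not.
rebalance(bucket_to_member, {"member-0": 9.5, "member-1": 4.0, "member-2": 4.2})
print({m: bucket_to_member.count(m) for m in MEMBERS})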


3. Incorrect key recovery
   Balancing is a promise that we know which keys identify a flow. In
the common case this is a simple problem, but there is a lot of
complexity, particularly in MPLS transit. The naive/simple problem
everyone knows about is a pseudowire flow in transit being parsed as
an IPv4/IPv6 flow when the DMAC starts with 4 or 6. Some vendors
(JNPR, Huawei) do additional checks, like perhaps IP checksum or IP
packet length, but this actually makes the situation worse: the
problem triggers far less often, but when it does trigger it will be
so much more exotic, as now you have an underlying frame where, by
luck, you also have your IP packet length supposedly correct. So you
can end up in weird situations where the end customer's network works
perfectly, then they implement IPSEC from all hosts to a concentrator,
still riding over your backbone, and now suddenly one customer host
stops working after enabling IPSEC, while everything else works. The
chances that this trouble ticket ever even ends up on your table are
low, and the possibility that based on the problem description you'd
blame the

Re: interesting troubleshooting

2020-03-22 Thread Saku Ytti
Hey Tassos,

On Sat, 21 Mar 2020 at 22:51, Tassos Chatzithomaoglou
 wrote:

> Yep, the RFC gives this option.
> Does Juniper MX/ACX series support it?
> I know for sure Cisco doesn't.

I only run bidir, which Cisco do you mean? ASR9k allows you to configure it.

  both  Insert/Discard Flow label on transmit/receive
  code  Flow label TLV code
  receive   Discard Flow label on receive
  transmit  Insert Flow label on transmit

JunOS as well:

  flow-label-receive   Advertise capability to pop Flow Label in
receive direction to remote PE
  flow-label-receive-static  Pop Flow Label from PW packets received
from remote PE
  flow-label-transmit  Advertise capability to push Flow Label in
transmit direction to remote PE
  flow-label-transmit-static  Push Flow Label on PW packets sent to remote PE


RP/0/RP0/CPU0:r14.labxtx01.us.(config-l2vpn-pwc-mpls)#do show l2vpn
xconnect interface Te0/2/0/3/7.1000 detail
..

  PW: neighbor 204.42.110.29, PW ID 1290, state is up ( established )
PW class ethernet-ccc, XC ID 0xa025
Encapsulation MPLS, protocol LDP
Source address 204.42.110.15
PW type Ethernet, control word disabled, interworking none
PW backup disable delay 0 sec
Sequencing not set
LSP : Up
Load Balance Hashing: src-dst-ip
Flow Label flags configured (Tx=1,Rx=0), negotiated (Tx=1,Rx=0)



y...@r28.labxtx01.us.bb# run show l2circuit connections interface et-0/0/54:3.0
...
Neighbor: 204.42.110.15
Interface Type  St Time last up  # Up trans
et-0/0/54:3.0(vc 1290)rmt   Up Mar 20 04:06:45 2020   7
  Remote PE: 204.42.110.15, Negotiated control-word: No
  Incoming label: 585, Outgoing label: 24003
  Negotiated PW status TLV: No
  Local interface: et-0/0/54:3.0, Status: Up, Encapsulation: ETHERNET
Description: BD: wmccall ixia 1-1
  Flow Label Transmit: No, Flow Label Receive: Yes
...


I didn't push bits, but at least I can signal unidir between ASR9k and PTX1k.


-- 
  ++ytti


Re: interesting troubleshooting

2020-03-22 Thread Matthew Petach
On Sat, Mar 21, 2020 at 12:53 AM Saku Ytti  wrote:

> Hey Matthew,
>
> > There are *several* caveats to doing dynamic monitoring and remapping of
> > flows; one of the biggest challenges is that it puts extra demands on the
> > line cards tracking the flows, especially as the number of flows rises to
> > large values.  I recommend reading
> >
> https://www.juniper.net/documentation/en_US/junos/topics/topic-map/load-balancing-aggregated-ethernet-interfaces.html#id-understanding-aggregated-ethernet-load-balancing
> > before configuring it.
>
> You are confusing two features. Stateful and adaptive. I was proposing
> adaptive, which just remaps the table, which is free, it is not flow
> aware. Amount of flow results is very small bound number, amount of
> states is very large unbound number.
>

Ah, apologies--you are right, I scanned down the linked document too
quickly,
thinking it was a single set of configuration notes.

Thanks for setting me straight on that.

Matt


>
> --
>   ++ytti
>
>


Re: interesting troubleshooting

2020-03-22 Thread Mark Tinka



On 22/Mar/20 10:08, Adam Atkinson wrote:

>
> I don't know how well-known this is, and it may not be something many
> people would want to do, but Enterasys switches, now part of Extreme's
> portfolio, allow "round-robin" as a load-sharing algorithm on LAGs.
>
> see e.g.
>
> https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-LACP-Output-Algorithm-as-Round-Robin
>
>
> This may not be the only product line supporting this.

So Junos does support both per-flow and per-packet load balancing on
LAG's on Trio line cards.

We tested this back in 2014 for a few months, and while the spread is
excellent (obviously), it creates a lot of out-of-order frame delivery
conditions, and all the pleasure & joy that goes along with that.

So we switched back to per-flow load balancing, and more recently, where
we run LAG's (802.1Q trunks between switches and an MX480 in the data
centre), we've gone 100Gbps so we don't have to deal with all this
anymore :-).

Mark.


Re: interesting troubleshooting

2020-03-22 Thread Adam Atkinson

On 20/03/2020 21:33, Nimrod Levy wrote:


I was contacted by my NOC to investigate a LAG that was not distributing
traffic evenly among the members to the point where one member was
congested while the utilization on the LAG was reasonably low.


I don't know how well-known this is, and it may not be something many 
people would want to do, but Enterasys switches, now part of Extreme's 
portfolio, allow "round-robin" as a load-sharing algorithm on LAGs.


see e.g.

https://gtacknowledge.extremenetworks.com/articles/How_To/How-to-configure-LACP-Output-Algorithm-as-Round-Robin

This may not be the only product line supporting this.



Re: interesting troubleshooting

2020-03-22 Thread Mark Tinka



On 21/Mar/20 18:25, Saku Ytti wrote:

> Yeah we run it in a multivendor network (JNPR, CSCO, NOK), works.
>
> I would also recommend people exclusively using CW+FAT and disabling
> LSR payload heuristics (JNPR default, but by default won't do with CW,
> can do with CW too).

We weren't as successful (MX480 ingress/egress devices transiting a CRS
core).

In the end, we updated our policy to avoid running LAG's in the
backbone, and going ECMP instead. Even with l2vpn payloads, that spreads
a lot more evenly.

Mark.


Re: China’s Slow Transnational Network

2020-03-22 Thread Pengxiong Zhu
Thank you for your insights. We are not so familiar with interconnect and
peering, so we will ask you some questions for clarification first. Hope you
don't mind. :-)

When there is a tri-opoly, with no opportunity of competition, its easily
> possible to set prices which are very different than market conditions.

I assume the tri-opoly involves a Chinese ISP, outside ISP A, and outside ISP
B. Who is competing with whom? Why is it easily possible to set prices which
are very different than market conditions?


> additionally, the three don't purchase enough to cover demand for their
> own network.
>
Do you mean that the three don't purchase enough capacity for their traffic
going out of their network (China -> outside)? If this is what you mean,
however, we don't observe low speed in that direction. We assume there is
not so much traffic going out of China compared to the traffic coming in.
Also, why would the three purchase outbound capacity if they price their
inbound traffic artificially high? They could charge some peers less for
the outbound traffic to solve the problem.

Best,
Pengxiong Zhu
Department of Computer Science and Engineering
University of California, Riverside


On Mon, Mar 2, 2020 at 2:58 PM Tom Paseka  wrote:

> Most of the performance hit is because of commercial actions, not
> censorship.
>
> When there is a tri-opoly, with no opportunity of competition, its easily
> possible to set prices which are very different than market conditions.
> This is what is happening here.
>
> Prices are set artificially high, so their interconnection partners won't
> purchase enough capacity. Additionally, the three don't purchase enough to
> cover demand for their own network. Results in congestion.
>
> On Mon, Mar 2, 2020 at 2:49 PM Pengxiong Zhu  wrote:
>
>> You seem to be implying that you don't believe/can't see the GFW
>>
>>
>> No, that's not what I meant. I thought mandatory content filtering at the
>> border meant traffic throttling at the border, deliberately or accidentally
>> rate-limiting the traffic; now I think he was referring to the GFW and the
>> side effects of deep packet inspection.
>>
>> In fact, we designed a small experiment to locate the hops with GFW
>> presence and then tried to match them with the bottleneck hops. We found
>> that in only 34.45% of the cases did the GFW hops match the bottleneck hops.
>>
>> Best,
>> Pengxiong Zhu
>> Department of Computer Science and Engineering
>> University of California, Riverside
>>
>>
>> On Mon, Mar 2, 2020 at 1:13 PM Matt Corallo  wrote:
>>
>>> > find out direct evidence of mandatory content filtering at the border
>>>
>>> You seem to be implying that you don't believe/can't see the GFW, which
>>> seems surprising. I've personally had issues with traffic crossing it
>>> getting RST'd (luckily I was fortunate enough to cross through a GFW
>>> instance which was easy to avoid with a simple iptables DROP), but its
>>> also one of the most well-studied bits of opaque internet censorship
>>> gear in the world. I'm not sure how you could possibly miss it.
>>>
>>> Matt
>>>
>>> On 3/2/20 2:55 PM, Pengxiong Zhu wrote:
>>> > Yes, we agree. The poor transnational Internet performance effectively
>>> > puts any foreign business that does not have a physical presence (i.e.,
>>> > servers) in China at a disadvantage.
>>> > The challenge is to find out direct evidence to prove mandatory content
>>> > filtering at the border, if the government is actually doing it.
>>> >
>>> > Best,
>>> > Pengxiong Zhu
>>> > Department of Computer Science and Engineering
>>> > University of California, Riverside
>>> >
>>> >
>>> > On Mon, Mar 2, 2020 at 8:38 AM Matt Corallo >> > > wrote:
>>> >
>>> > It also gives local competitors a leg up by helping domestic apps
>>> > perform better simply by being hosted domestically (or making
>>> > foreign players host inside China).
>>> >
>>> >> On Mar 2, 2020, at 11:27, Ben Cannon >> >> > wrote:
>>> >>
>>> >> 
>>> >> It’s the Government doing mandatory content filtering at the
>>> >> border.  Their hardware is either deliberately or accidentally
>>> >> poor-performing.
>>> >>
>>> >> I believe providing limited and throttled external connectivity
>>> >> may be deliberate; think of how that curtails for one thing;
>>> >> streaming video?
>>> >>
>>> >> -Ben.
>>> >>
>>> >> -Ben Cannon
>>> >> CEO 6x7 Networks & 6x7 Telecom, LLC
>>> >> b...@6by7.net 
>>> >>
>>> >>
>>> >>
>>> >>> On Mar 1, 2020, at 9:00 PM, Pengxiong Zhu >> >>> > wrote:
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> We are a group of researchers at University of California,
>>> >>> Riverside who have been working on measuring the transnational
>>> >>> network performance (and have previously asked questions on the
>>> >>> mailing list). Our work has now led to a publication in
>>> >>> Sigmetrics 2020 and we are