Re: JunOS/FRR/Nokia et al BGP critical issue

2023-08-31 Thread Jeff Tantsura
The FRR fix went into 9.0 and has been backported to 8.5 and 8.4; Cumulus 5.6 
will include the fix.

Cheers,
Jeff

> On Aug 30, 2023, at 5:32 AM, Mark Prosser  wrote:
> 
> Thanks for sharing this, Mike. I saw it on lobste.rs yesterday and figured 
> everyone here would already be ahead of it.
> 
> I'm running VyOS in a volunteer WISP, but not with BGP peering... I'm thinking 
> of testing it now, as we'll likely swap VyOS in for it soon.
> 
> I saw this PR as a reply on Mastodon:
> 
> https://github.com/FRRouting/frr/pull/14290
> 
> Warm regards,
> 
> Mark
> 



Re: Lossy cogent p2p experiences?

2023-08-31 Thread David Hubbard
That’s not what I’m trying to do; that’s just what I’m using during testing to 
demonstrate the loss to them.  It’s intended to bridge a number of networks 
with hundreds of flows, including inbound internet sources, but any new TCP 
flow is subject to numerous dropped packets at establishment and then ongoing 
loss every five to ten seconds.  The initial loss and ongoing bursts of loss 
cause the TCP window to shrink so much that any single flow, between systems 
that can’t be optimized, ends up varying from 50 Mbit/sec to something far 
short of a gigabit.  It was also fine for six months before this miserable 
behavior began in late June.
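
For context on how little loss it takes to produce numbers like these over a
~52 ms path: the classic Mathis et al. approximation bounds a single TCP flow
at roughly MSS / (RTT * sqrt(loss)). A minimal sketch, with assumed loss rates
rather than anything measured on this circuit:

# Rough single-flow TCP throughput ceiling per the classic Mathis et al.
# approximation: rate <= (MSS / RTT) * (1 / sqrt(p)), where p is the packet
# loss probability. The loss rates below are illustrative assumptions, not
# measurements from the circuit discussed in this thread.
import math

def mathis_ceiling_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on one TCP flow's throughput, in bits/s."""
    return (mss_bytes * 8.0 / rtt_s) / math.sqrt(loss_rate)

RTT = 0.052   # the ~52 ms path described in this thread
MSS = 1460    # standard Ethernet MSS, i.e. no jumbo frames

for p in (1e-8, 1e-6, 1e-4):
    print(f"loss {p:.0e}: ~{mathis_ceiling_bps(MSS, RTT, p) / 1e6:,.0f} Mbit/s per flow")

Even loss in the 1e-4 range is enough to pin a standard-MSS flow to tens of
Mbit/s at this RTT, which lines up with the behavior described above.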


From: Eric Kuhnke 
Date: Thursday, August 31, 2023 at 4:51 PM
To: David Hubbard 
Cc: Nanog@nanog.org 
Subject: Re: Lossy cogent p2p experiences?
Cogent has asked many people NOT to purchase their Ethernet private circuit 
point-to-point service unless they can guarantee that they won't move any single 
flow of greater than 2 Gbps. This works fine as long as the service is used 
mostly for mixed IP traffic, like a bunch of random customers mixed together.

What you are trying to do is probably against the guidelines their engineering 
group has given them for what they can sell now.

This is a known weird limitation with Cogent's private circuit service.

The best working theory that several people I know in the neteng community have 
come up with is that Cogent does not want to adversely impact all other 
customers on its router at some sites, where the site's upstreams and links to 
neighboring POPs are implemented as something like 4 x 10 Gbps and that specific 
router has not yet been upgraded to a full 100 Gbps upstream. Moving large flows 
of >2 Gbps could flat-top a traffic chart on just one of those 10 Gbps member 
circuits.



On Thu, Aug 31, 2023 at 10:04 AM David Hubbard 
<dhubb...@dino.hostasaurus.com> wrote:
Hi all, curious if anyone who has used Cogent as a point-to-point provider has 
gone through packet loss issues with them and was able to successfully resolve 
them?  I’ve got a non-rate-limited 10gig circuit between two geographic 
locations that have about 52ms of latency.  Mine is set up to support both 
jumbo frames and vlan tagging.  I do know Cogent packetizes these circuits, so 
they’re not like waves, and that the expected single session TCP performance 
may be limited to a few gbit/sec, but I should otherwise be able to fully 
utilize the circuit given enough flows.

Circuit went live earlier this year, had zero issues with it.  Testing with 
common tools like iperf would allow several gbit/sec of TCP traffic using 
single flows, even without an optimized TCP stack.  Using parallel flows or UDP 
we could easily get close to wire speed.  Starting about ten weeks ago we had a 
significant slowdown, to even complete failure, of bursty data replication 
tasks between equipment that was using this circuit.  Rounds of testing 
demonstrate that new flows often experience significant initial packet loss of 
several thousand packets, and will then have ongoing lesser packet loss every 
five to ten seconds after that.  There are times we can’t do better than 50 
Mbit/sec, and we rarely achieve even a gigabit unless we run a bunch of streams 
with a lot of tuning.  With UDP we also see the loss, but we can still push 
many gigabits through with one sender, or wire speed with several nodes.

For equipment that doesn’t use a tunable TCP stack, such as storage arrays or 
VMware, the retransmits completely ruin performance or may result in ongoing 
failure we can’t overcome.

Cogent support has been about as bad as you can get.  Everything is great; 
clean your fiber; iperf isn’t a good test; install a physical loop, oh wait, we 
don’t want that, so go pull it back off; new updates come at three to seven day 
intervals; etc.  If the performance had never been good to begin with I’d have 
just attributed this to their circuits, but since it worked until late June, I 
know something has changed.  I’m hoping someone else has run into this and 
maybe knows of some hints I could give them to investigate.  To me it sounds 
like there’s a rate limiter / policer defined somewhere in the circuit, or an 
overloaded interface/device we’re forced to traverse, but they assure me this 
is not the case and claim to have destroyed and rebuilt the logical circuit.

Thanks!


Re: Lossy cogent p2p experiences?

2023-08-31 Thread Eric Kuhnke
Cogent has asked many people NOT to purchase their Ethernet private circuit
point-to-point service unless they can guarantee that they won't move any
single flow of greater than 2 Gbps. This works fine as long as the service
is used mostly for mixed IP traffic, like a bunch of random customers mixed
together.

What you are trying to do is probably against the guidelines their
engineering group has given them for what they can sell now.

This is a known weird limitation with Cogent's private circuit service.

The best working theory that several people I know in the neteng community
have come up with is that Cogent does not want to adversely impact all other
customers on its router at some sites, where the site's upstreams and links
to neighboring POPs are implemented as something like 4 x 10 Gbps and that
specific router has not yet been upgraded to a full 100 Gbps upstream. Moving
large flows of >2 Gbps could flat-top a traffic chart on just one of those
10 Gbps member circuits.
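
A toy illustration of the flat-topping concern: with per-flow hashing over a
LAG/ECMP bundle, every packet of a given 5-tuple lands on the same member
link, so one >2 Gbps flow rides a single 10 Gbps member no matter how idle the
others are. The hash below is a generic stand-in (real routers use
vendor-specific hashes), but the property is the same:

# Generic per-flow hashing over an assumed 4 x 10 Gbps bundle. The hash is a
# stand-in for a vendor-specific 5-tuple hash; the point is only that a flow
# is pinned to one member link while many small flows spread out.
import hashlib
from collections import Counter

MEMBERS = 4  # e.g. 4 x 10 Gbps links toward a neighboring POP

def member_for_flow(src_ip, dst_ip, sport, dport, proto="tcp"):
    key = f"{src_ip} {dst_ip} {sport} {dport} {proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % MEMBERS

# A single large replication flow is pinned to one member link...
print("elephant flow ->", member_for_flow("203.0.113.10", "198.51.100.20", 49152, 5201))

# ...while many small flows spread roughly evenly across all members.
spread = Counter(member_for_flow("203.0.113.10", "198.51.100.20", sport, 443)
                 for sport in range(20000, 20400))
print("400 small flows ->", dict(sorted(spread.items())))

The elephant flow never moves, so its member link can congest and drop while
the bundle as a whole looks far from full.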



On Thu, Aug 31, 2023 at 10:04 AM David Hubbard <
dhubb...@dino.hostasaurus.com> wrote:

> Hi all, curious if anyone who has used Cogent as a point-to-point provider
> has gone through packet loss issues with them and was able to successfully
> resolve them?  I’ve got a non-rate-limited 10gig circuit between two geographic
> locations that have about 52ms of latency.  Mine is set up to support both
> jumbo frames and vlan tagging.  I do know Cogent packetizes these circuits,
> so they’re not like waves, and that the expected single session TCP
> performance may be limited to a few gbit/sec, but I should otherwise be
> able to fully utilize the circuit given enough flows.
>
>
>
> Circuit went live earlier this year, had zero issues with it.  Testing
> with common tools like iperf would allow several gbit/sec of TCP traffic
> using single flows, even without an optimized TCP stack.  Using parallel
> flows or UDP we could easily get close to wire speed.  Starting about ten
> weeks ago we had a significant slowdown, to even complete failure, of
> bursty data replication tasks between equipment that was using this
> circuit.  Rounds of testing demonstrate that new flows often experience
> significant initial packet loss of several thousand packets, and will then
> have ongoing lesser packet loss every five to ten seconds after that.
> There are times we can’t do better than 50 Mbit/sec, and we rarely achieve
> even a gigabit unless we run a bunch of streams with a lot of tuning.  With
> UDP we also see the loss, but we can still push many gigabits through with
> one sender, or wire speed with several nodes.
>
>
>
> For equipment that doesn’t use a tunable TCP stack, such as storage
> arrays or VMware, the retransmits completely ruin performance or may result
> in ongoing failure we can’t overcome.
>
>
>
> Cogent support has been about as bad as you can get.  Everything is great;
> clean your fiber; iperf isn’t a good test; install a physical loop, oh wait,
> we don’t want that, so go pull it back off; new updates come at three to
> seven day intervals; etc.  If the performance had never been good to begin
> with I’d have just attributed this to their circuits, but since it worked
> until late June, I know something has changed.  I’m hoping someone else has
> run into this and maybe knows of some hints I could give them to
> investigate.  To me it sounds like there’s a rate limiter / policer defined
> somewhere in the circuit, or an overloaded interface/device we’re forced to
> traverse, but they assure me this is not the case and claim to have
> destroyed and rebuilt the logical circuit.
>
>
>
> Thanks!
>


Feedback requested - 6WIND VSRs

2023-08-31 Thread Mark Prosser

Hello,

Does anyone on the ML have experience with 6WIND's VSR platform for 
private cloud / on-prem? The feature set sounds very good in the EVPN-SR 
domains. It also sounds a bit like the fast/good/cheap Venn diagram.


If so, please let me know your thoughts with regard to support, 
stability, and any warnings.


Warm regards,

Mark



Call for Board Member Nominations + More

2023-08-31 Thread Nanog News
*NANOG 89 is Just Around the Corner*
*Register for our Next Meeting off the Coast of San Diego, Oct. 16 - 18 *

Check out more info about the venue, peering forum, or the latest confirmed
talks.

*REGISTER NOW*

*Call for Board Member Nominations *

*Nominations for Board of Director Candidates are Open*
Play a role in shaping NANOG's future.

As a NANOG member, you’ll have the right to exercise your civic duty every
year and elect a Board of Directors that best reflects both you and our
organization.

*LEARN MORE*

*VIDEO: Students Share NANOG Experience*
*NANOG College Immersion (NCI) Program Enables Students to Attend
Conference*

*Why it's worth your time:* Learn what it is like to experience a meeting
through the eyes of young newcomers and see our NCI Program in action!

*WATCH NOW*



*Video of the Week: 400ZR Revolutionizing Networking w/ Scott Wilkinson*

*Why it's worth your time:* 400ZR has changed how data centers are
interconnected and how data centers are designed.

In this presentation, Wilkinson will show the current status of 400ZR
development and deployment, how it changes network designs, and explain
what is coming next.

*WATCH NOW*


[NANOG-announce] Call for Board Member Nominations + More

2023-08-31 Thread Nanog News
*NANOG 89 is Just Around the Corner*
*Register for our Next Meeting off the Coast of San Diego, Oct. 16 - 18 *

Check out more info about the venue, peering forum, or the latest confirmed
talks.

*REGISTER NOW*

*Call for Board Member Nominations *

*Nominations for Board of Director Candidates are Open*
Play a role in shaping NANOG's future.

As a NANOG member, you’ll have the right to exercise your civic duty every
year and elect a Board of Directors that best reflects both you and our
organization.

*LEARN MORE*

*VIDEO: Students Share NANOG Experience*
*NANOG College Immersion (NCI) Program Enables Students to Attend
Conference*

*Why it's worth your time:* Learn what it is like to experience a meeting
through the eyes of young newcomers and see our NCI Program in action!

*WATCH NOW*



*Video of the Week: 400ZR Revolutionizing Networking w/ Scott Wilkinson*

*Why it's worth your time:* 400ZR has changed how data centers are
interconnected and how data centers are designed.

In this presentation, Wilkinson will show the current status of 400ZR
development and deployment, how it changes network designs, and explain
what is coming next.

*WATCH NOW*
___
NANOG-announce mailing list
NANOG-announce@nanog.org
https://mailman.nanog.org/mailman/listinfo/nanog-announce


Re: Someone (with clue) from GoDaddy, please pick up the red courtesy phone?

2023-08-31 Thread Steve Sullivan

Hey Dan,

Sent you a PM on this.  There is also the DNS affinity group within the 
NANOG Community platform, which I have been given admin rights to.


Couldn't hurt to start posting DNS-specific stuff @ 
 in the Halls of NANOG. OARC and 
NANOG are solid partners, and colocate events OFTEN, so it's a great 
place for DNS @ NANOG if you will.

I have facilitated more than a few connections to the DNS world there.

Thanks,

Steve Sullivan
Membership Coordinator
OARC Mattermost Chat: @stevos

On 8/31/2023 10:10 AM, Rubens Kuhl wrote:

On Thu, Aug 31, 2023 at 2:07 PM Dan Mahoney (Gushi)
  wrote:

Hey there.

I hate that I only use NANOG for this, but it seems right now
that my day job can't add an additional nameserver to our NS-set.

There's no email support or ticket system, and I've been told several
untrue things about how the DNS and SRS work in text-based chat, and I
Need An Adult.

Does that include stating that Premium DNS is required to add DS
records to parent delegations even with no GoDaddy DNS servers
involved?


Rubens






Re: Someone (with clue) from GoDaddy, please pick up the red courtesy phone?

2023-08-31 Thread Rubens Kuhl
On Thu, Aug 31, 2023 at 2:07 PM Dan Mahoney (Gushi)
 wrote:
>
> Hey there.
>
> I hate that I only use NANOG for this, but it seems right now
> that my day job can't add an additional nameserver to our NS-set.
>
> There's no email support or ticket system, and I've been told several
> untrue things about how the DNS and SRS work in text-based chat, and I
> Need An Adult.

Does that include stating that Premium DNS is required to add DS
records to parent delegations even with no GoDaddy DNS servers
involved?


Rubens
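
For anyone in the same boat, whether a DS actually made it into the parent
zone can be verified independently of whatever the registrar UI claims. A
quick check, assuming dig is installed and with example.com standing in for
the real domain:

# Check whether DS records are published at the parent for a delegation.
# "example.com" is a placeholder; requires the dig utility (dnsutils/bind-utils).
import subprocess

domain = "example.com"
result = subprocess.run(["dig", "+short", "DS", domain],
                        capture_output=True, text=True, check=True)
ds = result.stdout.strip()
print(f"DS for {domain}: {ds if ds else 'none published at the parent'}")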


Someone (with clue) from GoDaddy, please pick up the red courtesy phone?

2023-08-31 Thread Dan Mahoney (Gushi)

Hey there.

I hate that I only use NANOG for this, but it seems right now 
that my day job can't add an additional nameserver to our NS-set.


There's no email support or ticket system, and I've been told several 
untrue things about how the DNS and SRS work in text-based chat, and I 
Need An Adult.


Please contact me?

dmahoney - isc - org

--

Dan Mahoney
Techie,  Sysadmin,  WebGeek
Gushi on efnet/undernet IRC
FB:  fb.com/DanielMahoneyIV
LI:   linkedin.com/in/gushi
Site:  http://www.gushi.org
---



Lossy cogent p2p experiences?

2023-08-31 Thread David Hubbard
Hi all, curious if anyone who has used Cogent as a point-to-point provider has 
gone through packet loss issues with them and was able to successfully resolve 
them?  I’ve got a non-rate-limited 10gig circuit between two geographic 
locations that have about 52ms of latency.  Mine is set up to support both 
jumbo frames and vlan tagging.  I do know Cogent packetizes these circuits, so 
they’re not like waves, and that the expected single session TCP performance 
may be limited to a few gbit/sec, but I should otherwise be able to fully 
utilize the circuit given enough flows.
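
As a side note on the single-session limit: before loss enters the picture at
all, the bandwidth-delay product on a ~52 ms path already dictates fairly
large TCP windows, so untuned stacks will top out well below line rate. A
small back-of-the-envelope sketch (the rates are illustrative):

# Bandwidth-delay product: bytes that must be in flight (and buffered) to
# sustain a given rate over a ~52 ms round trip. Rates are illustrative.
RTT_S = 0.052

for gbps in (1, 2, 5, 10):
    bdp_mb = gbps * 1e9 / 8 * RTT_S / 1e6
    print(f"{gbps:>2} Gbit/s at {RTT_S * 1000:.0f} ms RTT needs ~{bdp_mb:.1f} MB in flight")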

Circuit went live earlier this year, had zero issues with it.  Testing with 
common tools like iperf would allow several gbit/sec of TCP traffic using 
single flows, even without an optimized TCP stack.  Using parallel flows or UDP 
we could easily get close to wire speed.  Starting about ten weeks ago we had a 
significant slowdown, to even complete failure, of bursty data replication 
tasks between equipment that was using this circuit.  Rounds of testing 
demonstrate that new flows often experience significant initial packet loss of 
several thousand packets, and will then have ongoing lesser packet loss every 
five to ten seconds after that.  There are times we can’t do better than 50 
Mbit/sec, and we rarely achieve even a gigabit unless we run a bunch of streams 
with a lot of tuning.  With UDP we also see the loss, but we can still push 
many gigabits through with one sender, or wire speed with several nodes.
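
The test matrix described above (single TCP flow, parallel flows, UDP at a
fixed offered load) is easy to script for repeatable before/after comparisons.
A sketch assuming iperf3 is running in server mode on the far end; the
hostname and parameters are placeholders, not details from this circuit:

# Sketch of a repeatable iperf3 test pass against a far-end "iperf3 -s".
# Hostname and parameters are placeholders.
import subprocess

SERVER = "far-end.example.net"

def run(*args):
    print("$", " ".join(args))
    subprocess.run(args, check=True)

run("iperf3", "-c", SERVER, "-t", "30")                   # single TCP flow
run("iperf3", "-c", SERVER, "-t", "30", "-P", "8")        # 8 parallel TCP flows
run("iperf3", "-c", SERVER, "-t", "30", "-u", "-b", "2G") # UDP at 2 Gbit/s offered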

For equipment that doesn’t use a tunable TCP stack, such as storage arrays or 
VMware, the retransmits completely ruin performance or may result in ongoing 
failure we can’t overcome.

Cogent support has been about as bad as you can get.  Everything is great; 
clean your fiber; iperf isn’t a good test; install a physical loop, oh wait, we 
don’t want that, so go pull it back off; new updates come at three to seven day 
intervals; etc.  If the performance had never been good to begin with I’d have 
just attributed this to their circuits, but since it worked until late June, I 
know something has changed.  I’m hoping someone else has run into this and 
maybe knows of some hints I could give them to investigate.  To me it sounds 
like there’s a rate limiter / policer defined somewhere in the circuit, or an 
overloaded interface/device we’re forced to traverse, but they assure me this 
is not the case and claim to have destroyed and rebuilt the logical circuit.

Thanks!


Re: Destination Preference Attribute for BGP

2023-08-31 Thread Robert Raszuk
Hi Michael,

> two datacenters which user traffic can egress, and if one is used we want
> that traffic to return to the same data center. It is a problem of asymmetry.
> It appears the only tools we have are AS_Path and MED, and so I have been
> searching for another solution; that is when I came across DPA.

If there are really legitimate reasons to force the symmetry, I would use
disjoint address pools in each data center, and the asymmetry is gone the
moment you hit commit.

And redundancy could still be accomplished at a higher layer: front-end
each DC with a load balancer, or use multiple IP addresses in each DNS record.

Best,
R.
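
Since the thread keeps circling AS_PATH prepending versus MED, a deliberately
simplified model of the relevant slice of the best-path algorithm may help
frame it: local-pref first, then AS_PATH length, then MED, and by default MED
is only compared between routes learned from the same neighboring AS. This is
a reasoning sketch, not a complete implementation:

# Simplified slice of the BGP best-path algorithm, enough to reason about why
# AS_PATH prepending and MED behave differently for inbound traffic
# engineering. Teaching sketch only: ignores origin, eBGP/iBGP, IGP metric,
# router-id, and every other tie-breaker.
from dataclasses import dataclass, field

@dataclass
class Route:
    name: str
    local_pref: int = 100
    as_path: list = field(default_factory=list)
    med: int = 0

    @property
    def neighbor_as(self):
        return self.as_path[0] if self.as_path else None

def better(a: Route, b: Route) -> Route:
    if a.local_pref != b.local_pref:              # 1. highest LOCAL_PREF
        return a if a.local_pref > b.local_pref else b
    if len(a.as_path) != len(b.as_path):          # 2. shortest AS_PATH
        return a if len(a.as_path) < len(b.as_path) else b
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b          # 3. lowest MED, same neighbor AS only
    return a                                      # tie: arbitrary for this sketch

# Same prefix advertised from two data centers of AS 65000 to one upstream:
dc_a = Route("via DC-A", as_path=[65000], med=10)
dc_b = Route("via DC-B", as_path=[65000], med=50)
print(better(dc_a, dc_b).name)        # MED works: same neighboring AS

# Same prefix seen via two *different* upstreams: MED is not compared across
# neighbor ASes by default, so only AS_PATH length (prepending) matters here.
via_isp1 = Route("via ISP1 to DC-A", as_path=[64501, 65000])
via_isp2 = Route("via ISP2 to DC-B", as_path=[64502, 65000, 65000, 65000])
print(better(via_isp1, via_isp2).name)

Which is why prepends can influence remote networks that see the prefix via
multiple upstreams, while MED only helps where the same neighboring AS
interconnects with you in more than one place.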


On Wed, Aug 30, 2023 at 6:57 PM michael brooks - ESC <
michael.bro...@adams12.org> wrote:

> > With AS-PATH prepend you have no control over which ASN should take
> > what action on your advertisements.
> Robert- It is somewhat this problem we are trying to resolve.
>
> >I was imagining something sexier, especially given how pretty "useless"
> AS_PATH prepending is nowadays.
> I, too, am looking for something sexy (explained below). But can you
> explain why you think AS_PATH is "useless," Mark?
>
> For background, and the reason I asked about DPA:
> Currently, our routing carries user traffic to a single data center where
> it egresses to the Internet via three ISP circuits, two carriers. We are
> peering on a single switch stack, so we let L2 "load balance" user flows
> for us. We have now brought up another ISP circuit in a second data center,
> and are attempting to influence traffic to return the same path as it
> egressed our network. Simply, we now have two datacenters which user
> traffic can egress, and if one is used we want that traffic to return to
> the same data center. It is a problem of asymmetry. It appears the only
> tools we have are AS_Path and MED, and so I have been searching for another
> solution; that is when I came across DPA. In further looking at the
> problem, BGP Communities also seems to be a possible solution, but as the
> thread has explored, communities may/may not be scrubbed upstream. So,
> presently we are looking for a solution which can be used with our direct
> peers. Obviously, if someone has a better solution, I am all ears.
>
> A bit more info: we are also looking at an internal solution which passes
> IGP metric into MED to influence pathing.
>
> To avoid TL;DR I will stop there in the hopes this is an intriguing enough
> problem to generate discussion.
>
>
>
>
> michael brooks
> Sr. Network Engineer
> Adams 12 Five Star Schools
> michael.bro...@adams12.org
> 
> "flying is learning how to throw yourself at the ground and miss"
>
>
>
> On Fri, Aug 18, 2023 at 1:39 AM Robert Raszuk  wrote:
>
>> Jakob,
>>
>> With AS-PATH prepend you have no control over which ASN
>> should take what action on your advertisements.
>>
>> However, the practice of publishing communities by (some) ASNs along with
>> their remote actions could be treated as an alternative to the DPA
>> attribute. It could result in remote PREPEND action too.
>>
>> If only those communities would not be deleted by some transit networks
>> 
>>
>> Thx,
>> R.
>>
>> On Thu, Aug 17, 2023 at 9:46 PM Jakob Heitz (jheitz) via NANOG <
>> nanog@nanog.org> wrote:
>>
>>> "prepend as-path" has taken its place.
>>>
>>>
>>>
>>> Kind Regards,
>>>
>>> Jakob
>>>
>>>
>>>
>>>
>>>
>>> Date: Wed, 16 Aug 2023 21:42:22 +0200
>>> From: Mark Tinka 
>>>
>>> On 8/16/23 16:16, michael brooks - ESC wrote:
>>>
>>> > Perhaps (probably) naively, it seems to me that DPA would have been a
>>> > useful BGP attribute. Can anyone shed light on why this RFC never
>>> > moved beyond draft status? I cannot find much information on this
>>> > other than IETF's data tracker
>>> > (https://datatracker.ietf.org/doc/draft-ietf-idr-bgp-dpa/) and RFC6938
>>> > (which implies DPA was in use, but then was deprecated).
>>>
>>> I've never heard of this draft until now, but reading it, I can see why
>>> it would likely not be adopted today (not sure what the consensus would
>>> have been back in the '90s).
>>>
>>> DPA looks like MED on drugs.
>>>
>>> Not sure operators want remote downstream ISPs arbitrarily choosing
>>> which of their peering interconnects (and backbone links) carry traffic
>>> from source to them. BGP is a poor communicator of bandwidth and
>>> shilling cost, in general. Those kinds of decisions tend to be locally
>>> made, and permitting outside influence could be a rather hard sell.
>>>
>>> It reminds me of how router vendors implemented GMPLS in the hopes that
>>> optical operators would allow their customers to build and control
>>> circuits in the optical domain in some fantastic fashion.
>>>
>>> Or how router vendors built Sync-E and