across the
internet for ten seconds; 3.47 Gbps, 162 retransmits. Across the P2P, this
time at least, 637 Mbps, 3633 retransmits.
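(To reproduce that kind of comparison with a single TCP stream, something along these lines is the usual shape; the ten-second duration and single stream are taken from the figures above, while the host name and remaining options are assumptions rather than David's exact invocation. iperf3's client output includes the retransmit count.)

  iperf3 -s                                 # on the far end
  iperf3 -c <far-end-host> -t 10 -P 1 -i 1  # one TCP stream, 10 seconds, per-second stats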
David
From: David Hubbard
Date: Friday, September 1, 2023 at 10:19 AM
To: Nanog@nanog.org
Subject: Re: Lossy cogent p2p experiences?
The initial and recurring
On Sat, 9 Sept 2023 at 21:36, Benny Lyne Amorsen
wrote:
> The Linux TCP stack does not immediately start backing off when it
> encounters packet reordering. In the server world, packet-based
> round-robin is a fairly common interface bonding strategy, with the
> accompanying reordering, and
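(For anyone unfamiliar with the server-side practice Benny describes, a round-robin bond and the kernel's reordering tolerance look roughly like this on a Linux host; interface names here are purely illustrative and not from this thread.)

  ip link add bond0 type bond mode balance-rr   # packet-based round-robin bond
  ip link set eth0 down; ip link set eth0 master bond0
  ip link set eth1 down; ip link set eth1 master bond0
  ip link set bond0 up
  # Linux TCP starts with a reordering tolerance of 3 segments and raises it
  # adaptively; the current value and its ceiling are visible via sysctl:
  sysctl net.ipv4.tcp_reordering net.ipv4.tcp_max_reordering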
On 9/9/23 22:29, Dave Cohen wrote:
At a previous $dayjob at a Tier 1, we would only support LAG for a
customer L2/3 service if the ports were on the same card. The response
we gave if customers pushed back was "we don't consider LAG a form of
circuit protection, so we're not going to
At a previous $dayjob at a Tier 1, we would only support LAG for a customer
L2/3 service if the ports were on the same card. The response we gave if
customers pushed back was "we don't consider LAG a form of circuit
protection, so we're not going to consider physical resiliency in the
design",
On 9/9/23 20:44, Randy Bush wrote:
i am going to be foolish and comment, as i have not seen this raised
if i am running a lag, i can not resist adding a bit of resilience by
having it spread across line cards.
surprise! line cards from vendor do not have uniform hashing
or rotating
i am going to be foolish and comment, as i have not seen this raised
if i am running a lag, i can not resist adding a bit of resilience by
having it spread across line cards.
surprise! line cards from vendor do not have uniform hashing
or rotating algorithms.
randy
Mark Tinka writes:
> Oh? What is it then, if it's not spraying successive packets across
> member links?
It sprays the packets more or less randomly across links, and each link
then does individual buffering. It introduces an unnecessary random
delay to each packet, when it could just place
It was intended to detect congestion. The obvious response was in some way to
pace the sender(s) so that it was alleviated.
Sent using a machine that autocorrects in interesting ways...
> On Sep 7, 2023, at 11:19 PM, Mark Tinka wrote:
>
>
>
>> On 9/7/23 09:51, Saku Ytti wrote:
>>
>>
On Fri, 8 Sept 2023 at 09:17, Mark Tinka wrote:
> > Unfortunately that is not strict round-robin load balancing.
>
> Oh? What is it then, if it's not spraying successive packets across
> member links?
I believe the suggestion is that round-robin out-performs random
spray. Random spray is what
On 9/7/23 09:51, Saku Ytti wrote:
Perhaps if congestion control used latency or FEC instead of loss, we
could tolerate reordering while not underperforming under loss, but
I'm sure in decades following that decision we'd learn new ways how we
don't understand any of this.
Isn't this partly
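(For what it's worth, a largely delay/model-based congestion control is already deployable on the sender side today; on Linux, switching to BBR is roughly the below, assuming the module is available. Whether it actually tolerates reordering better is exactly the open question Saku is pointing at.)

  modprobe tcp_bbr                                   # load BBR if not built in
  sysctl -w net.ipv4.tcp_congestion_control=bbr      # use it for new connections
  sysctl net.ipv4.tcp_available_congestion_control   # verify it is offered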
On 9/7/23 09:31, Benny Lyne Amorsen wrote:
Unfortunately that is not strict round-robin load balancing.
Oh? What is it then, if it's not spraying successive packets across
member links?
I do not
know about any equipment that offers actual round-robin
load-balancing.
Cisco had both
Saku Ytti wrote:
And you will be wrong. A packet arriving out of order will be treated by
the host as the previous packet having been lost, and the host will signal
the need for a resend.
As I already quote the very old and fundamental paper on
the E2E argument:
End-To-End Arguments in System Design
On Thu, 7 Sept 2023 at 15:45, Benny Lyne Amorsen
wrote:
> Juniper's solution will cause way too much packet reordering for TCP to
> handle. I am arguing that strict round-robin load balancing will
> function better than hash-based in a lot of real-world
> scenarios.
And you will be wrong.
Tom Beecher wrote:
Well, not exactly the same thing. (But it's my mistake, I was referring to
L3 balancing, not L2 interface stuff.)
That should be a correct reference.
load-balance per-packet will cause massive reordering,
If the buffering delay of ECMP paths cannot be controlled, yes.
Mark Tinka writes:
> set interfaces ae2 aggregated-ether-options load-balance per-packet
>
> I ran per-packet on a Juniper LAG 10 years ago. It produced 100%
> perfect traffic distribution. But the reordering was insane, and the
> applications could not tolerate it.
Unfortunately that is
On Thu, 7 Sept 2023 at 00:00, David Bass wrote:
> Per-packet LB is one of those ideas that are great at a conceptual level, but
> in practice are obviously out of touch with reality. Kind of like the EIGRP
> protocol from Cisco and using the load, reliability, and MTU metrics.
Benny Lyne Amorsen wrote:
TCP looks quite different in 2023 than it did in 1998. It should handle
packet reordering quite gracefully;
Maybe so, and even if it isn't, TCP may be modified. But that
is not my primary point.
ECMP, in general, means paths consisting of multiple routers
and links. The
Per-packet LB is one of those ideas that are great at a conceptual level, but
in practice are obviously out of touch with reality. Kind of like the EIGRP
protocol from Cisco and using the load, reliability, and MTU metrics.
On Wed, Sep 6, 2023 at 1:13 PM Mark Tinka wrote:
>
>
> On
On Wed, 6 Sept 2023 at 19:28, Mark Tinka wrote:
> Yes, this has been my understanding of, specifically, Juniper's
> forwarding complex.
Correct, the packet is sprayed to some PPE, and PPEs do not run in
deterministic time; after the PPEs there is a reorder block that restores
flow order, if it has to.
EZchip
On 9/6/23 18:52, Tom Beecher wrote:
Well, not exactly the same thing. (But it's my mistake, I was
referring to L3 balancing, not L2 interface stuff.)
Fair enough.
load-balance per-packet will cause massive reordering, because it's
random spray, caring about nothing except equal loading
>
> Unless you specifically configure true "per-packet" on your LAG:
>
Well, not exactly the same thing. (But it's my mistake, I was referring to
L3 balancing, not L2 interface stuff.)
load-balance per-packet will cause massive reordering, because it's random
spray, caring about nothing except
On 9/6/23 12:01, Saku Ytti wrote:
Fun fact about the real world, devices do not internally guarantee
order. That is, even if you have identical latency links, 0
congestion, order is not guaranteed between packet1 coming from
interface I1 and packet2 coming from interface I2, which packet first
> If you applications can tolerate reordering, per-packet is fine. In the public
> Internet space, it seems we aren't there yet.
Yeah this
During lockdown here in Italy one day we started getting calls about
performance issues: performance degradation, VPNs dropping or becoming unusable,
and
On 9/6/23 11:20, Benny Lyne Amorsen wrote:
TCP looks quite different in 2023 than it did in 1998. It should handle
packet reordering quite gracefully; in the best case the NIC will
reassemble the out-of-order TCP packets into a 64k packet and the OS
will never even know they were reordered.
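(The NIC reassembly Benny is referring to is the receive-offload path, GRO/LRO. On a Linux receiver you can inspect and toggle it with ethtool; the interface name here is just an example.)

  ethtool -k eth0 | egrep 'generic-receive-offload|large-receive-offload'  # current state
  ethtool -K eth0 gro on    # enable generic receive offload
  ethtool -K eth0 gro off   # disable it when debugging reordering behaviour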
On 9/6/23 17:27, Tom Beecher wrote:
At least on MX, what Juniper calls 'per-packet' is really 'per-flow'.
Unless you specifically configure true "per-packet" on your LAG:
set interfaces ae2 aggregated-ether-options load-balance per-packet
I ran per-packet on a Juniper LAG 10 years
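(For reference, the same knob in the hierarchical view of the configuration; ae2 is simply the bundle name from Mark's example.)

  interfaces {
      ae2 {
          aggregated-ether-options {
              load-balance per-packet;
          }
      }
  }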
On 9/6/23 16:14, Saku Ytti wrote:
For example Juniper offers true per-packet, I think mostly used in
high performance computing.
Cisco did it too with CEF supporting "ip load-sharing per-packet" at the
interface level.
I am not sure this is still supported on modern code/boxes.
Mark.
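(The classic IOS form Mark is recalling looked roughly like this, applied per interface with CEF enabled; the interface name is illustrative and, as he says, support may be gone from modern code.)

  interface TenGigabitEthernet0/1/0
   ip load-sharing per-packet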
>
> For example Juniper offers true per-packet, I think mostly used in
> high performance computing.
>
At least on MX, what Juniper calls 'per-packet' is really 'per-flow'.
On Wed, Sep 6, 2023 at 10:17 AM Saku Ytti wrote:
> On Wed, 6 Sept 2023 at 17:10, Benny Lyne Amorsen
> wrote:
>
> > TCP
On Wed, 6 Sept 2023 at 17:10, Benny Lyne Amorsen
wrote:
> TCP looks quite different in 2023 than it did in 1998. It should handle
> packet reordering quite gracefully; in the best case the NIC will
I think the opposite is true, TCP was designed to be order agnostic.
But everyone uses cubic, and
Mark Tinka writes:
> And just because I said per-flow load balancing has been the gold
> standard for the last 25 years, does not mean it is the best
> solution. It just means it is the gold standard.
TCP looks quite different in 2023 than it did in 1998. It should handle
packet reordering
William Herrin wrote:
I recognize what happens in the real world, not in the lab or text books.
What's the difference between theory and practice? In theory, there is
no difference.
W.r.t. the fact that there are so many wrong theories
and wrong practices, there is no difference.
On Wed, Sep 6, 2023 at 12:23 AM Mark Tinka wrote:
> I recognize what happens in the real world, not in the lab or text books.
What's the difference between theory and practice? In theory, there is
no difference.
--
William Herrin
b...@herrin.us
https://bill.herrin.us/
Saku Ytti wrote:
Fun fact about the real world, devices do not internally guarantee
order. That is, even if you have identical latency links, 0
congestion, order is not guaranteed between packet1 coming from
interface I1 and packet2 coming from interface I2, which packet first
goes to interface E1
On Wed, 6 Sept 2023 at 10:27, Mark Tinka wrote:
> I recognize what happens in the real world, not in the lab or text books.
Fun fact about the real world, devices do not internally guarantee
order. That is, even if you have identical latency links, 0
congestion, order is not guaranteed between
On 9/6/23 09:12, Masataka Ohta wrote:
you now recognize that per-flow load balancing is not a very
good idea.
You keep moving the goal posts. Stay on-topic.
I was asking you to clarify your post as to whether you were speaking of
per-flow or per-packet load balancing. You did not do
Mark Tinka wrote:
Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?
No...
So, though you wrote:
>> If you have multiple parallel links over which
On 9/4/23 13:27, Nick Hilliard wrote:
this is an excellent example of what we're not talking about in this
thread.
It is amusing how he tried to pivot the discussion. Nobody was talking
about how lane transport in optical modules works.
Mark.
On 9/4/23 13:04, Masataka Ohta wrote:
Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?
No... you are saying that.
Mark.
>
> Cogent support has been about as bad as you can get. Everything is great,
> clean your fiber, iperf isn’t a good test, install a physical loop oh wait
> we don’t want that so go pull it back off, new updates come at three to
> seven day intervals, etc. If the performance had never been good
Nick Hilliard wrote:
Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?
this is an excellent example of what we're not talking about in this
thread.
William Herrin wrote:
Well it doesn't show up in long slow pipes because the low
transmission speed spaces out the packets,
Wrong. That is a phenomenon with slow access and fast backbone,
which has nothing to do with this thread.
If backbone is as slow as access, there can be no "space out"
On Mon, Sep 4, 2023 at 7:07 AM Masataka Ohta
wrote:
> William Herrin wrote:
> > So, I've actually studied this in real-world conditions and TCP
> > behaves exactly as I described in my previous email for exactly the
> > reasons I explained.
>
> Yes of course, which is my point. Your problem is
William Herrin wrote:
No, not at all. First, though you explain slow start,
it has nothing to do with long fat pipe. Long fat
pipe problem is addressed by window scaling (and SACK).
So, I've actually studied this in real-world conditions and TCP
behaves exactly as I described in my previous
On Mon, Sep 4, 2023 at 12:13 AM Masataka Ohta
wrote:
> William Herrin wrote:
> > That sounds like normal TCP behavior over a long fat pipe.
>
> No, not at all. First, though you explain slow start,
> it has nothing to do with long fat pipe. Long fat
> pipe problem is addressed by window scaling
Masataka Ohta wrote on 04/09/2023 12:04:
Are you saying you thought a 100G Ethernet link actually consisting
of 4 parallel 25G links, which is an example of "equal speed multi
parallel point to point links", were relying on hashing?
this is an excellent example of what we're not talking about
Mark Tinka wrote:
ECMP, surely, is too abstract a concept to properly manage/operate
simple situations with equal speed multi parallel point to point links.
I must have been doing something wrong for the last 25 years.
Are you saying you thought a 100G Ethernet link actually consisting
of 4
William Herrin wrote:
Hi David,
That sounds like normal TCP behavior over a long fat pipe.
No, not at all. First, though you explain slow start,
it has nothing to do with long fat pipe. Long fat
pipe problem is addressed by window scaling (and SACK).
As David Hubbard wrote:
: I've got a
Nick Hilliard wrote:
In this case, "Without buffer bloat" is an essential assumption.
I can see how this conclusion could potentially be reached in
specific styles of lab configs,
I'm not interested in how poorly you configure your
lab.
but the real world is more complicated and
And,
Masataka Ohta wrote on 03/09/2023 14:32:
See, for example, the famous paper of "Sizing Router Buffers".
With thousands of TCP connections at the backbone recognized
by the paper, buffers with thousands of packets won't cause
packet reordering.
What you said reminds me of the old saying: in
On Thu, Aug 31, 2023 at 2:42 PM David Hubbard
wrote:
> any new TCP flow is subject to numerous dropped packets at establishment and
> then ongoing loss every five to ten seconds.
Hi David,
That sounds like normal TCP behavior over a long fat pipe. After
establishment, TCP sends a burst of 10
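(Rough numbers for the path in question, using the 52 ms RTT from the original post and assuming 1460-byte segments and the common initial window of 10 segments:

  initial burst                            ~ 10 x 1460 B         ~ 14.6 kB
  BDP of a 10 Gbps path                    = 10 Gbit/s x 0.052 s = 520 Mbit ~ 65 MB
  BDP at Cogent's 2 Gbps per-flow ceiling  = 2 Gbit/s x 0.052 s  = 104 Mbit ~ 13 MB

so a single flow needs tens of megabytes of window, hence window scaling, to fill this pipe.)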
On 9/3/23 15:01, Masataka Ohta wrote:
Why, do you think, you can rely on existence of flows?
You have not quite answered my question - but I will assume you are in
favour of per-packet load balancing.
I have deployed per-packet load balancing before, ironically, trying to
deal with
Nick Hilliard wrote:
the proper thing to do is to use the links with round robin
fashion without hashing. Without buffer bloat, packet
reordering probability within each TCP connection is
negligible.
Can you provide some real world data to back this position up?
See, for example, the famous
Mark Tinka wrote:
So you mean, what... per-packet load balancing, in lieu of per-flow load
balancing?
Why, do you think, you can rely on existence of flows?
So, if you internally have 10 parallel 1G circuits expecting
perfect hashing over them, it is not "non-rate-limited 10gig".
It is
Masataka Ohta wrote on 03/09/2023 08:59:
the proper thing to do is to use the links with round robin
fashion without hashing. Without buffer bloat, packet
reordering probability within each TCP connection is
negligible.
Can you provide some real world data to back this position up?
What you
On 9/3/23 09:59, Masataka Ohta wrote:
If you have multiple parallel links over which many slow
TCP connections are running, which should be your assumption,
the proper thing to do is to use the links with round robin
fashion without hashing. Without buffer bloat, packet
reordering
Mark Tinka wrote:
Wrong. It can be performed only at the edges by policing total
incoming traffic without detecting flows.
I am not talking about policing in the core, I am talking about
detection in the core.
I'm not talking about detection at all.
Policing at the edge is pretty
Masataka Ohta wrote on 02/09/2023 16:04:
100 50Mbps flows are as harmful as 1 5Gbps flow.
This is quite an unusual opinion. Maybe you could explain?
Nick
On 9/2/23 17:38, Masataka Ohta wrote:
Wrong. It can be performed only at the edges by policing total
incoming traffic without detecting flows.
I am not talking about policing in the core, I am talking about
detection in the core.
Policing at the edge is pretty standard. You can police a
Mark Tinka wrote:
it is the
core's ability to balance the Layer 2 payload across multiple links
effectively.
Wrong. It can be performed only at the edges by policing total
incoming traffic without detecting flows.
While some vendors have implemented adaptive load balancing algorithms
On 9/2/23 17:04, Masataka Ohta wrote:
Both of you are totally wrong, because the proper thing to do
here is to police, if *ANY*, based on total traffic without
detecting any flow.
I don't think it's as much an issue of flow detection as it is the
core's ability to balance the Layer 2
Mark Tinka wrote:
On 9/1/23 15:59, Mike Hammett wrote:
I wouldn't call 50 megabit/s an elephant flow
Fair point.
Both of you are totally wrong, because the proper thing to do
here is to police, if *ANY*, based on total traffic without
detecting any flow.
100 50Mbps flows are as harmful
On 9/2/23 08:43, Saku Ytti wrote:
What in particular are you missing?
As I explained, PTX/MX both allow, for example, speculating on transit
pseudowires having CW on them, which is non-default and requires
'zero-control-word'. You should be looking at 'hash-key' on PTX and
'enhanced-hash-key'
On Fri, 1 Sept 2023 at 22:56, Mark Tinka wrote:
> PTX1000/10001 (Express) offers no real configurable options for load
> balancing the same way MX (Trio) does. This is what took us by surprise.
What in particular are you missing?
As I explained, PTX/MX both allow for example speculating on
On 9/1/23 21:52, Mike Hammett wrote:
It doesn't help the OP at all, but this is why (thus far, anyway), I
overwhelmingly prefer wavelength transport to anything switched. Can't
have over-subscription or congestion issues on a wavelength.
Large IP/MPLS operators insist on optical transport
On 9/1/23 15:55, Saku Ytti wrote:
Personally I would recommend turning off LSR payload heuristics,
because there is no accurate way for LSR to tell what the label is
carrying, and wrong guess while rare will be extremely hard to root
cause, because you will never hear it, because the person
http://www.midwest-ix.com
- Original Message -
From: "David Hubbard"
To: "Nanog@nanog.org"
Sent: Thursday, August 31, 2023 10:55:19 AM
Subject: Lossy cogent p2p experiences?
Hi all, curious if anyone who has used Cogent as a point to point provider has
gone
On 9/1/23 15:59, Mike Hammett wrote:
I wouldn't call 50 megabit/s an elephant flow
Fair point.
Mark.
Using the
Nokia kit for example, the 7750 does a great job of "adaptive-load-balancing"
but the 7250 is lacklustre at best.
-Original Message-
From: NANOG On Behalf Of Saku Ytti
Sent: Friday, September 1, 2023 8:51 PM
To: Eric Kuhnke
Cc: nanog@nanog.org
Subject: Re: Lossy
On Fri, 1 Sept 2023 at 18:37, Lukas Tribus wrote:
> On the other hand a workaround at the edge at least for EoMPLS would be to
> enable control-word.
Juniper LSR can actually do heuristics on pseudowires with CW.
--
++ytti
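(For the edge workaround Lukas mentions: on Junos l2circuits the control word is on by default unless no-control-word is configured; on the Cisco side it is enabled per pseudowire class, roughly as below in IOS XR style, with the class name purely illustrative.)

  l2vpn
   pw-class WITH-CW
    encapsulation mpls
     control-word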
On Fri, 1 Sept 2023 at 15:55, Saku Ytti wrote:
>
> On Fri, 1 Sept 2023 at 16:46, Mark Tinka wrote:
>
> > Yes, this was our conclusion as well after moving our core to PTX1000/10001.
>
> Personally I would recommend turning off LSR payload heuristics,
> because there is no accurate way for LSR to
AM
To: Mike Hammett , Saku Ytti
Cc: nanog@nanog.org
Subject: Re: Lossy cogent p2p experiences?
On 9/1/23 15:44, Mike Hammett wrote:
and I would say the OP wasn't even about elephant flows, just about a network
that can't deliver anything acceptable.
Unless Cogent are not trying to accept
On 9/1/23 15:44, Mike Hammett wrote:
and I would say the OP wasn't even about elephant flows, just about a
network that can't deliver anything acceptable.
Unless Cogent are not trying to accept (and by extension, may not be
able to guarantee) large Ethernet flows because they can't balance
nanog@nanog.org
Sent: Friday, September 1, 2023 8:56:03 AM
Subject: Re: Lossy cogent p2p experiences?
On 9/1/23 15:44, Mike Hammett wrote:
and I would say the OP wasn't even about elephant flows, just about a network
that can't deliver anything acceptable.
Unless Cogent are not trying to
On Fri, 1 Sept 2023 at 16:46, Mark Tinka wrote:
> Yes, this was our conclusion as well after moving our core to PTX1000/10001.
Personally I would recommend turning off LSR payload heuristics,
because there is no accurate way for LSR to tell what the label is
carrying, and wrong guess while rare
On 9/1/23 15:29, Saku Ytti wrote:
PTX and MX as LSR look inside pseudowire to see if it's IP (dangerous
guess to make for LSR), CSR/ASR9k does not. So PTX and MX LSR will
balance your pseudowire even without FAT.
Yes, this was our conclusion as well after moving our core to PTX1000/10001.
Ytti"
To: "Mark Tinka"
Cc: nanog@nanog.org
Sent: Friday, September 1, 2023 8:29:12 AM
Subject: Re: Lossy cogent p2p experiences?
On Fri, 1 Sept 2023 at 14:54, Mark Tinka wrote:
> When we switched our P devices to PTX1000 and PTX10001, we've had
> surprisingly good performanc
On Fri, 1 Sept 2023 at 14:54, Mark Tinka wrote:
> When we switched our P devices to PTX1000 and PTX10001, we've had
> surprisingly good performance of all manner of traffic across native
> IP/MPLS and 802.1AX links, even without explicitly configuring FAT for
> EoMPLS traffic.
PTX and MX as LSR
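(FAT here is the RFC 6391 flow label. On Junos, explicitly enabling it on an l2circuit looks something like the below; neighbor, interface and VC ID values are obviously illustrative.)

  protocols {
      l2circuit {
          neighbor 192.0.2.1 {
              interface ge-0/0/1.100 {
                  virtual-circuit-id 100;
                  flow-label-transmit;
                  flow-label-receive;
              }
          }
      }
  }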
On 9/1/23 10:50, Saku Ytti wrote:
It is a very plausible theory, and everyone has this problem to a
lesser or greater degree. There was a time when edge interfaces were
much lower capacity than backbone interfaces, but I don't think that
time will ever come back. So this problem is systemic.
On Thu, 31 Aug 2023 at 23:56, Eric Kuhnke wrote:
> The best working theory that several people I know in the neteng community
> have come up with is because Cogent does not want to adversely impact all
> other customers on their router in some sites, where the site's upstreams and
> links to
before this miserable
behavior began in late June.
From: Eric Kuhnke
Date: Thursday, August 31, 2023 at 4:51 PM
To: David Hubbard
Cc: Nanog@nanog.org
Subject: Re: Lossy cogent p2p experiences?
Cogent has asked many people NOT to purchase their ethernet private circuit
point to point service
Cogent has asked many people NOT to purchase their ethernet private circuit
point to point service unless they can guarantee that you won't move any
single flow of greater than 2 Gbps. This works fine as long as the service
is used mostly for mixed IP traffic like a bunch of randomly mixed
Hi all, curious if anyone who has used Cogent as a point to point provider has
gone through packet loss issues with them and were able to successfully
resolve? I’ve got a non-rate-limited 10gig circuit between two geographic
locations that have about 52ms of latency. Mine is set up to support