Re: uPRF strict more

2021-09-29 Thread Anoop Ghanwani
This is not true for all ASICs.  Some ASICs choose to incur the penalty in
a different way, e.g., by halving the prefix tables.  The prefix table is
then duplicated so that uRPF SA and forwarding DA lookups can happen in
parallel.  What kind of penalty is incurred is a question worth asking the
equipment vendor.
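
As a rough check on the packet rates quoted below, here is a minimal sketch of the arithmetic. It assumes the 80B and 208B figures are frame sizes on a single 100 Gbps port. Whether the tester's frame size includes the 20 bytes of preamble and inter-frame gap is not stated in the thread, so both variants are shown; the function name and the overhead constant are illustrative only.

```python
# Assumption: 7B preamble + 1B SFD + 12B inter-frame gap per Ethernet frame.
PREAMBLE_AND_IFG_BYTES = 20

def packets_per_second(link_bps, frame_bytes, wire_overhead_bytes=0):
    """Packets per second needed to fill a link with frames of a given size."""
    bits_per_packet = (frame_bytes + wire_overhead_bytes) * 8
    return link_bps / bits_per_packet

if __name__ == "__main__":
    for size in (80, 208):
        raw = packets_per_second(100e9, size)
        wire = packets_per_second(100e9, size, PREAMBLE_AND_IFG_BYTES)
        print(f"{size}B: {raw / 1e6:.1f} Mpps ignoring overhead, "
              f"{wire / 1e6:.1f} Mpps with preamble and IFG")
```

Either way, the 208B rate is well under half of the 80B rate, which is consistent with the "significant penalty" reading in the thread.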

On Wed, Sep 29, 2021 at 1:10 PM Jean St-Laurent via NANOG 
wrote:

> Thanks a lot for sharing.
>
> So 100 Gbps at line rate with 80B frames is about 150 Mpps.
>
> 100 Gbps at line rate with 208B frames is about 60 Mpps.
>
> It's a significant penalty.
>
> Jean
>
> -Original Message-
> From: brad dreisbach 
> Sent: September 29, 2021 3:33 PM
> To: Jean St-Laurent 
> Cc: 'brad dreisbach' ; 'Phil Bedard' <
> bedard.p...@gmail.com>; 'North American Network Operators' Group' <
> nanog@nanog.org>
> Subject: Re: uPRF strict more
>
> On Wed, Sep 29, 2021 at 02:54:43PM -0400, Jean St-Laurent wrote:
> >Hi Brad,
> >
> >I'd be interested to hear more about this pps penalty. Are we talking about
> >a 5% penalty or something closer to 50%?
> >
> >Let me know if you still have some numbers close to you related to PPS
> >with uRPF loose.
>
> iirc, strict vs loose doesn't matter, it's still an extra lookup which
> affects the performance. i was able to find some numbers to give an example.
>
> the 4x100G tomahawk card was able to pass min frame size (which iirc on
> ixia is 80B) at line rate with no features enabled. turn on uRPF and it is
> only able to pass 208B frames at line rate.
>
> similar results were seen with several generations of cisco and juniper
> line cards (if i tested nokia i can't recall, we had stopped doing urpf when
> they were introduced into the network).
>
> -b
>
>
>


Re: Google uploading your plain text passwords

2021-06-12 Thread Anoop Ghanwani
On Fri, Jun 11, 2021 at 12:51 PM Matthew Petach 
wrote:

>
> Having my email password compromised?
> That's a bit of a "meh" moment.
> Suddenly discovering that one password now gave access to
> potentially all my financial accounts as well?
> That's a wake up in the night with cold sweats moment.  :(
>

Thanks for articulating the issue so well.

And glad I saw this discussion because I had no idea that
if my gmail account was compromised all my financial accounts
would become accessible.

The issue is discussed quite nicely here:
https://www.howtogeek.com/174312/can-google-employees-see-my-saved-google-chrome-passwords/


Re: SRv6

2020-09-16 Thread Anoop Ghanwani
On Tue, Sep 15, 2020 at 5:08 PM Randy Bush  wrote:

> > You might be on to something, but I'm unsure... are you suggesting that
> > it's any less private over SRv6 than it was over MPLS?
>
> neither srv6, srmpls, mpls, gre, ... provide privacy.  they all
> transport the payload in nekkid cleartext.
>
> Dance like no one's watching. Encrypt like everyone is.
>

It depends on the definition of VPN.  In terms of services like
MPLS-based VPNs, it refers to the extension of a Private network
over a shared infrastructure, allowing entities using the shared
infrastructure to have their own private address space and routing
tables.


Re: Devil's Advocate - Segment Routing, Why?

2020-06-19 Thread Anoop Ghanwani
On Wed, Jun 17, 2020 at 11:40 AM Dave Bell  wrote:

>
>
> On Wed, 17 Jun 2020 at 18:42, Saku Ytti  wrote:
>
>> Hey,
>>
>> > Why do we really need SR? Be it SR-MPLS or SRv6 or SRv6+?
>>
>> I don't like this, SR-MPLS and SRv6 are just utterly different things
>> to me, and no answer meaningfully applies to both.
>>
>
> I don't understand the point of SRv6. What equipment can support IPv6
> routing, but can't support MPLS label switching?
>
> I'm a big fan of SR-MPLS however.
>

One of the advantages cited for SRv6 over MPLS is that the packet contains
a record of where it has been.


Re: Partial vs Full tables

2020-06-08 Thread Anoop Ghanwani
There are many different tries -- see here for some examples.
https://www.drdobbs.com/cpp/fast-ip-routing-with-lc-tries/184410638

And an enhancement to LC-tries
http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A469814=-2401

Then there are radix-n (n-ary trie) lookups, e.g. radix-4 would look up
4-bits at a time and branch 16 ways.

Here's a good tutorial, and I don't think even this is exhaustive.
http://klamath.stanford.edu/~pankaj/talks/hoti_tutorial.ppt
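
To make the multi-bit lookup idea concrete, here is a minimal sketch of a trie that consumes 4 bits per step and branches 16 ways, as described above. It is IPv4-only and illustrative: the class and function names are made up, and prefixes whose length is not a multiple of 4 are handled by expanding them across several slots of one node, which is one common approach but not necessarily what any given ASIC does.

```python
import ipaddress

STRIDE = 4              # bits consumed per lookup step
FANOUT = 1 << STRIDE    # 16-way branch

class Node:
    def __init__(self):
        self.children = [None] * FANOUT
        self.hops = [None] * FANOUT   # per slot: (prefix_len, next_hop)

def insert(root, prefix, next_hop):
    net = ipaddress.ip_network(prefix)
    addr, plen = int(net.network_address), net.prefixlen
    node, consumed = root, 0
    # Walk whole strides until at most one stride of prefix bits remains.
    while plen - consumed > STRIDE:
        nib = (addr >> (32 - consumed - STRIDE)) & (FANOUT - 1)
        if node.children[nib] is None:
            node.children[nib] = Node()
        node = node.children[nib]
        consumed += STRIDE
    rem = plen - consumed             # 0..4 prefix bits left in this node
    nib = (addr >> (32 - consumed - STRIDE)) & (FANOUT - 1)
    span = 1 << (STRIDE - rem)        # slots covered after expansion
    base = nib & ~(span - 1)
    for slot in range(base, base + span):
        cur = node.hops[slot]
        if cur is None or cur[0] <= plen:   # keep the longest prefix per slot
            node.hops[slot] = (plen, next_hop)

def lookup(root, addr):
    a = int(ipaddress.ip_address(addr))
    node, shift, best = root, 32 - STRIDE, None
    while node is not None and shift >= 0:
        nib = (a >> shift) & (FANOUT - 1)
        if node.hops[nib] is not None:
            best = node.hops[nib][1]  # deeper nodes hold longer prefixes
        node = node.children[nib]
        shift -= STRIDE
    return best

if __name__ == "__main__":
    root = Node()
    for pfx, nh in [("0.0.0.0/0", "default"), ("10.0.0.0/8", "A"),
                    ("10.1.0.0/16", "B"), ("10.1.128.0/17", "C")]:
        insert(root, pfx, nh)
    print(lookup(root, "10.1.200.1"))
```

An IPv4 lookup takes at most 8 steps here, compared with up to 32 for a one-bit-at-a-time trie; the cost is the extra memory used by the 16-entry nodes and the expanded prefixes.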

On Mon, Jun 8, 2020 at 4:19 PM Josh Hoppes  wrote:

> Juniper Networks has also tried using Bloom filters.
>
> https://patents.google.com/patent/US20170187624
>
> I think the QFX10002 was the first product they made which used this
> approach.
>
>
> https://forums.juniper.net/t5/Archive/Juniper-QFX10002-Technical-Overview/ba-p/270358
>
> On Mon, Jun 8, 2020 at 1:45 PM William Herrin  wrote:
> >
> > On Mon, Jun 8, 2020 at 10:52 AM  wrote:
> > > Every "fast" FIB implementation I'm aware of takes a set of prefixes,
> > > stores them in some sort of data structure, which can perform a
> > > longest-prefix lookup on the destination address and eventually get to an
> > > actual physical interface for forwarding that packet.  Exactly how those
> > > prefixes are stored and exactly how load-balancing is performed is *very*
> > > platform specific, and has tons of variability.  I've worked on at least a
> > > dozen different hardware based forwarding planes, and not a single pair of
> > > them used the same set of data structures and design tradeoffs.
> >
> > Howdy,
> >
> > AFAIK, there are two basic approaches: TCAM and Trie.  You can get off
> > in to the weeds fast dealing with how you manage that TCAM or Trie and
> > the Trie-based implementations have all manner of caching strategies
> > to speed them up but the basics go back to TCAM and Trie.
> >
> > TCAM (ternary content addressable memory) is a sort of tri-state SRAM
> > with a special read function. It's organized in rows and each bit in a
> > row is set to 0, 1 or Don't-Care. You organize the routes in that
> > memory in order from most to least specific with the netmask expressed
> > as don't-care bits. You feed the address you want to match in to the
> > TCAM. It's evaluated against every row in parallel during that clock
> > cycle. The TCAM spits out the first matching row.
> >
> > A Trie is a tree data structure organized by bits in the address.
> > Ordinary memory and CPU. Log-nish traversal down to the most specific
> > route. What you expect from a tree.
> >
> > Or have I missed one?
> >
> > Regards,
> > Bill Herrin
> >
> >
> > --
> > William Herrin
> > b...@herrin.us
> > https://bill.herrin.us/
>
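
The TCAM description quoted above can be modelled in a few lines. This is only a software sketch: real hardware compares every row in parallel in one clock cycle, whereas the loop below scans rows in order. The function names are illustrative and the example is IPv4-only.

```python
import ipaddress

def build_tcam(routes):
    """Return rows of (value, mask, next_hop), most specific prefix first."""
    rows = []
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        # Mask bits set to 0 play the role of the "don't care" bits.
        rows.append((int(net.network_address), int(net.netmask), next_hop))
    # For contiguous netmasks, a numerically larger mask is a longer prefix.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows

def tcam_lookup(rows, addr):
    """Return the next hop of the first matching row, or None."""
    a = int(ipaddress.ip_address(addr))
    for value, mask, next_hop in rows:
        if (a & mask) == value:
            return next_hop
    return None

if __name__ == "__main__":
    tcam = build_tcam([("0.0.0.0/0", "default"),
                       ("10.0.0.0/8", "A"),
                       ("10.1.0.0/16", "B")])
    for dst in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(dst, "->", tcam_lookup(tcam, dst))
```

Because the rows are ordered from most to least specific, the first match is also the longest-prefix match, which is why the ordering matters when routes are installed.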


Re: IS-IS on FRR - Is Anyone Running It?

2020-04-06 Thread Anoop Ghanwani
On Sun, Apr 5, 2020 at 10:52 PM Saku Ytti  wrote:

> The only thing that is larger in your network is hellos, and I'm not
> even sure how that works, considering 802.3 cannot signal larger
> frames than 1500B.
>

Probably this method:
https://en.wikipedia.org/wiki/EtherType#Jumbo_frames


TCP and anycast (was Re: ECN)

2019-11-13 Thread Anoop Ghanwani
RFC 7094 (https://tools.ietf.org/html/rfc7094) describes the pitfalls &
risks of using TCP with an anycast address.  It recognizes that there are
valid use cases for it, though.

Specifically, section 3.1 says this:
>>>

   Most stateful transport protocols (e.g., TCP), without modification,
   do not understand the properties of anycast; hence, they will fail
   probabilistically, but possibly catastrophically, when using anycast
   addresses in the presence of "normal" routing dynamics.

...

   This can lead
   to a protocol working fine in, say, a test lab but not in the global
   Internet.

>>>

On Wed, Nov 13, 2019 at 3:33 PM Warren Kumari  wrote:

> On Thu, Nov 14, 2019 at 12:25 AM Matt Corallo  wrote:
> >
> > This sounds like a bug on Cloudflare’s end (cause trying to do anycast
> > TCP is... out of spec to say the least), not a bug in ECN/ECMP.
>
> Err. I really don't think that there is any sort of spec that
> covers that :-P
>
> Using Anycast for TCP is incredibly common - the DNS root servers for
> one obvious example.
> More TCP centric well-known examples are Fastly and LinkedIn -
> LinkedIn in particular did a really good podcast on their experience
> with this.
>
> There is also a good NANOG talk from the ~2000s (?) on people using
> TCP anycast for long lived (serving ISO files, which were long-lived
> in those days) flows, and how reliable it is - perhaps that's the talk
> Todd mentioned?
>
> W
>
> >
> > > On Nov 13, 2019, at 11:07, Toke Høiland-Jørgensen via NANOG <nanog@nanog.org> wrote:
> > >
> > > 
> > >>
> > >> Hello
> > >>
> > >> I have a customer that believes my network has a ECN problem. We do
> > >> not, we just move packets. But how do I prove it?
> > >>
> > >> Is there a tool that checks for ECN trouble? Ideally something I could
> > >> run on the NLNOG Ring network.
> > >>
> > >> I believe it likely that it is the destination that has the problem.
> > >
> > > Hi Baldur
> > >
> > > I believe I may be that customer :)
> > >
> > > First of all, thank you for looking into the issue! We've been having
> > > great fun over on the ecn-sane mailing list trying to figure out what's
> > > going on. I'll summarise below, but see this thread for the discussion
> > > and debugging details:
> > >
> > > https://lists.bufferbloat.net/pipermail/ecn-sane/2019-November/000527.html
> > >
> > > The short version is that the problem appears to come from a combination
> > > of the ECMP routing in your network, and Cloudflare's heavy use of
> > > anycast. Specifically, a router in your network appears to be doing ECMP
> > > by hashing on the packet header, *including the ECN bits*. This breaks
> > > TCP connections with ECN because the TCP SYN (with no ECN bits set) ends
> > > up taking a different path than the rest of the flow (which is marked as
> > > ECT(0)). When the destination is anycasted, this means that the data
> > > packets go to a different server than the SYN did. This second server
> > > doesn't recognise the connection, and so replies with a TCP RST. To fix
> > > this, simply exclude the ECN bits (or the whole TOS byte) from your
> > > router's ECMP hash.
> > >
> > > For a longer exposition, see below. You should be able to verify this
> > > from somewhere else in the network, but if there's anything else you
> > > want me to test, do let me know. Also, would you mind sharing the router
> > > make and model that does this? We're trying to collect real-world
> > > examples of network problems caused by ECN and this is definitely an
> > > interesting example.
> > >
> > > -Toke
> > >
> > >
> > >
> > > The long version:
> > >
> > > From my end I can see that I have two paths to Cloudflare; which is
> > > taken appears to be based on a hash of the packet header, as can be seen
> > > by varying the source port:
> > >
> > > $ traceroute -q 1 --sport=1 104.24.125.13
> > > traceroute to 104.24.125.13 (104.24.125.13), 30 hops max, 60 byte packets
> > > 1  _gateway (10.42.3.1)  0.357 ms
> > > 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  4.707 ms
> > > 3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.283 ms
> > > 4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.667 ms
> > > 5  netnod-ix-cph-blue-9000.cloudflare.com (212.237.192.246)  1.406 ms
> > > 6  104.24.125.13 (104.24.125.13)  1.322 ms
> > >
> > > $ traceroute -q 1 --sport=10001 104.24.125.13
> > > traceroute to 104.24.125.13 (104.24.125.13), 30 hops max, 60 byte packets
> > > 1  _gateway (10.42.3.1)  0.293 ms
> > > 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  3.430 ms
> > > 3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.194 ms
> > > 4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.297 ms
> > > 5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.805 ms
> > > 6  149.6.142.130 (149.6.142.130)  6.925 ms
> > > 7  104.24.125.13 (104.24.125.13)  1.501 ms
> > >
> > >
> > > This is fine in itself. However, the problem stems from the fact that
> > > the ECN bits 

Re: ECN

2019-11-13 Thread Anoop Ghanwani
Not to condone what cloudflare is doing, but...

An ECN connection will have different bits on various packets for the
duration of the connection -- pure ACKs (ACKs not piggybacking on data)
will have the ECN bits as 00b, while all other packets will have either
01b, 10b (when no congestion was experienced) or 11b (when congestion was
experienced).  So using the ECN bits as part of the hash would affect
performance throughout the life of the connection.
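
The effect can be illustrated with a toy ECMP hash. This is a sketch under stated assumptions, not how any particular router computes its hash: SHA-256 stands in for the vendor's hash function, the source address 10.42.3.7 is hypothetical, and the function names are made up. It compares a hash over the 5-tuple alone with one that also folds in the two ECN bits.

```python
import hashlib
import ipaddress
import struct

# The four ECN codepoints carried in the low two bits of the TOS byte.
ECN_CODEPOINTS = {"Not-ECT": 0b00, "ECT(1)": 0b01, "ECT(0)": 0b10, "CE": 0b11}

def ecmp_path(src, dst, sport, dport, ecn, n_paths, include_ecn):
    """Pick an egress path index for a TCP packet (toy hash, not a vendor's)."""
    key = (ipaddress.ip_address(src).packed
           + ipaddress.ip_address(dst).packed
           + struct.pack("!BHH", 6, sport, dport))   # protocol 6 = TCP
    if include_ecn:
        key += bytes([ecn & 0b11])                   # the problematic input
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

def paths_used(sport, include_ecn, n_paths=2):
    """Set of paths one flow touches across all four ECN codepoints."""
    return {ecmp_path("10.42.3.7", "104.24.125.13", sport, 443,
                      ecn, n_paths, include_ecn)
            for ecn in ECN_CODEPOINTS.values()}

if __name__ == "__main__":
    flows = range(1024, 1124)
    split = sum(len(paths_used(p, True)) > 1 for p in flows)
    stable = sum(len(paths_used(p, False)) == 1 for p in flows)
    print(f"hash includes ECN bits: {split} of {len(flows)} flows split across paths")
    print(f"hash excludes ECN bits: {stable} of {len(flows)} flows stay on one path")
```

With the ECN bits excluded, every packet of a flow maps to the same path regardless of codepoint; with them included, pure ACKs, data packets, and CE-marked packets of the same flow can land on different paths.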

On Wed, Nov 13, 2019 at 9:00 AM Matt Corallo  wrote:

> Not ideal, sure, but if it’s only for the SYN (as you seem to indicate),
> splitting the flow shouldn’t have material performance degradation?
>
> > On Nov 13, 2019, at 11:51, Toke Høiland-Jørgensen  wrote:
> >
> > 
> >
> >> On 13 November 2019 17:20:18 CET, Matt Corallo wrote:
> >> This sounds like a bug on Cloudflare’s end (cause trying to do anycast
> >> TCP is... out of spec to say the least), not a bug in ECN/ECMP.
> >
> > Even without anycast, an ECMP shouldn't hash on the ECN bits. Doing so
> > will split the flow over multiple paths; avoiding that is the whole point
> > of doing the flow-based hashing in the first place.
> >
> > Anycast "only" turns a potential degradation of TCP performance into a
> > hard failure... :)
> >
> > -Toke
>
>


Re: reliably detecting the presence of a bridge?

2015-12-16 Thread Anoop Ghanwani
If LLDP (link layer discovery protocol) is enabled, you could try using
that.  There is a system capabilities TLV in the LLDPDU sent by a system,
but I'm not sure how reliably it is filled in, especially if a device is
capable of both switching and routing.  The way LLDP is supposed to work is
a device will receive LLDPDUs from other devices immediately adjacent to
it.  It can then read the LLDP database of those devices (via management)
and figure out what those devices are connected to, and so on.

Otherwise, bridges are supposed to be "transparent," so there is no way to
know they are present by using user data frames.

Anoop


Re: Segment Routing for L2VPN?

2015-09-23 Thread Anoop Ghanwani
It depends on what type of L2VPN we are talking about.

If we are talking about VPLS (where we learn from the data path), changes
are needed in order to make it work with segment routing.  Basically, the
VC label must be assigned and used in such a way that it indicates not only
the service for the packet, but also the PE from which it originated.  That
is because with SR, we would have lost the path (PW) that the packet used
to get to the destination PE.

If we are talking about BGP E-VPN where data path learning is not used,
then it should work with segment routing without any changes.

Anoop

On Mon, Sep 21, 2015 at 11:32 AM, Jeff Tantsura 
wrote:

> Hi,
>
> In most well-designed IP routing stacks the way to get to a labeled
> (tunneled) next hop is decoupled from a service, so if a service requires
> such a next hop it is up to (usually) the RIB to return one (the best;
> multiple might exist) which would be used for forwarding. If it is a
> Segment Routed one, it will then be used.
>
> Cheers,
> Jeff
>
> -Original Message-
> From: Mohan Nanduri 
> Date: Sunday, September 20, 2015 at 12:59 PM
> To: Jason Lixfeld 
> Cc: "nanog@nanog.org" 
> Subject: Re: Segment Routing for L2VPN?
>
> >No, it works with L2VPNs also. Outer label is going to be SR label and
> >inner label is your L2VPN label.
> >
> >Cheers,
> >-Mohan
> >
> >
> >On Sun, Sep 20, 2015 at 3:23 PM, Jason Lixfeld  wrote:
> >> Hello!
> >>
> >> I've been doing some reading recently on Segment Routing.  By all
> >>accounts, it seems that the (only?) implementation for SR supports
> >>L3VPN.  Am I dumb and just missing the L2VPN bits, or is L3VPN simply
> >>the extent of the first generation?
> >>
> >> Sent from my iPhone
>
>