Re: BGP and The zero window edge

2021-04-21 Thread Pawel Malachowski
On Wed, Apr 21, 2021 at 08:59:06PM +, Jakob Heitz (jheitz) via NANOG
wrote:

> Has anyone else seen this before or can provide data to analyze?
> On or off list.

- https://labs.ripe.net/author/romain_fontugne/bgp-zombies/
- https://www.slideshare.net/atendesoftware/bgp-zombie-routes


kind regards,
-- 
Pawel Malachowski


Re: AWS Using Class E IPv4 Address on internal Routing

2021-03-09 Thread Pawel Malachowski
On Tue, Mar 09, 2021 at 07:00:47AM -0700, Forrest Christian (List Account)
wrote:

> to get them to make this work for selected purposes.   Router-to-Router
> links, especially between higher-end routers seems to be one of those cases
> that it might be useful.

BTW, some platforms and OSes (like Linux) support routing IPv4 via IPv6
next hops, which may help to conserve v4 p2p space in some environments.
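A minimal illustration on Linux with iproute2 (the prefix and next hop
here are examples, and this needs a reasonably recent kernel):

  ip route add 192.0.2.0/24 via inet6 fe80::1 dev eth0

The IPv4 prefix is reached through an IPv6 link-local next hop, so the
point-to-point link itself needs no IPv4 addressing at all.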


-- 
Pawel Malachowski
@pawmal80


Re: DPDK and energy efficiency

2021-02-23 Thread Pawel Malachowski
> > No, it is not the PMD that runs the processor in a polling loop.
> > It is the application itself, which may or may not busy loop,
> > depending on the application programmer's choice.
> 
> From one of my earlier references [2]:
> 
> "we found that a poll mode driver (PMD)
> thread accounted for approximately 99.7 percent
> CPU occupancy (a full core utilization)."
> 
> And further on:
> 
> "we found that the thread kept spinning on the following code block:
> 
> for ( ; ; ) {
>     for (i = 0; i < poll_cnt; i++) {
>         dp_netdev_process_rxq_port(pmd, list[i].port, poll_list[i].rx);
>     }
> }
> This indicates that the thread was continuously
> monitoring and executing the receiving data path."

This comes from OVS code and shows an OVS thread spinning, not a DPDK PMD.
Blame the OVS application for not using e.g. _mm_pause() and burning
the CPU like crazy.
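For illustration, a polling loop that finds no work can at least hint
the CPU. A hypothetical sketch, not actual OVS code (the work_ready flag
stands in for whatever condition the real loop checks):

  #include <immintrin.h>              /* _mm_pause() */
  #include <stdatomic.h>

  /* Spin until work arrives, but issue PAUSE on every empty iteration.
   * PAUSE tells the core this is a spin-wait loop, cutting power draw
   * and pipeline pressure versus spinning at full tilt. */
  static void
  wait_for_work(atomic_bool *work_ready)
  {
      while (!atomic_load_explicit(work_ready, memory_order_acquire))
          _mm_pause();
  }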


For comparison, take a look at the top+i7z output from a DPDK-based 100G DDoS
scrubber currently lifting some low traffic using cores 1-13 on a 16-core
host. It uses naive DPDK rte_pause() throttling to enter C1 (a sketch of
this kind of loop follows the output below).

Tasks: 342 total,   1 running, 195 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.6 us,  0.6 sy,  0.0 ni, 89.7 id,  3.1 wa,  0.0 hi,  0.0 si,  0.0 st

Core [core-id]   Actual Freq (Mult.)   C0%    Halt(C1)%   C3%   C6%   Temp   VCore
Core 1  [0]      1467.73 (14.68x)      2.15   5.35        1     92.3   43    0.6724
Core 2  [1]      1201.09 (12.01x)      11.7   93.9        0     0      39    0.6575
Core 3  [2]      1200.06 (12.00x)      11.8   93.8        0     0      42    0.6543
Core 4  [3]      1200.14 (12.00x)      11.8   93.8        0     0      41    0.6549
Core 5  [4]      1200.10 (12.00x)      11.8   93.8        0     0      41    0.6526
Core 6  [5]      1200.12 (12.00x)      11.8   93.8        0     0      40    0.6559
Core 7  [6]      1201.01 (12.01x)      11.8   93.8        0     0      41    0.6559
Core 8  [7]      1201.02 (12.01x)      11.8   93.8        0     0      43    0.6525
Core 9  [8]      1201.00 (12.01x)      11.8   93.8        0     0      41    0.6857
Core 10 [9]      1201.04 (12.01x)      11.8   93.8        0     0      40    0.6541
Core 11 [10]     1201.95 (12.02x)      13.6   92.9        0     0      40    0.6558
Core 12 [11]     1201.02 (12.01x)      11.8   93.8        0     0      42    0.6526
Core 13 [12]     1204.97 (12.05x)      17.6   90.8        0     0      45    0.6814
Core 14 [13]     1248.39 (12.48x)      28.2   84.7        0     0      41    0.6855
Core 15 [14]     2790.74 (27.91x)      91.9   0           1     1      41    0.8885  <-- not PMD
Core 16 [15]     1262.29 (12.62x)      13.1   34.9        1.7   56.2   43    0.6616

$ dataplanectl stats fcore | grep total
fcore total idle 393788223887 work 860443658 (0.2%) (forced-idle 7458486526622)
  recv 202201388561 drop 61259353721 (30.3%) limit 269909758 (0.1%)
  pass 140606076622 (69.6%) ingress 66048460 (0.0%/0.0%)
  sent 162580376914 (80.4%/100.0%) overflow 0 (0.0%)
  sampled 628488188/628488188
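A minimal sketch of that kind of naive throttling loop (illustrative
only, not the scrubber's actual code; the pause batch size is arbitrary):

  #include <rte_ethdev.h>   /* rte_eth_rx_burst() */
  #include <rte_pause.h>    /* rte_pause() wraps the x86 PAUSE instruction */

  #define BURST 32

  /* Poll the queue, and on every empty poll burn a batch of PAUSEs
   * instead of re-polling immediately. This trades a little latency
   * for far lower effective C0 residency on the polling core. */
  static void
  rx_loop(uint16_t port, uint16_t queue)
  {
      struct rte_mbuf *pkts[BURST];

      for (;;) {
          uint16_t n = rte_eth_rx_burst(port, queue, pkts, BURST);
          if (n > 0) {
              /* ... hand the burst to the processing path ... */
              continue;
          }
          for (int i = 0; i < 512; i++)  /* batch size is arbitrary */
              rte_pause();
      }
  }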



-- 
Pawel Malachowski
@pawmal80


Re: DPDK and energy efficiency

2021-02-23 Thread Pawel Malachowski
On Mon, Feb 22, 2021 at 12:45:52PM +0100, Etienne-Victor Depasquale
wrote:

> Every research paper I've read indicates that, regardless of whether it has
> packets to process or not, DPDK PMDs (poll-mode drivers) prevent the CPU
> from falling into an LPI (low-power idle).
> 
> When it has no packets to process, the PMD runs the processor in a polling
> loop that keeps utilization of the running core at 100%.

No, it is not the PMD that runs the processor in a polling loop.
It is the application itself, which may or may not busy loop,
depending on the application programmer's choice.


-- 
Pawel Malachowski
@pawmal80


Re: DPDK and energy efficiency

2021-02-22 Thread Pawel Malachowski
On Mon, Feb 22, 2021 at 01:01:45PM +0100, Etienne-Victor Depasquale
wrote:

> It is, after all, Intel's response to the problem of general-purpose
> scheduling of its processors - which prevents the processor from being
> viable under high networking loads.

It totally makes sense to busy poll under high networking load.
By high networking load I mean roughly >7 Mpps RX+TX per x86 CPU core.

I partially agree that it may be hard to mix DPDK and non-DPDK workloads
on a single CPU, not only because the dataplane application then needs
advanced power-management logic, but also due to LLC thrashing.
It heavily depends on the use case and dataset sizes: for example, an
optimised FIB may fit nicely into cache and touch only a tiny, hot part
of the dataset, but a CGNAT Mflow mapping likely won't fit. For such
a use case I would recommend a dedicated CPU or cache partitioning (CAT),
if available, as sketched below.
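For example, on Intel platforms that support CAT, the pqos tool from
intel-cmt-cat can carve out LLC ways for the dataplane (the way mask
and core list below are illustrative, adjust to your platform):

  # Define class of service 1 with a dedicated slice of LLC ways
  pqos -e "llc:1=0x0ff0"
  # Associate the dataplane cores (here 1-13) with that class
  pqos -a "llc:1=1-13"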

In the case of low-volume traffic, like 20-40G of IMIX, one can dedicate
e.g. 2 cores and interleave busy polling with halt instructions to
lower the usage significantly (~60-80% core underutilisation).



-- 
Pawel Malachowski
@pawmal80


Re: DPDK and energy efficiency

2021-02-22 Thread Pawel Malachowski
On Mon, Feb 22, 2021 at 08:33:35AM -0300, Douglas Fischer wrote:

> But IMHO, the questions do not cover the actual reality of DPDK.
> That characteristic of "100% CPU" depends on several aspects, like:
>  - How old the hardware running DPDK is.
>  - What type of DPDK workload is run (very dynamic, like stateful CGNAT,
> or static ACLs?)
>  - Whether or not DPDK input/drop/forwarding measurements are used.
>  - CPU affinity done according to the demand of traffic.
>  - SR-IOV (sharing resources) on DPDK.

It consumes 100% only if you busy poll (which is the default approach).
One can switch between polling and interrupts (or monitor, if supported),
or introduce halt instructions, in case of low/medium traffic volume.
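A minimal sketch of the polling/interrupt switch using DPDK's Rx
interrupt API, along the lines of the l3fwd-power example (the idle
threshold is illustrative):

  #include <rte_ethdev.h>      /* rte_eth_rx_burst(), Rx interrupt control */
  #include <rte_interrupts.h>  /* rte_epoll_wait(), RTE_EPOLL_PER_THREAD */

  #define BURST 32
  #define IDLE_THRESHOLD 1024  /* empty polls before arming the interrupt */

  static void
  rx_loop(uint16_t port, uint16_t queue)
  {
      struct rte_mbuf *pkts[BURST];
      struct rte_epoll_event ev;
      unsigned int idle = 0;

      /* Map this queue's Rx interrupt into the per-thread epoll set. */
      rte_eth_dev_rx_intr_ctl_q(port, queue, RTE_EPOLL_PER_THREAD,
                                RTE_INTR_EVENT_ADD, NULL);

      for (;;) {
          uint16_t n = rte_eth_rx_burst(port, queue, pkts, BURST);
          if (n > 0) {
              idle = 0;
              /* ... hand the burst to the processing path ... */
              continue;
          }
          if (++idle < IDLE_THRESHOLD)
              continue;                 /* keep busy polling for a while */
          /* Queue stayed empty: arm the interrupt and sleep until the
           * NIC signals new packets, instead of burning the core. */
          rte_eth_dev_rx_intr_enable(port, queue);
          rte_epoll_wait(RTE_EPOLL_PER_THREAD, &ev, 1, -1);
          rte_eth_dev_rx_intr_disable(port, queue);
          idle = 0;
      }
  }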


-- 
Pawel Malachowski
@pawmal80