Interesting! Thanks Fan for bringing that up. So if I understand correctly, with the previous DPDK behavior we could have, say, 128 packets in the rxq: VPP would request 256, get 32, then request 224 (256-32) again, etc. While VPP keeps requesting more packets, the NIC has the opportunity to add packets to the rxq, and VPP could end up with 256... With the new behavior, starting from the same initial state, VPP requests 256 packets, gets 128, and calls it a day. If that's the case, maybe a better heuristic could be to retry up to 8 times (256/32) before giving up?
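Something along those lines, e.g. (rough sketch against the public DPDK
API only, not the actual dpdk-input code - rx_burst_with_retry and the
FRAME_SIZE / BURST_SIZE / MAX_ATTEMPTS defines are made-up names for
illustration):

/*
 * Keep issuing small bursts and only give up after a fixed number of
 * attempts, instead of stopping at the first short or empty burst.
 */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define FRAME_SIZE   256                        /* VPP's target vector size */
#define BURST_SIZE   32                         /* per-call burst size */
#define MAX_ATTEMPTS (FRAME_SIZE / BURST_SIZE)  /* 8 */

static uint16_t
rx_burst_with_retry (uint16_t port_id, uint16_t queue_id,
                     struct rte_mbuf **mbufs)   /* room for FRAME_SIZE mbufs */
{
  uint16_t n_rx = 0;

  for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++)
    {
      n_rx += rte_eth_rx_burst (port_id, queue_id, mbufs + n_rx, BURST_SIZE);

      if (n_rx >= FRAME_SIZE)
        break;
      /* Don't give up on a short or empty burst: while we were handling
       * the previous chunk the NIC may have written more descriptors,
       * so a later attempt can still return packets. */
    }
  return n_rx;
}

The open question would be how long it is worth polling an apparently
empty queue before handing whatever we have to the graph.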
Best ben

> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Zhang, Fan
> Sent: Friday, January 6, 2023 16:04
> To: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Slow VPP performance vs. DPDK l2fwd / l3wfd
>
> There was a change in DPDK 21.11 that impacts the no-multi-seg option for VPP.
>
> In VPP's DPDK RX, the original implementation was to fetch 256 packets.
> If not enough packets were fetched from the NIC queue, it would try
> again with a smaller amount.
>
> DPDK 21.11 changed this by no longer slicing the big burst size into
> smaller (say 32) ones and performing NIC RX multiple times when
> "no-multi-seg" was enabled. This caused VPP to always drain the NIC
> queue in the first attempt, and the NIC could not keep up and fill
> enough descriptors into the queue before the CPU did another RX burst -
> at least that was the case for Intel FVL and CVL.
>
> This caused a lot of empty polling in the end, and the vpp vector size
> was always 64 instead of 256 (for CVL and FVL).
>
> I addressed the problem for CVL/FVL by letting VPP only do smaller
> burst sizes (up to 32) multiple times manually instead. However I
> didn't test on MLX NICs due to the lack of HW. (a9fe20f4b dpdk: improve
> rx burst count per loop)
>
> Since different HW has its own sweet spot for the burst size that lets
> it work with the CPU in harmony - possibly with different problems as
> well - this won't be easily addressed by non-vendor developers.
>
> Regards,
> Fan
>
> On 1/6/2023 2:16 PM, r...@gmx.net <mailto:r...@gmx.net> wrote:
>
> Hi Matt,
>
> thanks a lot. I ended up temporarily solving it via a downgrade to
> v21.10, where the option `no-multi-seg` provides full line speed of
> 100 Gbps (tested with a mixed TRex profile, avg. pkt 900 bytes).
> Weirdly enough, any v22.xx causes a major performance drop with the
> MLX5 DPDK PMD enabled. I will open another thread to discuss usage of
> TRex with the rdma driver.
>
> Below the working config for v21.10 with my Mellanox-ConnectX-6-DX cards:
>
> unix {
>     exec /etc/vpp/exec.cmd
>     # l2fwd mode based on mac
>     # exec /etc/vpp/l2fwd.cmd
>     nodaemon
>     log /var/log/vpp/vpp.log
>     full-coredump
>     cli-listen /run/vpp/cli.sock
>     gid vpp
>
>     ## run vpp in the interactive mode
>     # interactive
>
>     ## do not use colors in terminal output
>     # nocolor
>
>     ## do not display banner
>     # nobanner
> }
>
> api-trace {
>     ## This stanza controls binary API tracing. Unless there is a very strong reason,
>     ## please leave this feature enabled.
>     on
>     ## Additional parameters:
>     ##
>     ## To set the number of binary API trace records in the circular buffer, configure nitems
>     ##
>     ## nitems <nnn>
>     ##
>     ## To save the api message table decode tables, configure a filename. Results in /tmp/<filename>
>     ## Very handy for understanding api message changes between versions, identifying missing
>     ## plugins, and so forth.
>     ##
>     ## save-api-table <filename>
> }
>
> api-segment {
>     gid vpp
> }
>
> socksvr {
>     default
> }
>
> #memory {
>     ## Set the main heap size, default is 1G
>     # main-heap-size 8G
>
>     ## Set the main heap page size. Default page size is the OS default page
>     ## size, which is in most cases 4K. If a different page size is specified,
>     ## VPP will try to allocate the main heap using the specified page size.
>     ## The special keyword 'default-hugepage' will use the system default
>     ## hugepage size
>     # main-heap-page-size 1G
>
>     ## Set the default huge page size.
>     # default-hugepage-size 1G
> #}
>
> cpu {
>     ## In VPP there is one main thread and optionally the user can create worker(s)
>     ## The main thread and worker thread(s) can be pinned to CPU core(s) manually or automatically
>
>     ## Manual pinning of thread(s) to CPU core(s)
>
>     ## Set logical CPU core where main thread runs, if main core is not set
>     ## VPP will use core 1 if available
>     main-core 6
>     # 2,4,6,8,10,12,14,16
>
>     ## Set logical CPU core(s) where worker threads are running
>     corelist-workers 12
>     # find the right worker via lscpu and numa assignment, check the NICs' PCI addresses for a NUMA match
>     #corelist-workers 4,6,8,10,12,14,16
>
>     ## Automatic pinning of thread(s) to CPU core(s)
>
>     ## Sets number of CPU core(s) to be skipped (1 ... N-1)
>     ## Skipped CPU core(s) are not used for pinning main thread and worker thread(s).
>     ## The main thread is automatically pinned to the first available CPU core and worker(s)
>     ## are pinned to next free CPU core(s) after the core assigned to the main thread
>     # skip-cores 4
>
>     ## Specify a number of workers to be created
>     ## Workers are pinned to N consecutive CPU cores while skipping "skip-cores" CPU core(s)
>     ## and the main thread's CPU core
>     # workers 1
>
>     ## Set scheduling policy and priority of main and worker threads
>
>     ## Scheduling policy options are: other (SCHED_OTHER), batch (SCHED_BATCH)
>     ## idle (SCHED_IDLE), fifo (SCHED_FIFO), rr (SCHED_RR)
>     # scheduler-policy fifo
>
>     ## Scheduling priority is used only for "real-time" policies (fifo and rr),
>     ## and has to be in the range of priorities supported for a particular policy
>     # scheduler-priority 50
> }
>
> #buffers {
>     ## Increase number of buffers allocated, needed only in scenarios with
>     ## large number of interfaces and worker threads. Value is per numa node.
>     ## Default is 16384 (8192 if running unprivileged)
>     # buffers-per-numa 128000
>
>     ## Size of buffer data area
>     ## Default is 2048
>     # default data-size 2048
>
>     ## Size of the memory pages allocated for buffer data
>     ## Default will try 'default-hugepage' then 'default'
>     ## you can also pass a size in K/M/G e.g. '8M'
>     # page-size default-hugepage
> #}
>
> dpdk {
>     ## Change default settings for all interfaces
>     dev default {
>         ## Number of receive queues, enables RSS
>         ## Default is 1
>         # num-rx-queues 3
>
>         ## Number of transmit queues, Default is equal
>         ## to number of worker threads or 1 if no worker threads
>         # num-tx-queues 3
>
>         ## Number of descriptors in transmit and receive rings
>         ## increasing or reducing the number can impact performance
>         ## Default is 1024 for both rx and tx
>         num-rx-desc 4096
>         num-tx-desc 4096
>
>         ## VLAN strip offload mode for interface
>         ## Default is off
>         # vlan-strip-offload on
>
>         ## TCP Segment Offload
>         ## Default is off
>         ## To enable TSO, 'enable-tcp-udp-checksum' must be set
>         # tso on
>
>         ## Devargs
>         ## device specific init args
>         ## Default is NULL
>         # devargs safe-mode-support=1,pipeline-mode-support=1
>         # devargs mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=9,txq_inline_mpw=128,rxq_pkt_pad_en=1,dv_flow_en=0
>
>         ## rss-queues
>         ## set valid rss steering queues
>         # rss-queues 0,2,5-7
>     }
>
>     ## Whitelist specific interface by specifying PCI address
>     dev 0000:4b:00.0
>     dev 0000:4b:00.1
>
>     ## Blacklist specific device type by specifying PCI vendor:device
>     ## Whitelist entries take precedence
>     # blacklist 8086:10fb
>
>     ## Set interface name
>     # dev 0000:02:00.1 {
>     #     name eth0
>     # }
>
>     ## Whitelist specific interface by specifying PCI address and in
>     ## addition specify custom parameters for this interface
>     # dev 0000:02:00.1 {
>     #     num-rx-queues 2
>     # }
>
>     ## Change UIO driver used by VPP, Options are: igb_uio, vfio-pci,
>     ## uio_pci_generic or auto (default)
>     # uio-driver vfio-pci
>
>     ## Disable multi-segment buffers, improves performance but
>     ## disables Jumbo MTU support
>     no-multi-seg
>
>     ## Change hugepages allocation per-socket, needed only if there is need for
>     ## a larger number of mbufs. Default is 256M on each detected CPU socket
>     socket-mem 4096,4096
>
>     ## Disables UDP / TCP TX checksum offload. Typically needed to use
>     ## faster vector PMDs (together with no-multi-seg)
>     # no-tx-checksum-offload
>
>     ## Enable UDP / TCP TX checksum offload
>     ## This is the reverse of the 'no-tx-checksum-offload' option
>     # enable-tcp-udp-checksum
>
>     ## Enable/Disable AVX-512 vPMDs
>     # max-simd-bitwidth <256|512>
> }
>
> ## node variant defaults
> #node {
>     ## specify the preferred default variant
>     # default { variant avx512 }
>
>     ## specify the preferred variant, for a given node
>     # ip4-rewrite { variant avx2 }
> #}
>
> # plugins {
>     ## Adjusting the plugin path depending on where the VPP plugins are
>     # path /ws/vpp/build-root/install-vpp-native/vpp/lib/vpp_plugins
>
>     ## Disable all plugins by default and then selectively enable specific plugins
>     # plugin default { disable }
>     # plugin dpdk_plugin.so { enable }
>     # plugin acl_plugin.so { enable }
>
>     ## Enable all plugins by default and then selectively disable specific plugins
>     # plugin dpdk_plugin.so { disable }
>     # plugin acl_plugin.so { disable }
> # }
>
> ## Statistics Segment
> # statseg {
>     # socket-name <filename>, name of the stats segment socket
>     #     defaults to /run/vpp/stats.sock
>     # size <nnn>[KMG], size of the stats segment, defaults to 32mb
>     # page-size <nnn>, page size, ie. 2m, defaults to 4k
>     # per-node-counters on | off, defaults to none
>     # update-interval <f64-seconds>, sets the segment scrape / update interval
> # }
>
> ## L2 FIB
> # l2fib {
>     ## l2fib hash table size.
>     # table-size 512M
>
>     ## l2fib hash table number of buckets. Must be power of 2.
>     # num-buckets 524288
> # }
>
> ## ipsec
> # {
>     # ip4 {
>         ## ipsec for ipv4 tunnel lookup hash number of buckets.
>         # num-buckets 524288
>     # }
>     # ip6 {
>         ## ipsec for ipv6 tunnel lookup hash number of buckets.
>         # num-buckets 524288
>     # }
> # }
>
> # logging {
>     ## set default logging level for logging buffer
>     ## logging levels: emerg, alert, crit, error, warn, notice, info, debug, disabled
>     # default-log-level debug
>
>     ## set default logging level for syslog or stderr output
>     # default-syslog-log-level info
>
>     ## Set per-class configuration
>     # class dpdk/cryptodev { rate-limit 100 level debug syslog-level error }
> # }