Re: kernel 4.15.0-rc9+ (net-next) high cpu load at 50Gbit/s - about 6Mpps
On Sun, 2018-01-28 at 19:26 +0100, Paweł Staszewski wrote:
>
> On 27.01.2018 at 23:23, Paweł Staszewski wrote:
> > Hi
> >
> > Today I made some real-life traffic tests with kernel 4.15.0-rc9,
> > but when traffic reaches 50Gbit/s and about 6Mpps, CPU load rises fast
> > from 48% to 100% for all CPU cores.
> >
> > Here is a graph showing how CPU load rises as pps increases:
> >
> > https://ibb.co/mhD5ob
> >
> > Here is the perf record from that time:
> >
> > https://pastebin.com/3zqG1rvE
> >
> > There are 8x 10G ixgbe 82599 interfaces teamed with teamd.
> >
> > No traffic queueing, only pfifo_fast on all interfaces.
> >
> > No NAT or iptables rules other than INPUT (about 30 rules).
> >
> > All NICs have the same ethtool settings:
> >
> > ethtool -k eth0
> > Features for eth0:
> > Cannot get device udp-fragmentation-offload settings: Operation not supported
> > rx-checksumming: on
> > tx-checksumming: on
> > tx-checksum-ipv4: off [fixed]
> > tx-checksum-ip-generic: on
> > tx-checksum-ipv6: off [fixed]
> > tx-checksum-fcoe-crc: off [fixed]
> > tx-checksum-sctp: on
> > scatter-gather: on
> > tx-scatter-gather: on
> > tx-scatter-gather-fraglist: off [fixed]
> > tcp-segmentation-offload: on
> > tx-tcp-segmentation: on
> > tx-tcp-ecn-segmentation: off [fixed]
> > tx-tcp-mangleid-segmentation: off
> > tx-tcp6-segmentation: on
> > udp-fragmentation-offload: off
> > generic-segmentation-offload: on
> > generic-receive-offload: on
> > large-receive-offload: off
> > rx-vlan-offload: on
> > tx-vlan-offload: on
> > ntuple-filters: on
> > receive-hashing: on
> > highdma: on [fixed]
> > rx-vlan-filter: on
> > vlan-challenged: off [fixed]
> > tx-lockless: off [fixed]
> > netns-local: off [fixed]
> > tx-gso-robust: off [fixed]
> > tx-fcoe-segmentation: off [fixed]
> > tx-gre-segmentation: on
> > tx-gre-csum-segmentation: on
> > tx-ipxip4-segmentation: on
> > tx-ipxip6-segmentation: on
> > tx-udp_tnl-segmentation: on
> > tx-udp_tnl-csum-segmentation: on
> > tx-gso-partial: on
> > tx-sctp-segmentation: off [fixed]
> > tx-esp-segmentation: off [fixed]
> > fcoe-mtu: off [fixed]
> > tx-nocache-copy: off
> > loopback: off [fixed]
> > rx-fcs: off [fixed]
> > rx-all: off
> > tx-vlan-stag-hw-insert: off [fixed]
> > rx-vlan-stag-hw-parse: off [fixed]
> > rx-vlan-stag-filter: off [fixed]
> > l2-fwd-offload: off
> > hw-tc-offload: off
> > esp-hw-offload: off [fixed]
> > esp-tx-csum-hw-offload: off [fixed]
> > rx-udp_tunnel-port-offload: on
> >
> > ethtool -g eth0
> > Ring parameters for eth0:
> > Pre-set maximums:
> > RX: 4096
> > RX Mini: 0
> > RX Jumbo: 0
> > TX: 4096
> > Current hardware settings:
> > RX: 4096
> > RX Mini: 0
> > RX Jumbo: 0
> > TX: 2048
> >
> > ethtool -c eth0
> > Coalesce parameters for eth0:
> > Adaptive RX: off TX: off
> > stats-block-usecs: 0
> > sample-interval: 0
> > pkt-rate-low: 0
> > pkt-rate-high: 0
> >
> > rx-usecs: 512
> > rx-frames: 0
> > rx-usecs-irq: 0
> > rx-frames-irq: 0
> >
> > tx-usecs: 0
> > tx-frames: 0
> > tx-usecs-irq: 0
> > tx-frames-irq: 0
> >
> > rx-usecs-low: 0
> > rx-frame-low: 0
> > tx-usecs-low: 0
> > tx-frame-low: 0
> >
> > rx-usecs-high: 0
> > rx-frame-high: 0
> > tx-usecs-high: 0
> > tx-frame-high: 0
>
> Perf top for kernel 4.15.0-rc9 below (all 40 cores at 100% CPU load with
> 6.3Mpps):
>
> 20.96%  [kernel]                [k] queued_spin_lock_slowpath
>  5.51%  [kernel]                [k] ixgbe_poll
>  5.49%  [kernel]                [k] ixgbe_xmit_frame_ring
>  4.39%  [kernel]                [k] do_raw_spin_lock
>  4.29%  [kernel]                [k] sch_direct_xmit
>  4.11%  [kernel]                [k] fib_table_lookup
>  3.11%  [team_mode_roundrobin]  [k] rr_transmit
>  2.71%  [kernel]                [k] __dev_queue_xmit
>  2.62%  [kernel]                [k] __ptr_ring_peek
>  2.39%  [kernel]                [k] skb_release_data
>  2.18%  [kernel]                [k] dev_gro_receive
>  1.75%  [kernel]                [k] __qdisc_run
>  1.67%  [kernel]                [k] pfifo_fast_enqueue
>  1.57%  [kernel]                [k] netdev_pick_tx
>  1.56%  [kernel]                [k] page_frag_free
>  1.48%  [kernel]                [k] ip_finish_output2
>  1.38%  [kernel]                [k] __slab_free
>  1.36%  [kernel]                [k] skb_unref
>  1.34%  [kernel]                [k] ixgbe_maybe_stop_tx
>  1.30%  [kernel]                [k] vlan_do_receive
>  1.28%  [kernel]                [k] pfifo_fast_dequeue
>  1.23%  [kernel]                [k] virt_to_head_page
>
> Same configuration with kernel 4.15.0-rc3 (50% CPU load on all 40 cores
> with 6.3Mpps):
>
>  7.81%  [kernel]                [k]
Re: kernel 4.15.0-rc9+ (net-next) high cpu load at 50Gbit/s - about 6Mpps
On 27.01.2018 at 23:23, Paweł Staszewski wrote:
> Hi
>
> Today I made some real-life traffic tests with kernel 4.15.0-rc9,
> but when traffic reaches 50Gbit/s and about 6Mpps, CPU load rises fast
> from 48% to 100% for all CPU cores.
>
> Here is a graph showing how CPU load rises as pps increases:
>
> https://ibb.co/mhD5ob
>
> Here is the perf record from that time:
>
> https://pastebin.com/3zqG1rvE
>
> There are 8x 10G ixgbe 82599 interfaces teamed with teamd.
>
> No traffic queueing, only pfifo_fast on all interfaces.
>
> No NAT or iptables rules other than INPUT (about 30 rules).
>
> All NICs have the same ethtool settings:
>
> ethtool -k eth0
> Features for eth0:
> Cannot get device udp-fragmentation-offload settings: Operation not supported
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: on
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: on
> tx-tcp-ecn-segmentation: off [fixed]
> tx-tcp-mangleid-segmentation: off
> tx-tcp6-segmentation: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: on
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: on
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: on
> tx-sctp-segmentation: off [fixed]
> tx-esp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off
> hw-tc-offload: off
> esp-hw-offload: off [fixed]
> esp-tx-csum-hw-offload: off [fixed]
> rx-udp_tunnel-port-offload: on
>
> ethtool -g eth0
> Ring parameters for eth0:
> Pre-set maximums:
> RX: 4096
> RX Mini: 0
> RX Jumbo: 0
> TX: 4096
> Current hardware settings:
> RX: 4096
> RX Mini: 0
> RX Jumbo: 0
> TX: 2048
>
> ethtool -c eth0
> Coalesce parameters for eth0:
> Adaptive RX: off TX: off
> stats-block-usecs: 0
> sample-interval: 0
> pkt-rate-low: 0
> pkt-rate-high: 0
>
> rx-usecs: 512
> rx-frames: 0
> rx-usecs-irq: 0
> rx-frames-irq: 0
>
> tx-usecs: 0
> tx-frames: 0
> tx-usecs-irq: 0
> tx-frames-irq: 0
>
> rx-usecs-low: 0
> rx-frame-low: 0
> tx-usecs-low: 0
> tx-frame-low: 0
>
> rx-usecs-high: 0
> rx-frame-high: 0
> tx-usecs-high: 0
> tx-frame-high: 0

Perf top for kernel 4.15.0-rc9 below (all 40 cores at 100% CPU load with
6.3Mpps):

20.96%  [kernel]                [k] queued_spin_lock_slowpath
 5.51%  [kernel]                [k] ixgbe_poll
 5.49%  [kernel]                [k] ixgbe_xmit_frame_ring
 4.39%  [kernel]                [k] do_raw_spin_lock
 4.29%  [kernel]                [k] sch_direct_xmit
 4.11%  [kernel]                [k] fib_table_lookup
 3.11%  [team_mode_roundrobin]  [k] rr_transmit
 2.71%  [kernel]                [k] __dev_queue_xmit
 2.62%  [kernel]                [k] __ptr_ring_peek
 2.39%  [kernel]                [k] skb_release_data
 2.18%  [kernel]                [k] dev_gro_receive
 1.75%  [kernel]                [k] __qdisc_run
 1.67%  [kernel]                [k] pfifo_fast_enqueue
 1.57%  [kernel]                [k] netdev_pick_tx
 1.56%  [kernel]                [k] page_frag_free
 1.48%  [kernel]                [k] ip_finish_output2
 1.38%  [kernel]                [k] __slab_free
 1.36%  [kernel]                [k] skb_unref
 1.34%  [kernel]                [k] ixgbe_maybe_stop_tx
 1.30%  [kernel]                [k] vlan_do_receive
 1.28%  [kernel]                [k] pfifo_fast_dequeue
 1.23%  [kernel]                [k] virt_to_head_page

Same configuration with kernel 4.15.0-rc3 (50% CPU load on all 40 cores
with 6.3Mpps):

 7.81%  [kernel]                [k] ixgbe_xmit_frame_ring
 7.61%  [kernel]                [k] ixgbe_poll
 7.09%  [kernel]                [k] do_raw_spin_lock
 5.63%  [kernel]                [k] fib_table_lookup
 5.19%  [kernel]                [k] __dev_queue_xmit
 4.38%  [team_mode_roundrobin]  [k] rr_transmit
 3.10%  [kernel]                [k] netdev_pick_tx
 2.79%  [kernel]                [k] skb_release_data
 2.34%  [kernel]                [k] dev_gro_receive
 1.99%  [kernel]                [k] page_frag_free
 1.96%  [kernel]                [k] skb_unref
 1.92%  [kernel]                [k] virt_to_head_page
 1.90%  [kernel]                [k] ixgbe_maybe_stop_tx
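
The two perf top listings in this message are easier to compare side by side. The following is only a rough sketch of one way to diff them, not part of the original report: the helper names `parse_perf_top` and `diff_profiles` are made up for illustration, and a symbol missing from one listing is treated as 0%, which is an assumption (perf top shows only the top entries, so the real share may be small but non-zero).

```python
import re

# Matches perf-top lines such as "20.96% [kernel] [k] queued_spin_lock_slowpath".
LINE_RE = re.compile(r"(\d+(?:\.\d+)?)%\s+\[[^\]]+\]\s+\[k\]\s+(\S+)")

def parse_perf_top(text):
    """Return {symbol: percent} for every sample line found in text."""
    return {m.group(2): float(m.group(1)) for m in LINE_RE.finditer(text)}

def diff_profiles(old, new):
    """Return (symbol, old_pct, new_pct, delta) rows, largest increase first.

    Symbols absent from one profile are counted as 0.0 (an assumption).
    """
    rows = []
    for sym in set(old) | set(new):
        o, n = old.get(sym, 0.0), new.get(sym, 0.0)
        rows.append((sym, o, n, n - o))
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Listings copied from the message above.
RC3 = """
7.81% [kernel] [k] ixgbe_xmit_frame_ring
7.61% [kernel] [k] ixgbe_poll
7.09% [kernel] [k] do_raw_spin_lock
5.63% [kernel] [k] fib_table_lookup
5.19% [kernel] [k] __dev_queue_xmit
4.38% [team_mode_roundrobin] [k] rr_transmit
3.10% [kernel] [k] netdev_pick_tx
2.79% [kernel] [k] skb_release_data
2.34% [kernel] [k] dev_gro_receive
1.99% [kernel] [k] page_frag_free
1.96% [kernel] [k] skb_unref
1.92% [kernel] [k] virt_to_head_page
1.90% [kernel] [k] ixgbe_maybe_stop_tx
"""

RC9 = """
20.96% [kernel] [k] queued_spin_lock_slowpath
5.51% [kernel] [k] ixgbe_poll
5.49% [kernel] [k] ixgbe_xmit_frame_ring
4.39% [kernel] [k] do_raw_spin_lock
4.29% [kernel] [k] sch_direct_xmit
4.11% [kernel] [k] fib_table_lookup
3.11% [team_mode_roundrobin] [k] rr_transmit
2.71% [kernel] [k] __dev_queue_xmit
2.62% [kernel] [k] __ptr_ring_peek
2.39% [kernel] [k] skb_release_data
2.18% [kernel] [k] dev_gro_receive
1.75% [kernel] [k] __qdisc_run
1.67% [kernel] [k] pfifo_fast_enqueue
1.57% [kernel] [k] netdev_pick_tx
1.56% [kernel] [k] page_frag_free
1.48% [kernel] [k] ip_finish_output2
1.38% [kernel] [k] __slab_free
1.36% [kernel] [k] skb_unref
1.34% [kernel] [k] ixgbe_maybe_stop_tx
1.30% [kernel] [k] vlan_do_receive
1.28% [kernel] [k] pfifo_fast_dequeue
1.23% [kernel] [k] virt_to_head_page
"""

rows = diff_profiles(parse_perf_top(RC3), parse_perf_top(RC9))
for sym, old_pct, new_pct, delta in rows[:5]:
    print(f"{sym:30s} rc3={old_pct:6.2f}%  rc9={new_pct:6.2f}%  delta={delta:+6.2f}")
```

The largest increase is queued_spin_lock_slowpath, which does not appear in the rc3 listing at all. Note that these are shares of samples, not absolute CPU time: total load roughly doubled between the two runs, so a symbol whose share stayed flat still costs about twice as much.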
kernel 4.15.0-rc9+ (net-next) high cpu load at 50Gbit/s - about 6Mpps
Hi

Today I made some real-life traffic tests with kernel 4.15.0-rc9,
but when traffic reaches 50Gbit/s and about 6Mpps, CPU load rises fast
from 48% to 100% for all CPU cores.

Here is a graph showing how CPU load rises as pps increases:

https://ibb.co/mhD5ob

Here is the perf record from that time:

https://pastebin.com/3zqG1rvE

There are 8x 10G ixgbe 82599 interfaces teamed with teamd.

No traffic queueing, only pfifo_fast on all interfaces.

No NAT or iptables rules other than INPUT (about 30 rules).

All NICs have the same ethtool settings:

ethtool -k eth0
Features for eth0:
Cannot get device udp-fragmentation-offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on

ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 2048

ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 512
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
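
For readers who want to sanity-check the headline figures, here is a small back-of-the-envelope sketch. It uses only the numbers quoted in this report (50Gbit/s, about 6Mpps, 8x 10G interfaces). The even split across the eight links is an assumption, the function names are illustrative, and the report does not say whether 50Gbit/s is counted in one direction or as RX plus TX, so the per-link utilisation should be read loosely.

```python
def avg_packet_bytes(bits_per_second, packets_per_second):
    """Average bytes per packet implied by a bit rate and a packet rate."""
    return bits_per_second / 8 / packets_per_second

def per_link(total, links):
    """Even split of an aggregate rate across identical links (an assumption)."""
    return total / links

TOTAL_BPS = 50e9          # "50Gbit/s" from the report
TOTAL_PPS = 6e6           # "about 6Mpps" from the report
LINKS = 8                 # "8x 10G ixgbe 82599 interfaces"
LINK_CAPACITY_BPS = 10e9  # nominal 10G per interface

avg_bytes = avg_packet_bytes(TOTAL_BPS, TOTAL_PPS)
link_bps = per_link(TOTAL_BPS, LINKS)
link_pps = per_link(TOTAL_PPS, LINKS)
utilisation = link_bps / LINK_CAPACITY_BPS

print(f"average packet size : {avg_bytes:.1f} bytes")
print(f"per-link bit rate   : {link_bps / 1e9:.2f} Gbit/s")
print(f"per-link packet rate: {link_pps / 1e3:.0f} kpps")
print(f"per-link utilisation: {utilisation:.1%}")
```

Under these assumptions the average packet is a little over 1 KB, and each link carries well under its nominal 10G, which is consistent with the bottleneck being CPU rather than link capacity.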