One of my out-of-tree patches is a network impairment tool that acts a lot like
an Ethernet bridge with latency, jitter, etc.

We recently noticed igb adapter errors when testing with our emulator at high
speeds.  For whatever reason, the problem is only easily reproduced when we add
jitter to the emulator.  Jitter would cause a bit more CPU usage and lock
contention in our software, and would increase the number of skbs allocated at
any given time.
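
A rough illustration of that last point, not the emulator's actual code: by
Little's law, the average number of packets held is the packet rate times the
mean delay.  The sketch below assumes jitter is modelled as a uniformly
distributed extra delay on top of a fixed base latency; the function name, the
parameters, and that jitter model are all hypothetical.

```python
import random


def mean_packets_held(rate_pps, base_delay_s, jitter_s, samples=100_000, seed=1):
    """Estimate the average number of packets held by a delay element.

    Uses Little's law (held = rate * mean delay).  Assumes each packet is
    delayed by base_delay_s plus a uniform random value in [0, jitter_s];
    this jitter model is an assumption made for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    total_delay = sum(base_delay_s + rng.uniform(0.0, jitter_s)
                      for _ in range(samples))
    return rate_pps * (total_delay / samples)


if __name__ == "__main__":
    # Hypothetical numbers: 100k packets/sec, 1 ms latency, 0 vs 2 ms jitter.
    print(mean_packets_held(100_000, 0.001, 0.0))
    print(mean_packets_held(100_000, 0.001, 0.002))
```

Under these assumptions, any jitter that adds to the mean delay raises the
number of skbs outstanding at once, even when the packet rate is unchanged.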

I bisected the problem to the commit below:

Author: Eric Dumazet <eduma...@google.com>
Date:   Wed Aug 31 10:42:29 2016 -0700

    softirq: Let ksoftirqd do its job

    A while back, Paolo and Hannes sent an RFC patch adding threaded-able
    napi poll loop support : (https://patchwork.ozlabs.org/patch/620657/)
....

If I replace my emulator with a bridge, I do not see the problem.  But I also
do not see it (or see it only very rarely?) when the emulator is configured
with zero latency and jitter, which is how a bridge would act.

Any idea what sort of (bad?) behaviour could cause this TX queue timeout?

If you have any interest, I will be happy to email you my out-of-tree patches
and instructions to reproduce the problem.


The kernel splat looks like this, and repeats often:


May 17 16:03:09 localhost.localdomain kernel: audit: type=1131 audit(1526598189.492:159): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 17 16:03:39 localhost.localdomain kernel: ------------[ cut here ]------------
May 17 16:03:39 localhost.localdomain kernel: WARNING: CPU: 5 PID: 0 at /home/greearb/git/linux-bisect/net/sched/sch_generic.c:316 dev_watchdog+0x234/0x240
May 17 16:03:39 localhost.localdomain kernel: NETDEV WATCHDOG: eth5 (igb): transmit queue 0 timed out
May 17 16:03:39 localhost.localdomain kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 fuse macvlan wanlink(O) pktgen cfg80211 sunrpc coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass ipmi_ssif iTCO_wdt iTCO_vendor_support joydev i2c_i801 lpc_ich i2c_smbus ioatdma shpchp wmi ipmi_si ipmi_msghandler tpm_tis tpm_tis_core tpm acpi_power_meter acpi_pad sch_fq_codel ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca i2c_algo_bit i2c_core fjes ipv6 crc_ccitt [last unloaded: nf_conntrack]
May 17 16:03:39 localhost.localdomain kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: G           O    4.8.0-rc7+ #132
May 17 16:03:39 localhost.localdomain kernel: Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
May 17 16:03:39 localhost.localdomain kernel:  0000000000000000 ffff88087fd43d78 ffffffff81417eb1 ffff88087fd43dc8
May 17 16:03:39 localhost.localdomain kernel:  0000000000000000 ffff88087fd43db8 ffffffff81103556 0000013c7fd43da8
May 17 16:03:39 localhost.localdomain kernel:  0000000000000000 ffff880854221940 0000000000000005 ffff880854bb8000
May 17 16:03:39 localhost.localdomain kernel: Call Trace:
May 17 16:03:39 localhost.localdomain kernel:  <IRQ>  [<ffffffff81417eb1>] dump_stack+0x63/0x82
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff81103556>] __warn+0xc6/0xe0
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff811035ba>] warn_slowpath_fmt+0x4a/0x50
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff817b3844>] dev_watchdog+0x234/0x240
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff817b3610>] ? qdisc_rcu_free+0x40/0x40
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff8116ea50>] call_timer_fn+0x30/0x150
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff817b3610>] ? qdisc_rcu_free+0x40/0x40
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff8116f35a>] run_timer_softirq+0x1ea/0x450
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff81176d97>] ? ktime_get+0x37/0xa0
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff8104fd21>] ? lapic_next_deadline+0x21/0x30
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff8117cffd>] ? clockevents_program_event+0x7d/0x120
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff81108b7a>] __do_softirq+0xca/0x2d0
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff81108ee3>] irq_exit+0xb3/0xc0
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff8105099d>] smp_apic_timer_interrupt+0x3d/0x50
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff81867882>] apic_timer_interrupt+0x82/0x90
May 17 16:03:39 localhost.localdomain kernel:  <EOI>  [<ffffffff816f9c06>] ? cpuidle_enter_state+0x126/0x300
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff816f9e02>] cpuidle_enter+0x12/0x20
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff81144ba5>] call_cpuidle+0x25/0x40
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff81144f6a>] cpu_startup_entry+0x2ba/0x380
May 17 16:03:39 localhost.localdomain kernel:  [<ffffffff8104e8d9>] start_secondary+0x149/0x170
May 17 16:03:39 localhost.localdomain kernel: ---[ end trace f62c6dd947785e8f ]---
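
Since the splat repeats, a small helper for tallying the watchdog messages per
device and queue may be handy when comparing test runs.  This is only a sketch:
it assumes the log has one message per line in the "NETDEV WATCHDOG: <dev>
(<driver>): transmit queue <n> timed out" form shown above, and the function
name is made up for illustration.

```python
import re
from collections import Counter

# Matches the watchdog message format seen in the splat above.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(([^)]+)\): transmit queue (\d+) timed out"
)


def count_tx_timeouts(lines):
    """Count TX watchdog timeouts keyed by (device, driver, queue)."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            dev, driver, queue = match.group(1), match.group(2), int(match.group(3))
            counts[(dev, driver, queue)] += 1
    return counts
```

It can be fed any iterable of log lines, for example an open file object for a
saved copy of the kernel log.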


Thanks,
Ben

--
Ben Greear <gree...@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
