RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-18 Thread Koehrer Mathias (ETAS/ESW5)
Hi all,

> > >>
> > >> Can you continue your bisection using 'git bisect'?  You've already
> > >> narrowed it down between 4.0 and 4.1, so you're well on your way.
> > >>
> > >
> > > OK - done.
> > > And finally I was successful!
> > > The following git commit is the one that is causing the trouble!
> > > (The full commit is in the attachment).
> > > +++ BEGIN +++
> > > commit 387d37577fdd05e9472c20885464c2a53b3c945f
> > > Author: Matthew Garrett 
> > > Date:   Tue Apr 7 11:07:00 2015 -0700
> > >
> > > PCI: Don't clear ASPM bits when the FADT declares it's
> > > unsupported
> > >
> > > Communications with a hardware vendor confirm that the expected
> behaviour
> > > on systems that set the FADT ASPM disable bit but which still grant 
> > > full
> > > PCIe control is for the OS to leave any BIOS configuration intact and
> > > refuse to touch the ASPM bits.  This mimics the behaviour of Windows.
> > >
> > > Signed-off-by: Matthew Garrett 
> > > Signed-off-by: Bjorn Helgaas 
> > > +++ END HEADER +++
> > >
> > > The only files that are modified by this commit are
> > > drivers/acpi/pci_root.c drivers/pci/pcie/aspm.c
> > > include/linux/pci-aspm.h
> > >
> > > This is all generic PCIe stuff - however, I do not really understand
> > > what the changes in this commit actually do...
> > >
> > > In my setup I am using a dual port igb Ethernet adapter.
> > > This has an onboard PCIe switch and it might be that the
> > > configuration of this PCIe switch on the Intel board is causing the 
> > > trouble.
> > >
> > > Please see also the output of "lspci -v" in the attachment.
> > > The relevant PCI address of the NIC is 04:00.0 / 04:00.1
> > >
> > Hi Mathias,
> >
> > If you could send the output of lspci -vvv it might be more useful as
> > most of the configuration data isn't present in the lspci dump you had
> > attached.  Specifically if you could do this for the working case and
> > the non-working case we could verify if this issue is actually due to
> > the ASPM configuration on the device.
> >
> > Also one thing you might try is booting your kernel with the kernel
> > parameter "pcie_aspm=off".  It sounds like the extra latency is likely
> > due to your platform enabling ASPM on the device and this in turn will
> > add latency if the PCIe link is disabled when you attempt to perform a
> > read as it takes some time to bring the PCIe link up when in L1 state.
> 
> So if we assume it's this commit causing the regression, then it's safe to 
> assume that
> this system's BIOS is claiming to not support ASPM in the FADT, but the BIOS 
> is
> leaving ASPM configured in some way on the relevant devices.
> 
> Also, unfortunately, taking a look at the code which handles "pcie_aspm=off", 
> it
> won't be sufficient to disable ASPM on this system, as disabling these 
> states is
> skipped when the FADT doesn't advertise ASPM support.
> 
> What would be needed is an option like "force", but which force _disables_ 
> ASPM.
> "force-disable", maybe.
> 
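
A hypothetical "force-disable" value for the pcie_aspm= parameter, along the lines
suggested above, might look roughly like the sketch below. The existing "off" and
"force" branches are paraphrased from drivers/pci/pcie/aspm.c; the new branch is
purely illustrative and untested:

static int __init pcie_aspm_disable(char *str)
{
	if (!strcmp(str, "off")) {
		/* existing behaviour: never touch ASPM state at all */
		aspm_policy = POLICY_DEFAULT;
		aspm_disabled = 1;
		aspm_support_enabled = false;
		printk(KERN_INFO "PCIe ASPM is disabled\n");
	} else if (!strcmp(str, "force")) {
		/* existing behaviour: allow ASPM even on pre-1.1 devices */
		aspm_force = 1;
		printk(KERN_INFO "PCIe ASPM is forcibly enabled\n");
	} else if (!strcmp(str, "force-disable")) {
		/* hypothetical: actively switch ASPM off on every link instead
		 * of merely leaving the BIOS state untouched; setting
		 * aspm_force keeps pcie_no_aspm() from overriding the policy
		 * when the FADT says ASPM is unsupported */
		aspm_force = 1;
		aspm_policy = POLICY_PERFORMANCE;
		printk(KERN_INFO "PCIe ASPM is forcibly disabled\n");
	}
	return 1;
}
__setup("pcie_aspm=", pcie_aspm_disable);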

OK, I have now built a "good" kernel 
(using commit 37a9c502c0af013aaae094556830100c2bb133ac) and
a "bad" kernel
(using commit 387d37577fdd05e9472c20885464c2a53b3c945f).

Please find attached the outputs of "lspci -vvv" for both cases.
As assumed, in the "bad" case, the PCIe switch on the NIC board and the
two Ethernet controllers show "ASPM L1 Enabled" in "LnkCtl".
In the "good" case this is "ASPM disabled".

I also tried the kernel option "pcie_aspm=off" in the "bad" case.
However, this had no impact; the issue still occurred!


Switching to kernel 4.8, I set the configuration for
"Default ASPM policy" to CONFIG_PCIEASPM_PERFORMANCE;
however, this did not have any effect.
This is in contrast to the help text provided in the kernel configuration:
"Disable PCI Express ASPM L0s and L1, even if the BIOS enabled them."

For me, the first step should be to make CONFIG_PCIEASPM_PERFORMANCE
work as expected: if this is set, ASPM should be forcibly disabled.
This is currently not the case.

During the boot phase I see in dmesg: 
"ACPI FADT declares the system doesn't support PCIe ASPM, so disable it"
This leads to a call of pcie_no_aspm(), which sets aspm_policy to POLICY_DEFAULT
instead of the value that has been selected in the kernel configuration.

The following patch fixes the issue for me on kernel 4.8.
The config value CONFIG_PCIEASPM_PERFORMANCE is considered correctly.

+++ BEGIN +++
Consider the CONFIG_PCIEASPM_* values within pcie_no_aspm().

Signed-off-by: Mathias Koehrer 

---
 drivers/pci/pcie/aspm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-4.8/drivers/pci/pcie/aspm.c
===
--- linux-4.8.orig/drivers/pci/pcie/aspm.c
+++ 
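
In essence, the idea is to derive the initial policy in pcie_no_aspm() from the
compile-time CONFIG_PCIEASPM_* selection instead of hard-coding POLICY_DEFAULT.
Roughly (an untested sketch of the idea, not necessarily the exact hunk):

void pcie_no_aspm(void)
{
	if (!aspm_force) {
#if defined(CONFIG_PCIEASPM_PERFORMANCE)
		aspm_policy = POLICY_PERFORMANCE;	/* force L0s/L1 off */
#elif defined(CONFIG_PCIEASPM_POWERSAVE)
		aspm_policy = POLICY_POWERSAVE;
#else
		aspm_policy = POLICY_DEFAULT;		/* previous behaviour */
#endif
		aspm_disabled = 1;
	}
}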

RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-17 Thread Koehrer Mathias (ETAS/ESW5)
Hi Julia!
> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
> > > anything RT specific about what you are seeing, but it might be nice
> > > to get confirmation.  Also, bisection would probably be easier if you 
> > > confirm on a
> vanilla kernel.
> > >
> > > I find it unlikely that it's a kernel config option that changed
> > > which regressed you, but instead was a code change to a driver.
> > > Which driver is now the question, and the surface area is still big
> > > (processor mapping attributes for this region, PCI root complex 
> > > configuration,
> PCI bridge configuration, igb driver itself, etc.).
> > >
> > > Big enough that I'd recommend a bisection.  It looks like a
> > > bisection between 3.18 and 4.8 would take you about 18 tries to narrow 
> > > down,
> assuming all goes well.
> > >
> >
> > I have now repeated my tests using the vanilla kernel.
> > There I got the very same issue.
> > Using kernel 4.0 is fine; however, starting with kernel 4.1, the issue
> > appears.
> 
> Great, thanks for confirming!  That helps narrow things down quite a bit.
> 
> > Here is my exact (reproducible) test description:
> > I applied the following patch to the kernel to get the igb trace.
> > This patch instruments the igb_rd32() function to measure the call to
> > readl() which is used to access registers of the igb NIC.
> 
> I took your test setup and ran it between 4.0 and 4.1 on the hardware on my 
> desk,
> which is an Atom-based board with dual I210s, however I didn't see much
> difference.
> 
> However, it's a fairly simple board, with a much simpler PCI topology than 
> your
> workstation.  I'll see if I can find some other hardware to test on.
> 
> [..]
> > This means that I think some other stuff in kernel 4.1 has
> > changed which has an impact on the igb accesses.
> >
> > Any idea what component could cause this kind of issue?
> 
> Can you continue your bisection using 'git bisect'?  You've already narrowed 
> it down
> between 4.0 and 4.1, so you're well on your way.
> 

OK - done.
And finally I was successful!
The following git commit is the one that is causing the trouble!
(The full commit is in the attachment).
+++ BEGIN +++
commit 387d37577fdd05e9472c20885464c2a53b3c945f
Author: Matthew Garrett 
Date:   Tue Apr 7 11:07:00 2015 -0700

PCI: Don't clear ASPM bits when the FADT declares it's unsupported

Communications with a hardware vendor confirm that the expected behaviour
on systems that set the FADT ASPM disable bit but which still grant full
PCIe control is for the OS to leave any BIOS configuration intact and
refuse to touch the ASPM bits.  This mimics the behaviour of Windows.

Signed-off-by: Matthew Garrett 
Signed-off-by: Bjorn Helgaas 
+++ END HEADER +++

The only files that are modified by this commit are 
drivers/acpi/pci_root.c
drivers/pci/pcie/aspm.c
include/linux/pci-aspm.h

This is all generic PCIe stuff - however, I do not really understand what
the changes in this commit actually do...

In my setup I am using a dual port igb Ethernet adapter.
This has an onboard PCIe switch and it might be that the configuration of this
PCIe switch on the Intel board is causing the trouble.

Please see also the output of "lspci -v" in the attachment.
The relevant PCI address of the NIC is 04:00.0 / 04:00.1

Any feedback on this is welcome!

Thanks

Mathias







387d37577fdd05e9472c20885464c2a53b3c945f.patch.gz
Description: 387d37577fdd05e9472c20885464c2a53b3c945f.patch.gz


lspci.gz
Description: lspci.gz


RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-14 Thread Koehrer Mathias (ETAS/ESW5)
Hi Julia,
> Have you tested on a vanilla (non-RT) kernel?  I doubt there is anything RT 
> specific
> about what you are seeing, but it might be nice to get confirmation.  Also, 
> bisection
> would probably be easier if you confirm on a vanilla kernel.
> 
> I find it unlikely that it's a kernel config option that changed which 
> regressed you, but
> instead was a code change to a driver.  Which driver is now the question, and 
> the
> surface area is still big (processor mapping attributes for this region, PCI 
> root
> complex configuration, PCI bridge configuration, igb driver itself, etc.).
> 
> Big enough that I'd recommend a bisection.  It looks like a bisection between 
> 3.18
> and 4.8 would take you about 18 tries to narrow down, assuming all goes well.
> 

I have now repeated my tests using the vanilla kernel.
There I got the very same issue.
Using kernel 4.0 is fine; however, starting with kernel 4.1, the issue appears.


Here is my exact (reproducible) test description:
I applied the following patch to the kernel to get the igb trace.
This patch instruments the igb_rd32() function to measure the call
to readl() which is used to access registers of the igb NIC.


++ BEGIN PATCH 

Index: linux-4.8/drivers/net/ethernet/intel/igb/trace.h
===
--- /dev/null
+++ linux-4.8/drivers/net/ethernet/intel/igb/trace.h
@@ -0,0 +1,34 @@
+
+#if !defined(_TRACE_IGB_H_) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_IGB_H_ 
+
+#include <linux/tracepoint.h>
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM igb
+
+
+#define _TRACE_H_
+
+
+TRACE_EVENT(igb,
+TP_PROTO(u32 val),
+TP_ARGS(val),
+TP_STRUCT__entry(
+__field(u32, val)
+),
+TP_fast_assign(
+__entry->val = val;
+),
+TP_printk("val: %u",
+   __entry->val)
+);
+
+
+#endif
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH drivers/net/ethernet/intel/igb 
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+#include <trace/define_trace.h>
Index: linux-4.8/drivers/net/ethernet/intel/igb/Makefile
===
--- linux-4.8.orig/drivers/net/ethernet/intel/igb/Makefile
+++ linux-4.8/drivers/net/ethernet/intel/igb/Makefile
@@ -28,6 +28,7 @@
 #
 # Makefile for the Intel(R) 82575 PCI-Express ethernet driver
 #
+ccflags-y += -I.
 
 obj-$(CONFIG_IGB) += igb.o
 
Index: linux-4.8/drivers/net/ethernet/intel/igb/igb_main.c
===
--- linux-4.8.orig/drivers/net/ethernet/intel/igb/igb_main.c
+++ linux-4.8/drivers/net/ethernet/intel/igb/igb_main.c
@@ -57,6 +57,9 @@
 #include 
 #include "igb.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 #define MAJ 5
 #define MIN 3
 #define BUILD 0
@@ -753,7 +756,9 @@ u32 igb_rd32(struct e1000_hw *hw, u32 re
if (E1000_REMOVED(hw_addr))
return ~value;
 
+trace_igb(801);
value = readl(&hw_addr[reg]);
+trace_igb(802);
 
/* reads should not return all F's */
if (!(~value) && (!reg || !(~readl(hw_addr)))) {


++ END PATCH 


I built the kernel with this patch applied, rebooted the PC to run this kernel,
and used the following script for my test.

++ BEGIN SCRIPT  +
#!/bin/bash

for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor ; do
if [ -w $f ]; then
echo "performance" > $f
fi
done

if true; then
rmmod igb
modprobe igb
ethtool -L eth2 combined 1
ifconfig eth2 up 192.168.100.111
fi

ifconfig

mount /sys/kernel/debug

( cd /sys/kernel/debug/tracing
  echo 0 > tracing_on
  echo 0 > events/enable
  echo 1 > events/igb/enable
  echo "print-parent" > trace_options
  echo "latency-format" > trace_options
  echo 1 > tracing_on

  sleep 4
  cat trace
)
++ END SCRIPT  +

The results of this for kernel 4.0:
[...]
kworker/-12393...1 49699046us : igb: val: 801
kworker/-12393...1 49699047us : igb: val: 802
kworker/-12393...1 49699047us : igb: val: 801
kworker/-12393...1 49699048us+: igb: val: 802
kworker/-12393...1 49699099us : igb: val: 801
kworker/-12393...1 49699100us : igb: val: 802
kworker/-12393...1 49699100us : igb: val: 801
kworker/-12393...1 49699102us : igb: val: 802
kworker/-12393...1 49699102us : igb: val: 801
kworker/-12393...1 49699103us : igb: val: 802
kworker/-12393...1 49699103us : igb: val: 801
kworker/-12393...1 49699104us : igb: val: 802
kworker/-12393...1 49699104us : igb: val: 801
kworker/-12393...1 49699105us : igb: val: 802
kworker/-12393...1 49699105us : igb: val: 801
kworker/-12393...1 49699107us : igb: val: 802
kworker/-12393...1 49699107us : igb: val: 801
kworker/-12393...1 49699108us : igb: val: 

RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-13 Thread Koehrer Mathias (ETAS/ESW5)
Hi all!
> 
> Hi Julia,
> 
> thanks for the detailed analysis!
> >
> > [...]
> > Okay, we finally received our wakeup event.  We were expecting to be
> > woken up at 10024735653388ns, but were actually woken up at
> 10024735682387ns.
> >
> >   10024735682387 - 10024735653388 = 28999ns
> >
> > Our timer fired ~29us late!  But why...?
> >
> > Sorry I don't have answers, just more questions.  I do wonder what
> > kworker/0:3 was up to at the time the timer interrupt should have fired.
> >
> >Julia
> I have now instrumented the igb driver to generate trace points that allow me
> to identify in more detail what's going on.
> This delivered interesting results!
> 
> Here are some places where I added traces:
> In file igb_ptp.c:
> void igb_ptp_rx_hang(struct igb_adapter *adapter) {
> struct e1000_hw *hw = &adapter->hw;
> unsigned long rx_event;
> u32 tsyncrxctl;
> trace_igb(700);
> tsyncrxctl = rd32(E1000_TSYNCRXCTL);
> trace_igb(701);
> 
> /* Other hardware uses per-packet timestamps */
> if (hw->mac.type != e1000_82576)
> return;
> ...
> 
> In file igb_main.c:
> static void igb_check_lvmmc(struct igb_adapter *adapter) {
> struct e1000_hw *hw = &adapter->hw;
> u32 lvmmc;
> 
> trace_igb(600);
> lvmmc = rd32(E1000_LVMMC);
> trace_igb(601);
> if (lvmmc) {
> ...
> 
> When I run now my test, I get the following trace:
> [...]
> kworker/-607 0... 107315621us+: igb: val: 700
> kworker/-607 0d..h... 107315640us : irq_handler_entry: irq=47 
> name=eth2-rx-0
> kworker/-607 0d..h... 107315640us : irq_handler_exit: irq=47 ret=handled
> kworker/-607 0d..h1.. 107315640us : sched_waking: comm=irq/47-eth2-rx-
> pid=18009 prio=49 target_cpu=000
> kworker/-607 0dN.h2.. 107315641us : sched_wakeup: comm=irq/47-eth2-rx-
> pid=18009 prio=49 target_cpu=000
> kworker/-607 0dN.h1.. 107315643us : irq_handler_entry: irq=48 
> name=eth2-tx-0
> kworker/-607 0dN.h1.. 107315643us : irq_handler_exit: irq=48 ret=handled
> kworker/-607 0dN.h2.. 107315643us : sched_waking: comm=irq/48-eth2-tx-
> pid=18010 prio=49 target_cpu=000
> kworker/-607 0dN.h3.. 107315644us : sched_wakeup: comm=irq/48-eth2-tx-
> pid=18010 prio=49 target_cpu=000
> kworker/-607 0dN..1.. 107315644us : rcu_utilization: Start context switch
> kworker/-607 0dN..1.. 107315644us : rcu_utilization: End context switch
> kworker/-607 0dN..2.. 107315644us : sched_stat_runtime: comm=kworker/0:1
> pid=607 runtime=88996 [ns] vruntime=49754678074 [ns]
> kworker/-607 0d...2.. 107315645us : sched_switch: prev_comm=kworker/0:1
> prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=irq/47-eth2-rx-
> next_pid=18009 next_prio=49
> irq/47-e-18009   0d11 107315646us : softirq_raise: vec=3 [action=NET_RX]
> irq/47-e-18009   0.12 107315646us : softirq_entry: vec=3 [action=NET_RX]
> irq/47-e-18009   0.12 107315647us : napi_poll: napi poll on napi struct
> 88040ae58c50 for device eth2 work 0 budget 64
> irq/47-e-18009   0.12 107315647us : softirq_exit: vec=3 [action=NET_RX]
> irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: Start context switch
> irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: End context switch
> irq/47-e-18009   0d...2.. 107315648us : sched_switch: 
> prev_comm=irq/47-eth2-rx-
> prev_pid=18009 prev_prio=49 prev_state=S ==> next_comm=irq/48-eth2-tx-
> next_pid=18010 next_prio=49
> irq/48-e-18010   0d11 107315649us : softirq_raise: vec=3 [action=NET_RX]
> irq/48-e-18010   0.12 107315649us : softirq_entry: vec=3 [action=NET_RX]
> irq/48-e-18010   0.12 107315650us : napi_poll: napi poll on napi struct
> 88040ae5f450 for device eth2 work 0 budget 64
> irq/48-e-18010   0.12 107315650us : softirq_exit: vec=3 [action=NET_RX]
> irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: Start context switch
> irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: End context switch
> irq/48-e-18010   0d...2.. 107315651us : sched_switch: 
> prev_comm=irq/48-eth2-tx-
> prev_pid=18010 prev_prio=49 prev_state=S ==> next_comm=kworker/0:1
> next_pid=607 next_prio=120
> kworker/-607 0... 107315652us : igb: val: 701
> kworker/-607 0... 107315652us : igb: val: 106
> kworker/-607 0... 107315652us : igb: val: 107
> kworker/-607 0... 107315652us+: igb: val: 600
> kworker/-607 0d..h... 107315689us : local_timer_entry: vector=239
> kworker/-607 0d..h1.. 107315689us : hrtimer_interrupt: cpu=0 offset=-34521
> curr=kworker/0:1[120] thread=cyclictest[19]
> kworker/-607 0d..h1.. 107315689us : hrtimer_cancel: 
> hrtimer=8803d42efe18
> kworker/-607 0d..h... 107315689us : hrtimer_expire_entry:
> hrtimer=8803d42efe18 function=hrtimer_wakeup now=752735681960
> kworker/-607 0d..h1.. 107315689us : sched_waking: comm=cyclictest 
> pid=18015
> prio=19 target_cpu=000
> kworker/-607 0dN.h2.. 107315690us : sched_wakeup: 

Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-13 Thread Koehrer Mathias (ETAS/ESW5)
Hi Julia,

thanks for the detailed analysis!
> 
> [...] 
> Okay, we finally received our wakeup event.  We were expecting to be woken up 
> at
> 10024735653388ns, but were actually woken up at 10024735682387ns.
> 
>   10024735682387 - 10024735653388 = 28999ns
> 
> Our timer fired ~29us late!  But why...?
> 
> Sorry I don't have answers, just more questions.  I do wonder what
> kworker/0:3 was up to at the time the timer interrupt should have fired.
> 
>Julia
I have now instrumented the igb driver to generate trace points that allow me
to identify in more detail what's going on.
This delivered interesting results!

Here are some places where I added traces:
In file igb_ptp.c:
void igb_ptp_rx_hang(struct igb_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
unsigned long rx_event;
u32 tsyncrxctl;
trace_igb(700);
tsyncrxctl = rd32(E1000_TSYNCRXCTL);
trace_igb(701);

/* Other hardware uses per-packet timestamps */
if (hw->mac.type != e1000_82576)
return;
...

In file igb_main.c:
static void igb_check_lvmmc(struct igb_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
u32 lvmmc;

trace_igb(600);
lvmmc = rd32(E1000_LVMMC);
trace_igb(601);
if (lvmmc) {
...

When I run now my test, I get the following trace:
[...]
kworker/-607 0... 107315621us+: igb: val: 700
kworker/-607 0d..h... 107315640us : irq_handler_entry: irq=47 name=eth2-rx-0
kworker/-607 0d..h... 107315640us : irq_handler_exit: irq=47 ret=handled
kworker/-607 0d..h1.. 107315640us : sched_waking: comm=irq/47-eth2-rx- 
pid=18009 prio=49 target_cpu=000
kworker/-607 0dN.h2.. 107315641us : sched_wakeup: comm=irq/47-eth2-rx- 
pid=18009 prio=49 target_cpu=000
kworker/-607 0dN.h1.. 107315643us : irq_handler_entry: irq=48 name=eth2-tx-0
kworker/-607 0dN.h1.. 107315643us : irq_handler_exit: irq=48 ret=handled
kworker/-607 0dN.h2.. 107315643us : sched_waking: comm=irq/48-eth2-tx- 
pid=18010 prio=49 target_cpu=000
kworker/-607 0dN.h3.. 107315644us : sched_wakeup: comm=irq/48-eth2-tx- 
pid=18010 prio=49 target_cpu=000
kworker/-607 0dN..1.. 107315644us : rcu_utilization: Start context switch
kworker/-607 0dN..1.. 107315644us : rcu_utilization: End context switch
kworker/-607 0dN..2.. 107315644us : sched_stat_runtime: comm=kworker/0:1 
pid=607 runtime=88996 [ns] vruntime=49754678074 [ns]
kworker/-607 0d...2.. 107315645us : sched_switch: prev_comm=kworker/0:1 
prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=irq/47-eth2-rx- 
next_pid=18009 next_prio=49
irq/47-e-18009   0d11 107315646us : softirq_raise: vec=3 [action=NET_RX]
irq/47-e-18009   0.12 107315646us : softirq_entry: vec=3 [action=NET_RX]
irq/47-e-18009   0.12 107315647us : napi_poll: napi poll on napi struct 
88040ae58c50 for device eth2 work 0 budget 64
irq/47-e-18009   0.12 107315647us : softirq_exit: vec=3 [action=NET_RX]
irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: Start context switch
irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: End context switch
irq/47-e-18009   0d...2.. 107315648us : sched_switch: prev_comm=irq/47-eth2-rx- 
prev_pid=18009 prev_prio=49 prev_state=S ==> next_comm=irq/48-eth2-tx- 
next_pid=18010 next_prio=49
irq/48-e-18010   0d11 107315649us : softirq_raise: vec=3 [action=NET_RX]
irq/48-e-18010   0.12 107315649us : softirq_entry: vec=3 [action=NET_RX]
irq/48-e-18010   0.12 107315650us : napi_poll: napi poll on napi struct 
88040ae5f450 for device eth2 work 0 budget 64
irq/48-e-18010   0.12 107315650us : softirq_exit: vec=3 [action=NET_RX]
irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: Start context switch
irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: End context switch
irq/48-e-18010   0d...2.. 107315651us : sched_switch: prev_comm=irq/48-eth2-tx- 
prev_pid=18010 prev_prio=49 prev_state=S ==> next_comm=kworker/0:1 next_pid=607 
next_prio=120
kworker/-607 0... 107315652us : igb: val: 701
kworker/-607 0... 107315652us : igb: val: 106
kworker/-607 0... 107315652us : igb: val: 107
kworker/-607 0... 107315652us+: igb: val: 600
kworker/-607 0d..h... 107315689us : local_timer_entry: vector=239
kworker/-607 0d..h1.. 107315689us : hrtimer_interrupt: cpu=0 offset=-34521 
curr=kworker/0:1[120] thread=cyclictest[19]
kworker/-607 0d..h1.. 107315689us : hrtimer_cancel: hrtimer=8803d42efe18
kworker/-607 0d..h... 107315689us : hrtimer_expire_entry: 
hrtimer=8803d42efe18 function=hrtimer_wakeup now=752735681960
kworker/-607 0d..h1.. 107315689us : sched_waking: comm=cyclictest pid=18015 
prio=19 target_cpu=000
kworker/-607 0dN.h2.. 107315690us : sched_wakeup: comm=cyclictest pid=18015 
prio=19 target_cpu=000
kworker/-607 0dN.h... 107315690us : hrtimer_expire_exit: 
hrtimer=8803d42efe18
kworker/-607 0dN.h1.. 107315690us : hrtimer_interrupt: cpu=0 offset=318040 

Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-07 Thread Koehrer Mathias (ETAS/ESW5)
Hi Mitch,
> > >
> > > Although, to be clear, it isn't the fact that there exists 8
> > > threads, it's
> > that the device is
> > > firing all 8 interrupts at the same time.  The time spent in hardirq
> > context just waking
> > > up all 8 of those threads (and the cyclictest wakeup) is enough to
> > > cause
> > your
> > > regression.
> > >
> > > netdev/igb folks-
> > >
> > > Under what conditions should it be expected that the i350 trigger
> > > all of
> > the TxRx
> > > interrupts simultaneously?  Any ideas here?
> 
> I can answer that! I wrote that code.
> 
> We trigger the interrupts once a second because MSI and MSI-X interrupts are 
> NOT
> guaranteed to be delivered. If this happens, the queues being serviced by 
> this "lost"
> interrupt are completely stuck.
> 
> The device automatically masks each interrupt vector after it fires, 
> expecting the
> ISR to re-enable the vector after processing is complete. If the interrupt is 
> lost, the
> ISR doesn't run, so the vector ends up permanently masked. At this point, any
> queues associated with that vector are stuck. The only recovery is through the
> netdev watchdog, which initiates a reset.
> 
> During development of igb, we had several platforms with chipsets that 
> routinely
> dropped MSI messages under stress. Things would be running fine and then, pow,
> all the traffic on a queue would stop.
> 
> So, I added code to fire each vector once per second. Just unmasking the 
> interrupt
> isn't enough - we need to trigger the ISR to get the queues cleaned up so the 
> device
> can work again.
> 
> Is this workaround still needed? I don't know. Modern chipsets don't break a 
> sweat
> handling gigabit-speed traffic, and they *probably* don't drop interrupts. 
> But I'd still
> rather have that insurance.
> 
> You could try to remove the write to the EICS registers in the watchdog task 
> to see if
> that takes care of your problem. But I wouldn't want to remove that code
> permanently, because we have seen lost interrupts in the past.
> 
> You also could try staggering the writes so that not all vectors fire each 
> second. But
> then you'll potentially incur a much longer delay if an interrupt does get 
> lost, which
> means you could trigger netdev watchdog events.
> 
> -Mitch

Thanks for the explanation. Now I have switched over to kernel 4.8-rt1.
This shows the very same effect.
I modified the in-kernel igb_main.c (function igb_watchdog_task) to comment out
the access to the EICS registers:

--- igb_main.c.orig 2016-10-07 10:43:37.855873754 +0200
+++ igb_main.c  2016-10-07 10:31:51.451346292 +0200
@@ -4613,6 +4613,7 @@ no_wait:
}

/* Cause software interrupt to ensure Rx ring is cleaned */
+#if 0
if (adapter->flags & IGB_FLAG_HAS_MSIX) {
u32 eics = 0;

@@ -4622,6 +4623,7 @@ no_wait:
} else {
wr32(E1000_ICS, E1000_ICS_RXDMT0);
}
+#endif

igb_spoof_check(adapter);
igb_ptp_rx_hang(adapter);


The result is now slightly better; however, the max latency measured by
cyclictest is still much higher compared to the values of kernel 3.18.27-rt27.
I have now enabled all events; the execution of
# cyclictest -a -i 105 -m -n -p 80 -t 1  -b 30 -C
delivers the following trace values:


cyclicte-10062   0... 10025813402us : sys_exit: NR 230 = 0
cyclicte-10062   0... 10025813402us : sys_enter: NR 230 (1, 1, 
773ff930, 0, 0, 2bd7e12e)
cyclicte-10062   0... 10025813402us : hrtimer_init: 
hrtimer=88040a017e18 clockid=CLOCK_MONOTONIC mode=HRTIMER_MODE_ABS
cyclicte-10062   0d...1.. 10025813403us : hrtimer_start: 
hrtimer=88040a017e18 function=hrtimer_wakeup expires=10024735653388 
softexpires=10024735653388
cyclicte-10062   0d...1.. 10025813403us : write_msr: 6e0, value 20ca630b9aef
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: Start context switch
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: End context switch
cyclicte-10062   0d...2.. 10025813404us : sched_switch: prev_comm=cyclictest 
prev_pid=10062 prev_prio=19 prev_state=S ==> next_comm=kworker/0:3 
next_pid=1388 next_prio=120
cyclicte-10062   0d...2.. 10025813404us+: x86_fpu_regs_deactivated: x86/fpu: 
88040c603ec0 fpregs_active: 0 fpstate_active: 1 counter: 69 xfeatures: 2 
xcomp_bv: 0
kworker/-13880d..h... 10025813468us : irq_handler_entry: irq=48 
name=eth2-tx-0
kworker/-13880d..h... 10025813468us : irq_handler_exit: irq=48 ret=handled
kworker/-13880d..h1.. 10025813469us : sched_waking: comm=irq/48-eth2-tx- 
pid=10057 prio=49 target_cpu=000
kworker/-13880dN.h2.. 10025813469us : sched_wakeup: comm=irq/48-eth2-tx- 
pid=10057 prio=49 target_cpu=000
kworker/-13880dN.h1.. 10025813470us : irq_handler_entry: irq=47 
name=eth2-rx-0
kworker/-13880dN.h1.. 10025813470us : irq_handler_exit: irq=47 ret=handled
kworker/-13880dN.h2.. 10025813471us : sched_waking: comm=irq/47-eth2-rx- 
pid=10056 prio=49 target_cpu=000
kworker/-1388

Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest

2016-10-06 Thread Koehrer Mathias (ETAS/ESW5)
Hi all,
> 
> Although, to be clear, it isn't the fact that there exists 8 threads, it's 
> that the device is
> firing all 8 interrupts at the same time.  The time spent in hardirq context 
> just waking
> up all 8 of those threads (and the cyclictest wakeup) is enough to cause your
> regression.
> 
> netdev/igb folks-
> 
> Under what conditions should it be expected that the i350 trigger all of the 
> TxRx
> interrupts simultaneously?  Any ideas here?
> 
> See the start of this thread here:
> 
>   http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56@FE-
> MBX1012.de.bosch.com
> 
Greg recommended using "ethtool -L eth2 combined 1" to reduce the number of
queues.
I tried that. Now I actually have only three irqs (eth2, eth2-rx-0, eth2-tx-0).
However, the issue remains the same.

I ran the cyclictest again:
# cyclictest -a -i 105 -m -n -p 80 -t 1  -b 23 -C
(Note: When using 105us instead of 100us the long latencies seem to occur more 
often).

Here are the final lines of the kernel trace output:
  -0   4d...2.. 1344661649us : sched_switch: prev_comm=swapper/4 
prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/4 next_pid=56 
next_prio=98
ktimerso-46  3d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/3 
prev_pid=46 prev_prio=98 prev_state=S ==> next_comm=swapper/3 next_pid=0 
next_prio=120
ktimerso-24  1d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/1 
prev_pid=24 prev_prio=98 prev_state=S ==> next_comm=swapper/1 next_pid=0 
next_prio=120
ktimerso-79  6d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/6 
prev_pid=79 prev_prio=98 prev_state=S ==> next_comm=swapper/6 next_pid=0 
next_prio=120
ktimerso-35  2d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/2 
prev_pid=35 prev_prio=98 prev_state=S ==> next_comm=swapper/2 next_pid=0 
next_prio=120
  rcuc/5-67  5d...2.. 1344661650us : sched_switch: prev_comm=rcuc/5 
prev_pid=67 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/5 next_pid=68 
next_prio=98
  rcuc/7-89  7d...2.. 1344661650us : sched_switch: prev_comm=rcuc/7 
prev_pid=89 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/7 next_pid=90 
next_prio=98
ktimerso-4   0d...211 1344661650us : sched_wakeup: comm=rcu_preempt pid=8 
prio=98 target_cpu=000
  rcuc/4-56  4d...2.. 1344661651us : sched_switch: prev_comm=rcuc/4 
prev_pid=56 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/4 next_pid=57 
next_prio=98
ktimerso-4   0d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/0 
prev_pid=4 prev_prio=98 prev_state=S ==> next_comm=rcu_preempt next_pid=8 
next_prio=98
ktimerso-90  7d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/7 
prev_pid=90 prev_prio=98 prev_state=S ==> next_comm=swapper/7 next_pid=0 
next_prio=120
ktimerso-68  5d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/5 
prev_pid=68 prev_prio=98 prev_state=S ==> next_comm=swapper/5 next_pid=0 
next_prio=120
rcu_pree-8   0d...3.. 1344661652us : sched_wakeup: comm=rcuop/0 pid=10 
prio=120 target_cpu=000
ktimerso-57  4d...2.. 1344661652us : sched_switch: prev_comm=ktimersoftd/4 
prev_pid=57 prev_prio=98 prev_state=S ==> next_comm=swapper/4 next_pid=0 
next_prio=120
rcu_pree-8   0d...2.. 1344661653us+: sched_switch: prev_comm=rcu_preempt 
prev_pid=8 prev_prio=98 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 
next_prio=120
kworker/-5   0dN.h2.. 1344661741us : sched_wakeup: comm=cyclictest pid=6314 
prio=19 target_cpu=000
kworker/-5   0d...2.. 1344661742us : sched_switch: prev_comm=kworker/0:0 
prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 
next_prio=19
cyclicte-63140d...2.. 1344661743us : sched_switch: prev_comm=cyclictest 
prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=rcuop/0 next_pid=10 
next_prio=120
 rcuop/0-10  0d...2.. 1344661744us!: sched_switch: prev_comm=rcuop/0 
prev_pid=10 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 
next_prio=120
kworker/-5   0dN.h2.. 1344661858us : sched_wakeup: comm=cyclictest pid=6314 
prio=19 target_cpu=000
kworker/-5   0d...2.. 1344661859us : sched_switch: prev_comm=kworker/0:0 
prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 
next_prio=19
cyclicte-63140d...2.. 1344661860us!: sched_switch: prev_comm=cyclictest 
prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 
next_prio=120
kworker/-5   0dN.h2.. 1344661966us : sched_wakeup: comm=cyclictest pid=6314 
prio=19 target_cpu=000
kworker/-5   0d...2.. 1344661966us : sched_switch: prev_comm=kworker/0:0 
prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 
next_prio=19
cyclicte-63140d...2.. 1344661967us+: sched_switch: prev_comm=cyclictest 
prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 
next_prio=120
kworker/-5   0dN.h2.. 1344662052us : sched_wakeup: comm=cyclictest pid=6314 
prio=19 target_cpu=000
kworker/-5   

Intel Ethernet driver igb causes huge latencies with cyclictest (rt-tests)

2016-10-05 Thread Koehrer Mathias (ETAS/ESW5)
Hi all,

I noticed that with fairly new versions of the Linux kernel, the igb driver
causes huge latencies with cyclictest in a PREEMPT_RT environment.
The root cause seems to be the number of interrupts that are used for the igb
NIC devices, as multiple of these irqs may fire at the same time (see below).

With kernel 4.6.7-rt14 the igb driver uses 9 (!) irqs per NIC on an Intel Core i7
PC (x86-64):
e.g. eth2, eth2-TxRx-0, eth2-TxRx-1, ..., eth2-TxRx-7.

Running the very same machine with kernel 3.18.27-rt27, there are only 2 irqs:
eth2 and eth2-TxRx-0.

The issue with the many irqs is that they are all fired at roughly the same time,
even if the link is down because nothing is connected to the NIC.
I analyzed the execution of the cyclictest tool using the kernel tracer on 
kernel 4.6.7-rt14:

kworker/-5   0dN.h2.. 1504647372us : sched_wakeup: comm=cyclictest pid=5887 
prio=19 target_cpu=000
kworker/-5   0dN.h3.. 1504647374us : sched_wakeup: comm=irq/54-eth2-TxR 
pid=5883 prio=49 target_cpu=000
kworker/-5   0dN.h3.. 1504647375us : sched_wakeup: comm=irq/53-eth2-TxR 
pid=5882 prio=49 target_cpu=000
kworker/-5   0dN.h3.. 1504647377us : sched_wakeup: comm=irq/52-eth2-TxR 
pid=5881 prio=49 target_cpu=000
kworker/-5   0dN.h3.. 1504647378us : sched_wakeup: comm=irq/51-eth2-TxR 
pid=5880 prio=49 target_cpu=000
kworker/-5   0dN.h3.. 1504647380us : sched_wakeup: comm=irq/50-eth2-TxR 
pid=5879 prio=49 target_cpu=000
kworker/-5   0dN.h3.. 1504647381us : sched_wakeup: comm=irq/49-eth2-TxR 
pid=5878 prio=49 target_cpu=000
kworker/-5   0dN.h3.. 1504647382us : sched_wakeup: comm=irq/48-eth2-TxR 
pid=5877 prio=49 target_cpu=000
kworker/-5   0dN.h3.. 1504647383us : sched_wakeup: comm=irq/47-eth2-TxR 
pid=5876 prio=49 target_cpu=000
kworker/-5   0d...2.. 1504647384us : sched_switch: prev_comm=kworker/0:0 
prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=5887 
next_prio=19

Here it can be clearly seen that eight irqs from the igb are coming in at the
same time.
This leads to a fairly long phase of running in hardirq context, which hurts the
real-time latency.

In my setup I have no cable connected to eth2.
I do a
# modprobe igb
# ifconfig eth2 up 192.168.100.111

I did multiple tests, analyzing and modifying the igb driver.
The function "igb_watchdog_task" seems to be the root cause of the issue.
Whenever I disable this function the cyclictest shows great results.

There has been lengthy discussion on that topic on the rt-users mailing list:
http://marc.info/?t=14745483663=1=2 

My question is now:
How can I either use only 1 irq per NIC with the igb driver, or how can
the driver be reorganized to let the watchdog task trigger the irqs alternately?
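
To make the second part of the question concrete, a staggered watchdog kick inside
igb_watchdog_task() might look roughly like the sketch below. This is purely
illustrative and untested; the next_watchdog_vector counter is hypothetical and
would have to be added to struct igb_adapter:

/* Kick one MSI-X vector per watchdog run and rotate through them, instead
 * of firing all vectors at once.  A lost interrupt would then still be
 * recovered, just within num_q_vectors seconds instead of one.
 */
if (adapter->flags & IGB_FLAG_HAS_MSIX) {
	unsigned int i = adapter->next_watchdog_vector;	/* hypothetical field */

	wr32(E1000_EICS, adapter->q_vector[i]->eims_value);
	adapter->next_watchdog_vector = (i + 1) % adapter->num_q_vectors;
} else {
	wr32(E1000_ICS, E1000_ICS_RXDMT0);
}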

Thanks for any feedback

Regards

Mathias