
Re: rt_e1000e: Detected Hardware Unit Hang

2018-11-27 Thread Jan Kiszka via Xenomai

On 27.11.18 10:47, Means Lee wrote:
> I can't reduce the size of my patch against the e1000e real-time driver that
> exists in Xenomai, because the non-real-time driver has evolved a lot. So I
> provided the diffs of the three files I changed when porting the driver. In my
> view, the hardware operations should be unchanged, so I focused on the changes
> to netdev.c. I modified param.c and e1000.h to change the private data
> structure and the parameters a little.
>
> For the questions you asked:
> - The upstream Linux driver I ported works fine with my hardware, even when I
>   put heavy load on it (UDP broadcast storm).
> - When I hit the hardware unit hang, the Tx completion interrupts didn't
>   disappear, but they were reduced a lot.
>
> - I didn't enable CONFIG_IPIPE_DEBUG_CONTEXT, but I do use locks in several
>   places.

Then this should be the next thing to do. This not only detects direct locking
issues but also those triggered by calling into Linux functions that take normal
locks.
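To illustrate why such a check also catches indirect cases, here is a toy userspace model. It is not the I-pipe implementation: the current execution domain is simply kept in a global, and the stand-in for a normal Linux lock checks it on every acquisition. All names (`current_domain`, `linux_spin_lock_checked`, `linux_helper`, `rt_xmit_path`) are hypothetical. Because the check sits in the lock rather than in the caller, a real-time path that only calls a Linux helper is still flagged.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only: hypothetical names, not the I-pipe/Xenomai code. */
enum domain { ROOT_DOMAIN, HEAD_DOMAIN };  /* Linux vs. real-time */

static enum domain current_domain = ROOT_DOMAIN;
static int context_violations;

/* Stand-in for acquiring a regular Linux spinlock. The context check
 * lives here, so it fires no matter how deep the call chain is. */
static void linux_spin_lock_checked(const char *who)
{
    if (current_domain == HEAD_DOMAIN) {
        context_violations++;
        fprintf(stderr, "invalid context: Linux lock taken in %s "
                        "from the real-time domain\n", who);
    }
    /* the actual locking is omitted in this model */
}

/* A Linux service that takes a normal lock internally. */
static void linux_helper(void)
{
    linux_spin_lock_checked("linux_helper");
}

/* A real-time driver path that never touches a Linux lock directly,
 * but calls into the Linux helper above. */
static void rt_xmit_path(void)
{
    enum domain saved = current_domain;

    current_domain = HEAD_DOMAIN;
    linux_helper();               /* indirect violation */
    current_domain = saved;
}
```

Calling `rt_xmit_path()` records one violation, while calling `linux_helper()` directly from the Linux side records none.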

> - Then the TX path:
>   - The interrupt registration code is shown below:
>       err = rtdm_irq_request(&adapter->irq_handle,
>                              adapter->pdev->irq, e1000_intr_msi, 0,
>                              netdev->name, adapter);
>   - I write the new ring-buffer index to the TDT register to notify the
>     hardware that there is a packet to be sent:
>       writel(tx_ring->next_to_use, tx_ring->tail); // after writel, the interrupt routine should be triggered
>   - If 'event flow' means the events during the transmit process, the event

I specifically mean whether both the vanilla driver and the ported version
take the same code paths and receive the same interrupts when sending packets.
Of course, you cannot put an identical packet load on both, because the higher
RTnet layers do not exist for vanilla Linux. But you may capture the outgoing
traffic under RTnet (RTcap) and replay it under Linux.

> flow is shown below:
>       e1000e_xmit_frame     sends a packet atomically
>         e1000_tx_map        uses DMA to map the packet (mapped before, so it
>                             just gets the DMA address)
>         e1000_tx_queue      makes sure the tx ring buffer index is right
>
>         write the TDT register to tell the hardware to send a packet
>     After the hardware has sent a packet, it is supposed to trigger a TX
>     completion interrupt, and the driver should respond:
>
>       e1000_intr_msi        triggered on Tx/Rx completion
>         e1000_clean_tx_irq  recycles the transmit resources

> By the way, I found that this Tx hang shows up every time the master station
> sends an RTcfg Ready frame. The communication before that works fine: the TDMA
> sync frames are sent properly and every stage before the Ready frame goes
> well. If I let it stay in the RTCFG_MAIN_CLIENT_2 stage (at which point master
> and slave already know each other), master and slave can communicate
> properly. So the Ready frame triggers this problem, but why? Why does a frame
> of a specific format trigger the hardware hang?

RTcfg is unlikely to be the reason, but maybe the transmission pattern triggers
the issue in the driver.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Re: rt_e1000e: Detected Hardware Unit Hang

2018-11-27 Thread Means Lee via Xenomai

I can't reduce the size of my patch against the e1000e real-time driver that
exists in Xenomai, because the non-real-time driver has evolved a lot. So I
provided the diffs of the three files I changed when porting the driver. In my
view, the hardware operations should be unchanged, so I focused on the changes
to netdev.c. I modified param.c and e1000.h to change the private data
structure and the parameters a little.

For the questions you asked:
- The upstream Linux driver I ported works fine with my hardware, even when I
  put heavy load on it (UDP broadcast storm).
- When I hit the hardware unit hang, the Tx completion interrupts didn't
  disappear, but they were reduced a lot.
- I didn't enable CONFIG_IPIPE_DEBUG_CONTEXT, but I do use locks in several
  places.
- Then the TX path:
   - The interrupt registration code is shown below:
       err = rtdm_irq_request(&adapter->irq_handle,
                              adapter->pdev->irq, e1000_intr_msi, 0,
                              netdev->name, adapter);
   - I write the new ring-buffer index to the TDT register to notify the
     hardware that there is a packet to be sent:
       writel(tx_ring->next_to_use, tx_ring->tail); // after writel, the interrupt routine should be triggered
   - If 'event flow' means the events during the transmit process, the event
     flow is shown below:
       e1000e_xmit_frame     sends a packet atomically
         e1000_tx_map        uses DMA to map the packet (mapped before, so it
                             just gets the DMA address)
         e1000_tx_queue      makes sure the tx ring buffer index is right
         write the TDT register to tell the hardware to send a packet
     After the hardware has sent a packet, it is supposed to trigger a TX
     completion interrupt, and the driver should respond:
       e1000_intr_msi        triggered on Tx/Rx completion
         e1000_clean_tx_irq  recycles the transmit resources
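The producer/consumer bookkeeping in this flow can be modeled in a few lines of userspace C. This is a simplified sketch, not the e1000e data structures: the ring size, the `tx_ring_model` struct, the `model_*` functions and the single `STAT_DD` ("descriptor done") bit are assumptions for illustration. The transmit side advances `next_to_use` and publishes it as the tail (the `writel` to TDT); the "hardware" advances the head and sets DD; the completion side reclaims descriptors from `next_to_clean` only while DD is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the TX ring bookkeeping; ring size, names and the
 * "hardware" step are illustrative assumptions, not e1000e structures. */
#define RING_SIZE 8u
#define STAT_DD   0x1u   /* "descriptor done", written back by the hardware */

struct tx_ring_model {
    uint32_t status[RING_SIZE];
    unsigned next_to_use;    /* next free slot, published as the tail (TDT) */
    unsigned next_to_clean;  /* oldest slot not yet reclaimed */
    unsigned tdt;            /* tail register as seen by the "hardware" */
    unsigned tdh;            /* head register, advanced by the "hardware" */
};

/* Number of descriptors handed to the hardware but not yet reclaimed. */
static unsigned ring_used(const struct tx_ring_model *r)
{
    return (r->next_to_use + RING_SIZE - r->next_to_clean) % RING_SIZE;
}

/* xmit side: fill one descriptor, then publish the new tail index. */
static bool model_xmit(struct tx_ring_model *r)
{
    if (ring_used(r) == RING_SIZE - 1)
        return false;                      /* full: one slot stays empty */
    r->status[r->next_to_use] = 0;         /* DD clear until sent */
    r->next_to_use = (r->next_to_use + 1) % RING_SIZE;
    r->tdt = r->next_to_use;               /* stands in for writel(..., tail) */
    return true;
}

/* "Hardware": send up to n descriptors between head and tail, setting DD. */
static void model_hw_process(struct tx_ring_model *r, unsigned n)
{
    while (n > 0 && r->tdh != r->tdt) {
        r->status[r->tdh] |= STAT_DD;
        r->tdh = (r->tdh + 1) % RING_SIZE;
        n--;
    }
}

/* Completion side (clean_tx_irq analogue): reclaim while DD is set. */
static unsigned model_clean_tx(struct tx_ring_model *r)
{
    unsigned cleaned = 0;

    while (r->next_to_clean != r->next_to_use &&
           (r->status[r->next_to_clean] & STAT_DD)) {
        r->status[r->next_to_clean] = 0;
        r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
        cleaned++;
    }
    return cleaned;
}
```

A descriptor that the hardware never marks as done stops the reclaim loop at `next_to_clean`, which is the situation the hang report describes with `next_to_watch.status <0>`. In that report, TDT and `next_to_use` both read `<1e>`, suggesting the tail update itself reached the device.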

By the way, I found that this Tx hang shows up every time the master station
sends an RTcfg Ready frame. The communication before that works fine: the TDMA
sync frames are sent properly and every stage before the Ready frame goes
well. If I let it stay in the RTCFG_MAIN_CLIENT_2 stage (at which point master
and slave already know each other), master and slave can communicate
properly. So the Ready frame triggers this problem, but why? Why does a frame
of a specific format trigger the hardware hang?

On Fri, Nov 23, 2018 at 7:59 PM, Jan Kiszka wrote:

> On 21.11.18 02:36, Means Lee via Xenomai wrote:
> > Sure thing. I ported the e1000e-rt driver from the mainline kernel e1000e
> > driver at commit 089d7720383d7bc9ca6b8824a05dfa66f80d1f41. The patch files
> > are kind of huge, so I have attached them to this mail.
> > diff-with-nrt.diff is the diff between the mainline e1000e driver and my
> > ported driver.
> > diff-with-old-rt-e1000e.diff is the diff between the Xenomai v3.0.7 driver
> > and my ported driver.
> >
>
> Unified diff ("diff -u"), please. I got that offlist, which is more
> readable, but it remains huge.
>
> So, let's analyze systematically:
>   - the upstream Linux driver you ported over works fine with your
>     hardware, correct?
>   - a hardware unit hang may mean that no TX completion interrupt arrived -
>     can you confirm this based on /proc/xenomai/irq?
>   - did you enable CONFIG_IPIPE_DEBUG_CONTEXT? It can reveal invalid lock
>     usage (a common mistake when porting Linux drivers over)
>   - then look into the TX path
>     - is the interrupt registered properly?
>     - is packet submission happening?
>     - is any interrupt arriving?
>     - compare the event flow to the vanilla Linux driver (add
>       instrumentation to both)
>
> Jan
-- next part --
A non-text attachment was scrubbed...
Name: e1000.diff
Type: text/x-patch
Size: 2569 bytes

-- next part --
A non-text attachment was scrubbed...
Name: param.diff
Type: text/x-patch
Size: 800 bytes

-- next part --
A non-text attachment was scrubbed...
Name: netdev.diff
Type: text/x-patch
Size: 118926 bytes

Re: rt_e1000e: Detected Hardware Unit Hang

2018-11-23 Thread Jan Kiszka via Xenomai


On 21.11.18 02:36, Means Lee via Xenomai wrote:
> Sure thing. I ported the e1000e-rt driver from the mainline kernel e1000e
> driver at commit 089d7720383d7bc9ca6b8824a05dfa66f80d1f41. The patch files
> are kind of huge, so I have attached them to this mail.
> diff-with-nrt.diff is the diff between the mainline e1000e driver and my
> ported driver.
> diff-with-old-rt-e1000e.diff is the diff between the Xenomai v3.0.7 driver
> and my ported driver.

Unified diff ("diff -u"), please. I got that offlist, which is more readable,
but it remains huge.

So, let's analyze systematically:
 - the upstream Linux driver you ported over works fine with your
   hardware, correct?
 - a hardware unit hang may mean that no TX completion interrupt arrived -
   can you confirm this based on /proc/xenomai/irq?
 - did you enable CONFIG_IPIPE_DEBUG_CONTEXT? It can reveal invalid lock
   usage (a common mistake when porting Linux drivers over)
 - then look into the TX path
   - is the interrupt registered properly?
   - is packet submission happening?
   - is any interrupt arriving?
   - compare the event flow to the vanilla Linux driver (add
     instrumentation to both)
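One way to act on the last point is to record a compact event trace in each driver and compare the two. The sketch below only shows the comparison logic in userspace; the event IDs, `struct tx_trace`, `trace_add()` and `trace_first_divergence()` are hypothetical. In the real drivers the events would be recorded at the xmit, tail-write, interrupt and cleanup points, using whatever tracing is safe in the respective context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event IDs for instrumenting the TX path of both drivers. */
enum tx_event { EV_XMIT, EV_TAIL_WRITE, EV_IRQ, EV_CLEAN };

#define TRACE_MAX 64

struct tx_trace {
    enum tx_event ev[TRACE_MAX];
    size_t len;
};

/* Append an event; events beyond TRACE_MAX are silently dropped. */
static void trace_add(struct tx_trace *t, enum tx_event e)
{
    if (t->len < TRACE_MAX)
        t->ev[t->len++] = e;
}

/* Index of the first differing event, or -1 if both traces are identical.
 * If one trace is a prefix of the other, the shorter length is returned. */
static long trace_first_divergence(const struct tx_trace *a,
                                   const struct tx_trace *b)
{
    size_t n = a->len < b->len ? a->len : b->len;

    for (size_t i = 0; i < n; i++)
        if (a->ev[i] != b->ev[i])
            return (long)i;
    return a->len == b->len ? -1 : (long)n;
}
```

Feeding it the vanilla trace and the ported trace for the same packet sequence points at the first step where the two drivers behave differently, for example a missing `EV_IRQ` after `EV_TAIL_WRITE`.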

Jan


Re: rt_e1000e: Detected Hardware Unit Hang

2018-11-20 Thread Means Lee via Xenomai

Sure thing. I ported the e1000e-rt driver from the mainline kernel e1000e
driver at commit 089d7720383d7bc9ca6b8824a05dfa66f80d1f41. The patch files
are kind of huge, so I have attached them to this mail.
diff-with-nrt.diff is the diff between the mainline e1000e driver and my
ported driver.
diff-with-old-rt-e1000e.diff is the diff between the Xenomai v3.0.7 driver
and my ported driver.

On Tue, Nov 20, 2018 at 5:42 PM, Jan Kiszka wrote:

> On 19.11.18 22:57, Means Lee via Xenomai wrote:
> > I am porting a newer e1000e driver to Xenomai 3.0.7. When I use my ported
> > driver to set up an RTnet connection, something goes wrong. What I get is
> > a Hardware Unit Hang:
>
> Can you share the diff to/patch of the existing one?
>
> Jan
>
> >
> > [ 1598.783133] rt_e1000e: Detected Hardware Unit Hang:
> >   TDH  
> >   TDT  <1e>
> >   next_to_use  <1e>
> >   next_to_clean
> > buffer_info[next_to_clean]:
> >   time_stamp   <10004ea14>
> >   next_to_watch
> >   jiffies  <10004f458>
> >   next_to_watch.status <0>
> > MAC Status <40080083>
> > PHY Status <796d>
> > PHY 1000BASE-T Status  <3800>
> > PHY Extended Status<3000>
> > PCI Status <10>
> >
> > I'm sure that the transmit and receive functions work fine, because I can
> > send TDMA sync frames normally. But this message appears at the end of the
> > RTcfg ready stage. Does anybody have any clue about how to fix this bug?
-- next part --
A non-text attachment was scrubbed...
Name: diff-with-old-rt-e1000e.diff
Type: text/x-patch
Size: 410032 bytes
URL: <http://xenomai.org/pipermail/xenomai/attachments/20181121/0c3b022a/attachment.bin>
-- next part --
A non-text attachment was scrubbed...
Name: diff-with-nrt.diff
Type: text/x-patch
Size: 98297 bytes
URL: <http://xenomai.org/pipermail/xenomai/attachments/20181121/0c3b022a/attachment-0001.bin>

Re: rt_e1000e: Detected Hardware Unit Hang

2018-11-20 Thread Jan Kiszka via Xenomai


On 19.11.18 22:57, Means Lee via Xenomai wrote:
> I am porting a newer e1000e driver to Xenomai 3.0.7. When I use my ported
> driver to set up an RTnet connection, something goes wrong. What I get is
> a Hardware Unit Hang:

Can you share the diff to/patch of the existing one?

Jan

> [ 1598.783133] rt_e1000e: Detected Hardware Unit Hang:
>   TDH  
>   TDT  <1e>
>   next_to_use  <1e>
>   next_to_clean
> buffer_info[next_to_clean]:
>   time_stamp   <10004ea14>
>   next_to_watch
>   jiffies  <10004f458>
>   next_to_watch.status <0>
> MAC Status <40080083>
> PHY Status <796d>
> PHY 1000BASE-T Status  <3800>
> PHY Extended Status<3000>
> PCI Status <10>
>
> I'm sure that the transmit and receive functions work fine, because I can
> send TDMA sync frames normally. But this message appears at the end of the
> RTcfg ready stage. Does anybody have any clue about how to fix this bug?


rt_e1000e: Detected Hardware Unit Hang

2018-11-19 Thread Means Lee via Xenomai

I am porting a newer e1000e driver to Xenomai 3.0.7. When I use my ported
driver to set up an RTnet connection, something goes wrong. What I get is a
Hardware Unit Hang:

[ 1598.783133] rt_e1000e: Detected Hardware Unit Hang:
 TDH  
 TDT  <1e>
 next_to_use  <1e>
 next_to_clean
   buffer_info[next_to_clean]:
 time_stamp   <10004ea14>
 next_to_watch
 jiffies  <10004f458>
 next_to_watch.status <0>
   MAC Status <40080083>
   PHY Status <796d>
   PHY 1000BASE-T Status  <3800>
   PHY Extended Status<3000>
   PCI Status <10>

I'm sure that the transmit and receive functions work fine, because I can send
TDMA sync frames normally. But this message appears at the end of the RTcfg
ready stage. Does anybody have any clue about how to fix this bug?
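The two time fields in the report can be compared directly to see how long the watched descriptor had been pending. The sketch below uses a simplified form of the upstream e1000e check, which roughly flags a hang when the descriptor's time stamp plus `tx_timeout_factor * HZ` lies in the past while the descriptor is still not done (the real check has additional conditions). The helper names are hypothetical, `STAT_DD = 0x1` is assumed to be the descriptor-done bit, and the thread does not state the kernel's HZ, so it is a parameter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the hang report above. */
#define REPORT_TIME_STAMP  0x10004ea14ULL  /* buffer_info[next_to_clean].time_stamp */
#define REPORT_JIFFIES     0x10004f458ULL  /* jiffies when the report was printed */
#define REPORT_WATCH_STAT  0x0u            /* next_to_watch.status */

#define STAT_DD 0x1u  /* descriptor-done bit (assumed value for this sketch) */

/* How long the oldest pending descriptor has been outstanding, in jiffies.
 * Unsigned subtraction also behaves sensibly across counter wrap-around. */
static uint64_t pending_age(uint64_t jiffies_now, uint64_t time_stamp)
{
    return jiffies_now - time_stamp;
}

/* Simplified hang condition: not done, and older than timeout_factor * hz. */
static bool looks_hung(uint64_t age, unsigned hz, unsigned timeout_factor,
                       uint32_t watch_status)
{
    return !(watch_status & STAT_DD) &&
           age > (uint64_t)timeout_factor * hz;
}
```

The gap is 0x10004f458 - 0x10004ea14 = 2628 jiffies, i.e. about 10.5 s with HZ = 250 or about 2.6 s with HZ = 1000 (both assumptions). With `next_to_watch.status <0>` the done bit is clear, so the descriptor had been outstanding for seconds without completing.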
