Re: am335x: cpsw: interrupt failure

2014-12-30 Thread Felipe Balbi
Hi,

On Mon, Dec 29, 2014 at 11:13:55AM -0600, Felipe Balbi wrote:
 U-Boot version: 2014.07
 Kernel config is omap2plus with enabled USB

 # cat /proc/version
 Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
 Mon Dec 8 22:47:43 CET 2014

 Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
 blacklisted. Can you try with 4.9.x just to make sure ?

 Will do.

Adding linux-omap. Beginning of this discussion:
http://comments.gmane.org/gmane.linux.network/341427

Quick summary: starting with kernel 3.18 or commit
55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
custom boards) stalls at high network load. Reproducible via nuttcp
within some minutes

nuttcp -S (on BBB)
nuttcp -t -N 4 -T30m 192.168.1.235 (on host)

As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
but both show the same behavior.

Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
Mon Dec 8 22:47:43 CET 2014
Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
(Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
CET 2014

Let me know, if you can reproduce this issue.
   
   finally managed to reproduce this, it took quite a bit of effort though.
   I'll see if I can gather more information about the problem.
  
  Maybe check if the irqnr is 127 (or the last reserved interrupt)
  in irq-omap-intc.c. If so, also print out the previous interrupt.
  It seems the intc uses the last reserved interrupt to signal a
  spurious interrupt for the previous irqnr, so we should probably
  add some handling for that.
  
  If the previous interrupt is a cpsw interrupt, then there's probably
  something wrong with cpsw interrupt handling. Either a missing
  read-back to flush posted write in the cpsw interrupt handler,
  or the EOI registers are written at a wrong time.
 
 yeah, I'll go over it, but I first need to reproduce it again. Just
 rebooted to try again and after half an hour, couldn't reproduce it
 anymore. Interesting race to end the year :-)

alright, managed to reproduce it multiple times and I'm pretty confident I've
found the bug. Right now I'm testing with AM437x and AM335x to make sure
it's really working. If it's still running until tomorrow I'll send a
preliminary patch but I want to leave this running for quite a few days
before calling it fixed.

-- 
balbi




Re: am335x: cpsw: interrupt failure

2014-12-29 Thread Yegor Yefremov
On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
yegorsli...@googlemail.com wrote:
 On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
 Hi,

 On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
 U-Boot version: 2014.07
 Kernel config is omap2plus with enabled USB

 # cat /proc/version
 Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
 Mon Dec 8 22:47:43 CET 2014

 Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
 blacklisted. Can you try with 4.9.x just to make sure ?

 Will do.

Adding linux-omap. Beginning of this discussion:
http://comments.gmane.org/gmane.linux.network/341427

Quick summary: starting with kernel 3.18 or commit
55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
custom boards) stalls at high network load. Reproducible via nuttcp
within some minutes

nuttcp -S (on BBB)
nuttcp -t -N 4 -T30m 192.168.1.235 (on host)

As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
but both show the same behavior.

Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
Mon Dec 8 22:47:43 CET 2014
Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
(Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
CET 2014

Let me know, if you can reproduce this issue.

Thanks.

Yegor
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: am335x: cpsw: interrupt failure

2014-12-29 Thread Peter Hurley
On 12/29/2014 04:33 AM, Yegor Yefremov wrote:
 On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
 yegorsli...@googlemail.com wrote:
 On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
 Hi,

 On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
 U-Boot version: 2014.07
 Kernel config is omap2plus with enabled USB

 # cat /proc/version
 Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
 Mon Dec 8 22:47:43 CET 2014

 Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
 blacklisted. Can you try with 4.9.x just to make sure ?

 Will do.
 
 Adding linux-omap. Beginning of this discussion:
 http://comments.gmane.org/gmane.linux.network/341427
 
 Quick summary: starting with kernel 3.18 or commit
 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
 custom boards) stalls at high network load. Reproducible via nuttcp
 within some minutes
 
 nuttcp -S (on BBB)
 nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
 
 As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
 but both show the same behavior.
 
 Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
 Mon Dec 8 22:47:43 CET 2014
 Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
 CET 2014
 
 Let me know, if you can reproduce this issue.

I have seen the irq 0 error messages on the black since 3.18+, but didn't
bisect it yet. For me, these errors occurred with a slightly misconfigured
emacs24-nox, which drove the cpu load way up - over 50% - with just
cursor movement (it still gets above 20% which seems unacceptably high).

I'm not sure if all the crashes were over ssh; I hadn't considered
the cpsw relevant until reading this. I'll retest over the serial console.

I have seen abrupt resets without messages on earlier kernels so perhaps
the commit is not the root cause.

Regards,
Peter Hurley



Re: am335x: cpsw: interrupt failure

2014-12-29 Thread Felipe Balbi
On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
 On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
 yegorsli...@googlemail.com wrote:
  On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
  Hi,
 
  On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
  U-Boot version: 2014.07
  Kernel config is omap2plus with enabled USB
 
  # cat /proc/version
  Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
  20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
  Mon Dec 8 22:47:43 CET 2014
 
  Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
  blacklisted. Can you try with 4.9.x just to make sure ?
 
  Will do.
 
 Adding linux-omap. Beginning of this discussion:
 http://comments.gmane.org/gmane.linux.network/341427
 
 Quick summary: starting with kernel 3.18 or commit
 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
 custom boards) stalls at high network load. Reproducible via nuttcp
 within some minutes
 
 nuttcp -S (on BBB)
 nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
 
 As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
 but both show the same behavior.
 
 Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
 Mon Dec 8 22:47:43 CET 2014
 Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
 CET 2014
 
 Let me know, if you can reproduce this issue.

finally managed to reproduce this, it took quite a bit of effort though.
I'll see if I can gather more information about the problem.

-- 
balbi




Re: am335x: cpsw: interrupt failure

2014-12-29 Thread Tony Lindgren
* Felipe Balbi ba...@ti.com [141229 07:53]:
 On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
  On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
  yegorsli...@googlemail.com wrote:
   On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
   Hi,
  
   On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
   U-Boot version: 2014.07
   Kernel config is omap2plus with enabled USB
  
   # cat /proc/version
   Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
   20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
   Mon Dec 8 22:47:43 CET 2014
  
   Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
   blacklisted. Can you try with 4.9.x just to make sure ?
  
   Will do.
  
  Adding linux-omap. Beginning of this discussion:
  http://comments.gmane.org/gmane.linux.network/341427
  
  Quick summary: starting with kernel 3.18 or commit
  55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
  custom boards) stalls at high network load. Reproducible via nuttcp
  within some minutes
  
  nuttcp -S (on BBB)
  nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
  
  As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
  but both show the same behavior.
  
  Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
  20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
  Mon Dec 8 22:47:43 CET 2014
  Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
  (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
  CET 2014
  
  Let me know, if you can reproduce this issue.
 
 finally managed to reproduce this, it took quite a bit of effort though.
 I'll see if I can gather more information about the problem.

Maybe check if the irqnr is 127 (or the last reserved interrupt)
in irq-omap-intc.c. If so, also print out the previous interrupt.
It seems the intc uses the last reserved interrupt to signal a
spurious interrupt for the previous irqnr, so we should probably
add some handling for that.
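
The check suggested above could be sketched roughly as follows. This is a standalone userspace illustration, not the actual code in irq-omap-intc.c: the names irq_tracker, irq_tracker_init and irq_tracker_check are hypothetical, and the value 127 is taken from the guess above that the last reserved interrupt number signals a spurious interrupt.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 127 is the "last reserved interrupt" mentioned above. */
#define SPURIOUS_IRQ 127u
/* Hypothetical marker meaning "no interrupt has been recorded yet". */
#define NO_PREV_IRQ UINT_MAX

struct irq_tracker {
	unsigned int prev_irq;        /* last non-spurious irqnr seen */
	unsigned long spurious_count; /* how often SPURIOUS_IRQ was seen */
};

static void irq_tracker_init(struct irq_tracker *t)
{
	assert(t != NULL);
	t->prev_irq = NO_PREV_IRQ;
	t->spurious_count = 0;
}

/*
 * Returns true if irqnr looks like a normal interrupt that should be
 * dispatched, false if it is the spurious marker. On a spurious hit the
 * previously recorded interrupt is printed, since that is the likely
 * culprit.
 */
static bool irq_tracker_check(struct irq_tracker *t, unsigned int irqnr)
{
	assert(t != NULL);
	if (irqnr == SPURIOUS_IRQ) {
		t->spurious_count++;
		if (t->prev_irq == NO_PREV_IRQ)
			fprintf(stderr, "spurious irq %u, no previous irq recorded\n",
				irqnr);
		else
			fprintf(stderr, "spurious irq %u, previous irq was %u\n",
				irqnr, t->prev_irq);
		return false;
	}
	t->prev_irq = irqnr;
	return true;
}
```

In a real interrupt controller driver the same idea would live in the low-level IRQ entry path, with the tracker state kept per controller and the message rate-limited.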

If the previous interrupt is a cpsw interrupt, then there's probably
something wrong with cpsw interrupt handling. Either a missing
read-back to flush posted write in the cpsw interrupt handler,
or the EOI registers are written at a wrong time.
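
For reference, the "read-back to flush posted write" pattern looks roughly like this. It is only a sketch of the access pattern, assuming a memory-mapped 32-bit register; reg_write_flushed is a hypothetical helper and not part of the cpsw driver. In kernel code the equivalent would typically be a writel() followed by a readl() from the same device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write val to a (memory-mapped) register, then read the same register
 * back. On real hardware the read cannot complete until the earlier
 * posted write has reached the device, so the interrupt source is
 * known to be cleared before the handler goes on to signal EOI.
 * Returns the value read back.
 */
static uint32_t reg_write_flushed(volatile uint32_t *reg, uint32_t val)
{
	assert(reg != NULL);
	*reg = val;  /* this write may be posted by the interconnect */
	return *reg; /* read-back forces the write to complete first */
}
```

Run against ordinary memory this only demonstrates the ordering of the accesses; the flushing effect itself depends on the bus and the device.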

Regards,

Tony


Re: am335x: cpsw: interrupt failure

2014-12-29 Thread Felipe Balbi
On Mon, Dec 29, 2014 at 08:51:04AM -0800, Tony Lindgren wrote:
 * Felipe Balbi ba...@ti.com [141229 07:53]:
  On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
   On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov
   yegorsli...@googlemail.com wrote:
    On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
    Hi,
   
    On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
    U-Boot version: 2014.07
    Kernel config is omap2plus with enabled USB
   
    # cat /proc/version
    Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
    20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
    Mon Dec 8 22:47:43 CET 2014
   
    Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was even
    blacklisted. Can you try with 4.9.x just to make sure ?
   
    Will do.
   
   Adding linux-omap. Beginning of this discussion:
   http://comments.gmane.org/gmane.linux.network/341427
   
   Quick summary: starting with kernel 3.18 or commit
   55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
   custom boards) stalls at high network load. Reproducible via nuttcp
   within some minutes
   
   nuttcp -S (on BBB)
   nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
   
   As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
   but both show the same behavior.
   
   Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3
   20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP
   Mon Dec 8 22:47:43 CET 2014
   Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2
   (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29
   CET 2014
   
   Let me know, if you can reproduce this issue.
  
  finally managed to reproduce this, it took quite a bit of effort though.
  I'll see if I can gather more information about the problem.
 
 Maybe check if the irqnr is 127 (or the last reserved interrupt)
 in irq-omap-intc.c. If so, also print out the previous interrupt.
 It seems the intc uses the last reserved interrupt to signal a
 spurious interrupt for the previous irqnr, so we should probably
 add some handling for that.
 
 If the previous interrupt is a cpsw interrupt, then there's probably
 something wrong with cpsw interrupt handling. Either a missing
 read-back to flush posted write in the cpsw interrupt handler,
 or the EOI registers are written at a wrong time.

yeah, I'll go over it, but I first need to reproduce it again. Just
rebooted to try again and after half an hour, couldn't reproduce it
anymore. Interesting race to end the year :-)

cheers

-- 
balbi

