Re: am335x: cpsw: interrupt failure
Hi,

On Mon, Dec 29, 2014 at 11:13:55AM -0600, Felipe Balbi wrote:
> > > > > > > U-Boot version: 2014.07
> > > > > > > Kernel config is omap2plus with enabled USB
> > > > > > >
> > > > > > > # cat /proc/version
> > > > > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > > > > >
> > > > > > Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was
> > > > > > even blacklisted. Can you try with 4.9.x just to make sure ?
> > > > >
> > > > > Will do.
> > > >
> > > > Adding linux-omap. Beginning of this discussion:
> > > > http://comments.gmane.org/gmane.linux.network/341427
> > > >
> > > > Quick summary: starting with kernel 3.18 or commit
> > > > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and
> > > > some custom boards) stalls at high network load. Reproducible via
> > > > nuttcp within some minutes
> > > >
> > > > nuttcp -S (on BBB)
> > > > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> > > >
> > > > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > > > but both show the same behavior.
> > > >
> > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > > > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29 CET 2014
> > > >
> > > > Let me know, if you can reproduce this issue.
> > >
> > > finally managed to reproduce this, it took quite a bit of effort
> > > though. I'll see if I can gather more information about the problem.
> >
> > Maybe check if the irqnr is 127 (or the last reserved interrupt) in
> > irq-omap-intc.c. If so, also print out the previous interrupt.
> >
> > It seems the intc uses the last reserved interrupt to signal a
> > spurious interrupt for the previous irqnr, so we should probably add
> > some handling for that.
> >
> > If the previous interrupt is a cpsw interrupt, then there's probably
> > something wrong with cpsw interrupt handling. Either a missing
> > read-back to flush posted write in the cpsw interrupt handler, or the
> > EOI registers are written at a wrong time.
>
> yeah, I'll go over it, but I first need to reproduce it again. Just
> rebooted to try again and after half an hour, couldn't reproduce it
> anymore. Interesting race to end the year :-)

alright, managed to reproduce it multiple times and I'm pretty confident
I've found the bug. Right now I'm testing with AM437x and AM335x to make
sure it's really working. If it's still running tomorrow I'll send a
preliminary patch, but I want to leave this running for quite a few days
before calling it fixed.

--
balbi
Re: am335x: cpsw: interrupt failure
On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov yegorsli...@googlemail.com wrote:
> On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
> > Hi,
> >
> > On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > > U-Boot version: 2014.07
> > > Kernel config is omap2plus with enabled USB
> > >
> > > # cat /proc/version
> > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> >
> > Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was
> > even blacklisted. Can you try with 4.9.x just to make sure ?
>
> Will do.

Adding linux-omap. Beginning of this discussion:
http://comments.gmane.org/gmane.linux.network/341427

Quick summary: starting with kernel 3.18 or commit
55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
custom boards) stalls at high network load. Reproducible via nuttcp
within some minutes

nuttcp -S (on BBB)
nuttcp -t -N 4 -T30m 192.168.1.235 (on host)

As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains, but
both show the same behavior.

Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29 CET 2014

Let me know, if you can reproduce this issue.

Thanks.

Yegor
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: am335x: cpsw: interrupt failure
On 12/29/2014 04:33 AM, Yegor Yefremov wrote:
> On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov yegorsli...@googlemail.com wrote:
> > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
> > > Hi,
> > >
> > > On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > > > U-Boot version: 2014.07
> > > > Kernel config is omap2plus with enabled USB
> > > >
> > > > # cat /proc/version
> > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > >
> > > Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was
> > > even blacklisted. Can you try with 4.9.x just to make sure ?
> >
> > Will do.
>
> Adding linux-omap. Beginning of this discussion:
> http://comments.gmane.org/gmane.linux.network/341427
>
> Quick summary: starting with kernel 3.18 or commit
> 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> custom boards) stalls at high network load. Reproducible via nuttcp
> within some minutes
>
> nuttcp -S (on BBB)
> nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
>
> As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> but both show the same behavior.
>
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29 CET 2014
>
> Let me know, if you can reproduce this issue.

I have seen the irq 0 error messages on the black since 3.18+, but
didn't bisect it yet.

For me, these errors occurred with a slightly misconfigured emacs24-nox,
which drove the cpu load way up - over 50% - with just cursor movement
(it still gets above 20% which seems unacceptably high).

I'm not sure if all the crashes were over ssh; I hadn't considered the
cpsw relevant until reading this. I'll retest over the serial console.

I have seen abrupt resets without messages on earlier kernels so perhaps
the commit is not the root cause.

Regards,
Peter Hurley
Re: am335x: cpsw: interrupt failure
On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov yegorsli...@googlemail.com wrote:
> > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
> > > Hi,
> > >
> > > On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > > > U-Boot version: 2014.07
> > > > Kernel config is omap2plus with enabled USB
> > > >
> > > > # cat /proc/version
> > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > >
> > > Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was
> > > even blacklisted. Can you try with 4.9.x just to make sure ?
> >
> > Will do.
>
> Adding linux-omap. Beginning of this discussion:
> http://comments.gmane.org/gmane.linux.network/341427
>
> Quick summary: starting with kernel 3.18 or commit
> 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and some
> custom boards) stalls at high network load. Reproducible via nuttcp
> within some minutes
>
> nuttcp -S (on BBB)
> nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
>
> As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> but both show the same behavior.
>
> Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29 CET 2014
>
> Let me know, if you can reproduce this issue.

finally managed to reproduce this, it took quite a bit of effort though.
I'll see if I can gather more information about the problem.

--
balbi
Re: am335x: cpsw: interrupt failure
* Felipe Balbi ba...@ti.com [141229 07:53]:
> On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> > On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov yegorsli...@googlemail.com wrote:
> > > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
> > > > Hi,
> > > >
> > > > On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > > > > U-Boot version: 2014.07
> > > > > Kernel config is omap2plus with enabled USB
> > > > >
> > > > > # cat /proc/version
> > > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > > >
> > > > Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it was
> > > > even blacklisted. Can you try with 4.9.x just to make sure ?
> > >
> > > Will do.
> >
> > Adding linux-omap. Beginning of this discussion:
> > http://comments.gmane.org/gmane.linux.network/341427
> >
> > Quick summary: starting with kernel 3.18 or commit
> > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and
> > some custom boards) stalls at high network load. Reproducible via
> > nuttcp within some minutes
> >
> > nuttcp -S (on BBB)
> > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> >
> > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > but both show the same behavior.
> >
> > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29 CET 2014
> >
> > Let me know, if you can reproduce this issue.
>
> finally managed to reproduce this, it took quite a bit of effort
> though. I'll see if I can gather more information about the problem.

Maybe check if the irqnr is 127 (or the last reserved interrupt) in
irq-omap-intc.c. If so, also print out the previous interrupt.

It seems the intc uses the last reserved interrupt to signal a spurious
interrupt for the previous irqnr, so we should probably add some
handling for that.

If the previous interrupt is a cpsw interrupt, then there's probably
something wrong with cpsw interrupt handling. Either a missing read-back
to flush posted write in the cpsw interrupt handler, or the EOI
registers are written at a wrong time.

Regards,

Tony
Re: am335x: cpsw: interrupt failure
On Mon, Dec 29, 2014 at 08:51:04AM -0800, Tony Lindgren wrote:
> * Felipe Balbi ba...@ti.com [141229 07:53]:
> > On Mon, Dec 29, 2014 at 10:33:26AM +0100, Yegor Yefremov wrote:
> > > On Fri, Dec 12, 2014 at 8:19 PM, Yegor Yefremov yegorsli...@googlemail.com wrote:
> > > > On Fri, Dec 12, 2014 at 6:32 PM, Felipe Balbi ba...@ti.com wrote:
> > > > > Hi,
> > > > >
> > > > > On Fri, Dec 12, 2014 at 01:00:51PM +0100, Yegor Yefremov wrote:
> > > > > > U-Boot version: 2014.07
> > > > > > Kernel config is omap2plus with enabled USB
> > > > > >
> > > > > > # cat /proc/version
> > > > > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > > > >
> > > > > Wasn't GCC 4.8.x total crap for building ARM kernels ? IIRC it
> > > > > was even blacklisted. Can you try with 4.9.x just to make sure ?
> > > >
> > > > Will do.
> > >
> > > Adding linux-omap. Beginning of this discussion:
> > > http://comments.gmane.org/gmane.linux.network/341427
> > >
> > > Quick summary: starting with kernel 3.18 or commit
> > > 55601c9f24670ba926ebdd4d712ac3b177232330 am335x (at least BBB and
> > > some custom boards) stalls at high network load. Reproducible via
> > > nuttcp within some minutes
> > >
> > > nuttcp -S (on BBB)
> > > nuttcp -t -N 4 -T30m 192.168.1.235 (on host)
> > >
> > > As Felipe Balbi suggested, I tried both 4.8.3 and 4.9.2 toolchains,
> > > but both show the same behavior.
> > >
> > > Linux version 3.18.0 (user@user-VirtualBox) (gcc version 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29) ) #6 SMP Mon Dec 8 22:47:43 CET 2014
> > > Linux version 3.18.1 (user@user-VirtualBox) (gcc version 4.9.2 (Buildroot 2015.02-git-00582-g10b9761) ) #1 SMP Mon Dec 29 09:22:29 CET 2014
> > >
> > > Let me know, if you can reproduce this issue.
> >
> > finally managed to reproduce this, it took quite a bit of effort
> > though. I'll see if I can gather more information about the problem.
>
> Maybe check if the irqnr is 127 (or the last reserved interrupt) in
> irq-omap-intc.c. If so, also print out the previous interrupt.
>
> It seems the intc uses the last reserved interrupt to signal a
> spurious interrupt for the previous irqnr, so we should probably add
> some handling for that.
>
> If the previous interrupt is a cpsw interrupt, then there's probably
> something wrong with cpsw interrupt handling. Either a missing
> read-back to flush posted write in the cpsw interrupt handler, or the
> EOI registers are written at a wrong time.

yeah, I'll go over it, but I first need to reproduce it again. Just
rebooted to try again and after half an hour, couldn't reproduce it
anymore. Interesting race to end the year :-)

cheers

--
balbi