Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
15.07.2014 1:42, Jonas Gorski: [...] or bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN + 8) ? This is the right one; mtu (the payload) + ETH_HLEN (14 bytes) + 8 (4 bytes for vlan tag, probably 4 extra bytes for custom header optionally used by broadcom switches)

Ok, tested this. Unfortunately it's still panicking under load (and this change seemingly made no difference whatsoever):

[ 271.21] [ cut here ]
[ 271.22] WARNING: at net/core/dev.c:2194 skb_warn_bad_offload+0xc0/0xe8()
[ 271.22] b44: caps=(0x4000, 0x) len=377 data_len=0 gso_size=57048 gso_type=32506 ip_summed=0
[ 271.24] Modules linked in: pppoe ppp_async iptable_nat b43legacy b43 pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio gpio_button_hotplug tg3 hwmon bgmac b44 ptp pps_core
[ 271.30] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.10.44 #2
[ 271.30] Stack : 8030d552 0036 818201d0 0008 80272688 802bf23b 0003 8030cd00 818201d0 0008 802bb6e4 802bb6dc 8001c204 0003 80019bc4 80299520 0008 80273f28 8182bc5c 8182bbe8 ...
[ 271.34] Call Trace:
[ 271.34] [80010ca0] show_stack+0x48/0x70
[ 271.35] [80019cc0] warn_slowpath_common+0x78/0xa8
[ 271.35] [80019d1c] warn_slowpath_fmt+0x2c/0x38
[ 271.36] [801b2d10] skb_warn_bad_offload+0xc0/0xe8
[ 271.36] [801b68c4] __skb_gso_segment+0x50/0xec
[ 271.37] [801de5bc] ip_forward_finish+0x108/0x1bc
[ 271.37] [801b3da0] __netif_receive_skb_core+0x46c/0x52c
[ 271.38] [81ad41d4] 0x81ad41d4
[ 271.38]
[ 271.38] ---[ end trace b4f0aa7175b12bf7 ]---
[ 271.39] Unhandled kernel unaligned access[#1]:
[ 271.39] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W 3.10.44 #2
[ 271.39] task: 81820028 ti: 8182a000 task.ti: 8182a000
[ 271.39] $ 0 : 0001 81696a48 0028
[ 271.39] $ 4 : 2d37d9ee 7088
[ 271.39] $ 8 : 002d 35373137 62323162 5d203766
[ 271.39] $12 : 03bf bc00
[ 271.39] $16 : 80e7fec0 0001 0001 0014
[ 271.39] $20 : 0008 802bb6e4
[ 271.39] $24 : 0003 80150bcc
[ 271.39] $28 : 8182a000 8182bd28 802bb6dc 801ab22c
[ 271.39] Hi:
[ 271.39] Lo: 0083
[ 271.39] epc : 80064440 put_page+0x0/0x4c
[ 271.39] Tainted: G W
[ 271.39] ra: 801ab22c skb_release_data+0xc4/0x118
[ 271.39] Status: 1000b803 KERNEL EXL IE
[ 271.39] Cause : 00800010
[ 271.39] BadVA : 2d37d9ee
[ 271.39] PrId : 00029006 (Broadcom BMIPS3300)
[ 271.39] Modules linked in: pppoe ppp_async iptable_nat b43legacy b43 pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio gpio_button_hotplug tg3 hwmon bgmac b44 ptp pps_core
[ 271.39] Process ksoftirqd/0 (pid: 3, threadinfo=8182a000, task=81820028, tls=)
[ 271.39] Stack : 80e7fec0 801ab294 80e7fec0 7088 80e7fec0 ffea 801ab2d0 802bb6e4 80e7fec0 80e5da40 0001 80e7fec0 801de5d4 0850 80f72ac0 81b68000 801de4b4 0001 801aa3e4 802bca98 802bca98 802bb6d0 81abc000 80e7fec0 801b3da0 0042 81ad0964 81b7df20 801aa3e4 802bb6e4 8018e658 010a 01f1 81abc3e8 81abc3c0 0042 80e7fec0 0017 0187 ...
[ 271.39] Call Trace:
[ 271.39] [80064440] put_page+0x0/0x4c
[ 271.39] [801ab22c] skb_release_data+0xc4/0x118
[ 271.39] [801ab2d0] __kfree_skb+0x14/0xd4
[ 271.39] [801de5d4] ip_forward_finish+0x120/0x1bc
[ 271.39] [801b3da0] __netif_receive_skb_core+0x46c/0x52c
[ 271.39] [81ad41d4] 0x81ad41d4
[ 271.39]
[ 271.39] Code: 3c058006 080190c9 24a538e4 8c82
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
15.07.2014 12:04, Nikolai Zhubr: 15.07.2014 1:42, Jonas Gorski: [...] or bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN + 8) ? This is the right one; mtu (the payload) + ETH_HLEN (14 bytes) + 8 (4 bytes for vlan tag, probably 4 extra bytes for custom header optionally used by broadcom switches) Ok, tested this. Unfortunately it's still panicking under load (and this change seemingly made no difference whatsoever):

And I've performed yet another experiment. If I insert an additional router (also running OpenWrt, but Atheros-based) between this WL-500W and the uplink (with the idea of filtering out any strange and bogus incoming packets) and redo the same test, I get no panic, but instead a silent spontaneous reboot a few minutes after reaching 30 Mbit of traffic. I'll retest this more carefully later; meanwhile I think:

1. Apparently some (bogus?) packets occasionally coming from the uplink still confuse the b44 driver and cause panics regardless of my B44_RXMAXLEN correction.
2. A silent reboot might indicate a hardware problem such as overheating, although I have its case open and touched its chips, and they felt acceptably warm, I think. Another point is that CPU performance limits the routing capability of this device (at least when using OpenWrt) to somewhere around 33 Mbit, so running close to 100% CPU continuously might trigger the watchdog? (Just a random speculation)

Thank you.
Nikolai
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
15.07.2014 23:26, Nikolai Zhubr: [...] And I've performed yet another experiment. If I insert an additional router (also running OpenWrt, but Atheros-based) between this WL-500W and the uplink (with the idea of filtering out any strange and bogus incoming packets) and redo the same test, I get no panic, but instead a silent spontaneous reboot a few minutes after reaching 30 Mbit of traffic. I'll retest this more carefully later [...]

Here is a slightly different panic, although also involving __netif_receive_skb_core (and this is still with the additional OpenWrt router inserted before the uplink):

[ 900.72] CPU 0 Unable to handle kernel paging request at virtual address 0004, epc == 80119aa0, ra == 8011b2e8
[ 900.72] Oops[#1]:
[ 900.72] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.10.44 #2
[ 900.72] task: 81820028 ti: 8182a000 task.ti: 8182a000
[ 900.72] $ 0 : 10003800 80f29a48
[ 900.72] $ 4 : 802be1a0 802bdc1c fffc
[ 900.72] $ 8 : 0384 2b82ea80 00989680
[ 900.72] $12 : 0384 3c87
[ 900.72] $16 : 802be1a0 802bdc1c 802bdc40 7fff
[ 900.72] $20 : 0384 2aea8de9
[ 900.72] $24 : 80016dc0
[ 900.72] $28 : 8182a000 8182bb50 0384 8011b2e8
[ 900.72] Hi:
[ 900.72] Lo: 3c87
[ 900.72] epc : 80119aa0 rb_insert_color+0x2c/0x14c
[ 900.72] Not tainted
[ 900.72] ra: 8011b2e8 timerqueue_add+0xc0/0x118
[ 900.72] Status: 10003802 KERNEL EXL
[ 900.72] Cause : 0088
[ 900.72] BadVA : 0004
[ 900.72] PrId : 00029006 (Broadcom BMIPS3300)
[ 900.72] Modules linked in: pppoe ppp_async iptable_nat b43legacy b43 pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio gpio_button_hotplug tg3 hwmon bgmac b44 ptp pps_core
[ 900.72] Process ksoftirqd/0 (pid: 3, threadinfo=8182a000, task=81820028, tls=)
[ 900.72] Stack : 802bdc40 7fff 0384 802be1a0 802bdc10 802bdc40 8003a144 802be1a0 802c 802c 802c 802bdbe0 2aea8de9 8003a9c8 8182bc08 80c52220 80eb93a0 0001 0001 2aea8de9 0384 0003 2aea8de9 0384 80f122e4 0007 802c2870 80a169b5 802c 802765f4 80276608 80012d00 0001 00014600 ...
[ 900.72] Call Trace:
[ 900.72] [80119aa0] rb_insert_color+0x2c/0x14c
[ 900.72] [8011b2e8] timerqueue_add+0xc0/0x118
[ 900.72] [8003a144] __run_hrtimer.isra.26+0x7c/0xf8
[ 900.72] [8003a9c8] hrtimer_interrupt+0x14c/0x3f4
[ 900.72] [80012d00] c0_compare_interrupt+0x74/0xa0
[ 900.72] [8005335c] handle_irq_event_percpu+0x64/0x1ec
[ 900.72] [80055e60] handle_percpu_irq+0x54/0x84
[ 900.72] [80052ce0] generic_handle_irq+0x28/0x44
[ 900.72] [8000e24c] do_IRQ+0x1c/0x2c
[ 900.72] [8000a3ec] plat_irq_dispatch+0x40/0xb8
[ 900.72] [80001448] ret_from_irq+0x0/0x4
[ 900.72] [80005590] __copy_user_common+0x248/0x2d8
[ 900.72] [801a8830] skb_copy_ubufs+0xec/0x204
[ 900.72] [801b3db0] __netif_receive_skb_core+0x47c/0x52c
[ 900.72] [81ad41d4] 0x81ad41d4
[ 900.72]
[ 900.72] Code: 30660001 14c00047 8c660004 10460016 10c5 8cc8
[ 900.72] ---[ end trace de6e4d131b0441ac ]---
[ 900.72] Kernel panic - not syncing: Fatal exception in interrupt
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
On 21 June 2014 18:36, Nikolai Zhubr n-a-zh...@yandex.ru wrote:
[ 637.43] [ cut here ]
[ 637.44] WARNING: at net/core/dev.c:2194 skb_warn_bad_offload+0xc0/0xe8()
[ 637.45] b44: caps=(0x4000, 0x) len=1500 data_len=0 gso_size=53118 gso_type=59551 ip_summed=0
[ 637.46] Modules linked in: pppoe ppp_async iptable_nat b43legacy b43 pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio gpio_button_hotplug tg3 hwmon bgmac b44 ptp pps_core
[ 637.52] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.10.36 #1
[ 637.52] Stack : 8030d552 0036 818201d0 0008 8026cfd0 802bb23b 0003 8030cd00 818201d0 0008 802b76e4 802b76dc 8001c118 0003 80019ad8 80293ecc 0008 8026e870 8182bc5c 8182bbe8 ...
[ 637.56] Call Trace:
[ 637.56] [80010bb4] show_stack+0x48/0x70
[ 637.57] [80019bd4] warn_slowpath_common+0x78/0xa8
[ 637.57] [80019c30] warn_slowpath_fmt+0x2c/0x38
[ 637.58] [801b27dc] skb_warn_bad_offload+0xc0/0xe8
[ 637.58] [801b6390] __skb_gso_segment+0x50/0xec
[ 637.59] [801de0dc] ip_forward_finish+0x108/0x1bc
[ 637.59] [801b386c] __netif_receive_skb_core+0x46c/0x52c
[ 637.60] [81acc16c] 0x81acc16c
[ 637.60]
[ 637.60] ---[ end trace 2c2a6a28d6589bcc ]---

Any idea anyone? Does above mean b44 provided a corrupted packet? Or some wrong pointer?
___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
14.07.2014 18:42, Rafał Miłecki: [...] Any idea anyone? Does above mean b44 provided a corrupted packet? Or some wrong pointer?

Yet another note: the problem apparently appeared after 10.03.1. Maybe I could try to bisect the revision of interest, but doing it blindly would probably take tons of time, unless someone aware of what was happening to the driver at that time gives some enlightening instructions.

Thank you.
Nikolai
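For a rough sense of the cost: a plain binary search over a revision range (the approach `git bisect` automates) needs about ceil(log2 N) build-and-test rounds for N candidate revisions. A minimal sketch, assuming the oldest revision in the range is known good, the newest is known bad, and every test gives a reliable verdict; an intermittent, load-dependent crash like this one may need several long runs per round. The function name is hypothetical.

```c
/* Hypothetical helper: worst-case number of build-and-test rounds a
 * binary search needs to isolate the first bad revision among
 * `candidates` revisions (the known-good baseline is not counted). */
static unsigned bisect_rounds(unsigned long candidates)
{
    unsigned rounds = 0;

    /* Each round tests the midpoint; in the worst case the larger half,
     * ceil(candidates / 2) revisions, is still left to search. */
    while (candidates > 1) {
        candidates = (candidates + 1) / 2;
        rounds++;
    }
    return rounds;
}
```

For a hypothetical range of 20000 revisions, bisect_rounds(20000) returns 15; each round is still a full firmware build plus a load test, which is where most of the time goes.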
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
On 2014-07-14 16:42, Rafał Miłecki wrote: On 21 June 2014 18:36, Nikolai Zhubr n-a-zh...@yandex.ru wrote: [...] Any idea anyone? Does above mean b44 provided a corrupted packet? Or some wrong pointer?

It looks to me like the hardware is overwriting the skb shared info (at the end of the skb data buffer), possibly because the configured maximum frame length may be too big for the buffer. If I were to speculate wildly, I would guess that B44_RXMAXLEN refers to the maximum frame length, not the maximum buffer length - and in the code, it's being fed with the maximum buffer length. This would allow the hardware to receive slightly oversized frames which can corrupt the skb.

- Felix
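Felix's hypothesis can be put into rough numbers. This is a minimal model, not the driver's code: it assumes (from memory of the b44 driver of that era, to be checked against b44.c/b44.h) a 28-byte rx status header, a frame that starts 2 bytes later at offset 30, and a per-buffer size of 1536 + 30 bytes. It also assumes the DMA engine does not clamp its writes to the descriptor length. Bytes written past the buffer end land in, or within a few bytes of alignment padding from, the area holding struct skb_shared_info. All macro and function names are hypothetical.

```c
/* Assumed values, recalled from the b44 driver of that era; verify
 * against the actual b44.h/b44.c before drawing conclusions. */
#define ASSUMED_RX_HEADER_LEN 28                             /* DMA rx status header */
#define ASSUMED_RX_PKT_OFFSET (ASSUMED_RX_HEADER_LEN + 2)    /* frame starts after header + 2-byte pad */
#define ASSUMED_RX_PKT_BUF_SZ (1536 + ASSUMED_RX_PKT_OFFSET) /* bytes available per rx buffer */
#define ETH_HLEN_BYTES        14                             /* Ethernet header */

/* Hypothetical helper: if the hardware treats `rx_max_len` as the
 * largest frame it will accept, how many bytes could it write past the
 * end of the rx buffer? Returns 0 when the largest frame still fits. */
static int rx_overrun_bytes(int rx_max_len)
{
    int end = ASSUMED_RX_PKT_OFFSET + rx_max_len; /* one past the last byte written */

    return end > ASSUMED_RX_PKT_BUF_SZ ? end - ASSUMED_RX_PKT_BUF_SZ : 0;
}

/* The value programmed by the expression quoted in this thread:
 * mtu + ETH_HLEN + 8 + RX_HEADER_LEN. */
static int current_rx_max_len(int mtu)
{
    return mtu + ETH_HLEN_BYTES + 8 + ASSUMED_RX_HEADER_LEN;
}
```

Under these assumptions, the current value for the default MTU (1550) leaves room for up to 14 bytes to spill past the buffer, while mtu + ETH_HLEN + 8 (1522) fits. This only illustrates the arithmetic behind the hypothesis; it does not show that oversized frames actually arrive, and Nikolai's later test with the reduced value still panicked.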
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
On Mon, Jul 14, 2014 at 6:23 PM, Felix Fietkau n...@openwrt.org wrote: On 2014-07-14 16:42, Rafał Miłecki wrote: On 21 June 2014 18:36, Nikolai Zhubr n-a-zh...@yandex.ru wrote: [...] Any idea anyone? Does above mean b44 provided a corrupted packet? Or some wrong pointer? It looks to me like the hardware is overwriting the skb shared info (at the end of the skb data buffer), possibly because the configured maximum frame length may be too big for the buffer. If I were to speculate wildly, I would guess that B44_RXMAXLEN refers to the maximum frame length, not the maximum buffer length - and in the code, it's being fed with the maximum buffer length. This would allow the hardware to receive slightly oversized frames which can corrupt the skb.

Since there is a public datasheet[1], this is easily verifiable, and it looks like you are right:

Receive Maximum Length Register (RcvLength, Offset 0x404): The value stored in this register specifies the largest valid Ethernet Frame to be received.

The same is true for the XmtMaxLength register, which is also set too large (it defaults to 1518).

Jonas

[1]: https://www.broadcom.com/collateral/pg/440X-PG02-R.pdf
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
14.07.2014 20:44, Jonas Gorski: [...] If I were to speculate wildly, I would guess that B44_RXMAXLEN refers to the maximum frame length, not the maximum buffer length - and in the code, it's being fed with the maximum buffer length. This would allow the hardware to receive slightly oversized frames which can corrupt the skb. Since there is a public datasheet[1], this is easily verifiable, and it looks like you are right: Receive Maximum Length Register (RcvLength, Offset 0x404): The value stored in this register specifies the largest valid Ethernet Frame to be received.

Ok, so I'd suppose
bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN + 8 + RX_HEADER_LEN)
should instead be
bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN) ?
or
bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN + 8) ?
or maybe even
bw32(bp, B44_RXMAXLEN, bp->dev->mtu) ?
Apologies for my ignorance; I just can't wait to test it right away, hoping to get it right in time for BB.

Thank you.
Nikolai
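For the default MTU of 1500, the four candidate expressions above evaluate as follows. A minimal sketch: ETH_HLEN is the standard 14-byte Ethernet header, RX_HEADER_LEN is assumed to be 28 bytes (recalled from the b44 driver, not confirmed in this thread), and the struct and function names are hypothetical.

```c
#include <stddef.h>

#define ETH_HLEN_BYTES        14 /* destination MAC + source MAC + EtherType */
#define ASSUMED_RX_HEADER_LEN 28 /* assumed size of the b44 rx status header */

/* One candidate value for B44_RXMAXLEN, with the expression it came from. */
struct rxmaxlen_candidate {
    const char *expr;
    int value;
};

/* Hypothetical helper: fill `out` with the four candidates from the
 * thread for the given MTU and return how many were written. */
static size_t rxmaxlen_candidates(int mtu, struct rxmaxlen_candidate out[4])
{
    out[0] = (struct rxmaxlen_candidate){ "mtu + ETH_HLEN + 8 + RX_HEADER_LEN (current)",
                                          mtu + ETH_HLEN_BYTES + 8 + ASSUMED_RX_HEADER_LEN };
    out[1] = (struct rxmaxlen_candidate){ "mtu + ETH_HLEN", mtu + ETH_HLEN_BYTES };
    out[2] = (struct rxmaxlen_candidate){ "mtu + ETH_HLEN + 8", mtu + ETH_HLEN_BYTES + 8 };
    out[3] = (struct rxmaxlen_candidate){ "mtu", mtu };
    return 4;
}
```

For mtu = 1500 the candidates come out as 1550, 1514, 1522 and 1500. For comparison, the 1518-byte default Jonas quotes for XmtMaxLength equals the standard maximum untagged Ethernet frame (1500 + 14 + a 4-byte FCS), which hints, without proving it, that the hardware may count the FCS toward these limits.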
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
On 14 July 2014 18:44, Jonas Gorski j...@openwrt.org wrote: On Mon, Jul 14, 2014 at 6:23 PM, Felix Fietkau n...@openwrt.org wrote: It looks to me like the hardware is overwriting the skb shared info (at the end of the skb data buffer), possibly because the configured maximum frame length may be too big for the buffer. If I were to speculate wildly, I would guess that B44_RXMAXLEN refers to the maximum frame length, not the maximum buffer length - and in the code, it's being fed with the maximum buffer length. This would allow the hardware to receive slightly oversized frames which can corrupt the skb. Since there is a public datasheet[1], this is easily verifiable, and it looks like you are right: Receive Maximum Length Register (RcvLength, Offset 0x404): The value stored in this register specifies the largest valid Ethernet Frame to be received. The same is true for the XmtMaxLength register, which is also set too large (it defaults to 1518).

I wonder what the point of that register is if we set the length per skb anyway (b44_alloc_rx_skb):
ctrl = (DESC_CTRL_LEN & RX_PKT_BUF_SZ);
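Rafał's question contrasts two separate limits: the global RcvLength register and the per-buffer length carried in each rx descriptor's control word. A minimal sketch of how such a control word is built, assuming b44.h-style definitions (a 13-bit length field DESC_CTRL_LEN = 0x00001fff and an end-of-table flag DESC_CTRL_EOT = 0x10000000, recalled from the driver; check the real header). The `&` masks the buffer size into the length field; the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror b44.h; verify against the real header. */
#define ASSUMED_DESC_CTRL_LEN 0x00001fffu /* low 13 bits: buffer length */
#define ASSUMED_DESC_CTRL_EOT 0x10000000u /* last descriptor: DMA wraps to slot 0 */

/* Hypothetical helper modelled on the b44_alloc_rx_skb() line quoted
 * above: the control word for rx slot `idx` of a `ring_size`-entry ring
 * whose buffers are `buf_len` bytes long. */
static uint32_t rx_desc_ctrl(uint32_t buf_len, unsigned idx, unsigned ring_size)
{
    uint32_t ctrl = ASSUMED_DESC_CTRL_LEN & buf_len; /* keep only the length field */

    if (idx == ring_size - 1)
        ctrl |= ASSUMED_DESC_CTRL_EOT; /* mark the end of the descriptor ring */
    return ctrl;
}
```

With an assumed buffer size of 1566 bytes, the length field holds 1566 unchanged. Whether the DMA engine stops at this per-buffer length when a longer frame arrives, or relies on RcvLength to reject it first, is the open question here; the sketch does not settle it.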
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
On Mon, Jul 14, 2014 at 11:48 PM, Nikolai Zhubr n-a-zh...@yandex.ru wrote: 14.07.2014 20:44, Jonas Gorski: [...] Ok, so I'd suppose
bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN + 8 + RX_HEADER_LEN)
should instead be
bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN) ?
or
bw32(bp, B44_RXMAXLEN, bp->dev->mtu + ETH_HLEN + 8) ?

This is the right one; mtu (the payload) + ETH_HLEN (14 bytes) + 8 (4 bytes for vlan tag, probably 4 extra bytes for custom header optionally used by broadcom switches)

or maybe even
bw32(bp, B44_RXMAXLEN, bp->dev->mtu) ?
Apologies for my ignorance; I just can't wait to test it right away, hoping to get it right in time for BB. Thank you. Nikolai

Thanks for testing!

Jonas
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
21.06.2014 0:23, Rafał Miłecki: [...] This time the uplink load was no more than 20 Mbit. Here is what I got (although it doesn't look very promising to me). Maybe I should enable some more debugging somewhere?

[ 543.432000] Unhandled kernel unaligned access[#1]:
[ 543.432000] Cpu 0
[ 543.432000] $ 0 : 1000b800 2280a89f
[ 543.432000] $ 4 : 8032e4b0 804e86b6 8033
[ 543.432000] $ 8 : 804e86b6 2400
[ 543.432000] $12 : 03bf ac00 0c00
[ 543.432000] $16 : 8033 8179ebe0 1000b801 0100
[ 543.432000] $20 : 0005 0024 8039 8039
[ 543.432000] $24 : 0004
[ 543.432000] $28 : 81824000 81825e18 8031e490 801e3e90
[ 543.432000] Hi: fea0
[ 543.432000] Lo: 0160
[ 543.432000] epc : 801e3e9c __dst_free+0x2c/0x150
[ 543.432000] Tainted: G O
[ 543.432000] ra: 801e3e90 __dst_free+0x20/0x150
[ 543.432000] Status: 1000b803 KERNEL EXL IE
[ 543.432000] Cause : 00800010
[ 543.432000] BadVA : 2280a99b
[ 543.432000] PrId : 00029006 (Broadcom BMIPS3300)
[ 543.432000] Modules linked in: nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat pppoe xt_conntrack xt_CT xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc b43legacy(O) b43(O) mac80211(O) crc_ccitt cfg80211(O) compat(O) arc4 aes_generic crypto_algapi switch_robo(O) switch_core(O) diag(O)
[ 543.432000] Process ksoftirqd/0 (pid: 3, threadinfo=81824000, task=81822060, tls=)
[ 543.432000] Stack : 8031e490 80063ab8 1000b802 945ef2d5 945ef2d5 8179ebe0 80063b88
[ 543.432000] 0009 80063bc4 0005 000c 8038f088 0001 0009 80020200
[ 543.432000] 8005426c 8039 8039
[ 543.432000] 0001 8002037c
[ 543.432000] 81819e24 800202e4 81819e24 800202e4
[ 543.432000] ...
[ 543.432000] Call Trace:
[ 543.432000] [801e3e9c] __dst_free+0x2c/0x150
[ 543.432000] [80063b88] __rcu_process_callbacks+0x118/0x140
[ 543.432000] [80020200] __do_softirq+0xd0/0x1b4
[ 543.432000] [8002037c] run_ksoftirqd+0x98/0x154
[ 543.432000] [80037678] kthread+0x88/0x90
[ 543.432000] [80007cc0] kernel_thread_helper+0x10/0x18
[ 543.432000]
[ 543.432000]
[ 543.432000] Code: 8e22000c 5046 3c02801e 8c4200fc 30420001 1446 24020002 3c02801e 244240cc
[ 543.664000] ---[ end trace f941e3bc313ba83f ]---
[ 543.668000] Kernel panic - not syncing: Fatal exception in interrupt
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
21.06.2014 0:23, Rafał Miłecki: On 20 June 2014 22:12, Nikolai Zhubr n-a-zh...@yandex.ru wrote: There are tons of updates in trunk; this bug may have been fixed a long time ago ;)

Unfortunately, it seems not :/ Also, with the trunk version the routing speed limit seems to be noticeably lower (~27 Mbit with trunk compared to ~34 Mbit with 12.09). Serial log follows (unaligned access and something else). Please let me know what else I can do to find and fix it :)

Thank you,
Nikolai

 -----------------------------------------------------
 BARRIER BREAKER (Bleeding Edge, r41293)
 -----------------------------------------------------
  * 1/2 oz Galliano         Pour all ingredients into
  * 4 oz cold Coffee        an irish coffee mug filled
  * 1 1/2 oz Dark Rum       with crushed ice. Stir.
  * 2 tsp. Creme de Cacao
 -----------------------------------------------------
root@OpenWrt:/# exit
Please press Enter to activate this console.
[ 637.43] [ cut here ]
[ 637.44] WARNING: at net/core/dev.c:2194 skb_warn_bad_offload+0xc0/0xe8()
[ 637.45] b44: caps=(0x4000, 0x) len=1500 data_len=0 gso_size=53118 gso_type=59551 ip_summed=0
[ 637.46] Modules linked in: pppoe ppp_async iptable_nat b43legacy b43 pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio gpio_button_hotplug tg3 hwmon bgmac b44 ptp pps_core
[ 637.52] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.10.36 #1
[ 637.52] Stack : 8030d552 0036 818201d0 0008 8026cfd0 802bb23b 0003 8030cd00 818201d0 0008 802b76e4 802b76dc 8001c118 0003 80019ad8 80293ecc 0008 8026e870 8182bc5c 8182bbe8 ...
[ 637.56] Call Trace:
[ 637.56] [80010bb4] show_stack+0x48/0x70
[ 637.57] [80019bd4] warn_slowpath_common+0x78/0xa8
[ 637.57] [80019c30] warn_slowpath_fmt+0x2c/0x38
[ 637.58] [801b27dc] skb_warn_bad_offload+0xc0/0xe8
[ 637.58] [801b6390] __skb_gso_segment+0x50/0xec
[ 637.59] [801de0dc] ip_forward_finish+0x108/0x1bc
[ 637.59] [801b386c] __netif_receive_skb_core+0x46c/0x52c
[ 637.60] [81acc16c] 0x81acc16c
[ 637.60]
[ 637.60] ---[ end trace 2c2a6a28d6589bcc ]---
[ 637.61] Unhandled kernel unaligned access[#1]:
[ 637.61] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G W 3.10.36 #1
[ 637.61] task: 81820028 ti: 8182a000 task.ti: 8182a000
[ 637.61] $ 0 : 0001 80d72b68 0028
[ 637.61] $ 4 : c36ae951 7088
[ 637.61] $ 8 : 002d 36643832 62393835 5d206363
[ 637.61] $12 : 03bf bc00
[ 637.61] $16 : 80ea6ce0 0001 0001 0014
[ 637.61] $20 : 0008 802b76e4
[ 637.61] $24 : 0003 801507e8
[ 637.61] $28 : 8182a000 8182bd28 802b76dc 801aadc0
[ 637.61] Hi:
[ 637.61] Lo: 0083
[ 637.61] epc : 80064208 put_page+0x0/0x4c
[ 637.61] Tainted: G W
[ 637.61] ra: 801aadc0 skb_release_data+0xc4/0x118
[ 637.61] Status: 1000b803 KERNEL EXL IE
[ 637.61] Cause : 00800010
[ 637.61] BadVA : c36ae951
[ 637.61] PrId : 00029006 (Broadcom BMIPS3300)
[ 637.61] Modules linked in: pppoe ppp_async iptable_nat b43legacy b43 pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ip6t_REJECT ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio gpio_button_hotplug tg3 hwmon bgmac b44 ptp pps_core
[ 637.61] Process ksoftirqd/0 (pid: 3, threadinfo=8182a000, task=81820028, tls=)
[ 637.61] Stack : 80ea6ce0 801aae28 80ea6ce0 7088 80ea6ce0 ffea 801aae64 802b76e4 80ea6ce0 80d6f6c0 0001 80ea6ce0 801de0f4 0258 8088de40 81a9e000 801ddfd4 0001 80ea6ce0 802b8a98 802b8a98 802b76d0 81ab5000 80ea6ce0 801b386c 0183 81ac8964
[OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
Hello people,

I have an Asus WL-500W router (http://wiki.openwrt.org/toh/asus/wl500w). It is also very similar to the WL-500gP. A few months ago I updated to 12.09. I can't recall now whether it was Backfire or Kamikaze before, but I noticed 2 things immediately:

1. The maximum practically achievable download speed increased somewhat (from ~30 Mbit to ~34 Mbit, approx.).
2. After reaching (and sustaining) this maximum download speed, the device always reboots soon afterwards. (Absolutely reproducible.)

For some time I thought it was just hardware, like bad electrolytic capacitors and/or a weak power supply. Finally I opened the router, replaced all 3 capacitors (2 of the 3 did indeed appear somewhat damaged), and attached a voltmeter to check for undervoltage. Still nothing: the power supply is OK, and the reboots still happen. So I had to turn to the software side, and found 2 new things:

1. As the uplink load approaches 34 Mbit, softirqs eat up more and more CPU, approaching 100%.
2. At some point I get (on a serial link):
[ 368.948000] sched: RT throttling activated
[ 382.688000] Unhandled kernel unaligned access[#1]:
<trim>
[ 382.932000] Kernel panic - not syncing: Fatal exception in interrupt
[ 382.94] Rebooting in 3 seconds..
(Trimmed everything in between because there is no debugging info for now.)

I suppose this is something that should not normally happen, and I'd like to have it fixed somehow. I haven't tried trunk yet, but I will if it could make a difference. I can provide serial logs, compile trunk, apply patches, redo testing, etc. (as time permits). Any hints appreciated.

Thank you,
Nikolai
Re: [OpenWrt-Devel] AA on brcm47xx: Unhandled kernel unaligned access
On 20 June 2014 22:12, Nikolai Zhubr n-a-zh...@yandex.ru wrote: 2. At some point I get (on a serial link):
[ 368.948000] sched: RT throttling activated
[ 382.688000] Unhandled kernel unaligned access[#1]:
<trim>
[ 382.932000] Kernel panic - not syncing: Fatal exception in interrupt
[ 382.94] Rebooting in 3 seconds..
(Trimmed all in between because there is no debugging info for now)

Debugging info would really be wanted.

I suppose this is something that should not normally happen, and I'd like to have it fixed somehow. I haven't tried trunk yet, but I will, if it could make some difference.

There are tons of updates in trunk; this bug may have been fixed a long time ago ;)