Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-10-03 Thread Luka Perkov
Hi,

On Fri, Sep 30, 2011 at 01:52:53PM +0200, Florian Fainelli wrote:
 On Monday 05 September 2011 18:44:39 Michael Büsch wrote:
  On Mon, 05 Sep 2011 18:11:43 +0200
  
  Felix Fietkau n...@openwrt.org wrote:
I am still wondering how enabling preempt could possibly
workaround/hide an alignment bug. sounds strange to me. Does somebody
have an idea?

I didn't look too closely at the function yet, though.
   
   Look at BadVA : 6fbb600f - it's not an alignment bug, the address is
   completely bogus. It just happens to trip on the unhandled unaligned
   access first because of the lowest bits.
   This looks like a nasty memory corruption bug, and hiding it with
   CONFIG_PREEMPT probably eventually makes it show up somewhere else
   instead.
  
  Ok, that makes sense.
  
  So instead of enabling preempt, it would be a way better idea to enable
  various kernel memory debugging options (probably also lockdep) to track
  this down.
 
 It looks like this thread stalled here. Luka, have you been able to run a 
 kernel with lockdep enabled to see what is going on here?

I'll try to do it this week, I'll also send mail to netdev as you
suggested...

Luka
___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-30 Thread Florian Fainelli
Hello,

On Sunday 04 September 2011 21:24:45 Philip Prindeville wrote:
 On 9/4/11 11:43 AM, Michael Büsch wrote:
  On Sun, 04 Sep 2011 10:11:08 -0700
  
  Philip Prindeville philipp_s...@redfish-solutions.com wrote:
  And finally, I'm not really convinced that any of the routers/APs
  that OpenWRT supports have latency requirements in the milliseconds
  range. I'd rather say throughput matters a _lot_ more than a
  millisecond of latency for these devices.
  
  If you're doing VoIP, then I'd certainly say latency matters.
  
  No it doesn't. At least not in the MILLISECONDS range.
  It does not matter at all, if your voip call has 300 or 302 ms latency.
  But it _does_ matter that there's enough throughput bandwidth to get most
  of the packages through the pipe.
 
 Who the heck has 300ms latency???
 
 pbx*CLI sip show peers
 Name/username        Host            Dyn Forcerport ACL Port   Status
 ata_1/ata_1          192.168.1.12    D   N              5060   OK (15 ms)
 ata_2/ata_2          192.168.1.12    D   N              5061   OK (11 ms)
 bedroom_1/bedroom_1  192.168.1.5     D   N              5060   OK (14 ms)
 bedroom_2/bedroom_2  192.168.1.5     D   N              5061   OK (13 ms)
 bedroom_3/bedroom_3  192.168.1.5     D   N              5062   OK (16 ms)
 cell_1/cell_1        184.72.221.84   D   N              45983  OK (211 ms)
 cell_2               (Unspecified)   D   N              0      UNKNOWN
 guest_1              (Unspecified)   D   N              0      UNKNOWN
 guest_2              (Unspecified)   D   N              0      UNKNOWN
 guest_3              (Unspecified)   D   N              0      UNKNOWN
 guest_4              (Unspecified)   D   N              0      UNKNOWN
 kitchen_1/kitchen_1  192.168.1.6     D   N              5060   OK (12 ms)
 kitchen_2/kitchen_2  192.168.1.6     D   N              5061   OK (10 ms)
 office_1             (Unspecified)   D   N              0      UNKNOWN
 office_2             (Unspecified)   D   N              0      UNKNOWN
 office_3             (Unspecified)   D   N              0      UNKNOWN
 sip_proxy            66.232.80.9                        5060   Unmonitored
 sip_proxy-out        66.232.80.9                        5060   OK (46 ms)
 softphone            (Unspecified)   D   N              0      UNKNOWN
 19 sip peers [Monitored: 9 online, 9 offline Unmonitored: 1 online, 0 offline]
 pbx*CLI
 
 
 My local softswitch is at the other end of a PON link 1.2km away...
 
 The VoIP agent on my iPhone 4 has terrible latency just because I'm on
 ATT...
 
 If I were using an Android on T-mobile that would be around 100ms...

The discussion in question is about a scheduler latency, not a network one, 
though both can be related in the end.
-- 
Florian


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-30 Thread Florian Fainelli
Hello,

On Monday 05 September 2011 18:44:39 Michael Büsch wrote:
 On Mon, 05 Sep 2011 18:11:43 +0200
 
 Felix Fietkau n...@openwrt.org wrote:
   I am still wondering how enabling preempt could possibly
   workaround/hide an alignment bug. sounds strange to me. Does somebody
   have an idea?
   
   I didn't look too closely at the function yet, though.
  
  Look at BadVA : 6fbb600f - it's not an alignment bug, the address is
  completely bogus. It just happens to trip on the unhandled unaligned
  access first because of the lowest bits.
  This looks like a nasty memory corruption bug, and hiding it with
  CONFIG_PREEMPT probably eventually makes it show up somewhere else
  instead.
 
 Ok, that makes sense.
 
 So instead of enabling preempt, it would be a way better idea to enable
 various kernel memory debugging options (probably also lockdep) to track
 this down.

It looks like this thread stalled here. Luka, have you been able to run a 
kernel with lockdep enabled to see what is going on here?
-- 
Florian


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-30 Thread Philip Prindeville
On 9/30/11 4:52 AM, Florian Fainelli wrote:
 Hello,

 On Sunday 04 September 2011 21:24:45 Philip Prindeville wrote:
 On 9/4/11 11:43 AM, Michael Büsch wrote:
 On Sun, 04 Sep 2011 10:11:08 -0700

 Philip Prindeville philipp_s...@redfish-solutions.com wrote:
 And finally, I'm not really convinced that any of the routers/APs
 that OpenWRT supports have latency requirements in the milliseconds
 range. I'd rather say throughput matters a _lot_ more than a
 millisecond of latency for these devices.
 If you're doing VoIP, then I'd certainly say latency matters.
 No it doesn't. At least not in the MILLISECONDS range.
 It does not matter at all, if your voip call has 300 or 302 ms latency.
 But it _does_ matter that there's enough throughput bandwidth to get most
 of the packages through the pipe.
 Who the heck has 300ms latency???

 pbx*CLI sip show peers
 Name/username        Host            Dyn Forcerport ACL Port   Status
 ata_1/ata_1          192.168.1.12    D   N              5060   OK (15 ms)
 ata_2/ata_2          192.168.1.12    D   N              5061   OK (11 ms)
 bedroom_1/bedroom_1  192.168.1.5     D   N              5060   OK (14 ms)
 bedroom_2/bedroom_2  192.168.1.5     D   N              5061   OK (13 ms)
 bedroom_3/bedroom_3  192.168.1.5     D   N              5062   OK (16 ms)
 cell_1/cell_1        184.72.221.84   D   N              45983  OK (211 ms)
 cell_2               (Unspecified)   D   N              0      UNKNOWN
 guest_1              (Unspecified)   D   N              0      UNKNOWN
 guest_2              (Unspecified)   D   N              0      UNKNOWN
 guest_3              (Unspecified)   D   N              0      UNKNOWN
 guest_4              (Unspecified)   D   N              0      UNKNOWN
 kitchen_1/kitchen_1  192.168.1.6     D   N              5060   OK (12 ms)
 kitchen_2/kitchen_2  192.168.1.6     D   N              5061   OK (10 ms)
 office_1             (Unspecified)   D   N              0      UNKNOWN
 office_2             (Unspecified)   D   N              0      UNKNOWN
 office_3             (Unspecified)   D   N              0      UNKNOWN
 sip_proxy            66.232.80.9                        5060   Unmonitored
 sip_proxy-out        66.232.80.9                        5060   OK (46 ms)
 softphone            (Unspecified)   D   N              0      UNKNOWN
 19 sip peers [Monitored: 9 online, 9 offline Unmonitored: 1 online, 0 offline]
 pbx*CLI


 My local softswitch is at the other end of a PON link 1.2km away...

 The VoIP agent on my iPhone 4 has terrible latency just because I'm on
 ATT...

 If I were using an Android on T-mobile that would be around 100ms...
 The discussion in question is about a scheduler latency, not a network one, 
 though both can be related in the end.

The assertion was made that scheduling delays didn't matter because network delays 
are so much higher (by orders of magnitude).

My point was simply that network delays aren't always as substantial as some 
might believe.

-Philip



Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-05 Thread Florian Fainelli
On Sunday 04 September 2011 22:44:08 Luka Perkov wrote:
 On Sun, Sep 04, 2011 at 08:47:46PM +0200, Michael Büsch wrote:
  On Sun, 4 Sep 2011 01:06:02 +0200
  
  Luka Perkov open...@lukaperkov.net wrote:
What are you actually trying to fix with enabling preemption? I
didn't really get it by reading your mail.
   
   Kernel oops that I described.
  
  Yeah. And that is completely unacceptable.
 
 See the patch attached.
 
   CONFIG_PREEMPT must be enabled; don't know what more I can do.
  
  No. You must provide a full OOPS message.
  An unaligned access is easy to fix (or at least work around properly)
  with proper debugging information.
 
 Unhandled kernel unaligned access[#1]:
 Cpu 0
 $ 0   :   0006 0011
 $ 4   : d5bf9da3 80dbb548 0006 c010
 $ 8   : c578  6e617332 6e617332
 $12   :    
 $16   : 6fbb5ff7 80d05618 8028fab0 
 $20   : 8028fa28 80cba248 8028fabc 8028fabe
 $24   :  80d85a50
 $28   : 8028e000 8028f9f0 81043d14 80cb8708
 Hi: 0235
 Lo: 02922c00
 epc   : 80cb8968 nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
 Tainted: P
 ra: 80cb8708 nf_nat_setup_info+0x80/0x6e8 [nf_nat]
 Status: 1100fc03KERNEL EXL IE
 Cause : 00800010
 BadVA : 6fbb600f
 PrId  : 00019641 (MIPS 24Kc)
 Modules linked in: gpio_keys_polled dwc_otg ath_pci ath_hal(P) lantiq_atm
 drv_dsl_cpe_api lantiq_mei ipt_MASQUERADE iptable_nat nf_nat xt_conntrack
 xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4
 nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment
 xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables
 xt_tcpudp x_tables ppp_async ppp_generic slhc br2684 atm drv_vmmc usbcore
 drv_tapi crc_ccitt drv_ifxos arc4 aes_generic crypto_algapi
 Process swapper (pid: 0, threadinfo=8028e000, task=80291bc0, tls=)
 Stack : 81722280 8019bfa0 801c686c 80f4f800 c0a801c7
 c5780002 6ea9cbd9    a6a90600 d5bf9da3 
 6ea9cbd9    a6a90002 c0a801c7  
  c5780601 80cb9fd0 80cb8b0c 0001 d5bf9da3  
 c0a801c7 8028fae4 80fd8840 80d05618 8028fae8 d8263338 813ca98c 80fd8840
 ...
 Call Trace:
 [80cb8968] nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
 [80d1e158] masquerade_tg+0xc0/0xe8 [ipt_MASQUERADE]
 [80c646a8] ipt_do_table+0x3e0/0x484 [ip_tables]
 [80dee0c0] nf_nat_rule_find+0x28/0x9c [iptable_nat]
 [80dee290] nf_nat_fn+0x120/0x1a0 [iptable_nat]
 [801baa34] nf_iterate+0x8c/0xfc
 [801bab34] nf_hook_slow+0x90/0x17c
 [801c76c8] ip_output+0xd8/0x104
 [8019a224] __netif_receive_skb+0x4d4/0x578
 [80210128] br_handle_frame+0x280/0x2b8
 [80199f9c] __netif_receive_skb+0x24c/0x578
 [8019a370] process_backlog+0xa8/0x188
 [8019a778] net_rx_action+0x8c/0x1b8
 [800215f0] __do_softirq+0xa8/0x154
 [800217f0] do_softirq+0x48/0x68
 [800031c0] plat_irq_dispatch+0xf4/0x164
 [800059ec] ret_from_irq+0x0/0x4
 [80005be0] r4k_wait+0x20/0x40
 [80007690] cpu_idle+0x28/0x4c
 [802a58d0] start_kernel+0x35c/0x378
 
 If you want to debug say so and I'll send you vmlinux file. I'm not going
 to debug this further.

Then just post the oops along with the problem description to the netfilter/netdev 
mailing list and get the upstream networking people to resolve this bug for you. 
They will certainly ask you to test some patches, which sounds like an 
acceptable trade to me.

Please at least agree that enabling preemption is a workaround and not a fix.

 
   --- a/net/ipv4/netfilter/nf_nat_core.c
   +++ b/net/ipv4/netfilter/nf_nat_core.c
  
  This doesn't seem to fix any alignment issues, does it?
 
 No.
 
 Users can now choose which preemption mode they want. Is this ok?
 
 Luka
 
 Index: Config.in
 ===
 --- Config.in (revision 28166)
 +++ Config.in (working copy)
 @@ -231,6 +231,33 @@
   bool "Compile the kernel with SysRq support"
   default n
 
 + choice
 + prompt "Compile the kernel with selected preemption model"
 + default KERNEL_PREEMPT_NONE
 + help
 +   Select the preemption model you wish to use.
 +
 + config KERNEL_PREEMPT_NONE
 + bool "No Forced Preemption (Server)"
 + help
 +   Select this option if you are building a kernel for a server or
 +   scientific/computation system, or if you want to maximize the
 +   raw processing power of the kernel, irrespective of scheduling
 +   latencies.
 +
 + config KERNEL_PREEMPT_VOLUNTARY
 + bool "Voluntary Kernel Preemption (Desktop)"
 + help
 +   Select this if you are building a kernel for a desktop system.
 +
 + config KERNEL_PREEMPT
 + bool "Preemptible Kernel (Low-Latency Desktop)"
 +   

Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-05 Thread Michael Büsch
On Mon, 5 Sep 2011 09:58:58 +0200
Florian Fainelli flor...@openwrt.org wrote:
 Now this looks better, I am not opposed at all to us exposing such a kernel 
 configuration option to users through OpenWrt's menuconfig.

I'm completely fine with this as long as it defaults to no-preempt
and that it is not advertised as a fix to a completely unrelated bug.

short: patch is fine and may be applied, but doesn't fix the bug.

-- 
Greetings, Michael.


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-05 Thread Michael Büsch
On Sun, 4 Sep 2011 22:44:08 +0200
Luka Perkov open...@lukaperkov.net wrote:
 Unhandled kernel unaligned access[#1]:
 Cpu 0
 $ 0   :   0006 0011
 $ 4   : d5bf9da3 80dbb548 0006 c010
 $ 8   : c578  6e617332 6e617332
 $12   :    
 $16   : 6fbb5ff7 80d05618 8028fab0 
 $20   : 8028fa28 80cba248 8028fabc 8028fabe
 $24   :  80d85a50
 $28   : 8028e000 8028f9f0 81043d14 80cb8708
 Hi: 0235
 Lo: 02922c00
 epc   : 80cb8968 nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
 Tainted: P
 ra: 80cb8708 nf_nat_setup_info+0x80/0x6e8 [nf_nat]
 Status: 1100fc03KERNEL EXL IE
 Cause : 00800010
 BadVA : 6fbb600f
 PrId  : 00019641 (MIPS 24Kc)
 Modules linked in: gpio_keys_polled dwc_otg ath_pci ath_hal(P) lantiq_atm 
 drv_dsl_cpe_api lantiq_mei ipt_MASQUERADE iptable_nat nf_nat xt_conntrack 
 xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
 pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac 
 xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async 
 ppp_generic slhc br2684 atm drv_vmmc usbcore drv_tapi crc_ccitt drv_ifxos 
 arc4 aes_generic crypto_algapi
 Process swapper (pid: 0, threadinfo=8028e000, task=80291bc0, tls=)
 Stack : 81722280 8019bfa0 801c686c 80f4f800 c0a801c7
 c5780002 6ea9cbd9    a6a90600 d5bf9da3
 6ea9cbd9    a6a90002 c0a801c7
  c5780601 80cb9fd0 80cb8b0c 0001 d5bf9da3
 c0a801c7 8028fae4 80fd8840 80d05618 8028fae8 d8263338 813ca98c 80fd8840
 ...
 Call Trace:
 [80cb8968] nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
 [80d1e158] masquerade_tg+0xc0/0xe8 [ipt_MASQUERADE]
 [80c646a8] ipt_do_table+0x3e0/0x484 [ip_tables]
 [80dee0c0] nf_nat_rule_find+0x28/0x9c [iptable_nat]
 [80dee290] nf_nat_fn+0x120/0x1a0 [iptable_nat]
 [801baa34] nf_iterate+0x8c/0xfc
 [801bab34] nf_hook_slow+0x90/0x17c
 [801c76c8] ip_output+0xd8/0x104
 [8019a224] __netif_receive_skb+0x4d4/0x578
 [80210128] br_handle_frame+0x280/0x2b8
 [80199f9c] __netif_receive_skb+0x24c/0x578
 [8019a370] process_backlog+0xa8/0x188
 [8019a778] net_rx_action+0x8c/0x1b8
 [800215f0] __do_softirq+0xa8/0x154
 [800217f0] do_softirq+0x48/0x68
 [800031c0] plat_irq_dispatch+0xf4/0x164
 [800059ec] ret_from_irq+0x0/0x4
 [80005be0] r4k_wait+0x20/0x40
 [80007690] cpu_idle+0x28/0x4c
 [802a58d0] start_kernel+0x35c/0x378

thanks.
I am still wondering how enabling preempt could possibly workaround/hide
an alignment bug. sounds strange to me. Does somebody have an idea?

I didn't look too closely at the function yet, though.

-- 
Greetings, Michael.


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-05 Thread Conor O'Gorman
On Mon, 2011-09-05 at 13:25 +, Michael Büsch wrote:

 On Sun, 4 Sep 2011 22:44:08 +0200
 Luka Perkov open...@lukaperkov.net wrote:
  Unhandled kernel unaligned access[#1]:
  Cpu 0
  $ 0   :   0006 0011
  $ 4   : d5bf9da3 80dbb548 0006 c010
  $ 8   : c578  6e617332 6e617332
  $12   :    
  $16   : 6fbb5ff7 80d05618 8028fab0 
  $20   : 8028fa28 80cba248 8028fabc 8028fabe
  $24   :  80d85a50
  $28   : 8028e000 8028f9f0 81043d14 80cb8708
  Hi: 0235
  Lo: 02922c00
  epc   : 80cb8968 nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
  Tainted: P
  ra: 80cb8708 nf_nat_setup_info+0x80/0x6e8 [nf_nat]
  Status: 1100fc03KERNEL EXL IE
  Cause : 00800010
  BadVA : 6fbb600f
  PrId  : 00019641 (MIPS 24Kc)
  Modules linked in: gpio_keys_polled dwc_otg ath_pci ath_hal(P) lantiq_atm 
  drv_dsl_cpe_api lantiq_mei ipt_MASQUERADE iptable_nat nf_nat xt_conntrack 
  xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 
  nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment 
  xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables 
  xt_tcpudp x_tables ppp_async ppp_generic slhc br2684 atm drv_vmmc usbcore 
  drv_tapi crc_ccitt drv_ifxos arc4 aes_generic crypto_algapi



 thanks.
 I am still wondering how enabling preempt could possibly workaround/hide
 an alignment bug. sounds strange to me. Does somebody have an idea?
 
 I didn't look too closely at the function yet, though.
 


This is on the Lantiq danube/xway platform, which has a second mips
processor and a packet accelerator. Both of these units run proprietary
code accessing main memory as they see fit. Also the Atheros binary hal
is in there.

The test being run is a heavy stress test of the netfilter code. It's
spending most of its time in there. I would not be surprised if one of
the other elements in the picture is doing something bad, i.e. not the
netfilter code itself. The preemption option is merely hiding something
else.

Conor


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-05 Thread Conor O'Gorman
On Mon, 2011-09-05 at 13:25 +, Michael Büsch wrote:

 On Sun, 4 Sep 2011 22:44:08 +0200
 Luka Perkov open...@lukaperkov.net wrote:
  Unhandled kernel unaligned access[#1]:
  Cpu 0
  $ 0   :   0006 0011
  $ 4   : d5bf9da3 80dbb548 0006 c010
  $ 8   : c578  6e617332 6e617332
  $12   :    
  $16   : 6fbb5ff7 80d05618 8028fab0 
  $20   : 8028fa28 80cba248 8028fabc 8028fabe
  $24   :  80d85a50
  $28   : 8028e000 8028f9f0 81043d14 80cb8708
  Hi: 0235
  Lo: 02922c00
  epc   : 80cb8968 nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
  Tainted: P
  ra: 80cb8708 nf_nat_setup_info+0x80/0x6e8 [nf_nat]
  Status: 1100fc03KERNEL EXL IE
  Cause : 00800010
  BadVA : 6fbb600f
  PrId  : 00019641 (MIPS 24Kc)
  Modules linked in: gpio_keys_polled dwc_otg ath_pci ath_hal(P) lantiq_atm 
  drv_dsl_cpe_api lantiq_mei ipt_MASQUERADE iptable_nat nf_nat xt_conntrack 
  xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 
  nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment 
  xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables 
  xt_tcpudp x_tables ppp_async ppp_generic slhc br2684 atm drv_vmmc usbcore 
  drv_tapi crc_ccitt drv_ifxos arc4 aes_generic crypto_algapi
  Process swapper (pid: 0, threadinfo=8028e000, task=80291bc0, tls=)
  Stack : 81722280 8019bfa0 801c686c 80f4f800 c0a801c7
  c5780002 6ea9cbd9    a6a90600 d5bf9da3
  6ea9cbd9    a6a90002 c0a801c7
   c5780601 80cb9fd0 80cb8b0c 0001 d5bf9da3
  c0a801c7 8028fae4 80fd8840 80d05618 8028fae8 d8263338 813ca98c 80fd8840
  ...
  Call Trace:
  [80cb8968] nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
  [80d1e158] masquerade_tg+0xc0/0xe8 [ipt_MASQUERADE]
  [80c646a8] ipt_do_table+0x3e0/0x484 [ip_tables]
  [80dee0c0] nf_nat_rule_find+0x28/0x9c [iptable_nat]
  [80dee290] nf_nat_fn+0x120/0x1a0 [iptable_nat]
  [801baa34] nf_iterate+0x8c/0xfc
  [801bab34] nf_hook_slow+0x90/0x17c
  [801c76c8] ip_output+0xd8/0x104
  [8019a224] __netif_receive_skb+0x4d4/0x578
  [80210128] br_handle_frame+0x280/0x2b8
  [80199f9c] __netif_receive_skb+0x24c/0x578
  [8019a370] process_backlog+0xa8/0x188
  [8019a778] net_rx_action+0x8c/0x1b8
  [800215f0] __do_softirq+0xa8/0x154
  [800217f0] do_softirq+0x48/0x68
  [800031c0] plat_irq_dispatch+0xf4/0x164
  [800059ec] ret_from_irq+0x0/0x4
  [80005be0] r4k_wait+0x20/0x40
  [80007690] cpu_idle+0x28/0x4c
  [802a58d0] start_kernel+0x35c/0x378
 
 thanks.
 I am still wondering how enabling preempt could possibly workaround/hide
 an alignment bug. sounds strange to me. Does somebody have an idea?
 
 I didn't look too closely at the function yet, though.
 


What is the exact opcode that is causing the problem and how bad is the
bad address?

(Yes, I could look up those things myself, and I might do it later if I
have time.)

Conor


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-05 Thread Felix Fietkau

On 2011-09-05 3:25 PM, Michael Büsch wrote:

On Sun, 4 Sep 2011 22:44:08 +0200
Luka Perkov open...@lukaperkov.net wrote:

 Unhandled kernel unaligned access[#1]:
 Cpu 0
 $ 0   :   0006 0011
 $ 4   : d5bf9da3 80dbb548 0006 c010
 $ 8   : c578  6e617332 6e617332
 $12   :    
 $16   : 6fbb5ff7 80d05618 8028fab0 
 $20   : 8028fa28 80cba248 8028fabc 8028fabe
 $24   :  80d85a50
 $28   : 8028e000 8028f9f0 81043d14 80cb8708
 Hi: 0235
 Lo: 02922c00
 epc   : 80cb8968 nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
 Tainted: P
 ra: 80cb8708 nf_nat_setup_info+0x80/0x6e8 [nf_nat]
 Status: 1100fc03KERNEL EXL IE
 Cause : 00800010
 BadVA : 6fbb600f
 PrId  : 00019641 (MIPS 24Kc)
 Modules linked in: gpio_keys_polled dwc_otg ath_pci ath_hal(P) lantiq_atm 
drv_dsl_cpe_api lantiq_mei ipt_MASQUERADE iptable_nat nf_nat xt_conntrack 
xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac 
xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async 
ppp_generic slhc br2684 atm drv_vmmc usbcore drv_tapi crc_ccitt drv_ifxos arc4 
aes_generic crypto_algapi
 Process swapper (pid: 0, threadinfo=8028e000, task=80291bc0, tls=)
 Stack : 81722280 8019bfa0 801c686c 80f4f800 c0a801c7   
 c5780002 6ea9cbd9    a6a90600 d5bf9da3 
 6ea9cbd9    a6a90002 c0a801c7  
  c5780601 80cb9fd0 80cb8b0c 0001 d5bf9da3  
 c0a801c7 8028fae4 80fd8840 80d05618 8028fae8 d8263338 813ca98c 80fd8840
 ...
 Call Trace:
 [80cb8968] nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
 [80d1e158] masquerade_tg+0xc0/0xe8 [ipt_MASQUERADE]
 [80c646a8] ipt_do_table+0x3e0/0x484 [ip_tables]
 [80dee0c0] nf_nat_rule_find+0x28/0x9c [iptable_nat]
 [80dee290] nf_nat_fn+0x120/0x1a0 [iptable_nat]
 [801baa34] nf_iterate+0x8c/0xfc
 [801bab34] nf_hook_slow+0x90/0x17c
 [801c76c8] ip_output+0xd8/0x104
 [8019a224] __netif_receive_skb+0x4d4/0x578
 [80210128] br_handle_frame+0x280/0x2b8
 [80199f9c] __netif_receive_skb+0x24c/0x578
 [8019a370] process_backlog+0xa8/0x188
 [8019a778] net_rx_action+0x8c/0x1b8
 [800215f0] __do_softirq+0xa8/0x154
 [800217f0] do_softirq+0x48/0x68
 [800031c0] plat_irq_dispatch+0xf4/0x164
 [800059ec] ret_from_irq+0x0/0x4
 [80005be0] r4k_wait+0x20/0x40
 [80007690] cpu_idle+0x28/0x4c
 [802a58d0] start_kernel+0x35c/0x378


thanks.
I am still wondering how enabling preempt could possibly workaround/hide
an alignment bug. sounds strange to me. Does somebody have an idea?

I didn't look too closely at the function yet, though.
Look at BadVA : 6fbb600f - it's not an alignment bug, the address is 
completely bogus. It just happens to trip on the unhandled unaligned 
access first because of the lowest bits.
This looks like a nasty memory corruption bug, and hiding it with 
CONFIG_PREEMPT probably eventually makes it show up somewhere else instead.


- Felix


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-05 Thread Luka Perkov
On Mon, Sep 05, 2011 at 09:58:58AM +0200, Florian Fainelli wrote:
 On Sunday 04 September 2011 22:44:08 Luka Perkov wrote:
  On Sun, Sep 04, 2011 at 08:47:46PM +0200, Michael Büsch wrote:
  If you want to debug say so and I'll send you vmlinux file. I'm not going
  to debug this further.
 
 Then just post the oops along with the problem description to the 
 netfilter/netdev 
 mailing list and get the upstream networking people to resolve this bug for you. 
 They will certainly ask you to test some patches, which sounds like an 
 acceptable trade to me.

Sadly no one tried to reproduce the issue on their platform (or at least
I did not see it here). Please try to reproduce it; that way, when posting
to the upstream mailing list, they would have more info.

 Please at least agree that enabling preemption is a workaround and not a fix.

It's a workaround.

  Index: Config.in
  ===
  --- Config.in   (revision 28166)
  +++ Config.in   (working copy)
 
 Now this looks better, I am not opposed at all to us exposing such a kernel 
 configuration option to users through OpenWrt's menuconfig.

Great.

Luka


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-04 Thread Philip Prindeville
On 9/2/11 2:09 PM, Michael Büsch wrote:
 On Fri, 2 Sep 2011 00:55:54 +0200
 Luka Perkov open...@lukaperkov.net wrote:
 
 Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
 CONFIG_PREEMPT:

  Select this if you are building a kernel for a desktop or
  embedded system with latency requirements in the milliseconds
  range

 Because of that I made changes to all kernel config files.

 Signed-off-by: Luka Perkov  openwrt --to-- lukaperkov.net 
 
 Uhm, wait a second.
 What are you actually trying to fix with enabling preemption? I didn't
 really get it by reading your mail.
 
 Some random text in a kernel config file is _not_ a reason to make
 a change with a scope like this one.
 Enabling preemption _does_ have negative effects. For one it increases
 the kernel size. And it also increases the runtime overhead (especially on 
 UP).
 
 And finally, I'm not really convinced that any of the routers/APs
 that OpenWRT supports have latency requirements in the milliseconds range.
 I'd rather say throughput matters a _lot_ more than a millisecond of latency
 for these devices.

If you're doing VoIP, then I'd certainly say latency matters.

-Philip



Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-04 Thread Michael Büsch
On Sun, 04 Sep 2011 10:11:08 -0700
Philip Prindeville philipp_s...@redfish-solutions.com wrote:
  And finally, I'm not really convinced that any of the routers/APs
  that OpenWRT supports have latency requirements in the milliseconds range.
  I'd rather say throughput matters a _lot_ more than a millisecond of latency
  for these devices.
 
 If you're doing VoIP, then I'd certainly say latency matters.

No it doesn't. At least not in the MILLISECONDS range.
It does not matter at all, if your voip call has 300 or 302 ms latency.
But it _does_ matter that there's enough throughput bandwidth to get most
of the packages through the pipe.

-- 
Greetings, Michael.


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-04 Thread Michael Büsch
On Sun, 4 Sep 2011 01:06:02 +0200
Luka Perkov open...@lukaperkov.net wrote:
  What are you actually trying to fix with enabling preemption? I didn't
  really get it by reading your mail.
 
 Kernel oops that I described.

Yeah. And that is completely unacceptable.

 CONFIG_PREEMPT must be enabled; don't know what more I can do.

No. You must provide a full OOPS message.
An unaligned access is easy to fix (or at least work around properly)
with proper debugging information.

 --- a/net/ipv4/netfilter/nf_nat_core.c
 +++ b/net/ipv4/netfilter/nf_nat_core.c
 @@ -276,9 +276,9 @@ nf_nat_setup_info(struct nf_conn *ct,
  
   /* nat helper or nfctnetlink also setup binding */
   nat = nfct_nat(ct);
 - if (!nat) {
 + if (unlikely(!nat)) {
   nat = nf_ct_ext_add(ct, NF_CT_EXT_NAT, GFP_ATOMIC);
 - if (nat == NULL) {
 + if (unlikely(nat == NULL)) {
   pr_debug("failed to add NAT extension\n");
   return NF_ACCEPT;
   }
 @@ -313,16 +313,17 @@ nf_nat_setup_info(struct nf_conn *ct,
   }
  
   if (maniptype == IP_NAT_MANIP_SRC) {
 - unsigned int srchash;
 + unsigned int h;
  
 - srchash = hash_by_src(net, nf_ct_zone(ct),
 -   &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
 - spin_lock_bh(&nf_nat_lock);
 - /* nf_conntrack_alter_reply might re-allocate exntension aera */
 + h = hash_by_src(net, nf_ct_zone(ct),
 + &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
 +
 + /* nf_conntrack_alter_reply might re-allocate extension area */
   nat = nfct_nat(ct);
   nat->ct = ct;
 - hlist_add_head_rcu(&nat->bysource,
 -   &net->ipv4.nat_bysource[srchash]);
 +
 + spin_lock_bh(&nf_nat_lock);
 + hlist_add_head_rcu(&nat->bysource, &net->ipv4.nat_bysource[h]);
   spin_unlock_bh(&nf_nat_lock);
   }

This doesn't seem to fix any alignment issues, does it?

-- 
Greetings, Michael.


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-04 Thread Philip Prindeville
On 9/4/11 11:43 AM, Michael Büsch wrote:
 On Sun, 04 Sep 2011 10:11:08 -0700
 Philip Prindeville philipp_s...@redfish-solutions.com wrote:
 And finally, I'm not really convinced that any of the routers/APs
 that OpenWRT supports have latency requirements in the milliseconds range.
 I'd rather say throughput matters a _lot_ more than a millisecond of latency
 for these devices.
 If you're doing VoIP, then I'd certainly say latency matters.
 No it doesn't. At least not in the MILLISECONDS range.
 It does not matter at all, if your voip call has 300 or 302 ms latency.
 But it _does_ matter that there's enough throughput bandwidth to get most
 of the packages through the pipe.


Who the heck has 300ms latency???

pbx*CLI sip show peers
Name/username        Host            Dyn Forcerport ACL Port   Status
ata_1/ata_1          192.168.1.12    D   N              5060   OK (15 ms)
ata_2/ata_2          192.168.1.12    D   N              5061   OK (11 ms)
bedroom_1/bedroom_1  192.168.1.5     D   N              5060   OK (14 ms)
bedroom_2/bedroom_2  192.168.1.5     D   N              5061   OK (13 ms)
bedroom_3/bedroom_3  192.168.1.5     D   N              5062   OK (16 ms)
cell_1/cell_1        184.72.221.84   D   N              45983  OK (211 ms)
cell_2               (Unspecified)   D   N              0      UNKNOWN
guest_1              (Unspecified)   D   N              0      UNKNOWN
guest_2              (Unspecified)   D   N              0      UNKNOWN
guest_3              (Unspecified)   D   N              0      UNKNOWN
guest_4              (Unspecified)   D   N              0      UNKNOWN
kitchen_1/kitchen_1  192.168.1.6     D   N              5060   OK (12 ms)
kitchen_2/kitchen_2  192.168.1.6     D   N              5061   OK (10 ms)
office_1             (Unspecified)   D   N              0      UNKNOWN
office_2             (Unspecified)   D   N              0      UNKNOWN
office_3             (Unspecified)   D   N              0      UNKNOWN
sip_proxy            66.232.80.9                        5060   Unmonitored
sip_proxy-out        66.232.80.9                        5060   OK (46 ms)
softphone            (Unspecified)   D   N              0      UNKNOWN
19 sip peers [Monitored: 9 online, 9 offline Unmonitored: 1 online, 0 offline]
pbx*CLI


My local softswitch is at the other end of a PON link 1.2km away...

The VoIP agent on my iPhone 4 has terrible latency just because I'm on ATT...

If I were using an Android on T-mobile that would be around 100ms...

-Philip




Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-04 Thread Luka Perkov
On Sun, Sep 04, 2011 at 08:47:46PM +0200, Michael Büsch wrote:
 On Sun, 4 Sep 2011 01:06:02 +0200
 Luka Perkov open...@lukaperkov.net wrote:
   What are you actually trying to fix with enabling preemption? I didn't
   really get it by reading your mail.
  
  Kernel oops that I described.
 
 Yeah. And that is completely unacceptable.

See the patch attached.

  CONFIG_PREEMPT must be enabled; don't know what more I can do.
 
 No. You must provide a full OOPS message.
 An unaligned access is easy to fix (or at least work around properly)
 with proper debugging information.

Unhandled kernel unaligned access[#1]:
Cpu 0
$ 0   :   0006 0011
$ 4   : d5bf9da3 80dbb548 0006 c010
$ 8   : c578  6e617332 6e617332
$12   :    
$16   : 6fbb5ff7 80d05618 8028fab0 
$20   : 8028fa28 80cba248 8028fabc 8028fabe
$24   :  80d85a50
$28   : 8028e000 8028f9f0 81043d14 80cb8708
Hi: 0235
Lo: 02922c00
epc   : 80cb8968 nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
Tainted: P
ra: 80cb8708 nf_nat_setup_info+0x80/0x6e8 [nf_nat]
Status: 1100fc03KERNEL EXL IE
Cause : 00800010
BadVA : 6fbb600f
PrId  : 00019641 (MIPS 24Kc)
Modules linked in: gpio_keys_polled dwc_otg ath_pci ath_hal(P) lantiq_atm 
drv_dsl_cpe_api lantiq_mei ipt_MASQUERADE iptable_nat nf_nat xt_conntrack 
xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack 
pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac 
xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async 
ppp_generic slhc br2684 atm drv_vmmc usbcore drv_tapi crc_ccitt drv_ifxos arc4 
aes_generic crypto_algapi
Process swapper (pid: 0, threadinfo=8028e000, task=80291bc0, tls=)
Stack : 81722280 8019bfa0 801c686c 80f4f800 c0a801c7   
c5780002 6ea9cbd9    a6a90600 d5bf9da3 
6ea9cbd9    a6a90002 c0a801c7  
 c5780601 80cb9fd0 80cb8b0c 0001 d5bf9da3  
c0a801c7 8028fae4 80fd8840 80d05618 8028fae8 d8263338 813ca98c 80fd8840
...
Call Trace:
[80cb8968] nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
[80d1e158] masquerade_tg+0xc0/0xe8 [ipt_MASQUERADE]
[80c646a8] ipt_do_table+0x3e0/0x484 [ip_tables]
[80dee0c0] nf_nat_rule_find+0x28/0x9c [iptable_nat]
[80dee290] nf_nat_fn+0x120/0x1a0 [iptable_nat]
[801baa34] nf_iterate+0x8c/0xfc
[801bab34] nf_hook_slow+0x90/0x17c
[801c76c8] ip_output+0xd8/0x104
[8019a224] __netif_receive_skb+0x4d4/0x578
[80210128] br_handle_frame+0x280/0x2b8
[80199f9c] __netif_receive_skb+0x24c/0x578
[8019a370] process_backlog+0xa8/0x188
[8019a778] net_rx_action+0x8c/0x1b8
[800215f0] __do_softirq+0xa8/0x154
[800217f0] do_softirq+0x48/0x68
[800031c0] plat_irq_dispatch+0xf4/0x164
[800059ec] ret_from_irq+0x0/0x4
[80005be0] r4k_wait+0x20/0x40
[80007690] cpu_idle+0x28/0x4c
[802a58d0] start_kernel+0x35c/0x378

If you want to debug say so and I'll send you vmlinux file. I'm not going to
debug this further.
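
A minimal sketch for reading the register dump above, assuming the usual MIPS32 memory map in which unmapped, cached kernel memory (KSEG0) spans 0x80000000 to 0x9fffffff. The helper names are made up for illustration. It checks whether BadVA is word-aligned, whether it falls inside that kernel window, and how far it is from the value in register $16:

```python
# Values copied from the oops above.
BAD_VA = 0x6FBB600F
REG_16 = 0x6FBB5FF7
EPC = 0x80CB8968

# Assumption: standard MIPS32 KSEG0 window (unmapped, cached kernel memory).
KSEG0_START = 0x80000000
KSEG0_END = 0x9FFFFFFF


def is_word_aligned(addr):
    """A 32-bit load/store needs the two lowest address bits to be zero."""
    return addr & 0x3 == 0


def in_kseg0(addr):
    """True if the address falls inside the assumed KSEG0 range."""
    return KSEG0_START <= addr <= KSEG0_END


if __name__ == "__main__":
    for name, addr in (("BadVA", BAD_VA), ("epc", EPC)):
        print(name, hex(addr),
              "aligned:", is_word_aligned(addr),
              "in KSEG0:", in_kseg0(addr))
    print("BadVA - $16 =", hex(BAD_VA - REG_16))
```

Under these assumptions BadVA fails both checks and sits 0x18 bytes past the value in $16, so the dump alone does not say whether the root problem is alignment or a bad pointer.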

  --- a/net/ipv4/netfilter/nf_nat_core.c
  +++ b/net/ipv4/netfilter/nf_nat_core.c
 
 This doesn't seem to fix any alignment issues, does it?

No.

Users can now choose which preemption mode they want. Is this ok?

Luka 

Index: Config.in
===
--- Config.in   (revision 28166)
+++ Config.in   (working copy)
@@ -231,6 +231,33 @@
 		bool "Compile the kernel with SysRq support"
 		default n
 
+	choice
+		prompt "Compile the kernel with selected preemption model"
+		default KERNEL_PREEMPT_NONE
+		help
+		  Select the preemption model you wish to use.
+
+	config KERNEL_PREEMPT_NONE
+		bool "No Forced Preemption (Server)"
+		help
+		  Select this option if you are building a kernel for a server or
+		  scientific/computation system, or if you want to maximize the
+		  raw processing power of the kernel, irrespective of scheduling
+		  latencies.
+
+	config KERNEL_PREEMPT_VOLUNTARY
+		bool "Voluntary Kernel Preemption (Desktop)"
+		help
+		  Select this if you are building a kernel for a desktop system.
+
+	config KERNEL_PREEMPT
+		bool "Preemptible Kernel (Low-Latency Desktop)"
+		help
+		  Select this if you are building a kernel for a desktop or
+		  embedded system with latency requirements in the milliseconds
+		  range.
+	endchoice
+
 	comment "Package build options"
 
 	config DEBUG
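
A rough sketch of what the choice above amounts to in a kernel config: exactly one of the three preemption symbols is set to y and the other two are marked "is not set", in the same line format as the generic config-2.6.* files elsewhere in this thread. The function name and the mapping are illustrative assumptions; this is not how OpenWrt's build system is implemented.

```python
# The three mutually exclusive preemption models from the choice block.
PREEMPT_MODELS = ("PREEMPT_NONE", "PREEMPT_VOLUNTARY", "PREEMPT")


def preempt_config_lines(selected):
    """Return kernel-config style lines with exactly one model enabled.

    Hypothetical helper: it only mirrors the line format seen in the
    config diffs of this thread.
    """
    if selected not in PREEMPT_MODELS:
        raise ValueError("unknown preemption model: %r" % (selected,))
    lines = []
    for model in PREEMPT_MODELS:
        if model == selected:
            lines.append("CONFIG_%s=y" % model)
        else:
            lines.append("# CONFIG_%s is not set" % model)
    return lines


if __name__ == "__main__":
    print("\n".join(preempt_config_lines("PREEMPT")))
```

Because the symbols live in one choice, selecting a model implicitly deselects the other two, which is why the default of KERNEL_PREEMPT_NONE keeps today's behaviour.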


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-03 Thread Luka Perkov
On Fri, Sep 02, 2011 at 11:09:48PM +0200, Michael Büsch wrote:
 On Fri, 2 Sep 2011 00:55:54 +0200 Luka Perkov wrote:
  Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
  CONFIG_PREEMPT:
  
  Select this if you are building a kernel for a desktop or
  embedded system with latency requirements in the milliseconds
  range
  
  Because of that I made changes to all kernel config files.
  
  Signed-off-by: Luka Perkov  openwrt --to-- lukaperkov.net 
 
 Uhm, wait a second.
 What are you actually trying to fix with enabling preemption? I didn't
 really get it by reading your mail.

Kernel oops that I described.

 Some random text in a kernel config file is _not_ a reason to make
 a change with a scope like this one.
 Enabling preemption _does_ have negative effects. For one it increases
 the kernel size. And it also increases the runtime overhead (especially on 
 UP).

That is what Florian pointed out as well. I'd rather have those side
effects than an occasional kernel oops, wouldn't you?

 And finally, I'm not really convinced that any of the routers/APs
 that OpenWRT supports have latency requirements in the milliseconds range.
 I'd rather say throughput matters a _lot_ more than a millisecond of latency
 for these devices.

I guess everybody picked up on the "latency requirements in the
milliseconds range" part. But nobody reported their results from an nmap
scan or said whether they see any kind of kernel oops without the changes
I proposed... Come on, all of you have routers with OpenWrt ;)

Florian pointed out that code should be able to work fine with PREEMPT
enabled or not, so I looked at what can be fixed/improved. I'll do some
more testing and I'll probably send the patch below upstream (inside the
lock we don't need the two lines I removed).

CONFIG_PREEMPT must be enabled; don't know what more I can do.

Luka

--- a/net/ipv4/netfilter/nf_nat_core.c
+++ b/net/ipv4/netfilter/nf_nat_core.c
@@ -276,9 +276,9 @@ nf_nat_setup_info(struct nf_conn *ct,
 
 	/* nat helper or nfctnetlink also setup binding */
 	nat = nfct_nat(ct);
-	if (!nat) {
+	if (unlikely(!nat)) {
 		nat = nf_ct_ext_add(ct, NF_CT_EXT_NAT, GFP_ATOMIC);
-		if (nat == NULL) {
+		if (unlikely(nat == NULL)) {
 			pr_debug("failed to add NAT extension\n");
 			return NF_ACCEPT;
 		}
@@ -313,16 +313,17 @@ nf_nat_setup_info(struct nf_conn *ct,
 	}
 
 	if (maniptype == IP_NAT_MANIP_SRC) {
-		unsigned int srchash;
+		unsigned int h;
 
-		srchash = hash_by_src(net, nf_ct_zone(ct),
-				      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-		spin_lock_bh(&nf_nat_lock);
-		/* nf_conntrack_alter_reply might re-allocate exntension aera */
+		h = hash_by_src(net, nf_ct_zone(ct),
+				&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+
+		/* nf_conntrack_alter_reply might re-allocate extension area */
 		nat = nfct_nat(ct);
 		nat->ct = ct;
-		hlist_add_head_rcu(&nat->bysource,
-				   &net->ipv4.nat_bysource[srchash]);
+
+		spin_lock_bh(&nf_nat_lock);
+		hlist_add_head_rcu(&nat->bysource, &net->ipv4.nat_bysource[h]);
 		spin_unlock_bh(&nf_nat_lock);
 	}
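
The second hunk above narrows the locked region so that only the list insertion happens under the lock. A userspace analogy of that pattern, as a sketch only: it uses Python threads and a made-up bucket table, not kernel spinlocks or the real nf_nat data structures, and it says nothing about whether the kernel change itself is safe.

```python
import threading

NUM_BUCKETS = 16                      # hypothetical table size
buckets = [[] for _ in range(NUM_BUCKETS)]
buckets_lock = threading.Lock()       # protects every list in `buckets`


def bucket_index(key):
    """Pure function of its argument, so it needs no locking."""
    return sum(key.encode("ascii")) % NUM_BUCKETS


def insert(key):
    # Work that touches no shared state is done before taking the lock...
    h = bucket_index(key)
    # ...and the lock is held only while the shared table is modified.
    with buckets_lock:
        buckets[h].insert(0, key)


if __name__ == "__main__":
    workers = [threading.Thread(target=insert, args=("conn-%d" % i,))
               for i in range(100)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(sum(len(b) for b in buckets), "entries inserted")
```

The pattern is only valid when everything moved out of the locked region is private to the caller; any access to shared state still has to stay under the lock.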
 


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread Florian Fainelli
Hello,

On Friday 02 September 2011 00:55:54 Luka Perkov wrote:
 I also had this issue on my sx763 lantiq based board:
 
 https://dev.openwrt.org/ticket/9440
 
 With symbol table I got this oops:
 
 Unhandled kernel unaligned access[#1]:
 ... bla bla bla (to keep it short) ...
 Call Trace:
 [80cb8968] nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
 [80d1e158] masquerade_tg+0xc0/0xe8 [ipt_MASQUERADE]
 [80c646a8] ipt_do_table+0x3e0/0x484 [ip_tables]
 [80dee0c0] nf_nat_rule_find+0x28/0x9c [iptable_nat]
 [80dee290] nf_nat_fn+0x120/0x1a0 [iptable_nat]
 [801baa34] nf_iterate+0x8c/0xfc
 [801bab34] nf_hook_slow+0x90/0x17c
 [801c76c8] ip_output+0xd8/0x104
 [8019a224] __netif_receive_skb+0x4d4/0x578
 [80210128] br_handle_frame+0x280/0x2b8
 [80199f9c] __netif_receive_skb+0x24c/0x578
 [8019a370] process_backlog+0xa8/0x188
 [8019a778] net_rx_action+0x8c/0x1b8
 [800215f0] __do_softirq+0xa8/0x154
 [800217f0] do_softirq+0x48/0x68
 [800031c0] plat_irq_dispatch+0xf4/0x164
 [800059ec] ret_from_irq+0x0/0x4
 [80005be0] r4k_wait+0x20/0x40
 [80007690] cpu_idle+0x28/0x4c
 [802a58d0] start_kernel+0x35c/0x378
 
 It's easy to reproduce with nmap:
 
 # nmap -sT -p 1-1 -T insane -Pn -n some.public.ip.address/24
 
 After some time I discovered that the issue is in these lines:
 
 % sed -n '320,326p' linux-2.6.39.4/net/ipv4/netfilter/nf_nat_core.c
   spin_lock_bh(&nf_nat_lock);
   /* nf_conntrack_alter_reply might re-allocate exntension aera */
   nat = nfct_nat(ct);
   nat->ct = ct;
   hlist_add_head_rcu(&nat->bysource,
                      &net->ipv4.nat_bysource[srchash]);
   spin_unlock_bh(&nf_nat_lock);
 
 Long story short - enable CONFIG_PREEMPT to have functional spin locks:
 
 http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/x109.html
 
 Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
 CONFIG_PREEMPT:
 
   Select this if you are building a kernel for a desktop or
   embedded system with latency requirements in the milliseconds
   range
 
 Because of that I made changes to all kernel config files.
 
 Signed-off-by: Luka Perkov  openwrt --to-- lukaperkov.net 

I am not opposed to enabling CONFIG_PREEMPT on a global basis, but I am afraid 
this might reveal new locking problems that we have not had so far. At the very 
least, trunk should be the place to experiment with this.
-- 
Florian


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread Luka Perkov
Hi,

On Fri, Sep 02, 2011 at 10:46:37AM +0200, Florian Fainelli wrote:
 On Friday 02 September 2011 00:55:54 Luka Perkov wrote:
  Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
  CONFIG_PREEMPT:
  
  Select this if you are building a kernel for a desktop or
  embedded system with latency requirements in the milliseconds
  range

Please look at the kernel config file above. You will see that
CONFIG_PREEMPT should be used on embedded systems...
 
 I am not opposed to enabling CONFIG_PREEMPT on a global basis, but I am afraid
 this might reveal new locking problems that we have not had so far. At the very
 least, trunk should be the place to experiment with this.

I agree. If it causes new problems we can revert.

Luka


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread Geert Uytterhoeven
On Fri, Sep 2, 2011 at 12:39, Luka Perkov open...@lukaperkov.net wrote:

 On Fri, Sep 02, 2011 at 10:46:37AM +0200, Florian Fainelli wrote:
 On Friday 02 September 2011 00:55:54 Luka Perkov wrote:
  Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
  CONFIG_PREEMPT:
 
      Select this if you are building a kernel for a desktop or
      embedded system with latency requirements in the milliseconds
      range

 Please look at the kernel config file above. You will see that
 CONFIG_PREEMPT should be used on embedded systems...

... with latency requirements in the milliseconds range.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
                                -- Linus Torvalds


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread edgar . soldin
On 02.09.2011 12:39, Luka Perkov wrote:
 Hi,
 
 On Fri, Sep 02, 2011 at 10:46:37AM +0200, Florian Fainelli wrote:
 On Friday 02 September 2011 00:55:54 Luka Perkov wrote:
 Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
 CONFIG_PREEMPT:

 Select this if you are building a kernel for a desktop or
 embedded system with latency requirements in the milliseconds
 range
 
 Please look at the kernel config file above. You will see that
 CONFIG_PREEMPT should be used on embedded systems...
  

actually it says 

system with latency requirements in the milliseconds range

;9 .. but that's no reason not to use it if it might fix issues.

ede


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread Florian Fainelli
On Friday 02 September 2011 12:55:08 Geert Uytterhoeven wrote:
 On Fri, Sep 2, 2011 at 12:39, Luka Perkov open...@lukaperkov.net wrote:
  On Fri, Sep 02, 2011 at 10:46:37AM +0200, Florian Fainelli wrote:
  On Friday 02 September 2011 00:55:54 Luka Perkov wrote:
   Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
   CONFIG_PREEMPT:
   
   Select this if you are building a kernel for a desktop or
   embedded system with latency requirements in the milliseconds
   range
  
  Please look at the kernel config file above. You will see that
  CONFIG_PREEMPT should be used on embedded systems...
 
 ... with latency requirements in the milliseconds range.

Indeed, that's the part I am concerned with, along with the memory footprint. 
Any code should be able to work with and without Preemption enabled. Your 
patch remains a workaround for now.
-- 
Florian


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread Luka Perkov
On Fri, Sep 02, 2011 at 01:32:18PM +0200, Florian Fainelli wrote:
 On Friday 02 September 2011 12:55:08 Geert Uytterhoeven wrote:
  On Fri, Sep 2, 2011 at 12:39, Luka Perkov open...@lukaperkov.net wrote:
   On Fri, Sep 02, 2011 at 10:46:37AM +0200, Florian Fainelli wrote:
   On Friday 02 September 2011 00:55:54 Luka Perkov wrote:
Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
CONFIG_PREEMPT:

Select this if you are building a kernel for a desktop or
embedded system with latency requirements in the milliseconds
range
   
   Please look at the kernel config file above. You will see that
   CONFIG_PREEMPT should be used on embedded systems...
  
  ... with latency requirements in the milliseconds range.
 
 Indeed, that's the part I am concerned with, along with the memory footprint. 
 Any code should be able to work with and without Preemption enabled. Your 
 patch remains a workaround for now.

Please try to reproduce the issue with nmap on your devices. Run nmap
on your PC as I described and see what your router does (you are
testing its ability to handle many NAT connections).

Try it with and without my patch and post what happened.

Luka


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread Florian Fainelli
On Friday 02 September 2011 15:10:47 Luka Perkov wrote:
 On Fri, Sep 02, 2011 at 01:32:18PM +0200, Florian Fainelli wrote:
  On Friday 02 September 2011 12:55:08 Geert Uytterhoeven wrote:
   On Fri, Sep 2, 2011 at 12:39, Luka Perkov open...@lukaperkov.net wrote:
On Fri, Sep 02, 2011 at 10:46:37AM +0200, Florian Fainelli wrote:
On Friday 02 September 2011 00:55:54 Luka Perkov wrote:
 Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
 CONFIG_PREEMPT:
 
 Select this if you are building a kernel for a desktop or
 embedded system with latency requirements in the milliseconds
 range

Please look at the kernel config file above. You will see that
CONFIG_PREEMPT should be used on embedded systems...
   
   ... with latency requirements in the milliseconds range.
  
  Indeed, that's the part I am concerned with, along with the memory
  footprint. Any code should be able to work with and without Preemption
  enabled. Your patch remains a workaround for now.
 
 Please try to reproduce the issue with nmap on your devices. Run nmap
 on your PC as I described and see what your router does (you are
 testing its ability to handle many NAT connections).
 

I will try to reproduce the error, but you cannot dispute that code should be 
able to work fine with PREEMPT enabled or not. I have seen crappy drivers that 
only work with preemption enabled too, but this is not an excuse.

 Try it with and without my patch and post what happened.

This has a net impact on the resulting kernel size, here is what I get for 
ar7:

- without preempt: 887 KB vmlinux.lzma
- with preempt: 902 KB vmlinux.lzma

this is quite a big increase.
-- 
Florian
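
For scale, a small worked calculation on the two ar7 image sizes quoted in the message above (887 KB and 902 KB). The function name is illustrative; the only inputs are those two figures.

```python
def size_increase(before_kb, after_kb):
    """Return (absolute increase in KB, relative increase in percent)."""
    delta = after_kb - before_kb
    return delta, 100.0 * delta / before_kb


if __name__ == "__main__":
    delta, pct = size_increase(887, 902)
    print("increase: %d KB (%.2f %%)" % (delta, pct))
```

That works out to 15 KB, or roughly 1.7 % of the compressed image. Whether that counts as big depends on how much flash the target has to spare.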


Re: [OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-02 Thread Otto Solares Cabrera
On Fri, Sep 02, 2011 at 12:39:38PM +0200, Luka Perkov wrote:
 Please look at the kernel config file above. You will see that
 CONFIG_PREEMPT should be used on embedded systems...

Won't CONFIG_PREEMPT add userspace scheduling overhead, which
in turn harms kernelspace workloads such as packet routing,
firewalling, NAT, etc.?

I did some tests in the past with the scheduling priority of the
hostapd daemon (with nice and schedtools) but never spotted a gain in
bandwidth or latency with iperf.
-
 Otto


[OpenWrt-Devel] [PATCH] replace CONFIG_PREEMPT_NONE with CONFIG_PREEMPT

2011-09-01 Thread Luka Perkov
I also had this issue on my sx763 lantiq based board:

https://dev.openwrt.org/ticket/9440

With symbol table I got this oops:

Unhandled kernel unaligned access[#1]:
... bla bla bla (to keep it short) ...
Call Trace:
[80cb8968] nf_nat_setup_info+0x2e0/0x6e8 [nf_nat]
[80d1e158] masquerade_tg+0xc0/0xe8 [ipt_MASQUERADE]
[80c646a8] ipt_do_table+0x3e0/0x484 [ip_tables]
[80dee0c0] nf_nat_rule_find+0x28/0x9c [iptable_nat]
[80dee290] nf_nat_fn+0x120/0x1a0 [iptable_nat]
[801baa34] nf_iterate+0x8c/0xfc
[801bab34] nf_hook_slow+0x90/0x17c
[801c76c8] ip_output+0xd8/0x104
[8019a224] __netif_receive_skb+0x4d4/0x578
[80210128] br_handle_frame+0x280/0x2b8
[80199f9c] __netif_receive_skb+0x24c/0x578
[8019a370] process_backlog+0xa8/0x188
[8019a778] net_rx_action+0x8c/0x1b8
[800215f0] __do_softirq+0xa8/0x154
[800217f0] do_softirq+0x48/0x68
[800031c0] plat_irq_dispatch+0xf4/0x164
[800059ec] ret_from_irq+0x0/0x4
[80005be0] r4k_wait+0x20/0x40
[80007690] cpu_idle+0x28/0x4c
[802a58d0] start_kernel+0x35c/0x378

It's easy to reproduce with nmap:

# nmap -sT -p 1-1 -T insane -Pn -n some.public.ip.address/24

After some time I discovered that the issue is in these lines:

% sed -n '320,326p' linux-2.6.39.4/net/ipv4/netfilter/nf_nat_core.c
	spin_lock_bh(&nf_nat_lock);
	/* nf_conntrack_alter_reply might re-allocate exntension aera */
	nat = nfct_nat(ct);
	nat->ct = ct;
	hlist_add_head_rcu(&nat->bysource,
			   &net->ipv4.nat_bysource[srchash]);
	spin_unlock_bh(&nf_nat_lock);

Long story short - enable CONFIG_PREEMPT to have functional spin locks:

http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/x109.html

Also in linux-2.6.39.4/kernel/Kconfig.preempt you will see for
CONFIG_PREEMPT:

Select this if you are building a kernel for a desktop or
embedded system with latency requirements in the milliseconds
range

Because of that I made changes to all kernel config files.

Signed-off-by: Luka Perkov  openwrt --to-- lukaperkov.net 
---

Index: target/linux/generic/config-2.6.30
===
--- target/linux/generic/config-2.6.30  (revision 28148)
+++ target/linux/generic/config-2.6.30  (working copy)
@@ -1683,8 +1683,8 @@
 # CONFIG_PPP_MPPE is not set
 CONFIG_PPP_MULTILINK=y
 # CONFIG_PPP_SYNC_TTY is not set
-# CONFIG_PREEMPT is not set
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT=y
+# CONFIG_PREEMPT_NONE is not set
 # CONFIG_PREEMPT_RCU is not set
 # CONFIG_PREEMPT_RCU_TRACE is not set
 # CONFIG_PREEMPT_VOLUNTARY is not set
Index: target/linux/generic/config-2.6.31
===
--- target/linux/generic/config-2.6.31  (revision 28148)
+++ target/linux/generic/config-2.6.31  (working copy)
@@ -1684,8 +1684,8 @@
 CONFIG_PPP_MULTILINK=y
 # CONFIG_PPP_SYNC_TTY is not set
 # CONFIG_PPS is not set
-# CONFIG_PREEMPT is not set
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT=y
+# CONFIG_PREEMPT_NONE is not set
 # CONFIG_PREEMPT_RCU is not set
 # CONFIG_PREEMPT_RCU_TRACE is not set
 # CONFIG_PREEMPT_VOLUNTARY is not set
Index: target/linux/generic/config-2.6.32
===
--- target/linux/generic/config-2.6.32  (revision 28148)
+++ target/linux/generic/config-2.6.32  (working copy)
@@ -1776,8 +1776,8 @@
 CONFIG_PPP_MULTILINK=y
 # CONFIG_PPP_SYNC_TTY is not set
 # CONFIG_PPS is not set
-# CONFIG_PREEMPT is not set
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT=y
+# CONFIG_PREEMPT_NONE is not set
 # CONFIG_PREEMPT_RCU is not set
 # CONFIG_PREEMPT_RCU_TRACE is not set
 # CONFIG_PREEMPT_VOLUNTARY is not set
Index: target/linux/generic/config-2.6.36
===
--- target/linux/generic/config-2.6.36  (revision 28148)
+++ target/linux/generic/config-2.6.36  (working copy)
@@ -1824,8 +1824,8 @@
 CONFIG_PPP_MULTILINK=y
 # CONFIG_PPP_SYNC_TTY is not set
 # CONFIG_PPS is not set
-# CONFIG_PREEMPT is not set
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT=y
+# CONFIG_PREEMPT_NONE is not set
 # CONFIG_PREEMPT_VOLUNTARY is not set
 CONFIG_PREVENT_FIRMWARE_BUILD=y
 CONFIG_PRINTK=y
Index: target/linux/generic/config-2.6.37
===
--- target/linux/generic/config-2.6.37  (revision 28148)
+++ target/linux/generic/config-2.6.37  (working copy)
@@ -1859,8 +1859,8 @@
 # CONFIG_PPP_SYNC_TTY is not set
 # CONFIG_PPS is not set
 # CONFIG_PPTP is not set
-# CONFIG_PREEMPT is not set
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT=y
+# CONFIG_PREEMPT_NONE is not set
 # CONFIG_PREEMPT_VOLUNTARY is not set
 CONFIG_PREVENT_FIRMWARE_BUILD=y
 CONFIG_PRINTK=y
Index: target/linux/generic/config-2.6.38
===
--- target/linux/generic/config-2.6.38  (revision 28148)
+++ target/linux/generic/config-2.6.38  (working copy)
@@ -1891,8