Re: [klibc] [patch] import socket defines
Mike Frysinger wrote: all this stuff is ABI constants, and the only reason glibc doesn't use them is that glibc prefers to use enums over #defines. a proper libc defines things in their headers according to the POSIX specs rather than relying on others to do it for them. if you want to argue about linux-specific ABI pieces being exported, then you probably have a valid point, but socket.h is hardly that. Have you looked at it?!!? It's full of ABI constants, and that's what I care about. POSIX doesn't define, say, AF_UNIX; that's an ABI specific. so if the only consumer is klibc and you're against adding these things to it, special case it for __KLIBC__. No, let's split the header so that there are *no* libc knowledge in the kernel. For the kernel to have knowledge about the specifics of any particular libc (klibc, glibc, or any other) is stupid, and that's the whole reason we're in this spot to begin with. Again, I don't particularly care about what they're named, but the whole point is #include linux/foo.h if you want the subset and #include linux/bar.h if you want the whole set. No libc specifics, and no feature test macros, which I think we can both agree are uglier than hell. I thought the naming worked out nicer with linux/sockaddr.h, but I *don't really care*. -hpa -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> > It seems this optimization could've a side effect: if during such a loop updates are done, and r is seen !NULL during while() check, but NULL after rcu_dereference(), the listing/counting could stop too soon. So, IMHO, probably the first version of this patch is more reliable. (Or alternatively additional check is needed before return.)
>
> No, while the value of r->u.dst.rt_next can change between two readings, the value of r cannot.

...Then, of course, it's O.K.! It looks like I'm really too lazy and/or these selfdocumenting features of RCU are a bit overrated: one can never be sure which pointer is really RCU protected without checking a few places?! So, after looking at this rt_cache_get_next() and this patch only, it looks like the third candidate after seq->private and rtable...

Thanks for explanation and sorry for disturbing!

Jarek P.
HTB classify perfomance
Hello all.

I have spent N days trying to tune a system for best performance and see a strange thing. I have N htb classes and the root is HTB, with the parameter "default 7" (anything not classified goes to 1:7). The filters classify only matched IPs; everything else goes to the HTB default rule.

Running oprofile:

First PC (htb and iptables compiled into the kernel):

CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 10

    samples  %        app name   symbol name
    743501   47.6081  vmlinux    htb_classify
    208718   13.3647  vmlinux    ipt_do_table
    94473     6.0493  vmlinux    u32_classify
    43088     2.7590  vmlinux    e1000_intr
    35086     2.2466  vmlinux    e1000_clean_tx_irq
    34925     2.2363  vmlinux    ip_route_input
    33972     2.1753  vmlinux    e1000_irq_enable
    33788     2.1635  vmlinux    htb_dequeue
    29197     1.8696  vmlinux    e1000_clean_rx_irq
    20177     1.2920  vmlinux    sfq_dequeue
    17825     1.1414  vmlinux    sfq_enqueue
    15135     0.9691  vmlinux    e1000_xmit_frame
    15123     0.9684  vmlinux    eth_type_trans
    13081     0.8376  vmlinux    kfree
    12153     0.7782  vmlinux    dev_queue_xmit

Second PC (htb and iptables are modules):

CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 10

    samples  %        app name   symbol name
    102108   30.7351  sch_htb    (no symbols)
    21559     6.4894  vmlinux    e1000_intr
    17428     5.2459  cls_u32    (no symbols)
    13887     4.1801  ip_tables  (no symbols)
    11984     3.6072  sch_sfq    (no symbols)
    11785     3.5473  vmlinux    e1000_irq_enable
    9684      2.9149  vmlinux    mwait_idle_with_hints
    9227      2.7774  vmlinux    e1000_clean_rx_irq
    8686      2.6145  vmlinux    e1000_clean_tx_irq
    6747      2.0309  vmlinux    ip_route_input
    6533      1.9665  vmlinux    irq_entries_start
    6419      1.9322  vmlinux    e1000_xmit_frame
    5605      1.6871  vmlinux    dev_queue_xmit
    4030      1.2131  vmlinux    __kfree_skb
    3997      1.2031  vmlinux    __qdisc_run
    3931      1.1833  vmlinux    e1000_clean
    3565      1.0731  vmlinux    net_rx_action
    3518      1.0589  vmlinux    ip_rcv
    3377      1.0165  vmlinux    getnstimeofday
    3215      0.9677  vmlinux    rb_erase
    2973      0.8949  vmlinux    eth_type_trans
    2707      0.8148  vmlinux    ip_output
    2586      0.7784  vmlinux    handle_fasteoi_irq

Hmm.. strange... Looking at the code of htb_classify, I see only one place where it could use that much CPU. OK... I tried adding this to the end of the tc batch file:

    filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7
    filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7

(Off-topic... strange... I did not find a way to add a filter without any match.)

Wow!

CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 10

    samples  %        app name   symbol name
    153128   20.9497  vmlinux    ipt_unregister_table
    121569   16.6321  vmlinux    e1000_request_irq
    60727     8.3082  vmlinux    e1000_update_itr
    47241     6.4631  vmlinux    u32_delete
    25836     3.5347  vmlinux    htb_dequeue
    18304     2.5042  vmlinux    ipt_do_table
    15980     2.1862  vmlinux    mwait_idle_with_hints
    15977     2.1858  vmlinux    irq_entries_start
    13337     1.8247  vmlinux    htb_classify
    12512     1.7118  vmlinux    __ip_route_output_key
    8821      1.2068  vmlinux    sfq_init
    8495      1.1622  vmlinux    e1000_clean_rx_irq
    8408      1.1503  vmlinux    htb_enqueue
    8018      1.0970  vmlinux    e1000_xmit_frame
    7867      1.0763  vmlinux    e1000_clean_tx_ring
    6336      0.8668  vmlinux    htb_delete
    5828      0.7973  vmlinux    ___pskb_trim
    5781      0.7909  vmlinux    s_start
    5234      0.7161  vmlinux    e1000_clean_rx_irq_ps
    4504      0.6162  vmlinux    cache_alloc_refill
    4133      0.5654  vmlinux    radix_tree_delete

Second PC

CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
Counted
Re: HTB classify perfomance
New info. I waited some time and reset the oprofile statistics (I think the ipt_unregister_table entries came from some script that was running...). Here is the clean data after adding the FILTER:

First PC

CPU: P4 / Xeon, speed 3409.96 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 10

    samples  %        app name   symbol name
    1158171  19.1292  vmlinux    ipt_do_table
    722416   11.9319  vmlinux    e1000_intr
    627406   10.3627  vmlinux    u32_classify
    565286    9.3367  vmlinux    e1000_irq_enable
    269309    4.4481  vmlinux    htb_dequeue
    191016    3.1550  vmlinux    ip_route_input
    187127    3.0907  vmlinux    sfq_dequeue
    172775    2.8537  vmlinux    e1000_clean_tx_irq
    154654    2.5544  vmlinux    e1000_clean_rx_irq
    146926    2.4267  vmlinux    sfq_enqueue
    116782    1.9289  vmlinux    htb_add_to_wait_tree
    79398     1.3114  vmlinux    rb_erase
    74411     1.2290  vmlinux    e1000_xmit_frame
    65451     1.0810  vmlinux    kfree
    59966     0.9904  vmlinux    irq_entries_start
    59893     0.9892  vmlinux    eth_type_trans
    55510     0.9168  vmlinux    dev_queue_xmit
    52688     0.8702  vmlinux    e1000_alloc_rx_buffers
Re: [klibc] [patch] import socket defines
On Friday 11 January 2008, H. Peter Anvin wrote: Mike Frysinger wrote: all this stuff is ABI constants, and the only reason glibc doesn't use them is that glibc prefers to use enums over #defines. a proper libc defines things in their headers according to the POSIX specs rather than relying on others to do it for them. if you want to argue about linux-specific ABI pieces being exported, then you probably have a valid point, but socket.h is hardly that. Have you looked at it?!!? It's full of ABI constants, and that's what I care about. POSIX doesn't define, say, AF_UNIX; that's an ABI specific. i guess it depends on how you define define :P. no, POSIX does not state the specific numerical value (ABI) for the define (API), but POSIX does require sys/socket.h provide the macro AF_UNIX. so if the only consumer is klibc and you're against adding these things to it, special case it for __KLIBC__. No, let's split the header so that there are *no* libc knowledge in the kernel. For the kernel to have knowledge about the specifics of any particular libc (klibc, glibc, or any other) is stupid, and that's the whole reason we're in this spot to begin with. we're in this spot at the moment to appease klibc only. is there any other libc out there that is not providing its own complete sys/socket.h but instead relying on linux/socket.h ? glibc/uClibc rely on linux/socket.h only for the kernel's definition of sockaddr. Again, I don't particularly care about what they're named, but the whole point is #include linux/foo.h if you want the subset and #include linux/bar.h if you want the whole set. i looked more at glibc/uClibc and my primary/original concern (and what i thought what David was raising and you confirming) was that building of glibc was broken and glibc headers would need updates. that does not seem to be the case. the breakage here is for packages that include both sys/socket.h (directly/indirectly) and linux/socket.h (directly/indirectly). 
due to the way the network headers depend on each other, this case is trivial to induce. but i dont think linux/socket.h is any more special than the current retarded conflicts we have between the network headers from the libc (which are required by POSIX and beyond) and the kernel headers. No libc specifics, and no feature test macros, which I think we can both agree are uglier than hell. i think in general, all of the network related headers under linux/ are fubared for userspace. I thought the naming worked out nicer with linux/sockaddr.h placing the sockaddr definitions into linux/sockaddr.h makes sense. -mike
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
> > On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> > > It seems this optimization could've a side effect: if during such a loop updates are done, and r is seen !NULL during while() check, but NULL after rcu_dereference(), the listing/counting could stop too soon. So, IMHO, probably the first version of this patch is more reliable. (Or alternatively additional check is needed before return.)
> >
> > No, while the value of r->u.dst.rt_next can change between two readings, the value of r cannot.
>
> ...Then, of course, it's O.K.! It looks like I'm really too lazy and/or these selfdocumenting features of RCU are a bit overrated: one can never be sure which pointer is really RCU protected without checking a few places?! So, after looking at this rt_cache_get_next() and this patch only, it looks like the third candidate after seq->private and rtable...

OOPS! ...it seems we are talking about the same, properly documented (second) pointer yet... So, IOW: strictly speaking you are right, r can't change here, but I meant r vs. the returned value! Before the patch the returned value couldn't be NULL unless all elements of the list were looped. After this patch it seems possible...

Jarek P.
Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)
On Thu, Jan 10, 2008 at 03:32:28PM -0800, Kok, Auke wrote:
> - cleaned up largely against sparse, checkpatch

Largely means not completely, right? Please make sure there are no sparse warnings left at least. checkpatch is not that critical, but it would be good to have an explanation for everything left.

Some comments on the patch:

 - please remove that silly copyright header on the Makefile, it's hard to claim any rights on a trivial 3 line makefile.
 - also please use igb-y instead of igb-objs in the Makefile
 - the driver would be a lot more readable (and more importantly hackable) if it was written in a natural flow instead of having dozens of lines of forward declarations in every file.
 - so you're adding your own phy abstraction. Is there a good reason you can't simply use the generic phylib?
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
...
> So, IOW: strictly speaking you are right, r can't change here, but I meant r vs. the returned value! Before the patch the returned value couldn't be NULL unless all elements of the list were looped. After

...even more strictly: couldn't be NULL unless all buckets of the hash table were looped. After

> this patch it seems possible...

Jarek P.
Re: [klibc] [patch] import socket defines
On Friday 11 January 2008, Mike Frysinger wrote: On Friday 11 January 2008, H. Peter Anvin wrote: Again, I don't particularly care about what they're named, but the whole point is #include linux/foo.h if you want the subset and #include linux/bar.h if you want the whole set. i looked more at glibc/uClibc and my primary/original concern (and what i thought what David was raising and you confirming) was that building of glibc was broken and glibc headers would need updates. that does not seem to be the case. the breakage here is for packages that include both sys/socket.h (directly/indirectly) and linux/socket.h (directly/indirectly). due to the way the network headers depend on each other, this case is trivial to induce. but i dont think linux/socket.h is any more special than the current retarded conflicts we have between the network headers from the libc (which are required by POSIX and beyond) and the kernel headers. No libc specifics, and no feature test macros, which I think we can both agree are uglier than hell. i think in general, all of the network related headers under linux/ are fubared for userspace. I thought the naming worked out nicer with linux/sockaddr.h placing the sockaddr definitions into linux/sockaddr.h makes sense. so there's no confusion, since the building of the libc itself and using pure libc headers are generally unaffected, and all of the network linux headers are already screwed for userspace usage, i'm not against the proposed change from Peter. it doesnt really make the situation any better/worse. -mike
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> It looks like I'm really too lazy and/or these selfdocumenting features of RCU are a bit overrated: one can never be sure which pointer is really RCU protected without checking a few places?! So, after looking at this rt_cache_get_next() and this patch only, it looks like the third candidate after seq->private and rtable...

Perhaps we could introduce a sparse attribute for it?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 09:38:52PM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
> > So, IOW: strictly speaking you are right, r can't change here, but I meant r vs. the returned value! Before the patch the returned value couldn't be NULL unless all elements of the list were looped. After this patch it seems possible...
>
> Since rcu_dereference(r) is always the same as r this patch cannot change the value returned.

Right!!! (But, you mean: always the same as r for local r, I hope...) So, my moronness's selfdocumenting features are not overrated at all!

Thanks again,
Jarek P.
rp_filter and ip rule break ipsec policy
Hello everybody.

AFAIK IPsec policies aren't related to the routing tables: if there is an IPsec policy to deliver traffic, for example, from 192.168.0.0/16 to 10.0.0.0/8, xfrm will eat the packets, ignoring the routing table.

Here is the IPsec gateway schema:

 - cisco ISP router, default gateway for the linux box, ip=cisco-genova
 - eth0 (towards the cisco router), ip=osw-genova
 - eth1, dmz-genova/28, ip=osw-genova
 - eth2, 172.23.0.0/23, ip=172.23.1.8

Take a look:

    # ip ru sh
    0:      from all lookup local
    601:    from all to x.y.z.214 iif eth2 lookup test
    32766:  from all lookup main
    32767:  from all lookup default

    # ip r sh table test
    default via 172.23.1.254 dev eth2 metric 1

When I insert rule #601, packets to x.y.z.214 aren't eaten by xfrm anymore. This happens when rp_filter is set to 1 on eth0. Disabling rp_filter on eth0 resolves the problem: xfrm eats the packets.

Is this the expected behaviour? Why should rp_filter break the IPsec policy when rule #601 is inserted?

I have enabled log_martians on eth0, and when rp_filter is set to 1 I see these messages:

    martian source 172.23.1.4 from x.y.z.214, on dev eth0
    ll header: 00:30:05:cb:27:c1:00:1b:54:fb:fd:78:08:00
    martian source 172.23.1.4 from x.y.z.214, on dev eth0
    ll header: 00:30:05:cb:27:c1:00:1b:54:fb:fd:78:08:00

    # ip x p
    src x.y.z.214 dst 172.23.0.0/23
            dir in priority 2376 ptype main
            tmpl src osw-napoli dst osw-genova
                    proto comp reqid 16390 mode tunnel
                    level use
            tmpl src 0.0.0.0 dst 0.0.0.0
                    proto esp reqid 16389 mode transport
    src 172.23.0.0/23 dst x.y.z.214
            dir out priority 2376 ptype main
            tmpl src osw-genova dst osw-napoli
                    proto comp reqid 16390 mode tunnel
            tmpl src 0.0.0.0 dst 0.0.0.0
                    proto esp reqid 16389 mode transport
    src x.y.z.214 dst 172.23.0.0/23
            dir fwd priority 2376 ptype main
            tmpl src osw-napoli dst osw-genova
                    proto comp reqid 16390 mode tunnel
                    level use
            tmpl src 0.0.0.0 dst 0.0.0.0
                    proto esp reqid 16389 mode transport

Here are the other routing tables:

    # ip r sh table main
    cisco-genova dev eth0 scope link
    dmz-genova/28 dev eth1 proto kernel scope link src osw-genova
    172.23.0.0/23 dev eth2 proto kernel scope link src 172.23.1.8
    127.0.0.0/8 dev lo scope link
    default via cisco-genova dev eth0 metric 1

    # ip r sh table local
    broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
    broadcast dmz-genova dev eth0 proto kernel scope link src osw-genova
    broadcast dmz-genova dev eth1 proto kernel scope link src osw-genova
    broadcast broadcast-genova dev eth0 proto kernel scope link src osw-genova
    broadcast broadcast-genova dev eth1 proto kernel scope link src osw-genova
    local osw-genova dev eth0 proto kernel scope host src osw-genova
    local osw-genova dev eth1 proto kernel scope host src osw-genova
    broadcast 172.23.0.0 dev eth2 proto kernel scope link src 172.23.1.8
    broadcast 172.23.1.255 dev eth2 proto kernel scope link src 172.23.1.8
    local 172.23.1.8 dev eth2 proto kernel scope host src 172.23.1.8
    broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
    local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
    local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
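One way to reason about the martian log in this report is a toy model of a strict reverse-path check. The sketch below is not kernel code. The rule structure, the interface numbering, and in particular the assumption that the reverse lookup is evaluated with the interface the packet would be forwarded out of (eth2 here) as its iif are hypothetical simplifications that the post does not confirm. Under those assumptions, rule 601 steers the reverse lookup to table test, the reverse path points at eth2 instead of eth0, and the packet is treated as martian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ANY_IF = 0, ETH0 = 1, ETH2 = 3 };   /* made-up interface ids */

/* Hypothetical, heavily simplified policy rule. Only lookups towards
 * x.y.z.214 are modelled, so each rule reduces to "optional iif match,
 * then leave via out_if". */
struct toy_rule {
    int prio;
    int match_iif;   /* ANY_IF matches every interface */
    int out_if;      /* device the selected table routes to */
};

/* Walk the rules in order and return the output device of the first match. */
int toy_lookup(const struct toy_rule *rules, size_t n, int iif)
{
    for (size_t i = 0; i < n; i++)
        if (rules[i].match_iif == ANY_IF || rules[i].match_iif == iif)
            return rules[i].out_if;
    return -1;
}

/* Strict reverse-path check: route back towards the packet's source and
 * require the result to be the device the packet arrived on.
 * ASSUMPTION: the reverse lookup uses fwd_out_if as its iif. */
bool toy_rp_filter_ok(const struct toy_rule *rules, size_t n,
                      int arrived_on, int fwd_out_if)
{
    return toy_lookup(rules, n, fwd_out_if) == arrived_on;
}
```

With a rule list of {601: iif eth2 -> eth2, 32766: any -> eth0} the check fails for a packet that arrived on eth0 and is forwarded to eth2; with only the main rule it passes, which matches the behaviour described in the report.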
Re: e1000 performance issue in 4 simultaneous links
David Miller [EMAIL PROTECTED] writes:
> No IRQ balancing should be done at all for networking device interrupts, with zero exceptions. It destroys performance.

Does irqbalanced need to be taught about this? And how about the initial balancing, so that each network card gets assigned to one CPU?

/Benny
Improving performance of bonding driver (eql) using round robin algorithm
Hi All,

The existing algorithm in the eql bonding driver selects a slave based on each slave's priority, which is assigned from the speed of the particular line. The current problem is that not all slaves get a chance to be picked as the best slave for transmission. Would a round-robin algorithm for selecting the slave that transmits the data improve performance?

Thanks
Jeba
Re: virtio_net and SMP guests
On Friday 11 January 2008 02:51:58 Christian Borntraeger wrote:
> What about the following patch:

Looks correct and in fact pretty orthodox. I've folded this in, thanks!

Rusty.
Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking
On Thu, Jan 10, 2008 at 09:22:42PM -0800, David Miller wrote:
> From: Jarek Poplawski [EMAIL PROTECTED]
> Date: Sun, 30 Dec 2007 15:13:23 +0100
>
> > On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
...
> I've removed the warning and made the branch back to 'again' unconditional as I think this is the safest version of the change.
>
> I'll push this upstream, thanks for fixing this Jarek.

Thanks for checking this and making safer!

Regards,
Jarek P.
Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
On Wed, 2008-01-09 at 17:35 +0800, Zhang, Yanmin wrote:
> The regression is:
> 1) stoakley with 2 quad-core processors: 11%;
> 2) Tulsa with 4 dual-core(+hyperThread) processors: 13%;

I have a new update on this issue and have also cc'ed the netdev mailing list. Thanks to David Miller for pointing me to it.

The test command is:

    #sudo taskset -c 7 ./netserver
    #sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1

As a matter of fact, 2.6.23 has about a 6% regression and the 2.6.24-rc regression is between 11% and 16%.

I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1, but the bisected kernel wasn't stable and went crazy.

I tried both CONFIG_SLUB=y and CONFIG_SLAB=y to make sure SLUB isn't the culprit. Below is the oprofile data with CONFIG_SLAB=y. The top CPU utilizations are:

1) 2.6.22

    2067379   9.4888  vmlinux  schedule
    1873604   8.5994  vmlinux  mwait_idle
    1568131   7.1974  vmlinux  resched_task
    1066976   4.8972  vmlinux  tcp_v4_rcv
    986641    4.5285  vmlinux  tcp_rcv_established
    979518    4.4958  vmlinux  find_busiest_group
    767069    3.5207  vmlinux  sock_def_readable
    736808    3.3818  vmlinux  tcp_sendmsg
    595889    2.7350  vmlinux  task_rq_lock
    557193    2.5574  vmlinux  tcp_ack
    470570    2.1598  vmlinux  __mod_timer
    392220    1.8002  vmlinux  __alloc_skb
    358106    1.6436  vmlinux  skb_release_data
    313372    1.4383  vmlinux  skb_clone

2) 2.6.24-rc7

    2668426  12.4497  vmlinux  vmlinux  schedule
    955698    4.4589  vmlinux  vmlinux  skb_release_data
    836311    3.9018  vmlinux  vmlinux  tcp_v4_rcv
    762398    3.5570  vmlinux  vmlinux  skb_release_all
    728907    3.4007  vmlinux  vmlinux  task_rq_lock
    705037    3.2894  vmlinux  vmlinux  __wake_up
    694206    3.2388  vmlinux  vmlinux  __mod_timer
    617616    2.8815  vmlinux  vmlinux  mwait_idle

It looks like tcp in 2.6.22 sends more packets, but frees far fewer skbs than 2.6.24-rc6. tcp_rcv_established in 2.6.22 is highlighted on cpu utilization.

I instrumented the kernel to capture the function call counts.

1) 2.6.22

    skb_release_data: 50148649
    tcp_ack:          25062858
    tcp_transmit_skb: 25063150
    tcp_v4_rcv:       25063279

2) 2.6.24-rc6

    skb_release_data: 21429692
    tcp_ack:          10707710
    tcp_transmit_skb: 10707866
    tcp_v4_rcv:       10707959

The data doesn't show that 2.6.22 sends more packets while freeing far fewer skbs than 2.6.24-rc6. It shows that the skb_release_data count of kernel 2.6.22 is more than double that of 2.6.24-rc6, yet the netperf result showed only about a 10% regression.

As the packet has only 1 byte, I suspect 2.6.24-rc6 tries to merge packets after waiting for some latency. 2.6.22 might not have that wait latency, or the latency is very small, so 2.6.22 sends the packets almost immediately. I will check the source code later.

-yanmin
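The instrumented call counts in this message can be reduced to two ratios, which may make the comparison easier to follow. The sketch below only does arithmetic on the numbers quoted in the post; the structure and function names are made up for illustration, and the post does not say whether both counts were collected over the same interval, so the cross-kernel ratio should be read with that caveat.

```c
#include <assert.h>
#include <stdio.h>

/* Call counts copied from the message; field names follow the kernel
 * functions that were instrumented. */
struct call_counts {
    const char *kernel;
    double skb_release_data;
    double tcp_ack;
    double tcp_transmit_skb;
    double tcp_v4_rcv;
};

const struct call_counts k2622 = { "2.6.22",     50148649, 25062858, 25063150, 25063279 };
const struct call_counts k2624 = { "2.6.24-rc6", 21429692, 10707710, 10707866, 10707959 };

/* skb_release_data() calls per tcp_transmit_skb() call within one kernel. */
double releases_per_transmit(const struct call_counts *c)
{
    return c->skb_release_data / c->tcp_transmit_skb;
}

/* How many times more often 2.6.22 called tcp_transmit_skb(). */
double transmit_ratio(void)
{
    return k2622.tcp_transmit_skb / k2624.tcp_transmit_skb;
}

/* Print the per-kernel and cross-kernel ratios. */
void print_summary(void)
{
    printf("%s: %.4f releases per transmit\n", k2622.kernel, releases_per_transmit(&k2622));
    printf("%s: %.4f releases per transmit\n", k2624.kernel, releases_per_transmit(&k2624));
    printf("transmit calls, 2.6.22 / 2.6.24-rc6: %.4f\n", transmit_ratio());
}
```

Within each kernel the number of skb_release_data() calls is roughly twice the number of tcp_transmit_skb() calls, so the per-packet behaviour looks the same; what differs is the total number of calls, which is about 2.3 times higher on 2.6.22.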
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
> So, IOW: strictly speaking you are right, r can't change here, but I meant r vs. the returned value! Before the patch the returned value couldn't be NULL unless all elements of the list were looped. After this patch it seems possible...

Since rcu_dereference(r) is always the same as r this patch cannot change the value returned.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
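The point about local copies can be illustrated with a small userspace model. This is a sketch only, loosely shaped like the bucket walk discussed in this thread: the node type, load_once() and toy_get_next() are hypothetical names, the volatile read merely stands in for a single load of a shared pointer (real RCU code would use rcu_dereference() under the appropriate read lock), and no concurrency is exercised.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int id;
    struct node *next;
};

/* Stand-in for one read of a shared pointer. The volatile access only
 * forces a single load; it is not a real RCU primitive. */
struct node *load_once(struct node **slot)
{
    return *(struct node * volatile *)slot;
}

/* Return the entry after r, moving to lower-numbered buckets when a
 * chain ends. NULL is returned only after bucket 0 has been tried. */
struct node *toy_get_next(struct node **buckets, int *bucket, struct node *r)
{
    r = load_once(&r->next);
    while (!r) {
        if (--*bucket < 0)
            break;
        r = load_once(&buckets[*bucket]);
    }
    /* r is a local copy here: reading it again, with or without a
     * dereference wrapper, yields the same value, so the loop's exit
     * condition and the returned value cannot disagree. */
    return r;
}
```

Only the shared locations (r->next and the bucket heads) can change between two reads; once the value sits in the local r, whatever is returned is exactly what the while() condition saw.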
Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
Vince Fuller [EMAIL PROTECTED] writes:
> from Vince Fuller [EMAIL PROTECTED]
>
> This set of diffs modify the 2.6.20 kernel to enable use of the 240/4 (aka class-E) address space as consistent with the Internet Draft draft-fuller-240space-00.txt.

Wouldn't it be wise to at least wait for it becoming an RFC first?

-Andi
handing cloned frames to netif_rx()?
In 802.11n, there is a case where multiple data frames are received aggregated into a single frame (A-MSDU). Currently, we copy each of these frames out into their own skb, but because of the alignment with that etc. I started to think that we could simply pass up a clone of the original skb with start/length adjusted properly so that it windows only the contained packet. The buffer would be shared but the data within the original window (starting with the 802.3 header) could even be written to, it won't be needed again by mac80211 once it's handed off to netif_rx().

The skb will obviously have lots of head- and tailroom but that space would be part of other packets. Is it ok to do this? Will something freak out if we pass a cloned skb to netif_rx()?

johannes
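The windowing idea in this message can be sketched in userspace without any skb machinery. Everything below is hypothetical: shared_buf, window and split_aggregate() are illustration-only names, the reference count merely imitates what cloning would track, and the subframe layout (a 14-byte DA/SA/length header, payload, padding to a 4-byte boundary) is an assumption about A-MSDU framing that the post itself does not spell out. The sketch says nothing about whether netif_rx() accepts clones.

```c
#include <assert.h>
#include <stddef.h>

#define SUBFRAME_HDR_LEN 14   /* DA (6) + SA (6) + length (2), assumed layout */

struct shared_buf {
    const unsigned char *data;
    size_t len;
    int refs;                 /* one per holder, imitating a clone count */
};

struct window {
    struct shared_buf *buf;
    size_t off;               /* start of the subframe header */
    size_t len;               /* header plus payload, without padding */
};

/* Split an aggregate into windows without copying any payload.
 * Returns the number of windows written and stops at the first
 * malformed subframe. */
size_t split_aggregate(struct shared_buf *buf, struct window *out, size_t max)
{
    size_t off = 0, n = 0;

    while (n < max && buf->len - off >= SUBFRAME_HDR_LEN) {
        /* 16-bit big-endian payload length at the end of the header */
        size_t plen = ((size_t)buf->data[off + 12] << 8) | buf->data[off + 13];
        size_t flen = SUBFRAME_HDR_LEN + plen;

        if (flen > buf->len - off)
            break;
        out[n].buf = buf;
        out[n].off = off;
        out[n].len = flen;
        buf->refs++;
        n++;
        off += (flen + 3) & ~(size_t)3;   /* skip padding to a 4-byte boundary */
        if (off > buf->len)
            break;
    }
    return n;
}
```

Each window shares the underlying buffer, so the head- and tailroom around it belongs to neighbouring subframes, which is exactly the situation the question is about.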
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Fri, Jan 11, 2008 at 09:37:42PM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> > It looks like I'm really too lazy and/or these selfdocumenting features of RCU are a bit overrated: one can never be sure which pointer is really RCU protected without checking a few places?! So, after looking at this rt_cache_get_next() and this patch only, it looks like the third candidate after seq->private and rtable...
>
> Perhaps we could introduce a sparse attribute for it?

I hope I won't be cursed by all those forced to additional writing, so I'd only admit that after this patch there should be no problem with identifying RCU protected data properly (maybe only this kind of rcu_dereference() needs some popularization).

Jarek P.
Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
In article [EMAIL PROTECTED] (at Fri, 11 Jan 2008 12:17:02 +0100), Andi Kleen [EMAIL PROTECTED] says:
> Vince Fuller [EMAIL PROTECTED] writes:
> > from Vince Fuller [EMAIL PROTECTED]
> >
> > This set of diffs modify the 2.6.20 kernel to enable use of the 240/4 (aka class-E) address space as consistent with the Internet Draft draft-fuller-240space-00.txt.
>
> Wouldn't it be wise to at least wait for it becoming an RFC first?

I do think so, too. There was no positive consensus on this draft at the intarea meeting in Vancouver, right? We cannot / should not enable that space until we have reached a consensus on it.

--yoshfuji
doubt in e1000_io_write()
Hi all,

I have a doubt about e1000_io_write():

    void e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value)
    {
            outl(value, port);
    }

Kernel version: 2.6.12.3

Since the hw structure is not used, why is it passed into the e1000_io_write function?

Thanks
Jeba
Re: [PATCH 2.6.23+] ingress classify to [nf]mark
On Fri, 2008-11-01 at 15:24 -0200, Dzianis Kahanovich wrote:
> jamal wrote:
> > tc qdisc add dev XXX ingress
> > tc filter add dev XXX parent : protocol ip prio 5 \
> >    u32 blah bleh \
> >    flowid 1:12 action ipt -j mark --set-mark 13
>
> Yes, I do so. But there is a simple way:
> ---
> if [[ $[TC_INDEX2MARK] == 0 ]] ; then
>   c=${c//action ipt -j MARK --set-mark /flowid :}
> fi
> $c
> ---

I didn't quite understand what you have above. Does your script read the flowid and set the MARK to some dynamic value based on the flowid? If that's what you are doing, it sounds sensible and much more clever than what is posted. And it doesn't require any kernel patch.

> Simplest:
> --- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
> +++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
> @@ -222,6 +222,16 @@
> - skb->tc_index = TC_H_MIN(res.classid);
> + skb->tc_index = TC_H_MIN(mark=res.classid);

Just write a metaset action and you can have all sorts of policies on what tc_index, mark etc. you want. It is something that's needed in any case. When we did tc_index it made sense because it was for tc to use some default policy. Enforcing policies in the kernel is not the best thing to do; as an example, you want to specify the policy for mark to be: classid major<<16|minor. I am sure you have good reasons; however, for the next person who wants to set it to major<<8|minor for their own good reason, there's a conflict.

My offer to help you is still open.

cheers,
jamal
Re: questions on NAPI processing latency and dropped network packets
David Miller wrote:
> You have to be kidding, coming here for help with a nearly 4 year old kernel.

I figured it couldn't hurt to ask... if I can't ask the original authors, who else is there? I'd love to work on newer kernels, but we have a commitment to our customers to support multiple releases for a significant amount of time.

Chris
Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
On Thu, Jan 10, 2008 at 03:51:11PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> > Eric Dumazet wrote, On 01/09/2008 11:37 AM:
> > ...
> > > [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> > ...
> > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > > index d337706..28484f3 100644
> > > --- a/net/ipv4/route.c
> > > +++ b/net/ipv4/route.c
> > > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> > >  			break;
> > >  		rcu_read_unlock_bh();
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> > >
> > >  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> > >  {
> > > -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> > > +	struct rt_cache_iter_state *st = seq->private;
> > >
> > >  	r = r->u.dst.rt_next;
> > >  	while (!r) {
> > > @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> > >  		rcu_read_lock_bh();
> > >  		r = rt_hash_table[st->bucket].chain;
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> >
> > It seems this optimization could've a side effect: if during such a loop updates are done, and r is seen !NULL during while() check, but NULL after rcu_dereference(), the listing/counting could stop too soon. So, IMHO, probably the first version of this patch is more reliable. (Or alternatively additional check is needed before return.)
>
> Looks to me like r is a local variable (argument list), so there should not be any possibility of it being changed by some other task, right?

It seems words could be stronger than logic (in some cases)... After forgetting what dereference is usually for, it's all right!

Thanks,
Jarek P.
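The point made in this exchange (the shared pointer may change between two reads, a local copy of it cannot) can be illustrated with a small user-space sketch. It uses C11 atomics as a rough analogue only; the kernel's rcu_dereference() is not available here, and shared_head, snapshot_head, publish_head and local_survives_update are hypothetical names. The "concurrent" update is simulated inline in the same thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int val;
    struct node *next;
};

/* Shared pointer that a writer may replace at any time. */
static _Atomic(struct node *) shared_head;

/* Read the shared pointer exactly once into a local value.
 * The acquire load stands in, loosely, for rcu_dereference(). */
struct node *snapshot_head(void)
{
    return atomic_load_explicit(&shared_head, memory_order_acquire);
}

/* Writer side: publish a new value (or NULL). */
void publish_head(struct node *n)
{
    atomic_store_explicit(&shared_head, n, memory_order_release);
}

/* Returns 1 if a local snapshot keeps its value after the shared
 * pointer has been cleared by an update. */
int local_survives_update(struct node *first)
{
    assert(first != NULL);
    publish_head(first);
    struct node *r = snapshot_head();   /* local copy, like the argument r */
    publish_head(NULL);                 /* update, simulated inline */
    return r != NULL && r == first;
}
```

A second call to snapshot_head() after the update would return NULL, which is the case Herbert describes for re-reading r->u.dst.rt_next; the local r itself is unaffected.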
Re: [PATCH 1/5] spidernet: add missing initialization
On Friday 11 January 2008, Ishizaki Kou wrote:
> This patch fixes initialization of the aneg_count and medium fields in spider_net_card so that the spidernet driver sets link status correctly.
>
> Signed-off-by: Kou Ishizaki [EMAIL PROTECTED]

Hi Ishizaki,

Linas has left the company and is no longer doing kernel related stuff, so I suggest, given Jeff is ok with that, that the two of us take over spidernet maintainership.

Jens

---
Change maintainership for spidernet.

Signed-off-by: Jens Osterkamp [EMAIL PROTECTED]

Index: linux-2.6/MAINTAINERS
===
--- linux-2.6.orig/MAINTAINERS 2008-01-11 13:32:04.0 +0100
+++ linux-2.6/MAINTAINERS 2008-01-11 13:41:32.0 +0100
@@ -3613,8 +3613,10 @@
 S: Supported

 SPIDERNET NETWORK DRIVER for CELL
-P: Linas Vepstas
-M: [EMAIL PROTECTED]
+P: Ishizaki Kou
+M: [EMAIL PROTECTED]
+P: Jens Osterkamp
+M: [EMAIL PROTECTED]
 L: netdev@vger.kernel.org
 S: Supported

IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Herbert Kircher
Registered office: Böblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294
Simple question about LARTC theory
Hello all.

Sorry for the off-topic post. I am subscribed only to [EMAIL PROTECTED]; when I tried to send to [EMAIL PROTECTED] I got "Undelivered Mail Returned to Sender". May I ask a slightly off-topic question here? Many people on this list know the LARTC code, and I hope that helps with my idea. Thanks.

Simple question

Legend: [] - qdisc, () - class, ** - filter

[htb 1:0 root] *match X FLOWID 3:5*
(1:2 htb)(2:3 htb)(3:5 htb)[sfq 5]
(1:6 htb)(6:7 htb)(7:8 htb)[sfq 8]

A packet goes IN - [htb 1:0] - (class 1:2 - GREEN) - (class 2:3 - GREEN) - (class 3:5 - GREEN) - [sfq 5] - OUT

Then I create

[prio 3 bound 10:0] *match X flowid 10:2*
+(10:1 htb) -- [sfq 101]
+(10:2 htb) -- [sfq 102]
+(10:3 htb) -- [sfq 103]

How do I add a filter to [sfq 5] and [sfq 8] so that a packet leaving them goes to [prio 3 bound 10:0] and is processed by its filters? flowid works only if it sees the beginning and the end of the links... I need something like GOTO...

- If I give [prio 3 bound 10:0] a PARENT ID, flowid finds the path, but I need [prio 3 bound 10:0] to have more than one parent...
- I looked at link, but if I understand correctly it works only for hash tables.
- I looked at classid, but it goes to class 10:X, not to [prio 3 bound 10:0], and does not process the filter...

Or do I not understand the theory?

What I need: 3 groups in tc.

The 1st group gets all traffic and does HTB shaping (defence against ICMP and UDP storms):
a) icmp rate 100mbs ceil 500mbs
b) udp rate 100mbs ceil 500mbs
c) other rate 300mbs ceil 500mbs
all with prio = 0 so that the ceil rate works normally

The 2nd group does prio (icmp and udp must go first because they have no transmit check):
icmp = 1
udp = 2
other = 3

The 3rd group does the speed limit per IP (shaping); this part is ready.

I want everything leaving group 1 to go to the group 2 filters, and everything leaving group 2 to go to group 3...

Thanks.
Slavon
iproute2: removing primary address removes secondaries
Dear list,

When I add an address to an interface whose network prefix is the same as that of an address already bound to the interface, the new address becomes a secondary address. As per http://www.policyrouting.org/iproute2.doc.html:

  secondary --- this address is not used when selecting the default source address for outgoing packets. An IP address becomes secondary if another address within the same prefix (network) already exists. The first address within the prefix is primary and is the tag address for the group of all the secondary addresses. When the primary address is deleted all of the secondaries are purged too.

In the following, I want to argue that this is not necessary. I think that removal of a primary address should cause the next address to be promoted to be the default source address and the link-scoped route to be retained.

This is basically out of http://bugs.debian.org/429689, the maintainer asked me to turn directly to this list.

If I add an address to a device with 'ip add', ip also implicitly adds a link-scoped route according to the netmask. It only does this for primary addresses, so if I add a second address within the same network, the route is not duplicated. Thus, the net effect on the routing table is the same for the following two command sequences:

  ip a a 172.16.0.100/12 dev eth0
  ip a a 172.16.0.200/12 dev eth0

  ip a a 172.16.0.100/12 dev eth0
  ip a a 172.16.0.200/32 dev eth0

In the first case, the .200 address becomes a secondary of the .100 address. In the second case, they are both primaries. In both cases, only one /12 link-scoped route will be created.

However, in both cases, if I remove the .100 address, the .200 is affected: if it's secondary, it ceases to exist, and if it's primary (i.e. in the /32 case), then the host can no longer use it to communicate to hosts in the same link segment, only to hosts on the other side of the default gateway.

I thus question the point of purging secondary addresses.
Obviously, only one address can be primary (it is used as source address for packets leaving the machine by the respective route). But if the primary address is removed, the next secondary should be promoted and the route should *not* be deleted.

Comments?

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
microsoft: for when quality, reliability, and security just aren't that important!
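The primary/secondary rule described in this message can be written down as a small check. This is a sketch that only mirrors the behaviour reported above (same prefix length and same network gives a secondary; a /32 next to a /12 stays primary); the helper names ipv4, prefix_mask and would_be_secondary are made up for illustration and are not kernel or iproute2 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Netmask for a prefix length of 0..32, host byte order. */
static uint32_t prefix_mask(unsigned plen)
{
    assert(plen <= 32);
    return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Returns 1 if addr/addr_len would become a secondary of the existing
 * primary prim/prim_len: same prefix length and same network. */
int would_be_secondary(uint32_t prim, unsigned prim_len,
                       uint32_t addr, unsigned addr_len)
{
    uint32_t mask = prefix_mask(prim_len);
    return prim_len == addr_len && (prim & mask) == (addr & mask);
}
```

For the two command sequences in the message, 172.16.0.200/12 added after 172.16.0.100/12 satisfies the check, while 172.16.0.200/32 does not and therefore stays a primary.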
Re: e1000 performance issue in 4 simultaneous links
Breno Leitao wrote:
> On Thu, 2008-01-10 at 12:52 -0800, Brandeburg, Jesse wrote:
> > Breno Leitao wrote:
> > > When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec of transfer rate. If I run 4 netperf against 4 different interfaces, I get around 720 * 10^6 bits/sec.
> >
> > I hope this explanation makes sense, but what it comes down to is that combining hardware round robin balancing with NAPI is a BAD IDEA. In general the behavior of hardware round robin balancing is bad and I'm sure it is causing all sorts of other performance issues that you may not even be aware of.
>
> I've made another test removing the ppc IRQ Round Robin scheme, bound each interface (eth6, eth7, eth16 and eth17) to a different CPU (CPU1, CPU2, CPU3 and CPU4) and I also get around 720 * 10^6 bits/s on average.
>
> Take a look at the interrupt table this time:
>
> io-dolphins:~/leitao # cat /proc/interrupts | grep eth[1]*[67]
> 277:      15 1362450      13      14      13      14      15      18   XICS   Level   eth6
> 278:      12      13 1348681      19      13      15      10      11   XICS   Level   eth7
> 323:      11      18      17 1348426      18      11      11      13   XICS   Level   eth16
> 324:      12      16      11      19 1402709      13      14      11   XICS   Level   eth17
>
> I also tried to bind all 4 interface IRQs to a single CPU (CPU0) using the noirqdistrib boot parameter, and the performance was a little worse.
>
> Rick,
> The 2 interface test that I showed in my first email was run on two different NICs. Also, I am running netperf with the following command "netperf -H hostname -T 0,8" while netserver is running without any argument at all. Also, running vmstat in parallel shows that there is no bottleneck in the CPU. Take a look:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
>  2  0      0 6714732  16168 227440    0    0     8     2   203    21  0  1 98  0  0
>  0  0      0 6715120  16176 227440    0    0     0    28 16234   505  0 16 83  0  1
>  0  0      0 6715516  16176 227440    0    0     0     0 16251   518  0 16 83  0  1
>  1  0      0 6715252  16176 227440    0    0     0     1 16316   497  0 15 84  0  1
>  0  0      0 6716092  16176 227440    0    0     0     0 16300   520  0 16 83  0  1
>  0  0      0 6716320  16180 227440    0    0     0     1 16354   486  0 15 84  0  1

If your machine has 8 cpus, then your vmstat output shows a bottleneck :) (100/8 = 12.5), so I guess one of your CPUs is full.
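The arithmetic behind the "100/8 = 12.5" remark can be spelled out in a few lines. This is only a sketch of the reasoning: the function names are hypothetical, and a system-wide figure above the threshold is merely consistent with one saturated CPU, it does not prove it (the load could also be spread across several CPUs).

```c
#include <assert.h>

/* Share of the system-wide total (in percent) that one fully busy
 * CPU represents on a machine with ncpus CPUs. */
double one_cpu_share(int ncpus)
{
    assert(ncpus > 0);
    return 100.0 / ncpus;
}

/* Returns 1 if the system-wide busy percentage is at least one CPU's
 * worth, i.e. the average could be hiding a single saturated CPU. */
int may_hide_saturated_cpu(double busy_pct, int ncpus)
{
    return busy_pct >= one_cpu_share(ncpus);
}
```

In the vmstat output quoted above, sy is about 16 on what is assumed to be an 8-CPU machine, which is above the 12.5 threshold; per-CPU tools are needed to tell the two cases apart.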
Re: [PROCFS] [NETNS] issue with /proc/net entries
Eric W. Biederman wrote: Benjamin Thery [EMAIL PROTECTED] writes: Hi Eric, While testing the current network namespace stuff merged in net-2.6.25, I bumped into the following problem with the /proc/net/ entries. It doesn't always display the actual data of the current namespace, but sometime displays data from other namespaces. I bisected the problem to the commit: proc: remove/Fix proc generic d_revalidate 3790ee4bd86396558eedd86faac1052cb782e4e1 The problem: If a process in a particular network namespace changes current directory to /proc/net, then processes in other network namespaces trying to look at /proc/net entries will see data from the first namespace (the one with CWD /proc/net). (See test case below). As you comments in the commit suggest, you seem to be aware of some issues when CONFIG_NET_NS=y. Is it one of these corner cases you identified? Any idea on how we can fix it? Yes. It isn't especially hard. I have most of it in my queue I just need to get the silly patches out of there. Essentially we need to fix the caching of proc_generic entries, So that we can have a proper d_revalidate implementation. To get d_revalidate and the caching correct for /proc/net will take just a bit more work. We need to make /proc/net a symlink to something like /proc/self/net so that we don't get excess revalidates when switching between different processes. Or else we can't properly implement the case you have described. Where being in the directory causes the wrong version of /proc/net to show up. Changing the contents of the dentry for /proc/net should only happen during unshare. Not when we switch between processes or else we get into the d_revalidate leaks mount points problem again. We also need the check to see if something is mounted on top of us before we call drop the dentry. But if we don't even try until we know the dentry is invalid it should not be too bad. Thanks for all the details. 
I'll put this issue on my netns current limitations list until it's solved.

Benjamin

> Eric

-- 
Benjamin Thery - BULL/DT/Open Software R&D
http://www.bull.com
Re: [PATCH 2.6.23+] ingress classify to [nf]mark
Patrick McHardy wrote:
> > --- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
> > +++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
> > @@ -161,2 +161,5 @@
> >  skb->tc_index = TC_H_MIN(res.classid);
> > +#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
> > +skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
> > +#endif
> >  default:
>
> Behaviour like this shouldn't depend on compile-time options.

I would also like to move it outside of the NET_CLS_ACT dependency, but I am unsure how it behaves without NET_CLS_ACT. But this way there is less code.

-- 
WBR, Denis Kaganovich, [EMAIL PROTECTED] http://mahatma.bspu.unibel.by
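The mark computation proposed in the patch (classid major used as an AND mask, classid minor OR-ed in) can be tried out in user space. This is a sketch of the expression only; tc2mark and SKETCH_TC_H_MIN are hypothetical names, with SKETCH_TC_H_MIN assumed to behave like the kernel's TC_H_MIN() macro (low 16 bits of the handle).

```c
#include <assert.h>
#include <stdint.h>

/* Minor part of a tc handle: the low 16 bits (assumed equivalent to TC_H_MIN()). */
#define SKETCH_TC_H_MIN(h) ((h) & 0x0000FFFFu)

/* New mark = (old mark AND classid major) OR classid minor. */
uint32_t tc2mark(uint32_t mark, uint32_t classid)
{
    return (mark & (classid >> 16)) | SKETCH_TC_H_MIN(classid);
}
```

With a classid of f:10 (0x000F0010) and an old mark of 0xFF the result is 0x1F; with a major of 0, as in flowid :2, the old mark is cleared and the result is just the minor. This also shows jamal's objection: the AND/OR split is one fixed policy among many possible ones.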
Re: [PATCH 1/5] spidernet: add missing initialization
Hi,

On 11/01/2008, Jens Osterkamp [EMAIL PROTECTED] wrote:
> Hi Ishizaki,
>
> Linas has left the company and is no longer doing kernel related stuff, so I suggest, given Jeff is ok with that, that the two of us take over spidernet maintainership.
>
> Jens
>
> ---
> Change maintainership for spidernet.
>
> Signed-off-by: Jens Osterkamp [EMAIL PROTECTED]

Fine with me ...

Acked-by: Linas Vepstas [EMAIL PROTECTED]

> Index: linux-2.6/MAINTAINERS
> ===
> --- linux-2.6.orig/MAINTAINERS 2008-01-11 13:32:04.0 +0100
> +++ linux-2.6/MAINTAINERS 2008-01-11 13:41:32.0 +0100
> @@ -3613,8 +3613,10 @@
>  S: Supported
>
>  SPIDERNET NETWORK DRIVER for CELL
> -P: Linas Vepstas
> -M: [EMAIL PROTECTED]
> +P: Ishizaki Kou
> +M: [EMAIL PROTECTED]
> +P: Jens Osterkamp
> +M: [EMAIL PROTECTED]
>  L: netdev@vger.kernel.org
>  S: Supported
Re: [PATCH 2.6.23+] ingress classify to [nf]mark
jamal wrote:
> > To classid x:y = mark=mark&x|y (classid :y = -j MARK --set-mark y, etc).
> >
> > --- linux-2.6.23-gentoo-r2/net/sched/Kconfig
> > +++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
> > @@ -222,6 +222,16 @@
> [..]
> >  skb->tc_index = TC_H_MIN(res.classid);
> > +#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
> > + skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
> > +#endif
> >  default:
>
> Please either use ipt action and netfilter fwmarker for this activity or

Sorry. That was only an unsuccessful attempt to popularize my working solution. In reality I just use "#define tc_index mark" (in skbuff.h or sch_ingress.c) or something like this:

--- linux-2.6.23-gentoo-r2/net/sched/Kconfig
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
@@ -222,6 +222,16 @@
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_ingress.

+config NET_SCH_INGRESS_TC2MARK
+	bool "ingress tc_index -> mark"
+	depends on NET_SCH_INGRESS && NET_CLS_ACT
+	---help---
+	  This enables access to mark value via tc_index alias
+	  in ingress and unify this values (usage example: set flowid :2
+	  in ingress and use it value as mark in any way - netfilter, etc).
+
+	  But tc_index may be undefined - use flowid :0.
+
 comment "Classification"

 config NET_CLS
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -18,6 +18,9 @@
 #include <net/netlink.h>
 #include <net/pkt_sched.h>

+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+#define tc_index mark
+#endif

 #undef DEBUG_INGRESS

> create a new action. If you choose the latter (for example because you want to dynamically compute the mark), look at net/sched/act_simple.c to start from and I can help you if you have any questions. If you want to use ipt action, the syntax would be something like:
> ---
> tc qdisc add dev XXX ingress
> tc filter add dev XXX parent : protocol ip prio 5 \
>    u32 blah bleh \
>    flowid 1:12 action ipt -j mark --set-mark 13

Yes, I do so. But there is a simple way:
---
if [[ $[TC_INDEX2MARK] == 0 ]] ; then
  c=${c//action ipt -j MARK --set-mark /flowid :}
fi
$c
---

Simplest:
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -222,6 +222,16 @@
- skb->tc_index = TC_H_MIN(res.classid);
+ skb->tc_index = TC_H_MIN(mark=res.classid);

-- 
WBR, Denis Kaganovich, [EMAIL PROTECTED] http://mahatma.bspu.unibel.by
Re: [PATCH 0/4] Pull request for 'ipg-fixes' branch
[EMAIL PROTECTED] [EMAIL PROTECTED] :
[...]
I notice that the vendor-supplied driver doesn't have these bugs.

The M in POMS stands for my.

[...]
Would you be interested in some cleanup patches ?

Yes.

In particular, I think I can get rid of tx-lock entirely, or at least take it off the fast path. All it's protecting is the write to sp->tx_current, and a few judicious memory barriers can deal with that.

I have done a kind of memory barrier trick for the r8169 in the past but it is not clear that I would do it again. Today I would argue more strongly in the direction of similar locking amongst different drivers. The tg3 driver is a good model imho.

Anyway you have been here for some time so I see no reason to kill any different/new locking scheme you could come up with.

Off until Sunday.

-- 
Ueimor
Re: e1000 performance issue in 4 simultaneous links
Maybe it is a good idea to use sysstat? http://perso.wanadoo.fr/sebastien.godard/

For example:

visp-1 ~ # mpstat -P ALL 1
Linux 2.6.24-rc7-devel (visp-1)   01/11/08

19:27:57  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
19:27:58  all    0.00    0.00    0.00    0.00    0.00    2.51    0.00   97.49   7707.00
19:27:58    0    0.00    0.00    0.00    0.00    0.00    4.00    0.00   96.00   1926.00
19:27:58    1    0.00    0.00    0.00    0.00    0.00    1.01    0.00   98.99   1926.00
19:27:58    2    0.00    0.00    0.00    0.00    0.00    5.00    0.00   95.00   1927.00
19:27:58    3    0.00    0.00    0.00    0.00    0.00    0.99    0.00   99.01   1927.00
19:27:58    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00

-- 
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
Re: e1000 performance issue in 4 simultaneous links
On Fri, 2008-01-11 at 17:48 +0100, Eric Dumazet wrote:
> Breno Leitao wrote:
> > Take a look at the interrupt table this time:
> >
> > io-dolphins:~/leitao # cat /proc/interrupts | grep eth[1]*[67]
> > 277:      15 1362450      13      14      13      14      15      18   XICS   Level   eth6
> > 278:      12      13 1348681      19      13      15      10      11   XICS   Level   eth7
> > 323:      11      18      17 1348426      18      11      11      13   XICS   Level   eth16
> > 324:      12      16      11      19 1402709      13      14      11   XICS   Level   eth17
>
> If your machine has 8 cpus, then your vmstat output shows a bottleneck :) (100/8 = 12.5), so I guess one of your CPU is full

Well, if I run top while running the test, I see this load distributed among the CPUs, mainly those that have a NIC IRQ bound to them. Take a look:

Tasks: 133 total,   2 running, 130 sleeping,   0 stopped,   1 zombie
Cpu0  :  0.3%us, 19.5%sy,  0.0%ni, 73.5%id,  0.0%wa,  0.0%hi,  0.0%si,  6.6%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 75.1%id,  0.0%wa,  0.7%hi, 24.3%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni, 73.1%id,  0.0%wa,  0.7%hi, 26.2%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni, 76.1%id,  0.0%wa,  0.7%hi, 23.3%si,  0.0%st
Cpu4  :  0.0%us,  0.3%sy,  0.0%ni, 70.4%id,  0.7%wa,  0.3%hi, 28.2%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Note that this average scenario doesn't change during the entire benchmarking test.

Thanks!

-- 
Breno Leitao [EMAIL PROTECTED]
why does promote_secondaries default to off? (was: iproute2: removing primary address removes secondaries)
also sprach Daniel Lezcano [EMAIL PROTECTED] [2008.01.11.1813 +0100]:
> There is a tweak in /proc/sys which activate secondaries promotion when a primary is deleted.
>
> /proc/sys/net/ipv4/conf/all/promote_secondaries
>
> I think it changes the behavior to the one you wish.

Totally. That would have been the last place I had looked. Thank you!

Do you have any idea why this isn't on by default?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
i never go without my dinner. no one ever does, except vegetarians and people like that. -- oscar wilde
Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
On Fri, Jan 11, 2008 at 12:17:02PM +0100, Andi Kleen wrote:
> Vince Fuller [EMAIL PROTECTED] writes:
> > from Vince Fuller [EMAIL PROTECTED]
> > This set of diffs modify the 2.6.20 kernel to enable use of the 240/4 (aka class-E) address space as consistent with the Internet Draft draft-fuller-240space-00.txt.
>
> Wouldn't it be wise to at least wait for it becoming an RFC first?

There is reasonable consensus on making use of 240/4; some applications, such as ISAKMP and automatic IPv6-to-IPv4 tunneling, still need to determine if they should treat the space as public or private but that shouldn't affect whether kernel support is added.

Solaris recently added support for 240/4 and OS X already has it. I thought the Linux kernel developers might appreciate having patches to do likewise. I leave it up to you, the developers, to decide if you want to use these patches.

--Vince
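For readers unfamiliar with the notation, 240/4 means every address whose top four bits are all set (240.0.0.0 through 255.255.255.255). A minimal sketch of that membership test follows; the function names are hypothetical and this is not the code from the posted patch. Note that the limited broadcast address 255.255.255.255 falls inside the range numerically, so any real change presumably has to keep treating it specially.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if addr (host byte order) lies in 240.0.0.0/4,
 * i.e. its top four bits are all ones. */
int in_240_slash_4(uint32_t addr)
{
    return (addr & 0xF0000000u) == 0xF0000000u;
}

/* Returns nonzero for the limited broadcast address 255.255.255.255. */
int is_limited_broadcast(uint32_t addr)
{
    return addr == 0xFFFFFFFFu;
}
```

Here 240.0.0.1 (0xF0000001) is inside the range, while 224.0.0.1 (0xE0000001), a multicast address, is not.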
Re: e1000 performance issue in 4 simultaneous links
Hello Denys,

I've installed sysstat (good tools!) and the result is very similar to the one which appears in top, take a look:

13:34:23  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
13:34:24  all    0.00    0.00    2.72    0.00    0.25   12.13    0.99   83.91  16267.33
13:34:24    0    0.00    0.00   21.78    0.00    0.00    0.00    7.92   70.30     40.59
13:34:24    1    0.00    0.00    0.00    0.00    0.99   24.75    0.00   74.26   4025.74
13:34:24    2    0.00    0.00    0.00    0.00    0.99   24.75    0.00   74.26   4036.63
13:34:24    3    0.00    0.00    0.00    0.00    0.99   21.78    0.00   77.23   4032.67
13:34:24    4    0.00    0.00    0.00    0.00    0.98   24.51    0.00   74.51   4034.65
13:34:24    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     30.69
13:34:24    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     33.66
13:34:24    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     32.67

So, we can assure that the IRQs are not being balanced, and that there isn't any processor overload.

Thanks!

On Fri, 2008-01-11 at 19:36 +0200, Denys Fedoryshchenko wrote:
> Maybe good idea to use sysstat ? http://perso.wanadoo.fr/sebastien.godard/
Re: doubt in e1000_io_write()
Jeba Anandhan wrote:
> Hi all, i have doubt in e1000_io_write().
>
>     void e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value)
>     {
>         outl(value, port);
>     }
>
> kernel version: 2.6.12.3
>
> Even hw structure has not been used, why it has been passed into e1000_io_write function?

2.6.12.3? Why do you care? That code is probably long gone... was that function even used?
Re: [PATCH 2.6.23+] ingress classify to [nf]mark
jamal wrote:
> > Yes, I do so. But there is a simple way:
> > ---
> > if [[ $[TC_INDEX2MARK] == 0 ]] ; then

==1

> >   c=${c//action ipt -j MARK --set-mark /flowid :}

c=${c//action ipt -j MARK --set-mark 0x/flowid :}

> > fi
> > $c
> > ---
>
> I didnt quiet understand what you have above. Does your script above read the flowid and sets the MARK to some dynamic value based on flowid? if thats what you are doing - it sounds sensible and much more clever than what is posted. And it doesnt require any kernel patch.

I suggest just using classid to set mark/nfmark in ingress. As I see it, classid is nearly unused in ingress (no classes, etc.), and for many setups classid in ingress filters could be used only for nfmarking. I also suggest using both parts (major and minor) of classid: major could be the "and" value, minor the "or" value. In its current place it may be useful only for (if I am not mistaken) overwriting netfilter raw table marks, but if it is moved outside the current CLS_ACT block, tc filter rules could operate on mark bits more usefully.

About the script example: while I compose a filter, I check a flag ($TC_INDEX2MARK) that tells me whether the patch is applied or not. If not, I use the usual -j MARK --set-mark; otherwise I use classid to change the mark. All in ingress only. For example:

tc filter add dev eth0 parent : protocol ip u32 ... action ipt -j MARK 0x10

is changed to:

tc filter add dev eth0 parent : protocol ip u32 ... flowid :10

It uses less code/modules and, in many cases, may be the single/main goal of ingress usage: pre-marking packets.

> > Simplest:
> > --- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
> > +++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
> > @@ -222,6 +222,16 @@
> > - skb->tc_index = TC_H_MIN(res.classid);
> > + skb->tc_index = TC_H_MIN(mark=res.classid);
>
> Just write a metaset action and you can have all sorts of policies on what tc_index, mark etc you want. It is something thats needed in any case. When we did tc_index it made sense then because it was for tc to use some default policy. Enforcing policies in the kernel is not the best thing to do; as an example you want to specify the policy for mark to be: classid major<<16|minor. I am sure you have good reasons; however, for the next person who wants to set it to major<<8|minor for their own good reason, theres conflict. My offer to help you is still open.

OK, I understand this is not very transparent for future usage, but I see too few applications of ingress/classid that it would conflict with. Thanks, I will try to understand metaset actions, but I think it will not be as elegant for my usage as my "#define tc_index mark" at the beginning of sch_ingress.c. Or maybe I will use the and/or behaviour, but for now "#define tc_index mark" has been working on my router for many months (I could also use -j MARK, with one flag in my script, but that is a lot of unneeded code).

This code (ingress/classifying[/CLS_ACT]) is executed every time, and I suggest changes ranging from none (changing the target variable from tc_index to mark) to a few and/or atomic operations for useful functionality. With mark=res.classid only (which I may use myself, but do not suggest for the kernel) it is even less code than the default (no TC_H_MIN) and fully satisfies many goals (traffic marking without netfilter, but compatible with it).

-- 
WBR, Denis Kaganovich, [EMAIL PROTECTED] http://mahatma.bspu.unibel.by
Re: iproute2: removing primary address removes secondaries
martin f krafft wrote:
Dear list,
When I add an address to an interface whose network prefix is the same as that of an address already bound to the interface, the new address becomes a secondary address. As per http://www.policyrouting.org/iproute2.doc.html:
secondary --- this address is not used when selecting the default source address for outgoing packets. An IP address becomes secondary if another address within the same prefix (network) already exists. The first address within the prefix is primary and is the tag address for the group of all the secondary addresses. When the primary address is deleted all of the secondaries are purged too.
In the following, I want to argue that this is not necessary. I think that removal of a primary address should cause the next address to be promoted to be the default source address and the link-scoped route to be retained. This is basically out of http://bugs.debian.org/429689; the maintainer asked me to turn directly to this list.
If I add an address to a device with 'ip add', ip also implicitly adds a link-scoped route according to the netmask. It only does this for primary addresses, so if I add a second address within the same network, the route is not duplicated. Thus, the net effect on the routing table is the same for the following two sets of commands:
  ip a a 172.16.0.100/12 dev eth0
  ip a a 172.16.0.200/12 dev eth0

  ip a a 172.16.0.100/12 dev eth0
  ip a a 172.16.0.200/32 dev eth0
In the first case, the .200 address becomes a secondary of the .100 address. In the second case, they are both primaries. In both cases, only one /12 link-scoped route will be created. However, in both cases, if I remove the .100 address, the .200 is affected: if it's secondary, it ceases to exist, and if it's primary (i.e. in the /32 case), then the host can no longer use it to communicate with hosts in the same link segment, only with hosts on the other side of the default gateway. I thus question the point of purging secondary addresses.
Obviously, only one address can be primary (it is used as the source address for packets leaving the machine by the respective route). But if the primary address is removed, the next secondary should be promoted and the route should *not* be deleted. Comments? Cheers,
There is a tweak in /proc/sys which activates secondaries promotion when a primary is deleted: /proc/sys/net/ipv4/conf/all/promote_secondaries
I think it changes the behavior to the one you wish. Regards
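For reference, a minimal sketch of flipping that knob from C instead of with echo or sysctl. The helper names (promote_path, set_promote_secondaries) are made up for illustration; only the /proc/sys/net/ipv4/conf/all/promote_secondaries path comes from the reply above, the per-interface form of the path is assumed to follow the same layout, and writing the file needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path for an interface name ("all", "eth0", ...).
 * Returns 0 on success, -1 if the name is empty, contains '/',
 * or the result does not fit in buf. */
int promote_path(const char *ifname, char *buf, size_t len)
{
	int n;

	assert(ifname != NULL && buf != NULL);
	if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
		return -1;
	n = snprintf(buf, len,
		     "/proc/sys/net/ipv4/conf/%s/promote_secondaries", ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the knob. Returns 0 on success, -1 on any error
 * (bad name, missing file, no permission, failed write). */
int set_promote_secondaries(const char *ifname, int enable)
{
	char path[128];
	FILE *f;
	int rc;

	if (promote_path(ifname, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	rc = (fputs(enable ? "1\n" : "0\n", f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

Calling set_promote_secondaries("all", 1) as root would correspond to the setting suggested in the reply.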
Re: Bonding : Monitoring of 4965 wireless card
[EMAIL PROTECTED] wrote:
Hi, I want to make a bond with my wireless card. The ipw driver creates two interfaces (wlan0 and wmaster0). When I switch the rf_kill button, ifplug detects wlan0 as unplugged but not wmaster0. If I down wlan0 (while rf_kill), bonding detects the inactivity when I up the interface. Do you have any idea where the problem is: the driver or the miimon of the module? My module parameters: mode=1 miimon=100 primary eth0
miimon isn't meaningful for wmaster0. I suggest you use arp monitoring instead. See bonding.txt for details.
-- Chris
Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module
http://bugzilla.kernel.org/show_bug.cgi?id=9721 On Friday, 11 of January 2008, supersud501 wrote: Stephen Hemminger wrote: On Wed, 9 Jan 2008 16:03:00 -0800 Andrew Morton [EMAIL PROTECTED] wrote: (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 9 Jan 2008 13:05:34 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9721 Summary: wake on lan fails with sky2 module Product: ACPI Version: 2.5 KernelVersion: 2.6.24-rc7 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Power-Sleep-Wake AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] This post-2.6.23 regression was assigned to ACPI but is quite possibly a net driver problem? Latest working kernel version: 2.6.23.12 Earliest failing kernel version: 2.6.24-rc6 (not tested earlier kernel, 2.6.24-rc7 still failing) Distribution: Ubuntu 8.04 (but Kernel build from Kernel.org and system modifiet to make wake on lan work, i.e. network cards are not shutted down on poweroff) Hardware Environment: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20) onboard Asus P5W DH motherboard, uses module SKY2 Software Environment: Problem Description: When enabling wake on lan with: 'ethtool -s eth0 wol' i get the following status: 21:56:29 ~ # sudo ethtool eth0 Settings for eth0: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Advertised auto-negotiation: Yes Speed: 100Mb/s Duplex: Full Port: Twisted Pair PHYAD: 0 Transceiver: internal Auto-negotiation: on Supports Wake-on: pg Wake-on: g wol enabled Current message level: 0x00ff (255) Link detected: yes but after shutting down the pc doesn't wake up when magic packet is sent. 
the status lights of the network card are still on (so the card seems to be online). same system with only changed kernel to 2.6.23.12 and same procedure like above: wake on lan works. Steps to reproduce: enable wol on your network card using SKY2 module and it doesn't work too? if you need more information, just tell me, it's my first bug report. regards Wake from power off works on 2.6.24-rc7 for me. Wake from suspend doesn't because Network Manager, HAL, or some other user space tool gets confused. I just rechecked it with Fujitsu Lifebook, which has sky2 (88E8055). There many variations of this chip, and it maybe chip specific problem or ACPI/BIOS issues. If you don't enable Wake on Lan in BIOS, the driver can't do it for you. Also, check how you are shutting down. Also since the device has to restart the PHY, it could be a switch issue if you have some fancy pants switch doing intrusion detection or something, but I doubt that. Is it a clean or fast shutdown, most distributions mark network devices as down on shutdown, but if the distribution does something stupid like remove the driver module, then the driver is unable to setup Wake On Lan. The wake on lan setup is done in one place in the driver, add a printk to see if it is ever called. I tried ACPI wakeup with /proc/acpi/alarm (like i described in my last mail) and it worked... so ACPI wakeup seems to work. i'll try to do the printk-thing when i find some time to mess around with the sources (maybe tomorrow). if someone has some brief instructions (maybe a link to a helpfull site for kernel debugging) for me i would be thankfull and could provide some more info faster. some steps for me to identify the source of the problem (is it really sky2?) would be really helpfull... Please do the tests requested at: http://bugzilla.kernel.org/show_bug.cgi?id=9721#c2, thanks. 
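For anyone following along: the "magic packet" in the report (the "g" in "Wake-on: g") is just 6 bytes of 0xFF followed by the target MAC address repeated 16 times, 102 bytes in all, usually sent as a broadcast. A minimal sketch of building one follows; build_magic_packet is a name made up here, and sending the buffer (for example over a UDP broadcast socket) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WOL_MAC_LEN 6
#define WOL_REPEATS 16
#define WOL_PKT_LEN (WOL_MAC_LEN + WOL_REPEATS * WOL_MAC_LEN) /* 102 bytes */

/* Fill out[] with a Wake-on-LAN magic packet for the given MAC.
 * Returns the packet length, or 0 if the buffer is too small. */
size_t build_magic_packet(const uint8_t mac[WOL_MAC_LEN],
			  uint8_t *out, size_t out_len)
{
	size_t i;

	assert(mac != NULL && out != NULL);
	if (out_len < WOL_PKT_LEN)
		return 0;

	memset(out, 0xFF, WOL_MAC_LEN);		/* synchronization stream */
	for (i = 0; i < WOL_REPEATS; i++)	/* 16 copies of the MAC */
		memcpy(out + WOL_MAC_LEN + i * WOL_MAC_LEN, mac, WOL_MAC_LEN);
	return WOL_PKT_LEN;
}
```

Comparing a capture of what the sending tool actually emits against this layout is one cheap way to rule out the sender before digging into the driver.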
Re: doubt in e1000_io_write()
Hello Auke,
On Fri, 2008-01-11 at 10:41 -0800, Kok, Auke wrote:
Even though the hw structure is not used, why is it passed into the e1000_io_write function?
2.6.12.3? why do you care? that code is probably long gone... was that function even used?
I noticed that this also happens on the upstream netdev-2.6 branch. Moreover, e1000_write_reg_io() in e1000_hw.c is the only function that calls e1000_io_write(). I wrote a small patch that fixes it.
diff -uNp e1000.old/e1000_hw.c e1000/e1000_hw.c
--- e1000.old/e1000_hw.c 2008-01-11 14:14:36.0 -0500
+++ e1000/e1000_hw.c 2008-01-11 14:13:36.0 -0500
@@ -6654,8 +6654,8 @@ e1000_write_reg_io(struct e1000_hw *hw,
 	unsigned long io_addr = hw->io_base;
 	unsigned long io_data = hw->io_base + 4;
 
-	e1000_io_write(hw, io_addr, offset);
-	e1000_io_write(hw, io_data, value);
+	e1000_io_write(io_addr, offset);
+	e1000_io_write(io_data, value);
 }
 
 /**
diff -uNp e1000.old/e1000_hw.h e1000/e1000_hw.h
--- e1000.old/e1000_hw.h 2008-01-11 14:13:00.0 -0500
+++ e1000/e1000_hw.h 2008-01-11 14:15:47.0 -0500
@@ -427,7 +427,7 @@ int32_t e1000_read_pcie_cap_reg(struct e
 void e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc);
 int e1000_pcix_get_mmrbc(struct e1000_hw *hw);
 /* Port I/O is only supported on 82544 and newer */
-void e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value);
+void e1000_io_write(unsigned long port, uint32_t value);
 int32_t e1000_disable_pciex_master(struct e1000_hw *hw);
 int32_t e1000_check_phy_reset_block(struct e1000_hw *hw);
 
diff -uNp e1000.old/e1000_main.c e1000/e1000_main.c
--- e1000.old/e1000_main.c 2008-01-11 14:14:36.0 -0500
+++ e1000/e1000_main.c 2008-01-11 14:13:23.0 -0500
@@ -4919,7 +4919,7 @@ e1000_read_pcie_cap_reg(struct e1000_hw
 }
 
 void
-e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value)
+e1000_io_write(unsigned long port, uint32_t value)
 {
 	outl(value, port);
 }
Signed-off-by: Breno Leitao [EMAIL PROTECTED]
Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers
On Sat, Jan 05, 2008 at 01:38:17PM +0100, Stefan Roese wrote:
On Saturday 05 January 2008, Benjamin Herrenschmidt wrote:
On Sat, 2008-01-05 at 10:50 +0100, Stefan Roese wrote:
Performance tests done by AMCC have shown that 256 buffers increase the performance of the Linux EMAC driver. So let's update the default values to match this setup. Signed-off-by: Stefan Roese [EMAIL PROTECTED] ---
Do we have the numbers? Did they also measure latency?
I hoped this question would not come. ;) No, unfortunately I don't have any numbers. Just the recommendation from AMCC to always use 256 buffers.
This cannot be true for all chips. The default numbers I selected weren't random. In particular, 256 for Tx doesn't make a lot of sense for 405. You're just going to waste memory. I'd be quite reluctant to follow such advice from AMCC without actual details.
-- Eugene
Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
The test command is:
#sudo taskset -c 7 ./netserver
#sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1
A couple of comments/questions on the command lines:
*) netperf/netserver support CPU affinity within themselves with the global -T option to netperf. Is the result with taskset much different? The equivalent to the above would be to run netperf with:
./netperf -T 0,7 ...
The one possibly salient difference between the two is that when done within netperf, the initial process creation will take place wherever the scheduler wants it.
*) The -i option to set the confidence iteration count will silently cap the max at 30.
happy benchmarking,
rick jones
TCP hints
Hi Stephen,
Do you still remember what this is for (it got added along with the other TCP hint stuff)? What kind of problem did you see back then (or who saw problems)?
@@ -1605,6 +1711,10 @@ static void tcp_undo_cwr(struct sock *sk, const int undo)
 	}
 	tcp_moderate_cwnd(tp);
 	tp->snd_cwnd_stamp = tcp_time_stamp;
+
+	/* There is something screwy going on with the retrans hints after
+	   an undo */
+	clear_all_retrans_hints(tp);
 }
 
 static inline int tcp_may_undo(struct tcp_sock *tp)
-- i.
Re: why does promote_secondaries default to off?
also sprach Daniel Lezcano [EMAIL PROTECTED] [2008.01.11.1833 +0100]:
This tweak is recent (2.6.16 as far as I remember), so I suppose the reason is to not puzzle people with a changed default behavior.
Your instant and helpful responses are most appreciated!
-- martin | http://madduck.net/ | http://two.sentenc.es/
a common mistake that people make when trying to design something completely foolproof was to underestimate the ingenuity of complete fools. -- douglas adams, mostly harmless
[PATCH] dm9161: add configuration for MII/RMII
diff --git a/drivers/net/phy/davicom.c b/drivers/net/phy/davicom.c
index 7ed632d..6bdc32f 100644
--- a/drivers/net/phy/davicom.c
+++ b/drivers/net/phy/davicom.c
@@ -37,6 +37,7 @@
 #define MII_DM9161_SCR		0x10
 #define MII_DM9161_SCR_INIT	0x0610
+#define MII_DM9161_SCR_RMII	0x0100
 
 /* DM9161 Interrupt Register */
 #define MII_DM9161_INTR	0x15
@@ -103,7 +104,7 @@ static int dm9161_config_aneg(struct phy_device *phydev)
 static int dm9161_config_init(struct phy_device *phydev)
 {
-	int err;
+	int err, temp;
 
 	/* Isolate the PHY */
 	err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
@@ -111,8 +112,19 @@ static int dm9161_config_init(struct phy_device *phydev)
 	if (err < 0)
 		return err;
 
-	/* Do not bypass the scrambler/descrambler */
-	err = phy_write(phydev, MII_DM9161_SCR, MII_DM9161_SCR_INIT);
+	/* Do not bypass the scrambler/descrambler , configure MII Mode */
+	switch (phydev->interface) {
+	case PHY_INTERFACE_MODE_MII:
+		temp = MII_DM9161_SCR_INIT;
+		break;
+	case PHY_INTERFACE_MODE_RMII:
+		temp = MII_DM9161_SCR_INIT | MII_DM9161_SCR_RMII;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	err = phy_write(phydev, MII_DM9161_SCR, temp);
 	if (err < 0)
 		return err;
 
Signed-off-by: Frederic RODO [EMAIL PROTECTED]
-
The preceding information may be confidential or privileged. If you are not the intended recipient of this mail, please notify the sender by replying to this message, then delete all trace of it from your systems. TIL Technologies Parc du Golf, Bat 43 350 rue J.R Guilibert Gautier de la Lauzière 13856 AIX EN PROVENCE Tel. : +33 4 42 37 11 77
-
Re: why does promote_secondaries default to off?
martin f krafft wrote:
also sprach Daniel Lezcano [EMAIL PROTECTED] [2008.01.11.1813 +0100]:
There is a tweak in /proc/sys which activates secondaries promotion when a primary is deleted: /proc/sys/net/ipv4/conf/all/promote_secondaries
I think it changes the behavior to the one you wish.
Totally. That would have been the last place I had looked. Thank you! Do you have any idea why this isn't on by default?
This tweak is recent (2.6.16 as far as I remember), so I suppose the reason is to not puzzle people with a changed default behavior.
RE: e1000 performance issue in 4 simultaneous links
On Thu, 2008-01-10 at 12:52 -0800, Brandeburg, Jesse wrote:
Breno Leitao wrote:
When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec of transfer rate. If I run 4 netperf against 4 different interfaces, I get around 720 * 10^6 bits/sec.
I hope this explanation makes sense, but what it comes down to is that combining hardware round robin balancing with NAPI is a BAD IDEA. In general the behavior of hardware round robin balancing is bad and I'm sure it is causing all sorts of other performance issues that you may not even be aware of.
I ran another test with the ppc IRQ round robin scheme removed and each interface (eth6, eth7, eth16 and eth17) bound to a different CPU (CPU1, CPU2, CPU3 and CPU4), and I still get around 720 * 10^6 bits/s on average. Take a look at the interrupt table this time:
io-dolphins:~/leitao # cat /proc/interrupts | grep eth[1]*[67]
277:      15 1362450      13      14      13      14      15      18  XICS  Level  eth6
278:      12      13 1348681      19      13      15      10      11  XICS  Level  eth7
323:      11      18      17 1348426      18      11      11      13  XICS  Level  eth16
324:      12      16      11      19 1402709      13      14      11  XICS  Level  eth17
I also tried to bind all 4 interface IRQs to a single CPU (CPU0) using the noirqdistrib boot parameter, and the performance was a little worse.
Rick, the 2-interface test that I showed in my first email was run on two different NICs. Also, I am running netperf with the following command: netperf -H hostname -T 0,8, while netserver is running without any arguments at all. Also, running vmstat in parallel shows that there is no bottleneck in the CPU. Take a look:
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo    in   cs us sy id wa st
 2  0      0 6714732  16168 227440    0    0     8     2   203   21  0  1 98  0  0
 0  0      0 6715120  16176 227440    0    0     0    28 16234  505  0 16 83  0  1
 0  0      0 6715516  16176 227440    0    0     0     0 16251  518  0 16 83  0  1
 1  0      0 6715252  16176 227440    0    0     0     1 16316  497  0 15 84  0  1
 0  0      0 6716092  16176 227440    0    0     0     0 16300  520  0 16 83  0  1
 0  0      0 6716320  16180 227440    0    0     0     1 16354  486  0 15 84  0  1
Thanks!
-- Breno Leitao [EMAIL PROTECTED]
RFC: TIPC API, bind to two networks
I'd like some feedback on a change to TIPC that I plan to submit to netdev/kernel.org. At this stage, I'm interested in what people think about using the protocol parameter of the socket interface to select a TIPC stack for the socket. My co-worker, Chris Friesen, has suggested that it would be more conventional to extend the TIPC sockaddr to select the appropriate network in calls to sendto() or bind(). I prefer socket(,,protocol) approach. Some background: We use TIPC in an ATCA chassis (Advanced Telecommunications Computing Architecture). An ATCA chassis may have two networks commonly called base and data (outside of the ATCA world base is called control). Users want to be able to create a TIPC socket that is bound to one network OR the other. The intention is that these networks should be isolated as much as possible. The attached patch accomplishes this by using the protocol parameter of the socket() syscall. A user can specify the TIPC stack to which the socket should be attached as follows: int socket(int domain, int type, int protocol); fd = socket(AF_TIPC, SOCK_SEQPACKET, 1); In the unpatched TIPC code the protocol was required to be zero. This requires that the user know the network topology or that the system designer provide an API (get_base_netid(), get_data_netid()). The patch is for tipc-1.5.12 which you can get from tipc.sf.net. We're way back on linux-2.6.14 - gotta love the embedded world! In terms of implementation, the basic idea of the patch is to introduce a layer in the TIPC code around socket creation and tipc netlink messages. This layer lets TIPC stack code register callback functions and then dispatches socket() and netlink calls to the appropriate TIPC stack. 
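To make the proposed usage concrete, here is a minimal userspace sketch. It assumes the patched tipc-1.5.12 described above, where the protocol argument selects the stack (0 through 7, matching the TIPC_STACK_* defines in the patch); on unpatched TIPC only protocol 0 is accepted. open_tipc_socket and TIPC_MAX_STACK are hypothetical names used only for this illustration, not part of TIPC.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef AF_TIPC
#define AF_TIPC 30		/* value used by Linux; fallback for older headers */
#endif

#define TIPC_MAX_STACK 7	/* highest TIPC_STACK_n defined in the patch */

/* Open a SOCK_SEQPACKET TIPC socket on the given stack.
 * Returns the fd, or -1 with errno set (EINVAL for an out-of-range
 * stack, otherwise whatever socket() reports). */
int open_tipc_socket(int stack)
{
	if (stack < 0 || stack > TIPC_MAX_STACK) {
		errno = EINVAL;
		return -1;
	}
	/* With the patch applied, the third argument picks the TIPC stack. */
	return socket(AF_TIPC, SOCK_SEQPACKET, stack);
}
```

An application that wants the second network would call open_tipc_socket(1); which stack number maps to "base" and which to "data" is exactly the topology knowledge the RFC says the system designer has to supply.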
For example usage, please see: http://sourceforge.net/mailarchive/message.php?msg_name=476839F3.8070203%40nortel.com
One note for those who might not read the link above: I create two modules, tipc.ko and tipcstack.ko. These are ~98% identical, with certain bits of functionality, like registering AF_TIPC, disabled. This means that we have the same bits of code loaded twice, but that's a feature, not a bug! It means that the control and data networks are even more independent, so you could update one but not the other, or you could use SystemTap on one but not the other.
diffstat:
 Makefile                |  15 +++-
 include/net/tipc/tipc.h |  11 ++-
 net/tipc/core.c         | 157 +++-
 net/tipc/core.h         |  25 +++
 net/tipc/handler.c      |  17 +++--
 net/tipc/netlink.c      |  21 +-
 net/tipc/socket.c       |  55
 net/tipc/vtipc.c        | 154 +++
 net/tipc/vtipc.h        |  48 ++
 tools/tipc-config.c     |  29 ++--
Even though I *don't* want the attached patch to be integrated into the kernel (yet), I'm still going to include: Signed-off-by: Randy MacLeod ([EMAIL PROTECTED]) because it's taken such a long time to get Nortel to bless official kernel participation!
// Randy diff -Naur tipc-1.5.12_orig/include/net/tipc/tipc.h tipc-1.5.12_gmt/include/net/tipc/tipc.h --- tipc-1.5.12_orig/include/net/tipc/tipc.h2005-12-15 00:48:48.0 +0530 +++ tipc-1.5.12_gmt/include/net/tipc/tipc.h 2007-12-17 21:41:19.0 +0530 @@ -71,7 +71,6 @@ __u32 lower; __u32 upper; }; - static inline __u32 tipc_addr(unsigned int zone, unsigned int cluster, unsigned int node) @@ -213,13 +212,21 @@ /* * TIPC-specific socket option values */ - #define SOL_TIPC 50 /* TIPC socket option level */ + #define TIPC_IMPORTANCE127 /* Default: TIPC_LOW_IMPORTANCE */ #define TIPC_SRC_DROPPABLE 128 /* Default: 0 (resend congested msg) */ #define TIPC_DEST_DROPPABLE129 /* Default: based on socket type */ #define TIPC_CONN_TIMEOUT 130 /* Default: 8000 (ms) */ +#define TIPC_STACK_0 0 /* Default TIPC stack */ +#define TIPC_STACK_1 1 /* 1st TIPC stack */ +#define TIPC_STACK_2 2 /* 2nd TIPC stack */ +#define TIPC_STACK_3 3 /* 3rd TIPC stack */ +#define TIPC_STACK_4 4 /* 4th TIPC stack */ +#define TIPC_STACK_5 5 /* 5th TIPC stack */ +#define TIPC_STACK_6 6 /* 6th TIPC stack */ +#define TIPC_STACK_7 7 /* 7th TIPC stack */ #ifdef __KERNEL__ diff -Naur tipc-1.5.12_orig/Makefile tipc-1.5.12_gmt/Makefile --- tipc-1.5.12_orig/Makefile 2005-06-23 00:10:12.0 +0530 +++ tipc-1.5.12_gmt/Makefile2007-12-18 01:41:20.0 +0530 @@ -3,8 +3,6 @@ # SHELL = /bin/bash - - ifdef KERNELDIR KINCLUDE = ${KERNELDIR}/include @@ -22,8 +20,19 @@ -DCONFIG_TIPC_DEBUG obj-m += tipc.o +obj-m += tipcstack.o + +tipc-objs += net/tipc/addr.o net/tipc/bcast.o
Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers
On Fri, 2008-01-11 at 09:48 -0800, Eugene Surovegin wrote:
On Sat, Jan 05, 2008 at 01:38:17PM +0100, Stefan Roese wrote:
On Saturday 05 January 2008, Benjamin Herrenschmidt wrote:
On Sat, 2008-01-05 at 10:50 +0100, Stefan Roese wrote:
Performance tests done by AMCC have shown that 256 buffers increase the performance of the Linux EMAC driver. So let's update the default values to match this setup. Signed-off-by: Stefan Roese [EMAIL PROTECTED] ---
Do we have the numbers? Did they also measure latency?
I hoped this question would not come. ;) No, unfortunately I don't have any numbers. Just the recommendation from AMCC to always use 256 buffers.
This cannot be true for all chips. The default numbers I selected weren't random. In particular, 256 for Tx doesn't make a lot of sense for 405. You're just going to waste memory. I'd be quite reluctant to follow such advice from AMCC without actual details.
I think we can make defaults based on other config options nowadays. Not very nice, but we could do things like:
	default 128 if PPC_40x
	default 256
Or even more detailed.
Cheers, Ben.
Re: e1000 performance issue in 4 simultaneous links
Breno Leitao wrote:
On Fri, 2008-01-11 at 17:48 +0100, Eric Dumazet wrote:
Breno Leitao wrote:
Take a look at the interrupt table this time:
io-dolphins:~/leitao # cat /proc/interrupts | grep eth[1]*[67]
277:      15 1362450      13      14      13      14      15      18  XICS  Level  eth6
278:      12      13 1348681      19      13      15      10      11  XICS  Level  eth7
323:      11      18      17 1348426      18      11      11      13  XICS  Level  eth16
324:      12      16      11      19 1402709      13      14      11  XICS  Level  eth17
If your machine has 8 cpus, then your vmstat output shows a bottleneck :) (100/8 = 12.5), so I guess one of your CPUs is full
Well, if I run top while running the test, I see this load distributed among the CPUs, mainly those that had a NIC IRQ bound. Take a look:
Tasks: 133 total, 2 running, 130 sleeping, 0 stopped, 1 zombie
Cpu0 :  0.3%us, 19.5%sy,  0.0%ni, 73.5%id,  0.0%wa,  0.0%hi,  0.0%si,  6.6%st
Cpu1 :  0.0%us,  0.0%sy,  0.0%ni, 75.1%id,  0.0%wa,  0.7%hi, 24.3%si,  0.0%st
Cpu2 :  0.0%us,  0.0%sy,  0.0%ni, 73.1%id,  0.0%wa,  0.7%hi, 26.2%si,  0.0%st
Cpu3 :  0.0%us,  0.0%sy,  0.0%ni, 76.1%id,  0.0%wa,  0.7%hi, 23.3%si,  0.0%st
Cpu4 :  0.0%us,  0.3%sy,  0.0%ni, 70.4%id,  0.7%wa,  0.3%hi, 28.2%si,  0.0%st
Cpu5 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6 :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
If you have IRQs bound to CPUs 1-4, and have four netperfs running, given that the stack ostensibly tries to have applications run on the same CPUs, what is running on CPU0? Is it related to:
The 2-interface test that I showed in my first email was run on two different NICs. Also, I am running netperf with the following command: netperf -H hostname -T 0,8, while netserver is running without any arguments at all. Also, running vmstat in parallel shows that there is no bottleneck in the CPU. Take a look:
Unless you have a morbid curiosity :) there isn't much point in binding all the netperfs to CPU 0 when the interrupts for the NICs servicing their connections are on CPUs 1-4.
I also assume then that the system(s) on which netserver is running have 8 CPUs in them? (There are multiple destination systems, yes?)
Does anything change if you explicitly bind each netperf to the CPU on which the interrupts for its connection are processed? Or, for that matter, if you remove the -T option entirely?
Does UDP_STREAM show different performance than TCP_STREAM? (I'm ass-u-me-ing based on the above that we are looking at the netperf side of a TCP_STREAM test; please correct if otherwise.)
Are the CPUs above single-core CPUs or multi-core CPUs, and if multi-core, are caches shared? How are CPUs numbered if multi-core on that system? Is there any hardware threading involved? I'm wondering if there may be some wrinkles in the system that might lead to reported CPU utilization being low even if a chip is otherwise saturated. Might need some HW counters to check that...
Can you describe the I/O subsystem more completely? I understand that you are using at most two ports of a pair of quad-port cards at any one time, but am still curious to know if those two cards are on separate busses, or if they share any bus/link on the way to memory.
rick jones
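The 100/8 = 12.5 point quoted above can be checked mechanically: with 8 CPUs, vmstat's system-wide percentages dilute any single CPU by a factor of 8, so a total of 16% busy is already more than one CPU's worth of work. A small sketch of that arithmetic, with function names invented here for illustration:

```c
#include <assert.h>

/* Share of the system-wide total that one fully busy CPU contributes,
 * in percent. */
double one_cpu_share_pct(int ncpus)
{
	assert(ncpus > 0);
	return 100.0 / ncpus;
}

/* Convert a system-wide busy percentage into "CPUs' worth" of work. */
double busy_cpus(double total_busy_pct, int ncpus)
{
	assert(ncpus > 0);
	return total_busy_pct * ncpus / 100.0;
}
```

With sy around 16 on an 8-CPU box, busy_cpus(16.0, 8) comes to about 1.28 CPUs' worth. That is roughly in line with the top output, where the softirq load is spread over CPU1 through CPU4 at about a quarter each, so the aggregate number alone does not say whether any single CPU is pinned.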
Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module
Stephen Hemminger wrote: On Wed, 9 Jan 2008 16:03:00 -0800 Andrew Morton [EMAIL PROTECTED] wrote: (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 9 Jan 2008 13:05:34 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9721 Summary: wake on lan fails with sky2 module Product: ACPI Version: 2.5 KernelVersion: 2.6.24-rc7 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Power-Sleep-Wake AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] This post-2.6.23 regression was assigned to ACPI but is quite possibly a net driver problem? Latest working kernel version: 2.6.23.12 Earliest failing kernel version: 2.6.24-rc6 (not tested earlier kernel, 2.6.24-rc7 still failing) Distribution: Ubuntu 8.04 (but Kernel build from Kernel.org and system modifiet to make wake on lan work, i.e. network cards are not shutted down on poweroff) Hardware Environment: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20) onboard Asus P5W DH motherboard, uses module SKY2 Software Environment: Problem Description: When enabling wake on lan with: 'ethtool -s eth0 wol' i get the following status: 21:56:29 ~ # sudo ethtool eth0 Settings for eth0: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Advertised auto-negotiation: Yes Speed: 100Mb/s Duplex: Full Port: Twisted Pair PHYAD: 0 Transceiver: internal Auto-negotiation: on Supports Wake-on: pg Wake-on: g wol enabled Current message level: 0x00ff (255) Link detected: yes but after shutting down the pc doesn't wake up when magic packet is sent. the status lights of the network card are still on (so the card seems to be online). 
same system with only changed kernel to 2.6.23.12 and same procedure like above: wake on lan works. Steps to reproduce: enable wol on your network card using SKY2 module and it doesn't work too? if you need more information, just tell me, it's my first bug report. regards Wake from power off works on 2.6.24-rc7 for me. Wake from suspend doesn't because Network Manager, HAL, or some other user space tool gets confused. I just rechecked it with Fujitsu Lifebook, which has sky2 (88E8055). There many variations of this chip, and it maybe chip specific problem or ACPI/BIOS issues. If you don't enable Wake on Lan in BIOS, the driver can't do it for you. Also, check how you are shutting down. Also since the device has to restart the PHY, it could be a switch issue if you have some fancy pants switch doing intrusion detection or something, but I doubt that. Is it a clean or fast shutdown, most distributions mark network devices as down on shutdown, but if the distribution does something stupid like remove the driver module, then the driver is unable to setup Wake On Lan. The wake on lan setup is done in one place in the driver, add a printk to see if it is ever called. I tried ACPI wakeup with /proc/acpi/alarm (like i described in my last mail) and it worked... so ACPI wakeup seems to work. i'll try to do the printk-thing when i find some time to mess around with the sources (maybe tomorrow). if someone has some brief instructions (maybe a link to a helpfull site for kernel debugging) for me i would be thankfull and could provide some more info faster. some steps for me to identify the source of the problem (is it really sky2?) would be really helpfull... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [ROSE] two extra tab characters removed
Signed-off-by: Bernard Pidoux [EMAIL PROTECTED] --- include/net/rose.h |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/net/rose.h b/include/net/rose.h index d3ab453..0cfdc0e 100644 --- a/include/net/rose.h +++ b/include/net/rose.h @@ -86,7 +86,7 @@ struct rose_neigh { ax25_addresscallsign; ax25_digi *digipeat; ax25_cb *ax25; - struct net_device *dev; + struct net_device *dev; unsigned short count; unsigned short use; unsigned intnumber; @@ -124,7 +124,7 @@ struct rose_sock { ax25_addresssource_digis[ROSE_MAX_DIGIS]; ax25_addressdest_digis[ROSE_MAX_DIGIS]; struct rose_neigh *neighbour; - struct net_device *device; + struct net_device *device; unsigned intlci, rand; unsigned char state, condition, qbitincl, defer; unsigned char cause, diagnostic; -- 1.5.3.7 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module
Rafael J. Wysocki wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9721 On Friday, 11 of January 2008, supersud501 wrote: Stephen Hemminger wrote: On Wed, 9 Jan 2008 16:03:00 -0800 Andrew Morton [EMAIL PROTECTED] wrote: (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 9 Jan 2008 13:05:34 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9721 Summary: wake on lan fails with sky2 module Product: ACPI Version: 2.5 KernelVersion: 2.6.24-rc7 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Power-Sleep-Wake AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] This post-2.6.23 regression was assigned to ACPI but is quite possibly a net driver problem? Latest working kernel version: 2.6.23.12 Earliest failing kernel version: 2.6.24-rc6 (not tested earlier kernel, 2.6.24-rc7 still failing) Distribution: Ubuntu 8.04 (but Kernel build from Kernel.org and system modifiet to make wake on lan work, i.e. network cards are not shutted down on poweroff) Hardware Environment: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20) onboard Asus P5W DH motherboard, uses module SKY2 Software Environment: Problem Description: When enabling wake on lan with: 'ethtool -s eth0 wol' i get the following status: 21:56:29 ~ # sudo ethtool eth0 Settings for eth0: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Advertised auto-negotiation: Yes Speed: 100Mb/s Duplex: Full Port: Twisted Pair PHYAD: 0 Transceiver: internal Auto-negotiation: on Supports Wake-on: pg Wake-on: g wol enabled Current message level: 0x00ff (255) Link detected: yes but after shutting down the pc doesn't wake up when magic packet is sent. 
the status lights of the network card are still on (so the card seems to be online). same system with only changed kernel to 2.6.23.12 and same procedure like above: wake on lan works. Steps to reproduce: enable wol on your network card using SKY2 module and it doesn't work too? if you need more information, just tell me, it's my first bug report. regards Wake from power off works on 2.6.24-rc7 for me. Wake from suspend doesn't because Network Manager, HAL, or some other user space tool gets confused. I just rechecked it with Fujitsu Lifebook, which has sky2 (88E8055). There many variations of this chip, and it maybe chip specific problem or ACPI/BIOS issues. If you don't enable Wake on Lan in BIOS, the driver can't do it for you. Also, check how you are shutting down. Also since the device has to restart the PHY, it could be a switch issue if you have some fancy pants switch doing intrusion detection or something, but I doubt that. Is it a clean or fast shutdown, most distributions mark network devices as down on shutdown, but if the distribution does something stupid like remove the driver module, then the driver is unable to setup Wake On Lan. The wake on lan setup is done in one place in the driver, add a printk to see if it is ever called. I tried ACPI wakeup with /proc/acpi/alarm (like i described in my last mail) and it worked... so ACPI wakeup seems to work. i'll try to do the printk-thing when i find some time to mess around with the sources (maybe tomorrow). if someone has some brief instructions (maybe a link to a helpfull site for kernel debugging) for me i would be thankfull and could provide some more info faster. some steps for me to identify the source of the problem (is it really sky2?) would be really helpfull... Please do the tests requested at: http://bugzilla.kernel.org/show_bug.cgi?id=9721#c2, thanks. 
All right, I didn't see that before, sorry. Here are the results:

kernel 2.6.23.12, acpi=off: when shutting down, the system doesn't power off (of course), but pressing the power button does the trick. Wake on LAN: WORKS

kernel 2.6.24-rc7, acpi=off: the computer doesn't power off either (so acpi=off works), but WoL still DOESN'T work :(

So no ACPI problem?
Re: handing cloned frames to netif_rx()?
On Jan 11, 2008 3:17 AM, Johannes Berg [EMAIL PROTECTED] wrote:
> In 802.11n, there is a case where multiple data frames are received aggregated into a single frame (A-MSDU). Currently, we copy each of these frames out into their own skb, but because of the alignment with that etc. I started to think that we could simply pass up a clone of the original skb with start/length adjusted properly so that it windows only the contained packet.
>
> The buffer would be shared but the data within the original window (starting with the 802.3 header) could even be written to, it won't be needed again by mac80211 once it's handed off to netif_rx(). The skb will obviously have lots of head- and tailroom but that space would be part of other packets.
>
> Is it ok to do this? Will something freak out if we pass a cloned skb to netif_rx()?

This would be great even in the regular case. The 4965 has the ability to deliver more frames per receive buffer. Because of A-MSDU we keep 8K receive buffers, which are underutilized when A-MSDU is not used.

> johannes
[oops] 2.6.23.9
Linux version 2.6.23.9 vanilla Jan 7 02:24:11 eunite BUG: scheduling while atomic: mii-tool/0x0002/31658 Jan 7 02:24:11 eunite [c02fd920] schedule+0x27a/0x35e Jan 7 02:24:11 eunite [c011dbd7] __mod_timer+0xac/0xbc Jan 7 02:24:11 eunite [c02fe2bb] schedule_timeout+0x43/0x9f Jan 7 02:24:11 eunite [c011dd92] process_timeout+0x0/0x5 Jan 7 02:24:11 eunite [c011dd5e] msleep+0xf/0x14 Jan 7 02:24:11 eunite [c02242a9] e1000_reset_hw+0x57/0x353 Jan 7 02:24:11 eunite [c0219cca] e1000_reset+0xa6/0x2ed Jan 7 02:24:11 eunite [c0219f8d] e1000_down+0x7c/0xb1 Jan 7 02:24:11 eunite [c021a253] e1000_reinit_locked+0x37/0x76 Jan 7 02:24:11 eunite [c021a3be] e1000_ioctl+0x12c/0x280 Jan 7 02:24:11 eunite [c021a292] e1000_ioctl+0x0/0x280 Jan 7 02:24:11 eunite [c029e861] dev_ifsioc+0x2dc/0x30d Jan 7 02:24:11 eunite [c029f408] dev_ioctl+0x1f8/0x32f Jan 7 02:24:11 eunite [c02948a5] sock_ioctl+0x41/0x15f Jan 7 02:24:11 eunite [c0294864] sock_ioctl+0x0/0x15f Jan 7 02:24:11 eunite [c015d6cf] do_ioctl+0x1f/0x6d Jan 7 02:24:11 eunite [c015d76d] vfs_ioctl+0x50/0x26e Jan 7 02:24:11 eunite [c01f3e9a] tty_write+0x0/0x1b2 Jan 7 02:24:11 eunite [c015d9bf] sys_ioctl+0x34/0x51 Jan 7 02:24:11 eunite [c010268e] sysenter_past_esp+0x5f/0x85 Jan 7 02:24:11 eunite [c02f] fn_trie_insert+0x525/0x7f6 Jan 7 02:24:11 eunite === Jan 7 02:24:55 eunite BUG: scheduling while atomic: mii-tool/0x0002/31668 Jan 7 02:24:55 eunite [c02fd920] schedule+0x27a/0x35e Jan 7 02:24:55 eunite [c02fe2bb] schedule_timeout+0x43/0x9f Jan 7 02:24:55 eunite [c01bc290] __delay+0x6/0x7 Jan 7 02:24:55 eunite [c021fa67] e1000_write_phy_reg_ex+0x45/0x8f Jan 7 02:24:55 eunite [c011dd92] process_timeout+0x0/0x5 Jan 7 02:24:55 eunite [c011dd5e] msleep+0xf/0x14 Jan 7 02:24:55 eunite [c0220ade] e1000_phy_init_script+0x96/0x206 Jan 7 02:24:55 eunite [c0222097] e1000_phy_reset+0x57/0xa2 Jan 7 02:24:55 eunite [c0222348] e1000_setup_copper_link+0x266/0x12bc Jan 7 02:24:55 eunite [c0220f89] e1000_read_eeprom+0x8c/0x2f2 Jan 7 02:24:55 eunite [c0223f97] 
e1000_setup_link+0x37c/0x4e5 Jan 7 02:24:55 eunite [c022489f] e1000_init_hw+0x2fa/0xb68 Jan 7 02:24:55 eunite [c0118911] do_wait+0x7d4/0xc46 Jan 7 02:24:55 eunite [c022452a] e1000_reset_hw+0x2d8/0x353 Jan 7 02:24:55 eunite [c0219cea] e1000_reset+0xc6/0x2ed Jan 7 02:24:55 eunite [c0219f8d] e1000_down+0x7c/0xb1 Jan 7 02:24:55 eunite [c021a253] e1000_reinit_locked+0x37/0x76 Jan 7 02:24:55 eunite [c021a3be] e1000_ioctl+0x12c/0x280 Jan 7 02:24:55 eunite [c021a292] e1000_ioctl+0x0/0x280 Jan 7 02:24:55 eunite [c029e861] dev_ifsioc+0x2dc/0x30d Jan 7 02:24:55 eunite [c029f408] dev_ioctl+0x1f8/0x32f Jan 7 02:24:55 eunite [c02948a5] sock_ioctl+0x41/0x15f Jan 7 02:24:55 eunite [c0294864] sock_ioctl+0x0/0x15f Jan 7 02:24:55 eunite [c015d6cf] do_ioctl+0x1f/0x6d Jan 7 02:24:55 eunite [c015d76d] vfs_ioctl+0x50/0x26e Jan 7 02:24:55 eunite [c01f3e9a] tty_write+0x0/0x1b2 Jan 7 02:24:55 eunite [c015d9bf] sys_ioctl+0x34/0x51 Jan 7 02:24:55 eunite [c010268e] sysenter_past_esp+0x5f/0x85 an 7 02:24:55 eunite BUG: scheduling while atomic: mii-tool/0x0002/31668 Jan 7 02:24:55 eunite [c02fd920] schedule+0x27a/0x35e Jan 7 02:24:55 eunite [c02fe2bb] schedule_timeout+0x43/0x9f Jan 7 02:24:55 eunite [c011dd92] process_timeout+0x0/0x5 Jan 7 02:24:55 eunite [c011dd5e] msleep+0xf/0x14 Jan 7 02:24:55 eunite [c022235c] e1000_setup_copper_link+0x27a/0x12bc Jan 7 02:24:55 eunite [c0220f89] e1000_read_eeprom+0x8c/0x2f2 Jan 7 02:24:55 eunite [c0223f97] e1000_setup_link+0x37c/0x4e5 Jan 7 02:24:55 eunite [c022489f] e1000_init_hw+0x2fa/0xb68 Jan 7 02:24:55 eunite [c0118911] do_wait+0x7d4/0xc46 Jan 7 02:24:55 eunite [c022452a] e1000_reset_hw+0x2d8/0x353 Jan 7 02:24:55 eunite [c0219cea] e1000_reset+0xc6/0x2ed Jan 7 02:24:55 eunite [c0219f8d] e1000_down+0x7c/0xb1 Jan 7 02:24:55 eunite [c021a253] e1000_reinit_locked+0x37/0x76 Jan 7 02:24:55 eunite [c021a3be] e1000_ioctl+0x12c/0x280 Jan 7 02:24:55 eunite [c021a292] e1000_ioctl+0x0/0x280 Jan 7 02:24:55 eunite [c029e861] dev_ifsioc+0x2dc/0x30d Jan 7 02:24:55 
eunite [c029f408] dev_ioctl+0x1f8/0x32f Jan 7 02:24:55 eunite [c02948a5] sock_ioctl+0x41/0x15f Jan 7 02:24:55 eunite [c0294864] sock_ioctl+0x0/0x15f Jan 7 02:24:55 eunite [c015d6cf] do_ioctl+0x1f/0x6d Jan 7 02:24:55 eunite [c015d76d] vfs_ioctl+0x50/0x26e Jan 7 02:24:55 eunite [c01f3e9a] tty_write+0x0/0x1b2 Jan 7 02:24:55 eunite [c015d9bf] sys_ioctl+0x34/0x51 Jan 7 02:24:55 eunite [c010268e] sysenter_past_esp+0x5f/0x85 Jan 7 02:24:55 eunite === -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: handing cloned frames to netif_rx()?
Johannes Berg [EMAIL PROTECTED] wrote:
> Is it ok to do this? Will something freak out if we pass a cloned skb to netif_rx()?

Sounds OK as long as you stick to the rules of cloned skb's, e.g., not writing to them unless you've copied them.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
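The rule stated above (never write to a cloned buffer unless you have copied it first) can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: it does not use the kernel's sk_buff API, and `struct buf`, `struct pkt`, `pkt_init()`, `pkt_clone()`, `pkt_make_writable()` and `pkt_free()` are hypothetical names invented here to model a reference-counted, shared data area.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical types, not the kernel's sk_buff API. */
struct buf {
	int refs;               /* number of packets sharing this data */
	size_t len;
	unsigned char data[64];
};

struct pkt {
	struct buf *b;          /* data area, possibly shared with clones */
};

/* Create a packet that owns a fresh, zero-filled data area. */
static int pkt_init(struct pkt *p, size_t len)
{
	if (len > sizeof(p->b->data))
		return -1;
	p->b = calloc(1, sizeof(*p->b));
	if (!p->b)
		return -1;
	p->b->refs = 1;
	p->b->len = len;
	return 0;
}

/* Clone: the new packet shares the data area; nothing is copied. */
static void pkt_clone(struct pkt *dst, const struct pkt *src)
{
	assert(src->b != NULL);
	dst->b = src->b;
	dst->b->refs++;
}

/* Before writing, take a private copy if the data area is shared. */
static int pkt_make_writable(struct pkt *p)
{
	struct buf *copy;

	if (p->b->refs == 1)
		return 0;       /* sole owner, safe to write in place */
	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;
	memcpy(copy, p->b, sizeof(*copy));
	copy->refs = 1;
	p->b->refs--;
	p->b = copy;
	return 0;
}

/* Drop this packet's reference; free the data on the last one. */
static void pkt_free(struct pkt *p)
{
	if (--p->b->refs == 0)
		free(p->b);
	p->b = NULL;
}
```

In this model a writer calls `pkt_make_writable()` first, so other holders of the same data area never see the modification.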
Re: handing cloned frames to netif_rx()?
On Sat, 2008-01-12 at 09:31 +1100, Herbert Xu wrote:
> Johannes Berg [EMAIL PROTECTED] wrote:
> > Is it ok to do this? Will something freak out if we pass a cloned skb to netif_rx()?
>
> Sounds OK as long as you stick to the rules of cloned skb's, e.g., not writing to them unless you've copied it.

Ok. Yes, we will of course adhere to that, but I was wondering whether maybe the net stack assumes somewhere that a packet it got from the driver can be written to w/o copying.

johannes
Re: handing cloned frames to netif_rx()?
On Fri, Jan 11, 2008 at 11:58:05PM +0100, Johannes Berg wrote:
> Ok. Yes, we will of course adhere to that, but I was wondering whether maybe the net stack assumes somewhere that a packet it got from the driver can be written to w/o copying.

All parts of the rx stack support clone handling because they can always run after another handler (e.g., AF_PACKET) which may have cloned the packet.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: rp_filter and ip rule break ipsec policy
Marco Berizzi [EMAIL PROTECTED] wrote:
> When I insert the rule number #601 packets to x.y.z.214 aren't eaten by xfrm anymore. This happens when rp_filter is set to 1 on eth0. Disabling rp_filter on eth0 resolves the problem: xfrm eats the packets. Is this the expected behaviour? Why should

Absolutely. On local output, the IPsec lookup does override the routing lookup (although there we do the route lookup first and use that as the key for the IPsec lookup). On forwarding this is not the case. We decapsulate and check policy first (if encrypted), and then do a route lookup, at which point rp_filter can eat your packet, and only after that do we perform the output IPsec lookup.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
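The forwarding order described above can be written down as a tiny decision function. This is a toy model of the explanation only, not kernel code: `enum verdict` and `forward_packet()` are hypothetical names, and the three boolean inputs stand in for the real policy check, the reverse-path filter and the routing result.

```c
#include <stdbool.h>

/* Toy model: where a forwarded packet ends up, in the order described. */
enum verdict {
	DROP_INBOUND_POLICY,    /* failed the policy check after decapsulation */
	DROP_RP_FILTER,         /* eaten by rp_filter at route lookup */
	REACHED_OUTPUT_XFRM     /* got as far as the output IPsec lookup */
};

static enum verdict forward_packet(bool encrypted, bool inbound_policy_ok,
				   bool rp_filter_ok)
{
	/* 1. Decapsulate and check policy first (only if encrypted). */
	if (encrypted && !inbound_policy_ok)
		return DROP_INBOUND_POLICY;

	/* 2. Route lookup; this is where rp_filter can drop the packet. */
	if (!rp_filter_ok)
		return DROP_RP_FILTER;

	/* 3. Only after routing is the output IPsec lookup performed. */
	return REACHED_OUTPUT_XFRM;
}
```

The point of the model is that a packet failing step 2 never reaches step 3, which matches the observation that xfrm stops seeing the packets once rp_filter rejects them.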
Re: questions on NAPI processing latency and dropped network packets
Chris Friesen [EMAIL PROTECTED] wrote:
> I'd love to work on newer kernels, but we have a commitment to our customers to support multiple releases for a significant amount of time.

Since you've made the commitment, you should stick to it and resolve the issues without asking us to contribute. After all we haven't made that commitment to you or your customers.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: handing cloned frames to netif_rx()?
On Sat, 2008-01-12 at 10:01 +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 11:58:05PM +0100, Johannes Berg wrote:
> > Ok. Yes, we will of course adhere to that, but I was wondering whether maybe the net stack assumes somewhere that a packet it got from the driver can be written to w/o copying.
>
> All parts of the rx stack support clone handling because they can always run after another handler (e.g., AF_PACKET) which may have cloned the packet.

Great, thanks for confirming, we'll do that then.

johannes
Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module
On Friday, 11 of January 2008, supersud501 wrote:
> Rafael J. Wysocki wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=9721
>
> allright, didn't see that before, sorry, here are the results:
>
> kernel 2.6.23.12 acpi=off: when shutting down the system doesn't poweroff (of course), but pressing the powerbutton does the trick. and wake on lan: WORKS
>
> kernel 2.6.24-rc7 acpi=off: computer doesn't power off, either (so acpi=off works), but wol still DOESN'T work :(
>
> so no acpi-problem?

No, I don't think it's an ACPI problem.

Since it seems to be 100% reproducible, it would be very helpful if you could use git-bisect to identify the offending commit.
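The idea behind bisecting between a known-good and a known-bad kernel can be sketched as a binary search. This is a simplified model, assuming a linear history numbered 0..N; real git-bisect works on the commit graph and needs a build and test at each step. `first_bad_commit()` and `example_is_bad()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test callback: true if the kernel built from commit
 * number idx shows the bug. */
typedef bool (*is_bad_fn)(int idx);

/* Given a linear history where commit `good` works and commit `bad`
 * fails (good < bad), return the index of the first failing commit.
 * Each iteration halves the range that still has to be tested. */
static int first_bad_commit(int good, int bad, is_bad_fn is_bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;      /* regression is at or before mid */
		else
			good = mid;     /* regression is after mid */
	}
	return bad;
}

/* Example predicate for illustration: pretend commit 37 broke things. */
static bool example_is_bad(int idx)
{
	return idx >= 37;
}
```

With 100 commits between the good and the bad kernel, this needs about seven test builds rather than a hundred, which is why bisection is practical for a fully reproducible regression.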
Re: e1000 performance issue in 4 simultaneous links
From: Benny Amorsen [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 12:09:32 +0100

> David Miller [EMAIL PROTECTED] writes:
> > No IRQ balancing should be done at all for networking device interrupts, with zero exceptions. It destroys performance.
>
> Does irqbalanced need to be taught about this?

The userland one already does. It's only the in-kernel IRQ load balancing for these (presumably powerpc) platforms that is broken.
Re: iproute2: removing primary address removes secondaries
echo 1 > /proc/sys/net/ipv4/conf/all/promote_secondaries
Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 21:41:20 +0900 (JST)

> There is no positive consensus on this draft at the intarea meeting in Vancouver, right? We cannot / should not enable that space until we have reached a consensus on it.

This is so incredibly incorrect.

There is consensus on making network stacks able to use this address space. And that is all that the patch does.

The consensus is only missing on whether to make the address space public or private. This is also clearly spelled out in the draft.

It is important to get as large of a head start on this as possible because of how long it takes to deploy something like this.
Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
From: Vince Fuller [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 09:29:15 -0800

> I leave it up to you, the developers, to decide if you want to use these patches.

Vince, please just ignore these turkeys who are dismissing your patch and respin it against current sources as I asked of you.

I'll apply it, immediately, because it is the only correct course of action.

Thanks a lot.
Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 08:59:26 -0600

> I'd love to work on newer kernels, but we have a commitment to our customers to support multiple releases for a significant amount of time.

And by asking here for people to dig into it for you, you are asking people for free help providing that support.

That's why there is such negative backlash to asking questions about such an ancient kernel here: you're asking us to do your work, for free.
Re: [PATCH 2.6.23+] ingress classify to [nf]mark
On Fri, 2008-11-01 at 18:42 -0200, Dzianis Kahanovich wrote: About script example: While I compose filter, I check flag ($TC_INDEX2MARK), tells me are patch applied or no. If no - I use usual -j MARK --set-mark, else I use classid to change mark. All in ingress only. For example: tc filter add dev eth0 parent : protocol ip u32 ... action ipt -j MARK 0x10 are cname to: tc filter add dev eth0 parent : protocol ip u32 ... flowid :10 I thought you were doing something like this (to achieve your policy): -- major=1 minor=12 mark=`expr $major + $minor` # tc qdisc add dev XXX ingress tc filter add dev XXX parent : protocol ip prio 5 \ u32 blah bleh \ flowid $major:$minor action \ ipt -j mark --set-mark $mark --- - it use less code/modules and, in many cases, may be single/main goal to ingress usage - pre-marking packets. That is true and you would also have one less line in your policy; as an example in above the line ipt -j mark --set-mark $mark would be unnecessary; however, all the other lines in the policy setting _will be necessary_. And this + the fact there are many other values/shapes the default policy could take is essentially whats bothering me. In any case, scanning the current code it seems mark is no longer considered a netfilter-only metadatum - so it may not be semantically as obscene as i felt earlier; Can you pick something simpler for policy? example set the mark to whatever tc_index gets set? If you still could write the metadata action, we could use it to override mark, tc_index etc in addition. cheers, jamal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
In article [EMAIL PROTECTED] (at Fri, 11 Jan 2008 17:48:57 -0800 (PST)), David Miller [EMAIL PROTECTED] says:

> From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
> Date: Fri, 11 Jan 2008 21:41:20 +0900 (JST)
>
> > There is no positive consensus on this draft at the intarea meeting in Vancouver, right? We cannot / should not enable that space until we have reached a consensus on it.
>
> This is so incredibly incorrect. There is consensus on making network stacks able to use this address space. And that is all that the patch does.

No, we never reached consensus on it.

> The consensus is only missing on whether to make the address space public or private. This is also clearly spelled out in the draft. It is important to get as large of a head start on this as possible because of how long it takes to deploy something like this.

Okay, though I am afraid this space will not be used widely, we should be ready for it.

I'll make some more comments on the patch itself from another point of view.

--yoshfuji
Re: e1000 performance issue in 4 simultaneous links
Sorry to interfere in this subject.

Do you recommend enabling CONFIG_IRQBALANCE? If it is enabled, IRQs do not jump nonstop across processors; softirqd changes this behavior. If it is disabled, IRQs are distributed over every processor, and on loaded systems that seems harmful. Yesterday I worked a little with a server with CONFIG_IRQBALANCE=no under a 160 kpps load. It was losing packets until I set smp_affinity.

Maybe it would be useful to put more info in Kconfig, since this option is very important for performance.

On Fri, 11 Jan 2008 17:41:09 -0800 (PST), David Miller wrote
> From: Benny Amorsen [EMAIL PROTECTED]
> Date: Fri, 11 Jan 2008 12:09:32 +0100
>
> > David Miller [EMAIL PROTECTED] writes:
> > > No IRQ balancing should be done at all for networking device interrupts, with zero exceptions. It destroys performance.
> >
> > Does irqbalanced need to be taught about this?
>
> The userland one already does. It's only the in-kernel IRQ load balancing for these (presumably powerpc) platforms that is broken.

--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.
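For readers who have not set smp_affinity before: the file /proc/irq/<N>/smp_affinity takes a hexadecimal bitmask of the CPUs allowed to handle IRQ <N>. A minimal sketch of building the mask for one CPU follows; `affinity_mask_for_cpu()` is a hypothetical helper, and it deliberately handles only CPUs 0 to 31, since larger systems use a longer mask format.

```c
#include <stdio.h>

/* Format the hexadecimal smp_affinity mask that selects a single CPU.
 * Sketch only: CPUs 0..31 are supported.
 * Returns 0 on success, -1 on a bad CPU number or a too-small buffer. */
static int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
	int n;

	if (cpu >= 32)
		return -1;
	n = snprintf(buf, len, "%x", 1u << cpu);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The resulting string (for example "8" for CPU 3) is what would be written, as root, to the smp_affinity file of the NIC's IRQ; the IRQ number itself can be read from /proc/interrupts.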
Re: questions on NAPI processing latency and dropped network packets
On Jan 10, 2008 9:24 AM, Chris Friesen [EMAIL PROTECTED] wrote:
> After a recent userspace app change, we've started seeing packets being dropped by the ethernet hardware (e1000, NAPI is enabled). The error/dropped/fifo counts are going up in ethtool:

(These are perhaps too obvious, but I didn't see the questions or answers in the thread.)

Can you reproduce it with a simple userspace cpu hog? (Two, really, one per cpu.)
Can you reproduce it with the newer e1000?
Can you reproduce it with git head?

If the answer to the first one is yes, the last no, then bisect until you get a kernel that doesn't show the problem. Backport the fix, unless the fix happens to be CFS.

However, I suspect that your userspace app is just starving the system from time to time.
Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
Hello.

In article [EMAIL PROTECTED] (at Mon, 7 Jan 2008 17:10:57 -0800), Vince Fuller [EMAIL PROTECTED] says:

>  #define IN_MULTICAST_NET 0xF0000000
>
> +#define IN_CLASSE(a) ((((long int) (a)) & 0xf0000000) == 0xf0000000)
> +#define IN_CLASSE_NET 0xffffff00
> +#define IN_CLASSE_NSHIFT 8
> +#define IN_CLASSE_HOST (0xffffffff & ~IN_CLASSE_NET)
> +
> +/*
> + * these are no longer used
>  #define IN_EXPERIMENTAL(a) ((((long int) (a)) & 0xf0000000) == 0xf0000000)
>  #define IN_BADCLASS(a) IN_EXPERIMENTAL((a))
> +*/

Please do not remove this, but have these instead:

#define IN_EXPERIMENTAL(a) IN_CLASSE((a))
#define IN_BADCLASS(a) IN_CLASSE((a))

And, I think it is good to remove BADCLASS() (inside #ifdef __KERNEL__ .. #endif) because we do not have its users any longer, right?

Regards,

--yoshfuji
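As a quick illustration of what a class E test computes: an address is in 240.0.0.0/4 when its top four bits are all set. The sketch below is user-space C with fixed-width types instead of the header's `long int` cast; `in_class_e()` and `ipv4()` are hypothetical helper names and are not part of any kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr (in host byte order) lies in 240.0.0.0/4, i.e. its
 * top four bits are all ones. */
static bool in_class_e(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xf0000000u;
}

/* Build a host-byte-order IPv4 address from its dotted-quad parts. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}
```

Note that the limited broadcast address 255.255.255.255 also satisfies this test, so code that starts accepting 240/4 still has to treat that one address separately.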
Netconf at conf.au 2008?
Hello,

I saw somewhere (maybe in this mailing list a while ago) that there might be a Linux Kernel Developers' Netconf conference at conf.au 2008. Does anyone here know if such a thing is planned?

Regards,
Andy
[PATCH 3/9] move size information to pr_debug()
The size of structures is a debug thing, not something that needs to be part of a /proc api.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c 2008-01-11 22:29:20.0 -0800
+++ b/net/ipv4/fib_trie.c 2008-01-11 22:30:28.0 -0800
@@ -1962,8 +1962,11 @@ struct fib_table *fib_hash_init(u32 id)
 	t = (struct trie *) tb->tb_data;
 	memset(t, 0, sizeof(*t));
 
-	if (id == RT_TABLE_LOCAL)
+	if (id == RT_TABLE_LOCAL) {
 		printk(KERN_INFO "IPv4 FIB: Using LC-trie version %s\n", VERSION);
+		pr_debug("Basic info: size of leaf: %Zd bytes, size of tnode: %Zd bytes.\n",
+			 sizeof(struct leaf), sizeof(struct tnode));
+	}
 
 	return tb;
 }
@@ -2159,9 +2162,6 @@ static int fib_triestat_seq_show(struct
 	if (!stat)
 		return -ENOMEM;
 
-	seq_printf(seq, "Basic info: size of leaf: %Zd bytes, size of tnode: %Zd bytes.\n",
-		   sizeof(struct leaf), sizeof(struct tnode));
-
 	if (trie_local) {
 		seq_printf(seq, "Local:\n");
 		trie_collect_stats(trie_local, stat);

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 6/9] : fib_insert_node cleanup
The only error from fib_insert_node is if memory allocation fails, so instead of passing by reference, just use the convention of returning NULL.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c 2008-01-11 22:04:08.0 -0800
+++ b/net/ipv4/fib_trie.c 2008-01-11 22:04:33.0 -0800
@@ -980,8 +980,7 @@ static struct node *trie_rebalance(struc
 
 /* only used from updater-side */
 
-static struct list_head *
-fib_insert_node(struct trie *t, int *err, u32 key, int plen)
+static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 {
 	int pos, newpos;
 	struct tnode *tp = NULL, *tn = NULL;
@@ -1043,10 +1042,8 @@ fib_insert_node(struct trie *t, int *err
 
 		li = leaf_info_new(plen);
 
-		if (!li) {
-			*err = -ENOMEM;
-			goto done;
-		}
+		if (!li)
+			return NULL;
 
 		fa_head = &li->falh;
 		insert_leaf_info(&l->list, li);
@@ -1054,18 +1051,15 @@ fib_insert_node(struct trie *t, int *err
 	}
 	l = leaf_new();
 
-	if (!l) {
-		*err = -ENOMEM;
-		goto done;
-	}
+	if (!l)
+		return NULL;
 
 	l->key = key;
 	li = leaf_info_new(plen);
 
 	if (!li) {
 		tnode_free((struct tnode *) l);
-		*err = -ENOMEM;
-		goto done;
+		return NULL;
 	}
 
 	fa_head = &li->falh;
@@ -1101,8 +1095,7 @@ fib_insert_node(struct trie *t, int *err
 		if (!tn) {
 			free_leaf_info(li);
 			tnode_free((struct tnode *) l);
-			*err = -ENOMEM;
-			goto done;
+			return NULL;
 		}
 
 		node_set_parent((struct node *)tn, tp);
@@ -1258,10 +1251,11 @@ static int fn_trie_insert(struct fib_tab
 	 */
 
 	if (!fa_head) {
-		err = 0;
-		fa_head = fib_insert_node(t, &err, key, plen);
-		if (err)
+		fa_head = fib_insert_node(t, key, plen);
+		if (unlikely(!fa_head)) {
+			err = -ENOMEM;
 			goto out_free_new_fa;
+		}
 	}
 
 	list_add_tail_rcu(&new_fa->fa_list,

--
Stephen Hemminger [EMAIL PROTECTED]
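The convention change in this patch can be shown outside the kernel. The following is a user-space sketch with made-up names (`struct item`, `item_new_err()`, `item_new()`, `item_insert()`), not the fib_trie code: when allocation failure is the only possible error, returning NULL carries the same information as an `int *err` out-parameter, and the caller maps NULL to -ENOMEM.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical type for illustration only. */
struct item {
	int key;
};

/* Old convention: the error is reported through an out-parameter. */
static struct item *item_new_err(int key, int *err)
{
	struct item *it = malloc(sizeof(*it));

	if (!it) {
		*err = -ENOMEM;
		return NULL;
	}
	it->key = key;
	return it;
}

/* New convention: allocation failure is the only error, so a NULL
 * return value says everything the caller needs to know. */
static struct item *item_new(int key)
{
	struct item *it = malloc(sizeof(*it));

	if (it)
		it->key = key;
	return it;
}

/* The caller turns NULL into -ENOMEM at the point where it needs an
 * error code, much as fn_trie_insert does after the patch. */
static int item_insert(int key, struct item **out)
{
	struct item *it = item_new(key);

	if (!it)
		return -ENOMEM;
	*out = it;
	return 0;
}
```

The second form has one fewer parameter and removes the need to pre-clear `err` before the call.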
[PATCH 0/9] FIB patches for 2.6.25
Did some work cleaning up FIB Trie today. The only real change is the output format for /proc/net/fib_triestat.

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 8/9] add statistics
The FIB TRIE code has a bunch of statistics, but the code is hidden behind an ifdef that was never implemented. Since it was dead code, it was broken as well. This patch fixes that by making it a config option.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/Kconfig 2008-01-11 22:17:11.0 -0800
+++ b/net/ipv4/Kconfig 2008-01-11 22:31:17.0 -0800
@@ -85,6 +85,13 @@ endchoice
 config IP_FIB_HASH
 	def_bool ASK_IP_FIB_HASH || !IP_ADVANCED_ROUTER
 
+config IP_FIB_TRIE_STATS
+	bool "FIB TRIE statistics"
+	depends on IP_FIB_TRIE
+	---help---
+	  Keep track of statistics on structure of FIB TRIE table.
+	  Useful for testing and measuring TRIE performance.
+
 config IP_MULTIPLE_TABLES
 	bool "IP: policy routing"
 	depends on IP_ADVANCED_ROUTER
--- a/net/ipv4/fib_trie.c 2008-01-11 22:31:00.0 -0800
+++ b/net/ipv4/fib_trie.c 2008-01-11 22:31:56.0 -0800
@@ -82,7 +82,6 @@
 #include <net/ip_fib.h>
 #include "fib_lookup.h"
 
-#undef CONFIG_IP_FIB_TRIE_STATS
 #define MAX_STAT_DEPTH 32
 
 #define KEYLENGTH (8*sizeof(t_key))
@@ -2119,20 +2118,22 @@ static void trie_show_stats(struct seq_f
 	bytes += sizeof(struct node *) * pointers;
 	seq_printf(seq, "Null ptrs: %u\n", stat->nullpointers);
 	seq_printf(seq, "Total size: %u kB\n", (bytes + 1023) / 1024);
+}
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-	seq_printf(seq, "Counters:\n---------\n");
-	seq_printf(seq,"gets = %d\n", t->stats.gets);
-	seq_printf(seq,"backtracks = %d\n", t->stats.backtrack);
-	seq_printf(seq,"semantic match passed = %d\n", t->stats.semantic_match_passed);
-	seq_printf(seq,"semantic match miss = %d\n", t->stats.semantic_match_miss);
-	seq_printf(seq,"null node hit= %d\n", t->stats.null_node_hit);
-	seq_printf(seq,"skipped node resize = %d\n", t->stats.resize_node_skipped);
-#ifdef CLEAR_STATS
-	memset(&(t->stats), 0, sizeof(t->stats));
-#endif
-#endif /* CONFIG_IP_FIB_TRIE_STATS */
+static void trie_show_usage(struct seq_file *seq,
+			    const struct trie_use_stats *stats)
+{
+	seq_printf(seq, "\nCounters:\n---------\n");
+	seq_printf(seq,"gets = %u\n", stats->gets);
+	seq_printf(seq,"backtracks = %u\n", stats->backtrack);
+	seq_printf(seq,"semantic match passed = %u\n", stats->semantic_match_passed);
+	seq_printf(seq,"semantic match miss = %u\n", stats->semantic_match_miss);
+	seq_printf(seq,"null node hit= %u\n", stats->null_node_hit);
+	seq_printf(seq,"skipped node resize = %u\n\n", stats->resize_node_skipped);
 }
+#endif /* CONFIG_IP_FIB_TRIE_STATS */
+
 
 static int fib_triestat_seq_show(struct seq_file *seq, void *v)
 {
@@ -2160,12 +2161,18 @@ static int fib_triestat_seq_show(struct
 		seq_printf(seq, "Local:\n");
 		trie_collect_stats(trie_local, stat);
 		trie_show_stats(seq, stat);
+#ifdef CONFIG_IP_FIB_TRIE_STATS
+		trie_show_usage(seq, &trie_local->stats);
+#endif
 	}
 
 	if (trie_main) {
 		seq_printf(seq, "Main:\n");
 		trie_collect_stats(trie_main, stat);
 		trie_show_stats(seq, stat);
+#ifdef CONFIG_IP_FIB_TRIE_STATS
+		trie_show_usage(seq, &trie_main->stats);
+#endif
 	}
 	kfree(stat);
 

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 5/9] use %u for unsigned printfs
Use %u instead of %d when printing unsigned values.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c 2008-01-11 22:30:36.0 -0800
+++ b/net/ipv4/fib_trie.c 2008-01-11 22:30:46.0 -0800
@@ -2100,13 +2100,13 @@ static void trie_show_stats(struct seq_f
 	else
 		avdepth = 0;
 
-	seq_printf(seq, "\tAver depth: %d.%02d\n", avdepth / 100, avdepth % 100 );
+	seq_printf(seq, "\tAver depth: %u.%02d\n", avdepth / 100, avdepth % 100 );
 	seq_printf(seq, "\tMax depth: %u\n", stat->maxdepth);
 
 	seq_printf(seq, "\tLeaves: %u\n", stat->leaves);
 
 	bytes = sizeof(struct leaf) * stat->leaves;
-	seq_printf(seq, "\tInternal nodes: %d\n\t", stat->tnodes);
+	seq_printf(seq, "\tInternal nodes: %u\n\t", stat->tnodes);
 	bytes += sizeof(struct tnode) * stat->tnodes;
 
 	max = MAX_STAT_DEPTH;
@@ -2116,15 +2116,15 @@ static void trie_show_stats(struct seq_f
 	pointers = 0;
 	for (i = 1; i <= max; i++)
 		if (stat->nodesizes[i] != 0) {
-			seq_printf(seq, " %d: %d", i, stat->nodesizes[i]);
+			seq_printf(seq, " %u: %u", i, stat->nodesizes[i]);
 			pointers += (1<<i) * stat->nodesizes[i];
 		}
 	seq_putc(seq, '\n');
-	seq_printf(seq, "\tPointers: %d\n", pointers);
+	seq_printf(seq, "\tPointers: %u\n", pointers);
 
 	bytes += sizeof(struct node *) * pointers;
-	seq_printf(seq, "Null ptrs: %d\n", stat->nullpointers);
-	seq_printf(seq, "Total size: %d kB\n", (bytes + 1023) / 1024);
+	seq_printf(seq, "Null ptrs: %u\n", stat->nullpointers);
+	seq_printf(seq, "Total size: %u kB\n", (bytes + 1023) / 1024);
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	seq_printf(seq, "Counters:\n---------\n");
@@ -2318,7 +2318,7 @@ static inline const char *rtn_type(unsig
 
 	if (t < __RTN_MAX && rtn_type_names[t])
 		return rtn_type_names[t];
-	snprintf(buf, sizeof(buf), "type %d", t);
+	snprintf(buf, sizeof(buf), "type %u", t);
 	return buf;
 }
 

--
Stephen Hemminger [EMAIL PROTECTED]
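Why the conversion specifier matters can be seen in a small user-space sketch. The helper names `format_counter()` and `format_hundredths()` are hypothetical; the second one mirrors the way the average depth is stored scaled by 100 and printed as a fixed-point number. With %d, an unsigned value above INT_MAX is a type mismatch and would typically show up as a negative number.

```c
#include <stdio.h>

/* Format an unsigned counter with the matching %u conversion. */
static int format_counter(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%u", v);
}

/* Format a value that is stored scaled by 100 as "N.NN", in the
 * style of the "Aver depth" line. */
static int format_hundredths(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%u.%02u", v / 100, v % 100);
}
```

Both helpers return what snprintf() returns, so a caller can detect truncation by comparing the result with the buffer size.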
[PATCH 2/9] get rid of unused revision element
The revision element must have been part of an earlier design, because currently it is set but never used.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c 2008-01-11 22:18:34.0 -0800
+++ b/net/ipv4/fib_trie.c 2008-01-11 22:26:34.0 -0800
@@ -153,7 +153,6 @@ struct trie {
 	struct trie_use_stats stats;
 #endif
 	int size;
-	unsigned int revision;
 };
 
 static void put_child(struct trie *t, struct tnode *tn, int i, struct node *n);
@@ -1046,7 +1045,7 @@ fib_insert_node(struct trie *t, int *err
 
 		if (!li) {
 			*err = -ENOMEM;
-			goto err;
+			goto done;
 		}
 
 		fa_head = &li->falh;
@@ -1058,7 +1057,7 @@ fib_insert_node(struct trie *t, int *err
 
 	if (!l) {
 		*err = -ENOMEM;
-		goto err;
+		goto done;
 	}
 
 	l->key = key;
@@ -1067,7 +1066,7 @@ fib_insert_node(struct trie *t, int *err
 	if (!li) {
 		tnode_free((struct tnode *) l);
 		*err = -ENOMEM;
-		goto err;
+		goto done;
 	}
 
 	fa_head = &li->falh;
@@ -1104,7 +1103,7 @@ fib_insert_node(struct trie *t, int *err
 			free_leaf_info(li);
 			tnode_free((struct tnode *) l);
 			*err = -ENOMEM;
-			goto err;
+			goto done;
 		}
 
 		node_set_parent((struct node *)tn, tp);
@@ -1130,8 +1129,6 @@ fib_insert_node(struct trie *t, int *err
 
 	rcu_assign_pointer(t->trie, trie_rebalance(t, tp));
 done:
-	t->revision++;
-err:
 	return fa_head;
 }
 
@@ -1543,7 +1540,6 @@ static int trie_leaf_remove(struct trie
 	 * Remove the leaf and rebalance the tree
 	 */
 
-	t->revision++;
 	t->size--;
 
 	tp = node_parent(n);
@@ -1749,8 +1745,6 @@ static int fn_trie_flush(struct fib_tabl
 	struct leaf *ll = NULL, *l = NULL;
 	int found = 0, h;
 
-	t->revision++;
-
 	for (h = 0; (l = nextleaf(t, l)) != NULL; h++) {
 		found += trie_flush_leaf(t, l);
 

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 4/9] statistics improvements
Turn the unused size field into a useful counter for the number of routes.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c	2008-01-11 22:30:28.0 -0800
+++ b/net/ipv4/fib_trie.c	2008-01-11 22:30:36.0 -0800
@@ -149,10 +149,10 @@ struct trie_stat {
 
 struct trie {
 	struct node *trie;
+	unsigned int size;
 #ifdef CONFIG_IP_FIB_TRIE_STATS
 	struct trie_use_stats stats;
 #endif
-	int size;
 };
 
 static void put_child(struct trie *t, struct tnode *tn, int i, struct node *n);
@@ -1052,7 +1052,6 @@ fib_insert_node(struct trie *t, int *err
 		insert_leaf_info(&l->list, li);
 		goto done;
 	}
-	t->size++;
 	l = leaf_new();
 
 	if (!l) {
@@ -1267,6 +1266,7 @@ static int fn_trie_insert(struct fib_tab
 	list_add_tail_rcu(&new_fa->fa_list,
 			  (fa ? &fa->fa_list : fa_head));
 
+	t->size++;
 	rt_cache_flush(-1);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,
-- 
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 1/9] get rid of trie_init
trie_init is worthless; it is just zeroing stuff that is already zero!
Move the memset() down to make it obvious.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c	2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_trie.c	2008-01-11 22:03:47.0 -0800
@@ -876,19 +876,6 @@ nomem:
 	}
 }
 
-static void trie_init(struct trie *t)
-{
-	if (!t)
-		return;
-
-	t->size = 0;
-	rcu_assign_pointer(t->trie, NULL);
-	t->revision = 0;
-#ifdef CONFIG_IP_FIB_TRIE_STATS
-	memset(&t->stats, 0, sizeof(struct trie_use_stats));
-#endif
-}
-
 /* readside must use rcu_read_lock currently dump routines
  via get_fa_head and dump */
 
@@ -1977,11 +1964,9 @@ struct fib_table *fib_hash_init(u32 id)
 	tb->tb_flush = fn_trie_flush;
 	tb->tb_select_default = fn_trie_select_default;
 	tb->tb_dump = fn_trie_dump;
-	memset(tb->tb_data, 0, sizeof(struct trie));
 
 	t = (struct trie *) tb->tb_data;
-
-	trie_init(t);
+	memset(t, 0, sizeof(*t));
 
 	if (id == RT_TABLE_LOCAL)
 		printk(KERN_INFO "IPv4 FIB: Using LC-trie version %s\n", VERSION);
 
-- 
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 9/9] fix sparse warnings
Make FIB TRIE go through sparse checker without warnings.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c	2008-01-11 22:35:37.0 -0800
+++ b/net/ipv4/fib_trie.c	2008-01-11 22:41:57.0 -0800
@@ -653,7 +653,6 @@ static struct node *resize(struct trie *
 
 static struct tnode *inflate(struct trie *t, struct tnode *tn)
 {
-	struct tnode *inode;
 	struct tnode *oldtnode = tn;
 	int olen = tnode_child_length(tn);
 	int i;
@@ -701,6 +700,7 @@ static struct tnode *inflate(struct trie
 	}
 
 	for (i = 0; i < olen; i++) {
+		struct tnode *inode;
 		struct node *node = tnode_get_child(oldtnode, i);
 		struct tnode *left, *right;
 		int size, j;
@@ -1037,8 +1037,7 @@ static struct list_head *fib_insert_node
 	/* Case 1: n is a leaf. Compare prefixes */
 
 	if (n != NULL && IS_LEAF(n) && tkey_equals(key, n->key)) {
-		struct leaf *l = (struct leaf *) n;
-
+		l = (struct leaf *) n;
 		li = leaf_info_new(plen);
 
 		if (!li)
@@ -2231,6 +2230,7 @@ static struct node *fib_trie_get_idx(str
 }
 
 static void *fib_trie_seq_start(struct seq_file *seq, loff_t *pos)
+	__acquires(RCU)
 {
 	struct fib_trie_iter *iter = seq->private;
 	struct fib_table *tb;
@@ -2273,6 +2273,7 @@ static void *fib_trie_seq_next(struct se
 }
 
 static void fib_trie_seq_stop(struct seq_file *seq, void *v)
+	__releases(RCU)
 {
 	rcu_read_unlock();
 }
 
-- 
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 7/9] printk related cleanups
printk related cleanups:
 * Get rid of unused printk wrappers.
 * Make bug checks into KERN_WARNING because KERN_DEBUG gets ignored
 * Turn one cryptic old message into something real
 * Make sure all messages have KERN_XXX

---
 net/ipv4/fib_frontend.c  |    6 ++
 net/ipv4/fib_hash.c      |    3 ++-
 net/ipv4/fib_semantics.c |    7 +++
 3 files changed, 7 insertions(+), 9 deletions(-)

--- a/net/ipv4/fib_frontend.c	2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_frontend.c	2008-01-11 22:05:09.0 -0800
@@ -47,8 +47,6 @@
 #include <net/ip_fib.h>
 #include <net/rtnetlink.h>
 
-#define FFprint(a...) printk(KERN_DEBUG a)
-
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 
 static int __net_init fib4_rules_init(struct net *net)
@@ -706,7 +704,7 @@ void fib_add_ifaddr(struct in_ifaddr *if
 	if (ifa->ifa_flags&IFA_F_SECONDARY) {
 		prim = inet_ifa_byprefix(in_dev, prefix, mask);
 		if (prim == NULL) {
-			printk(KERN_DEBUG "fib_add_ifaddr: bug: prim == NULL\n");
+			printk(KERN_WARNING "fib_add_ifaddr: bug: prim == NULL\n");
 			return;
 		}
 	}
@@ -753,7 +751,7 @@ static void fib_del_ifaddr(struct in_ifa
 	else {
 		prim = inet_ifa_byprefix(in_dev, any, ifa->ifa_mask);
 		if (prim == NULL) {
-			printk(KERN_DEBUG "fib_del_ifaddr: bug: prim == NULL\n");
+			printk(KERN_WARNING "fib_del_ifaddr: bug: prim == NULL\n");
 			return;
 		}
 	}
--- a/net/ipv4/fib_hash.c	2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_hash.c	2008-01-11 22:04:45.0 -0800
@@ -168,7 +168,8 @@ static void fn_rehash_zone(struct fn_zon
 	new_hashmask = (new_divisor - 1);
 
 #if RT_CACHE_DEBUG >= 2
-	printk("fn_rehash_zone: hash for zone %d grows from %d\n", fz->fz_order, old_divisor);
+	printk(KERN_DEBUG "fn_rehash_zone: hash for zone %d grows from %d\n",
+	       fz->fz_order, old_divisor);
 #endif
 
 	ht = fz_hash_alloc(new_divisor);
--- a/net/ipv4/fib_semantics.c	2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_semantics.c	2008-01-11 22:04:45.0 -0800
@@ -47,8 +47,6 @@
 
 #include "fib_lookup.h"
 
-#define FSprintk(a...)
-
 static DEFINE_SPINLOCK(fib_info_lock);
 static struct hlist_head *fib_info_hash;
 static struct hlist_head *fib_info_laddrhash;
@@ -145,7 +143,7 @@ static const struct
 void free_fib_info(struct fib_info *fi)
 {
 	if (fi->fib_dead == 0) {
-		printk("Freeing alive fib_info %p\n", fi);
+		printk(KERN_WARNING "Freeing alive fib_info %p\n", fi);
 		return;
 	}
 	change_nexthops(fi) {
@@ -914,7 +912,8 @@ int fib_semantic_match(struct list_head
 				continue;
 
 			default:
-				printk(KERN_DEBUG "impossible 102\n");
+				printk(KERN_WARNING "fib_semantic_match bad type %#x\n",
+					fa->fa_type);
 				return -EINVAL;
 			}
 		}
 
-- 
Stephen Hemminger [EMAIL PROTECTED]
Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers
On Friday 11 January 2008, Benjamin Herrenschmidt wrote:
> On Fri, 2008-01-11 at 09:48 -0800, Eugene Surovegin wrote:
> > On Sat, Jan 05, 2008 at 01:38:17PM +0100, Stefan Roese wrote:
> > > On Saturday 05 January 2008, Benjamin Herrenschmidt wrote:
> > > > On Sat, 2008-01-05 at 10:50 +0100, Stefan Roese wrote:
> > > > > Performance tests done by AMCC have shown that 256 buffers
> > > > > increase the performance of the Linux EMAC driver. So let's
> > > > > update the default values to match this setup.
> > > > >
> > > > > Signed-off-by: Stefan Roese [EMAIL PROTECTED]
> > > > > ---
> > > >
> > > > Do we have the numbers? Did they also measure latency?
> > >
> > > I hoped this question would not come. ;) No, unfortunately I don't
> > > have any numbers. Just the recommendation from AMCC to always use
> > > 256 buffers.
> >
> > This cannot be true for all chips. Default numbers I selected weren't
> > random. In particular, 256 for Tx doesn't make a lot of sense for 405.
> > You just gonna waste memory.

This may be the case with the old 405 PPC's. But with the new ones coming
out right now, like the up to 666MHz 405EX with GBit support, 256 could be
an improvement. I still owe you figures though. Will try to do some testing
in a short while.

> I'd be quite reluctant to follow such advice from AMCC without actual
> details. I think we can make defaults based on other config options
> nowadays. Not very nice but we could do things like
>
>	default 128 if PPC_40x
>	default 256
>
> Or even more detailed.

We shouldn't make it too complicated. We can always select different
settings in the defconfig file. My thinking here is that it is better to
waste a little memory for a potential performance improvement.

Just my 0.02$

Best regards,
Stefan
Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers
On Sat, 2008-01-12 at 08:26 +0100, Stefan Roese wrote:
> We shouldn't make it too complicated. We can always select different
> settings in the defconfig file. My thinking here is that it is better to
> waste a little memory for a potential performance improvement.
>
> Just my 0.02$

If it gets really critical, then we can move those settings to the
device-tree.

Ben.