Re: [klibc] [patch] import socket defines

2008-01-11 Thread H. Peter Anvin

Mike Frysinger wrote:
all this stuff is ABI constants, and the only reason glibc 
doesn't use them is that glibc prefers to use enums over #defines.


a proper libc defines things in their headers according to the POSIX specs 
rather than relying on others to do it for them.  if you want to argue about 
linux-specific ABI pieces being exported, then you probably have a valid 
point, but socket.h is hardly that.


Have you looked at it?!!?  It's full of ABI constants, and that's what I 
care about.  POSIX doesn't define, say, AF_UNIX; that's an ABI specific.


so if the only consumer is klibc and you're against adding these things to it, 
special case it for __KLIBC__.


No, let's split the header so that there is *no* libc knowledge in the 
kernel.  For the kernel to have knowledge about the specifics of any 
particular libc (klibc, glibc, or any other) is stupid, and that's the 
whole reason we're in this spot to begin with.


Again, I don't particularly care about what they're named, but the whole 
point is


#include <linux/foo.h>

if you want the subset and

#include <linux/bar.h>

if you want the whole set.

No libc specifics, and no feature test macros, which I think we can both 
agree are uglier than hell.


I thought the naming worked out nicer with linux/sockaddr.h, but I 
*don't really care*.


-hpa
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Jarek Poplawski
On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
 On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
 
  It seems this optimization could've a side effect: if during such a
  loop updates are done, and r is seen !NULL during while() check, but
  NULL after rcu_dereference(), the listing/counting could stop too
  soon. So, IMHO, probably the first version of this patch is more
  reliable. (Or alternatively additional check is needed before return.)
 
 No, while the value of r->u.dst.rt_next can change between two readings,
 the value of r cannot.

...Then, of course, it's O.K.!

It looks like I'm really too lazy and/or these self-documenting features
of RCU are a bit overrated: one can never be sure which pointer is
really RCU protected without checking a few places?! So, after looking
at this rt_cache_get_next() and this patch only, it looks like the
third candidate after seq->private and rtable...

Thanks for explanation and sorry for disturbing!
Jarek P.


HTB classify perfomance

2008-01-11 Thread Badalian Vyacheslav

Hello all.
I have spent N days trying to tune a system for best performance and see a strange thing.

I have N HTB classes.
The root qdisc is HTB with the parameter default 7 (unclassified traffic goes to 1:7).

The filters classify only matched IPs; everything else goes to the HTB default class.

run oprofile:
First pc (htb and iptables compile in kernel):
CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not 
stopped) with a unit mask of 0x01 (mandatory) count 10

samples  %        app name  symbol name
743501   47.6081  vmlinux   htb_classify
208718   13.3647  vmlinux   ipt_do_table
94473    6.0493   vmlinux   u32_classify
43088    2.7590   vmlinux   e1000_intr
35086    2.2466   vmlinux   e1000_clean_tx_irq
34925    2.2363   vmlinux   ip_route_input
33972    2.1753   vmlinux   e1000_irq_enable
33788    2.1635   vmlinux   htb_dequeue
29197    1.8696   vmlinux   e1000_clean_rx_irq
20177    1.2920   vmlinux   sfq_dequeue
17825    1.1414   vmlinux   sfq_enqueue
15135    0.9691   vmlinux   e1000_xmit_frame
15123    0.9684   vmlinux   eth_type_trans
13081    0.8376   vmlinux   kfree
12153    0.7782   vmlinux   dev_queue_xmit

Second PC (htb and iptables is modules)
CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not 
stopped) with a unit mask of 0x01 (mandatory) count 10

samples  %        app name   symbol name
102108   30.7351  sch_htb    (no symbols)
21559    6.4894   vmlinux    e1000_intr
17428    5.2459   cls_u32    (no symbols)
13887    4.1801   ip_tables  (no symbols)
11984    3.6072   sch_sfq    (no symbols)
11785    3.5473   vmlinux    e1000_irq_enable
9684     2.9149   vmlinux    mwait_idle_with_hints
9227     2.7774   vmlinux    e1000_clean_rx_irq
8686     2.6145   vmlinux    e1000_clean_tx_irq
6747     2.0309   vmlinux    ip_route_input
6533     1.9665   vmlinux    irq_entries_start
6419     1.9322   vmlinux    e1000_xmit_frame
5605     1.6871   vmlinux    dev_queue_xmit
4030     1.2131   vmlinux    __kfree_skb
3997     1.2031   vmlinux    __qdisc_run
3931     1.1833   vmlinux    e1000_clean
3565     1.0731   vmlinux    net_rx_action
3518     1.0589   vmlinux    ip_rcv
3377     1.0165   vmlinux    getnstimeofday
3215     0.9677   vmlinux    rb_erase
2973     0.8949   vmlinux    eth_type_trans
2707     0.8148   vmlinux    ip_output
2586     0.7784   vmlinux    handle_fasteoi_irq

Hmm, strange... Looking at the code of htb_classify, I see only one place where it 
could use that much CPU.


OK, I tried adding these lines to the end of the tc batch file:
filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7
filter add dev eth0 protocol ip parent 1:0 prio 5 u32 ht 800:: match ip protocol 1 0x00 flowid 1:7

(Off-topic: strangely, I did not find a way to add a filter without any match.)

Wow!
CPU: P4 / Xeon, speed 3409.94 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not 
stopped) with a unit mask of 0x01 (mandatory) count 10

samples  %        app name  symbol name
153128   20.9497  vmlinux   ipt_unregister_table
121569   16.6321  vmlinux   e1000_request_irq
60727    8.3082   vmlinux   e1000_update_itr
47241    6.4631   vmlinux   u32_delete
25836    3.5347   vmlinux   htb_dequeue
18304    2.5042   vmlinux   ipt_do_table
15980    2.1862   vmlinux   mwait_idle_with_hints
15977    2.1858   vmlinux   irq_entries_start
13337    1.8247   vmlinux   htb_classify
12512    1.7118   vmlinux   __ip_route_output_key
8821     1.2068   vmlinux   sfq_init
8495     1.1622   vmlinux   e1000_clean_rx_irq
8408     1.1503   vmlinux   htb_enqueue
8018     1.0970   vmlinux   e1000_xmit_frame
7867     1.0763   vmlinux   e1000_clean_tx_ring
6336     0.8668   vmlinux   htb_delete
5828     0.7973   vmlinux   ___pskb_trim
5781     0.7909   vmlinux   s_start
5234     0.7161   vmlinux   e1000_clean_rx_irq_ps
4504     0.6162   vmlinux   cache_alloc_refill
4133     0.5654   vmlinux   radix_tree_delete

Second PC
CPU: P4 / Xeon with 2 hyper-threads, speed 3192.35 MHz (estimated)
Counted 

Re: HTB classify perfomance

2008-01-11 Thread Badalian Vyacheslav
New info. I waited some time and reset the oprofile statistics (I think the 
ipt_unregister_table numbers appeared because some script was running...).

Here is clean data after adding the filter:

First PC
CPU: P4 / Xeon, speed 3409.96 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not 
stopped) with a unit mask of 0x01 (mandatory) count 10

samples  %        app name  symbol name
1158171  19.1292  vmlinux   ipt_do_table
722416   11.9319  vmlinux   e1000_intr
627406   10.3627  vmlinux   u32_classify
565286   9.3367   vmlinux   e1000_irq_enable
269309   4.4481   vmlinux   htb_dequeue
191016   3.1550   vmlinux   ip_route_input
187127   3.0907   vmlinux   sfq_dequeue
172775   2.8537   vmlinux   e1000_clean_tx_irq
154654   2.5544   vmlinux   e1000_clean_rx_irq
146926   2.4267   vmlinux   sfq_enqueue
116782   1.9289   vmlinux   htb_add_to_wait_tree
79398    1.3114   vmlinux   rb_erase
74411    1.2290   vmlinux   e1000_xmit_frame
65451    1.0810   vmlinux   kfree
59966    0.9904   vmlinux   irq_entries_start
59893    0.9892   vmlinux   eth_type_trans
55510    0.9168   vmlinux   dev_queue_xmit
52688    0.8702   vmlinux   e1000_alloc_rx_buffers







Re: [klibc] [patch] import socket defines

2008-01-11 Thread Mike Frysinger
On Friday 11 January 2008, H. Peter Anvin wrote:
 Mike Frysinger wrote:
  all this stuff is ABI constants, and the only reason glibc
  doesn't use them is that glibc prefers to use enums over #defines.
 
  a proper libc defines things in their headers according to the POSIX
  specs rather than relying on others to do it for them.  if you want to
  argue about linux-specific ABI pieces being exported, then you probably
  have a valid point, but socket.h is hardly that.

 Have you looked at it?!!?  It's full of ABI constants, and that's what I
 care about.  POSIX doesn't define, say, AF_UNIX; that's an ABI specific.

i guess it depends on how you define define :P.  no, POSIX does not state 
the specific numerical value (ABI) for the define (API), but POSIX does 
require sys/socket.h provide the macro AF_UNIX.
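
The API/ABI distinction can be seen in ordinary userspace code: the program only needs the name AF_UNIX from sys/socket.h and never spells out its numeric value. A minimal sketch, where make_unix_addr() is a hypothetical helper and not part of any libc:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill a sockaddr_un for the given path.
 * Returns 0 on success, -1 if the path does not fit in sun_path. */
static int make_unix_addr(struct sockaddr_un *sa, const char *path)
{
    assert(sa != NULL && path != NULL);
    if (strlen(path) >= sizeof(sa->sun_path))
        return -1;
    memset(sa, 0, sizeof(*sa));
    sa->sun_family = AF_UNIX;   /* the name is the API; its value is the ABI */
    strcpy(sa->sun_path, path);
    return 0;
}
```

Which header ends up supplying the number behind AF_UNIX is exactly what this thread is arguing about; the caller's source does not change either way.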

  so if the only consumer is klibc and you're against adding these things
  to it, special case it for __KLIBC__.

 No, let's split the header so that there is *no* libc knowledge in the
 kernel.  For the kernel to have knowledge about the specifics of any
 particular libc (klibc, glibc, or any other) is stupid, and that's the
 whole reason we're in this spot to begin with.

we're in this spot at the moment to appease klibc only.  is there any other 
libc out there that is not providing its own complete sys/socket.h but 
instead relying on linux/socket.h ?  glibc/uClibc rely on linux/socket.h only 
for the kernel's definition of sockaddr.

 Again, I don't particularly care about what they're named, but the whole
 point is

   #include <linux/foo.h>

 if you want the subset and

   #include <linux/bar.h>

 if you want the whole set.

i looked more at glibc/uClibc and my primary/original concern (and what i 
thought what David was raising and you confirming) was that building of glibc 
was broken and glibc headers would need updates.  that does not seem to be 
the case.  the breakage here is for packages that include both sys/socket.h 
(directly/indirectly) and linux/socket.h (directly/indirectly).

due to the way the network headers depend on each other, this case is trivial 
to induce.  but i dont think linux/socket.h is any more special than the 
current retarded conflicts we have between the network headers from the libc 
(which are required by POSIX and beyond) and the kernel headers.

 No libc specifics, and no feature test macros, which I think we can both
 agree are uglier than hell.

i think in general, all of the network related headers under linux/ are 
fubared for userspace.

 I thought the naming worked out nicer with linux/sockaddr.h

placing the sockaddr definitions into linux/sockaddr.h makes sense.
-mike


signature.asc
Description: This is a digitally signed message part.


Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Jarek Poplawski
On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
 On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
  On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
  
   It seems this optimization could've a side effect: if during such a
   loop updates are done, and r is seen !NULL during while() check, but
   NULL after rcu_dereference(), the listing/counting could stop too
   soon. So, IMHO, probably the first version of this patch is more
   reliable. (Or alternatively additional check is needed before return.)
  
  No, while the value of r->u.dst.rt_next can change between two readings,
  the value of r cannot.
 
 ...Then, of course, it's O.K.!
 
 It looks like I'm really too lazy and/or these self-documenting features
 of RCU are a bit overrated: one can never be sure which pointer is
 really RCU protected without checking a few places?! So, after looking
 at this rt_cache_get_next() and this patch only, it looks like the
 third candidate after seq->private and rtable...

OOPS! ...it seems we are talking about the same, properly documented
(second) pointer yet...

So, IOW: strictly speaking you are right, r can't change here, but I
meant r vs. the returned value! Before the patch the returned value
couldn't be NULL unless all elements of the list were looped. After
this patch it seems possible...

Jarek P.


Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)

2008-01-11 Thread Christoph Hellwig
On Thu, Jan 10, 2008 at 03:32:28PM -0800, Kok, Auke wrote:
 - cleaned up largely against sparse, checkpatch

largely means not completely, right?  Please make sure there are no sparse
warnings left at least.  checkpatch is not that critical, but it would
be good to have an explanation for everything left.


some comments on the patch

 - please remove that silly copyright header on the Makefile, it's hard
   to claim any rights on a trivial 3 line makefile.
 - also please use igb-y instead of igb-objs in the Makefile
 - the driver would be a lot more readable (and more importantly
   hackable) if it was written in a natural flow instead of having dozens
   of lines of forward declarations in every file.
 - so you're adding your own phy abstraction.  Is there a good reason
   you can't simply use the generic phylib?


Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Jarek Poplawski
On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
...
 So, IOW: strictly speaking you are right, r can't change here, but I
 meant r vs. the returned value! Before the patch the returned value
 couldn't be NULL unless all elements of the list were looped. After

...even more strictly:

couldn't be NULL unless all buckets of the hash table were looped. After

 this patch it seems possible...

Jarek P.


Re: [klibc] [patch] import socket defines

2008-01-11 Thread Mike Frysinger
On Friday 11 January 2008, Mike Frysinger wrote:
 On Friday 11 January 2008, H. Peter Anvin wrote:
  Again, I don't particularly care about what they're named, but the whole
  point is
 
  #include <linux/foo.h>
 
  if you want the subset and
 
  #include <linux/bar.h>
 
  if you want the whole set.

 i looked more at glibc/uClibc and my primary/original concern (and what i
 thought what David was raising and you confirming) was that building of
 glibc was broken and glibc headers would need updates.  that does not seem
 to be the case.  the breakage here is for packages that include both
 sys/socket.h (directly/indirectly) and linux/socket.h
 (directly/indirectly).

 due to the way the network headers depend on each other, this case is
 trivial to induce.  but i dont think linux/socket.h is any more special
 than the current retarded conflicts we have between the network headers
 from the libc (which are required by POSIX and beyond) and the kernel
 headers.

  No libc specifics, and no feature test macros, which I think we can both
  agree are uglier than hell.

 i think in general, all of the network related headers under linux/ are
 fubared for userspace.

  I thought the naming worked out nicer with linux/sockaddr.h

 placing the sockaddr definitions into linux/sockaddr.h makes sense.

so there's no confusion, since the building of the libc itself and using pure 
libc headers are generally unaffected, and all of the network linux headers 
are already screwed for userspace usage, i'm not against the proposed change 
from Peter.  it doesnt really make the situation any better/worse.
-mike




Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Herbert Xu
On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
 
 It looks like I'm really too lazy and/or these self-documenting features
 of RCU are a bit overrated: one can never be sure which pointer is
 really RCU protected without checking a few places?! So, after looking
 at this rt_cache_get_next() and this patch only, it looks like the
 third candidate after seq->private and rtable...

Perhaps we could introduce a sparse attribute for it?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Jarek Poplawski
On Fri, Jan 11, 2008 at 09:38:52PM +1100, Herbert Xu wrote:
 On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
  
  So, IOW: strictly speaking you are right, r can't change here, but I
  meant r vs. the returned value! Before the patch the returned value
  couldn't be NULL unless all elements of the list were looped. After
  this patch it seems possible...
 
 Since rcu_dereference(r) is always the same as r this patch cannot
 change the value returned.

Right!!! (But, you mean: always the same as r for local r, I hope...)

So, my moronness's selfdocumenting features are not overrated at all!

Thanks again,
Jarek P.


rp_filter and ip rule break ipsec policy

2008-01-11 Thread Marco Berizzi
Hello everybody.
AFAIK IPsec policies aren't tied to the routing
tables: if there is an IPsec policy to deliver
traffic from, for example, 192.168.0.0/16 to
10.0.0.0/8, xfrm will eat the packets regardless
of the routing table.

Here is the ipsec gateway schema:


 [-] cisco ISP router default gateway for
  | the linux box ip=cisco-genova
  |
  |
  |
  |  _ eth0 ip=osw-genova
  | /
  |/
   +--+--+
   | |
   | + eth1 dmz-genova/28 ip=osw-genova
   | |
   +--+--+
  |
  |
  |--- eth2 172.23.0.0/23 ip=172.23.1.8


Take a look:

# ip ru sh
0:  from all lookup local
601:from all to x.y.z.214 iif eth2 lookup test
32766:  from all lookup main
32767:  from all lookup default

# ip r sh table test
default via 172.23.1.254 dev eth2  metric 1

When I insert rule #601, packets to
x.y.z.214 aren't eaten by xfrm anymore. This
happens when rp_filter is set to 1 on eth0.
Disabling rp_filter on eth0 resolves the problem:
xfrm eats the packets.
Is this the expected behaviour? Why should
rp_filter break the IPsec policy when rule #601
is inserted?

I have enabled log_martians on eth0, and when
rp_filter is set to 1 I see these messages:

martian source 172.23.1.4 from x.y.z.214, on dev eth0
ll header: 00:30:05:cb:27:c1:00:1b:54:fb:fd:78:08:00
martian source 172.23.1.4 from x.y.z.214, on dev eth0
ll header: 00:30:05:cb:27:c1:00:1b:54:fb:fd:78:08:00

# ip x p
src x.y.z.214 dst 172.23.0.0/23
dir in priority 2376 ptype main
tmpl src osw-napoli dst osw-genova
proto comp reqid 16390 mode tunnel
level use
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16389 mode transport
src 172.23.0.0/23 dst x.y.z.214
dir out priority 2376 ptype main
tmpl src osw-genova dst osw-napoli
proto comp reqid 16390 mode tunnel
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16389 mode transport
src x.y.z.214 dst 172.23.0.0/23
dir fwd priority 2376 ptype main
tmpl src osw-napoli dst osw-genova
proto comp reqid 16390 mode tunnel
level use
tmpl src 0.0.0.0 dst 0.0.0.0
proto esp reqid 16389 mode transport

Here are the others routing tables:

# ip r sh table main
cisco-genova dev eth0  scope link
dmz-genova/28 dev eth1  proto kernel  scope link  src osw-genova
172.23.0.0/23 dev eth2  proto kernel  scope link  src 172.23.1.8
127.0.0.0/8 dev lo  scope link
default via cisco-genova dev eth0  metric 1

# ip r sh table local
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
broadcast dmz-genova dev eth0  proto kernel  scope link  src osw-genova
broadcast dmz-genova dev eth1  proto kernel  scope link  src osw-genova
broadcast broadcast-genova dev eth0  proto kernel  scope link  src osw-genova
broadcast broadcast-genova dev eth1  proto kernel  scope link  src osw-genova
local osw-genova dev eth0  proto kernel  scope host  src osw-genova
local osw-genova dev eth1  proto kernel  scope host  src osw-genova
broadcast 172.23.0.0 dev eth2  proto kernel  scope link  src 172.23.1.8
broadcast 172.23.1.255 dev eth2  proto kernel  scope link  src 172.23.1.8
local 172.23.1.8 dev eth2  proto kernel  scope host  src 172.23.1.8
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1




Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Benny Amorsen
David Miller [EMAIL PROTECTED] writes:

 No IRQ balancing should be done at all for networking device
 interrupts, with zero exceptions.  It destroys performance.

Does irqbalanced need to be taught about this? And how about the
initial balancing, so that each network card gets assigned to one CPU?
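
For context, one way to pin an interrupt to a single CPU by hand is to write a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. The sketch below only builds that mask string; format_irq_affinity() is a hypothetical helper, and it is limited to CPUs 0..31 so that the mask fits in one 32-bit word.

```c
#include <assert.h>
#include <stdio.h>

/* Write the hex affinity mask for a single CPU into buf.
 * Returns 0 on success, -1 if cpu is out of range or buf is too small. */
static int format_irq_affinity(char *buf, size_t len, unsigned int cpu)
{
    assert(buf != NULL);
    if (cpu >= 32)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For CPU 2 this yields the string "4", which would then be written to the smp_affinity file of the network card's IRQ. Whether irqbalance should do the equivalent on its own is the open question above.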


/Benny




Improving performance of bonding driver (eql) using round robin alogrithm

2008-01-11 Thread Jeba Anandhan
Hi All,
The existing algorithm in the eql bonding driver selects a slave based on
the priority of each slave. The priority is assigned from the speed of the
particular line. The current problem is that not all slaves get a
chance to be the best slave for transmission.


Would a round-robin algorithm for selecting the best slave to transmit the
data improve performance?

Thanks
Jeba


Re: virtio_net and SMP guests

2008-01-11 Thread Rusty Russell
On Friday 11 January 2008 02:51:58 Christian Borntraeger wrote:
 What about the following patch:

Looks correct and in fact pretty orthodox.

I've folded this in, thanks!

Rusty.


Re: [PATCH][ROSE][AX25] af_ax25: possible circular locking

2008-01-11 Thread Jarek Poplawski
On Thu, Jan 10, 2008 at 09:22:42PM -0800, David Miller wrote:
 From: Jarek Poplawski [EMAIL PROTECTED]
 Date: Sun, 30 Dec 2007 15:13:23 +0100
 
  On Sat, Dec 29, 2007 at 07:14:43PM -0800, David Miller wrote:
...
 I've removed the warning and made the branch back to 'again'
 unconditional as I think this is the safest version of the
 change.
 
 I'll push this upstream, thanks for fixing this Jarek.
 

Thanks for checking this and making safer!

Regards,
Jarek P.


Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22

2008-01-11 Thread Zhang, Yanmin
On Wed, 2008-01-09 at 17:35 +0800, Zhang, Yanmin wrote: 
 The regression is:
 1)stoakley with 2 quad-core processors: 11%;
 2)Tulsa with 4 dual-core(+hyperThread) processors:13%;
I have a new update on this issue and am also cc'ing the netdev mailing list.
Thanks to David Miller for pointing me to the netdev mailing list.

 
 The test command is:
 #sudo taskset -c 7 ./netserver
 #sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1
 
 As a matter of fact, 2.6.23 has about 6% regression and 2.6.24-rc's
 regression is between 16%~11%.
 
 I tried to use bisect to locate the bad patch between 2.6.22 and 2.6.23-rc1,
 but the bisected kernel wasn't stable and went crazy.
 
 I tried both CONFIG_SLUB=y and CONFIG_SLAB=y to make sure SLUB isn't the
 culprit.
 
 The oprofile data of CONFIG_SLAB=y. Top cpu utilizations are:
 1) 2.6.22 
 2067379  9.4888  vmlinux  schedule
 1873604  8.5994  vmlinux  mwait_idle
 1568131  7.1974  vmlinux  resched_task
 1066976  4.8972  vmlinux  tcp_v4_rcv
 986641   4.5285  vmlinux  tcp_rcv_established
 979518   4.4958  vmlinux  find_busiest_group
 767069   3.5207  vmlinux  sock_def_readable
 736808   3.3818  vmlinux  tcp_sendmsg
 595889   2.7350  vmlinux  task_rq_lock
 557193   2.5574  vmlinux  tcp_ack
 470570   2.1598  vmlinux  __mod_timer
 392220   1.8002  vmlinux  __alloc_skb
 358106   1.6436  vmlinux  skb_release_data
 313372   1.4383  vmlinux  skb_clone
 
 2) 2.6.24-rc7
 2668426  12.4497  vmlinux  vmlinux  schedule
 955698   4.4589   vmlinux  vmlinux  skb_release_data
 836311   3.9018   vmlinux  vmlinux  tcp_v4_rcv
 762398   3.5570   vmlinux  vmlinux  skb_release_all
 728907   3.4007   vmlinux  vmlinux  task_rq_lock
 705037   3.2894   vmlinux  vmlinux  __wake_up
 694206   3.2388   vmlinux  vmlinux  __mod_timer
 617616   2.8815   vmlinux  vmlinux  mwait_idle
 
 It looks like tcp in 2.6.22 sends more packets, but frees far less skb than 
 2.6.24-rc6.
 tcp_rcv_established in 2.6.22 is highlighted on cpu utilization.
I instrumented the kernel to capture the function call counts.
1) 2.6.22
skb_release_data: 50148649
tcp_ack:          25062858
tcp_transmit_skb: 25063150
tcp_v4_rcv:       25063279

2) 2.6.24-rc6
skb_release_data: 21429692
tcp_ack:          10707710
tcp_transmit_skb: 10707866
tcp_v4_rcv:       10707959

The data doesn't show that 2.6.22 sends more packets while freeing far fewer skbs
than 2.6.24-rc6.

The data shows that the skb_release_data count for kernel 2.6.22 is more than double
that of 2.6.24-rc6, but the netperf result showed only about a 10% regression.

As each packet carries only 1 byte, I suspect 2.6.24-rc6 tries to merge packets
after waiting for some latency. 2.6.22 might not have that wait, or the latency is
very small, so 2.6.22 sends the packets almost immediately. I will check the source
code later.
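
The ratios behind these statements can be checked with a few lines of C. This is only arithmetic on the call counts listed above; count_ratio() is a hypothetical helper.

```c
#include <assert.h>

/* Ratio of two call counts, as a floating-point value. */
static double count_ratio(unsigned long a, unsigned long b)
{
    assert(b != 0);
    return (double)a / (double)b;
}
```

count_ratio(50148649UL, 21429692UL) comes to about 2.34, which is the "more than double" figure for skb_release_data. Within each kernel, skb_release_data divided by tcp_ack is about 2.00 (50148649/25062858 and 21429692/10707710), so the number of skb data releases per ACK is the same in both.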

-yanmin




Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Herbert Xu
On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
 
 So, IOW: strictly speaking you are right, r can't change here, but I
 meant r vs. the returned value! Before the patch the returned value
 couldn't be NULL unless all elements of the list were looped. After
 this patch it seems possible...

Since rcu_dereference(r) is always the same as r this patch cannot
change the value returned.
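
The point can be illustrated with a small userspace model. This is not the kernel code or the patch under discussion: model_rcu_dereference() is a hypothetical stand-in that only re-reads its argument once (the real rcu_dereference() also orders dependent loads), and the bucket array and node type are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

#define NBUCKETS 3
static struct node *buckets[NBUCKETS];   /* shared chains in this model */

/* Stand-in for rcu_dereference(): one volatile read of the given pointer. */
static struct node *model_rcu_dereference(struct node *const *pp)
{
    return *(struct node *const volatile *)pp;
}

/* Return the entry after r, moving to lower buckets when a chain ends. */
static struct node *model_get_next(int *bucket, struct node *r)
{
    assert(bucket != NULL && r != NULL);
    r = r->next;                 /* shared pointer is read once into local r */
    while (!r) {
        if (--*bucket < 0)
            break;               /* every bucket has been visited */
        r = buckets[*bucket];
    }
    /* Re-reading the local copy cannot yield a different value than r. */
    return model_rcu_dereference(&r);
}

/* Count all entries, starting from the highest bucket. */
static int model_count(void)
{
    int bucket = NBUCKETS - 1;
    struct node *r = NULL;
    int n = 0;

    while (bucket >= 0 && !(r = buckets[bucket]))
        bucket--;
    for (; r; r = model_get_next(&bucket, r))
        n++;
    return n;
}
```

In this model the walk returns NULL only after the last bucket is exhausted, because the final dereference operates on the local copy and not on shared memory.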

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 001/001] ipv4: enable use of 240/4 address space

2008-01-11 Thread Andi Kleen
Vince Fuller [EMAIL PROTECTED] writes:

 from Vince Fuller [EMAIL PROTECTED]

 This set of diffs modify the 2.6.20 kernel to enable use of the 240/4
 (aka class-E) address space as consistent with the Internet Draft
 draft-fuller-240space-00.txt.

Wouldn't it be wise to at least wait for it becoming an RFC first? 
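
For reference, 240/4 covers every IPv4 address whose top four bits are all set. A minimal sketch of the range test follows, with addresses held as host-order 32-bit integers; both helper names are hypothetical and unrelated to the proposed kernel diffs.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four dotted-quad octets. */
static uint32_t ipv4_from_octets(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Returns 1 if addr lies in 240.0.0.0/4 (top four bits all set). */
static int ipv4_in_240_slash_4(uint32_t addr)
{
    return (addr & 0xf0000000u) == 0xf0000000u;
}
```

Note that 255.255.255.255 falls inside this range numerically, so any code that opens up 240/4 still has to treat the limited broadcast address separately.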

-Andi


handing cloned frames to netif_rx()?

2008-01-11 Thread Johannes Berg
In 802.11n, there is a case where multiple data frames are received
aggregated into a single frame (A-MSDU).

Currently, we copy each of these frames out into their own skb, but
because of the alignment with that etc. I started to think that we could
simply pass up a clone of the original skb with start/length adjusted
properly so that it windows only the contained packet.

The buffer would be shared but the data within the original window
(starting with the 802.3 header) could even be written to, it won't be
needed again by mac80211 once it's handed off to netif_rx(). The skb
will obviously have lots of head- and tailroom but that space would be
part of other packets.

Is it ok to do this? Will something freak out if we pass a cloned skb to
netif_rx()?
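
To illustrate the windowing idea in plain userspace C (this is not mac80211 or skb code): the sketch below walks an aggregate and records an (offset, length) window per contained frame without copying anything. It assumes each subframe starts with a 14-byte header (destination, source, big-endian payload length) and that subframes are padded to a 4-byte boundary; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBFRAME_HDR_LEN 14   /* assumed: 6 + 6 address bytes + 2 length bytes */

struct window {
    size_t off;   /* start of the subframe header within the shared buffer */
    size_t len;   /* header plus payload, excluding padding */
};

/* Record up to max windows in out[]; return the count, or -1 if malformed.
 * Trailing bytes shorter than a header are treated as malformed. */
static int amsdu_windows(const uint8_t *buf, size_t buflen,
                         struct window *out, int max)
{
    size_t pos = 0;
    int n = 0;

    assert(buf != NULL && out != NULL);
    while (pos < buflen && n < max) {
        if (buflen - pos < SUBFRAME_HDR_LEN)
            return -1;
        size_t payload = ((size_t)buf[pos + 12] << 8) | buf[pos + 13];
        size_t sub_len = SUBFRAME_HDR_LEN + payload;
        if (sub_len > buflen - pos)
            return -1;
        out[n].off = pos;
        out[n].len = sub_len;
        n++;
        pos += (sub_len + 3) & ~(size_t)3;   /* skip padding to 4 bytes */
    }
    return n;
}
```

Each window plays the role of the adjusted start/length on a clone: the bytes before and after it in the shared buffer belong to other packets, which is why writes outside the window would be a problem.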

johannes




Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Jarek Poplawski
On Fri, Jan 11, 2008 at 09:37:42PM +1100, Herbert Xu wrote:
 On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
  
  It looks like I'm really too lazy and/or these selfdocumenting features
  of RCU are a bit overrated: one can never be sure which pointer is
  really RCU protected without checking a few places?! So, after looking
  at this rt_cache_get_next() and this patch only, it looks like the
  third candidate after seq->private and rtable...
 
 Perhaps we could introduce a sparse attribute for it?

I hope I won't be cursed by everyone forced to do additional writing,
so I'll only say that after this patch there should be no problem
identifying RCU-protected data properly (maybe only this kind
of rcu_dereference() use needs some popularization).

Jarek P.


Re: [PATCH 001/001] ipv4: enable use of 240/4 address space

2008-01-11 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Fri, 11 Jan 2008 12:17:02 +0100), Andi Kleen 
[EMAIL PROTECTED] says:

 Vince Fuller [EMAIL PROTECTED] writes:
 
  from Vince Fuller [EMAIL PROTECTED]
 
  This set of diffs modify the 2.6.20 kernel to enable use of the 240/4
  (aka class-E) address space as consistent with the Internet Draft
  draft-fuller-240space-00.txt.
 
 Wouldn't it be wise to at least wait for it becoming an RFC first? 

I do think so, too.

There was no positive consensus on this draft
at the intarea meeting in Vancouver, right?

We cannot / should not enable that space until we have reached
a consensus on it.

--yoshfuji


doubt in e1000_io_write()

2008-01-11 Thread Jeba Anandhan
Hi all,
I have a question about e1000_io_write().

void
e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value)
{
	outl(value, port);
}


kernel version: 2.6.12.3


Even though the hw structure is not used, why is it passed into the
e1000_io_write function?

Thanks
Jeba



Re: [PATCH 2.6.23+] ingress classify to [nf]mark

2008-01-11 Thread jamal
On Fri, 2008-11-01 at 15:24 -0200, Dzianis Kahanovich wrote:
 jamal wrote:


  tc qdisc add dev XXX ingress
  tc filter add dev XXX parent : protocol ip prio 5 \
  u32 blah bleh \
  flowid 1:12 action ipt -j mark --set-mark 13 
 
 Yes, I do that. But there is a simpler way:
 ---
 if [[ $[TC_INDEX2MARK] == 0 ]] ; then
   c=${c//action ipt -j MARK --set-mark /flowid :}
 fi
 $c
 ---

I didn't quite understand what you have above. Does your script
read the flowid and set the MARK to some dynamic value based on flowid?
If that's what you are doing, it sounds sensible and much more clever
than what is posted. And it doesn't require any kernel patch.

 Simplest:
 --- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
 +++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
 @@ -222,6 +222,16 @@
 -	skb->tc_index = TC_H_MIN(res.classid);
 +	skb->tc_index = TC_H_MIN(mark=res.classid);

Just write a metaset action and you can have all sorts of policies on
what tc_index, mark etc. you want. It is something that's needed in any
case.
When we did tc_index it made sense because it was for tc to use
some default policy. Enforcing policies in the kernel is not the best
thing to do; as an example, you want to specify the policy for mark to
be: classid major<<16|minor. I am sure you have good reasons; however,
for the next person who wants to set it to major<<8|minor for their own
good reason, there's a conflict.
My offer to help you is still open.
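
A small sketch of why a policy hard-coded in the kernel conflicts between users. It assumes the two policies named above are to be read as major<<16|minor and major<<8|minor (the shift operators are garbled in the archive, so that reading is an assumption), and the function names are hypothetical. The same classid then produces two different marks:

```c
#include <stdint.h>

/* A tc classid packs a 16-bit major and a 16-bit minor number. */
uint32_t classid_major(uint32_t classid) { return classid >> 16; }
uint32_t classid_minor(uint32_t classid) { return classid & 0xFFFFu; }

/* Policy A (assumed reading): mark = major << 16 | minor. */
uint32_t mark_policy_shift16(uint32_t classid)
{
    return (classid_major(classid) << 16) | classid_minor(classid);
}

/* Policy B (assumed reading): mark = major << 8 | minor. */
uint32_t mark_policy_shift8(uint32_t classid)
{
    return (classid_major(classid) << 8) | classid_minor(classid);
}
```

For classid 1:12 (0x00010012) policy A yields 0x00010012 and policy B yields 0x112, so only one of the two users could be served by a fixed in-kernel rule.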

cheers,
jamal



Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread Chris Friesen

David Miller wrote:


You have to be kidding, coming here for help with a nearly
4 year old kernel.


I figured it couldn't hurt to ask...if I can't ask the original authors, 
who else is there?


I'd love to work on newer kernels, but we have a commitment to our 
customers to support multiple releases for a significant amount of time.


Chris


Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

2008-01-11 Thread Jarek Poplawski
On Thu, Jan 10, 2008 at 03:51:11PM -0800, Paul E. McKenney wrote:
 On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
  Eric Dumazet wrote, On 01/09/2008 11:37 AM:
  ...
   [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  ...
   diff --git a/net/ipv4/route.c b/net/ipv4/route.c
   index d337706..28484f3 100644
   --- a/net/ipv4/route.c
   +++ b/net/ipv4/route.c
   @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
    			break;
    		rcu_read_unlock_bh();
    	}
   -	return r;
   +	return rcu_dereference(r);
    }

    static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
    {
   -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
   +	struct rt_cache_iter_state *st = seq->private;

    	r = r->u.dst.rt_next;
    	while (!r) {
   @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
    		rcu_read_lock_bh();
    		r = rt_hash_table[st->bucket].chain;
    	}
   -	return r;
   +	return rcu_dereference(r);
    }
  
  It seems this optimization could've a side effect: if during such a
  loop updates are done, and r is seen !NULL during while() check, but
  NULL after rcu_dereference(), the listing/counting could stop too
  soon. So, IMHO, probably the first version of this patch is more
  reliable. (Or alternatively additional check is needed before return.)
 
 Looks to me like r is a local variable (argument list), so there
 should not be any possibility of it being changed by some other
 task, right?

It seems words can be stronger than logic (in some cases)...
After forgetting what a dereference is usually for, it's all right!

Thanks,
Jarek P.


Re: [PATCH 1/5] spidernet: add missing initialization

2008-01-11 Thread Jens Osterkamp
On Friday 11 January 2008, Ishizaki Kou wrote:
 This patch fixes initialization of aneg_count and medium fields in
 spider_net_card to make spidernet driver correctly sets link status.
 
 Signed-off-by: Kou Ishizaki [EMAIL PROTECTED]

Hi Ishizaki,

Linas has left the company and is no longer doing kernel related stuff,
so I suggest, given Jeff is ok with that, that the two of us take over
spidernet maintainership.

Jens

---

Change maintainership for spidernet.

Signed-off-by: Jens Osterkamp [EMAIL PROTECTED]

Index: linux-2.6/MAINTAINERS
===
--- linux-2.6.orig/MAINTAINERS  2008-01-11 13:32:04.0 +0100
+++ linux-2.6/MAINTAINERS   2008-01-11 13:41:32.0 +0100
@@ -3613,8 +3613,10 @@
 S: Supported
 
 SPIDERNET NETWORK DRIVER for CELL
-P: Linas Vepstas
-M: [EMAIL PROTECTED]
+P: Ishizaki Kou
+M: [EMAIL PROTECTED]
+P: Jens Osterkamp
+M: [EMAIL PROTECTED]
 L: netdev@vger.kernel.org
 S: Supported
 

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


Simple question about LARTC theory

2008-01-11 Thread slavon

Hello all.
Sorry for the off-topic post. I am subscribed only to [EMAIL PROTECTED]; I tried
sending to [EMAIL PROTECTED] and got Undelivered Mail Returned to
Sender. May I ask a small off-topic question here? This mailing list has many
people who know the LARTC code, and I hope they can help with my idea. Thanks.


Simple Question

Legend
[] - qdisc
() - class
** - filter

[htb 1:0 root] *match X FLOWID 3:5*
(1:2 htb)(2:3 htb)(3:5 htb)[sfq 5]
(1:6 htb)(6:7 htb)(7:8 htb)[sfq 8]

A packet travels:
IN -> [htb 1:0] -> (class 1:2 - GREEN) -> (class 2:3 GREEN) -> (class
3:5 - GREEN) -> [sfq 5] -> OUT

Then I create:

[prio 3 bound 10:0] *match X flowid 10:2*
+(10:1 htb) -- [sfq 101]
+(10:2 htb) -- [sfq 102]
+(10:3 htb) -- [sfq 103]

How can I add a filter to [sfq 5] and [sfq 8] so that when a packet leaves
them it goes to [prio 3 bound 10:0] and is processed by that qdisc's filter?

flowid works only if it can see both ends of the link... I need something
like a GOTO. If I give [prio 3 bound 10:0] a PARENT ID, flowid finds the
path, but I need [prio 3 bound 10:0] to have more than one parent...

I looked at link, but if I understand correctly it works only for hash tables.
I looked at classid, but it goes to class 10:X, not to [prio 3 bound 10:0],
and does not process the filter...

Or do I misunderstand the theory?

What do I need? I need 3 groups in tc.
The 1st group gets all traffic and does HTB shaping (defence against ICMP and UDP storms):
a) icmp rate 100mbs ceil 500mbs
b) udp rate 100mbs ceil 500mbs
c) other rate 300mbs ceil 500mbs
all with prio = 0 so that the ceil rate works normally

The 2nd group does prio (ICMP and UDP must go first because they have no
delivery check):

icmp = 1
udp = 2
other = 3

The 3rd group does per-IP speed limiting (shaping); this part is ready.

I want everything that exits group 1 to go to the group 2 filters, and
everything that exits group 2 to go to group 3...


Thanks. Slavon





iproute2: removing primary address removes secondaries

2008-01-11 Thread martin f krafft
Dear list,

When I add an address to an interface whose network prefix is the
same as that of an address already bound to the interface, the new
address becomes a secondary address. As per
http://www.policyrouting.org/iproute2.doc.html:

  secondary --- this address is not used when selecting the default
  source address for outgoing packets. An IP address becomes
  secondary if another address within the same prefix (network)
  already exists. The first address within the prefix is primary and
  is the tag address for the group of all the secondary addresses.
  When the primary address is deleted all of the secondaries are
  purged too.

In the following, I want to argue that this is not necessary.
I think that removal of a primary address should cause the next
address to be promoted to be the default source address and the
link-scoped route to be retained. This is basically out of
http://bugs.debian.org/429689, the maintainer asked me to turn
directly to this list.

If I add an address to a device with 'ip add', ip also implicitly
adds a link-scoped route according to the netmask. It only does this
for primary addresses, so if I add a second address within the same
network, the route is not duplicated.

Thus, the net effect on the routing table is the same for the
following two commands:

  ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/12 dev eth0
  ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/32 dev eth0

In the first case, the .200 address becomes a secondary of the .100
address. In the second case, they are both primaries. In both cases,
only one /12 link-scoped route will be created.

However, in both cases, if I remove the .100 address, the .200 is
affected: if it's secondary, it ceases to exist, and if it's
primary (i.e. in the /32 case), then the host can no longer use it
to communicate to hosts in the same link segment, only to hosts on
the other side of the default gateway.

I thus question the point of purging secondary addresses. Obviously,
only one address can be primary (it is used as source address for
packets leaving the machine by the respective route). But if the
primary address is removed, the next secondary should be promoted
and the route should *not* be deleted.

Comments?

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
microsoft: for when quality, reliability, and security
   just aren't that important!
 
spamtraps: [EMAIL PROTECTED]




Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Eric Dumazet

Breno Leitao a écrit :

On Thu, 2008-01-10 at 12:52 -0800, Brandeburg, Jesse wrote:
  

Breno Leitao wrote:


When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
of transfer rate. If I run 4 netperf against 4 different interfaces, I
get around 720 * 10^6 bits/sec.
  

I hope this explanation makes sense, but what it comes down to is that
combining hardware round robin balancing with NAPI is a BAD IDEA.  In
general the behavior of hardware round robin balancing is bad and I'm
sure it is causing all sorts of other performance issues that you may
not even be aware of.


I ran another test with the ppc IRQ round-robin scheme removed and each
interface (eth6, eth7, eth16 and eth17) bound to a different CPU (CPU1,
CPU2, CPU3 and CPU4), and I still get around 720 * 10^6 bits/s on
average.

Take a look at the interrupt table this time: 


io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
277:         15    1362450         13         14         13         14         15         18   XICS  Level eth6
278:         12         13    1348681         19         13         15         10         11   XICS  Level eth7
323:         11         18         17    1348426         18         11         11         13   XICS  Level eth16
324:         12         16         11         19    1402709         13         14         11   XICS  Level eth17


I also tried to bind all 4 interface IRQs to a single CPU (CPU0)
using the noirqdistrib boot parameter, and the performance was a little
worse.

Rick, 
  The 2 interface test that I showed in my first email was run on two
different NICs. Also, I am running netperf with the following command:
netperf -H hostname -T 0,8, while netserver is running without any
arguments at all. Also, running vmstat in parallel shows that there is no
bottleneck in the CPU. Take a look: 


procs -----------memory----------- ---swap-- -----io---- --system--- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo    in   cs us sy id wa st
 2  0      0 6714732  16168 227440    0    0     8     2   203   21  0  1 98  0  0
 0  0      0 6715120  16176 227440    0    0     0    28 16234  505  0 16 83  0  1
 0  0      0 6715516  16176 227440    0    0     0     0 16251  518  0 16 83  0  1
 1  0      0 6715252  16176 227440    0    0     0     1 16316  497  0 15 84  0  1
 0  0      0 6716092  16176 227440    0    0     0     0 16300  520  0 16 83  0  1
 0  0      0 6716320  16180 227440    0    0     0     1 16354  486  0 15 84  0  1
 

  

If your machine has 8 cpus, then your vmstat output shows a bottleneck :)

(100/8 = 12.5), so I guess one of your CPU is full
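
The arithmetic behind this remark can be sketched as follows. vmstat reports CPU time averaged over all processors, so a system-wide busy figure of only 100/N percent is already enough for one of N CPUs to be fully saturated. The function names are hypothetical and the sketch only illustrates the bound; it does not measure anything.

```c
/* Average busy percentage at which a single CPU out of ncpus may be saturated. */
double saturation_threshold(int ncpus)
{
    return 100.0 / ncpus;
}

/*
 * Upper bound on the load of the busiest CPU, given the average busy
 * percentage over ncpus processors: in the worst case the whole load
 * sits on one CPU, capped at 100%.
 */
double max_single_cpu_load(double avg_busy_pct, int ncpus)
{
    double worst = avg_busy_pct * ncpus;
    return worst > 100.0 ? 100.0 : worst;
}
```

With 8 CPUs the threshold is 12.5%, and the roughly 16% system time in the vmstat sample exceeds it, so the averaged output alone cannot rule out one saturated CPU; per-CPU tools are needed to tell.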







Re: [PROCFS] [NETNS] issue with /proc/net entries

2008-01-11 Thread Benjamin Thery

Eric W. Biederman wrote:

Benjamin Thery [EMAIL PROTECTED] writes:


Hi Eric,

While testing the current network namespace stuff merged in net-2.6.25,
I bumped into the following problem with the /proc/net/ entries.
It doesn't always display the actual data of the current namespace,
but sometimes displays data from other namespaces.

I bisected the problem to the commit:
proc: remove/Fix proc generic d_revalidate
3790ee4bd86396558eedd86faac1052cb782e4e1

The problem: If a process in a particular network namespace changes
current directory to /proc/net, then processes in other network
namespaces trying to look at /proc/net entries will see data from the
first namespace (the one with CWD /proc/net). (See test case below).

As your comments in the commit suggest, you seem to be aware of some
issues when CONFIG_NET_NS=y. Is it one of these corner cases you
identified? Any idea on how we can fix it?


Yes.  It isn't especially hard.   I have most of it in my queue
I just need to get the silly patches out of there.

Essentially we need to fix the caching of proc_generic entries,
So that we can have a proper d_revalidate implementation.

To get d_revalidate and the caching correct for /proc/net will take
just a bit more work.  We need to make /proc/net a symlink
to something like /proc/self/net so that we don't get excess
revalidates when switching between different processes.

Or else we can't properly implement the case you have described.
Where being in the directory causes the wrong version of /proc/net
to show up. Changing the contents of the dentry for /proc/net
should only happen during unshare.  Not when we switch between
processes or else we get into the d_revalidate leaks mount points
problem again.

We also need to check whether something is mounted on top of
us before we drop the dentry.  But if we don't even try until
we know the dentry is invalid it should not be too bad.


Thanks for all the details.
I'll put this issue on my netns current limitations list until
it's solved.

Benjamin




Eric




--
B e n j a m i n   T h e r y  - BULL/DT/Open Software RD

   http://www.bull.com


Re: [PATCH 2.6.23+] ingress classify to [nf]mark

2008-01-11 Thread Dzianis Kahanovich

Patrick McHardy wrote:


--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -161,2 +161,5 @@
 		skb->tc_index = TC_H_MIN(res.classid);
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+		skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
+#endif
 	default:



Behaviour like this shouldn't depend on compile-time options.


I also want to move it outside the NET_CLS_ACT dependency, but I am unsure 
I understand the behaviour without NET_CLS_ACT.


But it reduces code.
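
A user-space sketch of the mark computation in the patch quoted above. It assumes the garbled expression is to be read as (mark & (classid >> 16)) | TC_H_MIN(classid), i.e. the classid major acts as an AND mask and the minor is ORed in; that reading is an inference from the surrounding discussion. TC_H_MIN mirrors the macro in linux/pkt_sched.h, while classid_to_mark is a hypothetical name.

```c
#include <stdint.h>

/* Same definitions as in linux/pkt_sched.h. */
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)

/*
 * Hypothetical helper: derive a new mark from the current mark and
 * the classid chosen by the ingress filter. The major number (upper
 * 16 bits) masks the old mark, the minor number is ORed in.
 */
uint32_t classid_to_mark(uint32_t mark, uint32_t classid)
{
    return (mark & (classid >> 16)) | TC_H_MIN(classid);
}
```

With a zero major, as in flowid :10, the old mark is cleared and the result is simply 0x10, which matches the stated equivalence with -j MARK --set-mark 0x10.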

--
WBR,
Denis Kaganovich,  [EMAIL PROTECTED]  http://mahatma.bspu.unibel.by


Re: [PATCH 1/5] spidernet: add missing initialization

2008-01-11 Thread Linas Vepstas
Hi,

On 11/01/2008, Jens Osterkamp [EMAIL PROTECTED] wrote:
 Hi Ishizaki,

 Linas has left the company and is no longer doing kernel related stuff,
 so I suggest, given Jeff is ok with that, that the two of us take over
 spidernet maintainership.

 Jens

 ---

 Change maintainership for spidernet.

 Signed-off-by: Jens Osterkamp [EMAIL PROTECTED]

Fine with me ...

Acked-by: Linas Vepstas [EMAIL PROTECTED]

 Index: linux-2.6/MAINTAINERS
 ===
 --- linux-2.6.orig/MAINTAINERS  2008-01-11 13:32:04.0 +0100
 +++ linux-2.6/MAINTAINERS   2008-01-11 13:41:32.0 +0100
 @@ -3613,8 +3613,10 @@
  S: Supported

  SPIDERNET NETWORK DRIVER for CELL
 -P: Linas Vepstas
 -M: [EMAIL PROTECTED]
 +P: Ishizaki Kou
 +M: [EMAIL PROTECTED]
 +P: Jens Osterkamp
 +M: [EMAIL PROTECTED]
  L: netdev@vger.kernel.org
  S: Supported




Re: [PATCH 2.6.23+] ingress classify to [nf]mark

2008-01-11 Thread Dzianis Kahanovich

jamal wrote:


To classid x:y => mark=mark&x|y (classid :y => -j MARK --set-mark y, etc).

--- linux-2.6.23-gentoo-r2/net/sched/Kconfig
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
@@ -222,6 +222,16 @@

[..]

skb->tc_index = TC_H_MIN(res.classid);
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+   skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
+#endif
default:



Please either use ipt action and netfilter fwmarker for this activity or


Sorry. This was only an unsuccessful attempt to popularize my working solution.
In reality I just use #define tc_index mark (in skbuff.h or sch_ingress.c) or 
something like this:


--- linux-2.6.23-gentoo-r2/net/sched/Kconfig
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
@@ -222,6 +222,16 @@
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_ingress.

+config NET_SCH_INGRESS_TC2MARK
+	bool "ingress tc_index -> mark"
+	depends on NET_SCH_INGRESS && NET_CLS_ACT
+	---help---
+	  This enables access to mark value via tc_index alias
+	  in ingress and unify this values (usage example: set flowid :2
+	  in ingress and use it value as mark in any way - netfilter, etc).
+
+	  But tc_index may be undefined - use flowid :0.
+
 comment "Classification"

 config NET_CLS
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -18,6 +18,9 @@
 #include <net/netlink.h>
 #include <net/pkt_sched.h>

+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+#define tc_index mark
+#endif

 #undef DEBUG_INGRESS



create a new action. 
If you choose the latter (for example because you want to dynamically compute
the mark), look at net/sched/act_simple.c as a starting point and I can help
you if you have any questions.
 
If you want to use ipt action, the syntax would be something like:


---
tc qdisc add dev XXX ingress
tc filter add dev XXX parent : protocol ip prio 5 \
u32 blah bleh \
flowid 1:12 action ipt -j mark --set-mark 13 


Yes, I do that. But there is a simpler way:
---
if [[ $[TC_INDEX2MARK] == 0 ]] ; then
 c=${c//action ipt -j MARK --set-mark /flowid :}
fi
$c
---

Simplest:
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -222,6 +222,16 @@
-   skb->tc_index = TC_H_MIN(res.classid);
+   skb->tc_index = TC_H_MIN(mark=res.classid);


--
WBR,
Denis Kaganovich,  [EMAIL PROTECTED]  http://mahatma.bspu.unibel.by


Re: [PATCH 0/4] Pull request for 'ipg-fixes' branch

2008-01-11 Thread Francois Romieu
[EMAIL PROTECTED] [EMAIL PROTECTED] :
[...]
 I notice that the vendor-supplied driver doesn't have these bugs.

The M in POMS stands for my.

[...]
 Would you be interested in some cleanup patches ?

Yes.

 In particular, I think I can get rid of tx-lock entirely, or at least
 take it off the fast path. All it's protecting is the write to
 sp->tx_current, and a few judicious memory barriers can deal with that.

I have done a kind of memory barrier trick for the r8169 in the past but
it is not clear that I would do it again. Today I would argue more strongly
in favor of similar locking across different drivers. The tg3 driver
is a good model imho.

Anyway you have been here for some time so I see no reason to kill any
different/new locking scheme you could come up with.

Off until Sunday.

-- 
Ueimor


Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Denys Fedoryshchenko
Maybe it would be a good idea to use sysstat?

http://perso.wanadoo.fr/sebastien.godard/

For example:

visp-1 ~ # mpstat -P ALL 1
Linux 2.6.24-rc7-devel (visp-1) 01/11/08

19:27:57     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
19:27:58     all    0.00    0.00    0.00    0.00    0.00    2.51    0.00   97.49   7707.00
19:27:58       0    0.00    0.00    0.00    0.00    0.00    4.00    0.00   96.00   1926.00
19:27:58       1    0.00    0.00    0.00    0.00    0.00    1.01    0.00   98.99   1926.00
19:27:58       2    0.00    0.00    0.00    0.00    0.00    5.00    0.00   95.00   1927.00
19:27:58       3    0.00    0.00    0.00    0.00    0.00    0.99    0.00   99.01   1927.00
19:27:58       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00



  


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Breno Leitao
On Fri, 2008-01-11 at 17:48 +0100, Eric Dumazet wrote:
 Breno Leitao a écrit :
  Take a look at the interrupt table this time: 
 
  io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
  277:         15    1362450         13         14         13         14         15         18   XICS  Level eth6
  278:         12         13    1348681         19         13         15         10         11   XICS  Level eth7
  323:         11         18         17    1348426         18         11         11         13   XICS  Level eth16
  324:         12         16         11         19    1402709         13         14         11   XICS  Level eth17
 
 

 If your machine has 8 cpus, then your vmstat output shows a bottleneck :)
 
 (100/8 = 12.5), so I guess one of your CPU is full

Well, if I run top while running the test, I see this load distributed
among the CPUs, mainly those that have a NIC IRQ bound to them. Take a look:

Tasks: 133 total,   2 running, 130 sleeping,   0 stopped,   1 zombie
Cpu0  :  0.3%us, 19.5%sy,  0.0%ni, 73.5%id,  0.0%wa,  0.0%hi,  0.0%si,  6.6%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 75.1%id,  0.0%wa,  0.7%hi, 24.3%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni, 73.1%id,  0.0%wa,  0.7%hi, 26.2%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni, 76.1%id,  0.0%wa,  0.7%hi, 23.3%si,  0.0%st
Cpu4  :  0.0%us,  0.3%sy,  0.0%ni, 70.4%id,  0.7%wa,  0.3%hi, 28.2%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Note that this average scenario doesn't change during the entire
benchmarking test.

Thanks!

-- 
Breno Leitao [EMAIL PROTECTED]



why does promote_secondaries default to off? (was: iproute2: removing primary address removes secondaries)

2008-01-11 Thread martin f krafft
also sprach Daniel Lezcano [EMAIL PROTECTED] [2008.01.11.1813 +0100]:
 There is a tweak in /proc/sys which activates promotion of secondaries when a 
 primary is deleted.

 /proc/sys/net/ipv4/conf/all/promote_secondaries

 I think it changes the behavior to the one you wish.

Totally. That would have been the last place I had looked.
Thank you!

Do you have any idea why this isn't on by default?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
i never go without my dinner. no one ever does, except vegetarians
 and people like that.
-- oscar wilde
 
spamtraps: [EMAIL PROTECTED]




Re: [PATCH 001/001] ipv4: enable use of 240/4 address space

2008-01-11 Thread Vince Fuller
On Fri, Jan 11, 2008 at 12:17:02PM +0100, Andi Kleen wrote:
 Vince Fuller [EMAIL PROTECTED] writes:
 
  from Vince Fuller [EMAIL PROTECTED]
 
  This set of diffs modify the 2.6.20 kernel to enable use of the 240/4
  (aka class-E) address space as consistent with the Internet Draft
  draft-fuller-240space-00.txt.
 
 Wouldn't it be wise to at least wait for it becoming an RFC first? 

There is reasonable consensus on making use of 240/4; some applications,
such as ISAKMP and automatic ipv6-to-IPv4 tunneling, still need to determine
if they should treat the space as public or private but that shouldn't
affect whether kernel support is added.

Solaris recently added support for 240/4 and OSX already has it. I thought
the Linux kernel developers might appreciate having patches to do likewise.

I leave it up to you, the developers, to decide if you want to use these
patches.

--Vince


Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Breno Leitao
Hello Denys, 
   I've installed sysstat (good tools!) and the result is very similar
to the one which appears at top, take a look:
13:34:23     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
13:34:24     all    0.00    0.00    2.72    0.00    0.25   12.13    0.99   83.91  16267.33
13:34:24       0    0.00    0.00   21.78    0.00    0.00    0.00    7.92   70.30     40.59
13:34:24       1    0.00    0.00    0.00    0.00    0.99   24.75    0.00   74.26   4025.74
13:34:24       2    0.00    0.00    0.00    0.00    0.99   24.75    0.00   74.26   4036.63
13:34:24       3    0.00    0.00    0.00    0.00    0.99   21.78    0.00   77.23   4032.67
13:34:24       4    0.00    0.00    0.00    0.00    0.98   24.51    0.00   74.51   4034.65
13:34:24       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     30.69
13:34:24       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     33.66
13:34:24       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00     32.67

So we can be sure that the IRQs are not being balanced, and that there
isn't any processor overload.

Thanks!


On Fri, 2008-01-11 at 19:36 +0200, Denys Fedoryshchenko wrote:
 Maybe good idea to use sysstat ?
 
 http://perso.wanadoo.fr/sebastien.godard/



Re: doubt in e1000_io_write()

2008-01-11 Thread Kok, Auke
Jeba Anandhan wrote:
 Hi all,
 i have doubt in e1000_io_write().
 
 void
 e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value)
 {
 outl(value, port);
 }
 
 
 kernel version: 2.6.12.3
 
 
 Even hw structure has not been used, why it has been passed into
 e1000_io_write function?

2.6.12.3? why do you care? that code is probably long gone... was that function
even used?



Re: [PATCH 2.6.23+] ingress classify to [nf]mark

2008-01-11 Thread Dzianis Kahanovich

jamal wrote:


Yes, I do so. But there are simple:
---
if [[ $[TC_INDEX2MARK] == 0 ]] ; then

==1

  c=${c//action ipt -j MARK --set-mark /flowid :}

   c=${c//action ipt -j MARK --set-mark 0x/flowid :}

fi
$c
---


I didn't quite understand what you have above. Does your script above
read the flowid and set the MARK to some dynamic value based on flowid?
If that's what you are doing, it sounds sensible and much more clever
than what is posted. And it doesn't require any kernel patch.


I suggest simply using classid to toggle mark/nfmark in ingress. As far as I can
see, classid is nearly unused in ingress (no classes, etc.), and for many setups
classid in ingress filters could be used solely for nfmarking. I also suggest
using both parts (major & minor) of classid: major could be the AND value and
minor the OR value. In its current place it may only be useful for (I am unsure)
overwriting netfilter raw table marks, but if it is moved outside the current
CLS_ACT block, tc filter rules could manipulate mark bits more usefully.
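
A minimal sketch of one possible reading of the AND/OR idea above, in plain userspace C. The masks mirror TC_H_MAJ/TC_H_MIN from linux/pkt_sched.h; the helper classid_to_mark() is hypothetical and is not part of any patch posted in this thread.

```c
#include <stdint.h>

/* Same split as TC_H_MAJ/TC_H_MIN in linux/pkt_sched.h: a classid is
 * 32 bits, major in the upper 16 bits and minor in the lower 16. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MAJ(h) ((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)

/* Hypothetical helper (assumption, for illustration only): keep the mark
 * bits selected by the major half, then set the bits of the minor half. */
static uint32_t classid_to_mark(uint32_t mark, uint32_t classid)
{
	uint32_t and_mask = TC_H_MAJ(classid) >> 16;
	uint32_t or_bits = TC_H_MIN(classid);

	return (mark & and_mask) | or_bits;
}
```

Under this reading a classid of 000f:0010 keeps the low four bits of the existing mark and sets bit 4, while a major of 0 simply overwrites the mark with the minor value.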

About the script example:
While composing a filter, I check a flag ($TC_INDEX2MARK) that tells me whether
the patch is applied. If it is not, I use the usual -j MARK --set-mark;
otherwise I use classid to change the mark. All of this is in ingress only.
For example:
tc filter add dev eth0 parent : protocol ip u32 ... action ipt -j MARK 0x10
is changed to:
tc filter add dev eth0 parent : protocol ip u32 ... flowid :10

- this uses less code and fewer modules and, in many cases, may be the single
main goal of using ingress: pre-marking packets.


Simplest:
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -222,6 +222,16 @@
-   skb->tc_index = TC_H_MIN(res.classid);
+   skb->tc_index = TC_H_MIN(mark=res.classid);


Just write a metaset action and you can have all sorts of policies on
what tc_index, mark etc. you want. It is something that's needed in any
case.
When we did tc_index it made sense then because it was for tc to use
some default policy. Enforcing policies in the kernel is not the best
thing to do; as an example you want to specify the policy for mark to
be: classid major<<16|minor. I am sure you have good reasons; however,
for the next person who wants to set it to major<<8|minor for their own
good reason, there's a conflict.
My offer to help you is still open.


OK, I understand this is not very transparent for future use, but I see very
few applications of ingress/classid that it would conflict with.

Thanks, I will try to understand metaset actions, but I think it will not be
as elegant for my usage as my "#define tc_index mark" at the beginning of
sch_ingress.c. Or maybe I will use the and/or behaviour, but for now "#define
tc_index mark" has worked on my router for many months (I could also use
-j MARK, via one flag in my script, but that pulls in a lot of unneeded code).

This code (ingress/classifying[/CLS_ACT]) is executed all the time, and the
changes I suggest range from none (changing the target variable from tc_index
to mark) to a few and/or atomic operations in exchange for useful functionality.
With mark=res.classid alone (which I may use myself, but do not suggest for the
kernel) it is even less code than the default (no TC_H_MIN) and fully satisfies
many goals (traffic marking without netfilter, but compatible with it).

--
WBR,
Denis Kaganovich,  [EMAIL PROTECTED]  http://mahatma.bspu.unibel.by



Re: iproute2: removing primary address removes secondaries

2008-01-11 Thread Daniel Lezcano

martin f krafft wrote:

Dear list,

When I add an address to an interface whose network prefix is the
same as that of an address already bound to the interface, the new
address becomes a secondary address. As per
http://www.policyrouting.org/iproute2.doc.html:

  secondary --- this address is not used when selecting the default
  source address for outgoing packets. An IP address becomes
  secondary if another address within the same prefix (network)
  already exists. The first address within the prefix is primary and
  is the tag address for the group of all the secondary addresses.
  When the primary address is deleted all of the secondaries are
  purged too.

In the following, I want to argue that this is not necessary.
I think that removal of a primary address should cause the next
address to be promoted to be the default source address and the
link-scoped route to be retained. This is basically out of
http://bugs.debian.org/429689, the maintainer asked me to turn
directly to this list.

If I add an address to a device with 'ip add', ip also implicitly
adds a link-scoped route according to the netmask. It only does this
for primary addresses, so if I add a second address within the same
network, the route is not duplicated.

Thus, the net effect on the routing table is the same for the
following two commands:

  ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/12 dev eth0
  ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/32 dev eth0

In the first case, the .200 address becomes a secondary of the .100
address. In the second case, they are both primaries. In both cases,
only one /12 link-scoped route will be created.

However, in both cases, if I remove the .100 address, the .200 is
affected: if it's secondary, it ceases to exist, and if it's
primary (i.e. in the /32 case), then the host can no longer use it
to communicate to hosts in the same link segment, only to hosts on
the other side of the default gateway.

I thus question the point of purging secondary addresses. Obviously,
only one address can be primary (it is used as source address for
packets leaving the machine by the respective route). But if the
primary address is removed, the next secondary should be promoted
and the route should *not* be deleted.

Comments?

Cheers,


There is a tunable in /proc/sys which activates promotion of secondaries
when a primary is deleted.


/proc/sys/net/ipv4/conf/all/promote_secondaries

I think it changes the behavior to the one you wish.

Regards


Re: Bonding : Monitoring of 4965 wireless card

2008-01-11 Thread Chris Snook

[EMAIL PROTECTED] wrote:

Hi,

I want to make a bond with my wireless card. The ipw driver creates two
 interfaces (wlan0 and wmaster0). When I flip the rf_kill switch,
 ifplug detects wlan0 as unplugged but not wmaster0. If I bring wlan0 down
 (while rf_kill is on), bonding detects the inactivity when I bring the
 interface up.

Do you have any idea where the problem is? The driver, or the miimon of
 the module?

My module parameters: mode=1 miimon=100 primary eth0


miimon isn't meaningful for wmaster0.  I suggest you use arp monitoring instead. 
 See bonding.txt for details.


-- Chris


Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module

2008-01-11 Thread Rafael J. Wysocki
 http://bugzilla.kernel.org/show_bug.cgi?id=9721

On Friday, 11 of January 2008, supersud501 wrote:
 
 Stephen Hemminger wrote:
  On Wed, 9 Jan 2008 16:03:00 -0800
  Andrew Morton [EMAIL PROTECTED] wrote:
  
  (switched to email.  Please respond via emailed reply-to-all, not via the
  bugzilla web interface).
 
  On Wed,  9 Jan 2008 13:05:34 -0800 (PST)
  [EMAIL PROTECTED] wrote:
 
  http://bugzilla.kernel.org/show_bug.cgi?id=9721
 
            Summary: wake on lan fails with sky2 module
            Product: ACPI
            Version: 2.5
      KernelVersion: 2.6.24-rc7
           Platform: All
         OS/Version: Linux
               Tree: Mainline
             Status: NEW
           Severity: normal
           Priority: P1
          Component: Power-Sleep-Wake
         AssignedTo: [EMAIL PROTECTED]
         ReportedBy: [EMAIL PROTECTED]
  This post-2.6.23 regression was assigned to ACPI but is quite possibly a
  net driver problem?
 
  Latest working kernel version: 2.6.23.12
  Earliest failing kernel version: 2.6.24-rc6 (earlier kernels not tested,
  2.6.24-rc7 still failing)
  Distribution: Ubuntu 8.04 (but kernel built from kernel.org and system
  modified to make wake on lan work, i.e. network cards are not shut down on
  poweroff)
  Hardware Environment: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
  Ethernet Controller (rev 20) onboard Asus P5W DH motherboard, uses module
  SKY2
  Software Environment:
  Problem Description:
 
  When enabling wake on lan with: 'ethtool -s eth0 wol' i get the following
  status:
 
  21:56:29 ~ # sudo ethtool eth0
  Settings for eth0:
  Supported ports: [ TP ]
  Supported link modes:   10baseT/Half 10baseT/Full 
  100baseT/Half 100baseT/Full 
  1000baseT/Half 1000baseT/Full 
  Supports auto-negotiation: Yes
  Advertised link modes:  10baseT/Half 10baseT/Full 
  100baseT/Half 100baseT/Full 
  1000baseT/Half 1000baseT/Full 
  Advertised auto-negotiation: Yes
  Speed: 100Mb/s
  Duplex: Full
  Port: Twisted Pair
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: on
  Supports Wake-on: pg
  Wake-on: g wol enabled
  Current message level: 0x00ff (255)
  Link detected: yes
 
  but after shutting down the pc doesn't wake up when magic packet is sent.
 
  the status lights of the network card are still on (so the card seems to 
  be
  online).
 
  same system with only changed kernel to 2.6.23.12 and same procedure like
  above: wake on lan works.
 
  Steps to reproduce: enable wol on your network card using SKY2 module and 
  it
  doesn't work too?
 
  if you need more information, just tell me, it's my first bug report.
  regards
 
  
  
  Wake from power off works on 2.6.24-rc7 for me.
  Wake from suspend doesn't because Network Manager, HAL, or some other
  user space tool gets confused.
  
  I just rechecked it with a Fujitsu Lifebook, which has sky2 (88E8055).
  There are many variations of this chip, and it may be a chip-specific
  problem or an ACPI/BIOS issue.  If you don't enable Wake on Lan in BIOS, the
  driver can't do it for you. Also, check how you are shutting down.
  
  Also since the device has to restart the PHY, it could be a switch
  issue if you have some fancy pants switch doing intrusion detection
  or something, but I doubt that.
  
  Is it a clean or fast shutdown? Most distributions mark network
  devices as down on shutdown, but if the distribution does something
  stupid like remove the driver module, then the driver is unable to set up
  Wake On Lan.
  The wake on lan setup is done in one place in the driver; add
  a printk to see if it is ever called.
  
  
 
 
 I tried ACPI wakeup with /proc/acpi/alarm (as I described in my last
 mail) and it worked, so ACPI wakeup seems to work.
 
 I'll try the printk approach when I find some time to dig into
 the sources (maybe tomorrow). If someone has some brief
 instructions (maybe a link to a helpful site on kernel debugging) for
 me, I would be thankful and could provide more info faster.
 
 Some steps to help me identify the source of the problem (is it really
 sky2?) would be really helpful...

Please do the tests requested at:
http://bugzilla.kernel.org/show_bug.cgi?id=9721#c2,
thanks.


Re: doubt in e1000_io_write()

2008-01-11 Thread Breno Leitao
Hello Auke, 
On Fri, 2008-01-11 at 10:41 -0800, Kok, Auke wrote:
  The hw structure is not used, so why is it passed into the
  e1000_io_write function?
 
 2.6.12.3? why do you care? that code is probably long gone... was that 
 function
 even used?

I noticed that this also happens on the upstream netdev-2.6 branch.
Moreover, the function e1000_write_reg_io() from e1000_hw.c is the only
function that calls e1000_io_write().

I wrote a small patch that fixes it.

diff -uNp e1000.old/e1000_hw.c e1000/e1000_hw.c
--- e1000.old/e1000_hw.c	2008-01-11 14:14:36.0 -0500
+++ e1000/e1000_hw.c	2008-01-11 14:13:36.0 -0500
@@ -6654,8 +6654,8 @@ e1000_write_reg_io(struct e1000_hw *hw,
 	unsigned long io_addr = hw->io_base;
 	unsigned long io_data = hw->io_base + 4;
 
-e1000_io_write(hw, io_addr, offset);
-e1000_io_write(hw, io_data, value);
+e1000_io_write(io_addr, offset);
+e1000_io_write(io_data, value);
 }
 
 /**
diff -uNp e1000.old/e1000_hw.h e1000/e1000_hw.h
--- e1000.old/e1000_hw.h	2008-01-11 14:13:00.0 -0500
+++ e1000/e1000_hw.h	2008-01-11 14:15:47.0 -0500
@@ -427,7 +427,7 @@ int32_t e1000_read_pcie_cap_reg(struct e
 void e1000_pcix_set_mmrbc(struct e1000_hw *hw, int mmrbc);
 int e1000_pcix_get_mmrbc(struct e1000_hw *hw);
 /* Port I/O is only supported on 82544 and newer */
-void e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value);
+void e1000_io_write(unsigned long port, uint32_t value);
 int32_t e1000_disable_pciex_master(struct e1000_hw *hw);
 int32_t e1000_check_phy_reset_block(struct e1000_hw *hw);
 
diff -uNp e1000.old/e1000_main.c e1000/e1000_main.c
--- e1000.old/e1000_main.c  2008-01-11 14:14:36.0 -0500
+++ e1000/e1000_main.c  2008-01-11 14:13:23.0 -0500
@@ -4919,7 +4919,7 @@ e1000_read_pcie_cap_reg(struct e1000_hw 
 }
 
 void
-e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value)
+e1000_io_write(unsigned long port, uint32_t value)
 {
outl(value, port);
 }

Signed-off-by: Breno Leitao [EMAIL PROTECTED]
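
For readers without the driver source at hand, the call pattern the patch touches can be sketched in userspace C as below. This is an illustration under stated assumptions, not driver code: sketch_io_write() is a hypothetical stand-in that records writes instead of calling outl(), and the two-slot log exists only so the behaviour can be inspected.

```c
#include <stdint.h>

/* Hypothetical recorder standing in for outl(); real port I/O needs
 * kernel context or ioperm(), so this only logs what would be written. */
struct port_write {
	unsigned long port;
	uint32_t value;
};

static struct port_write port_log[2];
static int port_log_len;

/* Mirrors the patched signature: no struct e1000_hw * argument. */
static void sketch_io_write(unsigned long port, uint32_t value)
{
	if (port_log_len < 2) {
		port_log[port_log_len].port = port;
		port_log[port_log_len].value = value;
		port_log_len++;
	}
}

/* Same two-step sequence as e1000_write_reg_io() in the diff above:
 * the register offset goes to io_base, the data to io_base + 4. */
static void sketch_write_reg_io(unsigned long io_base, uint32_t offset,
				uint32_t value)
{
	sketch_io_write(io_base, offset);
	sketch_io_write(io_base + 4, value);
}
```

The caller still needs hw to find io_base, which is why only the inner helper loses the parameter.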



Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers

2008-01-11 Thread Eugene Surovegin
On Sat, Jan 05, 2008 at 01:38:17PM +0100, Stefan Roese wrote:
 On Saturday 05 January 2008, Benjamin Herrenschmidt wrote:
  On Sat, 2008-01-05 at 10:50 +0100, Stefan Roese wrote:
   Performance tests done by AMCC have shown that 256 buffer increase the
   performance of the Linux EMAC driver. So let's update the default
   values to match this setup.
  
   Signed-off-by: Stefan Roese [EMAIL PROTECTED]
   ---
 
  Do we have the numbers ? Did they also measure latency ?
 
 I hoped this question would not come. ;) No, unfortunately I don't have any 
 numbers. Just the recommendation from AMCC to always use 256 buffers.

This cannot be true for all chips. The default numbers I selected weren't
random. In particular, 256 for Tx doesn't make a lot of sense for 405.
You're just going to waste memory.

I'd be quite reluctant to follow such advice from AMCC without actual
details.

-- 
Eugene



Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22

2008-01-11 Thread Rick Jones

The test command is:
#sudo taskset -c 7 ./netserver
#sudo taskset -c 0 ./netperf -t TCP_RR -l 60 -H 127.0.0.1 -i 50,3 -I 99,5 -- -r 1,1


A couple of comments/questions on the command lines:

*) netperf/netserver support CPU affinity within themselves with the 
global -T option to netperf.  Is the result with taskset much different? 
  The equivalent to the above would be to run netperf with:


./netperf -T 0,7 ...

The one possibly salient difference between the two is that when done 
within netperf, the initial process creation will take place wherever 
the scheduler wants it.


*) The -i option to set the confidence iteration count will silently cap 
the max at 30.


happy benchmarking,

rick jones


TCP hints

2008-01-11 Thread Ilpo Järvinen
Hi Stephen,

Do you still remember what this is for (it got added along with other TCP
hint stuff)? What kind of problem did you see back then (or who saw
problems)?

@@ -1605,6 +1711,10 @@ static void tcp_undo_cwr(struct sock *sk, const int undo)
 	}
 	tcp_moderate_cwnd(tp);
 	tp->snd_cwnd_stamp = tcp_time_stamp;
+
+	/* There is something screwy going on with the retrans hints after
+	   an undo */
+	clear_all_retrans_hints(tp);
 }
 
 static inline int tcp_may_undo(struct tcp_sock *tp)


-- 
 i.


Re: why does promote_secondaries default to off?

2008-01-11 Thread martin f krafft
Daniel Lezcano [EMAIL PROTECTED] wrote [2008.01.11.1833 +0100]:
 This tweak is recent (2.6.16 as far as I remember), so I suppose
 the reason is to avoid puzzling people with a changed default
 behavior.

Your instant and helpful responses are most appreciated!

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
a common mistake that people make
when trying to design something completely foolproof
was to underestimate the ingenuity of complete fools.
 -- douglas adams, mostly harmless
 
spamtraps: [EMAIL PROTECTED]




[PATCH] dm9161: add configuration for MII/RMII

2008-01-11 Thread frederic RODO

diff --git a/drivers/net/phy/davicom.c b/drivers/net/phy/davicom.c
index 7ed632d..6bdc32f 100644
--- a/drivers/net/phy/davicom.c
+++ b/drivers/net/phy/davicom.c
@@ -37,6 +37,7 @@

 #define MII_DM9161_SCR		0x10
 #define MII_DM9161_SCR_INIT	0x0610
+#define MII_DM9161_SCR_RMII	0x0100

 /* DM9161 Interrupt Register */
 #define MII_DM9161_INTR	0x15
@@ -103,7 +104,7 @@ static int dm9161_config_aneg(struct phy_device *phydev)

 static int dm9161_config_init(struct phy_device *phydev)
 {
-	int err;
+	int err, temp;

 	/* Isolate the PHY */
 	err = phy_write(phydev, MII_BMCR, BMCR_ISOLATE);
@@ -111,8 +112,19 @@ static int dm9161_config_init(struct phy_device *phydev)

 	if (err < 0)
 		return err;

-	/* Do not bypass the scrambler/descrambler */
-	err = phy_write(phydev, MII_DM9161_SCR, MII_DM9161_SCR_INIT);
+	/* Do not bypass the scrambler/descrambler , configure MII Mode */
+	switch (phydev->interface) {
+	case PHY_INTERFACE_MODE_MII:
+		temp = MII_DM9161_SCR_INIT;
+		break;
+	case PHY_INTERFACE_MODE_RMII:
+		temp =  MII_DM9161_SCR_INIT | MII_DM9161_SCR_RMII;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	err = phy_write(phydev, MII_DM9161_SCR, temp);

 	if (err < 0)
 		return err;
Signed-off-by: Frederic RODO [EMAIL PROTECTED]
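
As a quick check of the values the switch above ends up writing, here is a small standalone C sketch. The two register constants are copied from the patch; the enum is a made-up stand-in for the kernel's phy_interface_t and dm9161_scr_value() is a hypothetical helper used only for illustration.

```c
#include <errno.h>

#define MII_DM9161_SCR_INIT 0x0610
#define MII_DM9161_SCR_RMII 0x0100

/* Stand-ins for the kernel's PHY_INTERFACE_MODE_* values (assumption). */
enum sketch_phy_mode {
	SKETCH_MODE_MII,
	SKETCH_MODE_RMII,
	SKETCH_MODE_OTHER
};

/* Returns the value that would be written to MII_DM9161_SCR, or
 * -EINVAL for interface modes the patch does not handle. */
static int dm9161_scr_value(enum sketch_phy_mode mode)
{
	switch (mode) {
	case SKETCH_MODE_MII:
		return MII_DM9161_SCR_INIT;
	case SKETCH_MODE_RMII:
		return MII_DM9161_SCR_INIT | MII_DM9161_SCR_RMII;
	default:
		return -EINVAL;
	}
}
```

So MII keeps the previous 0x0610, RMII becomes 0x0710, and any other interface mode now makes dm9161_config_init() fail rather than silently using the MII value.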

-
The preceding information may be confidential or privileged.
If you are not the intended recipient of this mail, please notify
the sender by replying to this message, then delete all traces of it
from your systems.


TIL Technologies
Parc du Golf, Bat 43
350 rue J.R Guilibert Gautier de la Lauzière 
13856 AIX EN PROVENCE

Tel. : +33 4 42 37 11 77
-




Re: why does promote_secondaries default to off?

2008-01-11 Thread Daniel Lezcano

martin f krafft wrote:

Daniel Lezcano [EMAIL PROTECTED] wrote [2008.01.11.1813 +0100]:
There is a tunable in /proc/sys which activates promotion of secondaries
when a primary is deleted.


/proc/sys/net/ipv4/conf/all/promote_secondaries

I think it changes the behavior to the one you wish.


Totally. That would have been the last place I had looked.
Thank you!

Do you have any idea why this isn't on by default?


This tweak is recent (2.6.16 as far as I remember), so I suppose the
reason is to avoid puzzling people with a changed default behavior.




RE: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Breno Leitao
On Thu, 2008-01-10 at 12:52 -0800, Brandeburg, Jesse wrote:
 Breno Leitao wrote:
  When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
  of transfer rate. If I run 4 netperf against 4 different interfaces, I
  get around 720 * 10^6 bits/sec.
 
 I hope this explanation makes sense, but what it comes down to is that
 combining hardware round robin balancing with NAPI is a BAD IDEA.  In
 general the behavior of hardware round robin balancing is bad and I'm
 sure it is causing all sorts of other performance issues that you may
 not even be aware of.
I've run another test with the ppc IRQ round-robin scheme removed and
each interface (eth6, eth7, eth16 and eth17) bound to a different CPU (CPU1,
CPU2, CPU3 and CPU4), and I still get around 720 * 10^6 bits/s on
average.

Take a look at the interrupt table this time: 

io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
277:         15    1362450         13         14         13         14         15         18   XICS      Level     eth6
278:         12         13    1348681         19         13         15         10         11   XICS      Level     eth7
323:         11         18         17    1348426         18         11         11         13   XICS      Level     eth16
324:         12         16         11         19    1402709         13         14         11   XICS      Level     eth17


I also tried to bind all 4 interface IRQs to a single CPU (CPU0)
using the noirqdistrib boot parameter, and the performance was a little
worse.

Rick, 
  The 2 interface test that I showed in my first email was run on two
different NICs. Also, I am running netperf with the following command:
netperf -H hostname -T 0,8, while netserver is running without any
arguments at all. Also, running vmstat in parallel shows that there is no
bottleneck in the CPU. Take a look: 

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo    in   cs us sy id wa st
 2  0      0 6714732  16168 227440    0    0     8     2   203   21  0  1 98  0  0
 0  0      0 6715120  16176 227440    0    0     0    28 16234  505  0 16 83  0  1
 0  0      0 6715516  16176 227440    0    0     0     0 16251  518  0 16 83  0  1
 1  0      0 6715252  16176 227440    0    0     0     1 16316  497  0 15 84  0  1
 0  0      0 6716092  16176 227440    0    0     0     0 16300  520  0 16 83  0  1
 0  0      0 6716320  16180 227440    0    0     0     1 16354  486  0 15 84  0  1
 

Thanks!

-- 
Breno Leitao [EMAIL PROTECTED]



RFC: TIPC API, bind to two networks

2008-01-11 Thread Randy Macleod


I'd like some feedback on a change to TIPC that I plan
to submit to netdev/kernel.org. At this stage, I'm interested
in what people think about using the protocol parameter
of the socket interface to select a TIPC stack for the socket.
My co-worker, Chris Friesen, has suggested that it would
be more conventional to extend the TIPC sockaddr to
select the appropriate network in calls to sendto() or bind().
I prefer socket(,,protocol) approach.

Some background:

We use TIPC in an ATCA chassis (Advanced Telecommunications Computing
Architecture). An ATCA chassis may have two networks commonly called
base and data (outside of the ATCA world base is called control).

Users want to be able to create a TIPC socket that is
bound to one network OR the other. The intention is that these
networks should be isolated as much as possible.

The attached patch accomplishes this by using the protocol
parameter of the socket() syscall. A user can specify the TIPC stack
to which the socket should be attached as follows:
int socket(int domain, int type, int protocol);
fd = socket(AF_TIPC, SOCK_SEQPACKET, 1);
In the unpatched TIPC code the protocol was required to be zero.
This requires that the user know the network topology or that the
system designer provide an API (get_base_netid(), get_data_netid()).
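
A minimal userspace sketch of how an application might wrap this, assuming the patched semantics described above. The wrapper name tipc_stack_socket() and the TIPC_STACK_MAX bound are hypothetical (the bound is taken from the TIPC_STACK_0..TIPC_STACK_7 defines in the patch below); on an unpatched TIPC any non-zero protocol would be expected to fail, since the protocol was required to be zero.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef AF_TIPC
#define AF_TIPC 30	/* Linux address family number for TIPC */
#endif

/* Hypothetical bound: the patch defines TIPC_STACK_0 .. TIPC_STACK_7. */
#define TIPC_STACK_MAX 7

/* Open a SOCK_SEQPACKET TIPC socket on the given stack, passing the
 * stack number through the protocol argument of socket(). */
static int tipc_stack_socket(int stack)
{
	if (stack < 0 || stack > TIPC_STACK_MAX) {
		errno = EINVAL;
		return -1;
	}
	return socket(AF_TIPC, SOCK_SEQPACKET, stack);
}
```

A system designer could hide the stack numbers behind helpers such as the get_base_netid() and get_data_netid() mentioned above, so that applications never hard-code the topology.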

The patch is for tipc-1.5.12 which you can get from tipc.sf.net.
We're way back on linux-2.6.14 - gotta love the embedded world!

In terms of implementation,
the basic idea of the patch is to introduce a layer in the
TIPC code around socket creation and tipc netlink messages.
This layer lets TIPC stack code register callback functions and
then dispatches socket() and netlink calls to the appropriate
TIPC stack.

For example usage, please see:
http://sourceforge.net/mailarchive/message.php?msg_name=476839F3.8070203%40nortel.com

One note for those who might not read the link above:
I create two modules, tipc.ko and tipcstack.ko.
These are ~98% identical, with certain bits of functionality,
like registering AF_TIPC, disabled. This means that we have
the same bits of code loaded twice, but that's a feature, not
a bug! It means that the control and data networks are even
more independent, so you could update one but not the other
or you could use SystemTap on one but not the other.



diffstat:
   Makefile                |   15 +++-
   include/net/tipc/tipc.h |   11 ++-
   net/tipc/core.c         |  157 +++-
   net/tipc/core.h         |   25 +++
   net/tipc/handler.c      |   17 +++--
   net/tipc/netlink.c      |   21 +-
   net/tipc/socket.c       |   55
   net/tipc/vtipc.c        |  154 +++
   net/tipc/vtipc.h        |   48 ++
   tools/tipc-config.c     |   29 ++--


Even though I *don't* want the attached patch to be integrated
into the kernel (yet), I'm still going to include:
Signed-off-by: Randy MacLeod ([EMAIL PROTECTED])
because it's taken such a long time to get Nortel to bless
official kernel participation!

// Randy


diff -Naur tipc-1.5.12_orig/include/net/tipc/tipc.h tipc-1.5.12_gmt/include/net/tipc/tipc.h
--- tipc-1.5.12_orig/include/net/tipc/tipc.h	2005-12-15 00:48:48.0 +0530
+++ tipc-1.5.12_gmt/include/net/tipc/tipc.h	2007-12-17 21:41:19.0 +0530
@@ -71,7 +71,6 @@
__u32 lower;
__u32 upper;
 };
-
 static inline __u32 tipc_addr(unsigned int zone,
  unsigned int cluster,
  unsigned int node)
@@ -213,13 +212,21 @@
 /*
  * TIPC-specific socket option values
  */
-
 #define SOL_TIPC   50  /* TIPC socket option level */
+
 #define TIPC_IMPORTANCE	127	/* Default: TIPC_LOW_IMPORTANCE */
 #define TIPC_SRC_DROPPABLE	128	/* Default: 0 (resend congested msg) */
 #define TIPC_DEST_DROPPABLE	129	/* Default: based on socket type */
 #define TIPC_CONN_TIMEOUT	130	/* Default: 8000 (ms)  */
 
+#define TIPC_STACK_0   0   /* Default TIPC stack */
+#define TIPC_STACK_1   1   /* 1st TIPC stack */
+#define TIPC_STACK_2   2   /* 2nd TIPC stack */
+#define TIPC_STACK_3   3   /* 3rd TIPC stack */
+#define TIPC_STACK_4   4   /* 4th TIPC stack */
+#define TIPC_STACK_5   5   /* 5th TIPC stack */
+#define TIPC_STACK_6   6   /* 6th TIPC stack */
+#define TIPC_STACK_7   7   /* 7th TIPC stack */
 
 #ifdef __KERNEL__
 
diff -Naur tipc-1.5.12_orig/Makefile tipc-1.5.12_gmt/Makefile
--- tipc-1.5.12_orig/Makefile   2005-06-23 00:10:12.0 +0530
+++ tipc-1.5.12_gmt/Makefile2007-12-18 01:41:20.0 +0530
@@ -3,8 +3,6 @@
 #
 
 SHELL = /bin/bash
-
-
  
 ifdef KERNELDIR
KINCLUDE = ${KERNELDIR}/include
@@ -22,8 +20,19 @@
   -DCONFIG_TIPC_DEBUG
 
 obj-m += tipc.o
+obj-m += tipcstack.o
+
+tipc-objs += net/tipc/addr.o  net/tipc/bcast.o  

Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers

2008-01-11 Thread Benjamin Herrenschmidt

On Fri, 2008-01-11 at 09:48 -0800, Eugene Surovegin wrote:
 On Sat, Jan 05, 2008 at 01:38:17PM +0100, Stefan Roese wrote:
  On Saturday 05 January 2008, Benjamin Herrenschmidt wrote:
   On Sat, 2008-01-05 at 10:50 +0100, Stefan Roese wrote:
Performance tests done by AMCC have shown that 256 buffer increase the
performance of the Linux EMAC driver. So let's update the default
values to match this setup.
   
Signed-off-by: Stefan Roese [EMAIL PROTECTED]
---
  
   Do we have the numbers ? Did they also measure latency ?
  
  I hoped this question would not come. ;) No, unfortunately I don't have any 
  numbers. Just the recommendation from AMCC to always use 256 buffers.
 
 This cannot be true for all chips. The default numbers I selected weren't
 random. In particular, 256 for Tx doesn't make a lot of sense for 405.
 You're just going to waste memory.
 
 I'd be quite reluctant to follow such advice from AMCC without actual
 details.

I think we can make defaults based on other config options nowadays. Not
very nice but we could do things like

default 128 if PPC_40x
default 256

Or even more detailed.

Cheers,
Ben.




Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Rick Jones

Breno Leitao wrote:

On Fri, 2008-01-11 at 17:48 +0100, Eric Dumazet wrote:


Breno Leitao wrote:

Take a look at the interrupt table this time: 


io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
277:         15    1362450         13         14         13         14         15         18   XICS      Level     eth6
278:         12         13    1348681         19         13         15         10         11   XICS      Level     eth7
323:         11         18         17    1348426         18         11         11         13   XICS      Level     eth16
324:         12         16         11         19    1402709         13         14         11   XICS      Level     eth17


 


If your machine has 8 cpus, then your vmstat output shows a bottleneck :)

(100/8 = 12.5), so I guess one of your CPU is full
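
The arithmetic behind the 100/8 = 12.5 remark can be sketched in C as below. The helper name is made up for illustration; it only converts a system-wide busy percentage, as vmstat reports it, into the number of fully busy CPUs it could represent, and says nothing about how the load is actually spread.

```c
/* How many CPUs' worth of work a system-wide busy percentage represents.
 * Hypothetical helper: busy_pct is e.g. vmstat's us+sy, ncpus the CPU count. */
static double busy_cpu_equivalents(double system_busy_pct, int ncpus)
{
	return system_busy_pct * ncpus / 100.0;
}
```

With the vmstat figures posted earlier in this thread (sy around 16 on 8 CPUs) this comes to roughly 1.28 CPUs' worth of work, which is why a system-wide 83% idle does not by itself rule out one saturated CPU.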



Well, if I run top while running the test, I see this load distributed
among the CPUs, mainly those that have a NIC IRQ bound. Take a look:

Tasks: 133 total,   2 running, 130 sleeping,   0 stopped,   1 zombie
Cpu0  :  0.3%us, 19.5%sy,  0.0%ni, 73.5%id,  0.0%wa,  0.0%hi,  0.0%si,  6.6%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni, 75.1%id,  0.0%wa,  0.7%hi, 24.3%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni, 73.1%id,  0.0%wa,  0.7%hi, 26.2%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni, 76.1%id,  0.0%wa,  0.7%hi, 23.3%si,  0.0%st
Cpu4  :  0.0%us,  0.3%sy,  0.0%ni, 70.4%id,  0.7%wa,  0.3%hi, 28.2%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st


If you have IRQ's bound to CPUs 1-4, and have four netperfs running, 
given that the stack ostensibly tries to have applications run on the 
same CPUs, what is running on CPU0?


Is it related to:


  The 2 interface test that I showed in my first email was run on two
different NICs. Also, I am running netperf with the following command:
netperf -H hostname -T 0,8, while netserver is running without any
arguments at all. Also, running vmstat in parallel shows that there is no
bottleneck in the CPU. Take a look: 


Unless you have a morbid curiosity :) there isn't much point in binding
all the netperfs to CPU 0 when the interrupts for the NICs servicing
their connections are on CPUs 1-4.  I also assume then that the 
system(s) on which netserver is running have  8 CPUs in them? (There 
are multiple destination systems yes?)


Does anything change if you explicitly bind each netperf to the CPU on
which the interrupts for its connection are processed?  Or for that
matter if you remove the -T option entirely?


Does UDP_STREAM show different performance than TCP_STREAM (I'm 
ass-u-me-ing based on the above we are looking at the netperf side of a 
TCP_STREAM test above, please correct if otherwise).


Are the CPUs above single-core CPUs or multi-core CPUs, and if 
multi-core are caches shared?  How are CPUs numbered if multi-core on 
that system?  Is there any hardware threading involved?  I'm wondering 
if there may be some wrinkles in the system that might lead to reported 
CPU utilization being low even if a chip is otherwise saturated.  Might 
need some HW counters to check that...


Can you describe the I/O subsystem more completely?  I understand that 
you are using at most two ports of a pair of quad-port cards at any one 
time, but am still curious to know if those two cards are on separate 
busses, or if they share any bus/link on the way to memory.


rick jones


Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module

2008-01-11 Thread supersud501



Stephen Hemminger wrote:

On Wed, 9 Jan 2008 16:03:00 -0800
Andrew Morton [EMAIL PROTECTED] wrote:


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed,  9 Jan 2008 13:05:34 -0800 (PST)
[EMAIL PROTECTED] wrote:


http://bugzilla.kernel.org/show_bug.cgi?id=9721

   Summary: wake on lan fails with sky2 module
   Product: ACPI
   Version: 2.5
 KernelVersion: 2.6.24-rc7
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Power-Sleep-Wake
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

This post-2.6.23 regression was assigned to ACPI but is quite possibly a
net driver problem?


Latest working kernel version: 2.6.23.12
Earliest failing kernel version: 2.6.24-rc6 (not tested earlier kernel,
2.6.24-rc7 still failing)
Distribution: Ubuntu 8.04 (but kernel built from kernel.org and system modified
to make wake on lan work, i.e. network cards are not shut down on poweroff)
Hardware Environment: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit
Ethernet Controller (rev 20) onboard Asus P5W DH motherboard, uses module SKY2
Software Environment:
Problem Description:

When enabling wake on lan with: 'ethtool -s eth0 wol' i get the following
status:

21:56:29 ~ # sudo ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
100baseT/Half 100baseT/Full 
1000baseT/Half 1000baseT/Full 
Advertised auto-negotiation: Yes

Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pg
Wake-on: g wol enabled
Current message level: 0x00ff (255)
Link detected: yes

but after shutting down the pc doesn't wake up when magic packet is sent.

the status lights of the network card are still on (so the card seems to be
online).

same system with only changed kernel to 2.6.23.12 and same procedure like
above: wake on lan works.

Steps to reproduce: enable wol on your network card using SKY2 module and it
doesn't work too?

if you need more information, just tell me, it's my first bug report.
regards
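
For readers unfamiliar with the "magic packet" mentioned in the report (the "g" in the ethtool "Wake-on" line): its payload is six 0xFF bytes followed by the target MAC address repeated 16 times. A minimal sketch of building that payload follows; the function and macro names are illustrative only and do not come from sky2 or ethtool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN   6
#define MAGIC_LEN (6 + 16 * MAC_LEN)   /* 102 bytes in total */

/* Hypothetical helper: fill out[] with the Wake-on-LAN magic payload
 * for the given MAC address: six 0xFF bytes, then the MAC 16 times. */
void build_magic_packet(const unsigned char mac[MAC_LEN],
                        unsigned char out[MAGIC_LEN])
{
    assert(mac != NULL && out != NULL);

    memset(out, 0xFF, 6);                       /* synchronisation stream */
    for (size_t i = 0; i < 16; i++)             /* 16 copies of the MAC */
        memcpy(out + 6 + i * MAC_LEN, mac, MAC_LEN);
}
```

The payload is usually carried in a broadcast frame (often a UDP datagram), but the transport is not what the NIC matches on, so that part is left out of the sketch.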




Wake from power off works on 2.6.24-rc7 for me.
Wake from suspend doesn't because Network Manager, HAL, or some other
user space tool gets confused.

I just rechecked it with Fujitsu Lifebook, which has sky2 (88E8055).
There are many variations of this chip, and it may be a chip-specific problem
or an ACPI/BIOS issue.  If you don't enable Wake on Lan in BIOS, the
driver can't do it for you. Also, check how you are shutting down.

Also since the device has to restart the PHY, it could be a switch
issue if you have some fancy pants switch doing intrusion detection
or something, but I doubt that.

Is it a clean or fast shutdown? Most distributions mark network
devices as down on shutdown, but if the distribution does something 
stupid like remove the driver module, then the driver is unable to set up Wake On Lan.

The wake on lan setup is done in one place in the driver, add
a printk to see if it is ever called.





I tried ACPI wakeup with /proc/acpi/alarm (like i described in my last 
mail) and it worked... so ACPI wakeup seems to work.


i'll try to do the printk-thing when i find some time to mess around 
with the sources (maybe tomorrow). if someone has some brief 
instructions (maybe a link to a helpful site for kernel debugging) for 
me i would be thankful and could provide some more info faster.


some steps for me to identify the source of the problem (is it really 
sky2?) would be really helpful...



[PATCH] [ROSE] two extra tab characters removed

2008-01-11 Thread Bernard Pidoux F6BVP


Signed-off-by: Bernard Pidoux [EMAIL PROTECTED]
---
 include/net/rose.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/rose.h b/include/net/rose.h
index d3ab453..0cfdc0e 100644
--- a/include/net/rose.h
+++ b/include/net/rose.h
@@ -86,7 +86,7 @@ struct rose_neigh {
 	ax25_address		callsign;
 	ax25_digi		*digipeat;
 	ax25_cb			*ax25;
-	struct net_device		*dev;
+	struct net_device	*dev;
 	unsigned short		count;
 	unsigned short		use;
 	unsigned int		number;
@@ -124,7 +124,7 @@ struct rose_sock {
 	ax25_address		source_digis[ROSE_MAX_DIGIS];
 	ax25_address		dest_digis[ROSE_MAX_DIGIS];
 	struct rose_neigh	*neighbour;
-	struct net_device		*device;
+	struct net_device	*device;
 	unsigned int		lci, rand;
 	unsigned char		state, condition, qbitincl, defer;
 	unsigned char		cause, diagnostic;
--
1.5.3.7


Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module

2008-01-11 Thread supersud501



Rafael J. Wysocki wrote:

http://bugzilla.kernel.org/show_bug.cgi?id=9721


On Friday, 11 of January 2008, supersud501 wrote:

[...]


Please do the tests requested at:
http://bugzilla.kernel.org/show_bug.cgi?id=9721#c2,
thanks.



All right, I didn't see that before, sorry. Here are the results:

kernel 2.6.23.12 acpi=off: when shutting down the system doesn't 
poweroff (of course), but pressing the powerbutton does the trick. and 
wake on lan: WORKS


kernel 2.6.24-rc7 acpi=off: computer doesn't power off, either (so 
acpi=off works), but wol still DOESN'T work :(


so no acpi-problem?


Re: handing cloned frames to netif_rx()?

2008-01-11 Thread Tomas Winkler
On Jan 11, 2008 3:17 AM, Johannes Berg [EMAIL PROTECTED] wrote:
 In 802.11n, there is a case where multiple data frames are received
 aggregated into a single frame (A-MSDU).

 Currently, we copy each of these frames out into their own skb, but
 because of the alignment with that etc. I started to think that we could
 simply pass up a clone of the original skb with start/length adjusted
 properly so that it windows only the contained packet.

 The buffer would be shared but the data within the original window
 (starting with the 802.3 header) could even be written to, it won't be
 needed again by mac80211 once it's handed off to netif_rx(). The skb
 will obviously have lots of head- and tailroom but that space would be
 part of other packets.

 Is it ok to do this? Will something freak out if we pass a cloned skb to
 netif_rx()?

This would be great even in the regular case. The 4965 has the ability to
deliver more frames per receive buffer.
Because of A-MSDU we keep 8K receive buffers, which are
underutilized when A-MSDU is not used.

 johannes



[oops] 2.6.23.9

2008-01-11 Thread Andrei Popa
Linux version 2.6.23.9 vanilla

Jan  7 02:24:11 eunite BUG: scheduling while atomic:
mii-tool/0x0002/31658
Jan  7 02:24:11 eunite [c02fd920] schedule+0x27a/0x35e
Jan  7 02:24:11 eunite [c011dbd7] __mod_timer+0xac/0xbc
Jan  7 02:24:11 eunite [c02fe2bb] schedule_timeout+0x43/0x9f
Jan  7 02:24:11 eunite [c011dd92] process_timeout+0x0/0x5
Jan  7 02:24:11 eunite [c011dd5e] msleep+0xf/0x14
Jan  7 02:24:11 eunite [c02242a9] e1000_reset_hw+0x57/0x353
Jan  7 02:24:11 eunite [c0219cca] e1000_reset+0xa6/0x2ed
Jan  7 02:24:11 eunite [c0219f8d] e1000_down+0x7c/0xb1
Jan  7 02:24:11 eunite [c021a253] e1000_reinit_locked+0x37/0x76
Jan  7 02:24:11 eunite [c021a3be] e1000_ioctl+0x12c/0x280
Jan  7 02:24:11 eunite [c021a292] e1000_ioctl+0x0/0x280
Jan  7 02:24:11 eunite [c029e861] dev_ifsioc+0x2dc/0x30d
Jan  7 02:24:11 eunite [c029f408] dev_ioctl+0x1f8/0x32f
Jan  7 02:24:11 eunite [c02948a5] sock_ioctl+0x41/0x15f
Jan  7 02:24:11 eunite [c0294864] sock_ioctl+0x0/0x15f
Jan  7 02:24:11 eunite [c015d6cf] do_ioctl+0x1f/0x6d
Jan  7 02:24:11 eunite [c015d76d] vfs_ioctl+0x50/0x26e
Jan  7 02:24:11 eunite [c01f3e9a] tty_write+0x0/0x1b2
Jan  7 02:24:11 eunite [c015d9bf] sys_ioctl+0x34/0x51
Jan  7 02:24:11 eunite [c010268e] sysenter_past_esp+0x5f/0x85
Jan  7 02:24:11 eunite [c02f] fn_trie_insert+0x525/0x7f6
Jan  7 02:24:11 eunite ===

Jan  7 02:24:55 eunite BUG: scheduling while atomic:
mii-tool/0x0002/31668
Jan  7 02:24:55 eunite [c02fd920] schedule+0x27a/0x35e
Jan  7 02:24:55 eunite [c02fe2bb] schedule_timeout+0x43/0x9f
Jan  7 02:24:55 eunite [c01bc290] __delay+0x6/0x7
Jan  7 02:24:55 eunite [c021fa67] e1000_write_phy_reg_ex+0x45/0x8f
Jan  7 02:24:55 eunite [c011dd92] process_timeout+0x0/0x5
Jan  7 02:24:55 eunite [c011dd5e] msleep+0xf/0x14
Jan  7 02:24:55 eunite [c0220ade] e1000_phy_init_script+0x96/0x206
Jan  7 02:24:55 eunite [c0222097] e1000_phy_reset+0x57/0xa2
Jan  7 02:24:55 eunite [c0222348] e1000_setup_copper_link+0x266/0x12bc
Jan  7 02:24:55 eunite [c0220f89] e1000_read_eeprom+0x8c/0x2f2
Jan  7 02:24:55 eunite [c0223f97] e1000_setup_link+0x37c/0x4e5
Jan  7 02:24:55 eunite [c022489f] e1000_init_hw+0x2fa/0xb68
Jan  7 02:24:55 eunite [c0118911] do_wait+0x7d4/0xc46
Jan  7 02:24:55 eunite [c022452a] e1000_reset_hw+0x2d8/0x353
Jan  7 02:24:55 eunite [c0219cea] e1000_reset+0xc6/0x2ed
Jan  7 02:24:55 eunite [c0219f8d] e1000_down+0x7c/0xb1
Jan  7 02:24:55 eunite [c021a253] e1000_reinit_locked+0x37/0x76
Jan  7 02:24:55 eunite [c021a3be] e1000_ioctl+0x12c/0x280
Jan  7 02:24:55 eunite [c021a292] e1000_ioctl+0x0/0x280
Jan  7 02:24:55 eunite [c029e861] dev_ifsioc+0x2dc/0x30d
Jan  7 02:24:55 eunite [c029f408] dev_ioctl+0x1f8/0x32f
Jan  7 02:24:55 eunite [c02948a5] sock_ioctl+0x41/0x15f
Jan  7 02:24:55 eunite [c0294864] sock_ioctl+0x0/0x15f
Jan  7 02:24:55 eunite [c015d6cf] do_ioctl+0x1f/0x6d
Jan  7 02:24:55 eunite [c015d76d] vfs_ioctl+0x50/0x26e
Jan  7 02:24:55 eunite [c01f3e9a] tty_write+0x0/0x1b2
Jan  7 02:24:55 eunite [c015d9bf] sys_ioctl+0x34/0x51
Jan  7 02:24:55 eunite [c010268e] sysenter_past_esp+0x5f/0x85

Jan  7 02:24:55 eunite BUG: scheduling while atomic:
mii-tool/0x0002/31668
Jan  7 02:24:55 eunite [c02fd920] schedule+0x27a/0x35e
Jan  7 02:24:55 eunite [c02fe2bb] schedule_timeout+0x43/0x9f
Jan  7 02:24:55 eunite [c011dd92] process_timeout+0x0/0x5
Jan  7 02:24:55 eunite [c011dd5e] msleep+0xf/0x14
Jan  7 02:24:55 eunite [c022235c] e1000_setup_copper_link+0x27a/0x12bc
Jan  7 02:24:55 eunite [c0220f89] e1000_read_eeprom+0x8c/0x2f2
Jan  7 02:24:55 eunite [c0223f97] e1000_setup_link+0x37c/0x4e5
Jan  7 02:24:55 eunite [c022489f] e1000_init_hw+0x2fa/0xb68
Jan  7 02:24:55 eunite [c0118911] do_wait+0x7d4/0xc46
Jan  7 02:24:55 eunite [c022452a] e1000_reset_hw+0x2d8/0x353
Jan  7 02:24:55 eunite [c0219cea] e1000_reset+0xc6/0x2ed
Jan  7 02:24:55 eunite [c0219f8d] e1000_down+0x7c/0xb1
Jan  7 02:24:55 eunite [c021a253] e1000_reinit_locked+0x37/0x76
Jan  7 02:24:55 eunite [c021a3be] e1000_ioctl+0x12c/0x280
Jan  7 02:24:55 eunite [c021a292] e1000_ioctl+0x0/0x280
Jan  7 02:24:55 eunite [c029e861] dev_ifsioc+0x2dc/0x30d
Jan  7 02:24:55 eunite [c029f408] dev_ioctl+0x1f8/0x32f
Jan  7 02:24:55 eunite [c02948a5] sock_ioctl+0x41/0x15f
Jan  7 02:24:55 eunite [c0294864] sock_ioctl+0x0/0x15f
Jan  7 02:24:55 eunite [c015d6cf] do_ioctl+0x1f/0x6d
Jan  7 02:24:55 eunite [c015d76d] vfs_ioctl+0x50/0x26e
Jan  7 02:24:55 eunite [c01f3e9a] tty_write+0x0/0x1b2
Jan  7 02:24:55 eunite [c015d9bf] sys_ioctl+0x34/0x51
Jan  7 02:24:55 eunite [c010268e] sysenter_past_esp+0x5f/0x85
Jan  7 02:24:55 eunite ===




Re: handing cloned frames to netif_rx()?

2008-01-11 Thread Herbert Xu
Johannes Berg [EMAIL PROTECTED] wrote:

 Is it ok to do this? Will something freak out if we pass a cloned skb to
 netif_rx()?

Sounds OK as long as you stick to the rules of cloned skb's, e.g., not
writing to them unless you've copied it.
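
The "copy before you write" rule for shared buffers can be illustrated in plain userspace C. This is only a toy model under stated assumptions: struct shared_buf and the buf_* helpers are hypothetical names, not the kernel's struct sk_buff API. Real driver code would rely on the kernel's own helpers, such as skb_cloned() and skb_unshare().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a shared packet buffer; NOT the kernel's sk_buff. */
struct shared_buf {
    unsigned char *data;
    size_t len;
    int *users;              /* reference count shared by all clones */
};

/* Create a buffer holding a private copy of src; returns 0 on success. */
int buf_init(struct shared_buf *b, const void *src, size_t len)
{
    b->data = malloc(len);
    b->users = malloc(sizeof(*b->users));
    if (!b->data || !b->users) {
        free(b->data);
        free(b->users);
        return -1;
    }
    memcpy(b->data, src, len);
    b->len = len;
    *b->users = 1;
    return 0;
}

/* Clone: share the same data and bump the reference count. */
void buf_clone(struct shared_buf *dst, const struct shared_buf *src)
{
    *dst = *src;
    (*dst->users)++;
}

/* Before writing: take a private copy if anyone else shares the data. */
int buf_make_writable(struct shared_buf *b)
{
    struct shared_buf priv;

    assert(*b->users >= 1);
    if (*b->users == 1)
        return 0;            /* sole owner, writing in place is safe */
    if (buf_init(&priv, b->data, b->len) != 0)
        return -1;
    (*b->users)--;           /* drop our reference to the shared data */
    *b = priv;
    return 0;
}

/* Drop one reference; free the storage when the last user is gone. */
void buf_release(struct shared_buf *b)
{
    if (--(*b->users) == 0) {
        free(b->data);
        free(b->users);
    }
}
```

A writer calls buf_make_writable() first; readers of other clones then never see the modification.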

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: handing cloned frames to netif_rx()?

2008-01-11 Thread Johannes Berg

On Sat, 2008-01-12 at 09:31 +1100, Herbert Xu wrote:
 Johannes Berg [EMAIL PROTECTED] wrote:
 
  Is it ok to do this? Will something freak out if we pass a cloned skb to
  netif_rx()?
 
 Sounds OK as long as you stick to the rules of cloned skb's, e.g., not
 writing to them unless you've copied it.

Ok. Yes, we will of course adhere to that, but I was wondering whether
maybe the net stack assumes somewhere that a packet it got from the
driver can be written to w/o copying.

johannes




Re: handing cloned frames to netif_rx()?

2008-01-11 Thread Herbert Xu
On Fri, Jan 11, 2008 at 11:58:05PM +0100, Johannes Berg wrote:

 Ok. Yes, we will of course adhere to that, but I was wondering whether
 maybe the net stack assumes somewhere that a packet it got from the
 driver can be written to w/o copying.

All parts of the rx stack support clone handling because they can always
run after another handler (e.g., AF_PACKET) which may have cloned the
packet.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: rp_filter and ip rule break ipsec policy

2008-01-11 Thread Herbert Xu
Marco Berizzi [EMAIL PROTECTED] wrote:

 When I insert the rule number #601 packets to
 x.y.z.214 aren't eaten by xfrm anymore. This
 happens when rp_filter is set to 1 on eth0.
 Disabling rp_filter on eth0 resolves the problem:
 xfrm eats the packets.
 Is this the expected behaviour? Why should

Absolutely.  While on local output, IPsec lookup does override
routing lookup (however there we do the route lookup first and
use that as the key for the IPsec lookup).  On forwarding this
is not the case.  We decapsulate and check policy first (if
encrypted), and then do a route lookup, at which point rp_filter
can eat your packet, and only after that do we perform the output
IPsec lookup.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread Herbert Xu
Chris Friesen [EMAIL PROTECTED] wrote:

 I'd love to work on newer kernels, but we have a commitment to our 
 customers to support multiple releases for a significant amount of time.

Since you've made the commitment, you should stick to it and resolve
the issues without asking us to contribute.  After all we haven't made
that commitment to you or your customers.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: handing cloned frames to netif_rx()?

2008-01-11 Thread Johannes Berg

On Sat, 2008-01-12 at 10:01 +1100, Herbert Xu wrote:
 On Fri, Jan 11, 2008 at 11:58:05PM +0100, Johannes Berg wrote:
 
  Ok. Yes, we will of course adhere to that, but I was wondering whether
  maybe the net stack assumes somewhere that a packet it got from the
  driver can be written to w/o copying.
 
 All parts of the rx stack support clone handling because they can always
 run after another handler (e.g., AF_PACKET) which may have cloned the
 packet.

Great, thanks for confirming, we'll do that then.

johannes




Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module

2008-01-11 Thread Rafael J. Wysocki
On Friday, 11 of January 2008, supersud501 wrote:
 
 Rafael J. Wysocki wrote:
  http://bugzilla.kernel.org/show_bug.cgi?id=9721

 all right, didn't see that before, sorry, here are the results:
 
 kernel 2.6.23.12 acpi=off: when shutting down the system doesn't 
 poweroff (of course), but pressing the powerbutton does the trick. and 
 wake on lan: WORKS
 
 kernel 2.6.24-rc7 acpi=off: computer doesn't power off, either (so 
 acpi=off works), but wol still DOESN'T work :(
 
 so no acpi-problem?

No, I don't think it's an ACPI problem.

Since it seems to be 100% reproducible, it would be very helpful if you could
use git-bisect to identify the offending commit.


Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread David Miller
From: Benny Amorsen [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 12:09:32 +0100

 David Miller [EMAIL PROTECTED] writes:
 
  No IRQ balancing should be done at all for networking device
  interrupts, with zero exceptions.  It destroys performance.
 
 Does irqbalanced need to be taught about this?

The userland one already does.

It's only the in-kernel IRQ load balancing for these (presumably
powerpc) platforms that is broken.


Re: iproute2: removing primary address removes secondaries

2008-01-11 Thread David Miller

echo 1 > /proc/sys/net/ipv4/conf/all/promote_secondaries


Re: [PATCH 001/001] ipv4: enable use of 240/4 address space

2008-01-11 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 21:41:20 +0900 (JST)

 There is no positive consensus on this draft
 at the intarea meeting in Vancouver, right?
 
 We cannot / should not enable that space until we have reached
 a consensus on it.

This is so incredibly incorrect.

There is consensus on making network stacks able to use this
address space.  And that is all that the patch does.

The consensus is only missing on whether to make the address
space public or private.

This is also clearly spelled out in the draft.

It is important to get as large of a head start on this as
possible because of how long it takes to deploy something
like this.


Re: [PATCH 001/001] ipv4: enable use of 240/4 address space

2008-01-11 Thread David Miller
From: Vince Fuller [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 09:29:15 -0800

 I leave it up to you, the developers, to decide if you want to use these
 patches.

Vince, please just ignore these turkeys who are dismissing
your patch and respin it against current sources as I asked
of you.

I'll apply it, immediately, because it is the only correct
course of action.

Thanks a lot.


Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread David Miller
From: Chris Friesen [EMAIL PROTECTED]
Date: Fri, 11 Jan 2008 08:59:26 -0600

 I'd love to work on newer kernels, but we have a commitment to our 
 customers to support multiple releases for a significant amount of time.

And by asking here for people to dig into it for you, you are asking
people for free help providing that support.

That's why there is such negative backlash to asking questions about
such an ancient kernel here: you're asking us to do your work, for free.


Re: [PATCH 2.6.23+] ingress classify to [nf]mark

2008-01-11 Thread jamal
On Fri, 2008-11-01 at 18:42 -0200, Dzianis Kahanovich wrote:

 About the script example:
 While composing the filter, I check a flag ($TC_INDEX2MARK) that tells me
 whether the patch is applied. If not, I use the usual -j MARK --set-mark;
 otherwise I use classid to change the mark. All in ingress only. For example:
 tc filter add dev eth0 parent ffff: protocol ip u32 ... action ipt -j MARK 
 0x10
 is changed to:
 tc filter add dev eth0 parent ffff: protocol ip u32 ... flowid :10

I thought you were doing something like this (to achieve your policy):

--
major=1
minor=12
mark=`expr $major + $minor`
#
tc qdisc add dev XXX ingress
tc filter add dev XXX parent ffff: protocol ip prio 5 \
u32 blah bleh \
flowid $major:$minor action \
ipt -j mark --set-mark $mark
---

 - it use less code/modules and, in many cases, may be single/main goal to
 ingress usage - pre-marking packets.

That is true and you would also have one less line in your policy; as an
example, in the above the line "ipt -j mark --set-mark $mark" would be
unnecessary; however, all the other lines in the policy setting _will be
necessary_. And this + the fact there are many other values/shapes the
default policy could take is essentially what's bothering me. 

In any case, scanning the current code it seems mark is no longer
considered a netfilter-only metadatum - so it may not be semantically as
obscene as i felt earlier; Can you pick something simpler for policy?
example set the mark to whatever tc_index gets set?
If you still could write the metadata action, we could use it to
override mark, tc_index etc in addition.

cheers,
jamal



Re: [PATCH 001/001] ipv4: enable use of 240/4 address space

2008-01-11 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Fri, 11 Jan 2008 17:48:57 -0800 (PST)), 
David Miller [EMAIL PROTECTED] says:

 From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
 Date: Fri, 11 Jan 2008 21:41:20 +0900 (JST)
 
  There is no positive consensus on this draft
  at the intarea meeting in Vancouver, right?
  
  We cannot / should not enable that space until we have reached
  a consensus on it.
 
 This is so incredibly incorrect.
 
 There is consensus on making network stacks able to use this
 address space.  And that is all that the patch does.

No, we never reached consensus on it.

 The consensus is only missing on whether to make the address
 space public or private.
 
 This is also clearly spelled out in the draft.
 
 It is important to get as large of a head start on this as
 possible because of how long it takes to deploy something
 like this.

Okay, though I am afraid this space will not be used widely,
we should be ready for it.

I'll make some more comments on the patch itself from
another point of view.

--yoshfuji



Re: e1000 performance issue in 4 simultaneous links

2008-01-11 Thread Denys Fedoryshchenko
Sorry for interfering in this subject.

Do you recommend CONFIG_IRQBALANCE to be enabled?

If it is enabled, IRQs do not jump nonstop between processors; softirqd 
changes this behavior.
If it is disabled, IRQs are distributed over each processor, and on loaded 
systems this seems harmful. 
Yesterday I worked a little with a server with CONFIG_IRQBALANCE=no under a
160 kpps load. It was losing packets until I set smp_affinity.
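
For reference, the value written to /proc/irq/N/smp_affinity is a hexadecimal CPU bitmask. Below is a minimal sketch of computing the mask that pins an IRQ to a single CPU; the helper name is made up for illustration, and it assumes a CPU index below 32 so the mask fits in one 32-bit word.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the hex bitmask selecting exactly one CPU,
 * in the form accepted by /proc/irq/N/smp_affinity.
 * Returns the snprintf() result (number of characters needed). */
int irq_affinity_mask(unsigned int cpu, char *buf, size_t buflen)
{
    assert(cpu < 32);                 /* assumption: single 32-bit word */
    return snprintf(buf, buflen, "%x", 1u << cpu);
}
```

For example, CPU 1 yields "2" and CPU 4 yields "10"; the resulting string would then be written to the smp_affinity file of the NIC's IRQ.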

Maybe it would be useful to put more info in Kconfig, since this option is very 
important for performance.

On Fri, 11 Jan 2008 17:41:09 -0800 (PST), David Miller wrote
 From: Benny Amorsen benny [EMAIL PROTECTED]
 Date: Fri, 11 Jan 2008 12:09:32 +0100
 
  David Miller [EMAIL PROTECTED] writes:
  
   No IRQ balancing should be done at all for networking device
   interrupts, with zero exceptions.  It destroys performance.
  
  Does irqbalanced need to be taught about this?
 
 The userland one already does.
 
 It's only the in-kernel IRQ load balancing for these (presumably
 powerpc) platforms that is broken.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



Re: questions on NAPI processing latency and dropped network packets

2008-01-11 Thread Ray Lee
On Jan 10, 2008 9:24 AM, Chris Friesen [EMAIL PROTECTED] wrote:
 After a recent userspace app change, we've started seeing packets being
 dropped by the ethernet hardware (e1000, NAPI is enabled).  The
 error/dropped/fifo counts are going up in ethtool:

(These are perhaps too obvious, but I didn't see the questions or
answers in the thread.)

Can you reproduce it with a simple userspace cpu hog? (Two, really,
one per cpu.)

Can you reproduce it with the newer e1000?

Can you reproduce it with git head?

If the answer to the first one is yes, the last no, then bisect until
you get a kernel that doesn't show the problem. Backport the fix,
unless the fix happens to be CFS. However, I suspect that your
userspace app is just starving the system from time to time.


Re: [PATCH 001/001] ipv4: enable use of 240/4 address space

2008-01-11 Thread YOSHIFUJI Hideaki / 吉藤英明
Hello.

In article [EMAIL PROTECTED] (at Mon, 7 Jan 2008 17:10:57 -0800), Vince 
Fuller [EMAIL PROTECTED] says:

  #define IN_MULTICAST_NET 0xF0000000
  
 +#define IN_CLASSE(a) ((((long int) (a)) & 0xf0000000) == 0xf0000000)
 +#define  IN_CLASSE_NET   0xffffff00
 +#define  IN_CLASSE_NSHIFT 8
 +#define  IN_CLASSE_HOST  (0xffffffff & ~IN_CLASSE_NET)
 +
 +/* 
 + * these are no longer used
  #define  IN_EXPERIMENTAL(a)  ((((long int) (a)) & 0xf0000000) == 0xf0000000)
  #define  IN_BADCLASS(a)  IN_EXPERIMENTAL((a))
 +*/

Please do not remove this, but have these instead:

#define IN_EXPERIMENTAL(a)  IN_CLASSE((a))
#define IN_BADCLASS(a)  IN_CLASSE((a))

And, I think it is good to remove BADCLASS() (inside
#ifdef __KERNEL__ .. #endif) because we do not have its users
any longer, right?
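
As a userspace illustration of the suggestion above, the two legacy names can simply alias the new class E test. This is a sketch, not the kernel header: it uses uint32_t where the patch casts to long int, and the ipv4() helper is hypothetical. Addresses are in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative copies; the real definitions live in the kernel's in.h. */
#define IN_CLASSE(a)       ((((uint32_t)(a)) & 0xf0000000u) == 0xf0000000u)
#define IN_EXPERIMENTAL(a) IN_CLASSE((a))
#define IN_BADCLASS(a)     IN_CLASSE((a))

/* Hypothetical helper: build a host-byte-order address from four octets. */
uint32_t ipv4(unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
    assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
    return ((uint32_t)a << 24) | (b << 16) | (c << 8) | d;
}
```

Note that the mask test also matches 255.255.255.255, so code that treats 240/4 as usable still has to handle the limited broadcast address separately.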

Regards,

--yoshfuji


Netconf at conf.au 2008?

2008-01-11 Thread Andy Johnson
Hello,
I saw somewhere (maybe on this mailing list a while ago) that there
might be a Linux Kernel Developers' Netconf conference at conf.au
2008.

Does anyone here know if such a thing is planned?

Regards,
Andy


[PATCH 3/9] move size information to pr_debug()

2008-01-11 Thread Stephen Hemminger
The size of structures is a debug thing, not something that needs to
be part of a /proc api.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/net/ipv4/fib_trie.c   2008-01-11 22:29:20.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:30:28.0 -0800
@@ -1962,8 +1962,11 @@ struct fib_table *fib_hash_init(u32 id)
    t = (struct trie *) tb->tb_data;
    memset(t, 0, sizeof(*t));
 
-   if (id == RT_TABLE_LOCAL)
+   if (id == RT_TABLE_LOCAL) {
        printk(KERN_INFO "IPv4 FIB: Using LC-trie version %s\n", VERSION);
+       pr_debug("Basic info: size of leaf: %Zd bytes, size of tnode: %Zd bytes.\n",
+                sizeof(struct leaf), sizeof(struct tnode));
+   }
 
    return tb;
 }
@@ -2159,9 +2162,6 @@ static int fib_triestat_seq_show(struct 
    if (!stat)
        return -ENOMEM;
 
-   seq_printf(seq, "Basic info: size of leaf: %Zd bytes, size of tnode: %Zd bytes.\n",
-              sizeof(struct leaf), sizeof(struct tnode));
-
    if (trie_local) {
        seq_printf(seq, "Local:\n");
        trie_collect_stats(trie_local, stat);

-- 
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 6/9] : fib_insert_node cleanup

2008-01-11 Thread Stephen Hemminger
The only error from fib_insert_node is if memory allocation fails,
so instead of passing by reference, just use the convention of returning
NULL.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/net/ipv4/fib_trie.c   2008-01-11 22:04:08.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:04:33.0 -0800
@@ -980,8 +980,7 @@ static struct node *trie_rebalance(struc
 
 /* only used from updater-side */
 
-static  struct list_head *
-fib_insert_node(struct trie *t, int *err, u32 key, int plen)
+static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen)
 {
    int pos, newpos;
    struct tnode *tp = NULL, *tn = NULL;
@@ -1043,10 +1042,8 @@ fib_insert_node(struct trie *t, int *err
 
        li = leaf_info_new(plen);
 
-       if (!li) {
-           *err = -ENOMEM;
-           goto done;
-       }
+       if (!li)
+           return NULL;
 
        fa_head = &li->falh;
        insert_leaf_info(&l->list, li);
@@ -1054,18 +1051,15 @@ fib_insert_node(struct trie *t, int *err
    }
    l = leaf_new();
 
-   if (!l) {
-       *err = -ENOMEM;
-       goto done;
-   }
+   if (!l)
+       return NULL;
 
    l->key = key;
    li = leaf_info_new(plen);
 
    if (!li) {
        tnode_free((struct tnode *) l);
-       *err = -ENOMEM;
-       goto done;
+       return NULL;
    }
 
    fa_head = &li->falh;
@@ -1101,8 +1095,7 @@ fib_insert_node(struct trie *t, int *err
        if (!tn) {
            free_leaf_info(li);
            tnode_free((struct tnode *) l);
-           *err = -ENOMEM;
-           goto done;
+           return NULL;
        }
 
        node_set_parent((struct node *)tn, tp);
@@ -1258,10 +1251,11 @@ static int fn_trie_insert(struct fib_tab
     */
 
    if (!fa_head) {
-       err = 0;
-       fa_head = fib_insert_node(t, &err, key, plen);
-       if (err)
+       fa_head = fib_insert_node(t, key, plen);
+       if (unlikely(!fa_head)) {
+           err = -ENOMEM;
            goto out_free_new_fa;
+       }
    }
 
    list_add_tail_rcu(&new_fa->fa_list,

-- 
Stephen Hemminger [EMAIL PROTECTED]
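
The convention this patch switches to can be sketched in plain C as below.
This is an illustration only, not the fib_trie code: struct entry,
entry_new(), entry_insert() and the simulate_oom knob are all hypothetical
names. The point is that when allocation failure is the only possible error,
a NULL return carries the same information as an int *err out-parameter, and
the caller picks the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct entry {
    unsigned int key;
};

/* Returns the new entry, or NULL if allocation failed.  simulate_oom is
 * a hypothetical knob so the failure path can be exercised. */
static struct entry *entry_new(unsigned int key, int simulate_oom)
{
    struct entry *e = simulate_oom ? NULL : malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->key = key;
    return e;
}

/* The caller maps NULL to an errno-style code, much as fn_trie_insert()
 * does in the patch above. */
static int entry_insert(unsigned int key, int simulate_oom)
{
    struct entry *e = entry_new(key, simulate_oom);

    if (!e)
        return -ENOMEM;
    assert(e->key == key);
    free(e);    /* a real table would keep the entry */
    return 0;
}
```

The trade-off is that the callee can no longer report more than one kind of
failure, which is acceptable here because there is only one.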



[PATCH 0/9] FIB patches for 2.6.25

2008-01-11 Thread Stephen Hemminger
Did some work cleaning up FIB Trie today.  The only real change
is the output format for /proc/net/fib_triestat.

-- 
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 8/9] add statistics

2008-01-11 Thread Stephen Hemminger
The FIB TRIE code has a bunch of statistics, but the code is hidden
behind an ifdef that was never implemented. Since it was dead code,
it was broken as well.

This patch fixes that by making it a config option.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/Kconfig  2008-01-11 22:17:11.0 -0800
+++ b/net/ipv4/Kconfig  2008-01-11 22:31:17.0 -0800
@@ -85,6 +85,13 @@ endchoice
 config IP_FIB_HASH
    def_bool ASK_IP_FIB_HASH || !IP_ADVANCED_ROUTER
 
+config IP_FIB_TRIE_STATS
+   bool "FIB TRIE statistics"
+   depends on IP_FIB_TRIE
+   ---help---
+     Keep track of statistics on structure of FIB TRIE table.
+     Useful for testing and measuring TRIE performance.
+
 config IP_MULTIPLE_TABLES
    bool "IP: policy routing"
    depends on IP_ADVANCED_ROUTER
--- a/net/ipv4/fib_trie.c   2008-01-11 22:31:00.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:31:56.0 -0800
@@ -82,7 +82,6 @@
 #include <net/ip_fib.h>
 #include "fib_lookup.h"
 
-#undef CONFIG_IP_FIB_TRIE_STATS
 #define MAX_STAT_DEPTH 32
 
 #define KEYLENGTH (8*sizeof(t_key))
@@ -2119,20 +2118,22 @@ static void trie_show_stats(struct seq_f
    bytes += sizeof(struct node *) * pointers;
    seq_printf(seq, "Null ptrs: %u\n", stat->nullpointers);
    seq_printf(seq, "Total size: %u  kB\n", (bytes + 1023) / 1024);
+}
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
-   seq_printf(seq, "Counters:\n---------\n");
-   seq_printf(seq,"gets = %d\n", t->stats.gets);
-   seq_printf(seq,"backtracks = %d\n", t->stats.backtrack);
-   seq_printf(seq,"semantic match passed = %d\n", t->stats.semantic_match_passed);
-   seq_printf(seq,"semantic match miss = %d\n", t->stats.semantic_match_miss);
-   seq_printf(seq,"null node hit= %d\n", t->stats.null_node_hit);
-   seq_printf(seq,"skipped node resize = %d\n", t->stats.resize_node_skipped);
-#ifdef CLEAR_STATS
-   memset(&(t->stats), 0, sizeof(t->stats));
-#endif
-#endif /*  CONFIG_IP_FIB_TRIE_STATS */
+static void trie_show_usage(struct seq_file *seq,
+               const struct trie_use_stats *stats)
+{
+   seq_printf(seq, "\nCounters:\n---------\n");
+   seq_printf(seq,"gets = %u\n", stats->gets);
+   seq_printf(seq,"backtracks = %u\n", stats->backtrack);
+   seq_printf(seq,"semantic match passed = %u\n", stats->semantic_match_passed);
+   seq_printf(seq,"semantic match miss = %u\n", stats->semantic_match_miss);
+   seq_printf(seq,"null node hit= %u\n", stats->null_node_hit);
+   seq_printf(seq,"skipped node resize = %u\n\n", stats->resize_node_skipped);
 }
+#endif /*  CONFIG_IP_FIB_TRIE_STATS */
+
 
 static int fib_triestat_seq_show(struct seq_file *seq, void *v)
 {
@@ -2160,12 +2161,18 @@ static int fib_triestat_seq_show(struct 
        seq_printf(seq, "Local:\n");
        trie_collect_stats(trie_local, stat);
        trie_show_stats(seq, stat);
+#ifdef CONFIG_IP_FIB_TRIE_STATS
+       trie_show_usage(seq, &trie_local->stats);
+#endif
    }
 
    if (trie_main) {
        seq_printf(seq, "Main:\n");
        trie_collect_stats(trie_main, stat);
        trie_show_stats(seq, stat);
+#ifdef CONFIG_IP_FIB_TRIE_STATS
+       trie_show_usage(seq, &trie_main->stats);
+#endif
    }
    kfree(stat);
 

-- 
Stephen Hemminger [EMAIL PROTECTED]
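
For readers unfamiliar with the pattern, here is a small userspace sketch of
statistics that are compiled in or out by a configuration symbol. It is not
the kernel code: the #define below merely stands in for the Kconfig option,
and trie_lookup() and count_lookups() are hypothetical names. With the old
unconditional #undef, the counter code could never be built, which is how it
rotted unnoticed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Kconfig symbol; a real build would get it from the
 * kernel configuration rather than a local #define. */
#define CONFIG_IP_FIB_TRIE_STATS 1

struct trie_use_stats {
    unsigned int gets;
    unsigned int backtrack;
};

struct trie {
    unsigned int size;
#ifdef CONFIG_IP_FIB_TRIE_STATS
    struct trie_use_stats stats;    /* present only when the option is on */
#endif
};

/* Hypothetical lookup stub: only the accounting is shown. */
static void trie_lookup(struct trie *t)
{
    assert(t != NULL);
#ifdef CONFIG_IP_FIB_TRIE_STATS
    t->stats.gets++;
#endif
}

/* Run n lookups on a fresh trie and report the counter (0 if disabled). */
static unsigned int count_lookups(unsigned int n)
{
    struct trie t = { 0 };
    unsigned int i;

    for (i = 0; i < n; i++)
        trie_lookup(&t);
#ifdef CONFIG_IP_FIB_TRIE_STATS
    return t.stats.gets;
#else
    return 0;
#endif
}
```

Removing the #define drops both the field and the increments, so a build
without the option pays nothing for the counters.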



[PATCH 5/9] use %u for unsigned printfs

2008-01-11 Thread Stephen Hemminger
Use %u instead of %d when printing unsigned values.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/net/ipv4/fib_trie.c   2008-01-11 22:30:36.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:30:46.0 -0800
@@ -2100,13 +2100,13 @@ static void trie_show_stats(struct seq_f
    else
        avdepth = 0;
 
-   seq_printf(seq, "\tAver depth: %d.%02d\n", avdepth / 100, avdepth % 100 );
+   seq_printf(seq, "\tAver depth: %u.%02d\n", avdepth / 100, avdepth % 100 );
    seq_printf(seq, "\tMax depth:  %u\n", stat->maxdepth);
 
    seq_printf(seq, "\tLeaves: %u\n", stat->leaves);
 
    bytes = sizeof(struct leaf) * stat->leaves;
-   seq_printf(seq, "\tInternal nodes: %d\n\t", stat->tnodes);
+   seq_printf(seq, "\tInternal nodes: %u\n\t", stat->tnodes);
    bytes += sizeof(struct tnode) * stat->tnodes;
 
    max = MAX_STAT_DEPTH;
@@ -2116,15 +2116,15 @@ static void trie_show_stats(struct seq_f
    pointers = 0;
    for (i = 1; i <= max; i++)
        if (stat->nodesizes[i] != 0) {
-           seq_printf(seq, "  %d: %d",  i, stat->nodesizes[i]);
+           seq_printf(seq, "  %u: %u",  i, stat->nodesizes[i]);
            pointers += (1<<i) * stat->nodesizes[i];
        }
    seq_putc(seq, '\n');
-   seq_printf(seq, "\tPointers: %d\n", pointers);
+   seq_printf(seq, "\tPointers: %u\n", pointers);
 
    bytes += sizeof(struct node *) * pointers;
-   seq_printf(seq, "Null ptrs: %d\n", stat->nullpointers);
-   seq_printf(seq, "Total size: %d  kB\n", (bytes + 1023) / 1024);
+   seq_printf(seq, "Null ptrs: %u\n", stat->nullpointers);
+   seq_printf(seq, "Total size: %u  kB\n", (bytes + 1023) / 1024);
 
 #ifdef CONFIG_IP_FIB_TRIE_STATS
    seq_printf(seq, "Counters:\n---------\n");
@@ -2318,7 +2318,7 @@ static inline const char *rtn_type(unsig
 
    if (t < __RTN_MAX && rtn_type_names[t])
        return rtn_type_names[t];
-   snprintf(buf, sizeof(buf), "type %u", t);
+   snprintf(buf, sizeof(buf), "type %u", t);
    return buf;
 }
 

-- 
Stephen Hemminger [EMAIL PROTECTED]
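
Why the conversion matters can be shown with a short userspace sketch. The
helper names are hypothetical, and a 32-bit unsigned int is assumed. The
explicit (int) cast models what a %d conversion reports for the same bits
while keeping the call well defined; the result for values above INT_MAX is
implementation-defined, and wraps to a negative number on common compilers.

```c
#include <stdio.h>
#include <string.h>

/* Each helper formats into its own static buffer, so neither is
 * reentrant; that is acceptable for a demonstration. */
static const char *show_as_unsigned(unsigned int v)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), "%u", v);
    return buf;
}

/* Models a %d conversion applied to the same bit pattern. */
static const char *show_as_signed(unsigned int v)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), "%d", (int)v);
    return buf;
}

/* True when both conversions print the same text for v. */
static int same_text(unsigned int v)
{
    return strcmp(show_as_unsigned(v), show_as_signed(v)) == 0;
}
```

Small counters print identically either way; the two outputs diverge only
once a counter exceeds INT_MAX, which is exactly when a statistics file
would start showing negative values.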



[PATCH 2/9] get rid of unused revision element

2008-01-11 Thread Stephen Hemminger
The revision element must have been part of an earlier design,
because currently it is set but never used.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/net/ipv4/fib_trie.c   2008-01-11 22:18:34.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:26:34.0 -0800
@@ -153,7 +153,6 @@ struct trie {
    struct trie_use_stats stats;
 #endif
    int size;
-   unsigned int revision;
 };
 
 static void put_child(struct trie *t, struct tnode *tn, int i, struct node *n);
@@ -1046,7 +1045,7 @@ fib_insert_node(struct trie *t, int *err
 
        if (!li) {
            *err = -ENOMEM;
-           goto err;
+           goto done;
        }
 
        fa_head = &li->falh;
@@ -1058,7 +1057,7 @@ fib_insert_node(struct trie *t, int *err
 
    if (!l) {
        *err = -ENOMEM;
-       goto err;
+       goto done;
    }
 
    l->key = key;
@@ -1067,7 +1066,7 @@ fib_insert_node(struct trie *t, int *err
    if (!li) {
        tnode_free((struct tnode *) l);
        *err = -ENOMEM;
-       goto err;
+       goto done;
    }
 
    fa_head = &li->falh;
@@ -1104,7 +1103,7 @@ fib_insert_node(struct trie *t, int *err
            free_leaf_info(li);
            tnode_free((struct tnode *) l);
            *err = -ENOMEM;
-           goto err;
+           goto done;
        }
 
        node_set_parent((struct node *)tn, tp);
@@ -1130,8 +1129,6 @@ fib_insert_node(struct trie *t, int *err
 
    rcu_assign_pointer(t->trie, trie_rebalance(t, tp));
 done:
-   t->revision++;
-err:
    return fa_head;
 }
 
@@ -1543,7 +1540,6 @@ static int trie_leaf_remove(struct trie 
     * Remove the leaf and rebalance the tree
     */
 
-   t->revision++;
    t->size--;
 
    tp = node_parent(n);
@@ -1749,8 +1745,6 @@ static int fn_trie_flush(struct fib_tabl
    struct leaf *ll = NULL, *l = NULL;
    int found = 0, h;
 
-   t->revision++;
-
    for (h = 0; (l = nextleaf(t, l)) != NULL; h++) {
        found += trie_flush_leaf(t, l);
 

-- 
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 4/9] statistics improvements

2008-01-11 Thread Stephen Hemminger
Turn the unused size field into a useful counter for the number
of routes.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/net/ipv4/fib_trie.c   2008-01-11 22:30:28.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:30:36.0 -0800
@@ -149,10 +149,10 @@ struct trie_stat {
 
 struct trie {
    struct node *trie;
+   unsigned int size;
 #ifdef CONFIG_IP_FIB_TRIE_STATS
    struct trie_use_stats stats;
 #endif
-   int size;
 };
 
 static void put_child(struct trie *t, struct tnode *tn, int i, struct node *n);
@@ -1052,7 +1052,6 @@ fib_insert_node(struct trie *t, int *err
        insert_leaf_info(&l->list, li);
        goto done;
    }
-   t->size++;
    l = leaf_new();
 
    if (!l) {
@@ -1267,6 +1266,7 @@ static int fn_trie_insert(struct fib_tab
 
    list_add_tail_rcu(&new_fa->fa_list,
              (fa ? &fa->fa_list : fa_head));
+   t->size++;
 
    rt_cache_flush(-1);
    rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, tb->tb_id,

-- 
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 1/9] get rid of trie_init

2008-01-11 Thread Stephen Hemminger
trie_init is worthless; it is just zeroing stuff that is already
zero!  Move the memset() down to make it obvious.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/net/ipv4/fib_trie.c   2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:03:47.0 -0800
@@ -876,19 +876,6 @@ nomem:
    }
 }
 
-static void trie_init(struct trie *t)
-{
-   if (!t)
-       return;
-
-   t->size = 0;
-   rcu_assign_pointer(t->trie, NULL);
-   t->revision = 0;
-#ifdef CONFIG_IP_FIB_TRIE_STATS
-   memset(&t->stats, 0, sizeof(struct trie_use_stats));
-#endif
-}
-
 /* readside must use rcu_read_lock currently dump routines
   via get_fa_head and dump */
 
@@ -1977,11 +1964,9 @@ struct fib_table *fib_hash_init(u32 id)
    tb->tb_flush = fn_trie_flush;
    tb->tb_select_default = fn_trie_select_default;
    tb->tb_dump = fn_trie_dump;
-   memset(tb->tb_data, 0, sizeof(struct trie));
 
    t = (struct trie *) tb->tb_data;
-
-   trie_init(t);
+   memset(t, 0, sizeof(*t));
 
    if (id == RT_TABLE_LOCAL)
        printk(KERN_INFO "IPv4 FIB: Using LC-trie version %s\n", VERSION);

-- 
Stephen Hemminger [EMAIL PROTECTED]
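
The redundancy being removed can be checked with a small userspace sketch.
The struct below is a simplified stand-in for struct trie, and
zeroed_by_memset() is a hypothetical name. It relies on an all-bits-zero
pointer comparing equal to NULL, which ISO C does not guarantee but which
holds on the platforms Linux runs on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct trie as it looked before this series. */
struct trie {
    void *trie;
    unsigned int size;
    unsigned int revision;
};

/* Returns 1 if a single memset() leaves every field in the state that
 * the removed trie_init() used to set by hand. */
static int zeroed_by_memset(void)
{
    struct trie t;

    memset(&t, 0xff, sizeof(t));    /* start from garbage */
    assert(t.size != 0);            /* sanity: the garbage is really there */

    memset(&t, 0, sizeof(t));       /* the single call the patch keeps */
    return t.trie == NULL && t.size == 0 && t.revision == 0;
}
```

Since the memset() already covers the whole structure, a field-by-field
initializer run straight afterwards only repeats the work.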



[PATCH 9/9] fix sparse warnings

2008-01-11 Thread Stephen Hemminger
Make FIB TRIE go through sparse checker without warnings.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/net/ipv4/fib_trie.c   2008-01-11 22:35:37.0 -0800
+++ b/net/ipv4/fib_trie.c   2008-01-11 22:41:57.0 -0800
@@ -653,7 +653,6 @@ static struct node *resize(struct trie *
 
 static struct tnode *inflate(struct trie *t, struct tnode *tn)
 {
-   struct tnode *inode;
    struct tnode *oldtnode = tn;
    int olen = tnode_child_length(tn);
    int i;
@@ -701,6 +700,7 @@ static struct tnode *inflate(struct trie
    }
 
    for (i = 0; i < olen; i++) {
+       struct tnode *inode;
        struct node *node = tnode_get_child(oldtnode, i);
        struct tnode *left, *right;
        int size, j;
@@ -1037,8 +1037,7 @@ static struct list_head *fib_insert_node
    /* Case 1: n is a leaf. Compare prefixes */
 
    if (n != NULL && IS_LEAF(n) && tkey_equals(key, n->key)) {
-       struct leaf *l = (struct leaf *) n;
-
+       l = (struct leaf *) n;
        li = leaf_info_new(plen);
 
        if (!li)
@@ -2231,6 +2230,7 @@ static struct node *fib_trie_get_idx(str
 }
 
 static void *fib_trie_seq_start(struct seq_file *seq, loff_t *pos)
+   __acquires(RCU)
 {
    struct fib_trie_iter *iter = seq->private;
    struct fib_table *tb;
@@ -2273,6 +2273,7 @@ static void *fib_trie_seq_next(struct se
 }
 
 static void fib_trie_seq_stop(struct seq_file *seq, void *v)
+   __releases(RCU)
 {
    rcu_read_unlock();
 }

-- 
Stephen Hemminger [EMAIL PROTECTED]
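
The __acquires()/__releases() annotations added above only mean something to
sparse; an ordinary compiler sees nothing. The sketch below shows the idea in
userspace. The macro bodies under __CHECKER__ are written from memory of the
kernel's compiler.h of that era and should be treated as an assumption, and
read_depth, seq_start(), seq_stop() and walk_once() are hypothetical
stand-ins for the RCU read-side section and the seq_file callbacks.

```c
#include <assert.h>

#ifdef __CHECKER__
/* Seen only by sparse: the function changes the context count for x. */
# define __acquires(x)  __attribute__((context(x, 0, 1)))
# define __releases(x)  __attribute__((context(x, 1, 0)))
#else
/* A normal compiler sees empty macros, so there is no runtime cost. */
# define __acquires(x)
# define __releases(x)
#endif

static int read_depth;  /* hypothetical stand-in for the read-side section */

static void seq_start(void)
    __acquires(RCU)
{
    read_depth++;       /* plays the role of rcu_read_lock() */
}

static void seq_stop(void)
    __releases(RCU)
{
    assert(read_depth > 0);
    read_depth--;       /* plays the role of rcu_read_unlock() */
}

/* One balanced start/stop pair; returns the nesting depth afterwards. */
static int walk_once(void)
{
    seq_start();
    seq_stop();
    return read_depth;
}
```

Without the annotations, a checker that tracks lock context sees a start
function that returns with the lock held and a stop function that releases
a lock it never took, and warns about both.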



[PATCH 7/9] printk related cleanups

2008-01-11 Thread Stephen Hemminger
printk related cleanups:
 * Get rid of unused printk wrappers.
 * Make bug checks into KERN_WARNING because KERN_DEBUG gets ignored
 * Turn one cryptic old message into something real
 * Make sure all messages have KERN_XXX

---
 net/ipv4/fib_frontend.c  |    6 ++----
 net/ipv4/fib_hash.c      |    3 ++-
 net/ipv4/fib_semantics.c |    7 +++----
 3 files changed, 7 insertions(+), 9 deletions(-)

--- a/net/ipv4/fib_frontend.c   2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_frontend.c   2008-01-11 22:05:09.0 -0800
@@ -47,8 +47,6 @@
 #include <net/ip_fib.h>
 #include <net/rtnetlink.h>
 
-#define FFprint(a...) printk(KERN_DEBUG a)
-
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 
 static int __net_init fib4_rules_init(struct net *net)
@@ -706,7 +704,7 @@ void fib_add_ifaddr(struct in_ifaddr *if
    if (ifa->ifa_flags&IFA_F_SECONDARY) {
        prim = inet_ifa_byprefix(in_dev, prefix, mask);
        if (prim == NULL) {
-           printk(KERN_DEBUG "fib_add_ifaddr: bug: prim == NULL\n");
+           printk(KERN_WARNING "fib_add_ifaddr: bug: prim == NULL\n");
            return;
        }
    }
@@ -753,7 +751,7 @@ static void fib_del_ifaddr(struct in_ifa
    else {
        prim = inet_ifa_byprefix(in_dev, any, ifa->ifa_mask);
        if (prim == NULL) {
-           printk(KERN_DEBUG "fib_del_ifaddr: bug: prim == NULL\n");
+           printk(KERN_WARNING "fib_del_ifaddr: bug: prim == NULL\n");
            return;
        }
    }
--- a/net/ipv4/fib_hash.c   2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_hash.c   2008-01-11 22:04:45.0 -0800
@@ -168,7 +168,8 @@ static void fn_rehash_zone(struct fn_zon
    new_hashmask = (new_divisor - 1);
 
 #if RT_CACHE_DEBUG >= 2
-   printk("fn_rehash_zone: hash for zone %d grows from %d\n", fz->fz_order, old_divisor);
+   printk(KERN_DEBUG "fn_rehash_zone: hash for zone %d grows from %d\n",
+          fz->fz_order, old_divisor);
 #endif
 
    ht = fz_hash_alloc(new_divisor);
--- a/net/ipv4/fib_semantics.c  2008-01-11 21:56:47.0 -0800
+++ b/net/ipv4/fib_semantics.c  2008-01-11 22:04:45.0 -0800
@@ -47,8 +47,6 @@
 
 #include "fib_lookup.h"
 
-#define FSprintk(a...)
-
 static DEFINE_SPINLOCK(fib_info_lock);
 static struct hlist_head *fib_info_hash;
 static struct hlist_head *fib_info_laddrhash;
@@ -145,7 +143,7 @@ static const struct
 void free_fib_info(struct fib_info *fi)
 {
    if (fi->fib_dead == 0) {
-       printk("Freeing alive fib_info %p\n", fi);
+       printk(KERN_WARNING "Freeing alive fib_info %p\n", fi);
        return;
    }
    change_nexthops(fi) {
@@ -914,7 +912,8 @@ int fib_semantic_match(struct list_head 
                continue;
 
            default:
-               printk(KERN_DEBUG "impossible 102\n");
+               printk(KERN_WARNING "fib_semantic_match bad type %#x\n",
+                   fa->fa_type);
                return -EINVAL;
            }
        }

-- 
Stephen Hemminger [EMAIL PROTECTED]



Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers

2008-01-11 Thread Stefan Roese
On Friday 11 January 2008, Benjamin Herrenschmidt wrote:
 On Fri, 2008-01-11 at 09:48 -0800, Eugene Surovegin wrote:
  On Sat, Jan 05, 2008 at 01:38:17PM +0100, Stefan Roese wrote:
   On Saturday 05 January 2008, Benjamin Herrenschmidt wrote:
On Sat, 2008-01-05 at 10:50 +0100, Stefan Roese wrote:
 Performance tests done by AMCC have shown that 256 buffer increase
 the performance of the Linux EMAC driver. So let's update the
 default values to match this setup.

 Signed-off-by: Stefan Roese [EMAIL PROTECTED]
 ---
   
Do we have the numbers ? Did they also measure latency ?
  
   I hoped this question would not come. ;) No, unfortunately I don't have
   any numbers. Just the recommendation from AMCC to always use 256
   buffers.
 
  This cannot be true for all chips. Default numbers I selected weren't
  random. In particular, 256 for Tx doesn't make a lot of sense for 405.
  You just gonna waste memory.

This may be the case with the old 405 PPC's. But with the new ones coming 
out right now, like the up to 666MHz 405EX with GBit support, 256 could be an 
improvement. I still owe you figures though. Will try to do some testing in a 
short while.

  I'd be quite reluctant to follow such advices from AMCC without actual
  details.

 I think we can make defaults based on other config options nowadays. Not
 very nice but we could do things like

   default 128 if PPC_40x
   default 256

 Or even more detailed.

We shouldn't make it too complicated. We can always select different settings 
in the defconfig file. My thinking here is that it is better to waste a little 
memory for a potential performance improvement. Just my $0.02.

Best regards,
Stefan


Re: [PATCH] ibm_newemac: Increase number of default rx-/tx-buffers

2008-01-11 Thread Benjamin Herrenschmidt

On Sat, 2008-01-12 at 08:26 +0100, Stefan Roese wrote:
 
 We shouldn't make it too complicated. We can always select different
 settings in the defconfig file. My thinking here is that it is better to
 waste a little memory for a potential performance improvement. Just my
 $0.02.

If it gets really critical, then we can move those settings to the
device-tree.

Ben.

