Re: [PATCH -rt] Fix initialization of spinlock in irttp_dup()
* Deepak Saxena [EMAIL PROTECTED] wrote:

This was found around the 2.6.10 timeframe when testing with the -rt patch, and I believe it is still an issue. irttp_dup() does a memcpy() of the tsap_cb structure, causing the spinlock protecting various fields in the structure to be duped. This works OK in the non-RT case, but in the RT case we end up with two mutexes pointing to the same wait_list, leading to an oops. The fix is to simply initialize the spinlock after the memcpy().

Note that memcpy-based lock initialization is a problem for lockdep too.

	Ingo

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Oops in filter add
On Tue, 2007-20-03 at 11:58 +0100, Patrick McHardy wrote:

jamal wrote: So the resolution (as Dave points out) was wrong. In any case, restoring queue_lock for now would slow things but will remove the race.

Yes. I think that's what we should do for 2.6.21, since fixing this while keeping ingress_lock is quite intrusive. Reasonable. I'm on it. I'm using the opportunity to try to simplify the qdisc locking.

Ok, thanks Patrick. BTW, I was just staring at the code and I think I have found a probably long-standing minor bug on the holding of the tree lock. I will post a patch shortly if I don't get disrupted.

cheers,
jamal
[PATCH 1/1][PKT_CLS] Avoid multiple tree locks
Seems to have been around a while. IMO, material for 2.6.21 but not stable. I have only compile-tested but it looks right(tm). I could have moved the lock down, but this looked safer.

cheers,
jamal

[PKT_CLS] Avoid multiple tree locks

This fixes: when dumping filters, the tree is locked first in the main dump function and then again when looking up the qdisc.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit 4a52cdd599f259b05320219d7aba1bac58fdf6d0
tree e9e4b83f7a2925b4408e4f18211365c3f9bff3fa
parent 0a14fe6e5efd0af0f9c6c01e0433445d615d0110
author Jamal Hadi Salim [EMAIL PROTECTED] Wed, 21 Mar 2007 05:27:55 -0400
committer Jamal Hadi Salim [EMAIL PROTECTED] Wed, 21 Mar 2007 05:27:55 -0400

 include/net/pkt_sched.h |    1 +
 net/sched/cls_api.c     |    2 +-
 net/sched/sch_api.c     |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index f6afee7..dd930bd 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -212,6 +212,7 @@ extern struct Qdisc_ops bfifo_qdisc_ops;
 extern int register_qdisc(struct Qdisc_ops *qops);
 extern int unregister_qdisc(struct Qdisc_ops *qops);
+extern struct Qdisc *__qdisc_lookup(struct net_device *dev, u32 handle);
 extern struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 extern struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 extern struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 5c6ffdb..17d4d37 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -403,7 +403,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	if (!tcm->tcm_parent)
 		q = dev->qdisc_sleeping;
 	else
-		q = qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent));
+		q = __qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent));
 	if (!q)
 		goto out;
 	if ((cops = q->ops->cl_ops) == NULL)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ecc988a..1a3b65e 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -190,7 +190,7 @@ int unregister_qdisc(struct Qdisc_ops *qops)
  * (root qdisc, all its children, children of children etc.)
  */
-static struct Qdisc *__qdisc_lookup(struct net_device *dev, u32 handle)
+struct Qdisc *__qdisc_lookup(struct net_device *dev, u32 handle)
 {
 	struct Qdisc *q;
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
jamal wrote: Seems to have been around a while. IMO, material for 2.6.21 but not stable. I have only compile-tested but it looks right(tm).

It's harmless since it's a read lock, which can be nested. I actually don't see any need for qdisc_tree_lock at all; all changes and all walking are done under the RTNL, which is why I've removed it in my (upcoming) patches. I suggest leaving it as is for now so I don't need to change the __qdisc_lookup back to qdisc_lookup in 2.6.22.
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
Patrick McHardy wrote: jamal wrote: Seems to have been around a while. IMO, material for 2.6.21 but not stable. I have only compile-tested but it looks right(tm).

It's harmless since it's a read lock, which can be nested. I actually don't see any need for qdisc_tree_lock at all; all changes and all walking are done under the RTNL, which is why I've removed it in my (upcoming) patches. I suggest leaving it as is for now so I don't need to change the __qdisc_lookup back to qdisc_lookup in 2.6.22.

Alexey just explained to me why we do need qdisc_tree_lock in private mail. While dumping, only the first skb is filled under the RTNL; while filling further skbs we don't hold the RTNL anymore. So I will probably have to drop that patch.
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
On Wed, 2007-21-03 at 11:10 +0100, Patrick McHardy wrote: It's harmless since it's a read lock, which can be nested. I actually don't see any need for qdisc_tree_lock at all; all changes and all walking are done under the RTNL, which is why I've removed it in my (upcoming) patches. I suggest leaving it as is for now so I don't need to change the __qdisc_lookup back to qdisc_lookup in 2.6.22.

Sounds good to me.

cheers,
jamal
[1/1] netlink: no need to crash if table does not exist.
We would already do that on init. Some things become very confused when nl_table is not used to store netlink sockets.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

23ebdcf1f439cde050a63f33897d5b099fe08c95
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9b69d9b..071e4d7 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1330,8 +1330,6 @@ netlink_kernel_create(int unit, unsigned int groups,
 	struct netlink_sock *nlk;
 	unsigned long *listeners = NULL;
 
-	BUG_ON(!nl_table);
-
 	if (unit < 0 || unit >= MAX_LINKS)
 		return NULL;

-- 
Evgeniy Polyakov
Re: [1/1] netlink: no need to crash if table does not exist.
Evgeniy Polyakov wrote: We would already do that on init. Some things become very confused when nl_table is not used to store netlink sockets.

It's unnecessary, but I don't understand what the problem is. Why would it be NULL, and what gets confused?
Re: [1/1] netlink: no need to crash if table does not exist.
On Wed, Mar 21, 2007 at 11:54:45AM +0100, Patrick McHardy ([EMAIL PROTECTED]) wrote: It's unnecessary, but I don't understand what the problem is. Why would it be NULL, and what gets confused?

There is no problem as-is, but I am implementing a unified cache for different sockets (currently tcp/udp/raw and netlink are supported) which does not use that table, so I currently wrap all access code into special ifdefs. This one could be wrapped too, but since it is not needed, removing it saves a couple of lines of code.

-- 
Evgeniy Polyakov
Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.
(Short recap for those newly added to cc: netdev: I'm seeing an skb leak in 2.6.20 during an IrDA IrNET+ppp UDP test with periodic connection disruptions)

On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
On Tue, 20 Mar 2007, Guennadi Liakhovetski wrote:

Ok, looks like all leaked skbuffs come from ip_append_data(), like this:

(sock_alloc_send_skb+0x2c8/0x2e4)
(ip_append_data+0x7fc/0xa80)
(udp_sendmsg+0x248/0x68c)
(inet_sendmsg+0x60/0x64)
(sock_sendmsg+0xb4/0xe4) r4 = C3CB4960
(sys_sendto+0xc8/0xf0) r4 =
(sys_socketcall+0x168/0x1f0)
(ret_fast_syscall+0x0/0x2c)

This call to sock_alloc_send_skb() in ip_append_data() is not from the inlined ip_ufo_append_data(), it is here:

	/* The last fragment gets additional space at tail.
	 * Note, with MSG_MORE we overallocate on fragments,
	 * because we have no idea what fragment will be
	 * the last.
	 */
	if (datalen == length + fraggap)
		alloclen += rt->u.dst.trailer_len;

	if (transhdrlen) {
		skb = sock_alloc_send_skb(sk,
				alloclen + hh_len + 15,
				(flags & MSG_DONTWAIT), &err);
	} else {

Then, I traced a couple of paths how such a skbuff, coming down from ip_append_data() and allocated above, gets freed (when it does):

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
 r5 = C332BC00 r4 = C332BC00
[c01824f0] (kfree_skb+0x0/0x50) from [bf0fac58] (irlap_update_nr_received+0x94/0xc8 [irda])
[bf0fabc4] (irlap_update_nr_received+0x0/0xc8 [irda]) from [bf0fda98] (irlap_state_nrm_p+0x530/0x7c0 [irda])
 r7 = 0001 r6 = C0367EC0 r5 = C332BC00 r4 =
[bf0fd568] (irlap_state_nrm_p+0x0/0x7c0 [irda]) from [bf0fbd90] (irlap_do_event+0x68/0x18c [irda])
[bf0fbd28] (irlap_do_event+0x0/0x18c [irda]) from [bf1008cc] (irlap_driver_rcv+0x1f0/0xd38 [irda])
[bf1006dc] (irlap_driver_rcv+0x0/0xd38 [irda]) from [c01892c0] (netif_receive_skb+0x244/0x338)
[c018907c] (netif_receive_skb+0x0/0x338) from [c0189468] (process_backlog+0xb4/0x194)
[c01893b4] (process_backlog+0x0/0x194) from [c01895f8] (net_rx_action+0xb0/0x210)
[c0189548] (net_rx_action+0x0/0x210) from [c0042f7c] (ksoftirqd+0x108/0x1cc)
[c0042e74] (ksoftirqd+0x0/0x1cc) from [c0053614] (kthread+0x10c/0x138)
[c0053508] (kthread+0x0/0x138) from [c003f918] (do_exit+0x0/0x8b0)
 r8 = r7 = r6 = r5 = r4 =

and

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
 r5 = C03909E0 r4 = C1A97400
[c01824f0] (kfree_skb+0x0/0x50) from [c0199bf8] (pfifo_fast_enqueue+0xb4/0xd0)
[c0199b44] (pfifo_fast_enqueue+0x0/0xd0) from [c0188c30] (dev_queue_xmit+0x17c/0x25c)
 r8 = C1A2DCE0 r7 = FFF4 r6 = C3393114 r5 = C03909E0 r4 = C3393000
[c0188ab4] (dev_queue_xmit+0x0/0x25c) from [c01a7c18] (ip_output+0x150/0x254)
 r7 = C3717120 r6 = C03909E0 r5 = r4 = C1A2DCE0
[c01a7ac8] (ip_output+0x0/0x254) from [c01a93d0] (ip_push_pending_frames+0x368/0x4d4)
[c01a9068] (ip_push_pending_frames+0x0/0x4d4) from [c01c6954] (udp_push_pending_frames+0x14c/0x310)
[c01c6808] (udp_push_pending_frames+0x0/0x310) from [c01c70d8] (udp_sendmsg+0x5c0/0x690)
[c01c6b18] (udp_sendmsg+0x0/0x690) from [c01ceafc] (inet_sendmsg+0x60/0x64)
[c01cea9c] (inet_sendmsg+0x0/0x64) from [c017c970] (sock_sendmsg+0xb4/0xe4)
 r7 = C2CEFDF4 r6 = 0064 r5 = C2CEFEA8 r4 = C3C94080
[c017c8bc] (sock_sendmsg+0x0/0xe4) from [c017dd9c] (sys_sendto+0xc8/0xf0)
 r7 = 0064 r6 = C3571580 r5 = C2CEFEC4 r4 =
[c017dcd4] (sys_sendto+0x0/0xf0) from [c017e654] (sys_socketcall+0x168/0x1f0)
[c017e4ec] (sys_socketcall+0x0/0x1f0) from [c001ff40] (ret_fast_syscall+0x0/0x2c)
 r5 = 00415344 r4 =

I would be grateful for any hints how I can identify which skbuffs get lost and why, and where and who should free them. I am not subscribed to netdev, please keep me in cc.

Thanks
Guennadi

---
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28, D-52076 Aachen, Germany
Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.
On 3/21/2007, Guennadi Liakhovetski [EMAIL PROTECTED] wrote:

(Short recap for those newly added to cc: netdev: I'm seeing an skb leak in 2.6.20 during an IrDA IrNET+ppp UDP test with periodic connection disruptions) Ok, looks like all leaked skbuffs come from ip_append_data(). [...]

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
[c01824f0] (kfree_skb+0x0/0x50) from [bf0fac58] (irlap_update_nr_received+0x94/0xc8 [irda])
[... RX-path trace snipped, quoted in full above ...]

This is the IrDA RX path, so I doubt the corresponding skb ever got through ip_append_data(). The skb was allocated by your HW driver upon packet reception, then queued to the net input queue, and finally passed to the IrDA stack. Are you sure your tracing is correct?

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
[c01824f0] (kfree_skb+0x0/0x50) from [c0199bf8] (pfifo_fast_enqueue+0xb4/0xd0)
[... TX-path trace snipped, quoted in full above ...]

This one is on the TX path, yes. However, it got dropped and freed because your TX queue was full. Any idea in which situation that happens?

I would be grateful for any hints how I can identify which skbuffs get lost and why, and where and who should free them.

You're seeing skb leaks when cutting the ppp connection periodically, right? Do you see such leaks when not cutting the ppp connection? If not, could you send me a kernel trace (with irda debug set to 5) when the ppp connection is shut down? It would narrow down the problem a bit. I'm quite sure the leak is in the IrDA code rather than in the ppp or ipv4 one, hence the need for full irda debug...

Cheers,
Samuel.
Re: [PATCH 5/5] [NETLINK]: Ignore control messages directly in netlink_run_queue()
* Thomas Graf [EMAIL PROTECTED] 2007-03-21 12:45 * Patrick McHardy [EMAIL PROTECTED] 2007-03-21 05:44 This looks like it would break nfnetlink, which appears to be using 0 as smallest message type. It shouldn't do that, the first 16 message types are reserved for control messages. Alright, even though nfnetlink is wrong and buggy we can't break it at this point. Dave, please ignore this last patch for now.
Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.
On Wed, 21 Mar 2007, Samuel Ortiz wrote:

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
[c01824f0] (kfree_skb+0x0/0x50) from [bf0fac58] (irlap_update_nr_received+0x94/0xc8 [irda])
[... RX-path trace snipped, quoted in full above ...]

This is the IrDA RX path, so I doubt the corresponding skb ever got through ip_append_data(). The skb was allocated by your HW driver upon packet reception, then queued to the net input queue, and finally passed to the IrDA stack. Are you sure your tracing is correct?

I've added a bitfield to struct sk_buff:

	__u8	pkt_type:3,
		fclone:2,
-		ipvs_property:1;
+		ipvs_property:1,
+		trace_dbg:1;

and I set it in ip_append_data() before sock_alloc_send_skb() is called. Then I check this bit in __kfree_skb(). The bit is set to 0 in __alloc_skb per

	memset(skb, 0, offsetof(struct sk_buff, truesize));

So, if it was a freshly allocated skb, the tracing should be correct.

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
[c01824f0] (kfree_skb+0x0/0x50) from [c0199bf8] (pfifo_fast_enqueue+0xb4/0xd0)
[... TX-path trace snipped, quoted in full above ...]

This one is on the TX path, yes. However, it got dropped and freed because your TX queue was full. Any idea in which situation that happens?

No. I can only describe what communication is running while ppp is disrupted - it's just some sort of udp mirror test - udp packets are sent one after another and mirrored back.

You're seeing skb leaks when cutting the ppp connection periodically, right?

Right.

Do you see such leaks when not cutting the ppp connection?

Looks like I don't.

If not, could you send me a kernel trace (with irda debug set to 5) when the ppp connection is shut down? It would narrow down the problem a bit.

Attached bzipped... It's a complete log starting from irda up, running udp packets over the link, closing the link and bringing irda completely down.

I'm quite sure the leak is in the IrDA code rather than in the ppp or ipv4 one, hence the need for full irda debug...

Likely, yes. Why I am asking the netdev guys for help is just because I have very little idea about the data flow in the network stack(s). And the more experienced eyes we have on the problem, the sooner we might solve it, I hope...

Thanks
Guennadi

---
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28, D-52076 Aachen, Germany

Attachment: mpppdown.bz2 (binary data)
Re: [PATCH 5/5] [NETLINK]: Ignore control messages directly in netlink_run_queue()
Thomas Graf wrote: * Patrick McHardy [EMAIL PROTECTED] 2007-03-21 05:44 This looks like it would break nfnetlink, which appears to be using 0 as smallest message type. It shouldn't do that, the first 16 message types are reserved for control messages.

I'm afraid it does:

	enum cntl_msg_types {
		IPCTNL_MSG_CT_NEW,
		IPCTNL_MSG_CT_GET,
		IPCTNL_MSG_CT_DELETE,
		IPCTNL_MSG_CT_GET_CTRZERO,
		IPCTNL_MSG_MAX
	};

This is totally broken of course since it also uses netlink_ack(), netlink_dump() etc. :( Any smart ideas how to fix this without breaking compatibility?
Re: [PATCH 5/5] [NETLINK]: Ignore control messages directly in netlink_run_queue()
Patrick McHardy wrote: [...] This is totally broken of course since it also uses netlink_ack(), netlink_dump() etc. :( Any smart ideas how to fix this without breaking compatibility?

Seems like we're lucky, nfnetlink encodes the subsystem ID in the upper 8 bits of the message type and uses 1 as the smallest ID:

	/* netfilter netlink message types are split in two pieces:
	 * 8 bit subsystem, 8bit operation.
	 */
	#define NFNL_SUBSYS_ID(x)	((x & 0xff00) >> 8)
	#define NFNL_MSG_TYPE(x)	(x & 0x00ff)

	#define NFNL_SUBSYS_NONE		0
	#define NFNL_SUBSYS_CTNETLINK		1
	#define NFNL_SUBSYS_CTNETLINK_EXP	2
	#define NFNL_SUBSYS_QUEUE		3
	#define NFNL_SUBSYS_ULOG		4
	#define NFNL_SUBSYS_COUNT		5

So this should work fine.
Re: [PATCH 5/5] [NETLINK]: Ignore control messages directly in netlink_run_queue()
* Patrick McHardy [EMAIL PROTECTED] 2007-03-21 13:06 [...] This is totally broken of course since it also uses netlink_ack(), netlink_dump() etc. :( Any smart ideas how to fix this without breaking compatibility?

Hmm... I think nfnetlink isn't even broken:

	/* netfilter netlink message types are split in two pieces:
	 * 8 bit subsystem, 8bit operation.
	 */
	#define NFNL_SUBSYS_ID(x)	((x & 0xff00) >> 8)
	#define NFNL_MSG_TYPE(x)	(x & 0x00ff)

	/* No enum here, otherwise __stringify() trick of
	 * MODULE_ALIAS_NFNL_SUBSYS() won't work anymore */
	#define NFNL_SUBSYS_NONE		0
	#define NFNL_SUBSYS_CTNETLINK		1
	#define NFNL_SUBSYS_CTNETLINK_EXP	2
	#define NFNL_SUBSYS_QUEUE		3
	#define NFNL_SUBSYS_ULOG		4
	#define NFNL_SUBSYS_COUNT		5

A msg_type < 0x10 would just trigger an -EINVAL as no 0x0 subsystem can ever be registered.
Re: [PATCH 5/5] [NETLINK]: Ignore control messages directly in netlink_run_queue()
* Patrick McHardy [EMAIL PROTECTED] 2007-03-21 13:21 Seems like we're lucky, nfnetlink encodes the subsystem ID in the upper 8 bits of the message type and uses 1 as the smallest ID: Alright, you've been quicker :-)
[NETFILTER] nfnetlink: netlink_run_queue() already checks for NLM_F_REQUEST
Patrick has made use of netlink_run_queue() in nfnetlink while my patches have been waiting for net-2.6.22 to open. So this check for NLM_F_REQUEST can go as well.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.22/net/netfilter/nfnetlink.c
===================================================================
--- net-2.6.22.orig/net/netfilter/nfnetlink.c	2007-03-21 13:27:48.000000000 +0100
+++ net-2.6.22/net/netfilter/nfnetlink.c	2007-03-21 13:28:11.000000000 +0100
@@ -207,10 +207,6 @@ static int nfnetlink_rcv_msg(struct sk_b
 		return -1;
 	}
 
-	/* Only requests are handled by kernel now. */
-	if (!(nlh->nlmsg_flags & NLM_F_REQUEST))
-		return 0;
-
 	/* All the messages must at least contain nfgenmsg */
 	if (nlh->nlmsg_len < NLMSG_SPACE(sizeof(struct nfgenmsg)))
 		return 0;
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
Patrick McHardy wrote: It's harmless since it's a read lock, which can be nested. I actually don't see any need for qdisc_tree_lock at all; all changes and all walking are done under the RTNL, which is why I've removed it in my (upcoming) patches. I suggest leaving it as is for now so I don't need to change the __qdisc_lookup back to qdisc_lookup in 2.6.22.

Alexey just explained to me why we do need qdisc_tree_lock in private mail. While dumping, only the first skb is filled under the RTNL; while filling further skbs we don't hold the RTNL anymore. So I will probably have to drop that patch.

What we could do is replace the netlink cb_lock spinlock by a user-supplied mutex (supplied to netlink_kernel_create, rtnl_mutex in this case). That would put the entire dump under the rtnl and allow us to get rid of qdisc_tree_lock and avoid the need to take dev_base_lock during qdisc dumping. Same in other spots like rtnl_dump_ifinfo, inet_dump_ifaddr, ... What do you think?
Re: [NETFILTER] nfnetlink: netlink_run_queue() already checks for NLM_F_REQUEST
Thomas Graf wrote: Patrick has made use of netlink_run_queue() in nfnetlink while my patches have been waiting for net-2.6.22 to open. So this check for NLM_F_REQUEST can go as well. Looks good, thanks. I've added it to my queue.
Re: [PATCH 10/12] [IPv6]: Use rtnl registration interface
* YOSHIFUJI Hideaki [EMAIL PROTECTED] 2007-03-21 02:01 In article [EMAIL PROTECTED] (at Wed, 21 Mar 2007 01:06:03 +0100), Thomas Graf [EMAIL PROTECTED] says:

-static int
-inet6_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+static int nl_addr_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct ifaddrmsg *ifm;

I'd rather not favor changing function names here...

I was trying to achieve consistent naming among all message handlers. All these functions are static with a single reference.
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
Patrick McHardy wrote: Alexey just explained to me why we do need qdisc_tree_lock in private mail. While dumping only the first skb is filled under the RTNL, while filling further skbs we don't hold the RTNL anymore. So I will probably have to drop that patch. What we could do is replace the netlink cb_lock spinlock by a user-supplied mutex (supplied to netlink_kernel_create, rtnl_mutex in this case). That would put the entire dump under the rtnl and allow us to get rid of qdisc_tree_lock and avoid the need to take dev_base_lock during qdisc dumping. Same in other spots like rtnl_dump_ifinfo, inet_dump_ifaddr, ... These (compile tested) patches demonstrate the idea. The first one lets netlink_kernel_create users specify a mutex that should be held during dump callbacks, the second one uses this for rtnetlink and changes inet_dump_ifaddr for demonstration. A complete patch would allow us to simplify locking in lots of spots, all rtnetlink users currently need to implement extra locking just for the dump functions, and a number of them already get it wrong and seem to rely on the rtnl. If there are no objections to this change I'm going to update the second patch to include all rtnetlink users. [NET_SCHED]: cls_basic: fix NULL pointer dereference cls_basic doesn't allocate tp-root before it is linked into the active classifier list, resulting in a NULL pointer dereference when packets hit the classifier before its -change function is called. 
Reported by Chris Madden [EMAIL PROTECTED] Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit f1b9a0694552e18e7a43c292d21abe3b51dfcae2 tree f5ae39c1746fdc1ffbee6c1d90d035ee48ca4904 parent 0a14fe6e5efd0af0f9c6c01e0433445d615d0110 author Patrick McHardy [EMAIL PROTECTED] Tue, 20 Mar 2007 16:08:54 +0100 committer Patrick McHardy [EMAIL PROTECTED] Tue, 20 Mar 2007 16:08:54 +0100 net/sched/cls_basic.c | 16 +++- 1 files changed, 7 insertions(+), 9 deletions(-) diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c index fad08e5..70fe36e 100644 --- a/net/sched/cls_basic.c +++ b/net/sched/cls_basic.c @@ -81,6 +81,13 @@ static void basic_put(struct tcf_proto * static int basic_init(struct tcf_proto *tp) { + struct basic_head *head; + + head = kzalloc(sizeof(*head), GFP_KERNEL); + if (head == NULL) + return -ENOBUFS; + INIT_LIST_HEAD(head-flist); + tp-root = head; return 0; } @@ -176,15 +183,6 @@ static int basic_change(struct tcf_proto } err = -ENOBUFS; - if (head == NULL) { - head = kzalloc(sizeof(*head), GFP_KERNEL); - if (head == NULL) - goto errout; - - INIT_LIST_HEAD(head-flist); - tp-root = head; - } - f = kzalloc(sizeof(*f), GFP_KERNEL); if (f == NULL) goto errout; [NET_SCHED]: Fix ingress locking Ingress queueing uses a seperate lock for serializing enqueue operations, but fails to properly protect itself against concurrent changes to the qdisc tree. Use queue_lock for now since the real fix it quite intrusive. 
Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
---
commit 11985909b582dc688b5a7c0f73f16244224116f4
tree 0ee26bec34053f6c9b5f905ffbc1437881428eeb
parent f1b9a0694552e18e7a43c292d21abe3b51dfcae2
author Patrick McHardy [EMAIL PROTECTED] Tue, 20 Mar 2007 16:11:56 +0100
committer Patrick McHardy [EMAIL PROTECTED] Tue, 20 Mar 2007 16:11:56 +0100

 net/core/dev.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index cf71614..5984b55 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1750,10 +1750,10 @@ static int ing_filter(struct sk_buff *sk
 		skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_INGRESS);

-		spin_lock(&dev->ingress_lock);
+		spin_lock(&dev->queue_lock);
 		if ((q = dev->qdisc_ingress) != NULL)
 			result = q->enqueue(skb, q);
-		spin_unlock(&dev->ingress_lock);
+		spin_unlock(&dev->queue_lock);
 	}
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
Patrick McHardy wrote:
Patrick McHardy wrote: [...] If there are no objections to this change I'm going to update the second patch to include all rtnetlink users.

D'oh .. wrong patches.

[NETLINK]: Put dump callback under mutex, optionally user supplied

Replace the callback spinlock by a mutex and allow users to supply their own mutex to allow getting rid of separate locking in dump callbacks. For users that don't supply their own mutex nothing changes.
Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit c3400c45267a1fd291da75b0fe4b7970c846ff50 tree 96a4dc6050d74e72b4fffe9c047a0e695085e6db parent 2c31e4429748f2629c59379b1113931a13a0cca9 author Patrick McHardy [EMAIL PROTECTED] Wed, 21 Mar 2007 14:43:02 +0100 committer Patrick McHardy [EMAIL PROTECTED] Wed, 21 Mar 2007 14:43:02 +0100 drivers/connector/connector.c |2 +- drivers/scsi/scsi_netlink.c |3 ++- drivers/scsi/scsi_transport_iscsi.c |2 +- fs/ecryptfs/netlink.c |2 +- include/linux/netlink.h |5 - lib/kobject_uevent.c|2 +- net/bridge/netfilter/ebt_ulog.c |2 +- net/core/rtnetlink.c|2 +- net/decnet/netfilter/dn_rtmsg.c |2 +- net/ipv4/fib_frontend.c |2 +- net/ipv4/inet_diag.c|2 +- net/ipv4/netfilter/ip_queue.c |2 +- net/ipv4/netfilter/ipt_ULOG.c |2 +- net/ipv6/netfilter/ip6_queue.c |2 +- net/netfilter/nfnetlink.c |2 +- net/netlink/af_netlink.c| 30 +++--- net/netlink/genetlink.c |2 +- net/xfrm/xfrm_user.c|2 +- security/selinux/netlink.c |2 +- 19 files changed, 41 insertions(+), 29 deletions(-) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 7f9c4fb..a7b9e9b 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -448,7 +448,7 @@ static int __devinit cn_init(void) dev-nls = netlink_kernel_create(NETLINK_CONNECTOR, CN_NETLINK_USERS + 0xf, - dev-input, THIS_MODULE); + dev-input, NULL, THIS_MODULE); if (!dev-nls) return -EIO; diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c index 45646a2..4bf9aa5 100644 --- a/drivers/scsi/scsi_netlink.c +++ b/drivers/scsi/scsi_netlink.c @@ -168,7 +168,8 @@ scsi_netlink_init(void) } scsi_nl_sock = netlink_kernel_create(NETLINK_SCSITRANSPORT, -SCSI_NL_GRP_CNT, scsi_nl_rcv, THIS_MODULE); +SCSI_NL_GRP_CNT, scsi_nl_rcv, NULL, +THIS_MODULE); if (!scsi_nl_sock) { printk(KERN_ERR %s: register of recieve handler failed\n, __FUNCTION__); diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c index 10590cd..aabaa05 100644 --- 
a/drivers/scsi/scsi_transport_iscsi.c +++ b/drivers/scsi/scsi_transport_iscsi.c @@ -1435,7 +1435,7 @@ static __init int iscsi_transport_init(void) if (err) goto unregister_conn_class; - nls = netlink_kernel_create(NETLINK_ISCSI, 1, iscsi_if_rx, + nls = netlink_kernel_create(NETLINK_ISCSI, 1, iscsi_if_rx, NULL, THIS_MODULE); if (!nls) { err = -ENOBUFS; diff --git a/fs/ecryptfs/netlink.c b/fs/ecryptfs/netlink.c index 8405d21..fe91863 100644 --- a/fs/ecryptfs/netlink.c +++ b/fs/ecryptfs/netlink.c @@ -229,7 +229,7 @@ int ecryptfs_init_netlink(void) ecryptfs_nl_sock = netlink_kernel_create(NETLINK_ECRYPTFS, 0, ecryptfs_receive_nl_message, - THIS_MODULE); + NULL, THIS_MODULE); if (!ecryptfs_nl_sock) { rc = -EIO; ecryptfs_printk(KERN_ERR, Failed to create netlink socket\n); diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 0d11f6a..f41688f 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -157,7 +157,10 @@ struct netlink_skb_parms #define NETLINK_CREDS(skb)
[PATCH 2/5] netem: use better types for time values
The random number generator always generates 32 bit values. The time values are limited by psched_tdiff_t.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 net/sched/sch_netem.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

--- net-2.6.22.orig/net/sched/sch_netem.c
+++ net-2.6.22/net/sched/sch_netem.c
@@ -56,19 +56,20 @@ struct netem_sched_data {
 	struct Qdisc	*qdisc;
 	struct qdisc_watchdog watchdog;

-	u32 latency;
+	psched_tdiff_t latency;
+	psched_tdiff_t jitter;
+
 	u32 loss;
 	u32 limit;
 	u32 counter;
 	u32 gap;
-	u32 jitter;
 	u32 duplicate;
 	u32 reorder;
 	u32 corrupt;

 	struct crndstate {
-		unsigned long last;
-		unsigned long rho;
+		u32 last;
+		u32 rho;
 	} delay_cor, loss_cor, dup_cor, reorder_cor, corrupt_cor;

 	struct disttable {
@@ -95,7 +96,7 @@ static void init_crandom(struct crndstat
  * Next number depends on last value.
  * rho is scaled to avoid floating point.
  */
-static unsigned long get_crandom(struct crndstate *state)
+static u32 get_crandom(struct crndstate *state)
 {
 	u64 value, rho;
 	unsigned long answer;
@@ -114,11 +115,13 @@ static unsigned long get_crandom(struct
  * std deviation sigma.  Uses table lookup to approximate the desired
  * distribution, and a uniformly-distributed pseudo-random source.
  */
-static long tabledist(unsigned long mu, long sigma,
-		      struct crndstate *state, const struct disttable *dist)
-{
-	long t, x;
-	unsigned long rnd;
+static psched_tdiff_t tabledist(psched_tdiff_t mu, psched_tdiff_t sigma,
+				struct crndstate *state,
+				const struct disttable *dist)
+{
+	psched_tdiff_t x;
+	long t;
+	u32 rnd;

 	if (sigma == 0)
 		return mu;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] netem performance improvements
The following patches for the 2.6.22 net tree increase the performance of netem by about 2x. With 2.6.20 we were getting about 100K (out of a possible 300K) packets per second; after these patches we are now at over 200K pps.
[PATCH 5/5] qdisc: avoid transmit softirq on watchdog wakeup
If possible, avoid having to do a transmit softirq when a qdisc watchdog decides to re-enable. The watchdog routine runs off a timer, so it is already in the same effective context as the softirq.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 net/sched/sch_api.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- net-2.6.22.orig/net/sched/sch_api.c
+++ net-2.6.22/net/sched/sch_api.c
@@ -296,10 +296,16 @@ static enum hrtimer_restart qdisc_watchd
 {
 	struct qdisc_watchdog *wd = container_of(timer, struct qdisc_watchdog,
 						 timer);
+	struct net_device *dev = wd->qdisc->dev;

 	wd->qdisc->flags &= ~TCQ_F_THROTTLED;
 	smp_wmb();
-	netif_schedule(wd->qdisc->dev);
+	if (spin_trylock(&dev->queue_lock)) {
+		qdisc_run(dev);
+		spin_unlock(&dev->queue_lock);
+	} else
+		netif_schedule(dev);
+
 	return HRTIMER_NORESTART;
 }
--
[PATCH 3/5] netem: optimize tfifo
In most cases, the next packet will be sent after the last one. So optimize that case.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 net/sched/sch_netem.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

--- net-2.6.22.orig/net/sched/sch_netem.c
+++ net-2.6.22/net/sched/sch_netem.c
@@ -478,22 +478,28 @@ static int netem_change(struct Qdisc *sc
 */
 struct fifo_sched_data {
 	u32 limit;
+	psched_time_t oldest;
 };

 static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 {
 	struct fifo_sched_data *q = qdisc_priv(sch);
 	struct sk_buff_head *list = &sch->q;
-	const struct netem_skb_cb *ncb
-		= (const struct netem_skb_cb *)nskb->cb;
+	psched_time_t tnext = ((struct netem_skb_cb *)nskb->cb)->time_to_send;
 	struct sk_buff *skb;

 	if (likely(skb_queue_len(list) < q->limit)) {
+		/* Optimize for add at tail */
+		if (likely(skb_queue_empty(list) || !PSCHED_TLESS(tnext, q->oldest))) {
+			q->oldest = tnext;
+			return qdisc_enqueue_tail(nskb, sch);
+		}
+
 		skb_queue_reverse_walk(list, skb) {
 			const struct netem_skb_cb *cb
 				= (const struct netem_skb_cb *)skb->cb;

-			if (!PSCHED_TLESS(ncb->time_to_send, cb->time_to_send))
+			if (!PSCHED_TLESS(tnext, cb->time_to_send))
 				break;
 		}

@@ -506,7 +512,7 @@ static int tfifo_enqueue(struct sk_buff
 		return NET_XMIT_SUCCESS;
 	}

-	return qdisc_drop(nskb, sch);
+	return qdisc_reshape_fail(nskb, sch);
 }

 static int tfifo_init(struct Qdisc *sch, struct rtattr *opt)
@@ -522,6 +528,7 @@ static int tfifo_init(struct Qdisc *sch,
 	} else
 		q->limit = max_t(u32, sch->dev->tx_queue_len, 1);

+	PSCHED_SET_PASTPERFECT(q->oldest);
 	return 0;
 }
--
[PATCH 1/5] netem: report reorder percent correctly.
If you set up netem to just delay packets, tc qdisc ls will report the reordering as 100%. Well, it's a lie: reorder isn't used unless gap is set, so just set the value to 0 so the output of the utility is correct.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 net/sched/sch_netem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- net-2.6.22.orig/net/sched/sch_netem.c
+++ net-2.6.22/net/sched/sch_netem.c
@@ -428,7 +428,8 @@ static int netem_change(struct Qdisc *sc
 		/* for compatiablity with earlier versions.
 		 * if gap is set, need to assume 100% probablity
 		 */
-		q->reorder = ~0;
+		if (q->gap)
+			q->reorder = ~0;

 	/* Handle nested options after initial queue options.
 	 * Should have put all options in nested format but too late now.
--
[PATCH 4/5] netem: avoid excessive requeues
The netem code would call getnstimeofday() and dequeue/requeue after every packet, even if it was waiting. Avoid this overhead by using the throttled flag.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 net/sched/sch_api.c   |  3 +++
 net/sched/sch_netem.c | 21 ++++++++++-----------
 2 files changed, 15 insertions(+), 9 deletions(-)

--- net-2.6.22.orig/net/sched/sch_api.c
+++ net-2.6.22/net/sched/sch_api.c
@@ -298,6 +298,7 @@ static enum hrtimer_restart qdisc_watchd
 					 timer);

 	wd->qdisc->flags &= ~TCQ_F_THROTTLED;
+	smp_wmb();
 	netif_schedule(wd->qdisc->dev);
 	return HRTIMER_NORESTART;
 }
@@ -315,6 +316,7 @@ void qdisc_watchdog_schedule(struct qdis
 	ktime_t time;

 	wd->qdisc->flags |= TCQ_F_THROTTLED;
+	smp_wmb();
 	time = ktime_set(0, 0);
 	time = ktime_add_ns(time, PSCHED_US2NS(expires));
 	hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS);
@@ -325,6 +327,7 @@ void qdisc_watchdog_cancel(struct qdisc_
 {
 	hrtimer_cancel(&wd->timer);
 	wd->qdisc->flags &= ~TCQ_F_THROTTLED;
+	smp_wmb();
 }
 EXPORT_SYMBOL(qdisc_watchdog_cancel);

--- net-2.6.22.orig/net/sched/sch_netem.c
+++ net-2.6.22/net/sched/sch_netem.c
@@ -272,6 +272,10 @@ static struct sk_buff *netem_dequeue(str
 	struct netem_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;

+	smp_mb();
+	if (sch->flags & TCQ_F_THROTTLED)
+		return NULL;
+
 	skb = q->qdisc->dequeue(q->qdisc);
 	if (skb) {
 		const struct netem_skb_cb *cb
@@ -284,18 +288,17 @@ static struct sk_buff *netem_dequeue(str
 		if (PSCHED_TLESS(cb->time_to_send, now)) {
 			pr_debug("netem_dequeue: return skb=%p\n", skb);
 			sch->q.qlen--;
-			sch->flags &= ~TCQ_F_THROTTLED;
 			return skb;
-		} else {
-			qdisc_watchdog_schedule(&q->watchdog, cb->time_to_send);
+		}

-			if (q->qdisc->ops->requeue(skb, q->qdisc) != NET_XMIT_SUCCESS) {
-				qdisc_tree_decrease_qlen(q->qdisc, 1);
-				sch->qstats.drops++;
-				printk(KERN_ERR "netem: queue discpline %s could not requeue\n",
-				       q->qdisc->ops->id);
-			}
+		if (unlikely(q->qdisc->ops->requeue(skb, q->qdisc) != NET_XMIT_SUCCESS)) {
+			qdisc_tree_decrease_qlen(q->qdisc, 1);
+			sch->qstats.drops++;
+			printk(KERN_ERR "netem: %s could not requeue\n",
+			       q->qdisc->ops->id);
 		}
+
+		qdisc_watchdog_schedule(&q->watchdog, cb->time_to_send);
 	}

 	return NULL;
--
iproute2-2.6.20-070313 bug ?
Possibly I discovered a bug, but maybe it is specific to my setup. In your sources (tc/tc_core.h) I notice #define TIME_UNITS_PER_SEC 1000000. When I change it to #define TIME_UNITS_PER_SEC 1000000.0 (the value it had before in the sources), everything works fine. Otherwise tbf is not working at all; it drops all packets. Did anyone test the new iproute2 with tbf? -- Virtual ISP S.A.L.
[RESEND 0/4] was: [PATCH 0/3] myri10ge updates for 2.6.21
Brice Goglin wrote: Hi Jeff, Here are 3 minor updates for myri10ge in 2.6.21:
1. use regular firmware on Serverworks HT2100
2. update wcfifo and intr_coal_delay default values
3. update driver version to 1.3.0-1.225
Please apply. Thanks, Brice

I just got a last minute fix (management of allocated pages was wrong on architectures with page size != 4kB). Please drop this series, I am going to resend all the patches. Thanks, Brice
[PATCH 1/4] myri10ge: Serverworks HT2100 provides aligned PCIe completion
Use the regular firmware on Serverworks HT2100 PCIe ports since this chipset provides aligned PCIe completion.

Signed-off-by: Brice Goglin [EMAIL PROTECTED]
---
 drivers/net/myri10ge/myri10ge.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c	2007-03-18 21:01:42.0 +0100
+++ linux-rc/drivers/net/myri10ge/myri10ge.c	2007-03-18 21:14:12.0 +0100
@@ -2483,6 +2483,8 @@
 #define PCI_DEVICE_ID_INTEL_E5000_PCIE23 0x25f7
 #define PCI_DEVICE_ID_INTEL_E5000_PCIE47 0x25fa
+#define PCI_DEVICE_ID_SERVERWORKS_HT2100_PCIE_FIRST 0x140
+#define PCI_DEVICE_ID_SERVERWORKS_HT2100_PCIE_LAST 0x142

 static void myri10ge_select_firmware(struct myri10ge_priv *mgp)
 {
@@ -2514,6 +2516,12 @@
 	    ((bridge->vendor == PCI_VENDOR_ID_SERVERWORKS
	      && bridge->device == PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE)
+	     /* ServerWorks HT2100 */
+	     || (bridge->vendor == PCI_VENDOR_ID_SERVERWORKS
+		 && bridge->device >=
+		    PCI_DEVICE_ID_SERVERWORKS_HT2100_PCIE_FIRST
+		 && bridge->device <=
+		    PCI_DEVICE_ID_SERVERWORKS_HT2100_PCIE_LAST)
 	     /* All Intel E5000 PCIE ports */
 	     || (bridge->vendor == PCI_VENDOR_ID_INTEL
		 && bridge->device >=
[PATCH 2/4] myri10ge: update wcfifo and intr_coal_delay default values
Update the default value of 2 module parameters:
 * wcfifo disabled
 * intr_coal_delay 75us

Signed-off-by: Brice Goglin [EMAIL PROTECTED]
---
 drivers/net/myri10ge/myri10ge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c	2007-03-18 21:14:12.0 +0100
+++ linux-rc/drivers/net/myri10ge/myri10ge.c	2007-03-18 21:14:21.0 +0100
@@ -234,7 +234,7 @@
 module_param(myri10ge_msi, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(myri10ge_msi, "Enable Message Signalled Interrupts\n");

-static int myri10ge_intr_coal_delay = 25;
+static int myri10ge_intr_coal_delay = 75;
 module_param(myri10ge_intr_coal_delay, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_intr_coal_delay, "Interrupt coalescing delay\n");

@@ -279,7 +279,7 @@
 module_param(myri10ge_fill_thresh, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(myri10ge_fill_thresh, "Number of empty rx slots allowed\n");

-static int myri10ge_wcfifo = 1;
+static int myri10ge_wcfifo = 0;
 module_param(myri10ge_wcfifo, int, S_IRUGO);
 MODULE_PARM_DESC(myri10ge_wcfifo, "Enable WC Fifo when WC is enabled\n");
[PATCH 4/4] myri10ge: update driver version to 1.3.0-1.226
Driver version is now 1.3.0-1.226.

Signed-off-by: Brice Goglin [EMAIL PROTECTED]
---
 drivers/net/myri10ge/myri10ge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c	2007-03-18 21:14:21.0 +0100
+++ linux-rc/drivers/net/myri10ge/myri10ge.c	2007-03-18 21:14:23.0 +0100
@@ -71,7 +71,7 @@
 #include "myri10ge_mcp.h"
 #include "myri10ge_mcp_gen_header.h"

-#define MYRI10GE_VERSION_STR "1.2.0"
+#define MYRI10GE_VERSION_STR "1.3.0-1.226"

 MODULE_DESCRIPTION("Myricom 10G driver (10GbE)");
 MODULE_AUTHOR("Maintainer: [EMAIL PROTECTED]");
[PATCH] netxen: enum and #define cleanups
This patch cleans up some rather generically named items in the netxen driver. It seems bad to use names like USER_START and FLASH_TOTAL_SIZE, so I added a NETXEN_ to the front of them. This has been compile tested. Signed-off-by: Andy Gospodarek [EMAIL PROTECTED] --- netxen_nic.h | 51 ++- netxen_nic_ethtool.c |8 netxen_nic_hw.c | 10 +- netxen_nic_init.c| 23 --- 4 files changed, 47 insertions(+), 45 deletions(-) diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h index dd8ce35..8310584 100644 --- a/drivers/net/netxen/netxen_nic.h +++ b/drivers/net/netxen/netxen_nic.h @@ -65,12 +65,13 @@ #define _NETXEN_NIC_LINUX_MAJOR 3 #define _NETXEN_NIC_LINUX_MINOR 3 -#define _NETXEN_NIC_LINUX_SUBVERSION 3 -#define NETXEN_NIC_LINUX_VERSIONID 3.3.3 +#define _NETXEN_NIC_LINUX_SUBVERSION 4 +#define NETXEN_NIC_LINUX_VERSIONID 3.3.4 -#define NUM_FLASH_SECTORS (64) -#define FLASH_SECTOR_SIZE (64 * 1024) -#define FLASH_TOTAL_SIZE (NUM_FLASH_SECTORS * FLASH_SECTOR_SIZE) +#define NETXEN_NUM_FLASH_SECTORS (64) +#define NETXEN_FLASH_SECTOR_SIZE (64 * 1024) +#define NETXEN_FLASH_TOTAL_SIZE (NETXEN_NUM_FLASH_SECTORS \ + * NETXEN_FLASH_SECTOR_SIZE) #define PHAN_VENDOR_ID 0x4040 @@ -671,28 +672,28 @@ struct netxen_new_user_info { /* Flash memory map */ typedef enum { - CRBINIT_START = 0, /* Crbinit section */ - BRDCFG_START = 0x4000, /* board config */ - INITCODE_START = 0x6000,/* pegtune code */ - BOOTLD_START = 0x1, /* bootld */ - IMAGE_START = 0x43000, /* compressed image */ - SECONDARY_START = 0x20, /* backup images */ - PXE_START = 0x3E, /* user defined region */ - USER_START = 0x3E8000, /* User defined region for new boards */ - FIXED_START = 0x3F /* backup of crbinit */ + NETXEN_CRBINIT_START = 0, /* Crbinit section */ + NETXEN_BRDCFG_START = 0x4000, /* board config */ + NETXEN_INITCODE_START = 0x6000, /* pegtune code */ + NETXEN_BOOTLD_START = 0x1, /* bootld */ + NETXEN_IMAGE_START = 0x43000, /* compressed image */ + NETXEN_SECONDARY_START = 0x20, /* backup 
images */ + NETXEN_PXE_START = 0x3E,/* user defined region */ + NETXEN_USER_START = 0x3E8000, /* User defined region for new boards */ + NETXEN_FIXED_START = 0x3F /* backup of crbinit */ } netxen_flash_map_t; -#define USER_START_OLD PXE_START /* for backward compatibility */ - -#define FLASH_START(CRBINIT_START) -#define INIT_SECTOR(0) -#define PRIMARY_START (BOOTLD_START) -#define FLASH_CRBINIT_SIZE (0x4000) -#define FLASH_BRDCFG_SIZE (sizeof(struct netxen_board_info)) -#define FLASH_USER_SIZE(sizeof(struct netxen_user_info)/sizeof(u32)) -#define FLASH_SECONDARY_SIZE (USER_START-SECONDARY_START) -#define NUM_PRIMARY_SECTORS(0x20) -#define NUM_CONFIG_SECTORS (1) +#define NETXEN_USER_START_OLD NETXEN_PXE_START /* for backward compatibility */ + +#define NETXEN_FLASH_START (NETXEN_CRBINIT_START) +#define NETXEN_INIT_SECTOR (0) +#define NETXEN_PRIMARY_START (NETXEN_BOOTLD_START) +#define NETXEN_FLASH_CRBINIT_SIZE (0x4000) +#define NETXEN_FLASH_BRDCFG_SIZE (sizeof(struct netxen_board_info)) +#define NETXEN_FLASH_USER_SIZE (sizeof(struct netxen_user_info)/sizeof(u32)) +#define NETXEN_FLASH_SECONDARY_SIZE (NETXEN_USER_START-NETXEN_SECONDARY_START) +#define NETXEN_NUM_PRIMARY_SECTORS (0x20) +#define NETXEN_NUM_CONFIG_SECTORS (1) #define PFX NetXen: extern char netxen_nic_driver_name[]; diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c index ee1b5a2..4dfa76b 100644 --- a/drivers/net/netxen/netxen_nic_ethtool.c +++ b/drivers/net/netxen/netxen_nic_ethtool.c @@ -94,7 +94,7 @@ static const char netxen_nic_gstrings_test[][ETH_GSTRING_LEN] = { static int netxen_nic_get_eeprom_len(struct net_device *dev) { - return FLASH_TOTAL_SIZE; + return NETXEN_FLASH_TOTAL_SIZE; } static void @@ -475,7 +475,7 @@ netxen_nic_set_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, return 0; } - if (offset == BOOTLD_START) { + if (offset == NETXEN_BOOTLD_START) { ret = netxen_flash_erase_primary(adapter); if (ret != FLASH_SUCCESS) { 
printk(KERN_ERR %s: Flash erase failed.\n, @@ -483,10 +483,10 @@ netxen_nic_set_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, return ret; } - ret = netxen_rom_se(adapter, USER_START); + ret = netxen_rom_se(adapter, NETXEN_USER_START); if (ret != FLASH_SUCCESS)
Re: many sockets, slow sendto
Eric Dumazet a écrit : Currently, udp_hash[UDP_HTABLE_SIZE] is using a hash function based on dport number only. In your case, as you use a single port value, all sockets are in a single slot of this hash table : To find the good socket, __udp4_lib_lookup() has to search in a list with thousands of elements. Not that good, isnt it ? :( In case you want to try, here is a patch that could help you :) [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo Some people want to have many UDP sockets, binded to a single port but many different addresses. We currently hash all those sockets into a single chain. Processing of incoming packets is very expensive, because the whole chain must be examined to find the best match. I chose in this patch to hash UDP sockets with a hash function that take into account both their port number and address : This has a drawback because we need two lookups : one with a given address, one with a wildcard (null) address. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 71b0b60..27437e7 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -114,14 +114,33 @@ DEFINE_RWLOCK(udp_hash_lock); static int udp_port_rover; -static inline int __udp_lib_lport_inuse(__u16 num, struct hlist_head udptable[]) +/* + * Note about this hash function : + * Typical use is probably daddr = 0, only dport is going to vary hash + */ +static inline unsigned int hash_port_and_addr(__u16 port, __be32 addr) +{ + addr ^= addr 16; + addr ^= addr 8; + return port ^ addr; +} + +static inline int __udp_lib_port_inuse(unsigned int hash, int port, + __be32 daddr, struct hlist_head udptable[]) { struct sock *sk; struct hlist_node *node; + struct inet_sock *inet; - sk_for_each(sk, node, udptable[num (UDP_HTABLE_SIZE - 1)]) - if (sk-sk_hash == num) + sk_for_each(sk, node, udptable[hash (UDP_HTABLE_SIZE - 1)]) { + if (sk-sk_hash != hash) + continue; + inet = inet_sk(sk); + if (inet-num != port) + continue; + if (inet-rcv_saddr == 
daddr) return 1; + } return 0; } @@ -142,6 +161,7 @@ int __udp_lib_get_port(struct sock *sk, struct hlist_node *node; struct hlist_head *head; struct sock *sk2; + unsigned int hash; interror = 1; write_lock_bh(udp_hash_lock); @@ -156,7 +176,9 @@ int __udp_lib_get_port(struct sock *sk, for (i = 0; i UDP_HTABLE_SIZE; i++, result++) { int size; - head = udptable[result (UDP_HTABLE_SIZE - 1)]; + hash = hash_port_and_addr(result, + inet_sk(sk)-rcv_saddr); + head = udptable[hash (UDP_HTABLE_SIZE - 1)]; if (hlist_empty(head)) { if (result sysctl_local_port_range[1]) result = sysctl_local_port_range[0] + @@ -181,7 +203,10 @@ int __udp_lib_get_port(struct sock *sk, result = sysctl_local_port_range[0] + ((result - sysctl_local_port_range[0]) (UDP_HTABLE_SIZE - 1)); - if (! __udp_lib_lport_inuse(result, udptable)) + hash = hash_port_and_addr(result, + inet_sk(sk)-rcv_saddr); + if (! __udp_lib_port_inuse(hash, result, + inet_sk(sk)-rcv_saddr, udptable)) break; } if (i = (1 16) / UDP_HTABLE_SIZE) @@ -189,11 +214,13 @@ int __udp_lib_get_port(struct sock *sk, gotit: *port_rover = snum = result; } else { - head = udptable[snum (UDP_HTABLE_SIZE - 1)]; + hash = hash_port_and_addr(snum, inet_sk(sk)-rcv_saddr); + head = udptable[hash (UDP_HTABLE_SIZE - 1)]; sk_for_each(sk2, node, head) - if (sk2-sk_hash == snum + if (sk2-sk_hash == hash sk2 != sk + inet_sk(sk2)-num == snum (!sk2-sk_reuse|| !sk-sk_reuse) (!sk2-sk_bound_dev_if || !sk-sk_bound_dev_if || sk2-sk_bound_dev_if == sk-sk_bound_dev_if) @@ -201,9 +228,9 @@ gotit: goto fail; } inet_sk(sk)-num = snum; - sk-sk_hash = snum; + sk-sk_hash = hash; if (sk_unhashed(sk)) { - head =
Re: many sockets, slow sendto
Zacco wrote:
Actually, the source address would be more important in my case, as my clients (each with a different IP address) want to connect to the same server, i.e. to the same address and port.

I don't understand why you need many sockets then. A single socket should be enough.

I think the current design is fair enough for server implementations and for regular clients. But even though my application is not typical, as far as I know (and it can become important with the fast performance growth of regular PCs), the make-up should be general enough to cope with special circumstances like mine. My initial idea was to somehow include the complete socket pair, i.e. source address:port and destination address:port, keeping in mind that it should work for both IPv4 and IPv6. Maybe it's overkill, I don't know.

Could you send me a copy of your application source, or detailed specs, because I am confused right now...
[PATCH 0/3] [PATCHSET] netlink error management
This series of patches simplifies the error management of netlink_run_queue() message handlers and the way they signal that a dump has started. It touches a fair bit of nfnetlink code, as the error pointer has been passed on to subsystems.
[PATCH 1/3] [NETLINK]: Remove error pointer from netlink message handler
The error pointer argument in netlink message handlers is used to signal the special case where processing has to be interrupted because a dump was started but no error happened. Instead it is simpler and more clear to return -EINTR and have netlink_run_queue() deal with getting the queue right. nfnetlink passed on this error pointer to its subsystem handlers but only uses it to signal the start of a netlink dump. Therefore it can be removed there as well. This patch also cleans up the error handling in the affected message handlers to be consistent since it had to be touched anyway. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.22/net/core/rtnetlink.c === --- net-2.6.22.orig/net/core/rtnetlink.c2007-03-21 15:36:28.0 +0100 +++ net-2.6.22/net/core/rtnetlink.c 2007-03-21 18:38:32.0 +0100 @@ -851,8 +851,7 @@ static int rtattr_max; /* Process one rtnetlink message. */ -static __inline__ int -rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *errp) +static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) { rtnl_doit_func doit; int sz_idx, kind; @@ -862,10 +861,8 @@ rtnetlink_rcv_msg(struct sk_buff *skb, s int err; type = nlh-nlmsg_type; - - /* Unknown message: reply with EINVAL */ if (type RTM_MAX) - goto err_inval; + return -EINVAL; type -= RTM_BASE; @@ -874,40 +871,33 @@ rtnetlink_rcv_msg(struct sk_buff *skb, s return 0; family = ((struct rtgenmsg*)NLMSG_DATA(nlh))-rtgen_family; - if (family = NPROTO) { - *errp = -EAFNOSUPPORT; - return -1; - } + if (family = NPROTO) + return -EAFNOSUPPORT; sz_idx = type2; kind = type3; - if (kind != 2 security_netlink_recv(skb, CAP_NET_ADMIN)) { - *errp = -EPERM; - return -1; - } + if (kind != 2 security_netlink_recv(skb, CAP_NET_ADMIN)) + return -EPERM; if (kind == 2 nlh-nlmsg_flagsNLM_F_DUMP) { rtnl_dumpit_func dumpit; dumpit = rtnl_get_dumpit(family, type); if (dumpit == NULL) - goto err_inval; + return -EINVAL; - if ((*errp = netlink_dump_start(rtnl, skb, nlh, - dumpit, NULL)) != 0) { 
- return -1; - } - - netlink_queue_skip(nlh, skb); - return -1; + err = netlink_dump_start(rtnl, skb, nlh, dumpit, NULL); + if (err == 0) + err = -EINTR; + return err; } memset(rta_buf, 0, (rtattr_max * sizeof(struct rtattr *))); min_len = rtm_min[sz_idx]; if (nlh-nlmsg_len min_len) - goto err_inval; + return -EINVAL; if (nlh-nlmsg_len min_len) { int attrlen = nlh-nlmsg_len - NLMSG_ALIGN(min_len); @@ -917,7 +907,7 @@ rtnetlink_rcv_msg(struct sk_buff *skb, s unsigned flavor = attr-rta_type; if (flavor) { if (flavor rta_max[sz_idx]) - goto err_inval; + return -EINVAL; rta_buf[flavor-1] = attr; } attr = RTA_NEXT(attr, attrlen); @@ -926,15 +916,9 @@ rtnetlink_rcv_msg(struct sk_buff *skb, s doit = rtnl_get_doit(family, type); if (doit == NULL) - goto err_inval; - err = doit(skb, nlh, (void *)rta_buf[0]); - - *errp = err; - return err; + return -EINVAL; -err_inval: - *errp = -EINVAL; - return -1; + return doit(skb, nlh, (void *)rta_buf[0]); } static void rtnetlink_rcv(struct sock *sk, int len) Index: net-2.6.22/net/netlink/genetlink.c === --- net-2.6.22.orig/net/netlink/genetlink.c 2007-03-21 15:42:18.0 +0100 +++ net-2.6.22/net/netlink/genetlink.c 2007-03-21 18:38:32.0 +0100 @@ -295,60 +295,49 @@ int genl_unregister_family(struct genl_f return -ENOENT; } -static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, - int *errp) +static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) { struct genl_ops *ops; struct genl_family *family; struct genl_info info; struct genlmsghdr *hdr = nlmsg_data(nlh); - int hdrlen, err = -EINVAL; + int hdrlen, err; family = genl_family_find_byid(nlh-nlmsg_type); - if (family == NULL) { - err = -ENOENT; - goto errout; - } + if
[PATCH 2/3] [IPv4] diag: Use netlink_run_queue() to process the receive queue
Makes use of netlink_run_queue() to process the receive queue and converts inet_diag_rcv_msg() to use the type safe netlink interface. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.22/net/ipv4/inet_diag.c === --- net-2.6.22.orig/net/ipv4/inet_diag.c2007-03-21 18:40:29.0 +0100 +++ net-2.6.22/net/ipv4/inet_diag.c 2007-03-22 00:08:05.0 +0100 @@ -806,68 +806,48 @@ done: return skb-len; } -static inline int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) +static int inet_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) { - if (!(nlh-nlmsg_flagsNLM_F_REQUEST)) - return 0; + int hdrlen = sizeof(struct inet_diag_req); - if (nlh-nlmsg_type = INET_DIAG_GETSOCK_MAX) - goto err_inval; + if (nlh-nlmsg_type = INET_DIAG_GETSOCK_MAX || + nlmsg_len(nlh) hdrlen) + return -EINVAL; if (inet_diag_table[nlh-nlmsg_type] == NULL) return -ENOENT; - if (NLMSG_LENGTH(sizeof(struct inet_diag_req)) skb-len) - goto err_inval; - - if (nlh-nlmsg_flagsNLM_F_DUMP) { - if (nlh-nlmsg_len - (4 + NLMSG_SPACE(sizeof(struct inet_diag_req { - struct rtattr *rta = (void *)(NLMSG_DATA(nlh) + -sizeof(struct inet_diag_req)); - if (rta-rta_type != INET_DIAG_REQ_BYTECODE || - rta-rta_len 8 || - rta-rta_len - (nlh-nlmsg_len - -NLMSG_SPACE(sizeof(struct inet_diag_req - goto err_inval; - if (inet_diag_bc_audit(RTA_DATA(rta), RTA_PAYLOAD(rta))) - goto err_inval; - } - return netlink_dump_start(idiagnl, skb, nlh, - inet_diag_dump, NULL); - } else - return inet_diag_get_exact(skb, nlh); - -err_inval: - return -EINVAL; -} + if (nlh-nlmsg_flags NLM_F_DUMP) { + int err; + if (nlmsg_attrlen(nlh, hdrlen)) { + struct nlattr *attr; -static inline void inet_diag_rcv_skb(struct sk_buff *skb) -{ - if (skb-len = NLMSG_SPACE(0)) { - int err; - struct nlmsghdr *nlh = nlmsg_hdr(skb); + attr = nlmsg_find_attr(nlh, hdrlen, + INET_DIAG_REQ_BYTECODE); + if (attr == NULL || + nla_len(attr) sizeof(struct inet_diag_bc_op) || + inet_diag_bc_audit(nla_data(attr), nla_len(attr))) + return -EINVAL; + } - if 
(nlh-nlmsg_len sizeof(*nlh) || - skb-len nlh-nlmsg_len) - return; - err = inet_diag_rcv_msg(skb, nlh); - if (err || nlh-nlmsg_flags NLM_F_ACK) - netlink_ack(skb, nlh, err); + err = netlink_dump_start(idiagnl, skb, nlh, +inet_diag_dump, NULL); + if (err == 0) + err = -EINTR; + return err; } + + return inet_diag_get_exact(skb, nlh); } static void inet_diag_rcv(struct sock *sk, int len) { - struct sk_buff *skb; - unsigned int qlen = skb_queue_len(sk-sk_receive_queue); + unsigned int qlen = 0; - while (qlen-- (skb = skb_dequeue(sk-sk_receive_queue))) { - inet_diag_rcv_skb(skb); - kfree_skb(skb); - } + do { + netlink_run_queue(sk, qlen, inet_diag_rcv_msg); + } while (qlen); } static DEFINE_SPINLOCK(inet_diag_register_lock); -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: many sockets, slow sendto
Zacco wrote: So, my worry is confirmed then. But how could that delay disappear when splitting the sender and receiver onto distinct hosts? Even in that case the right socket must be found somehow. When the receiver and sender are on the same machine, sendto() passes the packet to loopback and enters the receiving side. With that many sockets, the time to go through all the sockets may be 100 us. So your sendto() seems to be slow, but the slow part is the receiver. If you put two machines, the sender might send XX.XXX frames per second (full speed), but the receiver might handle 5% of them and drop 95%. This is all speculation, since you didn't give us the exact setup you use.
fix up misplaced inlines.
Turning up the warnings on gcc makes it emit warnings about the placement of 'inline' in function declarations. Here's everything that was under net/ Signed-off-by: Dave Jones [EMAIL PROTECTED] diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index 4c914df..ecfe8da 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -319,7 +319,7 @@ static int __hidp_send_ctrl_message(struct hidp_session *session, return 0; } -static int inline hidp_send_ctrl_message(struct hidp_session *session, +static inline int hidp_send_ctrl_message(struct hidp_session *session, unsigned char hdr, unsigned char *data, int size) { int err; diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index 7712d76..5439a3c 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -61,7 +61,7 @@ static int brnf_filter_vlan_tagged __read_mostly = 1; #define brnf_filter_vlan_tagged 1 #endif -static __be16 inline vlan_proto(const struct sk_buff *skb) +static inline __be16 vlan_proto(const struct sk_buff *skb) { return vlan_eth_hdr(skb)-h_vlan_encapsulated_proto; } diff --git a/net/core/sock.c b/net/core/sock.c index 8d65d64..27c4f62 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -808,7 +808,7 @@ lenout: * * (We also register the sk_lock with the lock validator.) 
*/ -static void inline sock_lock_init(struct sock *sk) +static inline void sock_lock_init(struct sock *sk) { sock_lock_init_class_and_name(sk, af_family_slock_key_strings[sk-sk_family], diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index a7fee6b..1b61699 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -804,7 +804,7 @@ struct ipv6_saddr_score { #define IPV6_SADDR_SCORE_LABEL 0x0020 #define IPV6_SADDR_SCORE_PRIVACY 0x0040 -static int inline ipv6_saddr_preferred(int type) +static inline int ipv6_saddr_preferred(int type) { if (type (IPV6_ADDR_MAPPED|IPV6_ADDR_COMPATv4| IPV6_ADDR_LOOPBACK|IPV6_ADDR_RESERVED)) @@ -813,7 +813,7 @@ static int inline ipv6_saddr_preferred(int type) } /* static matching label */ -static int inline ipv6_saddr_label(const struct in6_addr *addr, int type) +static inline int ipv6_saddr_label(const struct in6_addr *addr, int type) { /* *prefix (longest match) label @@ -3318,7 +3318,7 @@ errout: rtnl_set_sk_err(RTNLGRP_IPV6_IFADDR, err); } -static void inline ipv6_store_devconf(struct ipv6_devconf *cnf, +static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, __s32 *array, int bytes) { BUG_ON(bytes (DEVCONF_MAX * 4)); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 0e1f4b2..a6b3117 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -308,7 +308,7 @@ static inline void rt6_probe(struct rt6_info *rt) /* * Default Router Selection (RFC 2461 6.3.6) */ -static int inline rt6_check_dev(struct rt6_info *rt, int oif) +static inline int rt6_check_dev(struct rt6_info *rt, int oif) { struct net_device *dev = rt-rt6i_dev; int ret = 0; @@ -328,7 +328,7 @@ static int inline rt6_check_dev(struct rt6_info *rt, int oif) return ret; } -static int inline rt6_check_neigh(struct rt6_info *rt) +static inline int rt6_check_neigh(struct rt6_info *rt) { struct neighbour *neigh = rt-rt6i_nexthop; int m = 0; diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c index ee4b84a..93c4223 100644 --- a/net/ipv6/xfrm6_tunnel.c +++ 
b/net/ipv6/xfrm6_tunnel.c @@ -58,7 +58,7 @@ static struct kmem_cache *xfrm6_tunnel_spi_kmem __read_mostly; static struct hlist_head xfrm6_tunnel_spi_byaddr[XFRM6_TUNNEL_SPI_BYADDR_HSIZE]; static struct hlist_head xfrm6_tunnel_spi_byspi[XFRM6_TUNNEL_SPI_BYSPI_HSIZE]; -static unsigned inline xfrm6_tunnel_spi_hash_byaddr(xfrm_address_t *addr) +static inline unsigned xfrm6_tunnel_spi_hash_byaddr(xfrm_address_t *addr) { unsigned h; @@ -70,7 +70,7 @@ static unsigned inline xfrm6_tunnel_spi_hash_byaddr(xfrm_address_t *addr) return h; } -static unsigned inline xfrm6_tunnel_spi_hash_byspi(u32 spi) +static inline unsigned xfrm6_tunnel_spi_hash_byspi(u32 spi) { return spi % XFRM6_TUNNEL_SPI_BYSPI_HSIZE; } diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c index e85df07..abc47cc 100644 --- a/net/sched/cls_route.c +++ b/net/sched/cls_route.c @@ -93,7 +93,7 @@ void route4_reset_fastmap(struct net_device *dev, struct route4_head *head, u32 spin_unlock_bh(dev-queue_lock); } -static void __inline__ +static inline void route4_set_fastmap(struct route4_head *head, u32 id, int iif, struct route4_filter *f) { diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 9678995..e81e2fb 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2025,7 +2025,7 @@ nlmsg_failure: return -1; } -static int inline
Re: many sockets, slow sendto
From: Zacco [EMAIL PROTECTED] Date: Wed, 21 Mar 2007 22:53:13 +0100 Do you think there is interest in such a modification? If so, how could we go on with it? The best thing you can do is hash on both saddr/sport. In order to handle the saddr==0 case the socket lookup has to try two lookups: one with the packet's saddr, and one with saddr zero. If the first lookup hits, we use that, since a precise match should take precedence over a wildcard saddr; otherwise we use the result of the second lookup. I'm not very inclined to hack on this, so anyone else is welcome to. FWIW, pretty much every other networking stack hashes only on sport for UDP, just like Linux.
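Dave's two-lookup scheme can be sketched in plain C. This is a hypothetical miniature socket table, not the kernel's hash implementation — a real version would hash (saddr, sport) into buckets and probe two chains — but the exact-match-then-wildcard fallback logic is the same:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature socket table: each entry is bound to
 * (local address, local port); addr == 0 means wildcard (INADDR_ANY). */
struct sock_ent { uint32_t addr; uint16_t port; };

static struct sock_ent table[] = {
    { 0x0a000001, 53 },   /* bound to 10.0.0.1:53 */
    { 0,          53 },   /* bound to *:53        */
    { 0x0a000002, 80 },   /* bound to 10.0.0.2:80 */
};

/* Two-pass lookup: try the exact (addr, port) match first, and only
 * fall back to a wildcard entry if no exact binding exists. */
static struct sock_ent *lookup(uint32_t addr, uint16_t port)
{
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].port == port && table[i].addr == addr)
            return &table[i];
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].port == port && table[i].addr == 0)
            return &table[i];
    return NULL;
}
```

The linear scans stand in for what would be two hash-chain probes in the real stack; the point is only the lookup order.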
[PATCH -mm 3/4] Blackfin: on-chip ethernet MAC controller update driver
Hi folks, As we moved four pieces of identical board-specific code, get_bf537_ether_addr(), into arch/blackfin/mach-bf537/boards/eth_mac.c, the comment in the driver should be updated accordingly. Signed-off-by: Bryan Wu [EMAIL PROTECTED] --- drivers/net/bfin_mac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/drivers/net/bfin_mac.c === --- linux-2.6.orig/drivers/net/bfin_mac.c +++ linux-2.6/drivers/net/bfin_mac.c @@ -842,7 +842,7 @@ /* Is it valid? (Did the bootloader initialize it?) */ if (!is_valid_ether_addr(dev->dev_addr)) { /* Grab the MAC from the board somehow - this is done in the - arch/blackfin/boards/bf537/boardname.c */ + arch/blackfin/mach-bf537/boards/eth_mac.c */ get_bf537_ether_addr(dev->dev_addr); } _ Thanks -Bryan
Re: [PATCH] tcp_cubic: use 32 bit math
On Tue, 13 Mar 2007 21:50:20 +0100 Willy Tarreau [EMAIL PROTECTED] wrote: Hi Stephen, On Mon, Mar 12, 2007 at 02:11:56PM -0700, Stephen Hemminger wrote: Oh BTW, I have a newer version with a first approximation of the cbrt() before the div64_64, which allows us to reduce from 3 div64s to only 2 div64s. This results in a version which is twice as fast as the initial one (ncubic), but with slightly less accuracy (0.286% compared to 0.247%). But I see that other functions such as hcbrt() had a 1.5% avg error, so I think this is not dramatic. Ignore my hcbrt(); it was a less accurate version of Andi's stuff. OK. Also, I managed to remove all the other divides, to be kind to CPUs with a slow divide instruction or no divide at all. Since we compute on a limited range (22 bits), we can multiply then shift right. It shows me even slightly better times on a Pentium-M and an Athlon, with a slightly higher avg error (0.297% compared to 0.286%), and slightly smaller code. What does the code look like? Well, I have cleaned it up a little; there were more comments and ifdefs than code! I've appended it to the end of this mail. I have changed it a bit, because I noticed that integer divide precision was so coarse that there were other possibilities to play with the bits. I have experimented with combinations of several methods:
- replace integer divides with multiplies/shifts where possible.
- compensate for divide imprecision by adding/removing small values before/after them. Often, the integer result of 1/(x*(x-1)) is closer to (float)1/(float)x^2 than 1/(x*x). This is because the divide always truncates the result.
- use direct result lookup for small values. Small inputs give small outputs which have very few moving bits. Many different values fit in a 32-bit integer, so we use a shift offset to look up the value. I used this in an fls function I wrote a while ago, which I should also post because it is up to twice as fast as the kernel's.
Sometimes it seems faster to look the value up from memory, sometimes it is faster to use an immediate value. Maybe more visible differences would show up on RISC CPUs where loading a 32-bit immediate needs two instructions. I don't know; I've not tested on my sparc yet.
- use small lookup tables (64 bytes) with 6-bit inputs and at least as many bits on output. We only look up the 6 MSBs and return the 2-3 MSBs of the result.
- iterative search and manual refinement of the lookup tables for best accuracy. The avg error rate can easily be halved this way.
I have tried several functions with 0, 1, 2 and 3 divides. Several of them offer better accuracy than what we currently have, in fewer cycles. Others offer faster results (up to 5 times) with slightly less accuracy. There is one function which is not to be used, but is just here for comparison (ncubic_0div). It does no divide but has an awful avg error. But one which is interesting is ncubic_tab0. It does not use any divide at all, not even a div64. It shows a 0.6% avg error, which I'm not sure is enough or not. It is 6.7 times faster than the initial ncubic() with less accuracy, and 4 times smaller. I suspect that it can differ more on architectures which have no divide instruction. If the 0.6% avg error rate is too much, ncubic_tab1() uses one single div64 and is twice as slow (still nearly 3 times faster than ncubic). It shows 0.195% avg error, which is better than the initial ncubic. I think that is a good tradeoff. If best accuracy is an absolute requirement, then I have a variation of ncubic (ncubic_3div) which does 0.17% in 2/3 of the time (compared to 0.247%), and which is slightly smaller. I have also added a size column, indicating approximate function size, provided that the compiler does not reorder the code. On gcc 3.4, it's OK, but 4.1 returns garbage. That does not matter; it's just a rough estimate anyway.
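The "replace integer divides with multiplies/shifts" method can be illustrated with a small self-contained sketch. The constant 43691 = ceil(2^17 / 3) is an assumption chosen here for a division by 3 over a 16-bit input range, where the result is provably exact:

```c
#include <assert.h>
#include <stdint.h>

/* Divide a 16-bit value by 3 using one multiply and one shift:
 * 43691 == ceil(2^17 / 3), and the rounding error x/393216 stays
 * below 1/6 for x < 65536, so (x * 43691) >> 17 == x / 3 exactly
 * over the whole 16-bit range. */
static inline uint32_t div3_u16(uint32_t x)
{
    return (x * 43691u) >> 17;
}
```

For wider inputs (such as the 22-bit range mentioned above) the reciprocal constant and shift have to be re-derived, and depending on the range the result may be off by one in the last place, which is exactly the kind of imprecision the compensation tricks above deal with.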
Here are the results classed by speed :

/* Sample output on a Pentium-M 600 MHz :
Function      clocks  mean(us)  max(us)  std(us)  Avg err  size
ncubic_tab0       79      0.66     7.20     1.04   0.613%   160
ncubic_0div       84      0.70     7.64     1.57   4.521%   192
ncubic_1div      178      1.48    16.27     1.81   0.443%   336
ncubic_tab1      179      1.49    16.34     1.85   0.195%   320
ncubic_ndiv3     263      2.18    24.04     3.59   0.250%   512
ncubic_2div      270      2.24    24.70     2.77   0.187%   512
ncubic32_1       359      2.98    32.81     3.59   0.238%   544
ncubic_3div      361      2.99    33.08     3.79   0.170%   656
ncubic32         364      3.02    33.29     3.51   0.247%   544
ncubic           529      4.39    48.39     4.92   0.247%   720
hcbrt            539      4.47    49.25     5.98   1.580%    96
ocubic           732      4.93    61.83     7.22   0.274%   320
Re: [PATCH] tcp_cubic: use 32 bit math
Hi Stephen, On Wed, Mar 21, 2007 at 11:54:19AM -0700, Stephen Hemminger wrote: On Tue, 13 Mar 2007 21:50:20 +0100 Willy Tarreau [EMAIL PROTECTED] wrote: [...] ( cut my boring part ) Here are the results classed by speed :

/* Sample output on a Pentium-M 600 MHz :
Function      clocks  mean(us)  max(us)  std(us)  Avg err  size
ncubic_tab0       79      0.66     7.20     1.04   0.613%   160
ncubic_0div       84      0.70     7.64     1.57   4.521%   192
ncubic_1div      178      1.48    16.27     1.81   0.443%   336
ncubic_tab1      179      1.49    16.34     1.85   0.195%   320
ncubic_ndiv3     263      2.18    24.04     3.59   0.250%   512
ncubic_2div      270      2.24    24.70     2.77   0.187%   512
ncubic32_1       359      2.98    32.81     3.59   0.238%   544
ncubic_3div      361      2.99    33.08     3.79   0.170%   656
ncubic32         364      3.02    33.29     3.51   0.247%   544
ncubic           529      4.39    48.39     4.92   0.247%   720
hcbrt            539      4.47    49.25     5.98   1.580%    96
ocubic           732      4.93    61.83     7.22   0.274%   320
acbrt            842      6.98    76.73     8.55   0.275%   192
bictcp          1032      6.95    86.30     9.04   0.172%   768

[...] The following version of div64_64 is faster because do_div() is already optimized for the 32-bit case. Cool, this is interesting because I first wanted to optimize it but did not find how to start. You seem to get very good results. BTW, you did not append your changes. However, one thing I do not understand is why your avg error is about 1/3 below the original one. Was there a precision bug in the original div64_64, or did you extend the values used in the test? Or perhaps you built with -ffast-math and the original cbrt() is less precise in that case? I get the following results on a ULV Core Solo (i.e. a slow current processor) and on a 64-bit Core Duo. ncubic_tab1 seems like the best (no additional error and about as fast). OK. It was the one I preferred too, unless tab0's avg error was acceptable.
ULV Core Solo
Function      clocks  mean(us)  max(us)  std(us)  Avg err  size
ncubic_tab0      192     11.24    45.10    15.28   0.450%  -2262
ncubic_0div      201     11.77    47.23    27.40   3.357%  -2404
ncubic_1div      324     19.02    76.32    25.82   0.189%  -2567
ncubic_tab1      326     19.13    76.73    23.71   0.043%  -2059
ncubic_2div      456     26.72   108.92   493.16   0.028%  -2790
ncubic_ndiv3     463     27.15   133.37  1889.39   0.104%  -3344
ncubic32         549     32.18   130.59   508.97   0.041%  -3794
ncubic32_1       574     33.66   138.32   548.48   0.029%  -3604
ncubic_3div      581     34.04   140.24   608.55   0.018%  -3050
ncubic           733     42.92   173.35   523.19   0.041%    299
ocubic          1046     61.25   283.68  3305.65   0.027%  -2232
acbrt           1149     67.32   284.91  1941.55   0.029%    168
bictcp          1663     97.41   394.29   604.86   0.017%    628

Core 2 Duo
Function      clocks  mean(us)  max(us)  std(us)  Avg err  size
ncubic_0div       74      0.03     1.60     0.07   3.357%  -2101
ncubic_tab0       74      0.03     1.60     0.04   0.450%  -2029
ncubic_1div      142      0.07     3.11     1.05   0.189%  -2195
ncubic_tab1      144      0.07     3.18     1.02   0.043%  -1638
ncubic_2div      216      0.10     4.74     1.07   0.028%  -2326
ncubic_ndiv3     219      0.10     4.76     1.04   0.104%  -2709
ncubic32         269      0.13     5.87     1.13   0.041%  -1500
ncubic32_1       272      0.13     5.92     1.10   0.029%  -2881
ncubic           273      0.13     5.96     1.13   0.041%  -1763
ncubic_3div      290      0.14     6.32     1.01   0.018%  -2499
acbrt            430      0.20     9.42     1.18   0.029%     77
ocubic           444      0.21     9.82     1.82   0.027%  -1924
bictcp           549      0.26    12.06     1.68   0.017%    236

Thanks, Willy
[PATCH 2/2] tcp: cubic optimization
Use Willy's work on optimizing the cube root by using a table for small values. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] ---

--- net-2.6.22.orig/net/ipv4/tcp_cubic.c	2007-03-21 12:57:11.000000000 -0700
+++ net-2.6.22/net/ipv4/tcp_cubic.c	2007-03-21 13:04:59.000000000 -0700
@@ -91,23 +91,51 @@
 		tcp_sk(sk)->snd_ssthresh = initial_ssthresh;
 }
 
-/*
- * calculate the cubic root of x using Newton-Raphson
+/* calculate the cubic root of x using a table lookup followed by one
+ * Newton-Raphson iteration.
+ * Avg err ~= 0.195%
  */
 static u32 cubic_root(u64 a)
 {
-	u32 x;
-
-	/* Initial estimate is based on:
-	 * cbrt(x) = exp(log(x) / 3)
+	u32 x, b, shift;
+	/*
+	 * cbrt(x) MSB values for x MSB values in [0..63].
+	 * Precomputed then refined by hand - Willy Tarreau
+	 *
+	 * For x in [0..63],
+	 *   v = cbrt(x << 18) - 1
+	 *   cbrt(x) = (v[x] + 10) >> 6
 	 */
-	x = 1u << (fls64(a)/3);
+	static const u8 v[] = {
+		/* 0x00 */    0,  54,  54,  54, 118, 118, 118, 118,
+		/* 0x08 */  123, 129, 134, 138, 143, 147, 151, 156,
+		/* 0x10 */  157, 161, 164, 168, 170, 173, 176, 179,
+		/* 0x18 */  181, 185, 187, 190, 192, 194, 197, 199,
+		/* 0x20 */  200, 202, 204, 206, 209, 211, 213, 215,
+		/* 0x28 */  217, 219, 221, 222, 224, 225, 227, 229,
+		/* 0x30 */  231, 232, 234, 236, 237, 239, 240, 242,
+		/* 0x38 */  244, 245, 246, 248, 250, 251, 252, 254,
+	};
+
+	b = fls64(a);
+	if (b < 7) {
+		/* a in [0..63] */
+		return ((u32)v[(u32)a] + 35) >> 6;
+	}
+
+	b = ((b * 84) >> 8) - 1;
+	shift = (a >> (b * 3));
 
-	/* converges to 32 bits in 3 iterations */
-	x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
-	x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
-	x = (2 * x + (u32)div64_64(a, (u64)x*(u64)x)) / 3;
+	x = ((u32)(((u32)v[shift] + 10) << b)) >> 6;
 
+	/*
+	 * Newton-Raphson iteration
+	 *                 2
+	 * x    = ( 2 * x  +  a / x  ) / 3
+	 *  k+1          k         k
+	 */
+	x = (2 * x + (u32)div64_64(a, (u64)x * (u64)(x - 1)));
+	x = ((x * 341) >> 10);
 	return x;
 }
-- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
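For reference, the pre-patch approach that the diff above replaces — a power-of-two initial estimate from fls64() followed by three Newton-Raphson steps — can be tried in userspace. This is a sketch, with plain C division standing in for the kernel's div64_64() and a naive loop standing in for fls64():

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the kernel's fls64(): position of the highest
 * set bit, 1-based (so fls64_(1) == 1, fls64_(0) == 0). */
static int fls64_(uint64_t x)
{
    int b = 0;

    while (x) {
        b++;
        x >>= 1;
    }
    return b;
}

/* Power-of-two first guess, then three Newton-Raphson steps
 *   x = (2*x + a/x^2) / 3
 * a must be non-zero to avoid dividing by zero. */
static uint32_t cubic_root_nr(uint64_t a)
{
    uint32_t x = 1u << (fls64_(a) / 3);

    x = (2 * x + (uint32_t)(a / ((uint64_t)x * x))) / 3;
    x = (2 * x + (uint32_t)(a / ((uint64_t)x * x))) / 3;
    x = (2 * x + (uint32_t)(a / ((uint64_t)x * x))) / 3;
    return x;
}
```

On perfect cubes the three iterations converge exactly; the patched kernel version trades one table lookup for two of the expensive 64-bit divides.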
[PATCH 1/2] div64_64 optimization
Minor optimization of div64_64. do_div() already optimizes the case of a 32-by-32 divide, so there is no need to do it here. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] ---

--- net-2.6.22.orig/lib/div64.c	2007-03-21 12:03:59.000000000 -0700
+++ net-2.6.22/lib/div64.c	2007-03-21 12:04:46.000000000 -0700
@@ -61,20 +61,18 @@
 /* 64bit divisor, dividend and result. dynamic precision */
 uint64_t div64_64(uint64_t dividend, uint64_t divisor)
 {
-	uint32_t d = divisor;
+	uint32_t high, d;
 
-	if (divisor > 0xffffffffULL) {
-		unsigned int shift = fls(divisor >> 32);
+	high = divisor >> 32;
+	if (high) {
+		unsigned int shift = fls(high);
 
 		d = divisor >> shift;
 		dividend >>= shift;
-	}
+	} else
+		d = divisor;
 
-	/* avoid 64 bit division if possible */
-	if (dividend >> 32)
-		do_div(dividend, d);
-	else
-		dividend = (uint32_t) dividend / d;
+	do_div(dividend, d);
 
 	return dividend;
 }
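The shift normalization at the heart of div64_64() can be modelled in userspace; a sketch, with plain C division standing in for do_div(). Note the result is exact when the divisor fits in 32 bits, and may be off by a small amount when both operands have to be shifted:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of div64_64(): if the divisor does not fit in
 * 32 bits, shift both operands right until it does, then do a 64/32
 * divide (modelled here with the C '/' operator). */
static uint64_t div64_64_model(uint64_t dividend, uint64_t divisor)
{
    uint32_t high = (uint32_t)(divisor >> 32);
    uint32_t d;

    if (high) {
        /* fls(high): how many bits must be dropped so the shifted
         * divisor fits in 32 bits */
        int shift = 0;
        while (high) {
            shift++;
            high >>= 1;
        }
        d = (uint32_t)(divisor >> shift);
        dividend >>= shift;
    } else {
        d = (uint32_t)divisor;
    }
    return dividend / d;
}
```

For example, with dividend 2^40 and divisor 2^33+1 the true quotient is 127, but after shifting both operands by 2 the model returns 128 — the "dynamic precision" the original comment refers to.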
Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]
David Howells [EMAIL PROTECTED] wrote: - recvmsg not supporting MSG_TRUNC is rather weird and really ought to be fixed one day, as it's useful to find out the size of the message pending when combined with MSG_PEEK Hmmm... I hadn't considered that. I assumed MSG_TRUNC not to be useful, as arbitrarily chopping bits out of the request or reply would seem to be pointless. But why do I need to support MSG_TRUNC? I currently have things arranged so that if you do a recvmsg() that doesn't pull everything out of a packet, then the next time you do a recvmsg() you'll get the next part of the data in that packet. MSG_EOR is flagged when recvmsg copies across the last byte of data of a particular phase. Okay... I've rewritten my recvmsg implementation for RxRPC. The one I had could pull messages belonging to a call off the socket in the wrong order if two threads both tried to pull simultaneously. Also:
(1) If there's a sequence of data messages belonging to a particular call on the receive queue, then recvmsg() will keep eating them until it meets either a non-data message or a message belonging to a different call, or until it fills the user buffer. If it doesn't fill the user buffer, it will sleep unless it is non-blocking.
(2) MSG_PEEK operates similarly, but will return immediately if it has put any data in the buffer rather than waiting for further packets to arrive.
(3) If a packet is only partially consumed in filling a user buffer, then the shrunken packet will be left on the front of the queue for the next taker.
(4) If there is more data to be had on a call (we haven't copied the last byte of the last data packet in that phase yet), then MSG_MORE will be flagged.
(5) MSG_EOR will be flagged on the terminal message of a call. No more messages from that call will be received, and the user ID may be reused.
Patch attached.
David diff --git a/net/rxrpc/Makefile b/net/rxrpc/Makefile index 3369534..f12cd28 100644 --- a/net/rxrpc/Makefile +++ b/net/rxrpc/Makefile @@ -17,6 +17,7 @@ af-rxrpc-objs := \ ar-local.o \ ar-output.o \ ar-peer.o \ + ar-recvmsg.o \ ar-security.o \ ar-skbuff.o \ ar-transport.o diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index b25d931..06963e6 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -385,217 +385,6 @@ out: } /* - * receive a message from an RxRPC socket - */ -static int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock, -struct msghdr *msg, size_t len, int flags) -{ - struct rxrpc_skb_priv *sp; - struct rxrpc_call *call; - struct rxrpc_sock *rx = rxrpc_sk(sock-sk); - struct sk_buff *skb; - int copy, ret, ullen; - u32 abort_code; - - _enter(,,,%zu,%d, len, flags); - - if (flags (MSG_OOB | MSG_TRUNC)) - return -EOPNOTSUPP; - -try_again: - if (RB_EMPTY_ROOT(rx-calls) - rx-sk.sk_state != RXRPC_SERVER_LISTENING) - return -ENODATA; - - /* receive the next message from the common Rx queue */ - skb = skb_recv_datagram(rx-sk, flags, flags MSG_DONTWAIT, ret); - if (!skb) { - _leave( = %d, ret); - return ret; - } - - sp = rxrpc_skb(skb); - call = sp-call; - ASSERT(call != NULL); - - /* make sure we wait for the state to be updated in this call */ - spin_lock_bh(call-lock); - spin_unlock_bh(call-lock); - - if (test_bit(RXRPC_CALL_RELEASED, call-flags)) { - _debug(packet from release call); - rxrpc_free_skb(skb); - goto try_again; - } - - rxrpc_get_call(call); - - /* copy the peer address. */ - if (msg-msg_name msg-msg_namelen 0) - memcpy(msg-msg_name, call-conn-trans-peer-srx, - sizeof(call-conn-trans-peer-srx)); - - /* set up the control messages */ - ullen = msg-msg_flags MSG_CMSG_COMPAT ? 
4 : sizeof(unsigned long); - - sock_recv_timestamp(msg, rx-sk, skb); - - if (skb-mark == RXRPC_SKB_MARK_NEW_CALL) { - _debug(RECV NEW CALL); - ret = put_cmsg(msg, SOL_RXRPC, RXRPC_NEW_CALL, 0, abort_code); - if (ret 0) - goto error_requeue_packet; - goto done; - } - - ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID, - ullen, call-user_call_ID); - if (ret 0) - goto error_requeue_packet; - ASSERT(test_bit(RXRPC_CALL_HAS_USERID, call-flags)); - - switch (skb-mark) { - case RXRPC_SKB_MARK_DATA: - _debug(recvmsg DATA #%u { %d, %d }, - ntohl(sp-hdr.seq), skb-len, sp-offset); - - ASSERTCMP(ntohl(sp-hdr.seq), =, call-rx_data_recv); - ASSERTCMP(ntohl(sp-hdr.seq), =, call-rx_data_recv + 1); -
Re: [PATCH 2.6.21 2/4] cxgb3 - Auto-load FW if mismatch detected
On Sun, 18 Mar 2007 13:10:06 -0700 [EMAIL PROTECTED] wrote: config CHELSIO_T3 tristate Chelsio Communications T3 10Gb Ethernet support depends on PCI + select FW_LOADER Something has gone wrong with the indenting there. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]
David Howells [EMAIL PROTECTED] wrote: - recvmsg not supporting MSG_TRUNC is rather weird and really ought to be fixed one day, as it's useful to find out the size of the message pending when combined with MSG_PEEK Hmmm... I hadn't considered that. I assumed MSG_TRUNC not to be useful, as arbitrarily chopping bits out of the request or reply would seem to be pointless. But why do I need to support MSG_TRUNC? I currently have things arranged so that if you do a recvmsg() that doesn't pull everything out of a packet, then the next time you do a recvmsg() you'll get the next part of the data in that packet. MSG_EOR is flagged when recvmsg copies across the last byte of data of a particular phase. I might at some point in the future enable recvmsg() to keep pulling packets off the Rx queue and copying them into userspace until the userspace buffer is full or we find that the next packet is not the logical next in sequence. Hmmm... I'm actually overloading MSG_EOR. MSG_EOR is flagged on the last data read, and is also flagged for terminal messages (end of reply data, abort, net error, final ACK, etc). I wonder if I should use MSG_MORE (or its lack) instead to indicate the end of data, and only set MSG_EOR on the terminal message. MSG_MORE is set by the app to flag to sendmsg() that there's more data to come, so it would be consistent to use it for recvmsg() too. David
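The MSG_PEEK + MSG_TRUNC combination under discussion can be demonstrated from userspace on Linux with an AF_UNIX datagram socketpair: with both flags set, recv() reports the real datagram length even though the peek buffer is smaller, and the datagram remains queued. (MSG_TRUNC support for UNIX datagram sockets is Linux-specific.)

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send a 100-byte datagram over an AF_UNIX socketpair, then peek at it
 * with only a 10-byte buffer.  With MSG_PEEK|MSG_TRUNC, recv() returns
 * the full datagram length without consuming it. */
static ssize_t peeked_len(void)
{
    int fds[2];
    char big[100], small[10];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
        return -1;
    memset(big, 'x', sizeof(big));
    if (send(fds[0], big, sizeof(big), 0) != (ssize_t)sizeof(big))
        return -1;

    /* peek: only 10 bytes are copied, but the real length comes back */
    n = recv(fds[1], small, sizeof(small), MSG_PEEK | MSG_TRUNC);

    /* the datagram is still queued; consume it for real */
    if (recv(fds[1], big, sizeof(big), 0) != (ssize_t)sizeof(big))
        return -1;
    return n;
}
```

This is the userspace pattern the reviewer is asking RxRPC to support: applications size a receive buffer by peeking first.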
Re: [PATCH 2.6.21 2/4] cxgb3 - Auto-load FW if mismatch detected
Andrew Morton wrote: On Sun, 18 Mar 2007 13:10:06 -0700 [EMAIL PROTECTED] wrote: config CHELSIO_T3 tristate Chelsio Communications T3 10Gb Ethernet support depends on PCI + select FW_LOADER Something has gone wrong with the indenting there. The added line is fine. The surrounding lines are not: they use spaces instead of tabs. I'll send a patch on top of the last series to use tabs in drivers/net/Kconfig. Cheers, Divy
Re: [PATCH 2.6.21 3/4] cxgb3 - Fix potential MAC hang
Andrew Morton wrote: On Sun, 18 Mar 2007 13:10:12 -0700 [EMAIL PROTECTED] wrote: From: Divy Le Ray [EMAIL PROTECTED] Under rare conditions, the MAC might hang while generating a pause frame. This patch fine tunes the MAC settings to avoid the issue, allows for periodic MAC state check, and triggers a recovery if hung. Also fix one MAC statistics counter for the rev board T3B2. This conflicts with your previously-submitted, not-yet-merged-by-Jeff cxgb3-add-sw-lro-support.patch. What should we do about this? I can send you a patch against the -mm tree, if it is acceptable. Cheers, Divy - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2.6.21 5/4] cxgb3 - fix white spaces in drivers/net/Kconfig
From: Divy Le Ray [EMAIL PROTECTED] Use tabs instead of white spaces for CHELSIO_T3 entry. Signed-off-by: Divy Le Ray [EMAIL PROTECTED] --- drivers/net/Kconfig | 24 1 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 1b6459b..c3f9f59 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2372,23 +2372,23 @@ config CHELSIO_T1_NAPI when the driver is receiving lots of packets from the card. config CHELSIO_T3 -tristate Chelsio Communications T3 10Gb Ethernet support -depends on PCI + tristate Chelsio Communications T3 10Gb Ethernet support + depends on PCI select FW_LOADER -help - This driver supports Chelsio T3-based gigabit and 10Gb Ethernet - adapters. + help + This driver supports Chelsio T3-based gigabit and 10Gb Ethernet + adapters. - For general information about Chelsio and our products, visit - our website at http://www.chelsio.com. + For general information about Chelsio and our products, visit + our website at http://www.chelsio.com. - For customer support, please visit our customer support page at - http://www.chelsio.com/support.htm. + For customer support, please visit our customer support page at + http://www.chelsio.com/support.htm. - Please send feedback to [EMAIL PROTECTED]. + Please send feedback to [EMAIL PROTECTED]. - To compile this driver as a module, choose M here: the module - will be called cxgb3. + To compile this driver as a module, choose M here: the module + will be called cxgb3. config EHEA tristate eHEA Ethernet support - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
On Wed, 2007-21-03 at 15:04 +0100, Patrick McHardy wrote: Patrick McHardy wrote: What we could do is replace the netlink cb_lock spinlock by a user-supplied mutex (supplied to netlink_kernel_create, rtnl_mutex in this case). That would put the entire dump under the rtnl and allow us to get rid of qdisc_tree_lock and avoid the need to take dev_base_lock during qdisc dumping. Same in other spots like rtnl_dump_ifinfo, inet_dump_ifaddr, ... These (compile tested) patches demonstrate the idea. The first one lets netlink_kernel_create users specify a mutex that should be held during dump callbacks; the second one uses this for rtnetlink and changes inet_dump_ifaddr for demonstration. A complete patch would allow us to simplify locking in lots of spots; all rtnetlink users currently need to implement extra locking just for the dump functions, and a number of them already get it wrong and seem to rely on the rtnl. The mutex is certainly a cleaner approach, and a lot of the RCU protection would go away. I like it. Knowing you, I sense there's something clever in there that I am missing. I don't see how you could get rid of the tree locking, since we still need to protect against the data path, no? Or are you looking at that as a separate effort? If there are no objections to this change I'm going to update the second patch to include all rtnetlink users. No objections here. cheers, jamal
Re: many sockets, slow sendto
On Wed, 21 Mar 2007 18:15:10 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 21 Mar 2007 23:12:40 +0100 I chose in this patch to hash UDP sockets with a hash function that take into account both their port number and address : This has a drawback because we need two lookups : one with a given address, one with a wildcard (null) address. Thanks for doing this work Eric, I'll review this when I get home tomorrow night or Friday. You're welcome :) I knew you were busy with this new wii game^Wprogram :) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html