Re: [RFC] remove NLA_STRING NUL trimming

2007-03-23 Thread Thomas Graf
* Johannes Berg [EMAIL PROTECTED] 2007-03-23 00:12
 Looking through the netlink/attr.c code I noticed that NLA_STRING
 attributes that end with a binary NUL have it removed before passing it
 to the consumer.

It's not really removed, the trailing NUL is just ignored when checking
the length of the attribute. This is needed for older netlink families
where strings are not always NUL terminated, yet we still need to accept
the additional byte in case a terminating NUL is present. This validation is
strictly necessary, otherwise nla_strcmp() and others will fail.
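
For reference, the check in net/netlink/attr.c looks roughly like this (a
sketch from memory, not a verbatim copy of the current code):

	case NLA_STRING:
		if (attrlen < 1)
			return -ERANGE;
		if (pt->len) {
			char *buf = nla_data(nla);

			/* tolerate, but do not count, a trailing NUL */
			if (buf[attrlen - 1] == '\0')
				attrlen--;
			if (attrlen > pt->len)
				return -ERANGE;
		}
		break;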

 For wireless, we have a few places where we need to be able to accept
 any (even binary) values, for example for the SSID; the SSID can validly
 end with \0 and I'd still love to be able to take advantage of
 NLA_STRING and .len = 32 so I don't need to check the length myself.
 However, given the code above, an SSID with a terminating \0 would be
 reduced by one character.

I suggest that you introduce NLA_BINARY which enforces a maximum length.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET_SCHED 11/11]: qdisc: avoid dequeue while throttled

2007-03-23 Thread Patrick McHardy
Patrick McHardy wrote:
 [NET_SCHED]: qdisc: avoid dequeue while throttled


It just occurred to me that this doesn't work properly with qdiscs
that have multiple classes, since they don't properly maintain
the TCQ_F_THROTTLED flag. They set it on dequeue when no active
class is willing to give out packets, but when enqueueing to a
non-active class (thereby activating it) the flag is still set even
though we don't know whether that class could be dequeued.

So this updated patch unsets the TCQ_F_THROTTLED flag whenever we
activate a class. Additionally it removes the unsetting of
TCQ_F_THROTTLED on successful dequeue since we're now guaranteed
that it was not set before.

[NET_SCHED]: qdisc: avoid dequeue while throttled

Avoid dequeueing while the device is throttled.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 073456c84a46736a3aa1ae4cc9d953a9e97b327c
tree 805a29224001180c88a429e65812b97a489c427a
parent e2459acd7dee06fb4d5e980f26c23d31db0e5de1
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 15:37:51 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 15:37:51 +0100

 net/sched/sch_cbq.c |5 +++--
 net/sched/sch_generic.c |4 
 net/sched/sch_hfsc.c|5 +++--
 net/sched/sch_htb.c |6 --
 net/sched/sch_netem.c   |4 
 net/sched/sch_tbf.c |1 -
 6 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index a294542..151f8e3 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -424,8 +424,10 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->bstats.packets++;
 		sch->bstats.bytes+=len;
 		cbq_mark_toplevel(q, cl);
-		if (!cl->next_alive)
+		if (!cl->next_alive) {
 			cbq_activate_class(cl);
+			sch->flags &= ~TCQ_F_THROTTLED;
+		}
 		return ret;
 	}
 
@@ -1030,7 +1032,6 @@ cbq_dequeue(struct Qdisc *sch)
 		skb = cbq_dequeue_1(sch);
 		if (skb) {
 			sch->q.qlen--;
-			sch->flags &= ~TCQ_F_THROTTLED;
 			return skb;
 		}
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 52eb343..39c5312 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -93,6 +93,10 @@ static inline int qdisc_restart(struct net_device *dev)
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
 
+	smp_rmb();
+	if (q->flags & TCQ_F_THROTTLED)
+		return q->q.qlen;
+
 	/* Dequeue packet */
 	if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) {
 		unsigned nolock = (dev->features & NETIF_F_LLTX);
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 22cec11..c6da436 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1597,8 +1597,10 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return err;
 	}
 
-	if (cl->qdisc->q.qlen == 1)
+	if (cl->qdisc->q.qlen == 1) {
 		set_active(cl, len);
+		sch->flags &= ~TCQ_F_THROTTLED;
+	}
 
 	cl->bstats.packets++;
 	cl->bstats.bytes += len;
@@ -1672,7 +1674,6 @@ hfsc_dequeue(struct Qdisc *sch)
 	}
 
  out:
-	sch->flags &= ~TCQ_F_THROTTLED;
 	sch->q.qlen--;
 
 	return skb;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 71db121..1387b7b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -615,6 +615,8 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		/* enqueue to helper queue */
 		if (q->direct_queue.qlen < q->direct_qlen) {
 			__skb_queue_tail(&q->direct_queue, skb);
+			if (q->direct_queue.qlen == 1)
+				sch->flags &= ~TCQ_F_THROTTLED;
 			q->direct_pkts++;
 		} else {
 			kfree_skb(skb);
@@ -637,6 +639,8 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		cl->bstats.packets++;
 		cl->bstats.bytes += skb->len;
 		htb_activate(q, cl);
+		if (cl->un.leaf.q->q.qlen == 1)
+			sch->flags &= ~TCQ_F_THROTTLED;
 	}
 
 	sch->q.qlen++;
@@ -958,7 +962,6 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	/* try to dequeue direct packets as high prio (!) to minimize cpu work */
 	skb = __skb_dequeue(&q->direct_queue);
 	if (skb != NULL) {
-		sch->flags &= ~TCQ_F_THROTTLED;
 		sch->q.qlen--;
 		return skb;
 	}
@@ -991,7 +994,6 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 			skb = htb_dequeue_tree(q, prio, level);
 			if (likely(skb != NULL)) {
 				sch->q.qlen--;
-				sch->flags &= ~TCQ_F_THROTTLED;
 				goto fin;
 			}
 		}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 5d9d8bc..4c7a8d8 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -273,10 +273,6 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 	struct netem_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 
-	smp_mb();
-	if (sch->flags & TCQ_F_THROTTLED)
-		return NULL;
-
 	skb = q->qdisc->dequeue(q->qdisc);
 	if (skb) {
 		const struct netem_skb_cb *cb
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 5386295..ed7e581 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -218,7 +218,6 @@ static struct sk_buff *tbf_dequeue(struct Qdisc* sch)
 			q->tokens = toks;
 			q->ptokens = ptoks;
 			sch->q.qlen--;
-			sch->flags &= ~TCQ_F_THROTTLED;
 			return skb;
 		

Re: [RFC] remove NLA_STRING NUL trimming

2007-03-23 Thread Johannes Berg
On Fri, 2007-03-23 at 15:20 +0100, Thomas Graf wrote:

 It's not really removed, the trailing NUL is just ignored when checking
 the length of the attribute. 

Good point.

 This is needed for older netlink families
 where strings are not always NUL terminated, yet we still need to accept
 the additional byte needed in case it is present. This validation is
 strictly necessary, otherwise nla_strcmp() and others will fail.

Ok.

  For wireless, we have a few places where we need to be able to accept
  any (even binary) values, for example for the SSID; the SSID can validly
  end with \0 and I'd still love to be able to take advantage of
  NLA_STRING and .len = 32 so I don't need to check the length myself.
  However, given the code above, an SSID with a terminating \0 would be
  reduced by one character.
 
 I suggest that you introduce NLA_BINARY which enforces a maximum length.

Alright, I'll post a patch in a bit.

johannes




Re: [NET_SCHED 11/11]: qdisc: avoid dequeue while throttled

2007-03-23 Thread Patrick McHardy
Patrick McHardy wrote:
[NET_SCHED]: qdisc: avoid dequeue while throttled
 
 
 It just occurred to me that this doesn't work properly with qdiscs
 that have multiple classes, since they don't properly maintain
 the TCQ_F_THROTTLED flag. They set it on dequeue when no active
 class is willing to give out packets, but when enqueueing to a
 non-active class (thereby activating it) the flag is still set even
 though we don't know whether that class could be dequeued.
 
 So this updated patch unsets the TCQ_F_THROTTLED flag whenever we
 activate a class. Additionally it removes the unsetting of
 TCQ_F_THROTTLED on successful dequeue since we're now guaranteed
 that it was not set before.


I found another case that doesn't work properly, so let me retract
this patch until I've properly thought this through.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netlink: introduce NLA_BINARY type

2007-03-23 Thread Johannes Berg
This patch introduces a new NLA_BINARY attribute policy type whose
verification simply checks the maximum length of the payload.

It also fixes a small typo in the example.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]
Cc: Thomas Graf [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org

---
 include/net/netlink.h |4 +++-
 net/netlink/attr.c|5 +
 2 files changed, 8 insertions(+), 1 deletion(-)

--- linux-2.6.orig/include/net/netlink.h	2007-03-23 15:45:52.932598534 +0100
+++ linux-2.6/include/net/netlink.h	2007-03-23 15:46:25.962598534 +0100
@@ -171,6 +171,7 @@ enum {
NLA_MSECS,
NLA_NESTED,
NLA_NUL_STRING,
+   NLA_BINARY,
__NLA_TYPE_MAX,
 };
 
@@ -188,12 +189,13 @@ enum {
  *NLA_STRING   Maximum length of string
  *NLA_NUL_STRING   Maximum length of string (excluding NUL)
  *NLA_FLAG Unused
+ *NLA_BINARY   Maximum length of attribute payload
  *All otherExact length of attribute payload
  *
  * Example:
  * static struct nla_policy my_policy[ATTR_MAX+1] __read_mostly = {
  * [ATTR_FOO] = { .type = NLA_U16 },
- * [ATTR_BAR] = { .type = NLA_STRING, len = BARSIZ },
+ * [ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
  * [ATTR_BAZ] = { .len = sizeof(struct mystruct) },
  * };
  */
--- linux-2.6.orig/net/netlink/attr.c   2007-03-23 15:46:53.112598534 +0100
+++ linux-2.6/net/netlink/attr.c	2007-03-23 15:48:12.902598534 +0100
@@ -67,6 +67,11 @@ static int validate_nla(struct nlattr *n
}
break;
 
+	case NLA_BINARY:
+		if (pt->len && attrlen > pt->len)
+			return -ERANGE;
+		break;
+
 	default:
 		if (pt->len)
 			minlen = pt->len;
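
For the wireless case discussed earlier in this thread, a policy entry using
the new type could then look like this (illustrative only; the attribute name
is made up):

	static struct nla_policy my_policy[ATTR_MAX+1] __read_mostly = {
		[ATTR_SSID] = { .type = NLA_BINARY, .len = 32 },
	};

The payload may then be any binary blob of up to 32 bytes, including one that
ends in '\0'.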


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netlink: introduce NLA_BINARY type

2007-03-23 Thread Thomas Graf
* Johannes Berg [EMAIL PROTECTED] 2007-03-23 16:02
 This patch introduces a new NLA_BINARY attribute policy type with the
 verification of simply checking the maximum length of the payload.
 
 It also fixes a small typo in the example.
 
 Signed-off-by: Johannes Berg [EMAIL PROTECTED]
 Cc: Thomas Graf [EMAIL PROTECTED]
 Cc: netdev@vger.kernel.org

Signed-off-by: Thomas Graf [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix use of uninitialized field in mv643xx_eth

2007-03-23 Thread Dale Farnsworth
On Fri, Mar 23, 2007 at 01:30:02PM +0100, Gabriel Paubert wrote:
 In this driver, the default ethernet address is first set by calling
 eth_port_uc_addr_get() which reads the relevant registers of the 
 corresponding port as initially set by firmware. However that function 
 used the port_num field accessed through the private area of net_dev 
 before it was set.  

Gabriel, you're right.  I introduced the bug and I'm sorry for your
trouble.

 The result was that one board I have ended up with the unicast address 
 set to 00:00:00:00:00:00 (only port 1 is connected on this board). The
 problem appeared after commit 84dd619e4dc3b0b1c40dafd98c90fd950bce7bc5.
 
 This patch fixes the bug by making eth_port_uc_get_addr() more similar 
 to eth_port_uc_set_addr(), i.e., by using the port number as the first
 parameter instead of a pointer to struct net_device.
 
 Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]
 
 --
 
 The minimal patch I first tried consisted in just moving mp-port_num
 to before the call to eth_port_uc_get_addr().

Hmm.  That should have fixed it.  I reproduced the problem here and
this fixed it for me:

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 1ee27c3..643ea31 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1379,7 +1379,7 @@ #endif
 
 	spin_lock_init(&mp->lock);
 
-	port_num = pd->port_number;
+	port_num = mp->port_num = pd->port_number;
 
 	/* set default config values */
 	eth_port_uc_addr_get(dev, dev->dev_addr);
@@ -1411,8 +1411,6 @@ #endif
 	duplex = pd->duplex;
 	speed = pd->speed;
 
-	mp->port_num = port_num;
-
 	/* Hook up MII support for ethtool */
 	mp->mii.dev = dev;
 	mp->mii.mdio_read = mv643xx_mdio_read;

Would you please confirm that this fixes it for you?  If so, I'll submit
it upstream as coming from you, since you did all the work.  OK?

 The other question is why
 the driver never gets the info from the device tree on this PPC board,
 but that's for another list despite the fact I lost some time looking 
 for bugs in the OF interface before stumbling on this use of a field
 before it was initialized.

Probably just because the mac address in the hardware was correct and
it didn't seem necessary to overwrite it.

Thank you,
-Dale
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][SCTP]: Update SCTP Maintainers entry

2007-03-23 Thread Sridhar Samudrala
Dave,

I have asked Vlad Yasevich to take over the role of primary 
maintainer of SCTP and he has accepted it. He has been 
contributing to SCTP for more than 2 years and has become
more active than me in the past year.

Thanks
Sridhar

[SCTP]: Update SCTP Maintainers entry

Add Vlad Yasevich as the primary maintainer of SCTP and add a
link to the project website.

Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

---
 MAINTAINERS |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6d8d5b9..d4bfb9d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2928,9 +2928,12 @@ L:   linux-scsi@vger.kernel.org
 S: Maintained
 
 SCTP PROTOCOL
+P: Vlad Yasevich
+M: [EMAIL PROTECTED]
 P: Sridhar Samudrala
 M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
+W: http://lksctp.sourceforge.net
 S: Supported
 
 SCx200 CPU SUPPORT


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET_SCHED 01/11]: sch_netem: fix off-by-one in send time comparison

2007-03-23 Thread Stephen Hemminger
On Fri, 23 Mar 2007 14:35:40 +0100 (MET)
Patrick McHardy [EMAIL PROTECTED] wrote:

 [NET_SCHED]: sch_netem: fix off-by-one in send time comparison
 
 netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether it is
 allowed to send a packet, which is equivalent to cb->time_to_send < now.
 Use !PSCHED_TLESS(now, cb->time_to_send) instead to properly handle
 cb->time_to_send == now.
 
 Signed-off-by: Patrick McHardy [EMAIL PROTECTED]


Thanks, I saw that earlier in another spot and fixed it.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Add security check before flushing SAD/SPD

2007-03-23 Thread Joy Latten
On Fri, 2007-03-23 at 01:39 -0400, Eric Paris wrote:

 
 In either case though proper auditing needs to be addressed.  I see that
 the first patch from Joy wouldn't audit deletion failures.  It appears
 to me if the check is done per policy then the security hook return code
 needs to be recorded and passed to xfrm_audit_log instead of the hard
 coded 1 result used now.
 
 Assuming we go with James's double loop what should we be auditing for a
 security hook denial?  Just audit the first policy entry which we tried
 to remove but couldn't and then leave the rest of the auditing in those
 functions the way it is now in case there was no denial, calling
 xfrm_audit_log with a hard coded 1 for the result?
 
Actually, I thought the original intent of the ipsec auditing was to
just audit changes made to the SAD/SPD databases, not security hook
denials, right? 

Joy
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Add security check before flushing SAD/SPD

2007-03-23 Thread Eric Paris
On Fri, 2007-03-23 at 10:33 -0600, Joy Latten wrote:
 On Fri, 2007-03-23 at 01:39 -0400, Eric Paris wrote:
 
  
  In either case though proper auditing needs to be addressed.  I see that
  the first patch from Joy wouldn't audit deletion failures.  It appears
  to me if the check is done per policy then the security hook return code
  needs to be recorded and passed to xfrm_audit_log instead of the hard
  coded 1 result used now.
  
  Assuming we go with James's double loop what should we be auditing for a
  security hook denial?  Just audit the first policy entry which we tried
  to remove but couldn't and then leave the rest of the auditing in those
  functions the way it is now in case there was no denial, calling
  xfrm_audit_log with a hard coded 1 for the result?
  
 Actually, I thought the original intent of the ipsec auditing was to
 just audit changes made to the SAD/SPD databases, not securiy hook
 denials, right? 

Then what is the point of the 'result' field that we capture and log in
xfrm_audit_log if the only things you care to audit are successful
changes to the databases?

-Eric

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] NAPI support for Sibyte MAC

2007-03-23 Thread mason

[ This is a re-post, but the patch still applies and works fine against
the linux-mips.org tip. We'd really like to get this in. -Mark]

  This patch completes the NAPI functionality for SB1250 MAC, including making
  NAPI a kernel option that can be turned on or off and adds the sbmac_poll
  routine.
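
For context, an old-style (pre-2.6.24) NAPI poll handler has roughly the
following shape. This is only a sketch of the API, not the driver's actual
code; sbmac_process_rx() is a hypothetical helper here:

	static int sbmac_poll(struct net_device *dev, int *budget)
	{
		int work_to_do = min(*budget, dev->quota);
		int work_done;

		/* process up to work_to_do received packets, handing them
		 * to the stack with netif_receive_skb() */
		work_done = sbmac_process_rx(dev, work_to_do);

		*budget -= work_done;
		dev->quota -= work_done;

		if (work_done < work_to_do) {
			/* ring is empty: leave polling mode and
			 * re-enable the rx interrupt */
			netif_rx_complete(dev);
			return 0;
		}
		return 1;	/* still more work, stay on the poll list */
	}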

Signed-off-by: Mark Mason ([EMAIL PROTECTED])
Signed-off-by: Dan Krejsa ([EMAIL PROTECTED])
Signed-off-by: Steve Yang ([EMAIL PROTECTED])

Index: linux-2.6.14-cgl/drivers/net/Kconfig
===
--- linux-2.6.14-cgl.orig/drivers/net/Kconfig	2006-09-20 14:58:54.0 -0700
+++ linux-2.6.14-cgl/drivers/net/Kconfig	2006-09-20 17:04:31.0 -0700
@@ -2031,6 +2031,23 @@
 	tristate "SB1250 Ethernet support"
 	depends on SIBYTE_SB1xxx_SOC
 
+config SBMAC_NAPI
+	bool "SBMAC: Use Rx Polling (NAPI) (EXPERIMENTAL)"
+	depends on NET_SB1250_MAC && EXPERIMENTAL
+   help
+ NAPI is a new driver API designed to reduce CPU and interrupt load
+ when the driver is receiving lots of packets from the card. It is
+ still somewhat experimental and thus not yet enabled by default.
+
+ If your estimated Rx load is 10kpps or more, or if the card will be
+ deployed on potentially unfriendly networks (e.g. in a firewall),
+ then say Y here.
+
+	  See <file:Documentation/networking/NAPI_HOWTO.txt> for more
+	  information.
+
+ If in doubt, say y.
+
 config R8169_VLAN
 	bool "VLAN support"
 	depends on R8169 && VLAN_8021Q
@@ -2826,3 +2843,5 @@
def_bool NETPOLL
 
 endmenu
+
+
Index: linux-2.6.14-cgl/drivers/net/sb1250-mac.c
===
--- linux-2.6.14-cgl.orig/drivers/net/sb1250-mac.c	2006-09-20 14:59:00.0 -0700
+++ linux-2.6.14-cgl/drivers/net/sb1250-mac.c	2006-09-20 20:16:27.0 -0700
@@ -95,19 +95,28 @@
 #endif
 
 #ifdef CONFIG_SBMAC_COALESCE
-static int int_pktcnt = 0;
-module_param(int_pktcnt, int, S_IRUGO);
-MODULE_PARM_DESC(int_pktcnt, "Packet count");
-
-static int int_timeout = 0;
-module_param(int_timeout, int, S_IRUGO);
-MODULE_PARM_DESC(int_timeout, "Timeout value");
+static int int_pktcnt_tx = 255;
+module_param(int_pktcnt_tx, int, S_IRUGO);
+MODULE_PARM_DESC(int_pktcnt_tx, "TX packet count");
+
+static int int_timeout_tx = 255;
+module_param(int_timeout_tx, int, S_IRUGO);
+MODULE_PARM_DESC(int_timeout_tx, "TX timeout value");
+
+static int int_pktcnt_rx = 64;
+module_param(int_pktcnt_rx, int, S_IRUGO);
+MODULE_PARM_DESC(int_pktcnt_rx, "RX packet count");
+
+static int int_timeout_rx = 64;
+module_param(int_timeout_rx, int, S_IRUGO);
+MODULE_PARM_DESC(int_timeout_rx, "RX timeout value");
 #endif
 
 #include <asm/sibyte/sb1250.h>
 #if defined(CONFIG_SIBYTE_BCM1x55) || defined(CONFIG_SIBYTE_BCM1x80)
 #include <asm/sibyte/bcm1480_regs.h>
 #include <asm/sibyte/bcm1480_int.h>
+#define R_MAC_DMA_OODPKTLOST_RX	R_MAC_DMA_OODPKTLOST
 #elif defined(CONFIG_SIBYTE_SB1250) || defined(CONFIG_SIBYTE_BCM112X)
 #include <asm/sibyte/sb1250_regs.h>
 #include <asm/sibyte/sb1250_int.h>
@@ -155,8 +164,8 @@
 
 #define NUMCACHEBLKS(x) (((x)+SMP_CACHE_BYTES-1)/SMP_CACHE_BYTES)
 
-#define SBMAC_MAX_TXDESCR  32
-#define SBMAC_MAX_RXDESCR  32
+#define SBMAC_MAX_TXDESCR  256
+#define SBMAC_MAX_RXDESCR  256
 
 #define ETHER_ALIGN	2
 #define ETHER_ADDR_LEN 6
@@ -185,10 +194,10 @@
 * associated with it.
 */
 
-   struct sbmac_softc *sbdma_eth;  /* back pointer to associated 
MAC */
-   int  sbdma_channel; /* channel number */
+   struct sbmac_softc *sbdma_eth;  /* back pointer to associated MAC */
+   int  sbdma_channel; /* channel number */
int  sbdma_txdir;   /* direction (1=transmit) */
-   int  sbdma_maxdescr;/* total # of descriptors in 
ring */
+   int  sbdma_maxdescr;/* total # of descriptors in ring */
 #ifdef CONFIG_SBMAC_COALESCE
int  sbdma_int_pktcnt;  /* # descriptors rx/tx before 
interrupt*/
int  sbdma_int_timeout; /* # usec rx/tx interrupt */
@@ -197,13 +206,16 @@
volatile void __iomem *sbdma_config0;   /* DMA config register 0 */
volatile void __iomem *sbdma_config1;   /* DMA config register 1 */
volatile void __iomem *sbdma_dscrbase;  /* Descriptor base address */
-   volatile void __iomem *sbdma_dscrcnt; /* Descriptor count register 
*/
+   volatile void __iomem *sbdma_dscrcnt;   /* Descriptor count register */
volatile void __iomem *sbdma_curdscr;   /* current descriptor address */
+   volatile void __iomem *sbdma_oodpktlost;/* pkt drop (rx only) */
+
 
/*
 * This stuff is for maintenance of the ring
 */
 
+   sbdmadscr_t *sbdma_dscrtable_unaligned;
sbdmadscr_t *sbdma_dscrtable;   /* base of 

Re: Recent net-2.6.22 patches break bootup!

2007-03-23 Thread Stephen Hemminger
On Thu, 22 Mar 2007 21:41:23 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Thomas Graf [EMAIL PROTECTED]
 Date: Fri, 23 Mar 2007 00:47:04 +0100
 
  * Stephen Hemminger [EMAIL PROTECTED] 2007-03-22 14:27
   Something is broken now.  If I boot the system (Fedora) it gets to:
   
   Bringing up loopback interface:  RTNETLINK answers: Invalid argument
   Dump terminated
   RTNETLINK answers: Invalid argument
   
   
   tg3 device eth0 does not seem to be present, delaying initialization
   
   
   then it hangs because cups won't come up without loopback
  
  Thinko. It always returned the first message handler of a rtnl
  family.
  
  [RTNL]: Properly return rntl message handler
  
  Signed-off-by: Thomas Graf [EMAIL PROTECTED]
 
 Applied, thanks Thomas.

Thanks, that fixes it

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET_SCHED 11/11]: qdisc: avoid dequeue while throttled

2007-03-23 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 15:57:08 +0100

 I found another case that doesn't work properly, so let me retract
 this patch until I've properly thought this through.

Ok, I'll apply the rest.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET_SCHED 00/11]: pkt_sched.h cleanup + misc changes

2007-03-23 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 14:35:38 +0100 (MET)

 These patches fix an off-by-one in netem, clean up pkt_sched.h by removing
 most of the now unnecessary PSCHED time macros and turning the two remaining
 ones into inline functions, consolidate some common filter destruction code
 and move the TCQ_F_THROTTLED optimization from netem to qdisc_restart.
 
 Please apply, thanks.

Patches 1-10 applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATH 0/6] New SCTP functionality for 2.6.22

2007-03-23 Thread David Miller
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 09:52:46 -0400

 
 This patch series implements additional SCTP socket options.  This
 was originally submitted too late for 2.6.21, so I am re-submitting
 for 2.6.22.
 
 Please consider applying.

All 6 patches applied, thanks Vlad.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netlink: introduce NLA_BINARY type

2007-03-23 Thread David Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 16:13:24 +0100

 * Johannes Berg [EMAIL PROTECTED] 2007-03-23 16:02
  This patch introduces a new NLA_BINARY attribute policy type with the
  verification of simply checking the maximum length of the payload.
  
  It also fixes a small typo in the example.
  
  Signed-off-by: Johannes Berg [EMAIL PROTECTED]
  Cc: Thomas Graf [EMAIL PROTECTED]
  Cc: netdev@vger.kernel.org
 
 Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied to net-2.6.22, thanks everyone.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][SCTP]: Update SCTP Maintainers entry

2007-03-23 Thread David Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 09:28:30 -0700

 I have asked Vlad Yasevich to take over the role of primary 
 maintainer of SCTP and he has accepted it. He has been 
 contributing to SCTP for more than 2 years and has become
 more active than me in the past year.

Applied, thanks for all of your SCTP work Sridhar.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Add security check before flushing SAD/SPD

2007-03-23 Thread James Morris
On Fri, 23 Mar 2007, Eric Paris wrote:

 Maybe I'm way out on a limb here but if I am a regular user and I say
 rm /tmp/* and I only have permissions to delete some of the files I
 expect just those couple to be delete, not the whole operation denied.

I don't think this analogy holds up, as rm is a per-file deletion 
operation, and it is the shell which expands the wildcard for you.

A 'flush' has a semantic implication that all entries will be removed, and 
it should be atomic and either succeed or fail at that granularity.



- James
-- 
James Morris
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 09:00:08 +0100

 I don't consider this new hash as a bug fix at all, i.e. your patch might enter
 the 2.6.22 normal dev cycle.

Ok, I checked the patch into net-2.6.22
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Add security check before flushing SAD/SPD

2007-03-23 Thread David Miller
From: James Morris [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 14:46:48 -0400 (EDT)

 A 'flush' has a semantic implication that all entries will be removed, and 
 it should be atomic and either succeed or fail at that granularity.

Correct.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VIA Velocity VLAN vexation

2007-03-23 Thread linux
 Or should I just get a different gigabit card ?

 This one probably got answered on 2005/11/29. :o)

Ah, that's where I asked before.  I misplaced the e-mail.
I hope you don't mind my asking every year or two.

But I don't see any suggestions for an alternative gigabit
card anywhere.  I had assumed they all mostly worked, but
now it appears I need to know details.

 I'll go to bed in a few minutes but I'll happily resurrect the
 velocity vlan patches.

Haven't they been merged upstream already?


Anyway, thanks for the reply!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Add security check before flushing SAD/SPD

2007-03-23 Thread Eric Paris
On Fri, 2007-03-23 at 11:47 -0700, David Miller wrote:
 From: James Morris [EMAIL PROTECTED]
 Date: Fri, 23 Mar 2007 14:46:48 -0400 (EDT)
 
  A 'flush' has a semantic implication that all entries will be removed, and 
  it should be atomic and either succeed or fail at that granularity.
 
 Correct.

Fair enough, does it matter that we have no way to report failure back
to users who can no longer assume success?

-Eric

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] mv643xx_eth: Fix use of uninitialized port_num field

2007-03-23 Thread Dale Farnsworth
From: Gabriel Paubert [EMAIL PROTECTED]

In this driver, the default ethernet address is first set by calling
eth_port_uc_addr_get() which reads the relevant registers of the
corresponding port as initially set by firmware. However that function
used the port_num field accessed through the private area of net_dev
before it was set.

The result was that one board I have ended up with the unicast address
set to 00:00:00:00:00:00 (only port 1 is connected on this board). The
problem appeared after commit 84dd619e4dc3b0b1c40dafd98c90fd950bce7bc5.

This patch fixes the bug by setting mp->port_num prior to calling
eth_port_uc_addr_get().

Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]
Signed-off-by: Dale Farnsworth [EMAIL PROTECTED]
---

This fixes a serious bug and should be expeditiously pushed upstream.

 drivers/net/mv643xx_eth.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 1ee27c3..643ea31 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1379,7 +1379,7 @@ #endif
 
 	spin_lock_init(&mp->lock);
 
-	port_num = pd->port_number;
+	port_num = mp->port_num = pd->port_number;
 
 	/* set default config values */
 	eth_port_uc_addr_get(dev, dev->dev_addr);
@@ -1411,8 +1411,6 @@ #endif
 	duplex = pd->duplex;
 	speed = pd->speed;
 
-	mp->port_num = port_num;
-
 	/* Hook up MII support for ethtool */
 	mp->mii.dev = dev;
 	mp->mii.mdio_read = mv643xx_mdio_read;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Add security check before flushing SAD/SPD

2007-03-23 Thread Joy Latten
On Fri, 2007-03-23 at 12:59 -0400, Eric Paris wrote:
 On Fri, 2007-03-23 at 10:33 -0600, Joy Latten wrote:
  On Fri, 2007-03-23 at 01:39 -0400, Eric Paris wrote:
  
   
   In either case though proper auditing needs to be addressed.  I see that
   the first patch from Joy wouldn't audit deletion failures.  It appears
   to me if the check is done per policy then the security hook return code
   needs to be recorded and passed to xfrm_audit_log instead of the hard
   coded 1 result used now.
   
   Assuming we go with James's double loop what should we be auditing for a
   security hook denial?  Just audit the first policy entry which we tried
   to remove but couldn't and then leave the rest of the auditing in those
   functions the way it is now in case there was no denial, calling
   xfrm_audit_log with a hard coded 1 for the result?
   
  Actually, I thought the original intent of the ipsec auditing was to
  just audit changes made to the SAD/SPD databases, not securiy hook
  denials, right? 
 
 Then what is the point of the 'result' field that we capture and log in
 xfrm_audit_log if the only things you care to audit are successful
 changes to the databases?
 
Yes, I think we do want to audit the security denial since it is the
reason we could not change the policy. In the flush case it seems it will
be the only reason. As you suggested, I will audit the first denial
since this is the reason the flush will fail.

But sometimes, in other cases, the delete or add could fail for other
reasons too, such as not being able to allocate memory, not finding the
entry, etc., which is passed in the result field.


Regards,
Joy 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] mv643xx_eth: make eth_port_uc_addr_{get,set}() calls symmetric

2007-03-23 Thread Dale Farnsworth
From: Gabriel Paubert [EMAIL PROTECTED]

There is no good reason for the asymmetry in the parameters of
eth_port_uc_addr_get() and eth_port_uc_addr_set().  Make them
symmetric.  Remove some gratuitous block comments while we're here.

Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]
Signed-off-by: Dale Farnsworth [EMAIL PROTECTED]
---
This is a clean-up patch that needn't be rushed upstream.  -Dale

 drivers/net/mv643xx_eth.c |   59 +++-
 drivers/net/mv643xx_eth.h |4 --
 2 files changed, 13 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 643ea31..f58d96e 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -51,8 +51,8 @@ #include <asm/delay.h>
 #include "mv643xx_eth.h"
 
 /* Static function declarations */
-static void eth_port_uc_addr_get(struct net_device *dev,
-   unsigned char *MacAddr);
+static void eth_port_uc_addr_get(unsigned int port_num, unsigned char *p_addr);
+static void eth_port_uc_addr_set(unsigned int port_num, unsigned char *p_addr);
 static void eth_port_set_multicast_list(struct net_device *);
 static void mv643xx_eth_port_enable_tx(unsigned int port_num,
unsigned int queues);
@@ -1382,7 +1382,7 @@ #endif
port_num = mp-port_num = pd-port_number;
 
/* set default config values */
-   eth_port_uc_addr_get(dev, dev-dev_addr);
+   eth_port_uc_addr_get(port_num, dev-dev_addr);
mp-rx_ring_size = MV643XX_ETH_PORT_DEFAULT_RECEIVE_QUEUE_SIZE;
mp-tx_ring_size = MV643XX_ETH_PORT_DEFAULT_TRANSMIT_QUEUE_SIZE;
 
@@ -1826,26 +1826,9 @@ static void eth_port_start(struct net_de
 }
 
 /*
- * eth_port_uc_addr_set - This function Set the port Unicast address.
- *
- * DESCRIPTION:
- * This function Set the port Ethernet MAC address.
- *
- * INPUT:
- * unsigned inteth_port_numPort number.
- * char *  p_addr  Address to be set
- *
- * OUTPUT:
- * Set MAC address low and high registers. also calls
- * eth_port_set_filter_table_entry() to set the unicast
- * table with the proper information.
- *
- * RETURN:
- * N/A.
- *
+ * eth_port_uc_addr_set - Write a MAC address into the port's hw registers
  */
-static void eth_port_uc_addr_set(unsigned int eth_port_num,
-   unsigned char *p_addr)
+static void eth_port_uc_addr_set(unsigned int port_num, unsigned char *p_addr)
 {
unsigned int mac_h;
unsigned int mac_l;
@@ -1855,40 +1838,24 @@ static void eth_port_uc_addr_set(unsigne
 	mac_h = (p_addr[0] << 24) | (p_addr[1] << 16) | (p_addr[2] << 8) |
 						(p_addr[3] << 0);
 
-   mv_write(MV643XX_ETH_MAC_ADDR_LOW(eth_port_num), mac_l);
-   mv_write(MV643XX_ETH_MAC_ADDR_HIGH(eth_port_num), mac_h);
+   mv_write(MV643XX_ETH_MAC_ADDR_LOW(port_num), mac_l);
+   mv_write(MV643XX_ETH_MAC_ADDR_HIGH(port_num), mac_h);
 
-   /* Accept frames of this address */
-   table = MV643XX_ETH_DA_FILTER_UNICAST_TABLE_BASE(eth_port_num);
+   /* Accept frames with this address */
+   table = MV643XX_ETH_DA_FILTER_UNICAST_TABLE_BASE(port_num);
 	eth_port_set_filter_table_entry(table, p_addr[5] & 0x0f);
 }
 
 /*
- * eth_port_uc_addr_get - This function retrieves the port Unicast address
- * (MAC address) from the ethernet hw registers.
- *
- * DESCRIPTION:
- * This function retrieves the port Ethernet MAC address.
- *
- * INPUT:
- * unsigned inteth_port_numPort number.
- * char*MacAddrpointer where the MAC address is stored
- *
- * OUTPUT:
- * Copy the MAC address to the location pointed to by MacAddr
- *
- * RETURN:
- * N/A.
- *
+ * eth_port_uc_addr_get - Read the MAC address from the port's hw registers
  */
-static void eth_port_uc_addr_get(struct net_device *dev, unsigned char *p_addr)
+static void eth_port_uc_addr_get(unsigned int port_num, unsigned char *p_addr)
 {
-   struct mv643xx_private *mp = netdev_priv(dev);
unsigned int mac_h;
unsigned int mac_l;
 
-	mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(mp->port_num));
-	mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(mp->port_num));
+   mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(port_num));
+   mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(port_num));
 
 	p_addr[0] = (mac_h >> 24) & 0xff;
 	p_addr[1] = (mac_h >> 16) & 0xff;
diff --git a/drivers/net/mv643xx_eth.h b/drivers/net/mv643xx_eth.h
index 7d4e90c..82f8c0c 100644
--- a/drivers/net/mv643xx_eth.h
+++ b/drivers/net/mv643xx_eth.h
@@ -346,10 +346,6 @@ static void eth_port_init(struct mv643xx
 static void eth_port_reset(unsigned int eth_port_num);
 static void eth_port_start(struct net_device *dev);
 
-/* Port MAC address routines */
-static void eth_port_uc_addr_set(unsigned int eth_port_num,
-  

[PATCH] tcp: cubic update for net-2.6.22

2007-03-23 Thread Stephen Hemminger
The following update received from Injong updates TCP cubic to the latest
version. I am running more complete tests and will have results after 4/1.

According to Injong: the new version improves on its scalability,
fairness and stability.  So in all properties, we confirmed it shows better
performance.

NCSU results (for 2.6.18 and 2.6.20) available:
http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_Testing

This version is described in a new Internet draft for CUBIC.
http://www.ietf.org/internet-drafts/draft-rhee-tcp-cubic-00.txt
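
For readers who don't have the draft handy, the window growth function is the
usual CUBIC curve (the kernel computes a scaled fixed-point version of this):

	W(t) = C * (t - K)^3 + W_max
	K    = cbrt((W_max - cwnd) / C)

where t is the time since the last window reduction, W_max is the window just
before that reduction, cwnd is the window right after it, and C is a scaling
constant. bic_K in the code below corresponds to K.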


Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/ipv4/tcp_cubic.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 15c5803..296845b 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -1,5 +1,5 @@
 /*
- * TCP CUBIC: Binary Increase Congestion control for TCP v2.0
+ * TCP CUBIC: Binary Increase Congestion control for TCP v2.1
  *
  * This is from the implementation of CUBIC TCP in
  * Injong Rhee, Lisong Xu.
@@ -214,7 +214,9 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd)
 	if (ca->delay_min > 0) {
 		/* max increment = Smax * rtt / 0.1  */
 		min_cnt = (cwnd * HZ * 8)/(10 * max_increment * ca->delay_min);
-		if (ca->cnt < min_cnt)
+
+		/* use concave growth when the target is above the origin */
+		if (ca->cnt < min_cnt && t >= ca->bic_K)
 			ca->cnt = min_cnt;
}
 
@@ -400,4 +402,4 @@ module_exit(cubictcp_unregister);
 MODULE_AUTHOR("Sangtae Ha, Stephen Hemminger");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("CUBIC TCP");
-MODULE_VERSION("2.0");
+MODULE_VERSION("2.1");
-- 
1.4.4.2

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] tcp_probe: improvements for net-2.6.22

2007-03-23 Thread Stephen Hemminger
Change tcp_probe to use ktime (needed to add one export).
Add option to only get events when cwnd changes - from Doug Leith
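
A quick way to exercise this (using only the module parameters and the proc
file defined in the patch below; the port number is just an example):

	modprobe tcp_probe port=5001 full=0
	cat /proc/net/tcpprobe > /tmp/tcpprobe.out

With full=0 a line is logged only when the congestion window changes on a
matching connection; full=1 logs every received ack.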

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 kernel/time.c|1 +
 net/ipv4/tcp_probe.c |   68 -
 2 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/kernel/time.c b/kernel/time.c
index ec5b10c..de66866 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -452,6 +452,7 @@ struct timespec ns_to_timespec(const s64 nsec)
 
return ts;
 }
+EXPORT_SYMBOL(ns_to_timespec);
 
 /**
  * ns_to_timeval - Convert nanoseconds to timeval
diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
index 61f406f..d03ae9b 100644
--- a/net/ipv4/tcp_probe.c
+++ b/net/ipv4/tcp_probe.c
@@ -26,6 +26,8 @@
 #include linux/proc_fs.h
 #include linux/module.h
 #include linux/kfifo.h
+#include linux/ktime.h
+#include linux/time.h
 #include linux/vmalloc.h
 
 #include net/tcp.h
@@ -34,43 +36,45 @@ MODULE_AUTHOR(Stephen Hemminger [EMAIL PROTECTED]);
 MODULE_DESCRIPTION("TCP cwnd snooper");
 MODULE_LICENSE("GPL");
 
-static int port = 0;
+static int port __read_mostly = 0;
 MODULE_PARM_DESC(port, "Port to match (0=all)");
 module_param(port, int, 0);
 
-static int bufsize = 64*1024;
+static int bufsize __read_mostly = 64*1024;
 MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)");
 module_param(bufsize, int, 0);
 
+static int full __read_mostly;
+MODULE_PARM_DESC(full, "Full log (1=every ack packet received,  0=only cwnd changes)");
+module_param(full, int, 0);
 
 static const char procname[] = "tcpprobe";
 
 struct {
-   struct kfifo  *fifo;
-   spinlock_tlock;
+   struct kfifo*fifo;
+   spinlock_t  lock;
wait_queue_head_t wait;
-   struct timeval tstart;
+   ktime_t start;
+   u32 lastcwnd;
 } tcpw;
 
+/*
+ * Print to log with timestamps.
+ * FIXME: causes an extra copy
+ */
 static void printl(const char *fmt, ...)
 {
va_list args;
int len;
-   struct timeval now;
+   struct timespec tv;
char tbuf[256];
 
va_start(args, fmt);
-   do_gettimeofday(now);
+   /* want monotonic time since start of tcp_probe */
+   tv = ktime_to_timespec(ktime_sub(ktime_get(), tcpw.start));
 
-	now.tv_sec -= tcpw.tstart.tv_sec;
-	now.tv_usec -= tcpw.tstart.tv_usec;
-	if (now.tv_usec < 0) {
-		--now.tv_sec;
-		now.tv_usec += 1000000;
-	}
-
-	len = sprintf(tbuf, "%lu.%06lu ",
-		      (unsigned long) now.tv_sec,
-		      (unsigned long) now.tv_usec);
+	len = sprintf(tbuf, "%lu.%09lu ",
+		      (unsigned long) tv.tv_sec, (unsigned long) tv.tv_nsec);
len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
va_end(args);
 
@@ -78,38 +82,44 @@ static void printl(const char *fmt, ...)
 	wake_up(&tcpw.wait);
 }
 
-static int jtcp_sendmsg(struct kiocb *iocb, struct sock *sk,
-   struct msghdr *msg, size_t size)
+/* 
+ * Hook inserted to be called before each receive packet.
+ * Note: arguments must match tcp_rcv_established()!
+ */
+static int jtcp_rcv_established(struct sock *sk, struct sk_buff *skb,
+  struct tcphdr *th, unsigned len)
 {
const struct tcp_sock *tp = tcp_sk(sk);
const struct inet_sock *inet = inet_sk(sk);
 
-	if (port == 0 || ntohs(inet->dport) == port ||
-	    ntohs(inet->sport) == port) {
+	/* Only update if port matches */
+	if ((port == 0 || ntohs(inet->dport) == port || ntohs(inet->sport) == port)
+	    && (full || tp->snd_cwnd != tcpw.lastcwnd)) {
 		printl("%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d %#x %#x %u %u %u\n",
 		       NIPQUAD(inet->saddr), ntohs(inet->sport),
 		       NIPQUAD(inet->daddr), ntohs(inet->dport),
-		       size, tp->snd_nxt, tp->snd_una,
+		       skb->len, tp->snd_nxt, tp->snd_una,
 		       tp->snd_cwnd, tcp_current_ssthresh(sk),
-		       tp->snd_wnd);
+		       tp->snd_wnd, tp->srtt >> 3);
+		tcpw.lastcwnd = tp->snd_cwnd;
}
 
jprobe_return();
return 0;
 }
 
-static struct jprobe tcp_send_probe = {
+static struct jprobe tcp_probe = {
 	.kp = {
-		.symbol_name	= "tcp_sendmsg",
+		.symbol_name	= "tcp_rcv_established",
 	},
-	.entry	= JPROBE_ENTRY(jtcp_sendmsg),
+	.entry	= JPROBE_ENTRY(jtcp_rcv_established),
 };
 
 
 static int tcpprobe_open(struct inode * inode, struct file * file)
 {
kfifo_reset(tcpw.fifo);
-	do_gettimeofday(&tcpw.tstart);
+   tcpw.start = ktime_get();
return 0;
 }
 
@@ -162,7 +172,7 @@ static __init int tcpprobe_init(void)
 	if (!proc_net_fops_create(procname, S_IRUSR, &tcpprobe_fops))
goto err0;
 
-	ret = register_jprobe(&tcp_send_probe);
+   ret = 

Re: VIA Velocity VLAN vexation

2007-03-23 Thread Francois Romieu
[EMAIL PROTECTED] [EMAIL PROTECTED] :
[...]
 But I don't see any suggestions for an alternative gigabit
 card anywhere.  I had assumed they all mostly worked, but
 now it appears I need to know details.

Mostly.

Assuming you won't play with huge jumbo frames, I'd suggest
a plain old pci 8169 (not a PCIe 8168) for VLAN. 

[...]
 Haven't they been merged upstream already?

No.

-- 
Ueimor
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] forcedeth: fix nic poll

2007-03-23 Thread Ayaz Abdulla
The nic poll routine was missing the call to the optimized irq routine. 
This patch adds the missing call for the optimized path.


See http://bugzilla.kernel.org/show_bug.cgi?id=7950 for more information.

Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED]

--- orig/drivers/net/forcedeth.c	2007-03-11 20:38:20.0 -0500
+++ new/drivers/net/forcedeth.c	2007-03-11 20:38:24.0 -0500
@@ -3536,7 +3536,10 @@
pci_push(base);
 
if (!using_multi_irqs(dev)) {
-		nv_nic_irq(0, dev);
+		if (np->desc_ver == DESC_VER_3)
+			nv_nic_irq_optimized(0, dev);
+		else
+			nv_nic_irq(0, dev);
 		if (np->msi_flags & NV_MSI_X_ENABLED)
 			enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
 		else


[PATCH 2/2] forcedeth: fix tx timeout

2007-03-23 Thread Ayaz Abdulla
The tx timeout routine was waking the tx queue conditionally. However,
it must wake it unconditionally, since dev_watchdog has halted the tx
queue before calling the timeout function.


Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED]

--- orig/drivers/net/forcedeth.c	2007-03-11 20:59:06.0 -0500
+++ new/drivers/net/forcedeth.c	2007-03-11 20:58:59.0 -0500
@@ -2050,9 +2050,10 @@
nv_drain_tx(dev);
nv_init_tx(dev);
setup_hw_rings(dev, NV_SETUP_TX_RING);
-   netif_wake_queue(dev);
}
 
+   netif_wake_queue(dev);
+
/* 4) restart tx engine */
nv_start_tx(dev);
 	spin_unlock_irq(&np->lock);


[PATCH 3/5 2.6.21-rc4] l2tp: pppox protocol module load

2007-03-23 Thread James Chapman
[PPPOL2TP]: Add the ability to autoload a pppox protocol module.

This patch allows a name pppox-proto-nnn to be used in modprobe.conf
to autoload a driver for PPPoX protocol nnn.

Signed-off-by: James Chapman [EMAIL PROTECTED]

Index: linux-2.6.21-rc4/drivers/net/pppox.c
===
--- linux-2.6.21-rc4.orig/drivers/net/pppox.c
+++ linux-2.6.21-rc4/drivers/net/pppox.c
@@ -114,6 +114,13 @@ static int pppox_create(struct socket *s
goto out;
 
rc = -EPROTONOSUPPORT;
+#ifdef CONFIG_KMOD
+	if (!pppox_protos[protocol]) {
+		char buffer[32];
+		sprintf(buffer, "pppox-proto-%d", protocol);
+		request_module(buffer);
+	}
+#endif
 	if (!pppox_protos[protocol] ||
 	    !try_module_get(pppox_protos[protocol]->owner))
goto out;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5 2.6.21-rc4] l2tp: pppol2tp kbuild changes

2007-03-23 Thread James Chapman
[PPPOL2TP]: Modify kbuild for the new pppol2tp driver.

This patch adds a new config option, CONFIG_PPPOL2TP and adds
if_pppol2tp.h to the list of exported headers.

Signed-off-by: James Chapman [EMAIL PROTECTED]

Index: linux-2.6.21-rc4/drivers/net/Kconfig
===
--- linux-2.6.21-rc4.orig/drivers/net/Kconfig
+++ linux-2.6.21-rc4/drivers/net/Kconfig
@@ -2812,6 +2812,19 @@ config PPPOATM
  which can lead to bad results if the ATM peer loses state and
  changes its encapsulation unilaterally.
 
+config PPPOL2TP
+	tristate "PPP over L2TP (EXPERIMENTAL)"
+	depends on EXPERIMENTAL && PPP
+   help
+ Support for PPP-over-L2TP socket family. L2TP is a protocol
+ used by ISPs and enterprises to tunnel PPP traffic over UDP
+ tunnels. L2TP is replacing PPTP for VPN uses.
+
+ This kernel component handles only L2TP data packets: a
+ userland daemon handles L2TP the control protocol (tunnel
+ and session setup). One such daemon is OpenL2TP
+ (http://openl2tp.sourceforge.net/).
+
 config SLIP
 	tristate "SLIP (serial line) support"
---help---
Index: linux-2.6.21-rc4/drivers/net/Makefile
===
--- linux-2.6.21-rc4.orig/drivers/net/Makefile
+++ linux-2.6.21-rc4/drivers/net/Makefile
@@ -119,6 +119,7 @@ obj-$(CONFIG_PPP_DEFLATE) += ppp_deflate
 obj-$(CONFIG_PPP_BSDCOMP) += bsd_comp.o
 obj-$(CONFIG_PPP_MPPE) += ppp_mppe.o
 obj-$(CONFIG_PPPOE) += pppox.o pppoe.o
+obj-$(CONFIG_PPPOL2TP) += pppox.o pppol2tp.o
 
 obj-$(CONFIG_SLIP) += slip.o
 obj-$(CONFIG_SLHC) += slhc.o
Index: linux-2.6.21-rc4/include/linux/Kbuild
===
--- linux-2.6.21-rc4.orig/include/linux/Kbuild
+++ linux-2.6.21-rc4/include/linux/Kbuild
@@ -220,6 +220,7 @@ unifdef-y += if_fddi.h
 unifdef-y += if_frad.h
 unifdef-y += if_ltalk.h
 unifdef-y += if_link.h
+unifdef-y += if_pppol2tp.h
 unifdef-y += if_pppox.h
 unifdef-y += if_shaper.h
 unifdef-y += if_tr.h
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5 2.6.21-rc4] l2tp: add pppol2tp maintainer

2007-03-23 Thread James Chapman
[PPPOL2TP]: Update maintainers file for PPP over L2TP.

Signed-off-by: James Chapman [EMAIL PROTECTED]

Index: linux-2.6.21-rc4/MAINTAINERS
===
--- linux-2.6.21-rc4.orig/MAINTAINERS
+++ linux-2.6.21-rc4/MAINTAINERS
@@ -2700,6 +2700,11 @@ P:   Michal Ostrowski
 M: [EMAIL PROTECTED]
 S: Maintained
 
+PPP OVER L2TP
+P: James Chapman
+M: [EMAIL PROTECTED]
+S: Maintained
+
 PREEMPTIBLE KERNEL
 P: Robert Love
 M: [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: SAD sometimes has double SAs.

2007-03-23 Thread Joy Latten
Last Friday I proposed creating larval SAs to act as
placeholders to prevent a second acquire resulting in 
double SAs being created. 
I tried this and so far I have not seen any double SAs
being created. I also plan to run some stress tests over 
the weekend.

Please let me know what improvements I can make to this patch or
if there is a better way to do this.

 A while back I reported that I sometimes saw double and triple 
 SAs being created. The patch to check for protocol when deleting
 larval SA removed one obstacle in that I no longer see triple SAs. 
 Now, once in a while double SAs. I think I have figured out the
 second obstacle.
 
 The initiator installs his SAs into the kernel before the responder.
 As soon as they are installed, the blocked packet (which started
 the ACQUIRE) is sent. By this time the responder has installed his
 inbound SA(s) and so the newly arrived ipsec packet can be processed.
 In the case of tcp connections and a ping, a response may be 
 warranted, and thus an outgoing packet results. 
 
 From what I can tell of the log file below, sometimes, this
 might happen before the responder has completed installing 
 the outbound SAs. In the log file, the outbound AH has been added, 
 but not the outbound ESP, which is the one the outgoing packet 
 looks for first. Thus resulting in a second acquire. 
 
 I think this becomes more problematic when using both AH AND ESP, 
 rather than just using ESP with authentication. With the latter, 
 only one SA needed thus reducing the latency in installing the 
 SAs before incoming packet arrives. 
 
 So far, the only solution I can think of besides mandating all 
 userspace key daemons do SA maintenance is to perhaps add larval 
 SAs for both directions when adding the SPI. Currently, responder 
 does GETSPI for one way and initiator for opposite. When GETSPI is
 called, larval SA is created containing the SPI, but it is only 
 for one direction. Perhaps we can add a larval SA (no SPI) for 
 opposite direction to act as a placeholder indicating ACQUIRE 
 occurring, since SAs are created for both directions during an ACQUIRE.
 The initiator may have larval SA from GETSPI and larval SA from the
 ACQUIRE depending that GETSPI is in opposite direction of ACQUIRE.
 Calling __find_acq_core() should ensure we don't create duplicate 
 larval SAs. Also, should IKE negotiations return error, larval SAs
 should expire. They also should be removed when we do the
 xfrm_state_add() and xfrm_state_update() to add the new SAs.
  

Joy


This patch is against linux-2.6.21-rc4-git5

Signed-off-by: Joy Latten [EMAIL PROTECTED]


diff -urpN linux-2.6.20.orig/net/xfrm/xfrm_state.c linux-2.6.20/net/xfrm/xfrm_state.c
--- linux-2.6.20.orig/net/xfrm/xfrm_state.c	2007-03-20 22:39:15.0 -0500
+++ linux-2.6.20/net/xfrm/xfrm_state.c	2007-03-23 16:38:37.0 -0500
@@ -692,12 +692,15 @@ void xfrm_state_insert(struct xfrm_state
 }
 EXPORT_SYMBOL(xfrm_state_insert);
 
+static struct xfrm_state *create_larval_sa(unsigned short family, u8 mode, u32 reqid, u8 proto, xfrm_address_t *daddr, xfrm_address_t *saddr);
+
 /* xfrm_state_lock is held */
 static struct xfrm_state *__find_acq_core(unsigned short family, u8 mode, u32 reqid, u8 proto, xfrm_address_t *daddr, xfrm_address_t *saddr, int create)
 {
unsigned int h = xfrm_dst_hash(daddr, saddr, reqid, family);
struct hlist_node *entry;
-   struct xfrm_state *x;
+   struct xfrm_state *x, *x1;
+   int track_opposite = 0;
 
hlist_for_each_entry(x, entry, xfrm_state_bydst+h, bydst) {
 		if (x->props.reqid  != reqid ||
@@ -710,11 +713,20 @@ static struct xfrm_state *__find_acq_cor
 
switch (family) {
case AF_INET:
+   if (x-id.daddr.a4 == saddr-a4 
+   x-props.saddr.a4 == daddr-a4)
+   track_opposite = 1;
if (x-id.daddr.a4!= daddr-a4 ||
x-props.saddr.a4 != saddr-a4)
continue;
break;
case AF_INET6:
+   if (ipv6_addr_equal((struct in6_addr *)x-id.daddr.a6,
+(struct in6_addr *)saddr) ||
+   ipv6_addr_equal((struct in6_addr *)
+x-props.saddr.a6,
+(struct in6_addr *)daddr))
+   track_opposite = 1;
if (!ipv6_addr_equal((struct in6_addr *)x-id.daddr.a6,
 (struct in6_addr *)daddr) ||
!ipv6_addr_equal((struct in6_addr *)
@@ -731,6 +743,27 @@ static struct xfrm_state *__find_acq_cor
if (!create)
return NULL;
 
+   x = create_larval_sa(family, mode, reqid, proto, daddr, saddr);
+   
+   /* create a larval 

Re: [PATCH]: SAD sometimes has double SAs.

2007-03-23 Thread David Miller
From: Joy Latten [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 16:58:20 -0600

 Last Friday I proposed creating larval SAs to act as
 placeholders to prevent a second acquire resulting in 
 double SAs being created. 
 I tried this and so far I have not seen any double SAs
 being created. I also plan to run some stress tests over 
 the weekend.
 
 Please let me know what improvements I can make to this patch or
 if there is a better way to do this.

I'll take a look at your patch after I deal with some ipv6
locking bugs, thanks Joy.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[TG3 1/3]: Eliminate the unused TG3_FLAG_SPLIT_MODE flag.

2007-03-23 Thread Michael Chan
[TG3]: Eliminate the unused TG3_FLAG_SPLIT_MODE flag.

This flag to support multiple PCIX split completions was never used
because of hardware bugs.  This will make room for a new flag.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 8c8f9f4..ab87bb1 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -6321,8 +6321,6 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
  RDMAC_MODE_ADDROFLOW_ENAB | RDMAC_MODE_FIFOOFLOW_ENAB |
  RDMAC_MODE_FIFOURUN_ENAB | RDMAC_MODE_FIFOOREAD_ENAB |
  RDMAC_MODE_LNGREAD_ENAB);
-	if (tp->tg3_flags & TG3_FLAG_SPLIT_MODE)
-		rdmac_mode |= RDMAC_MODE_SPLIT_ENABLE;
 
/* If statement applies to 5705 and 5750 PCI devices only */
 	if ((GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5705 &&
@@ -6495,9 +6493,6 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
 	} else if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5704) {
 		val &= ~(PCIX_CAPS_SPLIT_MASK | PCIX_CAPS_BURST_MASK);
 		val |= (PCIX_CAPS_MAX_BURST_CPIOB <<
 			PCIX_CAPS_BURST_SHIFT);
-		if (tp->tg3_flags & TG3_FLAG_SPLIT_MODE)
-			val |= (tp->split_mode_max_reqs <<
-				PCIX_CAPS_SPLIT_SHIFT);
}
tw32(TG3PCI_X_CAPS, val);
}
@@ -10863,14 +10858,6 @@ static int __devinit tg3_get_invariants(struct tg3 *tp)
grc_misc_cfg = tr32(GRC_MISC_CFG);
 	grc_misc_cfg &= GRC_MISC_CFG_BOARD_ID_MASK;
 
-   /* Broadcom's driver says that CIOBE multisplit has a bug */
-#if 0
-	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5704 &&
-	    grc_misc_cfg == GRC_MISC_CFG_BOARD_ID_5704CIOBE) {
-		tp->tg3_flags |= TG3_FLAG_SPLIT_MODE;
-		tp->split_mode_max_reqs = SPLIT_MODE_5704_MAX_REQ;
-	}
-#endif
	if (GET_ASIC_REV(tp->pci_chip_rev_id) == ASIC_REV_5705 &&
(grc_misc_cfg == GRC_MISC_CFG_BOARD_ID_5788 ||
 grc_misc_cfg == GRC_MISC_CFG_BOARD_ID_5788M))
@@ -11968,14 +11955,12 @@ static int __devinit tg3_init_one(struct pci_dev 
*pdev,
   i == 5 ? '\n' : ':');
 
	printk(KERN_INFO "%s: RXcsums[%d] LinkChgREG[%d] "
-	       "MIirq[%d] ASF[%d] Split[%d] WireSpeed[%d] "
-	       "TSOcap[%d] \n",
+	       "MIirq[%d] ASF[%d] WireSpeed[%d] TSOcap[%d]\n",
	       dev->name,
	       (tp->tg3_flags & TG3_FLAG_RX_CHECKSUMS) != 0,
	       (tp->tg3_flags & TG3_FLAG_USE_LINKCHG_REG) != 0,
	       (tp->tg3_flags & TG3_FLAG_USE_MI_INTERRUPT) != 0,
	       (tp->tg3_flags & TG3_FLAG_ENABLE_ASF) != 0,
-	       (tp->tg3_flags & TG3_FLAG_SPLIT_MODE) != 0,
	       (tp->tg3_flags2 & TG3_FLG2_NO_ETH_WIRE_SPEED) == 0,
	       (tp->tg3_flags2 & TG3_FLG2_TSO_CAPABLE) != 0);
	printk(KERN_INFO "%s: dma_rwctrl[%08x] dma_mask[%d-bit]\n",
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index 086892d..5df8f76 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -2223,7 +2223,6 @@ struct tg3 {
 #define TG3_FLAG_40BIT_DMA_BUG 0x0800
 #define TG3_FLAG_BROKEN_CHECKSUMS  0x1000
 #define TG3_FLAG_GOT_SERDES_FLOWCTL0x2000
-#define TG3_FLAG_SPLIT_MODE0x4000
 #define TG3_FLAG_INIT_COMPLETE 0x8000
u32 tg3_flags2;
 #define TG3_FLG2_RESTART_TIMER 0x0001
@@ -2262,9 +2261,6 @@ struct tg3 {
 #define TG3_FLG2_NO_FWARE_REPORTED 0x4000
 #define TG3_FLG2_PHY_ADJUST_TRIM   0x8000
 
-   u32 split_mode_max_reqs;
-#define SPLIT_MODE_5704_MAX_REQ3
-
struct timer_list   timer;
u16 timer_counter;
u16 timer_multiplier;


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[TG3 2/3]: Exit irq handler during chip reset.

2007-03-23 Thread Michael Chan
[TG3]: Exit irq handler during chip reset.

On most tg3 chips, the memory enable bit in the PCI command register
gets cleared during chip reset and must be restored before accessing
PCI registers using memory cycles.  The chip does not generate
interrupt during chip reset, but the irq handler can still be called
because of irq sharing or irqpoll.  Reading a register in the irq
handler can cause a master abort in this scenario and may result in a
crash on some architectures.

Use the TG3_FLAG_CHIP_RESETTING flag to tell the irq handler to exit
without touching any registers.  The checking of the flag is in the
slow path of the irq handler and will not affect normal performance.
The msi handler is not shared and therefore does not require checking
the flag.
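
As a rough illustration of the pattern (a minimal sketch with made-up names -
my_dev, MY_FLAG_RESETTING - not the actual tg3 code):

#include <linux/interrupt.h>

struct my_dev {
	unsigned long	flags;		/* MY_FLAG_RESETTING lives here */
	void __iomem	*regs;
};
#define MY_FLAG_RESETTING	0x00000001

static irqreturn_t my_interrupt(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;

	/* While the chip is in reset, PCI memory space may be disabled,
	 * so reading any register could master-abort.  Decline the
	 * interrupt and let the other driver sharing the line handle it. */
	if (dev->flags & MY_FLAG_RESETTING)
		return IRQ_NONE;

	/* Safe to touch registers from here on: read status, ack the
	 * interrupt and schedule the NAPI poll as usual. */
	return IRQ_HANDLED;
}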

Thanks to Bernhard Walle [EMAIL PROTECTED] for reporting the problem.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index ab87bb1..9aca100 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -3568,32 +3568,34 @@ static irqreturn_t tg3_interrupt(int irq, void *dev_id)
 * Reading the PCI State register will confirm whether the
 * interrupt is ours and will flush the status block.
 */
-	if ((sblk->status & SD_STATUS_UPDATED) ||
-	    !(tr32(TG3PCI_PCISTATE) & PCISTATE_INT_NOT_ACTIVE)) {
-		/*
-		 * Writing any value to intr-mbox-0 clears PCI INTA# and
-		 * chip-internal interrupt pending events.
-		 * Writing non-zero to intr-mbox-0 additional tells the
-		 * NIC to stop sending us irqs, engaging in-intr-handler
-		 * event coalescing.
-		 */
-		tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
-			     0x00000001);
-		if (tg3_irq_sync(tp))
+	if (unlikely(!(sblk->status & SD_STATUS_UPDATED))) {
+		if ((tp->tg3_flags & TG3_FLAG_CHIP_RESETTING) ||
+		    (tr32(TG3PCI_PCISTATE) & PCISTATE_INT_NOT_ACTIVE)) {
+			handled = 0;
			goto out;
-		sblk->status &= ~SD_STATUS_UPDATED;
-		if (likely(tg3_has_work(tp))) {
-			prefetch(&tp->rx_rcb[tp->rx_rcb_ptr]);
-			netif_rx_schedule(dev);	/* schedule NAPI poll */
-		} else {
-			/* No work, shared interrupt perhaps?  re-enable
-			 * interrupts, and flush that PCI write
-			 */
-			tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
-				       0x00000000);
		}
-	} else {	/* shared interrupt */
-		handled = 0;
+	}
+
+	/*
+	 * Writing any value to intr-mbox-0 clears PCI INTA# and
+	 * chip-internal interrupt pending events.
+	 * Writing non-zero to intr-mbox-0 additional tells the
+	 * NIC to stop sending us irqs, engaging in-intr-handler
+	 * event coalescing.
+	 */
+	tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000001);
+	if (tg3_irq_sync(tp))
+		goto out;
+	sblk->status &= ~SD_STATUS_UPDATED;
+	if (likely(tg3_has_work(tp))) {
+		prefetch(&tp->rx_rcb[tp->rx_rcb_ptr]);
+		netif_rx_schedule(dev);	/* schedule NAPI poll */
+	} else {
+		/* No work, shared interrupt perhaps?  re-enable
+		 * interrupts, and flush that PCI write
+		 */
+		tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
+			       0x00000000);
	}
 out:
return IRQ_RETVAL(handled);
@@ -3611,31 +3613,33 @@ static irqreturn_t tg3_interrupt_tagged(int irq, void 
*dev_id)
 * Reading the PCI State register will confirm whether the
 * interrupt is ours and will flush the status block.
 */
-	if ((sblk->status_tag != tp->last_tag) ||
-	    !(tr32(TG3PCI_PCISTATE) & PCISTATE_INT_NOT_ACTIVE)) {
-		/*
-		 * writing any value to intr-mbox-0 clears PCI INTA# and
-		 * chip-internal interrupt pending events.
-		 * writing non-zero to intr-mbox-0 additional tells the
-		 * NIC to stop sending us irqs, engaging in-intr-handler
-		 * event coalescing.
-		 */
-		tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW,
-			     0x00000001);
-		if (tg3_irq_sync(tp))
+	if (unlikely(sblk->status_tag == tp->last_tag)) {
+		if ((tp->tg3_flags & TG3_FLAG_CHIP_RESETTING) ||
+		    (tr32(TG3PCI_PCISTATE) & PCISTATE_INT_NOT_ACTIVE)) {
+			handled = 0;
			goto out;
-		if (netif_rx_schedule_prep(dev)) {
-			prefetch(&tp->rx_rcb[tp->rx_rcb_ptr]);
-   /* Update last_tag to mark 

[TG3 3/3]: Update version and reldate.

2007-03-23 Thread Michael Chan
[TG3]: Update version and reldate.

Update version to 3.75.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 9aca100..e682f90 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -64,8 +64,8 @@
 
 #define DRV_MODULE_NAMEtg3
 #define PFX DRV_MODULE_NAME: 
-#define DRV_MODULE_VERSION 3.74
-#define DRV_MODULE_RELDATE February 20, 2007
+#define DRV_MODULE_VERSION 3.75
+#define DRV_MODULE_RELDATE March 23, 2007
 
 #define TG3_DEF_MAC_MODE   0
 #define TG3_DEF_RX_MODE0


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-23 Thread Samuel Ortiz
On Fri, Mar 23, 2007 at 01:14:43PM +0100, Guennadi Liakhovetski wrote:
 On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
 
  On Wed, 21 Mar 2007, Samuel Ortiz wrote:
 
  I'm quite sure the leak is in the IrDA code rather than in the ppp or
  ipv4 one, hence the need for full irda debug...
 
 Well, looks like you were wrong, Samuel. 
Heh, it's good to be wrong sometimes :-)

 Below is a patch that fixes ONE 
 sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
 there are more there.
Are you still seeing the skb cache growing with your fix ?

Cheers,
Samuel.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] [NET] MTU discovery changes

2007-03-23 Thread John Heffner
These are a few changes to fix/clean up some of the MTU discovery 
processing with non-stream sockets, and add a probing mode.  See also 
matching patches to tracepath to take advantage of this.


  -John
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] [NET] Do pmtu check in transport layer

2007-03-23 Thread John Heffner
Do the pmtu check at the transport layer (for UDP, ICMP and raw), and
send a local error if the socket is PMTUDISC_DO and the packet is too big.
This is actually a pure bugfix for ipv6.  For ipv4, it allows us to do pmtu
checks in the same way as for ipv6.
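
For illustration only (not part of the patch), the user-visible effect on
ipv4 can be sketched with a small program; the destination address and the
1500 byte MTU are assumptions, and the send is expected to fail locally with
EMSGSIZE once the check above is in place:

/* Hypothetical demo: send a UDP datagram bigger than the assumed path MTU
 * with DF forced; expect a local "Message too long" error. */
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int on = IP_PMTUDISC_DO;
	char buf[4000];				/* > 1500 byte MTU */
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(9) };

	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);	/* TEST-NET */
	setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &on, sizeof(on));
	memset(buf, 0, sizeof(buf));
	if (sendto(fd, buf, sizeof(buf), 0,
		   (struct sockaddr *)&dst, sizeof(dst)) < 0)
		perror("sendto");		/* expect: Message too long */
	return 0;
}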

Signed-off-by: John Heffner [EMAIL PROTECTED]
---
 net/ipv4/ip_output.c  |4 +++-
 net/ipv4/raw.c|8 +---
 net/ipv6/ip6_output.c |   11 ++-
 net/ipv6/raw.c|7 +--
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d096332..593acf7 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk,
	fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;

-	if (inet->cork.length + length > 0xFFFF - fragheaderlen) {
+	if (inet->cork.length + length > 0xFFFF - fragheaderlen ||
+	    (inet->pmtudisc >= IP_PMTUDISC_DO &&
+	     inet->cork.length + length > mtu)) {
		ip_local_error(sk, EMSGSIZE, rt->rt_dst, inet->dport,
			       mtu-exthdrlen);
return -EMSGSIZE;
}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 87e9c16..f252f4e 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -271,10 +271,12 @@ static int raw_send_hdrinc(struct sock *sk, void *from, 
size_t length,
struct iphdr *iph;
struct sk_buff *skb;
int err;
+   int mtu;
 
-	if (length > rt->u.dst.dev->mtu) {
-		ip_local_error(sk, EMSGSIZE, rt->rt_dst, inet->dport,
-			       rt->u.dst.dev->mtu);
+	mtu = inet->pmtudisc == IP_PMTUDISC_DO ? dst_mtu(&rt->u.dst) :
+						 rt->u.dst.dev->mtu;
+	if (length > mtu) {
+		ip_local_error(sk, EMSGSIZE, rt->rt_dst, inet->dport, mtu);
return -EMSGSIZE;
}
if (flagsMSG_PROBE)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3055169..711dfc3 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1044,11 +1044,12 @@ int ip6_append_data(struct sock *sk, int getfrag(void 
*from, char *to,
	fragheaderlen = sizeof(struct ipv6hdr) + rt->u.dst.nfheader_len +
			(opt ? opt->opt_nflen : 0);
	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
		     sizeof(struct frag_hdr);
 
-	if (mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) {
-		if (inet->cork.length + length > sizeof(struct ipv6hdr) +
-		    IPV6_MAXPLEN - fragheaderlen) {
-			ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen);
-			return -EMSGSIZE;
-		}
+	if ((mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN &&
+	     inet->cork.length + length > sizeof(struct ipv6hdr) + IPV6_MAXPLEN
+	     - fragheaderlen) ||
+	    (np->pmtudisc >= IPV6_PMTUDISC_DO &&
+	     inet->cork.length + length > mtu)) {
+		ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen);
+		return -EMSGSIZE;
	}
 
/*
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 306d5d8..75db277 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -556,9 +556,12 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, 
int length,
struct sk_buff *skb;
unsigned int hh_len;
int err;
+   int mtu;
 
-	if (length > rt->u.dst.dev->mtu) {
-		ipv6_local_error(sk, EMSGSIZE, fl, rt->u.dst.dev->mtu);
+	mtu = np->pmtudisc == IPV6_PMTUDISC_DO ? dst_mtu(&rt->u.dst) :
+						 rt->u.dst.dev->mtu;
+	if (length > mtu) {
+		ipv6_local_error(sk, EMSGSIZE, fl, mtu);
return -EMSGSIZE;
}
if (flagsMSG_PROBE)
-- 
1.5.0.2.gc260-dirty

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] [NET] Move DF check to ip_forward

2007-03-23 Thread John Heffner
Do the fragmentation check in ip_forward, similar to ipv6 forwarding.  Also
add a debug printk to the DF check in ip_fragment, since we should now never
reach it.

Signed-off-by: John Heffner [EMAIL PROTECTED]
---
 net/ipv4/ip_forward.c |8 
 net/ipv4/ip_output.c  |2 ++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 369e721..0efb1f5 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -85,6 +85,14 @@ int ip_forward(struct sk_buff *skb)
if (opt-is_strictroute  rt-rt_dst != rt-rt_gateway)
goto sr_failed;
 
+	if (unlikely(skb->len > dst_mtu(&rt->u.dst) &&
+		     (skb->nh.iph->frag_off & htons(IP_DF))) && !skb->local_df) {
+		IP_INC_STATS(IPSTATS_MIB_FRAGFAILS);
+		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+			  htonl(dst_mtu(&rt->u.dst)));
+		goto drop;
+	}
+
/* We are about to mangle packet. Copy it! */
if (skb_cow(skb, LL_RESERVED_SPACE(rt-u.dst.dev)+rt-u.dst.header_len))
goto drop;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 593acf7..90bdd53 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -433,6 +433,8 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct 
sk_buff*))
iph = skb-nh.iph;
 
	if (unlikely((iph->frag_off & htons(IP_DF)) && !skb->local_df)) {
+		if (net_ratelimit())
+			printk(KERN_DEBUG "ip_fragment: requested fragment of "
+			       "packet with DF set\n");
		IP_INC_STATS(IPSTATS_MIB_FRAGFAILS);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(dst_mtu(&rt->u.dst)));
-- 
1.5.0.2.gc260-dirty

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] [NET] Add IP(V6)_PMTUDISC_PROBE

2007-03-23 Thread John Heffner
Add IP(V6)_PMTUDISC_PROBE value for IP(V6)_MTU_DISCOVER.  This option forces
us not to fragment, but does not make use of the kernel path MTU discovery. 
That is, it allows for user-mode MTU probing (or, packetization-layer path
MTU discovery).  This is particularly useful for diagnostic utilities, like
traceroute/tracepath.
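
In other words (a simplified sketch of the intent, not the exact patch code
below): DF is still set on outgoing packets, but the size limit used for the
local check comes from the device rather than from the cached path MTU:

/* Simplified sketch; the real patch threads this choice through
 * ip_append_data()/raw_send_hdrinc() and their ipv6 counterparts. */
static unsigned int effective_mtu(struct inet_sock *inet, struct rtable *rt)
{
	if (inet->pmtudisc == IP_PMTUDISC_PROBE)
		return rt->u.dst.dev->mtu;	/* ignore learned path MTU */
	return dst_mtu(&rt->u.dst);		/* honour cached path MTU */
}

With that in place tracepath can probe with DF set and see where the path MTU
actually drops, instead of being clamped by whatever the route cache already
learned.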

Signed-off-by: John Heffner [EMAIL PROTECTED]
---
 include/linux/in.h   |1 +
 include/linux/in6.h  |1 +
 include/linux/skbuff.h   |3 ++-
 include/net/ip.h |2 +-
 net/core/skbuff.c|2 ++
 net/ipv4/ip_output.c |   14 ++
 net/ipv4/ip_sockglue.c   |2 +-
 net/ipv4/raw.c   |3 +++
 net/ipv6/ip6_output.c|   12 
 net/ipv6/ipv6_sockglue.c |2 +-
 net/ipv6/raw.c   |3 +++
 11 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/in.h b/include/linux/in.h
index 1912e7c..2dc1f8a 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -83,6 +83,7 @@ struct in_addr {
 #define IP_PMTUDISC_DONT   0   /* Never send DF frames */
 #define IP_PMTUDISC_WANT   1   /* Use per route hints  */
 #define IP_PMTUDISC_DO 2   /* Always DF*/
+#define IP_PMTUDISC_PROBE  3   /* Ignore dst pmtu  */
 
 #define IP_MULTICAST_IF32
 #define IP_MULTICAST_TTL   33
diff --git a/include/linux/in6.h b/include/linux/in6.h
index 4e8350a..d559fac 100644
--- a/include/linux/in6.h
+++ b/include/linux/in6.h
@@ -179,6 +179,7 @@ struct in6_flowlabel_req
 #define IPV6_PMTUDISC_DONT 0
 #define IPV6_PMTUDISC_WANT 1
 #define IPV6_PMTUDISC_DO   2
+#define IPV6_PMTUDISC_PROBE3
 
 /* Flowlabel */
 #define IPV6_FLOWLABEL_MGR 32
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..64038b4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -284,7 +284,8 @@ struct sk_buff {
nfctinfo:3;
__u8pkt_type:3,
fclone:2,
-   ipvs_property:1;
+   ipvs_property:1,
+   ign_dst_mtu;
__be16  protocol;
 
void(*destructor)(struct sk_buff *skb);
diff --git a/include/net/ip.h b/include/net/ip.h
index e79c3e3..f5874a3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -201,7 +201,7 @@ int ip_decrease_ttl(struct iphdr *iph)
 static inline
 int ip_dont_fragment(struct sock *sk, struct dst_entry *dst)
 {
-	return (inet_sk(sk)->pmtudisc == IP_PMTUDISC_DO ||
+	return (inet_sk(sk)->pmtudisc >= IP_PMTUDISC_DO ||
		(inet_sk(sk)->pmtudisc == IP_PMTUDISC_WANT &&
		 !(dst_metric(dst, RTAX_LOCK)&(1<<RTAX_MTU))));
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 702fa8f..5c8515c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -474,6 +474,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t 
gfp_mask)
 #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
C(ipvs_property);
 #endif
+   C(ign_dst_mtu);
C(protocol);
n-destructor = NULL;
C(mark);
@@ -549,6 +550,7 @@ static void copy_skb_header(struct sk_buff *new, const 
struct sk_buff *old)
 #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
new-ipvs_property = old-ipvs_property;
 #endif
+   new-ign_dst_mtu= old-ign_dst_mtu;
 #ifdef CONFIG_BRIDGE_NETFILTER
new-nf_bridge  = old-nf_bridge;
nf_bridge_get(old-nf_bridge);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 90bdd53..a7e8944 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -201,7 +201,8 @@ static inline int ip_finish_output(struct sk_buff *skb)
return dst_output(skb);
}
 #endif
-	if (skb->len > dst_mtu(skb->dst) && !skb_is_gso(skb))
+	if (skb->len > dst_mtu(skb->dst) &&
+	    !skb->ign_dst_mtu && !skb_is_gso(skb))
		return ip_fragment(skb, ip_finish_output2);
	else
		return ip_finish_output2(skb);
@@ -801,7 +802,9 @@ int ip_append_data(struct sock *sk,
		inet->cork.addr = ipc->addr;
	}
	dst_hold(&rt->u.dst);
-	inet->cork.fragsize = mtu = dst_mtu(rt->u.dst.path);
+	inet->cork.fragsize = mtu = inet->pmtudisc == IP_PMTUDISC_PROBE ?
+				    rt->u.dst.dev->mtu :
+				    dst_mtu(rt->u.dst.path);
	inet->cork.rt = rt;
	inet->cork.length = 0;
sk-sk_sndmsg_page = NULL;
@@ -1220,13 +1223,16 @@ int ip_push_pending_frames(struct sock *sk)
 * to fragment the frame generated here. No matter, what transforms
 * how transforms change size of the packet, it will come out.
 */
-   if 

[PATCH 0/2] [iputils] MTU discovery changes

2007-03-23 Thread John Heffner
These add some changes that make tracepath a little more useful for 
diagnosing MTU issues.  The length flag helps distinguish between MTU 
black holes and other types of black holes by allowing you to vary the 
probe packet lengths.  Using PMTUDISC_PROBE gives you the same results 
on each run without having to flush the route cache, so you can see 
where MTU changes in the path actually occur.


Whether the PMTUDISC_PROBE patch goes in should be conditional on whether the
corresponding kernel patch (just sent) goes in.


  -John
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] [iputils] Use PMTUDISC_PROBE mode if it exists.

2007-03-23 Thread John Heffner
Signed-off-by: John Heffner [EMAIL PROTECTED]
---
 tracepath.c  |   10 --
 tracepath6.c |   10 --
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tracepath.c b/tracepath.c
index 1f901ba..a562d88 100644
--- a/tracepath.c
+++ b/tracepath.c
@@ -24,6 +24,10 @@
 #include sys/uio.h
 #include arpa/inet.h
 
+#ifndef IP_PMTUDISC_PROBE
+#define IP_PMTUDISC_PROBE  3
+#endif
+
 struct hhistory
 {
int hops;
@@ -322,8 +326,10 @@ main(int argc, char **argv)
}
memcpy(target.sin_addr, he-h_addr, 4);
 
-	on = IP_PMTUDISC_DO;
-	if (setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &on, sizeof(on))) {
+	on = IP_PMTUDISC_PROBE;
+	if (setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &on, sizeof(on)) &&
+	    (on = IP_PMTUDISC_DO,
+	     setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &on, sizeof(on)))) {
		perror("IP_MTU_DISCOVER");
		exit(1);
}
diff --git a/tracepath6.c b/tracepath6.c
index d65230d..6f13a51 100644
--- a/tracepath6.c
+++ b/tracepath6.c
@@ -30,6 +30,10 @@
 #define SOL_IPV6 IPPROTO_IPV6
 #endif
 
+#ifndef IPV6_PMTUDISC_PROBE
+#define IPV6_PMTUDISC_PROBE3
+#endif
+
 int overhead = 48;
 int mtu = 128000;
 int hops_to = -1;
@@ -369,8 +373,10 @@ int main(int argc, char **argv)
mapped = 1;
}
 
-	on = IPV6_PMTUDISC_DO;
-	if (setsockopt(fd, SOL_IPV6, IPV6_MTU_DISCOVER, &on, sizeof(on))) {
+	on = IPV6_PMTUDISC_PROBE;
+	if (setsockopt(fd, SOL_IPV6, IPV6_MTU_DISCOVER, &on, sizeof(on)) &&
+	    (on = IPV6_PMTUDISC_DO,
+	     setsockopt(fd, SOL_IPV6, IPV6_MTU_DISCOVER, &on, sizeof(on)))) {
		perror("IPV6_MTU_DISCOVER");
		exit(1);
}
-- 
1.5.0.2.gc260-dirty

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ehea: removing unused functionality

2007-03-23 Thread Jan-Bernd Themann
This patch includes:
- removal of unused fields in structs
- ethtool statistics cleanup
- removal of unused functionality from the send path

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
---


This patch applies on top of the netdev upstream branch for 2.6.22


 drivers/net/ehea/ehea.h |   25 ++---
 drivers/net/ehea/ehea_ethtool.c |  111 ++--
 drivers/net/ehea/ehea_main.c|   55 +++
 drivers/net/ehea/ehea_qmr.h |2
 4 files changed, 69 insertions(+), 124 deletions(-)


diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index f889933..1405d0b 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@ #include asm/abs_addr.h
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0054
+#define DRV_VERSIONEHEA_0055
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
@@ -79,7 +79,6 @@ #define EHEA_RQ2_PKT_SIZE   1522
 #define EHEA_L_PKT_SIZE 256/* low latency */
 
 /* Send completion signaling */
-#define EHEA_SIG_IV_LONG   1
 
 /* Protection Domain Identifier */
 #define EHEA_PD_ID0xaabcdeff
@@ -106,11 +105,7 @@ #define EHEA_BCMC_VLANID_SINGLE0x00
 #define EHEA_CACHE_LINE  128
 
 /* Memory Regions */
-#define EHEA_MR_MAX_TX_PAGES   20
-#define EHEA_MR_TX_DATA_PN  3
 #define EHEA_MR_ACC_CTRL   0x0080
-#define EHEA_RWQES_PER_MR_RQ2  10
-#define EHEA_RWQES_PER_MR_RQ3  10
 
 #define EHEA_WATCH_DOG_TIMEOUT 10*HZ
 
@@ -318,17 +313,12 @@ struct ehea_mr {
 /*
  * Port state information
  */
-struct port_state {
-   int poll_max_processed;
+struct port_stats {
int poll_receive_errors;
-   int ehea_poll;
int queue_stopped;
-   int min_swqe_avail;
-   u64 sqc_stop_sum;
-   int pkt_send;
-   int pkt_xmit;
-   int send_tasklet;
-   int nwqe;
+   int err_tcp_cksum;
+   int err_ip_cksum;
+   int err_frame_crc;
 };
 
 #define EHEA_IRQ_NAME_SIZE 20
@@ -347,6 +337,7 @@ struct ehea_q_skb_arr {
  * Port resources
  */
 struct ehea_port_res {
+   struct port_stats p_stats;
struct ehea_mr send_mr; /* send memory region */
struct ehea_mr recv_mr; /* receive memory region */
spinlock_t xmit_lock;
@@ -358,7 +349,6 @@ struct ehea_port_res {
struct ehea_cq *recv_cq;
struct ehea_eq *eq;
struct net_device *d_netdev;
-   spinlock_t send_lock;
struct ehea_q_skb_arr rq1_skba;
struct ehea_q_skb_arr rq2_skba;
struct ehea_q_skb_arr rq3_skba;
@@ -368,11 +358,8 @@ struct ehea_port_res {
int swqe_refill_th;
atomic_t swqe_avail;
int swqe_ll_count;
-   int swqe_count;
u32 swqe_id_counter;
u64 tx_packets;
-   spinlock_t recv_lock;
-   struct port_state p_state;
u64 rx_packets;
u32 poll_counter;
 };
diff --git a/drivers/net/ehea/ehea_ethtool.c b/drivers/net/ehea/ehea_ethtool.c
index 9f57c2e..170aff3 100644
--- a/drivers/net/ehea/ehea_ethtool.c
+++ b/drivers/net/ehea/ehea_ethtool.c
@@ -166,33 +166,23 @@ static u32 ehea_get_rx_csum(struct net_d
 }
 
 static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = {
-   {poll_max_processed},
-   {queue_stopped},
-   {min_swqe_avail},
-   {poll_receive_err},
-   {pkt_send},
-   {pkt_xmit},
-   {send_tasklet},
-   {ehea_poll},
-   {nwqe},
-   {swqe_available_0},
{sig_comp_iv},
{swqe_refill_th},
{port resets},
-   {rxo},
-   {rx64},
-   {rx65},
-   {rx128},
-   {rx256},
-   {rx512},
-   {rx1024},
-   {txo},
-   {tx64},
-   {tx65},
-   {tx128},
-   {tx256},
-   {tx512},
-   {tx1024},
+   {Receive errors},
+   {TCP cksum errors},
+   {IP cksum errors},
+   {Frame cksum errors},
+   {num SQ stopped},
+   {SQ stopped},
+   {PR0 free_swqes},
+   {PR1 free_swqes},
+   {PR2 free_swqes},
+   {PR3 free_swqes},
+   {PR4 free_swqes},
+   {PR5 free_swqes},
+   {PR6 free_swqes},
+   {PR7 free_swqes},
 };
 
 static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -211,63 +201,44 @@ static int ehea_get_stats_count(struct n
 static void ehea_get_ethtool_stats(struct net_device *dev,
 struct ethtool_stats *stats, u64 *data)
 {
-   u64 hret;
-   int i;
+   int i, k, tmp;
struct ehea_port *port = netdev_priv(dev);
-   struct ehea_adapter *adapter = port-adapter;
-   struct ehea_port_res *pr = port-port_res[0];
-   struct port_state *p_state = pr-p_state;
-   struct hcp_ehea_port_cb6 *cb6;
 
for (i = 0; i  ehea_get_stats_count(dev); i++)
data[i] = 0;
-
i = 0;
 
-   data[i++] = p_state-poll_max_processed;
-   data[i++] = p_state-queue_stopped;
-   

Re: [git patches] net driver fixes

2007-03-23 Thread Guennadi Liakhovetski
Jeff, might be worth getting the sk_buff leak fix in ppp from 
http://www.spinics.net/lists/netdev/msg27706.html in 2.6.21 too?

Don't know how important it is for stable. It was present in 2.6.18 too.

Thanks
Guennadi
---
Guennadi Liakhovetski
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[5/5] 2.6.21-rc4: known regressions (v2)

2007-03-23 Thread Adrian Bunk
This email lists some known regressions in Linus' tree compared to 2.6.20.

If you find your name in the Cc header, you are either the submitter of one
of the bugs, the maintainer of an affected subsystem or driver, a patch of
yours caused a breakage, or I'm considering you in some other way possibly
involved with one or more of these issues.

Due to the huge amount of recipients, please trim the Cc when answering.


Subject: Oops when changing DVB-T adapter
References : http://lkml.org/lkml/2007/3/9/212
Submitter  : CIJOML [EMAIL PROTECTED]
Status : unknown


Subject: USB: iPod doesn't work
References : http://lkml.org/lkml/2007/3/21/320
Submitter  : Tino Keitel [EMAIL PROTECTED]
Handled-By : Oliver Neukum [EMAIL PROTECTED]
Status : problem is being debuggged


Subject: snd_intel8x0: divide error: 
References : http://lkml.org/lkml/2007/3/5/252
Submitter  : Michal Piotrowski [EMAIL PROTECTED]
Handled-By : Takashi Iwai [EMAIL PROTECTED]
Status : problem is being debugged


Subject: forcedeth: skb_over_panic
References : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Submitter  : Albert Hopkins [EMAIL PROTECTED]
Handled-By : Ayaz Abdulla [EMAIL PROTECTED]
Patch  : http://bugzilla.kernel.org/show_bug.cgi?id=8058
Status : patch available


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Thu, 22 Mar 2007 23:03:04 +0100

 David Miller wrote:
  From: Nikolaos D. Bougalis [EMAIL PROTECTED]
  Date: Thu, 22 Mar 2007 12:44:09 -0700
  
  People _have_ had problems. _I_ have had problems. And when
  someone with a few thousand drones under his control hoses your
  servers because he can do math and he leaves you with 2-item
  long chains, _you_ will have problems.
  
  No need to further argue this point, the people that matter
  (ie. me :-) understand it, don't worry..
 
 Yes, I recall having one big server hit two years ago by an attack on tcp 
 hash 
 function. David sent me the patch to use jhash. It's performing well :)
 
 Welcome to the club :)

Ok, how about we put something like the following into 2.6.21?

I'm not looking for the hash perfectionist analysis, please bug the
heck off if that's what your reply is going to be about, don't bother
hitting the reply button I will ignore you.

I want to hear instead if this makes attackability markedly _HARDER_
than what we have now and I am sure beyond a shadow of a doubt that it
does.

The secret is initialized when the first ehash-using socket is created,
that's not perfect (bug off!) but it's better than doing the
initialization in inet_init() or {tcp,dccp}_init() as we'll have a
chance of at least some entropy when that first such socket is
created.  We definitely can't do it for the first AF_INET socket
creation, because icmp creates a bunch of SOCK_RAW inet sockets at
init time which would defeat the whole purpose of deferring this. :)

Thanks.

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index c28e424..668056b 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -19,6 +19,9 @@
#include <linux/in6.h>
#include <linux/ipv6.h>
#include <linux/types.h>
+#include <linux/jhash.h>
+
+#include <net/inet_sock.h>

#include <net/ipv6.h>
 
@@ -28,12 +31,11 @@ struct inet_hashinfo;
 static inline unsigned int inet6_ehashfn(const struct in6_addr *laddr, const 
u16 lport,
const struct in6_addr *faddr, const __be16 
fport)
 {
-   unsigned int hashent = (lport ^ (__force u16)fport);
+   u32 ports = (lport ^ (__force u16)fport);
 
-	hashent ^= (__force u32)(laddr->s6_addr32[3] ^ faddr->s6_addr32[3]);
-	hashent ^= hashent >> 16;
-	hashent ^= hashent >> 8;
-	return hashent;
+	return jhash_3words((__force u32)laddr->s6_addr32[3],
+			    (__force u32)faddr->s6_addr32[3],
+			    ports, inet_ehash_secret);
 }
 
 static inline int inet6_sk_ehashfn(const struct sock *sk)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ce6da97..62daf21 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -19,6 +19,7 @@
 
#include <linux/string.h>
#include <linux/types.h>
+#include <linux/jhash.h>

#include <net/flow.h>
#include <net/sock.h>
@@ -167,13 +168,15 @@ static inline void inet_sk_copy_descendant(struct sock 
*sk_to,
 
 extern int inet_sk_rebuild_header(struct sock *sk);
 
+extern u32 inet_ehash_secret;
+extern void build_ehash_secret(void);
+
 static inline unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
const __be32 faddr, const __be16 fport)
 {
-	unsigned int h = ((__force __u32)laddr ^ lport) ^
			 ((__force __u32)faddr ^ (__force __u32)fport);
-	h ^= h >> 16;
-	h ^= h >> 8;
-	return h;
+	return jhash_2words((__force __u32) laddr ^ (__force __u32) faddr,
+			    ((__u32) lport) << 16 | (__force __u32)fport,
+			    inet_ehash_secret);
 }
 
 static inline int inet_sk_ehashfn(const struct sock *sk)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cf358c8..308318a 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -87,6 +87,7 @@
#include <linux/init.h>
#include <linux/poll.h>
#include <linux/netfilter_ipv4.h>
+#include <linux/random.h>

#include <asm/uaccess.h>
#include <asm/system.h>
@@ -217,6 +218,16 @@ out:
return err;
 }
 
+u32 inet_ehash_secret;
+EXPORT_SYMBOL(inet_ehash_secret);
+
+void build_ehash_secret(void)
+{
+   while (!inet_ehash_secret)
+		get_random_bytes(&inet_ehash_secret, 4);
+}
+EXPORT_SYMBOL(build_ehash_secret);
+
 /*
  * Create an inet socket.
  */
@@ -233,6 +244,11 @@ static int inet_create(struct socket *sock, int protocol)
int try_loading_module = 0;
int err;
 
+	if (sock->type != SOCK_RAW &&
+	    sock->type != SOCK_DGRAM &&
+	    !inet_ehash_secret)
+   build_ehash_secret();
+
sock-state = SS_UNCONNECTED;
 
/* Look for the requested type/protocol pair. */
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 5cac14a..0de723f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -98,6 +98,11 @@ static int inet6_create(struct socket *sock, int protocol)
int try_loading_module 

Re: L2TP support?

2007-03-23 Thread Jorge Boncompte [DTI2]
- Original Message - 
From: James Chapman [EMAIL PROTECTED]

To: Ingo Oeser [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
Sent: Thursday, March 22, 2007 9:13 PM
Subject: Re: L2TP support?


Yes there is. There's a pppd plugin which comes with the openl2tp project, 
http://sf.net/projects/openl2tp. OpenL2TP supports both LAC and LNS 
operation. A patch is also available to allow this driver to be used with 
another L2TP implementation, l2tpd.





   Well, I am using a modified rp-l2tp with the pppol2tp kernel module 
myself so now it accounts for three implementations I guess :-)


   -Jorge

==
Jorge Boncompte - Ingenieria y Gestion de RED
DTI2 - Desarrollo de la Tecnologia de las Comunicaciones
--
C/ Abogado Enriquez Barrios, 5   14004 CORDOBA (SPAIN)
Tlf: +34 957 761395 / FAX: +34 957 450380
==
- Sin pistachos no hay Rock  Roll...
- Without wicker a basket cannot be made.
==


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Thu, Mar 22, 2007 at 01:53:03PM -0700, Nikolaos D. Bougalis ([EMAIL 
PROTECTED]) wrote:
 Grrr, I think I pointed several times already, that properly distributed
 values do not change distribution after folding. And it can be seen in
 all tests (and in that you pointed too).
 
Yes, I agree that the folding will not be a problem _IF_ the values are 
 properly distributed -- although in that case, the folding is unnecessary. 
 But that the Jenkins distribution didn't change (according to posts you 
 made) after folding says that the output of Jenkins is pretty good to begin 
 with ;)

In _some_ cases, but not in all.
 
  We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) 
  though.
  
 Please explain to me how jhash_2words solves the issue that you 
   claim
  jhash_3words has, when they both use the same underlying bit-mixer?
  
  $c value is not properly distributed and significanly breaks overall
  distribution. Attacker, which controls $c (and it does it by 
  controlling
  ports), can significantly increase selected hash chains.
 
Even if we assume that $c is not properly distributed, using a secret 
 cookie and mixing operations from different algebraic groups changes the 
 calculus dramatically. It's no longer straight-forward for the attacker to 
 generate collisions (as it is with the current function) because the '$c' 
 supplied by the attacker is used in conjunction with the secret cookie 
 before __jhash_mix thoroughly mixes the inputs to generate a hash.

With XOR hash attacker can predict end result easily, with jenkins it
can not (easily), but jenkins distribution itself (even for usual data) 
results in too long chains - there are two problems:
1. easily predicted result
2. broken distribution

Xor hash has problems with first one, Jenkins (in some cases) with
second.
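
To make the first problem concrete, here is a minimal user-space sketch
(illustrative only, not from the thread) of how someone controlling many
source addresses can pick, for each address, a source port so that every
connection lands in one bucket of the unkeyed XOR hash; with a secret-keyed
jhash this precomputation is no longer possible without knowing the secret.
The addresses below are made up:

#include <stdint.h>
#include <stdio.h>

/* The old unkeyed XOR ehash, folded into a 65536-bucket table. */
static unsigned int xor_ehash(uint32_t laddr, uint16_t lport,
			      uint32_t faddr, uint16_t fport)
{
	unsigned int h = (laddr ^ lport) ^ (faddr ^ fport);
	h ^= h >> 16;
	h ^= h >> 8;
	return h & 0xffff;
}

int main(void)
{
	uint32_t server = 0xc0000201;		/* 192.0.2.1, port 80 */
	uint32_t net = 0x0a000000;		/* attacker-controlled 10.0.0.0/16 */
	unsigned int i, bucket0 = 0, same = 0;

	for (i = 0; i < 65536; i++) {
		uint32_t faddr = net | i;	/* 10.0.0.0 .. 10.0.255.255 */
		uint16_t fport = (uint16_t)i;	/* chosen so faddr ^ fport is constant */
		unsigned int b = xor_ehash(server, 80, faddr, fport);

		if (i == 0)
			bucket0 = b;
		if (b == bucket0)
			same++;
	}
	printf("%u of 65536 connections hash to bucket %u\n", same, bucket0);
	return 0;
}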
 
 I've tested the Jenkins hash extensively. I see no evidence of this
  improper distribution that you describe. In fact, about the only 
  person
  that I've seen advocate this in the archives of netdev is you, and a lot 
  of
  other very smart people disagree with you, so I consider myself to be in
  good company.
 
 Hmm, I ran tests to select a proper hash for the netchannel implementation
 (actually the same as sockets) and showed the Jenkins hash problems - it is
 enough to have only one problem to state that there is a problem, isn't
 it?
 
Again, from what I've seen from your other posts, I don't believe you've 
 identified any inherent problems with the Jenkins hash.
 
But that aside for a moment, surely you will agree that the ability of 
 an attacker with a few dozen machines under his control to trivially mount 
 an algorithmic complexity attack causing serious performance drops is also 
 a problem with the current code and one that must be addressed.

Please refer to the two problems above - the Jenkins hash does not have the
problem of an easily predicted end result, instead it has a distribution
problem.  Which means the attacker does not need to guess hash chains, it
only needs to provide specially crafted input and the distribution will be
shifted towards longer chains.
 
 I will try to decipher phrase 'whatever it is, it's not there'...
 
It meant that I saw nothing particularly interesting running the example 
 you suggested and looking at the output.
 
 
 This thread for example:
 http://marc.info/?t=11705761351r=1w=2
 
I went through most of this thread. I don't see an analysis of the 
 Jenkins. Am I missing something?

There is no full analysis, I just posted results I found when selecting a
hash for different projects with a background similar to sockets.
 
 One your test shows thare are no problems, try that one I propose, which
 can be even created in userspace - you do not want even to get into
 account what I try to say to you.
 
I'm not trying to be obnoxious on purpose here, but I don't see the test 
 that you are referring to. Could you be more specific?

http://marc.info/?l=linux-netdevm=117199140430104q=p5

-n
-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Eric Dumazet

David Miller wrote:

From: Eric Dumazet [EMAIL PROTECTED]



Welcome to the club :)


Ok, how about we put something like the following into 2.6.21?


2.6.21 really ?

Just to be clear: I had an attack two years ago, I applied your patch,
rebooted the machine, and since then the attackers have had to find another
way to hurt the machine. Eventually, when I updated the kernel of this
machine, I forgot to apply the jhash patch, and the attackers don't know they
can try again :)


I dont consider this new hash as bug fix at all, ie your patch might enter 
2.6.22 normal dev cycle.


Maybe a *fix*, independent of the hash function (so that no math expert can
insult us), would be to have a *limit*, say... 1000 (something insane), on the
length of a hash chain ?
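
A minimal sketch of that idea (illustrative only, plain user-space C rather
than a real patch; the 1000 cutoff is the "something insane" above):

#include <stdio.h>

struct node { struct node *next; };

#define CHAIN_LIMIT 1000

/* Walk the bucket before inserting; past the limit the caller can drop
 * the new entry, or just warn and account the event. */
static int chain_insert(struct node **head, struct node *n)
{
	struct node *p;
	unsigned int len = 0;

	for (p = *head; p != NULL; p = p->next)
		if (++len >= CHAIN_LIMIT)
			return -1;
	n->next = *head;
	*head = n;
	return 0;
}

int main(void)
{
	struct node *bucket = NULL, a, b;

	if (chain_insert(&bucket, &a) == 0 && chain_insert(&bucket, &b) == 0)
		printf("both inserted, chain length 2\n");
	return 0;
}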


In my case, I saw lengths of about 3000 two years ago under attack, but 
machine was still usable... maybe in half power mode.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Thu, Mar 22, 2007 at 01:58:34PM -0700, David Miller ([EMAIL PROTECTED]) 
wrote:
 From: Nikolaos D. Bougalis [EMAIL PROTECTED]
 Date: Thu, 22 Mar 2007 12:44:09 -0700
 
  People _have_ had problems. _I_ have had problems. And when
  someone with a few thousand drones under his control hoses your
  servers because he can do math and he leaves you with 2-item
  long chains, _you_ will have problems.
 
 No need to further argue this point, the people that matter
 (ie. me :-) understand it, don't worry..

Call me a loooser whose mail will be deleted on arrival, but...

jhash_2words(const, const, ((const << 16) | $sport) ^ $random)

where $sport is 1-65535 in a loop, and $random is pseudo-random number
obtained on start.

Which is exactly the case of web server and attacker connects to 80 port
from the same IP address and different source ports.

Result with jenkins:
1 23880
2 12108
3 4040
4 1019
5 200
6 30
7 8
8 1

Xor:
1 65536
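
The test is easy to reproduce in user space; a rough sketch (illustrative,
with the mixer copied in spirit from the 2.6-era linux/jhash.h - treat that as
an assumption - and $random fixed at start) that prints the same kind of
chain-length histogram:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define JHASH_GOLDEN_RATIO 0x9e3779b9

#define __jhash_mix(a, b, c) \
{ \
	a -= b; a -= c; a ^= (c >> 13); \
	b -= c; b -= a; b ^= (a << 8);  \
	c -= a; c -= b; c ^= (b >> 13); \
	a -= b; a -= c; a ^= (c >> 12); \
	b -= c; b -= a; b ^= (a << 16); \
	c -= a; c -= b; c ^= (b >> 5);  \
	a -= b; a -= c; a ^= (c >> 3);  \
	b -= c; b -= a; b ^= (a << 10); \
	c -= a; c -= b; c ^= (b >> 15); \
}

static uint32_t jhash_3words(uint32_t a, uint32_t b, uint32_t c, uint32_t initval)
{
	a += JHASH_GOLDEN_RATIO;
	b += JHASH_GOLDEN_RATIO;
	c += initval;
	__jhash_mix(a, b, c);
	return c;
}

int main(void)
{
	enum { HSIZE = 65536 };
	static unsigned int chain[HSIZE], hist[32];
	uint32_t rnd = (uint32_t)time(NULL);		/* stand-in for the secret */
	uint32_t laddr = 0xc0000201, faddr = 0xc0000202; /* fixed addresses */
	unsigned int sport, i;

	for (sport = 1; sport < 65536; sport++) {
		uint32_t key = ((80u << 16) | sport) ^ rnd;
		/* jhash_2words(a, b, initval) is jhash_3words(a, b, 0, initval) */
		chain[jhash_3words(laddr, faddr, 0, key) & (HSIZE - 1)]++;
	}
	for (i = 0; i < HSIZE; i++)
		if (chain[i] < 32)
			hist[chain[i]]++;
	for (i = 1; i < 32; i++)
		if (hist[i])
			printf("%u %u\n", i, hist[i]);
	return 0;
}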


Please, do not apply patch as is, I will devote this day to find where
jenkins has problems and try to fix distribution. If I will fail, then
it is up to you to decide that above results are bad or good.

Thank you.

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Eric Dumazet

Evgeniy Polyakov wrote:

Call me a loooser which mail will be deleted on arrival, but...

jhash_2words(const, const, ((const  16) | $sport) ^ $random)

where $sport is 1-65535 in a loop, and $random is pseudo-random number
obtained on start.

Which is exactly the case of web server and attacker connects to 80 port
from the same IP address and different source ports.

Result with jenkins:
1 23880
2 12108
3 4040
4 1019
5 200
6 30
7 8
8 1

Xor:
1 65536


So what ? You still think hash function must be bijective ? Come on !

You have a machine somewhere that allows 65536 concurrent connections coming 
from the same IP address ?


The last problem you have is the nature of tcp hash function.

Dont argue again with your pseudo science.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Fri, Mar 23, 2007 at 09:17:19AM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
 You have a machine somewhere that allows 65536 concurrent connections 
 coming from the same IP address ?

Attached png file of botnet scenario:
1000 addresses from the same network (class B for example),
each one creates 1024 connections to the same static port.

Eric, I agree, that XOR hash is not perfect, and it should be changed,
but not blindly.

I perfectly know that hash function is not bijective, but it must have
good distribution. 
Function like this
int hash(u32 saddr, u16 sport, u32 daddr, u16 dport, u32 rand)
{
	return rand ^ ((((saddr ^ daddr) << 16) ^ (dport ^ sport)) << 8);
}

has even worse _distribution_, although you can not predict its end
result due to random value, and attacker will not try to do it.

-- 
Evgeniy Polyakov


jhash_botnet.png
Description: PNG image


Re: RFC: Established connections hash function

2007-03-23 Thread Evgeniy Polyakov
On Fri, Mar 23, 2007 at 11:33:32AM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) 
wrote:
 Eric, I agree, that XOR hash is not perfect, and it should be changed,
 but not blindly.

Attached is a case of how broken the xor can be in the botnet scenario.


-- 
Evgeniy Polyakov


jhash_good.png
Description: PNG image


Re: [PATCH 4/5] netem: avoid excessive requeues

2007-03-23 Thread Patrick McHardy
David Miller wrote:
 From: Patrick McHardy [EMAIL PROTECTED]
 Date: Thu, 22 Mar 2007 21:40:43 +0100
 
Perhaps we should put this in qdisc_restart, other qdiscs have the
same problem.
 
 
 Agreed, patches welcome :)


I've tried this, but for some reason it makes TBF stay about
5% under the configured rate. Probably because of late timers,
the strange thing is that the 5% happen constantly even with
very low rates.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


XOR hash beauty solved [Was: RFC: Established connections hash function]

2007-03-23 Thread Evgeniy Polyakov
 Please, do not apply patch as is, I will devote this day to find where
 jenkins has problems and try to fix distribution. If I will fail, then
 it is up to you to decide that above results are bad or good.

I need to admit that I was partially wrong in my analysis of the Jenkins
hash distribution - it does _not_ have problems or artifacts of any kind.
The waves found in the tests are the result of folding into the hash_size
boundary; the distribution inside the F(32) field is uniform.
The XOR hash does not have such a problem, because it uses (u32 ^ u16) as one
round, which results in a uniform distribution inside F(16) (it is not
correct to call that distribution uniform as is, but only taking into account
that the u16 values used in the tests were uniformly distributed), which does
not suffer from hash_size boundary folding. Since the XOR hash has 3 rounds,
only one of them (the xor of the final u32 values) will suffer from folding,
but the tests where this could be detected for sure use constant addresses,
so the problem hides again.

So, briefly saying, jhash_2/3words have safe distribution, but have
higher-number of elements waves as a result of folding which is
unavoidable for general-purpose hash.
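
As a back-of-the-envelope check (not from the original mail): throwing
n = 65535 random keys into m = 65536 buckets should give a Poisson profile,
roughly m * e^-1 / k! buckets holding exactly k entries - about 24109, 12055,
4018, 1005, 201, 33 and 5 buckets with 1, 2, 3, 4, 5, 6 and 7 entries
respectively.  That is essentially the jhash histogram quoted earlier in the
thread, i.e. the "waves" are exactly what an ideal uniform hash is expected
to produce after folding.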

Now my conscience is calm :)

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-23 Thread Guennadi Liakhovetski

On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:


On Wed, 21 Mar 2007, Samuel Ortiz wrote:


I'm quite sure the leak is in the IrDA code rather than in the ppp or
ipv4 one, hence the need for full irda debug...


Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE 
sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
there are more there.


Thanks
Guennadi
-
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

Don't leak an sk_buff on interface destruction.

Signed-off-by: G. Liakhovetski [EMAIL PROTECTED]

--- a/drivers/net/ppp_generic.c 2007-03-23 13:04:04.0 +0100
+++ b/drivers/net/ppp_generic.c 2007-03-23 13:05:29.0 +0100
@@ -2544,6 +2544,9 @@
ppp-active_filter = NULL;
 #endif /* CONFIG_PPP_FILTER */

+   if (ppp-xmit_pending)
+   kfree_skb(ppp-xmit_pending);
+
kfree(ppp);
 }

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC: Established connections hash function

2007-03-23 Thread Nikolaos D. Bougalis
   Let me start off by saying that I hope I didn't come across as 
condescending in my previous posts. If I did, then it wasn't intended. Now, 
on to more important things :)




jhash_2words(const, const, ((const << 16) | $sport) ^ $random)

where $sport is 1-65535 in a loop, and $random is pseudo-random number
obtained on start.


   If you are correct that jhash_3words doesn't properly distribute the 
bits in 'c' (which I don't believe you are, but let's assume it for a 
second) then this function will also be broken:  jhash_2words calls 
jhash_3words; jhash_3words adds (a linear operation) initval and c before 
calling __jhash_mix. So, if there is a problem with passing values under the 
direct control of the attacker into 'c' both jhash_2words and jhash_3words 
are affected; in other words, this variant would also be flawed.
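
For reference (quoting the 2.6-era linux/jhash.h from memory, so treat this as
an approximation rather than gospel), the two helpers really are the same
mixer, which is why a flaw in how 'c' is handled would hit both:

/* Approximate 2.6-era definitions; __jhash_mix is the Jenkins round. */
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
	a += JHASH_GOLDEN_RATIO;
	b += JHASH_GOLDEN_RATIO;
	c += initval;
	__jhash_mix(a, b, c);
	return c;
}

static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
{
	return jhash_3words(a, b, 0, initval);
}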




Which is exactly the case of web server and attacker connects to 80 port
from the same IP address and different source ports.

Result with jenkins:
1 23880
2 12108
3 4040
4 1019
5 200
6 30
7 8
8 1

Xor:
1 65536


   I believe that the XOR results, if generated by your test above, are 
somewhat meaningless because you're feeding what is ideal input into the XOR 
hash. Which means that you'll get a perfect distribution. With your input, 
one might as well suggest that using the remote port will give a perfect 
distribution, and it will, but only for that specific input.


   Just for kicks, I went to one of our servers, and did netstat -n | grep 
ESTABLISHED and ended up with 31072 distinct ip:port/ip:port 4-tuples which 
I then hashed into a 65536 bucket table. Here are the results; feel free to 
draw your own conclusions:


[ I think this should come out looking good; sorry if whitespace is screwy ]

+---+-------+-------+-------+-------+
|   |  xor  | j2w 1 | j2w 2 | j3w 1 |
+---+-------+-------+-------+-------+
| 0 | 40868 | 40930 | 40767 | 40750 |
| 1 | 19208 | 19119 | 19382 | 19413 |
| 2 |  4636 |  4618 |  4576 |  4554 |
| 3 |   716 |   769 |   715 |   734 |
| 4 |    99 |    91 |    87 |    76 |
| 5 |     7 |     8 |     9 |     9 |
| 6 |     1 |     1 |     0 |     0 |
| 7 |     1 |     0 |     0 |     0 |
| 8 |     0 |     0 |     0 |     0 |
+---+-------+-------+-------+-------+

xor: the vanilla linux function
j2w 1 is my variant: jhash_2words(laddr + rport, raddr + lport, seed)
j2w 2 is your variant: jhash_2words(laddr, raddr, ((rport << 16) ^ lport) ^ seed)

j3w: jhash_3words(laddr, raddr, (rport << 12) + lport, seed)

   The seed used for all the Jenkins hashes came from the low-order 32-bits 
returned by RDTSC, executed when the program started. It remained constant 
throughout the run. 8 runs were made, to ensure that the seed wasn't 
causing weirdness, all runs giving almost identical results. The Jenkins 
hashes did not use the extra 2 right-shifts to fold high-order bits into the 
low-order bits, that is employed by the XOR hash.


   -n 



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix use of uninitialized field in mv643xx_eth

2007-03-23 Thread Gabriel Paubert
In this driver, the default ethernet address is first set by calling
eth_port_uc_addr_get() which reads the relevant registers of the 
corresponding port as initially set by firmware. However that function 
used the port_num field accessed through the private area of net_dev 
before it was set.  

The result was that one board I have ended up with the unicast address 
set to 00:00:00:00:00:00 (only port 1 is connected on this board). The
problem appeared after commit 84dd619e4dc3b0b1c40dafd98c90fd950bce7bc5.

This patch fixes the bug by making eth_port_uc_addr_get() more similar
to eth_port_uc_addr_set(), i.e., by using the port number as the first
parameter instead of a pointer to struct net_device.

Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]

--

The minimal patch I first tried consisted in just moving the assignment of
mp->port_num to before the call to eth_port_uc_addr_get(). The other question
is why the driver never gets the info from the device tree on this PPC board,
but that's for another list despite the fact I lost some time looking 
for bugs in the OF interface before stumbling on this use of a field
before it was initialized.


diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 1ee27c3..ca459e0 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -51,7 +51,7 @@
 #include mv643xx_eth.h
 
 /* Static function declarations */
-static void eth_port_uc_addr_get(struct net_device *dev,
+static void eth_port_uc_addr_get(unsigned int port_num, 
unsigned char *MacAddr);
 static void eth_port_set_multicast_list(struct net_device *);
 static void mv643xx_eth_port_enable_tx(unsigned int port_num,
@@ -1382,7 +1382,7 @@ static int mv643xx_eth_probe(struct platform_device *pdev)
port_num = pd-port_number;
 
/* set default config values */
-   eth_port_uc_addr_get(dev, dev-dev_addr);
+   eth_port_uc_addr_get(port_num, dev-dev_addr);
mp-rx_ring_size = MV643XX_ETH_PORT_DEFAULT_RECEIVE_QUEUE_SIZE;
mp-tx_ring_size = MV643XX_ETH_PORT_DEFAULT_TRANSMIT_QUEUE_SIZE;
 
@@ -1883,14 +1883,13 @@ static void eth_port_uc_addr_set(unsigned int 
eth_port_num,
  * N/A.
  *
  */
-static void eth_port_uc_addr_get(struct net_device *dev, unsigned char *p_addr)
+static void eth_port_uc_addr_get(unsigned int port_num, unsigned char *p_addr)
 {
-   struct mv643xx_private *mp = netdev_priv(dev);
unsigned int mac_h;
unsigned int mac_l;
 
-	mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(mp->port_num));
-	mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(mp->port_num));
+	mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(port_num));
+	mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(port_num));

	p_addr[0] = (mac_h >> 24) & 0xff;
	p_addr[1] = (mac_h >> 16) & 0xff;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: XOR hash beauty solved [Was: RFC: Established connections hash function]

2007-03-23 Thread Nikolaos D. Bougalis

So, briefly saying, jhash_2/3words have safe distribution, but have
higher-number of elements waves as a result of folding which is
unavoidable for general-purpose hash.


   Thanks for the analysis. 


   -n


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks

2007-03-23 Thread jamal
On Thu, 2007-22-03 at 12:36 +0100, Patrick McHardy wrote:
 jamal wrote:
  On Wed, 2007-21-03 at 15:04 +0100, Patrick McHardy wrote:

 We can remove qdisc_tree_lock since with this patch all changes
 and all tree walking happen under the RTNL. 
 We still need to keep dev-queue_lock for the data path.
 

ok, that would work. Should have been obvious to me.

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[NET_SCHED 00/11]: pkt_sched.h cleanup + misc changes

2007-03-23 Thread Patrick McHardy
These patches fix an off-by-one in netem, clean up pkt_sched.h by removing
most of the now unnecessary PSCHED time macros and turning the two remaining
ones into inline functions, consolidate some common filter destruction code
and move the TCQ_F_THROTTLED optimization from netem to qdisc_restart.

Please apply, thanks.


 include/net/pkt_sched.h   |   24 +++---
 include/net/red.h |   10 +++---
 include/net/sch_generic.h |   10 +-
 net/sched/act_police.c|   17 --
 net/sched/sch_api.c   |   20 ++--
 net/sched/sch_atm.c   |   17 +-
 net/sched/sch_cbq.c   |   76 ++
 net/sched/sch_dsmark.c|8 
 net/sched/sch_generic.c   |4 ++
 net/sched/sch_hfsc.c  |   23 +++--
 net/sched/sch_htb.c   |   24 --
 net/sched/sch_ingress.c   |7 
 net/sched/sch_netem.c |   24 +-
 net/sched/sch_prio.c  |7 
 net/sched/sch_tbf.c   |9 ++---
 15 files changed, 110 insertions(+), 170 deletions(-)

Patrick McHardy (11):
  [NET_SCHED]: sch_netem: fix off-by-one in send time comparison
  [NET_SCHED]: kill PSCHED_AUDIT_TDIFF
  [NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2
  [NET_SCHED]: kill PSCHED_TLESS
  [NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT
  [NET_SCHED]: kill PSCHED_TDIFF
  [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function
  [NET_SCHED]: turn PSCHED_GET_TIME into inline function
  [NET_SCHED]: Uninline tcf_destroy
  [NET_SCHED]: qdisc: remove unnecessary memory barriers
  [NET_SCHED]: qdisc: avoid dequeue while throttled
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[NET_SCHED 01/11]: sch_netem: fix off-by-one in send time comparison

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: sch_netem: fix off-by-one in send time comparison

netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether it is
allowed to send a packet, which is equivalent to cb->time_to_send < now.
Use !PSCHED_TLESS(now, cb->time_to_send) instead to properly handle
cb->time_to_send == now.
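
In other words (a small worked example, assuming PSCHED_TLESS(a, b) simply
means a < b):

	/* old: send only if  PSCHED_TLESS(time_to_send, now)  i.e.  time_to_send <  now
	 * new: send if      !PSCHED_TLESS(now, time_to_send)  i.e.  !(now < time_to_send)
	 *                                                      ==   time_to_send <= now
	 * so a packet whose due time equals the current time is no longer
	 * delayed until the next timer tick.
	 */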

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 9f8c07452088f432c79ac3a8d87d6adebcce57df
tree 42b214f74b8b2d5bd3065e9f63d8048beb4f3bdc
parent 3231f075945001667eafaf325abab8c992b3d1e4
author Patrick McHardy [EMAIL PROTECTED] Thu, 22 Mar 2007 23:57:32 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:26 +0100

 net/sched/sch_netem.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 3e1b633..bc42843 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -286,7 +286,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
/* if more time remaining? */
PSCHED_GET_TIME(now);
 
-	if (PSCHED_TLESS(cb->time_to_send, now)) {
+	if (!PSCHED_TLESS(now, cb->time_to_send)) {
pr_debug(netem_dequeue: return skb=%p\n, skb);
sch-q.qlen--;
return skb;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[NET_SCHED 02/11]: kill PSCHED_AUDIT_TDIFF

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: kill PSCHED_AUDIT_TDIFF

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 4a4a3d59dca71f202ab063b909c84c96c8ea09a7
tree 4958bfec571a3330bd023ebe50f7b071f6dc7dd7
parent 9f8c07452088f432c79ac3a8d87d6adebcce57df
author Patrick McHardy [EMAIL PROTECTED] Thu, 22 Mar 2007 23:58:12 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:27 +0100

 include/net/pkt_sched.h |1 -
 net/sched/sch_cbq.c |2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6555e57..276d1ad 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -59,7 +59,6 @@ typedef long  psched_tdiff_t;
 #define PSCHED_TADD(tv, delta) ((tv) += (delta))
 #define PSCHED_SET_PASTPERFECT(t)  ((t) = 0)
 #define PSCHED_IS_PASTPERFECT(t)   ((t) == 0)
-#definePSCHED_AUDIT_TDIFF(t)
 
 struct qdisc_watchdog {
struct hrtimer  timer;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index dcd9c31..57ac6c5 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -820,8 +820,6 @@ cbq_update(struct cbq_sched_data *q)
		idle -= L2T(&q->link, len);
		idle += L2T(cl, len);

-		PSCHED_AUDIT_TDIFF(idle);
-
		PSCHED_TADD2(q->now, idle, cl->undertime);
} else {
/* Underlimit */
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[NET_SCHED 05/11]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT

Use direct assignment and comparison instead.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit ec252ac5640ea38d3630cdb97333c398a75391b9
tree bce7b2c63ffb0694942484418f1adf08ed78292d
parent 4f8fc418f88c0b7ee6e726b05f27c42d8e20593c
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:01:32 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:29 +0100

 include/net/pkt_sched.h |3 +--
 include/net/red.h   |4 ++--
 net/sched/sch_cbq.c |   17 -
 net/sched/sch_netem.c   |2 +-
 4 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 49325ff..c40147a 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -54,8 +54,7 @@ typedef long  psched_tdiff_t;
 #define PSCHED_TDIFF(tv1, tv2) (long)((tv1) - (tv2))
 #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
min_t(long long, (tv1) - (tv2), bound)
-#define PSCHED_SET_PASTPERFECT(t)  ((t) = 0)
-#define PSCHED_IS_PASTPERFECT(t)   ((t) == 0)
+#define PSCHED_PASTPERFECT 0
 
 struct qdisc_watchdog {
struct hrtimer  timer;
diff --git a/include/net/red.h b/include/net/red.h
index a4eb379..d9e1149 100644
--- a/include/net/red.h
+++ b/include/net/red.h
@@ -151,7 +151,7 @@ static inline void red_set_parms(struct red_parms *p,
 
 static inline int red_is_idling(struct red_parms *p)
 {
-   return !PSCHED_IS_PASTPERFECT(p-qidlestart);
+   return p-qidlestart != PSCHED_PASTPERFECT;
 }
 
 static inline void red_start_of_idle_period(struct red_parms *p)
@@ -161,7 +161,7 @@ static inline void red_start_of_idle_period(struct 
red_parms *p)
 
 static inline void red_end_of_idle_period(struct red_parms *p)
 {
-   PSCHED_SET_PASTPERFECT(p-qidlestart);
+   p-qidlestart = PSCHED_PASTPERFECT;
 }
 
 static inline void red_restart(struct red_parms *p)
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 9e6cdab..2bb271b 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -738,7 +738,7 @@ cbq_update_toplevel(struct cbq_sched_data *q, struct 
cbq_class *cl,
if (cl  q-toplevel = borrowed-level) {
if (cl-q-q.qlen  1) {
do {
-   if (PSCHED_IS_PASTPERFECT(borrowed-undertime)) 
{
+   if (borrowed-undertime == PSCHED_PASTPERFECT) {
q-toplevel = borrowed-level;
return;
}
@@ -824,7 +824,7 @@ cbq_update(struct cbq_sched_data *q)
} else {
/* Underlimit */
 
-   PSCHED_SET_PASTPERFECT(cl-undertime);
+   cl-undertime = PSCHED_PASTPERFECT;
if (avgidle  cl-maxidle)
cl-avgidle = cl-maxidle;
else
@@ -845,7 +845,7 @@ cbq_under_limit(struct cbq_class *cl)
if (cl-tparent == NULL)
return cl;
 
-   if (PSCHED_IS_PASTPERFECT(cl-undertime) || q-now = cl-undertime) {
+   if (cl-undertime == PSCHED_PASTPERFECT || q-now = cl-undertime) {
cl-delayed = 0;
return cl;
}
@@ -868,8 +868,7 @@ cbq_under_limit(struct cbq_class *cl)
}
if (cl-level  q-toplevel)
return NULL;
-   } while (!PSCHED_IS_PASTPERFECT(cl-undertime) 
-q-now  cl-undertime);
+   } while (cl-undertime != PSCHED_PASTPERFECT  q-now  cl-undertime);
 
cl-delayed = 0;
return cl;
@@ -1054,11 +1053,11 @@ cbq_dequeue(struct Qdisc *sch)
*/
 
if (q-toplevel == TC_CBQ_MAXLEVEL 
-   PSCHED_IS_PASTPERFECT(q-link.undertime))
+   q-link.undertime == PSCHED_PASTPERFECT)
break;
 
q-toplevel = TC_CBQ_MAXLEVEL;
-   PSCHED_SET_PASTPERFECT(q-link.undertime);
+   q-link.undertime = PSCHED_PASTPERFECT;
}
 
/* No packets in scheduler or nobody wants to give them to us :-(
@@ -1289,7 +1288,7 @@ cbq_reset(struct Qdisc* sch)
qdisc_reset(cl-q);
 
cl-next_alive = NULL;
-   PSCHED_SET_PASTPERFECT(cl-undertime);
+   cl-undertime = PSCHED_PASTPERFECT;
cl-avgidle = cl-maxidle;
cl-deficit = cl-quantum;
cl-cpriority = cl-priority;
@@ -1650,7 +1649,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
cl-xstats.avgidle = cl-avgidle;
cl-xstats.undertime = 0;
 
-   if (!PSCHED_IS_PASTPERFECT(cl-undertime))
+   if (cl-undertime != PSCHED_PASTPERFECT)
cl-xstats.undertime = PSCHED_TDIFF(cl-undertime, 

[NET_SCHED 07/11]: turn PSCHED_TDIFF_SAFE into inline function

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function

Also rename to psched_tdiff_bounded.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit c86b236046f7de4094ceb2b2cb069c32969ee36c
tree 27c99a0d619bcabf384838adeae3c0469472b86b
parent d72d57707edf96c31e62da0841faf59c011dcd92
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:01:59 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:30 +0100

 include/net/pkt_sched.h |8 ++--
 include/net/red.h   |2 +-
 net/sched/act_police.c  |8 
 net/sched/sch_htb.c |4 ++--
 net/sched/sch_tbf.c |2 +-
 5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 1639737..e6b1da0 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -51,10 +51,14 @@ typedef longpsched_tdiff_t;
 #define PSCHED_GET_TIME(stamp) \
((stamp) = PSCHED_NS2US(ktime_to_ns(ktime_get(
 
-#define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
-   min_t(long long, (tv1) - (tv2), bound)
 #define PSCHED_PASTPERFECT 0
 
+static inline psched_tdiff_t
+psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound)
+{
+   return min(tv1 - tv2, bound);
+}
+
 struct qdisc_watchdog {
struct hrtimer  timer;
struct Qdisc*qdisc;
diff --git a/include/net/red.h b/include/net/red.h
index d9e1149..0bc1691 100644
--- a/include/net/red.h
+++ b/include/net/red.h
@@ -178,7 +178,7 @@ static inline unsigned long 
red_calc_qavg_from_idle_time(struct red_parms *p)
int  shift;
 
PSCHED_GET_TIME(now);
-   us_idle = PSCHED_TDIFF_SAFE(now, p-qidlestart, p-Scell_max);
+   us_idle = psched_tdiff_bounded(now, p-qidlestart, p-Scell_max);
 
/*
 * The problem: ideally, average length queue recalcultion should
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 0a5679e..65d60a3 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -298,8 +298,8 @@ static int tcf_act_police(struct sk_buff *skb, struct 
tc_action *a,
 
PSCHED_GET_TIME(now);
 
-   toks = PSCHED_TDIFF_SAFE(now, police-tcfp_t_c,
-police-tcfp_burst);
+   toks = psched_tdiff_bounded(now, police-tcfp_t_c,
+   police-tcfp_burst);
if (police-tcfp_P_tab) {
ptoks = toks + police-tcfp_ptoks;
if (ptoks  (long)L2T_P(police, police-tcfp_mtu))
@@ -544,8 +544,8 @@ int tcf_police(struct sk_buff *skb, struct tcf_police 
*police)
}
 
PSCHED_GET_TIME(now);
-   toks = PSCHED_TDIFF_SAFE(now, police-tcfp_t_c,
-police-tcfp_burst);
+   toks = psched_tdiff_bounded(now, police-tcfp_t_c,
+   police-tcfp_burst);
if (police-tcfp_P_tab) {
ptoks = toks + police-tcfp_ptoks;
if (ptoks  (long)L2T_P(police, police-tcfp_mtu))
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index d265ac4..f629ce2 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -729,7 +729,7 @@ static void htb_charge_class(struct htb_sched *q, struct 
htb_class *cl,
cl-T = toks
 
while (cl) {
-   diff = PSCHED_TDIFF_SAFE(q-now, cl-t_c, (u32) cl-mbuffer);
+   diff = psched_tdiff_bounded(q-now, cl-t_c, cl-mbuffer);
if (cl-level = level) {
if (cl-level == level)
cl-xstats.lends++;
@@ -789,7 +789,7 @@ static psched_time_t htb_do_events(struct htb_sched *q, int 
level)
return cl-pq_key;
 
htb_safe_rb_erase(p, q-wait_pq + level);
-   diff = PSCHED_TDIFF_SAFE(q-now, cl-t_c, (u32) cl-mbuffer);
+   diff = psched_tdiff_bounded(q-now, cl-t_c, cl-mbuffer);
htb_change_class_mode(q, cl, diff);
if (cl-cmode != HTB_CAN_SEND)
htb_add_to_wait_tree(q, cl, diff);
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 626ce96..da9f40e 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -201,7 +201,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc* sch)
 
PSCHED_GET_TIME(now);
 
-   toks = PSCHED_TDIFF_SAFE(now, q-t_c, q-buffer);
+   toks = psched_tdiff_bounded(now, q-t_c, q-buffer);
 
if (q-P_tab) {
ptoks = toks + q-ptokens;


[NET_SCHED 06/11]: kill PSCHED_TDIFF

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: kill PSCHED_TDIFF

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit d72d57707edf96c31e62da0841faf59c011dcd92
tree 8b6192c94e025fb8b6e1be3b02526d4792bd4fa1
parent ec252ac5640ea38d3630cdb97333c398a75391b9
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:01:47 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:29 +0100

 include/net/pkt_sched.h |1 -
 net/sched/sch_cbq.c |   14 +++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index c40147a..1639737 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -51,7 +51,6 @@ typedef long  psched_tdiff_t;
 #define PSCHED_GET_TIME(stamp) \
((stamp) = PSCHED_NS2US(ktime_to_ns(ktime_get(
 
-#define PSCHED_TDIFF(tv1, tv2) (long)((tv1) - (tv2))
 #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
min_t(long long, (tv1) - (tv2), bound)
 #define PSCHED_PASTPERFECT 0
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 2bb271b..f9e8403 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -386,7 +386,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct 
cbq_class *cl)
psched_tdiff_t incr;
 
PSCHED_GET_TIME(now);
-   incr = PSCHED_TDIFF(now, q-now_rt);
+   incr = now - q-now_rt;
now = q-now + incr;
 
do {
@@ -474,7 +474,7 @@ cbq_requeue(struct sk_buff *skb, struct Qdisc *sch)
 static void cbq_ovl_classic(struct cbq_class *cl)
 {
struct cbq_sched_data *q = qdisc_priv(cl-qdisc);
-   psched_tdiff_t delay = PSCHED_TDIFF(cl-undertime, q-now);
+   psched_tdiff_t delay = cl-undertime - q-now;
 
if (!cl-delayed) {
delay += cl-offtime;
@@ -509,7 +509,7 @@ static void cbq_ovl_classic(struct cbq_class *cl)
psched_tdiff_t base_delay = q-wd_expires;
 
for (b = cl-borrow; b; b = b-borrow) {
-   delay = PSCHED_TDIFF(b-undertime, q-now);
+   delay = b-undertime - q-now;
if (delay  base_delay) {
if (delay = 0)
delay = 1;
@@ -547,7 +547,7 @@ static void cbq_ovl_rclassic(struct cbq_class *cl)
 static void cbq_ovl_delay(struct cbq_class *cl)
 {
struct cbq_sched_data *q = qdisc_priv(cl-qdisc);
-   psched_tdiff_t delay = PSCHED_TDIFF(cl-undertime, q-now);
+   psched_tdiff_t delay = cl-undertime - q-now;
 
if (!cl-delayed) {
psched_time_t sched = q-now;
@@ -776,7 +776,7 @@ cbq_update(struct cbq_sched_data *q)
 idle = (now - last) - last_pktlen/rate
 */
 
-   idle = PSCHED_TDIFF(q-now, cl-last);
+   idle = q-now - cl-last;
if ((unsigned long)idle  128*1024*1024) {
avgidle = cl-maxidle;
} else {
@@ -1004,7 +1004,7 @@ cbq_dequeue(struct Qdisc *sch)
psched_tdiff_t incr;
 
PSCHED_GET_TIME(now);
-   incr = PSCHED_TDIFF(now, q-now_rt);
+   incr = now - q-now_rt;
 
if (q-tx_class) {
psched_tdiff_t incr2;
@@ -1650,7 +1650,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg,
cl-xstats.undertime = 0;
 
if (cl-undertime != PSCHED_PASTPERFECT)
-   cl-xstats.undertime = PSCHED_TDIFF(cl-undertime, q-now);
+   cl-xstats.undertime = cl-undertime - q-now;
 
if (gnet_stats_copy_basic(d, cl-bstats)  0 ||
 #ifdef CONFIG_NET_ESTIMATOR


[NET_SCHED 09/11]: Uninline tcf_destroy

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: Uninline tcf_destroy

Uninline tcf_destroy and add a helper function to destroy an entire filter
chain.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 8da4bcec7e54c8344c8fd77c72a61f24ce12cfc3
tree 7f36f4af8e9413637fb9b65501f281fd8a915da3
parent 231788aa3b9eef85b72ecac2e33441bd842ce3f4
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:31 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:31 +0100

 include/net/sch_generic.h |   10 ++
 net/sched/sch_api.c   |   18 ++
 net/sched/sch_atm.c   |   17 ++---
 net/sched/sch_cbq.c   |   14 ++
 net/sched/sch_dsmark.c|8 +---
 net/sched/sch_hfsc.c  |   13 +
 net/sched/sch_htb.c   |   14 ++
 net/sched/sch_ingress.c   |7 +--
 net/sched/sch_prio.c  |7 +--
 9 files changed, 30 insertions(+), 78 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a3f4ddd..1b8e351 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -177,14 +177,8 @@ extern void qdisc_tree_decrease_qlen(struct Qdisc *qdisc, 
unsigned int n);
 extern struct Qdisc *qdisc_alloc(struct net_device *dev, struct Qdisc_ops 
*ops);
 extern struct Qdisc *qdisc_create_dflt(struct net_device *dev,
   struct Qdisc_ops *ops, u32 parentid);
-
-static inline void
-tcf_destroy(struct tcf_proto *tp)
-{
-   tp-ops-destroy(tp);
-   module_put(tp-ops-owner);
-   kfree(tp);
-}
+extern void tcf_destroy(struct tcf_proto *tp);
+extern void tcf_destroy_chain(struct tcf_proto *fl);
 
 static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
   struct sk_buff_head *list)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 5873250..5b5bce0 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1220,6 +1220,24 @@ reclassify:
return -1;
 }
 
+void tcf_destroy(struct tcf_proto *tp)
+{
+   tp-ops-destroy(tp);
+   module_put(tp-ops-owner);
+   kfree(tp);
+}
+
+void tcf_destroy_chain(struct tcf_proto *fl)
+{
+   struct tcf_proto *tp;
+
+   while ((tp = fl) != NULL) {
+   fl = tp-next;
+   tcf_destroy(tp);
+   }
+}
+EXPORT_SYMBOL(tcf_destroy_chain);
+
 #ifdef CONFIG_PROC_FS
 static int psched_show(struct seq_file *seq, void *v)
 {
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index 0cc3c9b..be7d299 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -158,19 +158,6 @@ static unsigned long atm_tc_bind_filter(struct Qdisc *sch,
return atm_tc_get(sch,classid);
 }
 
-
-static void destroy_filters(struct atm_flow_data *flow)
-{
-   struct tcf_proto *filter;
-
-   while ((filter = flow-filter_list)) {
-   DPRINTK(destroy_filters: destroying filter %p\n,filter);
-   flow-filter_list = filter-next;
-   tcf_destroy(filter);
-   }
-}
-
-
 /*
  * atm_tc_put handles all destructions, including the ones that are explicitly
  * requested (atm_tc_destroy, etc.). The assumption here is that we never drop
@@ -195,7 +182,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
*prev = flow-next;
DPRINTK(atm_tc_put: qdisc %p\n,flow-q);
qdisc_destroy(flow-q);
-   destroy_filters(flow);
+   tcf_destroy_chain(flow-filter_list);
if (flow-sock) {
DPRINTK(atm_tc_put: f_count %d\n,
file_count(flow-sock-file));
@@ -611,7 +598,7 @@ static void atm_tc_destroy(struct Qdisc *sch)
DPRINTK(atm_tc_destroy(sch %p,[qdisc %p])\n,sch,p);
/* races ? */
while ((flow = p-flows)) {
-   destroy_filters(flow);
+   tcf_destroy_chain(flow-filter_list);
if (flow-ref  1)
printk(KERN_ERR atm_destroy: %p-ref = %d\n,flow,
flow-ref);
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 414a97c..a294542 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1717,23 +1717,13 @@ static unsigned long cbq_get(struct Qdisc *sch, u32 
classid)
return 0;
 }
 
-static void cbq_destroy_filters(struct cbq_class *cl)
-{
-   struct tcf_proto *tp;
-
-   while ((tp = cl-filter_list) != NULL) {
-   cl-filter_list = tp-next;
-   tcf_destroy(tp);
-   }
-}
-
 static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
 {
struct cbq_sched_data *q = qdisc_priv(sch);
 
BUG_TRAP(!cl-filters);
 
-   cbq_destroy_filters(cl);
+   tcf_destroy_chain(cl-filter_list);
qdisc_destroy(cl-q);
qdisc_put_rtab(cl-R_tab);
 #ifdef CONFIG_NET_ESTIMATOR
@@ -1760,7 +1750,7 @@ cbq_destroy(struct Qdisc* sch)
 */
for (h = 0; h  16; h++)
for (cl = q-classes[h]; cl; cl = cl-next)
-   

[NET_SCHED 03/11]: kill PSCHED_TADD/PSCHED_TADD2

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 145a1a6010c6b852ffab28c110d8911a6161aa8b
tree 84b7bf284ea3b870a9b5fd9dae3adaad9979dc26
parent 4a4a3d59dca71f202ab063b909c84c96c8ea09a7
author Patrick McHardy [EMAIL PROTECTED] Thu, 22 Mar 2007 23:58:42 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:27 +0100

 include/net/pkt_sched.h |2 --
 net/sched/sch_cbq.c |   12 ++--
 net/sched/sch_netem.c   |2 +-
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 276d1ad..32cdf01 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -55,8 +55,6 @@ typedef long  psched_tdiff_t;
 #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
min_t(long long, (tv1) - (tv2), bound)
 #define PSCHED_TLESS(tv1, tv2) ((tv1)  (tv2))
-#define PSCHED_TADD2(tv, delta, tv_res) ((tv_res) = (tv) + (delta))
-#define PSCHED_TADD(tv, delta) ((tv) += (delta))
 #define PSCHED_SET_PASTPERFECT(t)  ((t) = 0)
 #define PSCHED_IS_PASTPERFECT(t)   ((t) == 0)
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 57ac6c5..290b26b 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -387,7 +387,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct 
cbq_class *cl)
 
PSCHED_GET_TIME(now);
incr = PSCHED_TDIFF(now, q-now_rt);
-   PSCHED_TADD2(q-now, incr, now);
+   now = q-now + incr;
 
do {
if (PSCHED_TLESS(cl-undertime, now)) {
@@ -492,7 +492,7 @@ static void cbq_ovl_classic(struct cbq_class *cl)
cl-avgidle = cl-minidle;
if (delay = 0)
delay = 1;
-   PSCHED_TADD2(q-now, delay, cl-undertime);
+   cl-undertime = q-now + delay;
 
cl-xstats.overactions++;
cl-delayed = 1;
@@ -558,7 +558,7 @@ static void cbq_ovl_delay(struct cbq_class *cl)
delay -= (-cl-avgidle) - ((-cl-avgidle)  
cl-ewma_log);
if (cl-avgidle  cl-minidle)
cl-avgidle = cl-minidle;
-   PSCHED_TADD2(q-now, delay, cl-undertime);
+   cl-undertime = q-now + delay;
 
if (delay  0) {
sched += delay + cl-penalty;
@@ -820,7 +820,7 @@ cbq_update(struct cbq_sched_data *q)
idle -= L2T(q-link, len);
idle += L2T(cl, len);
 
-   PSCHED_TADD2(q-now, idle, cl-undertime);
+   cl-undertime = q-now + idle;
} else {
/* Underlimit */
 
@@ -1018,12 +1018,12 @@ cbq_dequeue(struct Qdisc *sch)
   cbq_time = max(real_time, work);
 */
incr2 = L2T(q-link, q-tx_len);
-   PSCHED_TADD(q-now, incr2);
+   q-now += incr2;
cbq_update(q);
if ((incr -= incr2)  0)
incr = 0;
}
-   PSCHED_TADD(q-now, incr);
+   q-now += incr;
q-now_rt = now;
 
for (;;) {
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index bc42843..6044ae7 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -218,7 +218,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc 
*sch)
  q-delay_cor, q-delay_dist);
 
PSCHED_GET_TIME(now);
-   PSCHED_TADD2(now, delay, cb-time_to_send);
+   cb-time_to_send = now + delay;
++q-counter;
ret = q-qdisc-enqueue(skb, q-qdisc);
} else {


[NET_SCHED 08/11]: turn PSCHED_GET_TIME into inline function

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: turn PSCHED_GET_TIME into inline function

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 231788aa3b9eef85b72ecac2e33441bd842ce3f4
tree f302e509ec32a86bc9a6c3712d188fc91455a213
parent c86b236046f7de4094ceb2b2cb069c32969ee36c
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:02:12 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:30 +0100

 include/net/pkt_sched.h |8 +---
 include/net/red.h   |4 ++--
 net/sched/act_police.c  |9 -
 net/sched/sch_cbq.c |   10 +-
 net/sched/sch_hfsc.c|   10 --
 net/sched/sch_htb.c |6 +++---
 net/sched/sch_netem.c   |8 +++-
 net/sched/sch_tbf.c |7 +++
 8 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index e6b1da0..b2cc9a8 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -48,11 +48,13 @@ typedef longpsched_tdiff_t;
 #define PSCHED_NS2US(x)((x)  10)
 
 #define PSCHED_TICKS_PER_SEC   PSCHED_NS2US(NSEC_PER_SEC)
-#define PSCHED_GET_TIME(stamp) \
-   ((stamp) = PSCHED_NS2US(ktime_to_ns(ktime_get(
-
 #define PSCHED_PASTPERFECT 0
 
+static inline psched_time_t psched_get_time(void)
+{
+   return PSCHED_NS2US(ktime_to_ns(ktime_get()));
+}
+
 static inline psched_tdiff_t
 psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound)
 {
diff --git a/include/net/red.h b/include/net/red.h
index 0bc1691..3cf31d4 100644
--- a/include/net/red.h
+++ b/include/net/red.h
@@ -156,7 +156,7 @@ static inline int red_is_idling(struct red_parms *p)
 
 static inline void red_start_of_idle_period(struct red_parms *p)
 {
-   PSCHED_GET_TIME(p-qidlestart);
+   p-qidlestart = psched_get_time();
 }
 
 static inline void red_end_of_idle_period(struct red_parms *p)
@@ -177,7 +177,7 @@ static inline unsigned long 
red_calc_qavg_from_idle_time(struct red_parms *p)
long us_idle;
int  shift;
 
-   PSCHED_GET_TIME(now);
+   now = psched_get_time();
us_idle = psched_tdiff_bounded(now, p-qidlestart, p-Scell_max);
 
/*
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 65d60a3..616f465 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -241,7 +241,7 @@ override:
if (ret != ACT_P_CREATED)
return ret;
 
-   PSCHED_GET_TIME(police-tcfp_t_c);
+   police-tcfp_t_c = psched_get_time();
police-tcf_index = parm-index ? parm-index :
tcf_hash_new_index(police_idx_gen, police_hash_info);
h = tcf_hash(police-tcf_index, POL_TAB_MASK);
@@ -296,8 +296,7 @@ static int tcf_act_police(struct sk_buff *skb, struct 
tc_action *a,
return police-tcfp_result;
}
 
-   PSCHED_GET_TIME(now);
-
+   now = psched_get_time();
toks = psched_tdiff_bounded(now, police-tcfp_t_c,
police-tcfp_burst);
if (police-tcfp_P_tab) {
@@ -495,7 +494,7 @@ struct tcf_police *tcf_police_locate(struct rtattr *rta, 
struct rtattr *est)
}
if (police-tcfp_P_tab)
police-tcfp_ptoks = L2T_P(police, police-tcfp_mtu);
-   PSCHED_GET_TIME(police-tcfp_t_c);
+   police-tcfp_t_c = psched_get_time();
police-tcf_index = parm-index ? parm-index :
tcf_police_new_index();
police-tcf_action = parm-action;
@@ -543,7 +542,7 @@ int tcf_police(struct sk_buff *skb, struct tcf_police 
*police)
return police-tcfp_result;
}
 
-   PSCHED_GET_TIME(now);
+   now = psched_get_time();
toks = psched_tdiff_bounded(now, police-tcfp_t_c,
police-tcfp_burst);
if (police-tcfp_P_tab) {
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f9e8403..414a97c 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -385,7 +385,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct 
cbq_class *cl)
psched_time_t now;
psched_tdiff_t incr;
 
-   PSCHED_GET_TIME(now);
+   now = psched_get_time();
incr = now - q-now_rt;
now = q-now + incr;
 
@@ -654,7 +654,7 @@ static enum hrtimer_restart cbq_undelay(struct hrtimer 
*timer)
psched_tdiff_t delay = 0;
unsigned pmask;
 
-   PSCHED_GET_TIME(now);
+   now = psched_get_time();
 
pmask = q-pmask;
q-pmask = 0;
@@ -1003,7 +1003,7 @@ cbq_dequeue(struct Qdisc *sch)
psched_time_t now;
psched_tdiff_t incr;
 
-   PSCHED_GET_TIME(now);
+   now = psched_get_time();
incr = now - q-now_rt;
 
if (q-tx_class) {
@@ -1277,7 +1277,7 @@ cbq_reset(struct Qdisc* sch)
qdisc_watchdog_cancel(q-watchdog);

[NET_SCHED 04/11]: kill PSCHED_TLESS

2007-03-23 Thread Patrick McHardy
[NET_SCHED]: kill PSCHED_TLESS

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 4f8fc418f88c0b7ee6e726b05f27c42d8e20593c
tree c70508f2e0174aef42aaf99bf0cef4184d7ed07e
parent 145a1a6010c6b852ffab28c110d8911a6161aa8b
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:00:55 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:28 +0100

 include/net/pkt_sched.h |1 -
 net/sched/sch_cbq.c |7 +++
 net/sched/sch_netem.c   |6 +++---
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 32cdf01..49325ff 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -54,7 +54,6 @@ typedef long  psched_tdiff_t;
 #define PSCHED_TDIFF(tv1, tv2) (long)((tv1) - (tv2))
 #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
min_t(long long, (tv1) - (tv2), bound)
-#define PSCHED_TLESS(tv1, tv2) ((tv1)  (tv2))
 #define PSCHED_SET_PASTPERFECT(t)  ((t) = 0)
 #define PSCHED_IS_PASTPERFECT(t)   ((t) == 0)
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 290b26b..9e6cdab 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -390,7 +390,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct 
cbq_class *cl)
now = q-now + incr;
 
do {
-   if (PSCHED_TLESS(cl-undertime, now)) {
+   if (cl-undertime  now) {
q-toplevel = cl-level;
return;
}
@@ -845,8 +845,7 @@ cbq_under_limit(struct cbq_class *cl)
if (cl-tparent == NULL)
return cl;
 
-   if (PSCHED_IS_PASTPERFECT(cl-undertime) ||
-   !PSCHED_TLESS(q-now, cl-undertime)) {
+   if (PSCHED_IS_PASTPERFECT(cl-undertime) || q-now = cl-undertime) {
cl-delayed = 0;
return cl;
}
@@ -870,7 +869,7 @@ cbq_under_limit(struct cbq_class *cl)
if (cl-level  q-toplevel)
return NULL;
} while (!PSCHED_IS_PASTPERFECT(cl-undertime) 
-PSCHED_TLESS(q-now, cl-undertime));
+q-now  cl-undertime);
 
cl-delayed = 0;
return cl;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 6044ae7..5d571aa 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -286,7 +286,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
/* if more time remaining? */
PSCHED_GET_TIME(now);
 
-   if (!PSCHED_TLESS(now, cb-time_to_send)) {
+   if (cb-time_to_send = now) {
pr_debug(netem_dequeue: return skb=%p\n, skb);
sch-q.qlen--;
return skb;
@@ -494,7 +494,7 @@ static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc 
*sch)
 
if (likely(skb_queue_len(list)  q-limit)) {
/* Optimize for add at tail */
-   if (likely(skb_queue_empty(list) || !PSCHED_TLESS(tnext, 
q-oldest))) {
+   if (likely(skb_queue_empty(list) || tnext = q-oldest)) {
q-oldest = tnext;
return qdisc_enqueue_tail(nskb, sch);
}
@@ -503,7 +503,7 @@ static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc 
*sch)
const struct netem_skb_cb *cb
= (const struct netem_skb_cb *)skb-cb;
 
-   if (!PSCHED_TLESS(tnext, cb-time_to_send))
+   if (tnext = cb-time_to_send)
break;
}
 


[RFC] remove NLA_STRING NUL trimming

2007-03-23 Thread Johannes Berg
Looking through the netlink/attr.c code I noticed that NLA_STRING
attributes that end with a binary NUL have it removed before passing it
to the consumer.

For wireless, we have a few places where we need to be able to accept
any (even binary) values, for example for the SSID; the SSID can validly
end with \0 and I'd still love to be able to take advantage of
NLA_STRING and .len = 32 so I don't need to check the length myself.
However, given the code above, an SSID with a terminating \0 would be
reduced by one character.

This patch removes the trimming.
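
For illustration, a hedged sketch of the kind of policy entry this is meant
to enable (ATTR_SSID is a made-up attribute id, not something introduced by
this patch):

/* Hypothetical policy table: with the trimming gone, a 32-byte SSID that
 * happens to end in '\0' still passes the .len = 32 check without losing
 * its final byte.
 */
#include <net/netlink.h>

enum { ATTR_UNSPEC, ATTR_SSID, __ATTR_MAX };
#define ATTR_MAX (__ATTR_MAX - 1)

static const struct nla_policy example_policy[ATTR_MAX + 1] = {
	[ATTR_SSID] = { .type = NLA_STRING, .len = 32 },
};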

Signed-off-by: Johannes Berg [EMAIL PROTECTED]

---
This shouldn't break things if all users that rely on terminating NULs
have migrated to NLA_NUL_STRING already. I don't see many users of
NLA_STRING still, but if we can't make that change because some users
still rely on it trimming the NUL I could also make a patch that
introduces NLA_BIN_STRING with the changed semantics.

--- wireless-dev.orig/net/netlink/attr.c	2007-03-23 00:06:41.293435409 +0100
+++ wireless-dev/net/netlink/attr.c 2007-03-23 00:07:13.753435409 +0100
@@ -56,15 +56,8 @@ static int validate_nla(struct nlattr *n
if (attrlen  1)
return -ERANGE;
 
-   if (pt-len) {
-   char *buf = nla_data(nla);
-
-   if (buf[attrlen - 1] == '\0')
-   attrlen--;
-
-   if (attrlen  pt-len)
-   return -ERANGE;
-   }
+   if (pt-len  attrlen  pt-len)
+   return -ERANGE;
break;
 
default:




[PATCH 0/6] New SCTP functionality for 2.6.22

2007-03-23 Thread Vlad Yasevich

This patch series implements additional SCTP socket options.  This
was originally submitted too late for 2.6.21, so I am re-submitting
for 2.6.22.

Please consider applying.

Thanks
-vlad


[PATCH 2/6] [SCTP] Implement SCTP_PARTIAL_DELIVERY_POINT option.

2007-03-23 Thread Vlad Yasevich
This option causes partial delivery to run as soon
as the specified amount of data has been accumulated on
the association.  However, we give preference to fully
reassembled messages over partial delivery messages.  In either case,
receive window and buffer space are freed up.
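
A rough userspace sketch of setting the option (illustrative values, error
handling kept minimal; option level and name as introduced by this patch):

/* Hedged example: set the partial delivery point to 16 KB on a one-to-many
 * SCTP socket.  The 16384 value is arbitrary and only for illustration.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
	uint32_t point = 16384;

	if (fd < 0)
		return 1;
	if (setsockopt(fd, IPPROTO_SCTP, SCTP_PARTIAL_DELIVERY_POINT,
		       &point, sizeof(point)) < 0)
		perror("SCTP_PARTIAL_DELIVERY_POINT");
	return 0;
}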

Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
---
 include/net/sctp/structs.h |1 +
 include/net/sctp/user.h|2 +
 net/sctp/socket.c  |   57 +++
 net/sctp/ulpqueue.c|   64 +---
 4 files changed, 120 insertions(+), 4 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 6883c7d..f4bb396 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -307,6 +307,7 @@ struct sctp_sock {
__u8 v4mapped;
__u8 frag_interleave;
__u32 adaptation_ind;
+   __u32 pd_point;
 
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index e773160..9a83527 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -99,6 +99,8 @@ enum sctp_optname {
 #define SCTP_CONTEXT SCTP_CONTEXT
SCTP_FRAGMENT_INTERLEAVE,
 #define SCTP_FRAGMENT_INTERLEAVE SCTP_FRAGMENT_INTERLEAVE
+   SCTP_PARTIAL_DELIVERY_POINT,/* Set/Get partial delivery point */
+#define SCTP_PARTIAL_DELIVERY_POINT SCTP_PARTIAL_DELIVERY_POINT
 
/* Internal Socket Options. Some of the sctp library functions are 
 * implemented using these socket options.
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 912073d..2d0c2ee 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2826,6 +2826,32 @@ static int sctp_setsockopt_fragment_interleave(struct 
sock *sk,
return 0;
 }
 
+/*
+ * 7.1.25.  Set or Get the sctp partial delivery point
+ *   (SCTP_PARTIAL_DELIVERY_POINT)
+ * This option will set or get the SCTP partial delivery point.  This
+ * point is the size of a message where the partial delivery API will be
+ * invoked to help free up rwnd space for the peer.  Setting this to a
+ * lower value will cause partial delivery's to happen more often.  The
+ * calls argument is an integer that sets or gets the partial delivery
+ * point.
+ */
+static int sctp_setsockopt_partial_delivery_point(struct sock *sk,
+ char __user *optval,
+ int optlen)
+{
+   u32 val;
+
+   if (optlen != sizeof(u32))
+   return -EINVAL;
+   if (get_user(val, (int __user *)optval))
+   return -EFAULT;
+
+   sctp_sk(sk)-pd_point = val;
+
+   return 0; /* is this the right error code? */
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -2905,6 +2931,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int 
level, int optname,
case SCTP_DELAYED_ACK_TIME:
retval = sctp_setsockopt_delayed_ack_time(sk, optval, optlen);
break;
+   case SCTP_PARTIAL_DELIVERY_POINT:
+   retval = sctp_setsockopt_partial_delivery_point(sk, optval, 
optlen);
+   break;
 
case SCTP_INITMSG:
retval = sctp_setsockopt_initmsg(sk, optval, optlen);
@@ -4596,6 +4625,30 @@ static int sctp_getsockopt_fragment_interleave(struct 
sock *sk, int len,
return 0;
 }
 
+/*
+ * 7.1.25.  Set or Get the sctp partial delivery point
+ * (chapter and verse is quoted at sctp_setsockopt_partial_delivery_point())
+ */
+static int sctp_getsockopt_partial_delivery_point(struct sock *sk, int len,
+ char __user *optval,
+ int __user *optlen)
+{
+u32 val;
+
+   if (len  sizeof(u32))
+   return -EINVAL;
+
+   len = sizeof(u32);
+
+   val = sctp_sk(sk)-pd_point;
+   if (put_user(len, optlen))
+   return -EFAULT;
+   if (copy_to_user(optval, val, len))
+   return -EFAULT;
+
+   return -ENOTSUPP;
+}
+
 SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
 {
@@ -4712,6 +4765,10 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int 
level, int optname,
retval = sctp_getsockopt_fragment_interleave(sk, len, optval,
 optlen);
break;
+   case SCTP_PARTIAL_DELIVERY_POINT:
+   retval = sctp_getsockopt_partial_delivery_point(sk, len, optval,
+   optlen);
+   break;
default:
retval = -ENOPROTOOPT;
break;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index 896e834..6f64b15 100644
--- a/net/sctp/ulpqueue.c

[PATCH 5/6] [SCTP] Implement sac_info field in SCTP_ASSOC_CHANGE notification.

2007-03-23 Thread Vlad Yasevich
As stated in the sctp socket api draft:

   sac_info: variable

   If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received
   for this association, sac_info[] contains the complete ABORT chunk as
   defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7.

We now save received ABORT chunks into the sac_info field and pass that
to the user.
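
A hedged sketch of how a receiver could look at the appended data (assumes
the application has already enabled association change notifications and
read one into buf; not part of the patch):

/* Illustration only: given a notification already read from the socket,
 * report how many bytes of the ABORT chunk were appended in sac_info.
 */
#include <stdio.h>
#include <netinet/sctp.h>

static void dump_assoc_change(const void *buf, size_t len)
{
	const struct sctp_assoc_change *sac = buf;

	if (len < sizeof(*sac))
		return;
	if (sac->sac_state == SCTP_COMM_LOST && len > sizeof(*sac))
		printf("COMM_LOST: %zu bytes of ABORT chunk in sac_info\n",
		       len - sizeof(*sac));
}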

Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
---
 include/net/sctp/ulpevent.h |1 +
 include/net/sctp/user.h |1 +
 net/sctp/sm_sideeffect.c|   11 +++--
 net/sctp/sm_statefuns.c |   14 ++--
 net/sctp/ulpevent.c |   49 --
 5 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/include/net/sctp/ulpevent.h b/include/net/sctp/ulpevent.h
index 2923e3d..de88ed5 100644
--- a/include/net/sctp/ulpevent.h
+++ b/include/net/sctp/ulpevent.h
@@ -89,6 +89,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_assoc_change(
__u16 error,
__u16 outbound,
__u16 inbound,
+   struct sctp_chunk *chunk,
gfp_t gfp);
 
 struct sctp_ulpevent *sctp_ulpevent_make_peer_addr_change(
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 80b7afe..1b3153c 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -217,6 +217,7 @@ struct sctp_assoc_change {
__u16 sac_outbound_streams;
__u16 sac_inbound_streams;
sctp_assoc_t sac_assoc_id;
+   __u8 sac_info[0];
 };
 
 /*
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 1355674..0a1a197 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -464,7 +464,7 @@ static void sctp_cmd_init_failed(sctp_cmd_seq_t *commands,
struct sctp_ulpevent *event;
 
event = sctp_ulpevent_make_assoc_change(asoc,0, SCTP_CANT_STR_ASSOC,
-   (__u16)error, 0, 0,
+   (__u16)error, 0, 0, NULL,
GFP_ATOMIC);
 
if (event)
@@ -492,8 +492,13 @@ static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *commands,
/* Cancel any partial delivery in progress. */
sctp_ulpq_abort_pd(asoc-ulpq, GFP_ATOMIC);
 
-   event = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_LOST,
-   (__u16)error, 0, 0,
+   if (event_type == SCTP_EVENT_T_CHUNK  subtype.chunk == SCTP_CID_ABORT)
+   event = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_LOST,
+   (__u16)error, 0, 0, chunk,
+   GFP_ATOMIC);
+   else
+   event = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_LOST,
+   (__u16)error, 0, 0, NULL,
GFP_ATOMIC);
if (event)
sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index c85b517..cceaf90 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -186,7 +186,7 @@ sctp_disposition_t sctp_sf_do_4_C(const struct 
sctp_endpoint *ep,
 * notification is passed to the upper layer.
 */
ev = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_SHUTDOWN_COMP,
-0, 0, 0, GFP_ATOMIC);
+0, 0, 0, NULL, GFP_ATOMIC);
if (ev)
sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
SCTP_ULPEVENT(ev));
@@ -661,7 +661,7 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const struct 
sctp_endpoint *ep,
ev = sctp_ulpevent_make_assoc_change(new_asoc, 0, SCTP_COMM_UP, 0,
 new_asoc-c.sinit_num_ostreams,
 new_asoc-c.sinit_max_instreams,
-GFP_ATOMIC);
+NULL, GFP_ATOMIC);
if (!ev)
goto nomem_ev;
 
@@ -790,7 +790,7 @@ sctp_disposition_t sctp_sf_do_5_1E_ca(const struct 
sctp_endpoint *ep,
ev = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_UP,
 0, asoc-c.sinit_num_ostreams,
 asoc-c.sinit_max_instreams,
-GFP_ATOMIC);
+NULL, GFP_ATOMIC);
 
if (!ev)
goto nomem;
@@ -1625,7 +1625,7 @@ static sctp_disposition_t sctp_sf_do_dupcook_a(const 
struct sctp_endpoint *ep,
ev = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_RESTART, 0,
 new_asoc-c.sinit_num_ostreams,
 new_asoc-c.sinit_max_instreams,
-GFP_ATOMIC);

[PATCH 4/6] [SCTP] Honor flags when setting peer address parameters

2007-03-23 Thread Vlad Yasevich
Parameters only take effect when the corresponding flag bit is set
and a value is specified. This means we need to check the flags
in addition to checking for a non-zero value.
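
A minimal sketch of the rule from the userspace side (values illustrative;
without SPP_HB_ENABLE in spp_flags the interval below is now ignored):

/* Hedged example: request a 5 second heartbeat interval.  The value only
 * takes effect because SPP_HB_ENABLE accompanies it; address/assoc id
 * setup is omitted for brevity.
 */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

static int set_hb_interval(int fd)
{
	struct sctp_paddrparams p;

	memset(&p, 0, sizeof(p));
	p.spp_hbinterval = 5000;	/* milliseconds */
	p.spp_flags = SPP_HB_ENABLE;	/* flag must accompany the value */

	return setsockopt(fd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS,
			  &p, sizeof(p));
}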

Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
---
 include/net/sctp/user.h |   15 +++--
 net/sctp/socket.c   |   54 ++
 2 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 4ed7521..80b7afe 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -513,16 +513,17 @@ struct sctp_setadaptation {
  *   address's parameters:
  */
 enum  sctp_spp_flags {
-   SPP_HB_ENABLE = 1,  /*Enable heartbeats*/
-   SPP_HB_DISABLE = 2, /*Disable heartbeats*/
+   SPP_HB_ENABLE = 1<<0,   /*Enable heartbeats*/
+   SPP_HB_DISABLE = 1<<1,  /*Disable heartbeats*/
SPP_HB = SPP_HB_ENABLE | SPP_HB_DISABLE,
-   SPP_HB_DEMAND = 4,  /*Send heartbeat immediately*/
-   SPP_PMTUD_ENABLE = 8,   /*Enable PMTU discovery*/
-   SPP_PMTUD_DISABLE = 16, /*Disable PMTU discovery*/
+   SPP_HB_DEMAND = 1<<2,   /*Send heartbeat immediately*/
+   SPP_PMTUD_ENABLE = 1<<3,/*Enable PMTU discovery*/
+   SPP_PMTUD_DISABLE = 1<<4,   /*Disable PMTU discovery*/
SPP_PMTUD = SPP_PMTUD_ENABLE | SPP_PMTUD_DISABLE,
-   SPP_SACKDELAY_ENABLE = 32,  /*Enable SACK*/
-   SPP_SACKDELAY_DISABLE = 64, /*Disable SACK*/
+   SPP_SACKDELAY_ENABLE = 1<<5,/*Enable SACK*/
+   SPP_SACKDELAY_DISABLE = 1<<6,   /*Disable SACK*/
SPP_SACKDELAY = SPP_SACKDELAY_ENABLE | SPP_SACKDELAY_DISABLE,
+   SPP_HB_TIME_IS_ZERO = 1<<7, /* Set HB delay to 0 */
 };
 
 struct sctp_paddrparams {
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 2d0c2ee..8939536 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2033,6 +2033,10 @@ static int sctp_setsockopt_autoclose(struct sock *sk, 
char __user *optval,
  * SPP_HB_DEMAND - Request a user initiated heartbeat
  * to be made immediately.
  *
+ * SPP_HB_TIME_IS_ZERO - Specify's that the time for
+ * heartbeat delayis to be set to the value of 0
+ * milliseconds.
+ *
  * SPP_PMTUD_ENABLE - This field will enable PMTU
  * discovery upon the specified address. Note that
  * if the address feild is empty then all addresses
@@ -2075,13 +2079,30 @@ static int sctp_apply_peer_addr_params(struct 
sctp_paddrparams *params,
return error;
}
 
-   if (params-spp_hbinterval) {
-   if (trans) {
-   trans-hbinterval = 
msecs_to_jiffies(params-spp_hbinterval);
-   } else if (asoc) {
-   asoc-hbinterval = 
msecs_to_jiffies(params-spp_hbinterval);
-   } else {
-   sp-hbinterval = params-spp_hbinterval;
+   /* Note that unless the spp_flag is set to SPP_HB_ENABLE the value of
+* this field is ignored.  Note also that a value of zero indicates
+* the current setting should be left unchanged.
+*/
+   if (params-spp_flags  SPP_HB_ENABLE) { 
+
+   /* Re-zero the interval if the SPP_HB_TIME_IS_ZERO is
+* set.  This lets us use 0 value when this flag
+* is set.
+*/
+   if (params-spp_flags  SPP_HB_TIME_IS_ZERO)
+   params-spp_hbinterval = 0;
+
+   if (params-spp_hbinterval ||
+   (params-spp_flags  SPP_HB_TIME_IS_ZERO)) {
+   if (trans) {
+   trans-hbinterval =
+   msecs_to_jiffies(params-spp_hbinterval);
+   } else if (asoc) {
+   asoc-hbinterval =
+   msecs_to_jiffies(params-spp_hbinterval);
+   } else {
+   sp-hbinterval = params-spp_hbinterval;
+   }
}
}
 
@@ -2098,7 +2119,12 @@ static int sctp_apply_peer_addr_params(struct 
sctp_paddrparams *params,
}
}
 
-   if (params-spp_pathmtu) {
+   /* When Path MTU discovery is disabled the value specified here will
+* be the fixed path mtu (i.e. the value of the spp_flags field must
+* include the flag SPP_PMTUD_DISABLE for this field to have any
+* effect).
+*/
+   if ((params-spp_flags  SPP_PMTUD_DISABLE)  params-spp_pathmtu) {
if (trans) {
trans-pathmtu = params-spp_pathmtu;
sctp_assoc_sync_pmtu(asoc);
@@ -2129,7 +2155,11 @@ static int sctp_apply_peer_addr_params(struct 
sctp_paddrparams *params,
}
}
 
-  

[PATCH 3/6] [SCTP]: Implement SCTP_ADDR_CONFIRMED state for ADDR_CHNAGE event

2007-03-23 Thread Vlad Yasevich
Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
---
 include/net/sctp/user.h |1 +
 net/sctp/associola.c|   10 +-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 9a83527..4ed7521 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -265,6 +265,7 @@ enum sctp_spc_state {
SCTP_ADDR_REMOVED,
SCTP_ADDR_ADDED,
SCTP_ADDR_MADE_PRIM,
+   SCTP_ADDR_CONFIRMED,
 };
 
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index fa82b73..294be94 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -714,8 +714,16 @@ void sctp_assoc_control_transport(struct sctp_association 
*asoc,
/* Record the transition on the transport.  */
switch (command) {
case SCTP_TRANSPORT_UP:
+   /* If we are moving from UNCONFIRMED state due
+* to heartbeat success, report the SCTP_ADDR_CONFIRMED
+* state to the user, otherwise report SCTP_ADDR_AVAILABLE.
+*/
+   if (SCTP_UNCONFIRMED == transport-state 
+   SCTP_HEARTBEAT_SUCCESS == error)
+   spc_state = SCTP_ADDR_CONFIRMED;
+   else
+   spc_state = SCTP_ADDR_AVAILABLE;
transport-state = SCTP_ACTIVE;
-   spc_state = SCTP_ADDR_AVAILABLE;
break;
 
case SCTP_TRANSPORT_DOWN:
-- 
1.5.0.3.438.gc49b2



[PATCH 6/6] [SCTP] Implement SCTP_MAX_BURST socket option.

2007-03-23 Thread Vlad Yasevich
Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
---
 include/net/sctp/constants.h |2 +-
 include/net/sctp/structs.h   |1 +
 include/net/sctp/user.h  |2 +
 net/sctp/associola.c |2 +-
 net/sctp/protocol.c  |2 +-
 net/sctp/socket.c|   61 ++
 6 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 5ddb855..bb37724 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -283,7 +283,7 @@ enum { SCTP_MAX_GABS = 16 };
 #define SCTP_RTO_BETA   2   /* 1/4 when converted to right shifts. */
 
 /* Maximum number of new data packets that can be sent in a burst.  */
-#define SCTP_MAX_BURST 4
+#define SCTP_DEFAULT_MAX_BURST 4
 
 #define SCTP_CLOCK_GRANULARITY 1   /* 1 jiffy */
 
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index f4bb396..8135815 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -276,6 +276,7 @@ struct sctp_sock {
__u32 default_context;
__u32 default_timetolive;
__u32 default_rcv_context;
+   int max_burst;
 
/* Heartbeat interval: The endpoint sends out a Heartbeat chunk to
 * the destination address every heartbeat interval. This value
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 1b3153c..6d2b577 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -101,6 +101,8 @@ enum sctp_optname {
 #define SCTP_FRAGMENT_INTERLEAVE SCTP_FRAGMENT_INTERLEAVE
SCTP_PARTIAL_DELIVERY_POINT,/* Set/Get partial delivery point */
 #define SCTP_PARTIAL_DELIVERY_POINT SCTP_PARTIAL_DELIVERY_POINT
+   SCTP_MAX_BURST, /* Set/Get max burst */
+#define SCTP_MAX_BURST SCTP_MAX_BURST
 
/* Internal Socket Options. Some of the sctp library functions are 
 * implemented using these socket options.
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 294be94..2f61d58 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -143,7 +143,7 @@ static struct sctp_association 
*sctp_association_init(struct sctp_association *a
/* Initialize the maximum mumber of new data packets that can be sent
 * in a burst.
 */
-   asoc-max_burst = sctp_max_burst;
+   asoc-max_burst = sp-max_burst;
 
/* initialize association timers */
asoc-timeouts[SCTP_EVENT_TIMEOUT_NONE] = 0;
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 7c28c9b..c361deb 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1042,7 +1042,7 @@ SCTP_STATIC __init int sctp_init(void)
sctp_cookie_preserve_enable = 1;
 
/* Max.Burst- 4 */
-   sctp_max_burst  = SCTP_MAX_BURST;
+   sctp_max_burst  = SCTP_DEFAULT_MAX_BURST;
 
/* Association.Max.Retrans  - 10 attempts
 * Path.Max.Retrans - 5  attempts (per destination address)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 8939536..e45cff4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2886,6 +2886,36 @@ static int sctp_setsockopt_partial_delivery_point(struct 
sock *sk,
return 0; /* is this the right error code? */
 }
 
+/*
+ * 7.1.28.  Set or Get the maximum burst (SCTP_MAX_BURST)
+ *
+ * This option will allow a user to change the maximum burst of packets
+ * that can be emitted by this association.  Note that the default value
+ * is 4, and some implementations may restrict this setting so that it
+ * can only be lowered.
+ *
+ * NOTE: This text doesn't seem right.  Do this on a socket basis with
+ * future associations inheriting the socket value.
+ */
+static int sctp_setsockopt_maxburst(struct sock *sk,
+   char __user *optval,
+   int optlen)
+{
+   int val;
+
+   if (optlen != sizeof(int))
+   return -EINVAL;
+   if (get_user(val, (int __user *)optval))
+   return -EFAULT;
+
+   if (val  0)
+   return -EINVAL;
+
+   sctp_sk(sk)-max_burst = val;
+
+   return 0;
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -3006,6 +3036,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int 
level, int optname,
case SCTP_FRAGMENT_INTERLEAVE:
retval = sctp_setsockopt_fragment_interleave(sk, optval, 
optlen);
break;
+   case SCTP_MAX_BURST:
+   retval = sctp_setsockopt_maxburst(sk, optval, optlen);
+   break;
default:
retval = -ENOPROTOOPT;
break;
@@ -3165,6 +3198,7 @@ SCTP_STATIC int sctp_init_sock(struct sock *sk)
sp-default_timetolive = 0;
 
sp-default_rcv_context = 0;
+   sp-max_burst = sctp_max_burst;
 
/* Initialize default 

[PATCH 1/6] [SCTP] Implement SCTP_FRAGMENT_INTERLEAVE socket option

2007-03-23 Thread Vlad Yasevich
This option was introduced in draft-ietf-tsvwg-sctpsocket-13.  It
prevents head-of-line blocking in the case of a one-to-many endpoint.
Applications enabling this option really must enable the SCTP_SNDRCV event
so that they know which association the data belongs to.  Based on an
earlier patch by Ivan Skytte Jørgensen.

Additionally, this functionality now permits multiple associations
on the same endpoint to enter partial delivery.  Applications using
this functionality should be extra careful to track EOR indicators.
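
A hedged sketch of turning the option on from userspace (boolean int; the
receive loop that tracks MSG_EOR and the sndrcvinfo is not shown):

/* Illustration only: enable fragment interleave on a one-to-many socket.
 * The application must then use sctp_recvmsg()/SCTP_SNDRCV info to tell
 * which association each (possibly partial) read belongs to.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

static int enable_frag_interleave(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_SCTP, SCTP_FRAGMENT_INTERLEAVE,
			  &on, sizeof(on));
}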

Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
---
 include/net/sctp/structs.h  |3 +-
 include/net/sctp/ulpqueue.h |2 +-
 include/net/sctp/user.h |4 +-
 net/sctp/socket.c   |   84 +---
 net/sctp/ulpqueue.c |   88 --
 5 files changed, 150 insertions(+), 31 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 31a8e88..6883c7d 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -304,10 +304,11 @@ struct sctp_sock {
__u32 autoclose;
__u8 nodelay;
__u8 disable_fragments;
-   __u8 pd_mode;
__u8 v4mapped;
+   __u8 frag_interleave;
__u32 adaptation_ind;
 
+   atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
struct sk_buff_head pd_lobby;
 };
diff --git a/include/net/sctp/ulpqueue.h b/include/net/sctp/ulpqueue.h
index a43c878..3421b19 100644
--- a/include/net/sctp/ulpqueue.h
+++ b/include/net/sctp/ulpqueue.h
@@ -77,7 +77,7 @@ void sctp_ulpq_partial_delivery(struct sctp_ulpq *, struct 
sctp_chunk *, gfp_t);
 void sctp_ulpq_abort_pd(struct sctp_ulpq *, gfp_t);
 
 /* Clear the partial data delivery condition on this socket. */
-int sctp_clear_pd(struct sock *sk);
+int sctp_clear_pd(struct sock *sk, struct sctp_association *asoc);
 
 /* Skip over an SSN. */
 void sctp_ulpq_skip(struct sctp_ulpq *ulpq, __u16 sid, __u16 ssn);
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 67a30eb..e773160 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -97,6 +97,8 @@ enum sctp_optname {
 #define SCTP_DELAYED_ACK_TIME SCTP_DELAYED_ACK_TIME
SCTP_CONTEXT,   /* Receive Context */
 #define SCTP_CONTEXT SCTP_CONTEXT
+   SCTP_FRAGMENT_INTERLEAVE,
+#define SCTP_FRAGMENT_INTERLEAVE SCTP_FRAGMENT_INTERLEAVE
 
/* Internal Socket Options. Some of the sctp library functions are 
 * implemented using these socket options.
@@ -530,7 +532,7 @@ struct sctp_paddrparams {
__u32   spp_flags;
 } __attribute__((packed, aligned(4)));
 
-/* 7.1.24. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME)
+/* 7.1.23. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME)
  *
  *   This options will get or set the delayed ack timer.  The time is set
  *   in milliseconds.  If the assoc_id is 0, then this sets or gets the
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 536298c..912073d 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2249,7 +2249,7 @@ static int sctp_setsockopt_peer_addr_params(struct sock 
*sk,
return 0;
 }
 
-/* 7.1.24. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME)
+/* 7.1.23. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME)
  *
  *   This options will get or set the delayed ack timer.  The time is set
  *   in milliseconds.  If the assoc_id is 0, then this sets or gets the
@@ -2786,6 +2786,46 @@ static int sctp_setsockopt_context(struct sock *sk, char 
__user *optval,
return 0;
 }
 
+/*
+ * 7.1.24.  Get or set fragmented interleave (SCTP_FRAGMENT_INTERLEAVE)
+ *
+ * This options will at a minimum specify if the implementation is doing
+ * fragmented interleave.  Fragmented interleave, for a one to many
+ * socket, is when subsequent calls to receive a message may return
+ * parts of messages from different associations.  Some implementations
+ * may allow you to turn this value on or off.  If so, when turned off,
+ * no fragment interleave will occur (which will cause a head of line
+ * blocking amongst multiple associations sharing the same one to many
+ * socket).  When this option is turned on, then each receive call may
+ * come from a different association (thus the user must receive data
+ * with the extended calls (e.g. sctp_recvmsg) to keep track of which
+ * association each receive belongs to.
+ *
+ * This option takes a boolean value.  A non-zero value indicates that
+ * fragmented interleave is on.  A value of zero indicates that
+ * fragmented interleave is off.
+ *
+ * Note that it is important that an implementation that allows this
+ * option to be turned on, have it off by default.  Otherwise an unaware
+ * application using the one to many model may become confused and act
+ * incorrectly.
+ */
+static int sctp_setsockopt_fragment_interleave(struct sock *sk,
+  char __user *optval,
+ 

Re: routing question under invisible bridge

2007-03-23 Thread Lennert Buytenhek
On Thu, Mar 22, 2007 at 03:52:55PM -0500, Bin He wrote:

 Dear sir,

Hi,


 I found your email address from kernel bridge source codes. I would
 appreciate if you could look into my question a little bit.

The netdev@ mailing list is a better forum to ask such questions,
I've CC'ed this email there.


 I have an invisible bridge (br0) which contains eth0 and eth1. Neither
 of them has an IP address because I want it to be transparent to
 the existing network. So there are no entries in the kernel routing table.

If you have an IP address assigned to br0, your kernel will likely have
(at least) one entry in its routing table even if you didn't put any
routes in there yourself.


 The problem is how does it handle the routing, i.e., which eth
 interface will a packet be sent to?

(The decision which bridge sub-device to send a packet to isn't
called 'routing', as it doesn't involve an IP routing decision --
that decision has already been made at that point.)


 For example, I can create a packet and bind it to a device by
 SO_BINDTODEVICE socket option. I did some tests and found:
 1) if the socket is bound to eth0 or eth1, the packet cannot be sent out.
 2) if the socket is bound to br0, it seems that the packet is only
 sent out to eth0.

Check out your system's ARP table (run /sbin/arp) and your br0
bridge's MAC address table (run 'brctl showmacs br0' or something
like that.)

When your machine wants to communicate with a remote IP address, it
first sends an ARP packet to figure out what the ethernet address is
that corresponds to that remote IP address.

When your machine then sends an IP packet on the br0 interface to that
ethernet address, the bridge code checks the MAC address table to find
out whether to send it to eth0 or eth1 (if the MAC address is a known
MAC address) or to both (if we have never seen the MAC address before
or if it has timed out.)
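
For what it's worth, a hedged sketch of that model from userspace: send on
br0 and let the bridge pick the outgoing port (SO_BINDTODEVICE needs root;
192.0.2.1 and the port number are placeholders):

/* Illustration only.  Which of eth0/eth1 actually carries the frame is
 * decided by the bridge's MAC table, not by anything in this program.
 */
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(12345) };

	if (fd < 0)
		return 1;
	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);
	if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, "br0",
		       strlen("br0") + 1) < 0)
		perror("SO_BINDTODEVICE");
	sendto(fd, "hi", 2, 0, (struct sockaddr *)&dst, sizeof(dst));
	return 0;
}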


 So is there a way to send out a packet on a particular device?

I'm not sure exactly what you are trying to do?