Re: [RFC] remove NLA_STRING NUL trimming
* Johannes Berg [EMAIL PROTECTED] 2007-03-23 00:12
> Looking through the netlink/attr.c code I noticed that NLA_STRING
> attributes that end with a binary NUL have it removed before passing
> it to the consumer.

It's not really removed, the trailing NUL is just ignored when checking
the length of the attribute. This is needed for older netlink families
where strings are not always NUL terminated, yet we still need to
accept the additional byte needed in case it is present. This
validation is strictly necessary, otherwise nla_strcmp() and others
will fail.

> For wireless, we have a few places where we need to be able to accept
> any (even binary) values, for example for the SSID; the SSID can
> validly end with \0 and I'd still love to be able to take advantage
> of NLA_STRING and .len = 32 so I don't need to check the length
> myself. However, given the code above, an SSID with a terminating \0
> would be reduced by one character.

I suggest that you introduce NLA_BINARY which enforces a maximum
length.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
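[Archive note: the trimming behaviour Thomas describes — a trailing NUL
being ignored when the string length is checked — can be modelled in a
few lines of C. This is an illustrative sketch only, not the actual
net/netlink/attr.c code; `nla_string_len` is a hypothetical helper name.]

```c
#include <assert.h>

/* Model of the NLA_STRING length handling under discussion: the
 * attribute payload may or may not carry a trailing NUL, so a
 * terminating '\0' is not counted when validating against the
 * policy's maximum length.  Illustrative only. */
static int nla_string_len(const char *payload, int attrlen)
{
	if (attrlen > 0 && payload[attrlen - 1] == '\0')
		attrlen--;	/* trailing NUL does not count */
	return attrlen;
}
```

This also shows Johannes's complaint: a binary SSID that legitimately
ends in '\0' would be reported one byte short by such a check.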
Re: [NET_SCHED 11/11]: qdisc: avoid dequeue while throttled
Patrick McHardy wrote:
> [NET_SCHED]: qdisc: avoid dequeue while throttled

It just occurred to me that this doesn't work properly with qdiscs that
have multiple classes, since they don't all properly maintain the
TCQ_F_THROTTLED flag. They set it on dequeue when no active class is
willing to give out packets, but when enqueueing to a non-active class
(thereby activating it) it is still set, even though we don't know if
that class could be dequeued.

So this updated patch unsets the TCQ_F_THROTTLED flag whenever we
activate a class. Additionally it removes the unsetting of
TCQ_F_THROTTLED on successful dequeue, since we're now guaranteed that
it was not set before.

[NET_SCHED]: qdisc: avoid dequeue while throttled

Avoid dequeueing while the device is throttled.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 073456c84a46736a3aa1ae4cc9d953a9e97b327c
tree 805a29224001180c88a429e65812b97a489c427a
parent e2459acd7dee06fb4d5e980f26c23d31db0e5de1
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 15:37:51 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 15:37:51 +0100

 net/sched/sch_cbq.c     |  5 +++--
 net/sched/sch_generic.c |  4 ++++
 net/sched/sch_hfsc.c    |  5 +++--
 net/sched/sch_htb.c     |  6 ++++--
 net/sched/sch_netem.c   |  4 ----
 net/sched/sch_tbf.c     |  1 -
 6 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index a294542..151f8e3 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -424,8 +424,10 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		sch->bstats.packets++;
 		sch->bstats.bytes += len;
 		cbq_mark_toplevel(q, cl);
-		if (!cl->next_alive)
+		if (!cl->next_alive) {
 			cbq_activate_class(cl);
+			sch->flags &= ~TCQ_F_THROTTLED;
+		}
 		return ret;
 	}
@@ -1030,7 +1032,6 @@ cbq_dequeue(struct Qdisc *sch)
 	skb = cbq_dequeue_1(sch);
 	if (skb) {
 		sch->q.qlen--;
-		sch->flags &= ~TCQ_F_THROTTLED;
 		return skb;
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 52eb343..39c5312 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -93,6 +93,10 @@ static inline int qdisc_restart(struct net_device *dev)
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
+	smp_rmb();
+	if (q->flags & TCQ_F_THROTTLED)
+		return q->q.qlen;
+
 	/* Dequeue packet */
 	if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) {
 		unsigned nolock = (dev->features & NETIF_F_LLTX);
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 22cec11..c6da436 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1597,8 +1597,10 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		return err;
 	}
-	if (cl->qdisc->q.qlen == 1)
+	if (cl->qdisc->q.qlen == 1) {
 		set_active(cl, len);
+		sch->flags &= ~TCQ_F_THROTTLED;
+	}
 	cl->bstats.packets++;
 	cl->bstats.bytes += len;
@@ -1672,7 +1674,6 @@ hfsc_dequeue(struct Qdisc *sch)
 	}
 out:
-	sch->flags &= ~TCQ_F_THROTTLED;
 	sch->q.qlen--;
 	return skb;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 71db121..1387b7b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -615,6 +615,8 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		/* enqueue to helper queue */
 		if (q->direct_queue.qlen < q->direct_qlen) {
 			__skb_queue_tail(&q->direct_queue, skb);
+			if (q->direct_queue.qlen == 1)
+				sch->flags &= ~TCQ_F_THROTTLED;
 			q->direct_pkts++;
 		} else {
 			kfree_skb(skb);
@@ -637,6 +639,8 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		cl->bstats.packets++;
 		cl->bstats.bytes += skb->len;
 		htb_activate(q, cl);
+		if (cl->un.leaf.q->q.qlen == 1)
+			sch->flags &= ~TCQ_F_THROTTLED;
 	}
 	sch->q.qlen++;
@@ -958,7 +962,6 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 	/* try to dequeue direct packets as high prio (!) to minimize cpu work */
 	skb = __skb_dequeue(&q->direct_queue);
 	if (skb != NULL) {
-		sch->flags &= ~TCQ_F_THROTTLED;
 		sch->q.qlen--;
 		return skb;
 	}
@@ -991,7 +994,6 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 		skb = htb_dequeue_tree(q, prio, level);
 		if (likely(skb != NULL)) {
 			sch->q.qlen--;
-			sch->flags &= ~TCQ_F_THROTTLED;
 			goto fin;
 		}
 	}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 5d9d8bc..4c7a8d8 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -273,10 +273,6 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 	struct netem_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
-	smp_mb();
-	if (sch->flags & TCQ_F_THROTTLED)
-		return NULL;
-
 	skb = q->qdisc->dequeue(q->qdisc);
 	if (skb) {
 		const struct netem_skb_cb *cb
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 5386295..ed7e581 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -218,7 +218,6 @@ static struct sk_buff *tbf_dequeue(struct Qdisc* sch)
 			q->tokens = toks;
 			q->ptokens = ptoks;
 			sch->q.qlen--;
-			sch->flags &= ~TCQ_F_THROTTLED;
 			return skb;
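[Archive note: the flag discipline in the patch above can be reduced to
a tiny userspace model: TCQ_F_THROTTLED is cleared whenever a class is
activated on enqueue, and qdisc_restart() skips dequeueing while it is
set. This is a hypothetical toy sketch, not kernel code; `toy_qdisc`,
`toy_enqueue` and `toy_restart` are made-up names.]

```c
#include <assert.h>

#define TCQ_F_THROTTLED 2	/* same bit meaning as the kernel flag */

struct toy_qdisc {
	unsigned int flags;
	unsigned int qlen;
};

/* Enqueue: when the queue goes from empty to non-empty (a class
 * becomes active), the throttled flag must be cleared, because we no
 * longer know that nothing is dequeueable. */
static void toy_enqueue(struct toy_qdisc *q)
{
	if (q->qlen++ == 0)
		q->flags &= ~TCQ_F_THROTTLED;
}

/* Restart: while throttled, don't even attempt a dequeue; otherwise
 * dequeue one packet if available.  Returns the remaining qlen. */
static unsigned int toy_restart(struct toy_qdisc *q)
{
	if (q->flags & TCQ_F_THROTTLED)
		return q->qlen;
	if (q->qlen)
		q->qlen--;
	return q->qlen;
}
```

Note this is exactly the case Patrick later found problematic: clearing
the flag on enqueue assumes the newly active class is dequeueable,
which multi-class qdiscs cannot always guarantee.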
Re: [RFC] remove NLA_STRING NUL trimming
On Fri, 2007-03-23 at 15:20 +0100, Thomas Graf wrote:
> It's not really removed, the trailing NUL is just ignored when
> checking the length of the attribute.

Good point.

> This is needed for older netlink families where strings are not
> always NUL terminated, yet we still need to accept the additional
> byte needed in case it is present. This validation is strictly
> necessary, otherwise nla_strcmp() and others will fail.

Ok.

> > For wireless, we have a few places where we need to be able to
> > accept any (even binary) values, for example for the SSID; the SSID
> > can validly end with \0 and I'd still love to be able to take
> > advantage of NLA_STRING and .len = 32 so I don't need to check the
> > length myself. However, given the code above, an SSID with a
> > terminating \0 would be reduced by one character.
>
> I suggest that you introduce NLA_BINARY which enforces a maximum
> length.

Alright, I'll post a patch in a bit.

johannes
Re: [NET_SCHED 11/11]: qdisc: avoid dequeue while throttled
Patrick McHardy wrote:
> [NET_SCHED]: qdisc: avoid dequeue while throttled
>
> It just occurred to me that this doesn't work properly with qdiscs
> that have multiple classes, since they don't all properly maintain
> the TCQ_F_THROTTLED flag. They set it on dequeue when no active class
> is willing to give out packets, but when enqueueing to a non-active
> class (thereby activating it) it is still set, even though we don't
> know if that class could be dequeued.
>
> So this updated patch unsets the TCQ_F_THROTTLED flag whenever we
> activate a class. Additionally it removes the unsetting of
> TCQ_F_THROTTLED on successful dequeue, since we're now guaranteed
> that it was not set before.

I found another case that doesn't work properly, so let me retract this
patch until I've properly thought this through.
[PATCH] netlink: introduce NLA_BINARY type
This patch introduces a new NLA_BINARY attribute policy type whose
verification simply checks the maximum length of the payload. It also
fixes a small typo in the example.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]
Cc: Thomas Graf [EMAIL PROTECTED]
Cc: netdev@vger.kernel.org
---
 include/net/netlink.h | 4 +++-
 net/netlink/attr.c    | 5 +++++
 2 files changed, 8 insertions(+), 1 deletion(-)

--- linux-2.6.orig/include/net/netlink.h	2007-03-23 15:45:52.932598534 +0100
+++ linux-2.6/include/net/netlink.h	2007-03-23 15:46:25.962598534 +0100
@@ -171,6 +171,7 @@ enum {
 	NLA_MSECS,
 	NLA_NESTED,
 	NLA_NUL_STRING,
+	NLA_BINARY,
 	__NLA_TYPE_MAX,
 };
@@ -188,12 +189,13 @@ enum {
  *    NLA_STRING           Maximum length of string
  *    NLA_NUL_STRING       Maximum length of string (excluding NUL)
  *    NLA_FLAG             Unused
+ *    NLA_BINARY           Maximum length of attribute payload
  *    All other            Exact length of attribute payload
  *
  * Example:
  * static struct nla_policy my_policy[ATTR_MAX+1] __read_mostly = {
  * 	[ATTR_FOO] = { .type = NLA_U16 },
- * 	[ATTR_BAR] = { .type = NLA_STRING, len = BARSIZ },
+ * 	[ATTR_BAR] = { .type = NLA_STRING, .len = BARSIZ },
 * 	[ATTR_BAZ] = { .len = sizeof(struct mystruct) },
 * };
 */
--- linux-2.6.orig/net/netlink/attr.c	2007-03-23 15:46:53.112598534 +0100
+++ linux-2.6/net/netlink/attr.c	2007-03-23 15:48:12.902598534 +0100
@@ -67,6 +67,11 @@ static int validate_nla(struct nlattr *n
 		}
 		break;
+	case NLA_BINARY:
+		if (pt->len && attrlen > pt->len)
+			return -ERANGE;
+		break;
+
 	default:
 		if (pt->len)
 			minlen = pt->len;
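[Archive note: the NLA_BINARY rule added above is small enough to lift
into a standalone helper for illustration. A `.len` of 0 means "any
length"; only an upper bound is enforced. Sketch only — the real check
lives inside validate_nla() in net/netlink/attr.c, and
`validate_binary` is a hypothetical name.]

```c
#include <assert.h>
#include <errno.h>

/* Standalone model of the NLA_BINARY policy check: reject only when a
 * maximum length is configured and the payload exceeds it. */
static int validate_binary(int attrlen, int policy_len)
{
	if (policy_len && attrlen > policy_len)
		return -ERANGE;
	return 0;
}
```

So a wireless SSID policy of `.type = NLA_BINARY, .len = 32` accepts any
payload up to 32 bytes, including one ending in '\0', with no trimming.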
Re: [PATCH] netlink: introduce NLA_BINARY type
* Johannes Berg [EMAIL PROTECTED] 2007-03-23 16:02
> This patch introduces a new NLA_BINARY attribute policy type whose
> verification simply checks the maximum length of the payload. It also
> fixes a small typo in the example.
>
> Signed-off-by: Johannes Berg [EMAIL PROTECTED]
> Cc: Thomas Graf [EMAIL PROTECTED]
> Cc: netdev@vger.kernel.org

Signed-off-by: Thomas Graf [EMAIL PROTECTED]
Re: [PATCH] Fix use of uninitialized field in mv643xx_eth
On Fri, Mar 23, 2007 at 01:30:02PM +0100, Gabriel Paubert wrote:
> In this driver, the default ethernet address is first set by calling
> eth_port_uc_addr_get() which reads the relevant registers of the
> corresponding port as initially set by firmware. However that
> function used the port_num field accessed through the private area of
> net_dev before it was set.

Gabriel, you're right. I introduced the bug and I'm sorry for your
trouble.

> The result was that one board I have ended up with the unicast
> address set to 00:00:00:00:00:00 (only port 1 is connected on this
> board). The problem appeared after commit
> 84dd619e4dc3b0b1c40dafd98c90fd950bce7bc5.
>
> This patch fixes the bug by making eth_port_uc_get_addr() more
> similar to eth_port_uc_set_addr(), i.e., by using the port number as
> the first parameter instead of a pointer to struct net_device.
>
> Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]
> --
> The minimal patch I first tried consisted in just moving mp->port_num
> to before the call to eth_port_uc_get_addr().

Hmm. That should have fixed it. I reproduced the problem here and this
fixed it for me:

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 1ee27c3..643ea31 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1379,7 +1379,7 @@ #endif
 	spin_lock_init(&mp->lock);
-	port_num = pd->port_number;
+	port_num = mp->port_num = pd->port_number;
 	/* set default config values */
 	eth_port_uc_addr_get(dev, dev->dev_addr);
@@ -1411,8 +1411,6 @@ #endif
 	duplex = pd->duplex;
 	speed = pd->speed;
-	mp->port_num = port_num;
-
 	/* Hook up MII support for ethtool */
 	mp->mii.dev = dev;
 	mp->mii.mdio_read = mv643xx_mdio_read;

Would you please confirm that this fixes it for you? If so, I'll submit
it upstream as coming from you, since you did all the work. OK?

> The other question is why the driver never gets the info from the
> device tree on this PPC board, but that's for another list, despite
> the fact I lost some time looking for bugs in the OF interface before
> stumbling on this use of a field before it was initialized.

Probably just because the mac address in the hardware was correct and
it didn't seem necessary to overwrite it.

Thank you,
-Dale
[PATCH][SCTP]: Update SCTP Maintainers entry
Dave,

I have asked Vlad Yasevich to take over the role of primary maintainer
of SCTP and he has accepted it. He has been contributing to SCTP for
more than 2 years and has become more active than me in the past year.

Thanks
Sridhar

[SCTP]: Update SCTP Maintainers entry

Add Vlad Yasevich as the primary maintainer of SCTP and add a link to
the project website.

Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]
---
 MAINTAINERS | 3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6d8d5b9..d4bfb9d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2928,9 +2928,12 @@ L:	linux-scsi@vger.kernel.org
 S:	Maintained
 SCTP PROTOCOL
+P:	Vlad Yasevich
+M:	[EMAIL PROTECTED]
 P:	Sridhar Samudrala
 M:	[EMAIL PROTECTED]
 L:	[EMAIL PROTECTED]
+W:	http://lksctp.sourceforge.net
 S:	Supported
 SCx200 CPU SUPPORT
Re: [NET_SCHED 01/11]: sch_netem: fix off-by-one in send time comparison
On Fri, 23 Mar 2007 14:35:40 +0100 (MET)
Patrick McHardy [EMAIL PROTECTED] wrote:

> [NET_SCHED]: sch_netem: fix off-by-one in send time comparison
>
> netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether
> it is allowed to send a packet, which is equivalent to
> cb->time_to_send < now. Use !PSCHED_TLESS(now, cb->time_to_send)
> instead to properly handle cb->time_to_send == now.
>
> Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

Thanks, I saw that earlier in another spot and fixed it.

--
Stephen Hemminger [EMAIL PROTECTED]
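[Archive note: the off-by-one in the patch above is easy to see with a
plain comparison model. PSCHED_TLESS(a, b) is a strict "a < b", so the
old test misses the time_to_send == now case while the negated,
reversed test accepts it. Illustrative sketch; the real PSCHED macros
operate on psched_time_t values.]

```c
#include <assert.h>

#define PSCHED_TLESS(a, b) ((a) < (b))	/* strict less-than, as in the kernel */

/* Old netem check: strictly "time_to_send < now", wrongly refusing to
 * send a packet whose deadline is exactly now. */
static int may_send_old(long time_to_send, long now)
{
	return PSCHED_TLESS(time_to_send, now);
}

/* Fixed check: "!(now < time_to_send)", i.e. time_to_send <= now. */
static int may_send_new(long time_to_send, long now)
{
	return !PSCHED_TLESS(now, time_to_send);
}
```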
Re: [PATCH]: Add security check before flushing SAD/SPD
On Fri, 2007-03-23 at 01:39 -0400, Eric Paris wrote:
> In either case though proper auditing needs to be addressed. I see
> that the first patch from Joy wouldn't audit deletion failures. It
> appears to me if the check is done per policy then the security hook
> return code needs to be recorded and passed to xfrm_audit_log instead
> of the hard coded 1 result used now.
>
> Assuming we go with James's double loop, what should we be auditing
> for a security hook denial? Just audit the first policy entry which
> we tried to remove but couldn't, and then leave the rest of the
> auditing in those functions the way it is now in case there was no
> denial, calling xfrm_audit_log with a hard coded 1 for the result?

Actually, I thought the original intent of the ipsec auditing was to
just audit changes made to the SAD/SPD databases, not security hook
denials, right?

Joy
Re: [PATCH]: Add security check before flushing SAD/SPD
On Fri, 2007-03-23 at 10:33 -0600, Joy Latten wrote:
> On Fri, 2007-03-23 at 01:39 -0400, Eric Paris wrote:
> > In either case though proper auditing needs to be addressed. I see
> > that the first patch from Joy wouldn't audit deletion failures. It
> > appears to me if the check is done per policy then the security
> > hook return code needs to be recorded and passed to xfrm_audit_log
> > instead of the hard coded 1 result used now.
> >
> > Assuming we go with James's double loop, what should we be auditing
> > for a security hook denial? Just audit the first policy entry which
> > we tried to remove but couldn't, and then leave the rest of the
> > auditing in those functions the way it is now in case there was no
> > denial, calling xfrm_audit_log with a hard coded 1 for the result?
>
> Actually, I thought the original intent of the ipsec auditing was to
> just audit changes made to the SAD/SPD databases, not security hook
> denials, right?

Then what is the point of the 'result' field that we capture and log in
xfrm_audit_log, if the only things you care to audit are successful
changes to the databases?

-Eric
[PATCH] NAPI support for Sibyte MAC
[ This is a re-post, but the patch still applies and works fine against
the linux-mips.org tip. We'd really like to get this in. -Mark ]

This patch completes the NAPI functionality for the SB1250 MAC,
including making NAPI a kernel option that can be turned on or off, and
adds the sbmac_poll routine.

Signed-off-by: Mark Mason ([EMAIL PROTECTED])
Signed-off-by: Dan Krejsa ([EMAIL PROTECTED])
Signed-off-by: Steve Yang ([EMAIL PROTECTED])

Index: linux-2.6.14-cgl/drivers/net/Kconfig
===================================================================
--- linux-2.6.14-cgl.orig/drivers/net/Kconfig	2006-09-20 14:58:54.000000000 -0700
+++ linux-2.6.14-cgl/drivers/net/Kconfig	2006-09-20 17:04:31.000000000 -0700
@@ -2031,6 +2031,23 @@
 	tristate "SB1250 Ethernet support"
 	depends on SIBYTE_SB1xxx_SOC
+config SBMAC_NAPI
+	bool "SBMAC: Use Rx Polling (NAPI) (EXPERIMENTAL)"
+	depends on NET_SB1250_MAC && EXPERIMENTAL
+	help
+	  NAPI is a new driver API designed to reduce CPU and interrupt load
+	  when the driver is receiving lots of packets from the card. It is
+	  still somewhat experimental and thus not yet enabled by default.
+
+	  If your estimated Rx load is 10kpps or more, or if the card will be
+	  deployed on potentially unfriendly networks (e.g. in a firewall),
+	  then say Y here.
+
+	  See file:Documentation/networking/NAPI_HOWTO.txt for more
+	  information.
+
+	  If in doubt, say y.
+
 config R8169_VLAN
 	bool "VLAN support"
 	depends on R8169 && VLAN_8021Q
@@ -2826,3 +2843,5 @@
 	def_bool NETPOLL
 endmenu
+
+
Index: linux-2.6.14-cgl/drivers/net/sb1250-mac.c
===================================================================
--- linux-2.6.14-cgl.orig/drivers/net/sb1250-mac.c	2006-09-20 14:59:00.000000000 -0700
+++ linux-2.6.14-cgl/drivers/net/sb1250-mac.c	2006-09-20 20:16:27.000000000 -0700
@@ -95,19 +95,28 @@
 #endif
 #ifdef CONFIG_SBMAC_COALESCE
-static int int_pktcnt = 0;
-module_param(int_pktcnt, int, S_IRUGO);
-MODULE_PARM_DESC(int_pktcnt, "Packet count");
-
-static int int_timeout = 0;
-module_param(int_timeout, int, S_IRUGO);
-MODULE_PARM_DESC(int_timeout, "Timeout value");
+static int int_pktcnt_tx = 255;
+module_param(int_pktcnt_tx, int, S_IRUGO);
+MODULE_PARM_DESC(int_pktcnt_tx, "TX packet count");
+
+static int int_timeout_tx = 255;
+module_param(int_timeout_tx, int, S_IRUGO);
+MODULE_PARM_DESC(int_timeout_tx, "TX timeout value");
+
+static int int_pktcnt_rx = 64;
+module_param(int_pktcnt_rx, int, S_IRUGO);
+MODULE_PARM_DESC(int_pktcnt_rx, "RX packet count");
+
+static int int_timeout_rx = 64;
+module_param(int_timeout_rx, int, S_IRUGO);
+MODULE_PARM_DESC(int_timeout_rx, "RX timeout value");
 #endif
 #include <asm/sibyte/sb1250.h>
 #if defined(CONFIG_SIBYTE_BCM1x55) || defined(CONFIG_SIBYTE_BCM1x80)
 #include <asm/sibyte/bcm1480_regs.h>
 #include <asm/sibyte/bcm1480_int.h>
+#define R_MAC_DMA_OODPKTLOST_RX	R_MAC_DMA_OODPKTLOST
 #elif defined(CONFIG_SIBYTE_SB1250) || defined(CONFIG_SIBYTE_BCM112X)
 #include <asm/sibyte/sb1250_regs.h>
 #include <asm/sibyte/sb1250_int.h>
@@ -155,8 +164,8 @@
 #define NUMCACHEBLKS(x) (((x)+SMP_CACHE_BYTES-1)/SMP_CACHE_BYTES)
-#define SBMAC_MAX_TXDESCR	32
-#define SBMAC_MAX_RXDESCR	32
+#define SBMAC_MAX_TXDESCR	256
+#define SBMAC_MAX_RXDESCR	256
 #define ETHER_ALIGN	2
 #define ETHER_ADDR_LEN	6
@@ -185,10 +194,10 @@
  * associated with it.
 */
-	struct sbmac_softc *sbdma_eth;	/* back pointer to associated MAC */
-	int sbdma_channel;	/* channel number */
+	struct sbmac_softc *sbdma_eth;	/* back pointer to associated MAC */
+	int sbdma_channel;	/* channel number */
 	int sbdma_txdir;	/* direction (1=transmit) */
-	int sbdma_maxdescr;	/* total # of descriptors in ring */
+	int sbdma_maxdescr;	/* total # of descriptors in ring */
 #ifdef CONFIG_SBMAC_COALESCE
 	int sbdma_int_pktcnt;	/* # descriptors rx/tx before interrupt */
 	int sbdma_int_timeout;	/* # usec rx/tx interrupt */
@@ -197,13 +206,16 @@
 	volatile void __iomem *sbdma_config0;	/* DMA config register 0 */
 	volatile void __iomem *sbdma_config1;	/* DMA config register 1 */
 	volatile void __iomem *sbdma_dscrbase;	/* Descriptor base address */
-	volatile void __iomem *sbdma_dscrcnt;	/* Descriptor count register */
+	volatile void __iomem *sbdma_dscrcnt;	/* Descriptor count register */
 	volatile void __iomem *sbdma_curdscr;	/* current descriptor address */
+	volatile void __iomem *sbdma_oodpktlost;	/* pkt drop (rx only) */
+
 	/*
 	 * This stuff is for maintenance of the ring
 	 */
+	sbdmadscr_t *sbdma_dscrtable_unaligned;
 	sbdmadscr_t *sbdma_dscrtable;	/* base of
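[Archive note: the essence of an sbmac_poll()-style NAPI handler is a
budgeted rx loop — process at most `budget` descriptors per call, and
only re-enable rx interrupts once the ring is drained. This is a
hypothetical toy model of that control flow, not the driver code;
`toy_ring` and `toy_poll` are made-up names.]

```c
#include <assert.h>

struct toy_ring {
	int pending;		/* rx descriptors waiting to be processed */
	int irq_enabled;	/* 1 once we leave polling mode */
};

/* Process up to 'budget' packets; if the ring empties, re-enable rx
 * interrupts (the real driver would also call netif_rx_complete()). */
static int toy_poll(struct toy_ring *r, int budget)
{
	int done = 0;

	while (r->pending && done < budget) {
		r->pending--;	/* "process" one rx descriptor */
		done++;
	}
	if (!r->pending)
		r->irq_enabled = 1;
	return done;
}
```

Exhausting the budget (done == budget with work left) is the signal to
stay in polling mode, which is why the rx coalescing defaults above are
tuned alongside the descriptor ring sizes.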
Re: Recent net-2.6.22 patches break bootup!
On Thu, 22 Mar 2007 21:41:23 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

> From: Thomas Graf [EMAIL PROTECTED]
> Date: Fri, 23 Mar 2007 00:47:04 +0100
>
> > * Stephen Hemminger [EMAIL PROTECTED] 2007-03-22 14:27
> > > Something is broken now. If I boot the system (Fedora) it gets to:
> > >
> > >   Bringing up loopback interface:  RTNETLINK answers: Invalid argument
> > >   Dump terminated
> > >   RTNETLINK answers: Invalid argument
> > >   tg3 device eth0 does not seem to be present, delaying initialization
> > >
> > > then it hangs because cups won't come up without loopback
> >
> > Thinko. It always returned the first message handler of a rtnl
> > family.
> >
> > [RTNL]: Properly return rtnl message handler
> >
> > Signed-off-by: Thomas Graf [EMAIL PROTECTED]
>
> Applied, thanks Thomas.

Thanks, that fixes it.

--
Stephen Hemminger [EMAIL PROTECTED]
Re: [NET_SCHED 11/11]: qdisc: avoid dequeue while throttled
From: Patrick McHardy [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 15:57:08 +0100

> I found another case that doesn't work properly, so let me retract
> this patch until I've properly thought this through.

Ok, I'll apply the rest.
Re: [NET_SCHED 00/11]: pkt_sched.h cleanup + misc changes
From: Patrick McHardy [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 14:35:38 +0100 (MET)

> These patches fix an off-by-one in netem, clean up pkt_sched.h by
> removing most of the now unnecessary PSCHED time macros and turning
> the two remaining ones into inline functions, consolidate some common
> filter destruction code, and move the TCQ_F_THROTTLED optimization
> from netem to qdisc_restart.
>
> Please apply, thanks.

Patches 1-10 applied.
Re: [PATCH 0/6] New SCTP functionality for 2.6.22
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 09:52:46 -0400

> This patch series implements additional SCTP socket options. This was
> originally submitted too late for 2.6.21, so I am re-submitting for
> 2.6.22. Please consider applying.

All 6 patches applied, thanks Vlad.
Re: [PATCH] netlink: introduce NLA_BINARY type
From: Thomas Graf [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 16:13:24 +0100

> * Johannes Berg [EMAIL PROTECTED] 2007-03-23 16:02
> > This patch introduces a new NLA_BINARY attribute policy type whose
> > verification simply checks the maximum length of the payload. It
> > also fixes a small typo in the example.
> >
> > Signed-off-by: Johannes Berg [EMAIL PROTECTED]
> > Cc: Thomas Graf [EMAIL PROTECTED]
> > Cc: netdev@vger.kernel.org
>
> Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied to net-2.6.22, thanks everyone.
Re: [PATCH][SCTP]: Update SCTP Maintainers entry
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 09:28:30 -0700

> I have asked Vlad Yasevich to take over the role of primary
> maintainer of SCTP and he has accepted it. He has been contributing
> to SCTP for more than 2 years and has become more active than me in
> the past year.

Applied, thanks for all of your SCTP work Sridhar.
Re: [PATCH]: Add security check before flushing SAD/SPD
On Fri, 23 Mar 2007, Eric Paris wrote:

> Maybe I'm way out on a limb here, but if I am a regular user and I say
> rm /tmp/* and I only have permissions to delete some of the files, I
> expect just those couple to be deleted, not the whole operation
> denied.

I don't think this analogy holds up, as rm is a per-file deletion
operation, and it is the shell which expands the wildcard for you.

A 'flush' has a semantic implication that all entries will be removed,
and it should be atomic and either succeed or fail at that granularity.

- James

--
James Morris
[EMAIL PROTECTED]
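[Archive note: the all-or-nothing flush James argues for is naturally
written as the "double loop" mentioned earlier in the thread — one pass
that only consults the security hook, and a second pass that deletes
and can no longer fail. Hypothetical sketch with an array standing in
for the SPD; `flush_atomic` is a made-up name, not the xfrm code.]

```c
#include <assert.h>

/* Pass 1: ask the (modelled) security hook about every entry.  If any
 * entry is denied, fail before anything is removed.
 * Pass 2: delete everything; nothing can fail here, so the flush is
 * atomic from the caller's point of view. */
static int flush_atomic(const int *allowed, int n, int *deleted)
{
	int i;

	for (i = 0; i < n; i++)
		if (!allowed[i])
			return -1;	/* deny the whole flush */
	for (i = 0; i < n; i++)
		deleted[i] = 1;
	return 0;
}
```

Note Eric's follow-up concern still applies to this shape: a denial in
pass 1 needs a way to be reported (and audited) back to userspace.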
Re: RFC: Established connections hash function
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 09:00:08 +0100

> I don't consider this new hash as a bug fix at all, i.e. your patch
> might enter the 2.6.22 normal dev cycle.

Ok, I checked the patch into net-2.6.22.
Re: [PATCH]: Add security check before flushing SAD/SPD
From: James Morris [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 14:46:48 -0400 (EDT)

> A 'flush' has a semantic implication that all entries will be
> removed, and it should be atomic and either succeed or fail at that
> granularity.

Correct.
Re: VIA Velocity VLAN vexation
> > Or should I just get a different gigabit card ?
>
> This one probably got answered the 2005/11/29. :o)

Ah, that's where I asked before. I misplaced the e-mail. I hope you
don't mind my asking every year or two. But I don't see any suggestions
for an alternative gigabit card anywhere. I had assumed they all mostly
worked, but now it appears I need to know details.

> I'll go to bed in a few minutes but I'll happily resurrect the
> velocity vlan patches.

Haven't they been merged upstream already? Anyway, thanks for the
reply!
Re: [PATCH]: Add security check before flushing SAD/SPD
On Fri, 2007-03-23 at 11:47 -0700, David Miller wrote:
> From: James Morris [EMAIL PROTECTED]
> Date: Fri, 23 Mar 2007 14:46:48 -0400 (EDT)
>
> > A 'flush' has a semantic implication that all entries will be
> > removed, and it should be atomic and either succeed or fail at that
> > granularity.
>
> Correct.

Fair enough. Does it matter that we have no way to report failure back
to users, who can no longer assume success?

-Eric
[PATCH 1/2] mv643xx_eth: Fix use of uninitialized port_num field
From: Gabriel Paubert [EMAIL PROTECTED]

In this driver, the default ethernet address is first set by calling
eth_port_uc_addr_get() which reads the relevant registers of the
corresponding port as initially set by firmware. However that function
used the port_num field accessed through the private area of net_dev
before it was set.

The result was that one board I have ended up with the unicast address
set to 00:00:00:00:00:00 (only port 1 is connected on this board). The
problem appeared after commit 84dd619e4dc3b0b1c40dafd98c90fd950bce7bc5.

This patch fixes the bug by setting mp->port_num prior to calling
eth_port_uc_get_addr().

Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]
Signed-off-by: Dale Farnsworth [EMAIL PROTECTED]
---
This fixes a serious bug and should be expeditiously pushed upstream.

 drivers/net/mv643xx_eth.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 1ee27c3..643ea31 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1379,7 +1379,7 @@ #endif
 	spin_lock_init(&mp->lock);
-	port_num = pd->port_number;
+	port_num = mp->port_num = pd->port_number;
 	/* set default config values */
 	eth_port_uc_addr_get(dev, dev->dev_addr);
@@ -1411,8 +1411,6 @@ #endif
 	duplex = pd->duplex;
 	speed = pd->speed;
-	mp->port_num = port_num;
-
 	/* Hook up MII support for ethtool */
 	mp->mii.dev = dev;
 	mp->mii.mdio_read = mv643xx_mdio_read;
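[Archive note: the bug class fixed here — a helper reading a private
field before probe() has copied it out of platform data — reduces to a
simple ordering problem. Hypothetical standalone sketch; `toy_priv`,
`addr_reg_for` and `probe_fixed` are made-up names, and 0x400/0x100 are
invented register offsets, not the mv643xx layout.]

```c
#include <assert.h>

struct toy_priv {
	unsigned int port_num;
};

/* Any helper keyed off mp->port_num is wrong until the field is set;
 * with port_num still 0 it would silently address port 0's registers. */
static unsigned int addr_reg_for(const struct toy_priv *mp)
{
	return 0x400 + mp->port_num * 0x100;	/* pretend register base */
}

/* The fix: fold the copy into one chained assignment, so the private
 * field is valid before the first helper call uses it. */
static unsigned int probe_fixed(struct toy_priv *mp, unsigned int pd_port)
{
	unsigned int port_num;

	port_num = mp->port_num = pd_port;	/* set field before use */
	(void)port_num;
	return addr_reg_for(mp);
}
```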
Re: [PATCH]: Add security check before flushing SAD/SPD
On Fri, 2007-03-23 at 12:59 -0400, Eric Paris wrote:
> On Fri, 2007-03-23 at 10:33 -0600, Joy Latten wrote:
> > On Fri, 2007-03-23 at 01:39 -0400, Eric Paris wrote:
> > > In either case though proper auditing needs to be addressed. I
> > > see that the first patch from Joy wouldn't audit deletion
> > > failures. It appears to me if the check is done per policy then
> > > the security hook return code needs to be recorded and passed to
> > > xfrm_audit_log instead of the hard coded 1 result used now.
> > >
> > > Assuming we go with James's double loop, what should we be
> > > auditing for a security hook denial? Just audit the first policy
> > > entry which we tried to remove but couldn't, and then leave the
> > > rest of the auditing in those functions the way it is now in case
> > > there was no denial, calling xfrm_audit_log with a hard coded 1
> > > for the result?
> >
> > Actually, I thought the original intent of the ipsec auditing was
> > to just audit changes made to the SAD/SPD databases, not security
> > hook denials, right?
>
> Then what is the point of the 'result' field that we capture and log
> in xfrm_audit_log, if the only things you care to audit are
> successful changes to the databases?

Yes, I think we do want to audit the security denial since it is the
reason we could not change the policy. In the flush case it seems it
will be the only reason. As you suggested, I will audit the first
denial since this is the reason the flush will fail. But sometimes, in
other cases, the delete or add could fail for other reasons too, such
as not being able to allocate memory, not finding the entry, etc.,
which is passed in the result field.

Regards,
Joy
[PATCH 2/2] mv643xx_eth: make eth_port_uc_addr_{get,set}() calls symmetric
From: Gabriel Paubert [EMAIL PROTECTED]

There is no good reason for the asymmetry in the parameters of
eth_port_uc_addr_get() and eth_port_uc_addr_set(). Make them symmetric.
Remove some gratuitous block comments while we're here.

Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]
Signed-off-by: Dale Farnsworth [EMAIL PROTECTED]
---
This is a clean-up patch that needn't be rushed upstream.

-Dale

 drivers/net/mv643xx_eth.c | 59 +++++++-------------------------------
 drivers/net/mv643xx_eth.h |  4 ---
 2 files changed, 13 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 643ea31..f58d96e 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -51,8 +51,8 @@
 #include <asm/delay.h>
 #include "mv643xx_eth.h"
 /* Static function declarations */
-static void eth_port_uc_addr_get(struct net_device *dev,
-				unsigned char *MacAddr);
+static void eth_port_uc_addr_get(unsigned int port_num, unsigned char *p_addr);
+static void eth_port_uc_addr_set(unsigned int port_num, unsigned char *p_addr);
 static void eth_port_set_multicast_list(struct net_device *);
 static void mv643xx_eth_port_enable_tx(unsigned int port_num,
 					unsigned int queues);
@@ -1382,7 +1382,7 @@ #endif
 	port_num = mp->port_num = pd->port_number;
 	/* set default config values */
-	eth_port_uc_addr_get(dev, dev->dev_addr);
+	eth_port_uc_addr_get(port_num, dev->dev_addr);
 	mp->rx_ring_size = MV643XX_ETH_PORT_DEFAULT_RECEIVE_QUEUE_SIZE;
 	mp->tx_ring_size = MV643XX_ETH_PORT_DEFAULT_TRANSMIT_QUEUE_SIZE;
@@ -1826,26 +1826,9 @@ static void eth_port_start(struct net_de
 }
 /*
- * eth_port_uc_addr_set - This function Set the port Unicast address.
- *
- * DESCRIPTION:
- *	This function Set the port Ethernet MAC address.
- *
- * INPUT:
- *	unsigned int	eth_port_num	Port number.
- *	char *		p_addr		Address to be set
- *
- * OUTPUT:
- *	Set MAC address low and high registers. also calls
- *	eth_port_set_filter_table_entry() to set the unicast
- *	table with the proper information.
- *
- * RETURN:
- *	N/A.
- *
+ * eth_port_uc_addr_set - Write a MAC address into the port's hw registers
 */
-static void eth_port_uc_addr_set(unsigned int eth_port_num,
-				unsigned char *p_addr)
+static void eth_port_uc_addr_set(unsigned int port_num, unsigned char *p_addr)
 {
 	unsigned int mac_h;
 	unsigned int mac_l;
@@ -1855,40 +1838,24 @@ static void eth_port_uc_addr_set(unsigne
 	mac_h = (p_addr[0] << 24) | (p_addr[1] << 16) |
 		(p_addr[2] << 8) | (p_addr[3] << 0);
-	mv_write(MV643XX_ETH_MAC_ADDR_LOW(eth_port_num), mac_l);
-	mv_write(MV643XX_ETH_MAC_ADDR_HIGH(eth_port_num), mac_h);
+	mv_write(MV643XX_ETH_MAC_ADDR_LOW(port_num), mac_l);
+	mv_write(MV643XX_ETH_MAC_ADDR_HIGH(port_num), mac_h);
-	/* Accept frames of this address */
-	table = MV643XX_ETH_DA_FILTER_UNICAST_TABLE_BASE(eth_port_num);
+	/* Accept frames with this address */
+	table = MV643XX_ETH_DA_FILTER_UNICAST_TABLE_BASE(port_num);
 	eth_port_set_filter_table_entry(table, p_addr[5] & 0x0f);
 }
 /*
- * eth_port_uc_addr_get - This function retrieves the port Unicast address
- * (MAC address) from the ethernet hw registers.
- *
- * DESCRIPTION:
- *	This function retrieves the port Ethernet MAC address.
- *
- * INPUT:
- *	unsigned int	eth_port_num	Port number.
- *	char		*MacAddr	pointer where the MAC address is stored
- *
- * OUTPUT:
- *	Copy the MAC address to the location pointed to by MacAddr
- *
- * RETURN:
- *	N/A.
- *
+ * eth_port_uc_addr_get - Read the MAC address from the port's hw registers
 */
-static void eth_port_uc_addr_get(struct net_device *dev, unsigned char *p_addr)
+static void eth_port_uc_addr_get(unsigned int port_num, unsigned char *p_addr)
 {
-	struct mv643xx_private *mp = netdev_priv(dev);
 	unsigned int mac_h;
 	unsigned int mac_l;
-	mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(mp->port_num));
-	mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(mp->port_num));
+	mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(port_num));
+	mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(port_num));
 	p_addr[0] = (mac_h >> 24) & 0xff;
 	p_addr[1] = (mac_h >> 16) & 0xff;
diff --git a/drivers/net/mv643xx_eth.h b/drivers/net/mv643xx_eth.h
index 7d4e90c..82f8c0c 100644
--- a/drivers/net/mv643xx_eth.h
+++ b/drivers/net/mv643xx_eth.h
@@ -346,10 +346,6 @@ static void eth_port_init(struct mv643xx
 static void eth_port_reset(unsigned int eth_port_num);
 static void eth_port_start(struct net_device *dev);
-/* Port MAC address routines */
-static void eth_port_uc_addr_set(unsigned int eth_port_num,
-
[PATCH] tcp: cubic update for net-2.6.22
The following update received from Injong updates TCP cubic to the latest version. I am running more complete tests and will have results after 4/1. According to Injong: the new version improves on its scalability, fairness and stability. So in all properties, we confirmed it shows better performance. NCSU results (for 2.6.18 and 2.6.20) available: http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_Testing This version is described in a new Internet draft for CUBIC. http://www.ietf.org/internet-drafts/draft-rhee-tcp-cubic-00.txt Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- net/ipv4/tcp_cubic.c |8 +--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 15c5803..296845b 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -1,5 +1,5 @@ /* - * TCP CUBIC: Binary Increase Congestion control for TCP v2.0 + * TCP CUBIC: Binary Increase Congestion control for TCP v2.1 * * This is from the implementation of CUBIC TCP in * Injong Rhee, Lisong Xu. @@ -214,7 +214,9 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd) if (ca-delay_min 0) { /* max increment = Smax * rtt / 0.1 */ min_cnt = (cwnd * HZ * 8)/(10 * max_increment * ca-delay_min); - if (ca-cnt min_cnt) + + /* use concave growth when the target is above the origin */ + if (ca-cnt min_cnt t = ca-bic_K) ca-cnt = min_cnt; } @@ -400,4 +402,4 @@ module_exit(cubictcp_unregister); MODULE_AUTHOR(Sangtae Ha, Stephen Hemminger); MODULE_LICENSE(GPL); MODULE_DESCRIPTION(CUBIC TCP); -MODULE_VERSION(2.0); +MODULE_VERSION(2.1); -- 1.4.4.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] tcp_probe: improvements for net-2.6.22
Change tcp_probe to use ktime (needed to add one export). Add option to only get events when cwnd changes - from Doug Leith Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- kernel/time.c|1 + net/ipv4/tcp_probe.c | 68 - 2 files changed, 40 insertions(+), 29 deletions(-) diff --git a/kernel/time.c b/kernel/time.c index ec5b10c..de66866 100644 --- a/kernel/time.c +++ b/kernel/time.c @@ -452,6 +452,7 @@ struct timespec ns_to_timespec(const s64 nsec) return ts; } +EXPORT_SYMBOL(ns_to_timespec); /** * ns_to_timeval - Convert nanoseconds to timeval diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c index 61f406f..d03ae9b 100644 --- a/net/ipv4/tcp_probe.c +++ b/net/ipv4/tcp_probe.c @@ -26,6 +26,8 @@ #include linux/proc_fs.h #include linux/module.h #include linux/kfifo.h +#include linux/ktime.h +#include linux/time.h #include linux/vmalloc.h #include net/tcp.h @@ -34,43 +36,45 @@ MODULE_AUTHOR(Stephen Hemminger [EMAIL PROTECTED]); MODULE_DESCRIPTION(TCP cwnd snooper); MODULE_LICENSE(GPL); -static int port = 0; +static int port __read_mostly = 0; MODULE_PARM_DESC(port, Port to match (0=all)); module_param(port, int, 0); -static int bufsize = 64*1024; +static int bufsize __read_mostly = 64*1024; MODULE_PARM_DESC(bufsize, Log buffer size (default 64k)); module_param(bufsize, int, 0); +static int full __read_mostly; +MODULE_PARM_DESC(full, Full log (1=every ack packet received, 0=only cwnd changes)); +module_param(full, int, 0); + static const char procname[] = tcpprobe; struct { - struct kfifo *fifo; - spinlock_tlock; + struct kfifo*fifo; + spinlock_t lock; wait_queue_head_t wait; - struct timeval tstart; + ktime_t start; + u32 lastcwnd; } tcpw; +/* + * Print to log with timestamps. + * FIXME: causes an extra copy + */ static void printl(const char *fmt, ...) 
{ va_list args; int len; - struct timeval now; + struct timespec tv; char tbuf[256]; va_start(args, fmt); - do_gettimeofday(now); + /* want monotonic time since start of tcp_probe */ + tv = ktime_to_timespec(ktime_sub(ktime_get(), tcpw.start)); - now.tv_sec -= tcpw.tstart.tv_sec; - now.tv_usec -= tcpw.tstart.tv_usec; - if (now.tv_usec 0) { - --now.tv_sec; - now.tv_usec += 100; - } - - len = sprintf(tbuf, %lu.%06lu , - (unsigned long) now.tv_sec, - (unsigned long) now.tv_usec); + len = sprintf(tbuf, %lu.%09lu , + (unsigned long) tv.tv_sec, (unsigned long) tv.tv_nsec); len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args); va_end(args); @@ -78,38 +82,44 @@ static void printl(const char *fmt, ...) wake_up(tcpw.wait); } -static int jtcp_sendmsg(struct kiocb *iocb, struct sock *sk, - struct msghdr *msg, size_t size) +/* + * Hook inserted to be called before each receive packet. + * Note: arguments must match tcp_rcv_established()! + */ +static int jtcp_rcv_established(struct sock *sk, struct sk_buff *skb, + struct tcphdr *th, unsigned len) { const struct tcp_sock *tp = tcp_sk(sk); const struct inet_sock *inet = inet_sk(sk); - if (port == 0 || ntohs(inet-dport) == port || - ntohs(inet-sport) == port) { + /* Only update if port matches */ + if ((port == 0 || ntohs(inet-dport) == port || ntohs(inet-sport) == port) +(full || tp-snd_cwnd != tcpw.lastcwnd)) { printl(%d.%d.%d.%d:%u %d.%d.%d.%d:%u %d %#x %#x %u %u %u\n, NIPQUAD(inet-saddr), ntohs(inet-sport), NIPQUAD(inet-daddr), ntohs(inet-dport), - size, tp-snd_nxt, tp-snd_una, + skb-len, tp-snd_nxt, tp-snd_una, tp-snd_cwnd, tcp_current_ssthresh(sk), - tp-snd_wnd); + tp-snd_wnd, tp-srtt 3); + tcpw.lastcwnd = tp-snd_cwnd; } jprobe_return(); return 0; } -static struct jprobe tcp_send_probe = { +static struct jprobe tcp_probe = { .kp = { - .symbol_name= tcp_sendmsg, + .symbol_name= tcp_rcv_established, }, - .entry = JPROBE_ENTRY(jtcp_sendmsg), + .entry = JPROBE_ENTRY(jtcp_rcv_established), }; static int tcpprobe_open(struct 
inode * inode, struct file * file) { kfifo_reset(tcpw.fifo); - do_gettimeofday(tcpw.tstart); + tcpw.start = ktime_get(); return 0; } @@ -162,7 +172,7 @@ static __init int tcpprobe_init(void) if (!proc_net_fops_create(procname, S_IRUSR, tcpprobe_fops)) goto err0; - ret = register_jprobe(tcp_send_probe); + ret =
Re: VIA Velocity VLAN vexation
[EMAIL PROTECTED] [EMAIL PROTECTED] : [...] But I don't see any suggestions for an alternative gigabit card anywhere. I had assumed they all mostly worked, but now it appears I need to know details. Mostly. Assuming you won't play with huge jumbo frames, I'd suggest a plain old pci 8169 (not a PCIe 8168) for VLAN. [...] Haven't they been merged upstream already? No. -- Ueimor - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] forcedeth: fix nic poll
The nic poll routine was missing the call to the optimized irq routine. This patch adds the missing call for the optimized path. See http://bugzilla.kernel.org/show_bug.cgi?id=7950 for more information. Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED] --- orig/drivers/net/forcedeth.c 2007-03-11 20:38:20.0 -0500 +++ new/drivers/net/forcedeth.c 2007-03-11 20:38:24.0 -0500 @@ -3536,7 +3536,10 @@ pci_push(base); if (!using_multi_irqs(dev)) { - nv_nic_irq(0, dev); + if (np->desc_ver == DESC_VER_3) + nv_nic_irq_optimized(0, dev); + else + nv_nic_irq(0, dev); if (np->msi_flags & NV_MSI_X_ENABLED) enable_irq_lockdep(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector); else
[PATCH 2/2] forcedeth: fix tx timeout
The tx timeout routine was waking the tx queue conditionally. However, it must wake it unconditionally, since dev_watchdog has halted the tx queue before calling the timeout function. Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED] --- orig/drivers/net/forcedeth.c 2007-03-11 20:59:06.0 -0500 +++ new/drivers/net/forcedeth.c 2007-03-11 20:58:59.0 -0500 @@ -2050,9 +2050,10 @@ nv_drain_tx(dev); nv_init_tx(dev); setup_hw_rings(dev, NV_SETUP_TX_RING); - netif_wake_queue(dev); } + netif_wake_queue(dev); + /* 4) restart tx engine */ nv_start_tx(dev); spin_unlock_irq(np->lock);
[PATCH 3/5 2.6.21-rc4] l2tp: pppox protocol module load
[PPPOL2TP]: Add the ability to autoload a pppox protocol module. This patch allows a name pppox-proto-nnn to be used in modprobe.conf to autoload a driver for PPPoX protocol nnn. Signed-off-by: James Chapman [EMAIL PROTECTED] Index: linux-2.6.21-rc4/drivers/net/pppox.c === --- linux-2.6.21-rc4.orig/drivers/net/pppox.c +++ linux-2.6.21-rc4/drivers/net/pppox.c @@ -114,6 +114,13 @@ static int pppox_create(struct socket *s goto out; rc = -EPROTONOSUPPORT; +#ifdef CONFIG_KMOD + if (!pppox_protos[protocol]) { + char buffer[32]; + sprintf(buffer, "pppox-proto-%d", protocol); + request_module(buffer); + } +#endif if (!pppox_protos[protocol] || !try_module_get(pppox_protos[protocol]->owner)) goto out; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
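With this in place, an administrator maps a PPPoX protocol number to a driver in modprobe.conf. A hypothetical entry for the pppol2tp driver of the following patches might look like the fragment below; the protocol number shown is illustrative only (use the PX_PROTO_* value the driver actually registers):

```
# /etc/modprobe.conf (illustrative)
# Autoload the pppol2tp driver when a PPPoX socket is created
# for this protocol number:
alias pppox-proto-1 pppol2tp
```

pppox_create() then formats "pppox-proto-%d" from the socket's protocol argument and hands it to request_module(), so the alias resolves without the admin loading anything by hand.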
[PATCH 4/5 2.6.21-rc4] l2tp: pppol2tp kbuild changes
[PPPOL2TP]: Modify kbuild for the new pppol2tp driver. This patch adds a new config option, CONFIG_PPPOL2TP and adds if_pppol2tp.h to the list of exported headers. Signed-off-by: James Chapman [EMAIL PROTECTED] Index: linux-2.6.21-rc4/drivers/net/Kconfig === --- linux-2.6.21-rc4.orig/drivers/net/Kconfig +++ linux-2.6.21-rc4/drivers/net/Kconfig @@ -2812,6 +2812,19 @@ config PPPOATM which can lead to bad results if the ATM peer loses state and changes its encapsulation unilaterally. +config PPPOL2TP + tristate PPP over L2TP (EXPERIMENTAL) + depends on EXPERIMENTAL PPP + help + Support for PPP-over-L2TP socket family. L2TP is a protocol + used by ISPs and enterprises to tunnel PPP traffic over UDP + tunnels. L2TP is replacing PPTP for VPN uses. + + This kernel component handles only L2TP data packets: a + userland daemon handles L2TP the control protocol (tunnel + and session setup). One such daemon is OpenL2TP + (http://openl2tp.sourceforge.net/). + config SLIP tristate SLIP (serial line) support ---help--- Index: linux-2.6.21-rc4/drivers/net/Makefile === --- linux-2.6.21-rc4.orig/drivers/net/Makefile +++ linux-2.6.21-rc4/drivers/net/Makefile @@ -119,6 +119,7 @@ obj-$(CONFIG_PPP_DEFLATE) += ppp_deflate obj-$(CONFIG_PPP_BSDCOMP) += bsd_comp.o obj-$(CONFIG_PPP_MPPE) += ppp_mppe.o obj-$(CONFIG_PPPOE) += pppox.o pppoe.o +obj-$(CONFIG_PPPOL2TP) += pppox.o pppol2tp.o obj-$(CONFIG_SLIP) += slip.o obj-$(CONFIG_SLHC) += slhc.o Index: linux-2.6.21-rc4/include/linux/Kbuild === --- linux-2.6.21-rc4.orig/include/linux/Kbuild +++ linux-2.6.21-rc4/include/linux/Kbuild @@ -220,6 +220,7 @@ unifdef-y += if_fddi.h unifdef-y += if_frad.h unifdef-y += if_ltalk.h unifdef-y += if_link.h +unifdef-y += if_pppol2tp.h unifdef-y += if_pppox.h unifdef-y += if_shaper.h unifdef-y += if_tr.h - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5 2.6.21-rc4] l2tp: add pppol2tp maintainer
[PPPOL2TP]: Update maintainers file for PPP over L2TP. Signed-off-by: James Chapman [EMAIL PROTECTED] Index: linux-2.6.21-rc4/MAINTAINERS === --- linux-2.6.21-rc4.orig/MAINTAINERS +++ linux-2.6.21-rc4/MAINTAINERS @@ -2700,6 +2700,11 @@ P: Michal Ostrowski M: [EMAIL PROTECTED] S: Maintained +PPP OVER L2TP +P: James Chapman +M: [EMAIL PROTECTED] +S: Maintained + PREEMPTIBLE KERNEL P: Robert Love M: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]: SAD sometimes has double SAs.
Last Friday I proposed creating larval SAs to act as placeholders to prevent a second acquire from resulting in double SAs being created. I tried this, and so far I have not seen any double SAs being created. I also plan to run some stress tests over the weekend. Please let me know what improvements I can make to this patch or if there is a better way to do this. A while back I reported that I sometimes saw double and triple SAs being created. The patch to check for protocol when deleting a larval SA removed one obstacle, in that I no longer see triple SAs; now I see double SAs only once in a while. I think I have figured out the second obstacle. The initiator installs his SAs into the kernel before the responder. As soon as they are installed, the blocked packet (which started the ACQUIRE) is sent. By this time the responder has installed his inbound SA(s), so the newly arrived ipsec packet can be processed. In the case of tcp connections and a ping, a response may be warranted, and thus an outgoing packet results. From what I can tell of the log file below, this sometimes happens before the responder has completed installing the outbound SAs. In the log file, the outbound AH has been added, but not the outbound ESP, which is the one the outgoing packet looks for first, thus resulting in a second acquire. I think this becomes more problematic when using both AH and ESP, rather than just ESP with authentication: with the latter, only one SA is needed, reducing the latency in installing the SAs before the incoming packet arrives. So far, the only solution I can think of, besides mandating that all userspace key daemons do SA maintenance, is to add larval SAs for both directions when adding the SPI. Currently, the responder does GETSPI for one direction and the initiator for the opposite. When GETSPI is called, a larval SA containing the SPI is created, but only for one direction. 
Perhaps we can add a larval SA (no SPI) for opposite direction to act as a placeholder indicating ACQUIRE occurring, since SAs are created for both directions during an ACQUIRE. The initiator may have larval SA from GETSPI and larval SA from the ACQUIRE depending that GETSPI is in opposite direction of ACQUIRE. Calling __find_acq_core() should ensure we don't create duplicate larval SAs. Also, should IKE negotiations return error, larval SAs should expire. They also should be removed when we do the xfrm_state_add() and xfrm_state_update() to add the new SAs. Joy This patch is against linux-2.6.21-rc4-git5 Signed-off-by: Joy Latten[EMAIL PROTECTED] diff -urpN linux-2.6.20.orig/net/xfrm/xfrm_state.c linux-2.6.20/net/xfrm/xfrm_state.c --- linux-2.6.20.orig/net/xfrm/xfrm_state.c 2007-03-20 22:39:15.0 -0500 +++ linux-2.6.20/net/xfrm/xfrm_state.c 2007-03-23 16:38:37.0 -0500 @@ -692,12 +692,15 @@ void xfrm_state_insert(struct xfrm_state } EXPORT_SYMBOL(xfrm_state_insert); +static struct xfrm_state *create_larval_sa(unsigned short family, u8 mode, u32 reqid, u8 proto, xfrm_address_t *daddr, xfrm_address_t *saddr); + /* xfrm_state_lock is held */ static struct xfrm_state *__find_acq_core(unsigned short family, u8 mode, u32 reqid, u8 proto, xfrm_address_t *daddr, xfrm_address_t *saddr, int create) { unsigned int h = xfrm_dst_hash(daddr, saddr, reqid, family); struct hlist_node *entry; - struct xfrm_state *x; + struct xfrm_state *x, *x1; + int track_opposite = 0; hlist_for_each_entry(x, entry, xfrm_state_bydst+h, bydst) { if (x-props.reqid != reqid || @@ -710,11 +713,20 @@ static struct xfrm_state *__find_acq_cor switch (family) { case AF_INET: + if (x-id.daddr.a4 == saddr-a4 + x-props.saddr.a4 == daddr-a4) + track_opposite = 1; if (x-id.daddr.a4!= daddr-a4 || x-props.saddr.a4 != saddr-a4) continue; break; case AF_INET6: + if (ipv6_addr_equal((struct in6_addr *)x-id.daddr.a6, +(struct in6_addr *)saddr) || + ipv6_addr_equal((struct in6_addr *) +x-props.saddr.a6, +(struct 
in6_addr *)daddr)) + track_opposite = 1; if (!ipv6_addr_equal((struct in6_addr *)x-id.daddr.a6, (struct in6_addr *)daddr) || !ipv6_addr_equal((struct in6_addr *) @@ -731,6 +743,27 @@ static struct xfrm_state *__find_acq_cor if (!create) return NULL; + x = create_larval_sa(family, mode, reqid, proto, daddr, saddr); + + /* create a larval
Re: [PATCH]: SAD sometimes has double SAs.
From: Joy Latten [EMAIL PROTECTED] Date: Fri, 23 Mar 2007 16:58:20 -0600 Last Friday I proposed creating larval SAs to act as placeholders to prevent a second acquire resulting in double SAs being created. I tried this and so far I have not seen any double SAs being created. I also plan to run some stress tests over the weekend. Please let me know what improvements I can make to this patch or if there is a better way to do this. I'll take a look at your patch after I deal with some ipv6 locking bugs, thanks Joy. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[TG3 1/3]: Eliminate the unused TG3_FLAG_SPLIT_MODE flag.
[TG3]: Eliminate the unused TG3_FLAG_SPLIT_MODE flag. This flag to support multiple PCIX split completions was never used because of hardware bugs. This will make room for a new flag. Signed-off-by: Michael Chan [EMAIL PROTECTED] diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index 8c8f9f4..ab87bb1 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -6321,8 +6321,6 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy) RDMAC_MODE_ADDROFLOW_ENAB | RDMAC_MODE_FIFOOFLOW_ENAB | RDMAC_MODE_FIFOURUN_ENAB | RDMAC_MODE_FIFOOREAD_ENAB | RDMAC_MODE_LNGREAD_ENAB); - if (tp-tg3_flags TG3_FLAG_SPLIT_MODE) - rdmac_mode |= RDMAC_MODE_SPLIT_ENABLE; /* If statement applies to 5705 and 5750 PCI devices only */ if ((GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5705 @@ -6495,9 +6493,6 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy) } else if (GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5704) { val = ~(PCIX_CAPS_SPLIT_MASK | PCIX_CAPS_BURST_MASK); val |= (PCIX_CAPS_MAX_BURST_CPIOB PCIX_CAPS_BURST_SHIFT); - if (tp-tg3_flags TG3_FLAG_SPLIT_MODE) - val |= (tp-split_mode_max_reqs - PCIX_CAPS_SPLIT_SHIFT); } tw32(TG3PCI_X_CAPS, val); } @@ -10863,14 +10858,6 @@ static int __devinit tg3_get_invariants(struct tg3 *tp) grc_misc_cfg = tr32(GRC_MISC_CFG); grc_misc_cfg = GRC_MISC_CFG_BOARD_ID_MASK; - /* Broadcom's driver says that CIOBE multisplit has a bug */ -#if 0 - if (GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5704 - grc_misc_cfg == GRC_MISC_CFG_BOARD_ID_5704CIOBE) { - tp-tg3_flags |= TG3_FLAG_SPLIT_MODE; - tp-split_mode_max_reqs = SPLIT_MODE_5704_MAX_REQ; - } -#endif if (GET_ASIC_REV(tp-pci_chip_rev_id) == ASIC_REV_5705 (grc_misc_cfg == GRC_MISC_CFG_BOARD_ID_5788 || grc_misc_cfg == GRC_MISC_CFG_BOARD_ID_5788M)) @@ -11968,14 +11955,12 @@ static int __devinit tg3_init_one(struct pci_dev *pdev, i == 5 ? 
'\n' : ':'); printk(KERN_INFO %s: RXcsums[%d] LinkChgREG[%d] - MIirq[%d] ASF[%d] Split[%d] WireSpeed[%d] - TSOcap[%d] \n, + MIirq[%d] ASF[%d] WireSpeed[%d] TSOcap[%d]\n, dev-name, (tp-tg3_flags TG3_FLAG_RX_CHECKSUMS) != 0, (tp-tg3_flags TG3_FLAG_USE_LINKCHG_REG) != 0, (tp-tg3_flags TG3_FLAG_USE_MI_INTERRUPT) != 0, (tp-tg3_flags TG3_FLAG_ENABLE_ASF) != 0, - (tp-tg3_flags TG3_FLAG_SPLIT_MODE) != 0, (tp-tg3_flags2 TG3_FLG2_NO_ETH_WIRE_SPEED) == 0, (tp-tg3_flags2 TG3_FLG2_TSO_CAPABLE) != 0); printk(KERN_INFO %s: dma_rwctrl[%08x] dma_mask[%d-bit]\n, diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h index 086892d..5df8f76 100644 --- a/drivers/net/tg3.h +++ b/drivers/net/tg3.h @@ -2223,7 +2223,6 @@ struct tg3 { #define TG3_FLAG_40BIT_DMA_BUG 0x0800 #define TG3_FLAG_BROKEN_CHECKSUMS 0x1000 #define TG3_FLAG_GOT_SERDES_FLOWCTL0x2000 -#define TG3_FLAG_SPLIT_MODE0x4000 #define TG3_FLAG_INIT_COMPLETE 0x8000 u32 tg3_flags2; #define TG3_FLG2_RESTART_TIMER 0x0001 @@ -2262,9 +2261,6 @@ struct tg3 { #define TG3_FLG2_NO_FWARE_REPORTED 0x4000 #define TG3_FLG2_PHY_ADJUST_TRIM 0x8000 - u32 split_mode_max_reqs; -#define SPLIT_MODE_5704_MAX_REQ3 - struct timer_list timer; u16 timer_counter; u16 timer_multiplier; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[TG3 2/3]: Exit irq handler during chip reset.
[TG3]: Exit irq handler during chip reset. On most tg3 chips, the memory enable bit in the PCI command register gets cleared during chip reset and must be restored before accessing PCI registers using memory cycles. The chip does not generate interrupt during chip reset, but the irq handler can still be called because of irq sharing or irqpoll. Reading a register in the irq handler can cause a master abort in this scenario and may result in a crash on some architectures. Use the TG3_FLAG_CHIP_RESETTING flag to tell the irq handler to exit without touching any registers. The checking of the flag is in the slow path of the irq handler and will not affect normal performance. The msi handler is not shared and therefore does not require checking the flag. Thanks to Bernhard Walle [EMAIL PROTECTED] for reporting the problem. Signed-off-by: Michael Chan [EMAIL PROTECTED] diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index ab87bb1..9aca100 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -3568,32 +3568,34 @@ static irqreturn_t tg3_interrupt(int irq, void *dev_id) * Reading the PCI State register will confirm whether the * interrupt is ours and will flush the status block. */ - if ((sblk-status SD_STATUS_UPDATED) || - !(tr32(TG3PCI_PCISTATE) PCISTATE_INT_NOT_ACTIVE)) { - /* -* Writing any value to intr-mbox-0 clears PCI INTA# and -* chip-internal interrupt pending events. -* Writing non-zero to intr-mbox-0 additional tells the -* NIC to stop sending us irqs, engaging in-intr-handler -* event coalescing. 
-*/ - tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, -0x0001); - if (tg3_irq_sync(tp)) + if (unlikely(!(sblk-status SD_STATUS_UPDATED))) { + if ((tp-tg3_flags TG3_FLAG_CHIP_RESETTING) || + (tr32(TG3PCI_PCISTATE) PCISTATE_INT_NOT_ACTIVE)) { + handled = 0; goto out; - sblk-status = ~SD_STATUS_UPDATED; - if (likely(tg3_has_work(tp))) { - prefetch(tp-rx_rcb[tp-rx_rcb_ptr]); - netif_rx_schedule(dev); /* schedule NAPI poll */ - } else { - /* No work, shared interrupt perhaps? re-enable -* interrupts, and flush that PCI write -*/ - tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, - 0x); } - } else {/* shared interrupt */ - handled = 0; + } + + /* +* Writing any value to intr-mbox-0 clears PCI INTA# and +* chip-internal interrupt pending events. +* Writing non-zero to intr-mbox-0 additional tells the +* NIC to stop sending us irqs, engaging in-intr-handler +* event coalescing. +*/ + tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x0001); + if (tg3_irq_sync(tp)) + goto out; + sblk-status = ~SD_STATUS_UPDATED; + if (likely(tg3_has_work(tp))) { + prefetch(tp-rx_rcb[tp-rx_rcb_ptr]); + netif_rx_schedule(dev); /* schedule NAPI poll */ + } else { + /* No work, shared interrupt perhaps? re-enable +* interrupts, and flush that PCI write +*/ + tw32_mailbox_f(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, + 0x); } out: return IRQ_RETVAL(handled); @@ -3611,31 +3613,33 @@ static irqreturn_t tg3_interrupt_tagged(int irq, void *dev_id) * Reading the PCI State register will confirm whether the * interrupt is ours and will flush the status block. */ - if ((sblk-status_tag != tp-last_tag) || - !(tr32(TG3PCI_PCISTATE) PCISTATE_INT_NOT_ACTIVE)) { - /* -* writing any value to intr-mbox-0 clears PCI INTA# and -* chip-internal interrupt pending events. -* writing non-zero to intr-mbox-0 additional tells the -* NIC to stop sending us irqs, engaging in-intr-handler -* event coalescing. 
-*/ - tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, -0x0001); - if (tg3_irq_sync(tp)) + if (unlikely(sblk-status_tag == tp-last_tag)) { + if ((tp-tg3_flags TG3_FLAG_CHIP_RESETTING) || + (tr32(TG3PCI_PCISTATE) PCISTATE_INT_NOT_ACTIVE)) { + handled = 0; goto out; - if (netif_rx_schedule_prep(dev)) { - prefetch(tp-rx_rcb[tp-rx_rcb_ptr]); - /* Update last_tag to mark
[TG3 3/3]: Update version and reldate.
[TG3]: Update version and reldate. Update version to 3.75. Signed-off-by: Michael Chan [EMAIL PROTECTED] diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index 9aca100..e682f90 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -64,8 +64,8 @@ #define DRV_MODULE_NAME "tg3" #define PFX DRV_MODULE_NAME ": " -#define DRV_MODULE_VERSION "3.74" -#define DRV_MODULE_RELDATE "February 20, 2007" +#define DRV_MODULE_VERSION "3.75" +#define DRV_MODULE_RELDATE "March 23, 2007" #define TG3_DEF_MAC_MODE 0 #define TG3_DEF_RX_MODE 0 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.
On Fri, Mar 23, 2007 at 01:14:43PM +0100, Guennadi Liakhovetski wrote: On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote: On Wed, 21 Mar 2007, Samuel Ortiz wrote: I'm quite sure the leak is in the IrDA code rather than in the ppp or ipv4 one, hence the need for full irda debug... Well, looks like you were wrong, Samuel. Heh, it's good to be wrong sometimes :-) Below is a patch that fixes ONE sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if there are more there. Are you still seeing the skb cache growing with your fix ? Cheers, Samuel. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] [NET] MTU discovery changes
These are a few changes to fix/clean up some of the MTU discovery processing with non-stream sockets, and add a probing mode. See also matching patches to tracepath to take advantage of this. -John - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] [NET] Do pmtu check in transport layer
Check the pmtu check at the transport layer (for UDP, ICMP and raw), and send a local error if socket is PMTUDISC_DO and packet is too big. This is actually a pure bugfix for ipv6. For ipv4, it allows us to do pmtu checks in the same way as for ipv6. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/ip_output.c |4 +++- net/ipv4/raw.c|8 +--- net/ipv6/ip6_output.c | 11 ++- net/ipv6/raw.c|7 +-- 4 files changed, 19 insertions(+), 11 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index d096332..593acf7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -822,7 +822,9 @@ int ip_append_data(struct sock *sk, fragheaderlen = sizeof(struct iphdr) + (opt ? opt-optlen : 0); maxfraglen = ((mtu - fragheaderlen) ~7) + fragheaderlen; - if (inet-cork.length + length 0x - fragheaderlen) { + if (inet-cork.length + length 0x - fragheaderlen || + (inet-pmtudisc = IP_PMTUDISC_DO +inet-cork.length + length mtu)) { ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, mtu-exthdrlen); return -EMSGSIZE; } diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 87e9c16..f252f4e 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -271,10 +271,12 @@ static int raw_send_hdrinc(struct sock *sk, void *from, size_t length, struct iphdr *iph; struct sk_buff *skb; int err; + int mtu; - if (length rt-u.dst.dev-mtu) { - ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, - rt-u.dst.dev-mtu); + mtu = inet-pmtudisc == IP_PMTUDISC_DO ? dst_mtu(rt-u.dst) : +rt-u.dst.dev-mtu; + if (length mtu) { + ip_local_error(sk, EMSGSIZE, rt-rt_dst, inet-dport, mtu); return -EMSGSIZE; } if (flagsMSG_PROBE) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 3055169..711dfc3 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1044,11 +1044,12 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, fragheaderlen = sizeof(struct ipv6hdr) + rt-u.dst.nfheader_len + (opt ? 
opt-opt_nflen : 0); maxfraglen = ((mtu - fragheaderlen) ~7) + fragheaderlen - sizeof(struct frag_hdr); - if (mtu = sizeof(struct ipv6hdr) + IPV6_MAXPLEN) { - if (inet-cork.length + length sizeof(struct ipv6hdr) + IPV6_MAXPLEN - fragheaderlen) { - ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen); - return -EMSGSIZE; - } + if ((mtu = sizeof(struct ipv6hdr) + IPV6_MAXPLEN +inet-cork.length + length sizeof(struct ipv6hdr) + IPV6_MAXPLEN - fragheaderlen) || + (np-pmtudisc = IPV6_PMTUDISC_DO +inet-cork.length + length mtu)) { + ipv6_local_error(sk, EMSGSIZE, fl, mtu-exthdrlen); + return -EMSGSIZE; } /* diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 306d5d8..75db277 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -556,9 +556,12 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, struct sk_buff *skb; unsigned int hh_len; int err; + int mtu; - if (length rt-u.dst.dev-mtu) { - ipv6_local_error(sk, EMSGSIZE, fl, rt-u.dst.dev-mtu); + mtu = np-pmtudisc == IPV6_PMTUDISC_DO ? dst_mtu(rt-u.dst) : +rt-u.dst.dev-mtu; + if (length mtu) { + ipv6_local_error(sk, EMSGSIZE, fl, mtu); return -EMSGSIZE; } if (flagsMSG_PROBE) -- 1.5.0.2.gc260-dirty - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] [NET] Move DF check to ip_forward
Do fragmentation check in ip_forward, similar to ipv6 forwarding. Also add a debug printk in the DF check in ip_fragment since we should now never reach it. Signed-off-by: John Heffner [EMAIL PROTECTED] --- net/ipv4/ip_forward.c |8 net/ipv4/ip_output.c |2 ++ 2 files changed, 10 insertions(+), 0 deletions(-) diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c index 369e721..0efb1f5 100644 --- a/net/ipv4/ip_forward.c +++ b/net/ipv4/ip_forward.c @@ -85,6 +85,14 @@ int ip_forward(struct sk_buff *skb) if (opt-is_strictroute rt-rt_dst != rt-rt_gateway) goto sr_failed; + if (unlikely(skb-len dst_mtu(rt-u.dst) +(skb-nh.iph-frag_off htons(IP_DF))) !skb-local_df) { + IP_INC_STATS(IPSTATS_MIB_FRAGFAILS); + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, + htonl(dst_mtu(rt-u.dst))); + goto drop; + } + /* We are about to mangle packet. Copy it! */ if (skb_cow(skb, LL_RESERVED_SPACE(rt-u.dst.dev)+rt-u.dst.header_len)) goto drop; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 593acf7..90bdd53 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -433,6 +433,8 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff*)) iph = skb-nh.iph; if (unlikely((iph-frag_off htons(IP_DF)) !skb-local_df)) { + if (net_ratelimit()) + printk(KERN_DEBUG ip_fragment: requested fragment of packet with DF set\n); IP_INC_STATS(IPSTATS_MIB_FRAGFAILS); icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(dst_mtu(rt-u.dst))); -- 1.5.0.2.gc260-dirty - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] [NET] Add IP(V6)_PMTUDISC_PROBE
Add an IP(V6)_PMTUDISC_PROBE value for IP(V6)_MTU_DISCOVER. This option forces us not to fragment, but does not make use of the kernel path MTU discovery. That is, it allows for user-mode MTU probing (or packetization-layer path MTU discovery). This is particularly useful for diagnostic utilities like traceroute/tracepath. Signed-off-by: John Heffner [EMAIL PROTECTED]
---
 include/linux/in.h       |  1 +
 include/linux/in6.h      |  1 +
 include/linux/skbuff.h   |  3 ++-
 include/net/ip.h         |  2 +-
 net/core/skbuff.c        |  2 ++
 net/ipv4/ip_output.c     | 14 ++
 net/ipv4/ip_sockglue.c   |  2 +-
 net/ipv4/raw.c           |  3 +++
 net/ipv6/ip6_output.c    | 12
 net/ipv6/ipv6_sockglue.c |  2 +-
 net/ipv6/raw.c           |  3 +++
 11 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/include/linux/in.h b/include/linux/in.h
index 1912e7c..2dc1f8a 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -83,6 +83,7 @@ struct in_addr {
 #define IP_PMTUDISC_DONT	0	/* Never send DF frames */
 #define IP_PMTUDISC_WANT	1	/* Use per route hints	*/
 #define IP_PMTUDISC_DO		2	/* Always DF		*/
+#define IP_PMTUDISC_PROBE	3	/* Ignore dst pmtu	*/
 
 #define IP_MULTICAST_IF		32
 #define IP_MULTICAST_TTL	33
diff --git a/include/linux/in6.h b/include/linux/in6.h
index 4e8350a..d559fac 100644
--- a/include/linux/in6.h
+++ b/include/linux/in6.h
@@ -179,6 +179,7 @@ struct in6_flowlabel_req
 #define IPV6_PMTUDISC_DONT	0
 #define IPV6_PMTUDISC_WANT	1
 #define IPV6_PMTUDISC_DO	2
+#define IPV6_PMTUDISC_PROBE	3
 
 /* Flowlabel */
 #define IPV6_FLOWLABEL_MGR	32
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..64038b4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -284,7 +284,8 @@ struct sk_buff {
 				nfctinfo:3;
 	__u8			pkt_type:3,
 				fclone:2,
-				ipvs_property:1;
+				ipvs_property:1,
+				ign_dst_mtu;
 	__be16			protocol;
 
 	void			(*destructor)(struct sk_buff *skb);
diff --git a/include/net/ip.h b/include/net/ip.h
index e79c3e3..f5874a3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -201,7 +201,7 @@ int ip_decrease_ttl(struct iphdr *iph)
 static
inline int ip_dont_fragment(struct sock *sk, struct dst_entry *dst)
 {
-	return (inet_sk(sk)->pmtudisc == IP_PMTUDISC_DO ||
+	return (inet_sk(sk)->pmtudisc >= IP_PMTUDISC_DO ||
 		(inet_sk(sk)->pmtudisc == IP_PMTUDISC_WANT &&
 		 !(dst_metric(dst, RTAX_LOCK) & (1 << RTAX_MTU))));
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 702fa8f..5c8515c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -474,6 +474,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
 	C(ipvs_property);
 #endif
+	C(ign_dst_mtu);
 	C(protocol);
 	n->destructor = NULL;
 	C(mark);
@@ -549,6 +550,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
 	new->ipvs_property = old->ipvs_property;
 #endif
+	new->ign_dst_mtu = old->ign_dst_mtu;
 #ifdef CONFIG_BRIDGE_NETFILTER
 	new->nf_bridge = old->nf_bridge;
 	nf_bridge_get(old->nf_bridge);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 90bdd53..a7e8944 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -201,7 +201,8 @@ static inline int ip_finish_output(struct sk_buff *skb)
 		return dst_output(skb);
 	}
 #endif
-	if (skb->len > dst_mtu(skb->dst) && !skb_is_gso(skb))
+	if (skb->len > dst_mtu(skb->dst) &&
+	    !skb->ign_dst_mtu && !skb_is_gso(skb))
 		return ip_fragment(skb, ip_finish_output2);
 	else
 		return ip_finish_output2(skb);
@@ -801,7 +802,9 @@ int ip_append_data(struct sock *sk,
 			inet->cork.addr = ipc->addr;
 		}
 		dst_hold(&rt->u.dst);
-		inet->cork.fragsize = mtu = dst_mtu(rt->u.dst.path);
+		inet->cork.fragsize = mtu = inet->pmtudisc == IP_PMTUDISC_PROBE ?
+					    rt->u.dst.dev->mtu :
+					    dst_mtu(rt->u.dst.path);
 		inet->cork.rt = rt;
 		inet->cork.length = 0;
 		sk->sk_sndmsg_page = NULL;
@@ -1220,13 +1223,16 @@ int ip_push_pending_frames(struct sock *sk)
 	 * to fragment the frame generated here. No matter, what transforms
 	 * how transforms change size of the packet, it will come out.
 	 */
-	if
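From userspace, the new mode is just another value for the IP_MTU_DISCOVER socket option. A minimal sketch of how a probing tool could enable it, falling back to IP_PMTUDISC_DO on kernels that reject the new value (the helper name and fallback policy are illustrative, not part of the patch):

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IP_PMTUDISC_PROBE
#define IP_PMTUDISC_PROBE 3	/* value proposed by this patch */
#endif

/* Set DF without kernel path-MTU clamping; fall back to plain
 * IP_PMTUDISC_DO on older kernels. Returns the mode actually set,
 * or -1 if neither setsockopt() succeeded. */
static int enable_mtu_probing(int fd)
{
	int on = IP_PMTUDISC_PROBE;

	if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &on, sizeof(on)) == 0)
		return on;
	on = IP_PMTUDISC_DO;
	if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &on, sizeof(on)) == 0)
		return on;
	return -1;
}
```

This is exactly the shape the tracepath patch later in this series uses: try PROBE first, and only if the kernel refuses it, settle for DO.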
[PATCH 0/2] [iputils] MTU discovery changes
These patches add some changes that make tracepath a little more useful for diagnosing MTU issues. The length flag helps distinguish between MTU black holes and other types of black holes by allowing you to vary the probe packet lengths. Using PMTUDISC_PROBE gives you the same results on each run without having to flush the route cache, so you can see where MTU changes in the path actually occur. Whether the PMTUDISC_PROBE patch goes in should depend on whether the corresponding kernel patch (just sent) goes in. -John - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] [iputils] Use PMTUDISC_PROBE mode if it exists.
Signed-off-by: John Heffner [EMAIL PROTECTED]
---
 tracepath.c  | 10 ++++++++--
 tracepath6.c | 10 ++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tracepath.c b/tracepath.c
index 1f901ba..a562d88 100644
--- a/tracepath.c
+++ b/tracepath.c
@@ -24,6 +24,10 @@
 #include <sys/uio.h>
 #include <arpa/inet.h>
 
+#ifndef IP_PMTUDISC_PROBE
+#define IP_PMTUDISC_PROBE	3
+#endif
+
 struct hhistory
 {
 	int	hops;
@@ -322,8 +326,10 @@ main(int argc, char **argv)
 	}
 	memcpy(&target.sin_addr, he->h_addr, 4);
 
-	on = IP_PMTUDISC_DO;
-	if (setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &on, sizeof(on))) {
+	on = IP_PMTUDISC_PROBE;
+	if (setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &on, sizeof(on)) &&
+	    (on = IP_PMTUDISC_DO,
+	     setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &on, sizeof(on)))) {
 		perror("IP_MTU_DISCOVER");
 		exit(1);
 	}
diff --git a/tracepath6.c b/tracepath6.c
index d65230d..6f13a51 100644
--- a/tracepath6.c
+++ b/tracepath6.c
@@ -30,6 +30,10 @@
 #define SOL_IPV6 IPPROTO_IPV6
 #endif
 
+#ifndef IPV6_PMTUDISC_PROBE
+#define IPV6_PMTUDISC_PROBE	3
+#endif
+
 int overhead = 48;
 int mtu = 128000;
 int hops_to = -1;
@@ -369,8 +373,10 @@ int main(int argc, char **argv)
 		mapped = 1;
 	}
 
-	on = IPV6_PMTUDISC_DO;
-	if (setsockopt(fd, SOL_IPV6, IPV6_MTU_DISCOVER, &on, sizeof(on))) {
+	on = IPV6_PMTUDISC_PROBE;
+	if (setsockopt(fd, SOL_IPV6, IPV6_MTU_DISCOVER, &on, sizeof(on)) &&
+	    (on = IPV6_PMTUDISC_DO,
+	     setsockopt(fd, SOL_IPV6, IPV6_MTU_DISCOVER, &on, sizeof(on)))) {
 		perror("IPV6_MTU_DISCOVER");
 		exit(1);
 	}
-- 
1.5.0.2.gc260-dirty
- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ehea: removing unused functionality
This patch includes:
- removal of unused fields in structs
- ethtool statistics cleanup
- removal of unused functionality from the send path

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
---
This patch applies on top of the netdev upstream branch for 2.6.22

 drivers/net/ehea/ehea.h         |  25 ++---
 drivers/net/ehea/ehea_ethtool.c | 111 ++--
 drivers/net/ehea/ehea_main.c    |  55 +++
 drivers/net/ehea/ehea_qmr.h     |   2
 4 files changed, 69 insertions(+), 124 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index f889933..1405d0b 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include <asm/abs_addr.h>
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0054"
+#define DRV_VERSION	"EHEA_0055"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
 	| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
@@ -79,7 +79,6 @@
 #define EHEA_RQ2_PKT_SIZE	1522
 #define EHEA_L_PKT_SIZE		256	/* low latency */
 
 /* Send completion signaling */
-#define EHEA_SIG_IV_LONG	1
 
 /* Protection Domain Identifier */
 #define EHEA_PD_ID		0xaabcdeff
@@ -106,11 +105,7 @@
 #define EHEA_BCMC_VLANID_SINGLE	0x00
 #define EHEA_CACHE_LINE		128
 
 /* Memory Regions */
-#define EHEA_MR_MAX_TX_PAGES	20
-#define EHEA_MR_TX_DATA_PN	3
 #define EHEA_MR_ACC_CTRL	0x0080
-#define EHEA_RWQES_PER_MR_RQ2	10
-#define EHEA_RWQES_PER_MR_RQ3	10
 
 #define EHEA_WATCH_DOG_TIMEOUT 10*HZ
@@ -318,17 +313,12 @@ struct ehea_mr {
 /*
  * Port state information
  */
-struct port_state {
-	int poll_max_processed;
+struct port_stats {
 	int poll_receive_errors;
-	int ehea_poll;
 	int queue_stopped;
-	int min_swqe_avail;
-	u64 sqc_stop_sum;
-	int pkt_send;
-	int pkt_xmit;
-	int send_tasklet;
-	int nwqe;
+	int err_tcp_cksum;
+	int err_ip_cksum;
+	int err_frame_crc;
 };
 
 #define EHEA_IRQ_NAME_SIZE 20
@@ -347,6 +337,7 @@ struct ehea_q_skb_arr {
  * Port resources
  */
 struct ehea_port_res {
+	struct port_stats p_stats;
 	struct ehea_mr send_mr;		/* send memory region */
 	struct ehea_mr recv_mr;		/* receive memory region */
 	spinlock_t
xmit_lock;
@@ -358,7 +349,6 @@ struct ehea_port_res {
 	struct ehea_cq *recv_cq;
 	struct ehea_eq *eq;
 	struct net_device *d_netdev;
-	spinlock_t send_lock;
 	struct ehea_q_skb_arr rq1_skba;
 	struct ehea_q_skb_arr rq2_skba;
 	struct ehea_q_skb_arr rq3_skba;
@@ -368,11 +358,8 @@ struct ehea_port_res {
 	int swqe_refill_th;
 	atomic_t swqe_avail;
 	int swqe_ll_count;
-	int swqe_count;
 	u32 swqe_id_counter;
 	u64 tx_packets;
-	spinlock_t recv_lock;
-	struct port_state p_state;
 	u64 rx_packets;
 	u32 poll_counter;
 };
diff --git a/drivers/net/ehea/ehea_ethtool.c b/drivers/net/ehea/ehea_ethtool.c
index 9f57c2e..170aff3 100644
--- a/drivers/net/ehea/ehea_ethtool.c
+++ b/drivers/net/ehea/ehea_ethtool.c
@@ -166,33 +166,23 @@ static u32 ehea_get_rx_csum(struct net_d
 }
 
 static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = {
-	{"poll_max_processed"},
-	{"queue_stopped"},
-	{"min_swqe_avail"},
-	{"poll_receive_err"},
-	{"pkt_send"},
-	{"pkt_xmit"},
-	{"send_tasklet"},
-	{"ehea_poll"},
-	{"nwqe"},
-	{"swqe_available_0"},
 	{"sig_comp_iv"},
 	{"swqe_refill_th"},
 	{"port resets"},
-	{"rxo"},
-	{"rx64"},
-	{"rx65"},
-	{"rx128"},
-	{"rx256"},
-	{"rx512"},
-	{"rx1024"},
-	{"txo"},
-	{"tx64"},
-	{"tx65"},
-	{"tx128"},
-	{"tx256"},
-	{"tx512"},
-	{"tx1024"},
+	{"Receive errors"},
+	{"TCP cksum errors"},
+	{"IP cksum errors"},
+	{"Frame cksum errors"},
+	{"num SQ stopped"},
+	{"SQ stopped"},
+	{"PR0 free_swqes"},
+	{"PR1 free_swqes"},
+	{"PR2 free_swqes"},
+	{"PR3 free_swqes"},
+	{"PR4 free_swqes"},
+	{"PR5 free_swqes"},
+	{"PR6 free_swqes"},
+	{"PR7 free_swqes"},
 };
 
 static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -211,63 +201,44 @@ static int ehea_get_stats_count(struct n
 static void ehea_get_ethtool_stats(struct net_device *dev,
 				   struct ethtool_stats *stats, u64 *data)
 {
-	u64 hret;
-	int i;
+	int i, k, tmp;
 	struct ehea_port *port = netdev_priv(dev);
-	struct ehea_adapter *adapter = port->adapter;
-	struct ehea_port_res *pr = &port->port_res[0];
-	struct port_state *p_state = &pr->p_state;
-	struct hcp_ehea_port_cb6 *cb6;
 
 	for (i = 0; i < ehea_get_stats_count(dev);
i++)
		data[i] = 0;
 
-	i = 0;
-	data[i++] = p_state->poll_max_processed;
-	data[i++] = p_state->queue_stopped;
-
Re: [git patches] net driver fixes
Jeff, might be worth getting the sk_buff leak fix in ppp from http://www.spinics.net/lists/netdev/msg27706.html in 2.6.21 too? Don't know how important it is for stable. It was present in 2.6.18 too. Thanks Guennadi --- Guennadi Liakhovetski - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[5/5] 2.6.21-rc4: known regressions (v2)
This email lists some known regressions in Linus' tree compared to 2.6.20. If you find your name in the Cc header, you are either the submitter of one of the bugs, the maintainer of an affected subsystem or driver, a patch of yours caused a breakage, or I'm considering you in some other way possibly involved with one or more of these issues. Due to the huge number of recipients, please trim the Cc when answering. Subject: Oops when changing DVB-T adapter References : http://lkml.org/lkml/2007/3/9/212 Submitter : CIJOML [EMAIL PROTECTED] Status : unknown Subject: USB: iPod doesn't work References : http://lkml.org/lkml/2007/3/21/320 Submitter : Tino Keitel [EMAIL PROTECTED] Handled-By : Oliver Neukum [EMAIL PROTECTED] Status : problem is being debugged Subject: snd_intel8x0: divide error: References : http://lkml.org/lkml/2007/3/5/252 Submitter : Michal Piotrowski [EMAIL PROTECTED] Handled-By : Takashi Iwai [EMAIL PROTECTED] Status : problem is being debugged Subject: forcedeth: skb_over_panic References : http://bugzilla.kernel.org/show_bug.cgi?id=8058 Submitter : Albert Hopkins [EMAIL PROTECTED] Handled-By : Ayaz Abdulla [EMAIL PROTECTED] Patch : http://bugzilla.kernel.org/show_bug.cgi?id=8058 Status : patch available - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Established connections hash function
From: Eric Dumazet [EMAIL PROTECTED] Date: Thu, 22 Mar 2007 23:03:04 +0100 David Miller wrote: From: Nikolaos D. Bougalis [EMAIL PROTECTED] Date: Thu, 22 Mar 2007 12:44:09 -0700 People _have_ had problems. _I_ have had problems. And when someone with a few thousand drones under his control hoses your servers because he can do math and he leaves you with 2-item long chains, _you_ will have problems. No need to further argue this point, the people that matter (ie. me :-) understand it, don't worry.. Yes, I recall having one big server hit two years ago by an attack on the tcp hash function. David sent me the patch to use jhash. It's performing well :) Welcome to the club :) Ok, how about we put something like the following into 2.6.21? I'm not looking for the hash perfectionist analysis; please bug the heck off if that's what your reply is going to be about — don't bother hitting the reply button, I will ignore you. I want to hear instead whether this makes attackability markedly _HARDER_ than what we have now, and I am sure beyond a shadow of a doubt that it does. The secret is initialized when the first ehash-using socket is created. That's not perfect (bug off!) but it's better than doing the initialization in inet_init() or {tcp,dccp}_init(), as we'll have a chance of at least some entropy when that first such socket is created. We definitely can't do it for the first AF_INET socket creation, because icmp creates a bunch of SOCK_RAW inet sockets at init time, which would defeat the whole purpose of deferring this. :) Thanks.
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index c28e424..668056b 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -19,6 +19,9 @@
 #include <linux/in6.h>
 #include <linux/ipv6.h>
 #include <linux/types.h>
+#include <linux/jhash.h>
+
+#include <net/inet_sock.h>
 
 #include <net/ipv6.h>
 
@@ -28,12 +31,11 @@ struct inet_hashinfo;
 static inline unsigned int inet6_ehashfn(const struct in6_addr *laddr, const u16 lport,
 					 const struct in6_addr *faddr, const __be16 fport)
 {
-	unsigned int hashent = (lport ^ (__force u16)fport);
+	u32 ports = (lport ^ (__force u16)fport);
 
-	hashent ^= (__force u32)(laddr->s6_addr32[3] ^ faddr->s6_addr32[3]);
-	hashent ^= hashent >> 16;
-	hashent ^= hashent >> 8;
-	return hashent;
+	return jhash_3words((__force u32)laddr->s6_addr32[3],
+			    (__force u32)faddr->s6_addr32[3],
+			    ports, inet_ehash_secret);
 }
 
 static inline int inet6_sk_ehashfn(const struct sock *sk)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ce6da97..62daf21 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -19,6 +19,7 @@
 
 #include <linux/string.h>
 #include <linux/types.h>
+#include <linux/jhash.h>
 
 #include <net/flow.h>
 #include <net/sock.h>
@@ -167,13 +168,15 @@ static inline void inet_sk_copy_descendant(struct sock *sk_to,
 
 extern int inet_sk_rebuild_header(struct sock *sk);
 
+extern u32 inet_ehash_secret;
+extern void build_ehash_secret(void);
+
 static inline unsigned int inet_ehashfn(const __be32 laddr, const __u16 lport,
 					const __be32 faddr, const __be16 fport)
 {
-	unsigned int h = ((__force __u32)laddr ^ lport) ^ ((__force __u32)faddr ^ (__force __u32)fport);
-	h ^= h >> 16;
-	h ^= h >> 8;
-	return h;
+	return jhash_2words((__force __u32) laddr ^ (__force __u32) faddr,
+			    ((__u32) lport) << 16 | (__force __u32)fport,
+			    inet_ehash_secret);
 }
 
 static inline int inet_sk_ehashfn(const struct sock *sk)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cf358c8..308318a 100644
--- a/net/ipv4/af_inet.c
+++
b/net/ipv4/af_inet.c
@@ -87,6 +87,7 @@
 #include <linux/init.h>
 #include <linux/poll.h>
 #include <linux/netfilter_ipv4.h>
+#include <linux/random.h>
 
 #include <asm/uaccess.h>
 #include <asm/system.h>
@@ -217,6 +218,16 @@ out:
 	return err;
 }
 
+u32 inet_ehash_secret;
+EXPORT_SYMBOL(inet_ehash_secret);
+
+void build_ehash_secret(void)
+{
+	while (!inet_ehash_secret)
+		get_random_bytes(&inet_ehash_secret, 4);
+}
+EXPORT_SYMBOL(build_ehash_secret);
+
 /*
  *	Create an inet socket.
  */
@@ -233,6 +244,11 @@ static int inet_create(struct socket *sock, int protocol)
 	int try_loading_module = 0;
 	int err;
 
+	if (sock->type != SOCK_RAW &&
+	    sock->type != SOCK_DGRAM &&
+	    !inet_ehash_secret)
+		build_ehash_secret();
+
 	sock->state = SS_UNCONNECTED;
 
 	/* Look for the requested type/protocol pair. */
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 5cac14a..0de723f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -98,6 +98,11 @@ static int inet6_create(struct socket *sock, int protocol)
 	int try_loading_module
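The lazy, loop-until-nonzero secret initialization in build_ehash_secret() above can be mimicked in userspace. A sketch under the assumption that /dev/urandom is available — the helper name is hypothetical, and the kernel of course uses get_random_bytes() rather than a file:

```c
#include <stdio.h>
#include <stdint.h>

static uint32_t ehash_secret;	/* stand-in for inet_ehash_secret */

/* Loop until a non-zero secret is drawn, so zero can keep meaning
 * "not yet initialized" — the same trick build_ehash_secret() uses. */
static uint32_t build_secret(void)
{
	FILE *f = fopen("/dev/urandom", "rb");

	if (!f)
		return 0;
	while (!ehash_secret)
		if (fread(&ehash_secret, sizeof(ehash_secret), 1, f) != 1)
			break;
	fclose(f);
	return ehash_secret;
}
```

Deferring the call until the first hash-using socket is created (as inet_create() does above) buys extra entropy at boot without a separate init hook.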
Re: L2TP support?
- Original Message - From: James Chapman [EMAIL PROTECTED] To: Ingo Oeser [EMAIL PROTECTED] Cc: netdev@vger.kernel.org Sent: Thursday, March 22, 2007 9:13 PM Subject: Re: L2TP support? Yes there is. There's a pppd plugin which comes with the openl2tp project, http://sf.net/projects/openl2tp. OpenL2TP supports both LAC and LNS operation. A patch is also available to allow this driver to be used with another L2TP implementation, l2tpd. Well, I am using a modified rp-l2tp with the pppol2tp kernel module myself so now it accounts for three implementations I guess :-) -Jorge == Jorge Boncompte - Ingenieria y Gestion de RED DTI2 - Desarrollo de la Tecnologia de las Comunicaciones -- C/ Abogado Enriquez Barrios, 5 14004 CORDOBA (SPAIN) Tlf: +34 957 761395 / FAX: +34 957 450380 == - Sin pistachos no hay Rock Roll... - Without wicker a basket cannot be made. == - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Established connections hash function
On Thu, Mar 22, 2007 at 01:53:03PM -0700, Nikolaos D. Bougalis ([EMAIL PROTECTED]) wrote: Grrr, I think I have pointed out several times already that properly distributed values do not change distribution after folding. And it can be seen in all tests (and in the one you pointed to). Yes, I agree that the folding will not be a problem _IF_ the values are properly distributed -- although in that case, the folding is unnecessary. But the fact that the Jenkins distribution didn't change (according to posts you made) after folding says that the output of Jenkins is pretty good to begin with ;) In _some_ cases, but not in all. We can use jhash_2words(laddr, faddr, portpair^inet_ehash_rnd) though. Please explain to me how jhash_2words solves the issue that you claim jhash_3words has, when they both use the same underlying bit-mixer? The $c value is not properly distributed and significantly breaks the overall distribution. An attacker who controls $c (and he does, by controlling ports) can significantly load selected hash chains. Even if we assume that $c is not properly distributed, using a secret cookie and mixing operations from different algebraic groups changes the calculus dramatically. It's no longer straight-forward for the attacker to generate collisions (as it is with the current function) because the '$c' supplied by the attacker is used in conjunction with the secret cookie before __jhash_mix thoroughly mixes the inputs to generate a hash. With the XOR hash an attacker can predict the end result easily; with Jenkins he cannot (easily), but the Jenkins distribution itself (even for usual data) results in too-long chains - there are two problems: 1. easily predicted result 2. broken distribution. The XOR hash has a problem with the first one, Jenkins (in some cases) with the second. I've tested the Jenkins hash extensively. I see no evidence of this improper distribution that you describe.
In fact, about the only person that I've seen advocate this in the archives of netdev is you, and a lot of other very smart people disagree with you, so I consider myself to be in good company. Hmm, I ran tests to select a proper hash for a netchannel implementation (actually the same as sockets) and showed Jenkins' hash problems - it is enough to have a single problem to state that there is a problem, isn't it? Again, from what I've seen from your other posts, I don't believe you've identified any inherent problems with the Jenkins hash. But that aside for a moment, surely you will agree that the ability of an attacker with a few dozen machines under his control to trivially mount an algorithmic complexity attack causing serious performance drops is also a problem with the current code, and one that must be addressed. Please refer to the above two problems - the Jenkins hash does not have a problem with easy end-result prediction; instead it has a distribution problem. Which means that the attacker should not guess hash chains; he should provide specially crafted input and the distribution will be shifted to the higher levels. I will try to decipher the phrase 'whatever it is, it's not there'... It meant that I saw nothing particularly interesting running the example you suggested and looking at the output. This thread for example: http://marc.info/?t=11705761351r=1w=2 I went through most of this thread. I don't see an analysis of the Jenkins. Am I missing something? There is no full analysis; I just posted results I found when selecting a hash for different projects with a background similar to sockets. One of your tests shows there are no problems; try the one I propose, which can even be created in userspace - you do not even want to take into account what I am trying to say to you. I'm not trying to be obnoxious on purpose here, but I don't see the test that you are referring to. Could you be more specific?
http://marc.info/?l=linux-netdevm=117199140430104q=p5 -n -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Established connections hash function
David Miller wrote: From: Eric Dumazet [EMAIL PROTECTED] Welcome to the club :) Ok, how about we put something like the following into 2.6.21? 2.6.21, really? Just to be clear: I had an attack two years ago, I applied your patch, rebooted the machine, and since then the attackers have had to find another way to hurt the machine. Eventually, when I updated the kernel of this machine, I forgot to apply the jhash patch, and the attackers don't know they can try again :) I don't consider this new hash a bug fix at all, i.e. your patch might enter the normal 2.6.22 dev cycle. Maybe a *fix*, independent of the hash function (so that no math expert can insult us), would be to have a *limit*, say... 1000 (something insane), on the length of a hash chain? In my case, I saw lengths of about 3000 two years ago under attack, but the machine was still usable... maybe in half-power mode. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Established connections hash function
On Thu, Mar 22, 2007 at 01:58:34PM -0700, David Miller ([EMAIL PROTECTED]) wrote: From: Nikolaos D. Bougalis [EMAIL PROTECTED] Date: Thu, 22 Mar 2007 12:44:09 -0700 People _have_ had problems. _I_ have had problems. And when someone with a few thousand drones under his control hoses your servers because he can do math and he leaves you with 2-item long chains, _you_ will have problems. No need to further argue this point, the people that matter (ie. me :-) understand it, don't worry.. Call me a loser whose mail will be deleted on arrival, but... jhash_2words(const, const, ((const << 16) | $sport) ^ $random), where $sport is 1-65535 in a loop and $random is a pseudo-random number obtained at start. Which is exactly the case of a web server where the attacker connects to port 80 from the same IP address and different source ports. Result with Jenkins: 1 23880 2 12108 3 4040 4 1019 5 200 6 30 7 8 8 1 Xor: 1 65536 Please do not apply the patch as is; I will devote this day to finding where Jenkins has problems and trying to fix the distribution. If I fail, then it is up to you to decide whether the above results are bad or good. Thank you. -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
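The chain-length test in this mail can be reproduced entirely in userspace. The sketch below implements the classic Jenkins mixer and `jhash_3words()` as found in the 2.6-era linux/jhash.h (to the best of my recollection), buckets 65535 source ports against a constant address pair into a 2^16-entry table, and reports the chain-length histogram. The addresses, seed, and `TABLE_BITS` are illustrative:

```c
#include <stdint.h>
#include <string.h>

#define JHASH_GOLDEN_RATIO 0x9e3779b9u

/* Bob Jenkins' mixer, as used by the 2.6 kernel's jhash_3words(). */
#define __jhash_mix(a, b, c) {          \
    a -= b; a -= c; a ^= (c >> 13);     \
    b -= c; b -= a; b ^= (a << 8);      \
    c -= a; c -= b; c ^= (b >> 13);     \
    a -= b; a -= c; a ^= (c >> 12);     \
    b -= c; b -= a; b ^= (a << 16);     \
    c -= a; c -= b; c ^= (b >> 5);      \
    a -= b; a -= c; a ^= (c >> 3);      \
    b -= c; b -= a; b ^= (a << 10);     \
    c -= a; c -= b; c ^= (b >> 15);     \
}

static uint32_t jhash_3words(uint32_t a, uint32_t b, uint32_t c, uint32_t initval)
{
    a += JHASH_GOLDEN_RATIO;
    b += JHASH_GOLDEN_RATIO;
    c += initval;
    __jhash_mix(a, b, c);
    return c;
}

#define TABLE_BITS 16
#define TABLE_SIZE (1u << TABLE_BITS)

/* Hash one (laddr, faddr, lport, fport) tuple the way the proposed
 * ehash functions do, then fold into the table by masking. */
static unsigned int bucket_of(uint32_t laddr, uint32_t faddr,
                              uint16_t lport, uint16_t fport, uint32_t secret)
{
    return jhash_3words(laddr, faddr,
                        ((uint32_t)lport << 16) | fport, secret)
           & (TABLE_SIZE - 1);
}

/* Insert 65535 connections from one attacker IP (varying source port,
 * fixed server port 80); fill hist[len] with the number of buckets of
 * chain length len (len < 20) and return the longest chain seen. */
static unsigned int longest_chain(uint32_t secret, unsigned int hist[20])
{
    static unsigned int chains[TABLE_SIZE];
    unsigned int i, max = 0;

    memset(chains, 0, sizeof(chains));
    memset(hist, 0, 20 * sizeof(*hist));
    for (i = 1; i <= 65535; i++)
        chains[bucket_of(0x0a000001u, 0x0a000002u,
                         (uint16_t)i, 80, secret)]++;
    for (i = 0; i < TABLE_SIZE; i++) {
        if (chains[i] > max)
            max = chains[i];
        if (chains[i] && chains[i] < 20)
            hist[chains[i]]++;
    }
    return max;
}
```

With 65535 keys spread over 65536 buckets, a well-mixed hash should show a near-Poisson histogram (roughly 24k singleton chains, rapidly decaying tail) regardless of the seed — the shape Evgeniy reports above, as opposed to the degenerate all-singleton XOR result produced by this particular ideal input.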
Re: RFC: Established connections hash function
Evgeniy Polyakov wrote: Call me a loser whose mail will be deleted on arrival, but... jhash_2words(const, const, ((const << 16) | $sport) ^ $random), where $sport is 1-65535 in a loop and $random is a pseudo-random number obtained at start. Which is exactly the case of a web server where the attacker connects to port 80 from the same IP address and different source ports. Result with Jenkins: 1 23880 2 12108 3 4040 4 1019 5 200 6 30 7 8 8 1 Xor: 1 65536 So what? You still think a hash function must be bijective? Come on! You have a machine somewhere that allows 65536 concurrent connections coming from the same IP address? The last problem you have is the nature of the tcp hash function. Don't argue again with your pseudo-science. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Established connections hash function
On Fri, Mar 23, 2007 at 09:17:19AM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote: You have a machine somewhere that allows 65536 concurrent connections coming from the same IP address? Attached is a png file of a botnet scenario: 1000 addresses from the same network (class B for example), each one creating 1024 connections to the same static port. Eric, I agree that the XOR hash is not perfect, and it should be changed, but not blindly. I know perfectly well that a hash function is not bijective, but it must have good distribution. A function like this: int hash(u32 saddr, u16 sport, u32 daddr, u16 dport, u32 rand) { return ((rand ^ saddr ^ daddr) >> 16) ^ ((dport ^ sport) >> 8); } has even worse _distribution_, although you cannot predict its end result due to the random value, and an attacker will not try to do so. -- Evgeniy Polyakov jhash_botnet.png Description: PNG image
Re: RFC: Established connections hash function
On Fri, Mar 23, 2007 at 11:33:32AM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: Eric, I agree that the XOR hash is not perfect, and it should be changed, but not blindly. Attached is a case showing how broken XOR can be in a botnet scenario. -- Evgeniy Polyakov jhash_good.png Description: PNG image
Re: [PATCH 4/5] netem: avoid excessive requeues
David Miller wrote: From: Patrick McHardy [EMAIL PROTECTED] Date: Thu, 22 Mar 2007 21:40:43 +0100 Perhaps we should put this in qdisc_restart; other qdiscs have the same problem. Agreed, patches welcome :) I've tried this, but for some reason it makes TBF stay about 5% under the configured rate. Probably because of late timers; the strange thing is that the 5% happens constantly, even with very low rates. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
XOR hash beauty solved [Was: RFC: Established connections hash function]
Please, do not apply patch as is, I will devote this day to find where jenkins has problems and try to fix distribution. If I will fail, then it is up to you to decide that above results are bad or good. I need to admit that I was partially wrong in my analysis of the Jenkins hash distribution - it does _not_ have problems or artifacts of any kind. The waves found in the tests are the result of folding into the hash_size boundary; the distribution inside the F(32) field is uniform. The XOR hash does not have this problem, because it uses (u32 ^ u16) as one round, which results in a uniform distribution inside F(16) (it is not correct to call that distribution uniform as is, but only taking into account that the u16 values used in the tests were uniformly distributed), and that does not suffer from hash_size boundary folding. Since the XOR hash has 3 rounds, only one of them (the xor of the final u32 values) will suffer from folding, but the tests where this can be determined for sure use constant addresses, so the problem hides again. So, briefly: jhash_2/3words have a safe distribution, but show higher-chain-length waves as a result of folding, which is unavoidable for a general-purpose hash. Now my conscience is calm :) -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.
On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote: On Wed, 21 Mar 2007, Samuel Ortiz wrote: I'm quite sure the leak is in the IrDA code rather than in the ppp or ipv4 one, hence the need for full irda debug... Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE sk_buff leak (maintainer added to cc: hi, Paul :-)). Still investigating whether there are more there. Thanks Guennadi - Guennadi Liakhovetski, Ph.D. DSA Daten- und Systemtechnik GmbH Pascalstr. 28 D-52076 Aachen Germany

Don't leak an sk_buff on interface destruction.

Signed-off-by: G. Liakhovetski [EMAIL PROTECTED]

--- a/drivers/net/ppp_generic.c	2007-03-23 13:04:04.0 +0100
+++ b/drivers/net/ppp_generic.c	2007-03-23 13:05:29.0 +0100
@@ -2544,6 +2544,9 @@
 	ppp->active_filter = NULL;
 #endif /* CONFIG_PPP_FILTER */
 
+	if (ppp->xmit_pending)
+		kfree_skb(ppp->xmit_pending);
+
 	kfree(ppp);
 }
- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: Established connections hash function
Let me start off by saying that I hope I didn't come across as condescending in my previous posts. If I did, it wasn't intended. Now, on to more important things :) jhash_2words(const, const, ((const << 16) | $sport) ^ $random), where $sport is 1-65535 in a loop and $random is a pseudo-random number obtained at start. If you are correct that jhash_3words doesn't properly distribute the bits in 'c' (which I don't believe you are, but let's assume it for a second), then this function will also be broken: jhash_2words calls jhash_3words; jhash_3words adds (a linear operation) initval and c before calling __jhash_mix. So, if there is a problem with passing values under the direct control of the attacker into 'c', both jhash_2words and jhash_3words are affected; in other words, this variant would also be flawed. Which is exactly the case of a web server where the attacker connects to port 80 from the same IP address and different source ports. Result with Jenkins: 1 23880 2 12108 3 4040 4 1019 5 200 6 30 7 8 8 1 Xor: 1 65536 I believe that the XOR results, if generated by your test above, are somewhat meaningless because you're feeding what is ideal input into the XOR hash. Which means that you'll get a perfect distribution. With your input, one might as well suggest that using the remote port will give a perfect distribution, and it will — but only for that specific input. Just for kicks, I went to one of our servers, did netstat -n | grep ESTABLISHED, and ended up with 31072 distinct ip:port/ip:port 4-tuples, which I then hashed into a 65536-bucket table.
Here are the results; feel free to draw your own conclusions: [ I think this should come out looking good; sorry if the whitespace is screwy ]

+---+-------+-------+-------+-------+
|   |  xor  | j2w 1 | j2w 2 | j3w 1 |
+---+-------+-------+-------+-------+
| 0 | 40868 | 40930 | 40767 | 40750 |
| 1 | 19208 | 19119 | 19382 | 19413 |
| 2 |  4636 |  4618 |  4576 |  4554 |
| 3 |   716 |   769 |   715 |   734 |
| 4 |    99 |    91 |    87 |    76 |
| 5 |     7 |     8 |     9 |     9 |
| 6 |     1 |     1 |     0 |     0 |
| 7 |     1 |     0 |     0 |     0 |
| 8 |     0 |     0 |     0 |     0 |
+---+-------+-------+-------+-------+

xor: the vanilla linux function
j2w 1 is my variant: jhash_2words(laddr + rport, raddr + lport, seed)
j2w 2 is your variant: jhash_2words(laddr, raddr, ((rport << 16) ^ lport) ^ seed)
j3w: jhash_3words(laddr, raddr, (rport << 12) + lport, seed)

The seed used for all the Jenkins hashes came from the low-order 32 bits returned by RDTSC, executed when the program started. It remained constant throughout the run. 8 runs were made, to ensure that the seed wasn't causing weirdness; all runs gave almost identical results. The Jenkins hashes did not use the extra 2 right-shifts, employed by the XOR hash, to fold high-order bits into the low-order bits. -n - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix use of uninitialized field in mv643xx_eth
In this driver, the default Ethernet address is first set by calling eth_port_uc_addr_get(), which reads the relevant registers of the corresponding port as initially set by firmware. However, that function used the port_num field, accessed through the private area of the net_device, before it was set. The result was that one board I have ended up with the unicast address set to 00:00:00:00:00:00 (only port 1 is connected on this board). The problem appeared after commit 84dd619e4dc3b0b1c40dafd98c90fd950bce7bc5.

This patch fixes the bug by making eth_port_uc_addr_get() more similar to eth_port_uc_addr_set(), i.e., by taking the port number as the first parameter instead of a pointer to struct net_device.

Signed-off-by: Gabriel Paubert [EMAIL PROTECTED]
--
The minimal patch I first tried consisted of just moving the assignment of mp->port_num to before the call to eth_port_uc_addr_get(). The other question is why the driver never gets the info from the device tree on this PPC board, but that's for another list, despite the fact that I lost some time looking for bugs in the OF interface before stumbling on this use of a field before it was initialized.
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 1ee27c3..ca459e0 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -51,7 +51,7 @@
 #include "mv643xx_eth.h"
 
 /* Static function declarations */
-static void eth_port_uc_addr_get(struct net_device *dev,
+static void eth_port_uc_addr_get(unsigned int port_num,
 						unsigned char *MacAddr);
 static void eth_port_set_multicast_list(struct net_device *);
 static void mv643xx_eth_port_enable_tx(unsigned int port_num,
@@ -1382,7 +1382,7 @@ static int mv643xx_eth_probe(struct platform_device *pdev)
 	port_num = pd->port_number;
 
 	/* set default config values */
-	eth_port_uc_addr_get(dev, dev->dev_addr);
+	eth_port_uc_addr_get(port_num, dev->dev_addr);
 	mp->rx_ring_size = MV643XX_ETH_PORT_DEFAULT_RECEIVE_QUEUE_SIZE;
 	mp->tx_ring_size = MV643XX_ETH_PORT_DEFAULT_TRANSMIT_QUEUE_SIZE;
 
@@ -1883,14 +1883,13 @@ static void eth_port_uc_addr_set(unsigned int eth_port_num,
  * N/A.
  */
-static void eth_port_uc_addr_get(struct net_device *dev, unsigned char *p_addr)
+static void eth_port_uc_addr_get(unsigned int port_num, unsigned char *p_addr)
 {
-	struct mv643xx_private *mp = netdev_priv(dev);
 	unsigned int mac_h;
 	unsigned int mac_l;
 
-	mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(mp->port_num));
-	mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(mp->port_num));
+	mac_h = mv_read(MV643XX_ETH_MAC_ADDR_HIGH(port_num));
+	mac_l = mv_read(MV643XX_ETH_MAC_ADDR_LOW(port_num));
 
 	p_addr[0] = (mac_h >> 24) & 0xff;
 	p_addr[1] = (mac_h >> 16) & 0xff;
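The tail of the last hunk shows the shift-and-mask unpacking of the 48-bit MAC from the two unicast-address registers (4 bytes in the HIGH register, 2 in the LOW one). A quick user-space sanity check of that logic; the helper name and the register values here are made up for illustration:

```c
#include <stdint.h>

/* Unpack a MAC address from the HIGH (4 bytes) and LOW (2 bytes)
 * unicast-address register values, mirroring eth_port_uc_addr_get(). */
static void uc_addr_unpack(uint32_t mac_h, uint32_t mac_l, unsigned char *p_addr)
{
    p_addr[0] = (mac_h >> 24) & 0xff;   /* most significant byte first */
    p_addr[1] = (mac_h >> 16) & 0xff;
    p_addr[2] = (mac_h >> 8) & 0xff;
    p_addr[3] = mac_h & 0xff;
    p_addr[4] = (mac_l >> 8) & 0xff;
    p_addr[5] = mac_l & 0xff;
}
```

If firmware never programmed the registers, mac_h and mac_l read back as 0 and the address comes out all-zero, which is exactly the symptom described when the wrong port's registers were read.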
Re: XOR hash beauty solved [Was: RFC: Established connections hash function]
So, briefly: jhash_2words/jhash_3words have a safe distribution, but show waves of higher-occupancy buckets as a result of folding, which is unavoidable for a general-purpose hash. Thanks for the analysis.

-n
Re: [PATCH 1/1][PKT_CLS] Avoid multiple tree locks
On Thu, 2007-22-03 at 12:36 +0100, Patrick McHardy wrote:
> jamal wrote:
> > On Wed, 2007-21-03 at 15:04 +0100, Patrick McHardy wrote:
>
> We can remove qdisc_tree_lock since with this patch all changes and all
> tree walking happen under the RTNL. We still need to keep dev->queue_lock
> for the data path.

ok, that would work. Should have been obvious to me.

cheers,
jamal
[NET_SCHED 00/11]: pkt_sched.h cleanup + misc changes
These patches fix an off-by-one in netem, clean up pkt_sched.h by removing most of the now-unnecessary PSCHED time macros and turning the two remaining ones into inline functions, consolidate some common filter destruction code, and move the TCQ_F_THROTTLED optimization from netem to qdisc_restart. Please apply, thanks.

 include/net/pkt_sched.h   |   24 +++---
 include/net/red.h         |   10 +++---
 include/net/sch_generic.h |   10 +-
 net/sched/act_police.c    |   17 --
 net/sched/sch_api.c       |   20 ++--
 net/sched/sch_atm.c       |   17 +-
 net/sched/sch_cbq.c       |   76 ++
 net/sched/sch_dsmark.c    |    8
 net/sched/sch_generic.c   |    4 ++
 net/sched/sch_hfsc.c      |   23 ++--
 net/sched/sch_htb.c       |   24 --
 net/sched/sch_ingress.c   |    7
 net/sched/sch_netem.c     |   24 +-
 net/sched/sch_prio.c      |    7
 net/sched/sch_tbf.c       |    9 ++---
 15 files changed, 110 insertions(+), 170 deletions(-)

Patrick McHardy (11):
      [NET_SCHED]: sch_netem: fix off-by-one in send time comparison
      [NET_SCHED]: kill PSCHED_AUDIT_TDIFF
      [NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2
      [NET_SCHED]: kill PSCHED_TLESS
      [NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT
      [NET_SCHED]: kill PSCHED_TDIFF
      [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function
      [NET_SCHED]: turn PSCHED_GET_TIME into inline function
      [NET_SCHED]: Uninline tcf_destroy
      [NET_SCHED]: qdisc: remove unnecessary memory barriers
      [NET_SCHED]: qdisc: avoid dequeue while throttled
[NET_SCHED 01/11]: sch_netem: fix off-by-one in send time comparison
[NET_SCHED]: sch_netem: fix off-by-one in send time comparison

netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether it is allowed to send a packet, which is equivalent to cb->time_to_send < now. Use !PSCHED_TLESS(now, cb->time_to_send) instead to properly handle cb->time_to_send == now.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 9f8c07452088f432c79ac3a8d87d6adebcce57df
tree 42b214f74b8b2d5bd3065e9f63d8048beb4f3bdc
parent 3231f075945001667eafaf325abab8c992b3d1e4
author Patrick McHardy [EMAIL PROTECTED] Thu, 22 Mar 2007 23:57:32 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:26 +0100

 net/sched/sch_netem.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 3e1b633..bc42843 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -286,7 +286,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 
 		/* if more time remaining? */
 		PSCHED_GET_TIME(now);
-		if (PSCHED_TLESS(cb->time_to_send, now)) {
+		if (!PSCHED_TLESS(now, cb->time_to_send)) {
 			pr_debug("netem_dequeue: return skb=%p\n", skb);
 			sch->q.qlen--;
 			return skb;
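The fix hinges on a small boolean identity: for scalar times, !(now < t) is t <= now, whereas the old test t < now wrongly refuses to send when t == now. A tiny model of both checks (PSCHED_TLESS is just the < operator):

```c
typedef long psched_time_t;

#define PSCHED_TLESS(tv1, tv2) ((tv1) < (tv2))

/* old test: may we send the head packet now? */
static int may_send_old(psched_time_t time_to_send, psched_time_t now)
{
    return PSCHED_TLESS(time_to_send, now);     /* t <  now: misses t == now */
}

/* fixed test from the patch above */
static int may_send_new(psched_time_t time_to_send, psched_time_t now)
{
    return !PSCHED_TLESS(now, time_to_send);    /* t <= now */
}
```

The only input where the two disagree is time_to_send == now, which is exactly the off-by-one the changelog describes: a packet whose send time has just arrived was held back for one extra scheduling cycle.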
[NET_SCHED 02/11]: kill PSCHED_AUDIT_TDIFF
[NET_SCHED]: kill PSCHED_AUDIT_TDIFF

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 4a4a3d59dca71f202ab063b909c84c96c8ea09a7
tree 4958bfec571a3330bd023ebe50f7b071f6dc7dd7
parent 9f8c07452088f432c79ac3a8d87d6adebcce57df
author Patrick McHardy [EMAIL PROTECTED] Thu, 22 Mar 2007 23:58:12 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:27 +0100

 include/net/pkt_sched.h |    1 -
 net/sched/sch_cbq.c     |    2 --
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6555e57..276d1ad 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -59,7 +59,6 @@ typedef long psched_tdiff_t;
 #define PSCHED_TADD(tv, delta)		((tv) += (delta))
 #define PSCHED_SET_PASTPERFECT(t)	((t) = 0)
 #define PSCHED_IS_PASTPERFECT(t)	((t) == 0)
-#define	PSCHED_AUDIT_TDIFF(t)
 
 struct qdisc_watchdog {
 	struct hrtimer timer;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index dcd9c31..57ac6c5 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -820,8 +820,6 @@ cbq_update(struct cbq_sched_data *q)
 		idle -= L2T(&q->link, len);
 		idle += L2T(cl, len);
 
-		PSCHED_AUDIT_TDIFF(idle);
-
 		PSCHED_TADD2(q->now, idle, cl->undertime);
 	} else {
 		/* Underlimit */
[NET_SCHED 05/11]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT
[NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT Use direct assignment and comparison instead. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit ec252ac5640ea38d3630cdb97333c398a75391b9 tree bce7b2c63ffb0694942484418f1adf08ed78292d parent 4f8fc418f88c0b7ee6e726b05f27c42d8e20593c author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:01:32 +0100 committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:29 +0100 include/net/pkt_sched.h |3 +-- include/net/red.h |4 ++-- net/sched/sch_cbq.c | 17 - net/sched/sch_netem.c |2 +- 4 files changed, 12 insertions(+), 14 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 49325ff..c40147a 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -54,8 +54,7 @@ typedef long psched_tdiff_t; #define PSCHED_TDIFF(tv1, tv2) (long)((tv1) - (tv2)) #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \ min_t(long long, (tv1) - (tv2), bound) -#define PSCHED_SET_PASTPERFECT(t) ((t) = 0) -#define PSCHED_IS_PASTPERFECT(t) ((t) == 0) +#define PSCHED_PASTPERFECT 0 struct qdisc_watchdog { struct hrtimer timer; diff --git a/include/net/red.h b/include/net/red.h index a4eb379..d9e1149 100644 --- a/include/net/red.h +++ b/include/net/red.h @@ -151,7 +151,7 @@ static inline void red_set_parms(struct red_parms *p, static inline int red_is_idling(struct red_parms *p) { - return !PSCHED_IS_PASTPERFECT(p-qidlestart); + return p-qidlestart != PSCHED_PASTPERFECT; } static inline void red_start_of_idle_period(struct red_parms *p) @@ -161,7 +161,7 @@ static inline void red_start_of_idle_period(struct red_parms *p) static inline void red_end_of_idle_period(struct red_parms *p) { - PSCHED_SET_PASTPERFECT(p-qidlestart); + p-qidlestart = PSCHED_PASTPERFECT; } static inline void red_restart(struct red_parms *p) diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index 9e6cdab..2bb271b 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -738,7 +738,7 @@ 
cbq_update_toplevel(struct cbq_sched_data *q, struct cbq_class *cl, if (cl q-toplevel = borrowed-level) { if (cl-q-q.qlen 1) { do { - if (PSCHED_IS_PASTPERFECT(borrowed-undertime)) { + if (borrowed-undertime == PSCHED_PASTPERFECT) { q-toplevel = borrowed-level; return; } @@ -824,7 +824,7 @@ cbq_update(struct cbq_sched_data *q) } else { /* Underlimit */ - PSCHED_SET_PASTPERFECT(cl-undertime); + cl-undertime = PSCHED_PASTPERFECT; if (avgidle cl-maxidle) cl-avgidle = cl-maxidle; else @@ -845,7 +845,7 @@ cbq_under_limit(struct cbq_class *cl) if (cl-tparent == NULL) return cl; - if (PSCHED_IS_PASTPERFECT(cl-undertime) || q-now = cl-undertime) { + if (cl-undertime == PSCHED_PASTPERFECT || q-now = cl-undertime) { cl-delayed = 0; return cl; } @@ -868,8 +868,7 @@ cbq_under_limit(struct cbq_class *cl) } if (cl-level q-toplevel) return NULL; - } while (!PSCHED_IS_PASTPERFECT(cl-undertime) -q-now cl-undertime); + } while (cl-undertime != PSCHED_PASTPERFECT q-now cl-undertime); cl-delayed = 0; return cl; @@ -1054,11 +1053,11 @@ cbq_dequeue(struct Qdisc *sch) */ if (q-toplevel == TC_CBQ_MAXLEVEL - PSCHED_IS_PASTPERFECT(q-link.undertime)) + q-link.undertime == PSCHED_PASTPERFECT) break; q-toplevel = TC_CBQ_MAXLEVEL; - PSCHED_SET_PASTPERFECT(q-link.undertime); + q-link.undertime = PSCHED_PASTPERFECT; } /* No packets in scheduler or nobody wants to give them to us :-( @@ -1289,7 +1288,7 @@ cbq_reset(struct Qdisc* sch) qdisc_reset(cl-q); cl-next_alive = NULL; - PSCHED_SET_PASTPERFECT(cl-undertime); + cl-undertime = PSCHED_PASTPERFECT; cl-avgidle = cl-maxidle; cl-deficit = cl-quantum; cl-cpriority = cl-priority; @@ -1650,7 +1649,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg, cl-xstats.avgidle = cl-avgidle; cl-xstats.undertime = 0; - if (!PSCHED_IS_PASTPERFECT(cl-undertime)) + if (cl-undertime != PSCHED_PASTPERFECT) cl-xstats.undertime = PSCHED_TDIFF(cl-undertime,
[NET_SCHED 07/11]: turn PSCHED_TDIFF_SAFE into inline function
[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function Also rename to psched_tdiff_bounded. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit c86b236046f7de4094ceb2b2cb069c32969ee36c tree 27c99a0d619bcabf384838adeae3c0469472b86b parent d72d57707edf96c31e62da0841faf59c011dcd92 author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:01:59 +0100 committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:30 +0100 include/net/pkt_sched.h |8 ++-- include/net/red.h |2 +- net/sched/act_police.c |8 net/sched/sch_htb.c |4 ++-- net/sched/sch_tbf.c |2 +- 5 files changed, 14 insertions(+), 10 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 1639737..e6b1da0 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -51,10 +51,14 @@ typedef longpsched_tdiff_t; #define PSCHED_GET_TIME(stamp) \ ((stamp) = PSCHED_NS2US(ktime_to_ns(ktime_get( -#define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \ - min_t(long long, (tv1) - (tv2), bound) #define PSCHED_PASTPERFECT 0 +static inline psched_tdiff_t +psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound) +{ + return min(tv1 - tv2, bound); +} + struct qdisc_watchdog { struct hrtimer timer; struct Qdisc*qdisc; diff --git a/include/net/red.h b/include/net/red.h index d9e1149..0bc1691 100644 --- a/include/net/red.h +++ b/include/net/red.h @@ -178,7 +178,7 @@ static inline unsigned long red_calc_qavg_from_idle_time(struct red_parms *p) int shift; PSCHED_GET_TIME(now); - us_idle = PSCHED_TDIFF_SAFE(now, p-qidlestart, p-Scell_max); + us_idle = psched_tdiff_bounded(now, p-qidlestart, p-Scell_max); /* * The problem: ideally, average length queue recalcultion should diff --git a/net/sched/act_police.c b/net/sched/act_police.c index 0a5679e..65d60a3 100644 --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -298,8 +298,8 @@ static int tcf_act_police(struct sk_buff *skb, struct tc_action *a, PSCHED_GET_TIME(now); - toks = PSCHED_TDIFF_SAFE(now, 
police-tcfp_t_c, -police-tcfp_burst); + toks = psched_tdiff_bounded(now, police-tcfp_t_c, + police-tcfp_burst); if (police-tcfp_P_tab) { ptoks = toks + police-tcfp_ptoks; if (ptoks (long)L2T_P(police, police-tcfp_mtu)) @@ -544,8 +544,8 @@ int tcf_police(struct sk_buff *skb, struct tcf_police *police) } PSCHED_GET_TIME(now); - toks = PSCHED_TDIFF_SAFE(now, police-tcfp_t_c, -police-tcfp_burst); + toks = psched_tdiff_bounded(now, police-tcfp_t_c, + police-tcfp_burst); if (police-tcfp_P_tab) { ptoks = toks + police-tcfp_ptoks; if (ptoks (long)L2T_P(police, police-tcfp_mtu)) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index d265ac4..f629ce2 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -729,7 +729,7 @@ static void htb_charge_class(struct htb_sched *q, struct htb_class *cl, cl-T = toks while (cl) { - diff = PSCHED_TDIFF_SAFE(q-now, cl-t_c, (u32) cl-mbuffer); + diff = psched_tdiff_bounded(q-now, cl-t_c, cl-mbuffer); if (cl-level = level) { if (cl-level == level) cl-xstats.lends++; @@ -789,7 +789,7 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level) return cl-pq_key; htb_safe_rb_erase(p, q-wait_pq + level); - diff = PSCHED_TDIFF_SAFE(q-now, cl-t_c, (u32) cl-mbuffer); + diff = psched_tdiff_bounded(q-now, cl-t_c, cl-mbuffer); htb_change_class_mode(q, cl, diff); if (cl-cmode != HTB_CAN_SEND) htb_add_to_wait_tree(q, cl, diff); diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c index 626ce96..da9f40e 100644 --- a/net/sched/sch_tbf.c +++ b/net/sched/sch_tbf.c @@ -201,7 +201,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc* sch) PSCHED_GET_TIME(now); - toks = PSCHED_TDIFF_SAFE(now, q-t_c, q-buffer); + toks = psched_tdiff_bounded(now, q-t_c, q-buffer); if (q-P_tab) { ptoks = toks + q-ptokens; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
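The macro-to-inline conversion above is behavior-preserving for its callers; what psched_tdiff_bounded computes is simply the elapsed time clamped to a cap, so token-bucket code never credits more than one full burst after a long idle period. A user-space model (the psched typedefs are stand-ins, an assumption for illustration):

```c
typedef unsigned long long psched_time_t;   /* stand-in for the kernel typedef */
typedef long long psched_tdiff_t;

/* Elapsed time tv1 - tv2, clamped to 'bound', mirroring the new inline:
 * callers such as TBF/HTB use it so a long idle gap only ever refills
 * up to one burst worth of tokens. */
static inline psched_tdiff_t
psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound)
{
    psched_time_t diff = tv1 - tv2;

    return (psched_tdiff_t)(diff < bound ? diff : bound);
}
```

Compared with the old PSCHED_TDIFF_SAFE macro, the inline gives type checking of its arguments and single evaluation of each, with identical results.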
[NET_SCHED 06/11]: kill PSCHED_TDIFF
[NET_SCHED]: kill PSCHED_TDIFF Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit d72d57707edf96c31e62da0841faf59c011dcd92 tree 8b6192c94e025fb8b6e1be3b02526d4792bd4fa1 parent ec252ac5640ea38d3630cdb97333c398a75391b9 author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:01:47 +0100 committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:29 +0100 include/net/pkt_sched.h |1 - net/sched/sch_cbq.c | 14 +++--- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index c40147a..1639737 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -51,7 +51,6 @@ typedef long psched_tdiff_t; #define PSCHED_GET_TIME(stamp) \ ((stamp) = PSCHED_NS2US(ktime_to_ns(ktime_get( -#define PSCHED_TDIFF(tv1, tv2) (long)((tv1) - (tv2)) #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \ min_t(long long, (tv1) - (tv2), bound) #define PSCHED_PASTPERFECT 0 diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index 2bb271b..f9e8403 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -386,7 +386,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl) psched_tdiff_t incr; PSCHED_GET_TIME(now); - incr = PSCHED_TDIFF(now, q-now_rt); + incr = now - q-now_rt; now = q-now + incr; do { @@ -474,7 +474,7 @@ cbq_requeue(struct sk_buff *skb, struct Qdisc *sch) static void cbq_ovl_classic(struct cbq_class *cl) { struct cbq_sched_data *q = qdisc_priv(cl-qdisc); - psched_tdiff_t delay = PSCHED_TDIFF(cl-undertime, q-now); + psched_tdiff_t delay = cl-undertime - q-now; if (!cl-delayed) { delay += cl-offtime; @@ -509,7 +509,7 @@ static void cbq_ovl_classic(struct cbq_class *cl) psched_tdiff_t base_delay = q-wd_expires; for (b = cl-borrow; b; b = b-borrow) { - delay = PSCHED_TDIFF(b-undertime, q-now); + delay = b-undertime - q-now; if (delay base_delay) { if (delay = 0) delay = 1; @@ -547,7 +547,7 @@ static void cbq_ovl_rclassic(struct cbq_class *cl) static void cbq_ovl_delay(struct cbq_class 
*cl) { struct cbq_sched_data *q = qdisc_priv(cl-qdisc); - psched_tdiff_t delay = PSCHED_TDIFF(cl-undertime, q-now); + psched_tdiff_t delay = cl-undertime - q-now; if (!cl-delayed) { psched_time_t sched = q-now; @@ -776,7 +776,7 @@ cbq_update(struct cbq_sched_data *q) idle = (now - last) - last_pktlen/rate */ - idle = PSCHED_TDIFF(q-now, cl-last); + idle = q-now - cl-last; if ((unsigned long)idle 128*1024*1024) { avgidle = cl-maxidle; } else { @@ -1004,7 +1004,7 @@ cbq_dequeue(struct Qdisc *sch) psched_tdiff_t incr; PSCHED_GET_TIME(now); - incr = PSCHED_TDIFF(now, q-now_rt); + incr = now - q-now_rt; if (q-tx_class) { psched_tdiff_t incr2; @@ -1650,7 +1650,7 @@ cbq_dump_class_stats(struct Qdisc *sch, unsigned long arg, cl-xstats.undertime = 0; if (cl-undertime != PSCHED_PASTPERFECT) - cl-xstats.undertime = PSCHED_TDIFF(cl-undertime, q-now); + cl-xstats.undertime = cl-undertime - q-now; if (gnet_stats_copy_basic(d, cl-bstats) 0 || #ifdef CONFIG_NET_ESTIMATOR - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[NET_SCHED 09/11]: Uninline tcf_destroy
[NET_SCHED]: Unline tcf_destroy Uninline tcf_destroy and add a helper function to destroy an entire filter chain. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit 8da4bcec7e54c8344c8fd77c72a61f24ce12cfc3 tree 7f36f4af8e9413637fb9b65501f281fd8a915da3 parent 231788aa3b9eef85b72ecac2e33441bd842ce3f4 author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:31 +0100 committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:31 +0100 include/net/sch_generic.h | 10 ++ net/sched/sch_api.c | 18 ++ net/sched/sch_atm.c | 17 ++--- net/sched/sch_cbq.c | 14 ++ net/sched/sch_dsmark.c|8 +--- net/sched/sch_hfsc.c | 13 + net/sched/sch_htb.c | 14 ++ net/sched/sch_ingress.c |7 +-- net/sched/sch_prio.c |7 +-- 9 files changed, 30 insertions(+), 78 deletions(-) diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index a3f4ddd..1b8e351 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -177,14 +177,8 @@ extern void qdisc_tree_decrease_qlen(struct Qdisc *qdisc, unsigned int n); extern struct Qdisc *qdisc_alloc(struct net_device *dev, struct Qdisc_ops *ops); extern struct Qdisc *qdisc_create_dflt(struct net_device *dev, struct Qdisc_ops *ops, u32 parentid); - -static inline void -tcf_destroy(struct tcf_proto *tp) -{ - tp-ops-destroy(tp); - module_put(tp-ops-owner); - kfree(tp); -} +extern void tcf_destroy(struct tcf_proto *tp); +extern void tcf_destroy_chain(struct tcf_proto *fl); static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff_head *list) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 5873250..5b5bce0 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1220,6 +1220,24 @@ reclassify: return -1; } +void tcf_destroy(struct tcf_proto *tp) +{ + tp-ops-destroy(tp); + module_put(tp-ops-owner); + kfree(tp); +} + +void tcf_destroy_chain(struct tcf_proto *fl) +{ + struct tcf_proto *tp; + + while ((tp = fl) != NULL) { + fl = tp-next; + tcf_destroy(tp); + } +} 
+EXPORT_SYMBOL(tcf_destroy_chain); + #ifdef CONFIG_PROC_FS static int psched_show(struct seq_file *seq, void *v) { diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c index 0cc3c9b..be7d299 100644 --- a/net/sched/sch_atm.c +++ b/net/sched/sch_atm.c @@ -158,19 +158,6 @@ static unsigned long atm_tc_bind_filter(struct Qdisc *sch, return atm_tc_get(sch,classid); } - -static void destroy_filters(struct atm_flow_data *flow) -{ - struct tcf_proto *filter; - - while ((filter = flow-filter_list)) { - DPRINTK(destroy_filters: destroying filter %p\n,filter); - flow-filter_list = filter-next; - tcf_destroy(filter); - } -} - - /* * atm_tc_put handles all destructions, including the ones that are explicitly * requested (atm_tc_destroy, etc.). The assumption here is that we never drop @@ -195,7 +182,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl) *prev = flow-next; DPRINTK(atm_tc_put: qdisc %p\n,flow-q); qdisc_destroy(flow-q); - destroy_filters(flow); + tcf_destroy_chain(flow-filter_list); if (flow-sock) { DPRINTK(atm_tc_put: f_count %d\n, file_count(flow-sock-file)); @@ -611,7 +598,7 @@ static void atm_tc_destroy(struct Qdisc *sch) DPRINTK(atm_tc_destroy(sch %p,[qdisc %p])\n,sch,p); /* races ? 
*/ while ((flow = p-flows)) { - destroy_filters(flow); + tcf_destroy_chain(flow-filter_list); if (flow-ref 1) printk(KERN_ERR atm_destroy: %p-ref = %d\n,flow, flow-ref); diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index 414a97c..a294542 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -1717,23 +1717,13 @@ static unsigned long cbq_get(struct Qdisc *sch, u32 classid) return 0; } -static void cbq_destroy_filters(struct cbq_class *cl) -{ - struct tcf_proto *tp; - - while ((tp = cl-filter_list) != NULL) { - cl-filter_list = tp-next; - tcf_destroy(tp); - } -} - static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl) { struct cbq_sched_data *q = qdisc_priv(sch); BUG_TRAP(!cl-filters); - cbq_destroy_filters(cl); + tcf_destroy_chain(cl-filter_list); qdisc_destroy(cl-q); qdisc_put_rtab(cl-R_tab); #ifdef CONFIG_NET_ESTIMATOR @@ -1760,7 +1750,7 @@ cbq_destroy(struct Qdisc* sch) */ for (h = 0; h 16; h++) for (cl = q-classes[h]; cl; cl = cl-next) -
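The tcf_destroy_chain() helper introduced above is the classic "pop the head until empty" loop that this patch deduplicates out of atm, cbq, hfsc, htb, and the others. A self-contained model of the same pattern; the struct and the destroy function are stand-ins for struct tcf_proto and tcf_destroy():

```c
#include <stdlib.h>

struct proto {              /* stand-in for struct tcf_proto */
    struct proto *next;
};

static int destroyed;       /* counts destroy calls, models tcf_destroy() */

static void proto_destroy(struct proto *tp)
{
    destroyed++;
    free(tp);
}

/* Mirrors tcf_destroy_chain(): detach the head before freeing it, so the
 * node is never touched after it has been handed to the destructor. */
static void proto_destroy_chain(struct proto *fl)
{
    struct proto *tp;

    while ((tp = fl) != NULL) {
        fl = tp->next;
        proto_destroy(tp);
    }
}
```

Each qdisc previously open-coded this loop in its own destroy_filters()/cbq_destroy_filters(); moving it next to tcf_destroy() leaves one copy to audit.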
[NET_SCHED 03/11]: kill PSCHED_TADD/PSCHED_TADD2
[NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2 Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit 145a1a6010c6b852ffab28c110d8911a6161aa8b tree 84b7bf284ea3b870a9b5fd9dae3adaad9979dc26 parent 4a4a3d59dca71f202ab063b909c84c96c8ea09a7 author Patrick McHardy [EMAIL PROTECTED] Thu, 22 Mar 2007 23:58:42 +0100 committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:27 +0100 include/net/pkt_sched.h |2 -- net/sched/sch_cbq.c | 12 ++-- net/sched/sch_netem.c |2 +- 3 files changed, 7 insertions(+), 9 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 276d1ad..32cdf01 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -55,8 +55,6 @@ typedef long psched_tdiff_t; #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \ min_t(long long, (tv1) - (tv2), bound) #define PSCHED_TLESS(tv1, tv2) ((tv1) (tv2)) -#define PSCHED_TADD2(tv, delta, tv_res) ((tv_res) = (tv) + (delta)) -#define PSCHED_TADD(tv, delta) ((tv) += (delta)) #define PSCHED_SET_PASTPERFECT(t) ((t) = 0) #define PSCHED_IS_PASTPERFECT(t) ((t) == 0) diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index 57ac6c5..290b26b 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -387,7 +387,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl) PSCHED_GET_TIME(now); incr = PSCHED_TDIFF(now, q-now_rt); - PSCHED_TADD2(q-now, incr, now); + now = q-now + incr; do { if (PSCHED_TLESS(cl-undertime, now)) { @@ -492,7 +492,7 @@ static void cbq_ovl_classic(struct cbq_class *cl) cl-avgidle = cl-minidle; if (delay = 0) delay = 1; - PSCHED_TADD2(q-now, delay, cl-undertime); + cl-undertime = q-now + delay; cl-xstats.overactions++; cl-delayed = 1; @@ -558,7 +558,7 @@ static void cbq_ovl_delay(struct cbq_class *cl) delay -= (-cl-avgidle) - ((-cl-avgidle) cl-ewma_log); if (cl-avgidle cl-minidle) cl-avgidle = cl-minidle; - PSCHED_TADD2(q-now, delay, cl-undertime); + cl-undertime = q-now + delay; if (delay 0) { sched += delay + cl-penalty; @@ -820,7 +820,7 @@ 
cbq_update(struct cbq_sched_data *q) idle -= L2T(q-link, len); idle += L2T(cl, len); - PSCHED_TADD2(q-now, idle, cl-undertime); + cl-undertime = q-now + idle; } else { /* Underlimit */ @@ -1018,12 +1018,12 @@ cbq_dequeue(struct Qdisc *sch) cbq_time = max(real_time, work); */ incr2 = L2T(q-link, q-tx_len); - PSCHED_TADD(q-now, incr2); + q-now += incr2; cbq_update(q); if ((incr -= incr2) 0) incr = 0; } - PSCHED_TADD(q-now, incr); + q-now += incr; q-now_rt = now; for (;;) { diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index bc42843..6044ae7 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -218,7 +218,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch) q-delay_cor, q-delay_dist); PSCHED_GET_TIME(now); - PSCHED_TADD2(now, delay, cb-time_to_send); + cb-time_to_send = now + delay; ++q-counter; ret = q-qdisc-enqueue(skb, q-qdisc); } else { - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[NET_SCHED 08/11]: turn PSCHED_GET_TIME into inline function
[NET_SCHED]: turn PSCHED_GET_TIME into inline function Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit 231788aa3b9eef85b72ecac2e33441bd842ce3f4 tree f302e509ec32a86bc9a6c3712d188fc91455a213 parent c86b236046f7de4094ceb2b2cb069c32969ee36c author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:02:12 +0100 committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:30 +0100 include/net/pkt_sched.h |8 +--- include/net/red.h |4 ++-- net/sched/act_police.c |9 - net/sched/sch_cbq.c | 10 +- net/sched/sch_hfsc.c| 10 -- net/sched/sch_htb.c |6 +++--- net/sched/sch_netem.c |8 +++- net/sched/sch_tbf.c |7 +++ 8 files changed, 29 insertions(+), 33 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index e6b1da0..b2cc9a8 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -48,11 +48,13 @@ typedef longpsched_tdiff_t; #define PSCHED_NS2US(x)((x) 10) #define PSCHED_TICKS_PER_SEC PSCHED_NS2US(NSEC_PER_SEC) -#define PSCHED_GET_TIME(stamp) \ - ((stamp) = PSCHED_NS2US(ktime_to_ns(ktime_get( - #define PSCHED_PASTPERFECT 0 +static inline psched_time_t psched_get_time(void) +{ + return PSCHED_NS2US(ktime_to_ns(ktime_get())); +} + static inline psched_tdiff_t psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound) { diff --git a/include/net/red.h b/include/net/red.h index 0bc1691..3cf31d4 100644 --- a/include/net/red.h +++ b/include/net/red.h @@ -156,7 +156,7 @@ static inline int red_is_idling(struct red_parms *p) static inline void red_start_of_idle_period(struct red_parms *p) { - PSCHED_GET_TIME(p-qidlestart); + p-qidlestart = psched_get_time(); } static inline void red_end_of_idle_period(struct red_parms *p) @@ -177,7 +177,7 @@ static inline unsigned long red_calc_qavg_from_idle_time(struct red_parms *p) long us_idle; int shift; - PSCHED_GET_TIME(now); + now = psched_get_time(); us_idle = psched_tdiff_bounded(now, p-qidlestart, p-Scell_max); /* diff --git a/net/sched/act_police.c 
b/net/sched/act_police.c index 65d60a3..616f465 100644 --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -241,7 +241,7 @@ override: if (ret != ACT_P_CREATED) return ret; - PSCHED_GET_TIME(police-tcfp_t_c); + police-tcfp_t_c = psched_get_time(); police-tcf_index = parm-index ? parm-index : tcf_hash_new_index(police_idx_gen, police_hash_info); h = tcf_hash(police-tcf_index, POL_TAB_MASK); @@ -296,8 +296,7 @@ static int tcf_act_police(struct sk_buff *skb, struct tc_action *a, return police-tcfp_result; } - PSCHED_GET_TIME(now); - + now = psched_get_time(); toks = psched_tdiff_bounded(now, police-tcfp_t_c, police-tcfp_burst); if (police-tcfp_P_tab) { @@ -495,7 +494,7 @@ struct tcf_police *tcf_police_locate(struct rtattr *rta, struct rtattr *est) } if (police-tcfp_P_tab) police-tcfp_ptoks = L2T_P(police, police-tcfp_mtu); - PSCHED_GET_TIME(police-tcfp_t_c); + police-tcfp_t_c = psched_get_time(); police-tcf_index = parm-index ? parm-index : tcf_police_new_index(); police-tcf_action = parm-action; @@ -543,7 +542,7 @@ int tcf_police(struct sk_buff *skb, struct tcf_police *police) return police-tcfp_result; } - PSCHED_GET_TIME(now); + now = psched_get_time(); toks = psched_tdiff_bounded(now, police-tcfp_t_c, police-tcfp_burst); if (police-tcfp_P_tab) { diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index f9e8403..414a97c 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -385,7 +385,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl) psched_time_t now; psched_tdiff_t incr; - PSCHED_GET_TIME(now); + now = psched_get_time(); incr = now - q-now_rt; now = q-now + incr; @@ -654,7 +654,7 @@ static enum hrtimer_restart cbq_undelay(struct hrtimer *timer) psched_tdiff_t delay = 0; unsigned pmask; - PSCHED_GET_TIME(now); + now = psched_get_time(); pmask = q-pmask; q-pmask = 0; @@ -1003,7 +1003,7 @@ cbq_dequeue(struct Qdisc *sch) psched_time_t now; psched_tdiff_t incr; - PSCHED_GET_TIME(now); + now = psched_get_time(); incr = now - 
q-now_rt; if (q-tx_class) { @@ -1277,7 +1277,7 @@ cbq_reset(struct Qdisc* sch) qdisc_watchdog_cancel(q-watchdog);
[NET_SCHED 04/11]: kill PSCHED_TLESS
[NET_SCHED]: kill PSCHED_TLESS

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 4f8fc418f88c0b7ee6e726b05f27c42d8e20593c
tree c70508f2e0174aef42aaf99bf0cef4184d7ed07e
parent 145a1a6010c6b852ffab28c110d8911a6161aa8b
author Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 00:00:55 +0100
committer Patrick McHardy [EMAIL PROTECTED] Fri, 23 Mar 2007 10:31:28 +0100

 include/net/pkt_sched.h |    1 -
 net/sched/sch_cbq.c     |    7 +++----
 net/sched/sch_netem.c   |    6 +++---
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 32cdf01..49325ff 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -54,7 +54,6 @@ typedef long psched_tdiff_t;
 #define PSCHED_TDIFF(tv1, tv2) (long)((tv1) - (tv2))
 #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
 	min_t(long long, (tv1) - (tv2), bound)
-#define PSCHED_TLESS(tv1, tv2) ((tv1) < (tv2))
 #define PSCHED_SET_PASTPERFECT(t) ((t) = 0)
 #define PSCHED_IS_PASTPERFECT(t) ((t) == 0)
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 290b26b..9e6cdab 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -390,7 +390,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl)
 		now = q->now + incr;

 		do {
-			if (PSCHED_TLESS(cl->undertime, now)) {
+			if (cl->undertime < now) {
 				q->toplevel = cl->level;
 				return;
 			}
@@ -845,8 +845,7 @@ cbq_under_limit(struct cbq_class *cl)
 	if (cl->tparent == NULL)
 		return cl;

-	if (PSCHED_IS_PASTPERFECT(cl->undertime) ||
-	    !PSCHED_TLESS(q->now, cl->undertime)) {
+	if (PSCHED_IS_PASTPERFECT(cl->undertime) || q->now >= cl->undertime) {
 		cl->delayed = 0;
 		return cl;
 	}
@@ -870,7 +869,7 @@ cbq_under_limit(struct cbq_class *cl)
 		if (cl->level > q->toplevel)
 			return NULL;
 	} while (!PSCHED_IS_PASTPERFECT(cl->undertime) &&
-		 PSCHED_TLESS(q->now, cl->undertime));
+		 q->now < cl->undertime);

 	cl->delayed = 0;
 	return cl;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 6044ae7..5d571aa 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -286,7 +286,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 		/* if more time remaining? */
 		PSCHED_GET_TIME(now);
-		if (!PSCHED_TLESS(now, cb->time_to_send)) {
+		if (cb->time_to_send <= now) {
 			pr_debug("netem_dequeue: return skb=%p\n", skb);
 			sch->q.qlen--;
 			return skb;
@@ -494,7 +494,7 @@ static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 	if (likely(skb_queue_len(list) < q->limit)) {
 		/* Optimize for add at tail */
-		if (likely(skb_queue_empty(list) || !PSCHED_TLESS(tnext, q->oldest))) {
+		if (likely(skb_queue_empty(list) || tnext >= q->oldest)) {
 			q->oldest = tnext;
 			return qdisc_enqueue_tail(nskb, sch);
 		}
@@ -503,7 +503,7 @@ static int tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
 		const struct netem_skb_cb *cb
 			= (const struct netem_skb_cb *)skb->cb;

-		if (!PSCHED_TLESS(tnext, cb->time_to_send))
+		if (tnext >= cb->time_to_send)
 			break;
 	}
- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] remove NLA_STRING NUL trimming
Looking through the netlink/attr.c code I noticed that NLA_STRING attributes that end with a binary NUL have it removed before passing it to the consumer. For wireless, we have a few places where we need to be able to accept any (even binary) values, for example for the SSID; the SSID can validly end with \0 and I'd still love to be able to take advantage of NLA_STRING and .len = 32 so I don't need to check the length myself. However, given the code above, an SSID with a terminating \0 would be reduced by one character. This patch removes the trimming.

Signed-off-by: Johannes Berg [EMAIL PROTECTED]

---
This shouldn't break things if all users that rely on terminating NULs have migrated to NLA_NUL_STRING already. I don't see many users of NLA_STRING still, but if we can't make that change because some users still rely on it trimming the NUL I could also make a patch that introduces NLA_BIN_STRING with the changed semantics.

--- wireless-dev.orig/net/netlink/attr.c	2007-03-23 00:06:41.293435409 +0100
+++ wireless-dev/net/netlink/attr.c	2007-03-23 00:07:13.753435409 +0100
@@ -56,15 +56,8 @@ static int validate_nla(struct nlattr *n
 		if (attrlen < 1)
 			return -ERANGE;

-		if (pt->len) {
-			char *buf = nla_data(nla);
-
-			if (buf[attrlen - 1] == '\0')
-				attrlen--;
-
-			if (attrlen > pt->len)
-				return -ERANGE;
-		}
+		if (pt->len && attrlen > pt->len)
+			return -ERANGE;
 		break;

 	default:
- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] New SCTP functionality for 2.6.22
This patch series implements additional SCTP socket options. This was originally submitted too late for 2.6.21, so I am re-submitting for 2.6.22. Please consider applying. Thanks -vlad - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] [SCTP] Implement SCTP_PARTIAL_DELIVERY_POINT option.
This option induces partial delivery to run as soon as the specified amount of data has been accumulated on the association. However, we give preference to fully reassembled messages over PD messages. In any case, window and buffer is freed up. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] --- include/net/sctp/structs.h |1 + include/net/sctp/user.h|2 + net/sctp/socket.c | 57 +++ net/sctp/ulpqueue.c| 64 +--- 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 6883c7d..f4bb396 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -307,6 +307,7 @@ struct sctp_sock { __u8 v4mapped; __u8 frag_interleave; __u32 adaptation_ind; + __u32 pd_point; atomic_t pd_mode; /* Receive to here while partial delivery is in effect. */ diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h index e773160..9a83527 100644 --- a/include/net/sctp/user.h +++ b/include/net/sctp/user.h @@ -99,6 +99,8 @@ enum sctp_optname { #define SCTP_CONTEXT SCTP_CONTEXT SCTP_FRAGMENT_INTERLEAVE, #define SCTP_FRAGMENT_INTERLEAVE SCTP_FRAGMENT_INTERLEAVE + SCTP_PARTIAL_DELIVERY_POINT,/* Set/Get partial delivery point */ +#define SCTP_PARTIAL_DELIVERY_POINT SCTP_PARTIAL_DELIVERY_POINT /* Internal Socket Options. Some of the sctp library functions are * implemented using these socket options. diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 912073d..2d0c2ee 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -2826,6 +2826,32 @@ static int sctp_setsockopt_fragment_interleave(struct sock *sk, return 0; } +/* + * 7.1.25. Set or Get the sctp partial delivery point + * (SCTP_PARTIAL_DELIVERY_POINT) + * This option will set or get the SCTP partial delivery point. This + * point is the size of a message where the partial delivery API will be + * invoked to help free up rwnd space for the peer. Setting this to a + * lower value will cause partial delivery's to happen more often. 
The
+ * calls argument is an integer that sets or gets the partial delivery
+ * point.
+ */
+static int sctp_setsockopt_partial_delivery_point(struct sock *sk,
+						  char __user *optval,
+						  int optlen)
+{
+	u32 val;
+
+	if (optlen != sizeof(u32))
+		return -EINVAL;
+	if (get_user(val, (int __user *)optval))
+		return -EFAULT;
+
+	sctp_sk(sk)->pd_point = val;
+
+	return 0; /* is this the right error code? */
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -2905,6 +2931,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
 	case SCTP_DELAYED_ACK_TIME:
 		retval = sctp_setsockopt_delayed_ack_time(sk, optval, optlen);
 		break;
+	case SCTP_PARTIAL_DELIVERY_POINT:
+		retval = sctp_setsockopt_partial_delivery_point(sk, optval, optlen);
+		break;

 	case SCTP_INITMSG:
 		retval = sctp_setsockopt_initmsg(sk, optval, optlen);
@@ -4596,6 +4625,30 @@ static int sctp_getsockopt_fragment_interleave(struct sock *sk, int len,
 	return 0;
 }

+/*
+ * 7.1.25. Set or Get the sctp partial delivery point
+ *	   (chapter and verse is quoted at sctp_setsockopt_partial_delivery_point())
+ */
+static int sctp_getsockopt_partial_delivery_point(struct sock *sk, int len,
+						  char __user *optval,
+						  int __user *optlen)
+{
+	u32 val;
+
+	if (len < sizeof(u32))
+		return -EINVAL;
+
+	len = sizeof(u32);
+
+	val = sctp_sk(sk)->pd_point;
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, &val, len))
+		return -EFAULT;
+
+	return 0;
+}
+
 SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 				char __user *optval, int __user *optlen)
 {
@@ -4712,6 +4765,10 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 		retval = sctp_getsockopt_fragment_interleave(sk, len, optval,
 							     optlen);
 		break;
+	case SCTP_PARTIAL_DELIVERY_POINT:
+		retval = sctp_getsockopt_partial_delivery_point(sk, len, optval,
+								optlen);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index 896e834..6f64b15 100644
--- a/net/sctp/ulpqueue.c
[PATCH 5/6] [SCTP] Implement sac_info field in SCTP_ASSOC_CHANGE notification.
As stated in the sctp socket api draft: sac_info: variable If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received for this association, sac_info[] contains the complete ABORT chunk as defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7. We now save received ABORT chunks into the sac_info field and pass that to the user. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] --- include/net/sctp/ulpevent.h |1 + include/net/sctp/user.h |1 + net/sctp/sm_sideeffect.c| 11 +++-- net/sctp/sm_statefuns.c | 14 ++-- net/sctp/ulpevent.c | 49 -- 5 files changed, 59 insertions(+), 17 deletions(-) diff --git a/include/net/sctp/ulpevent.h b/include/net/sctp/ulpevent.h index 2923e3d..de88ed5 100644 --- a/include/net/sctp/ulpevent.h +++ b/include/net/sctp/ulpevent.h @@ -89,6 +89,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_assoc_change( __u16 error, __u16 outbound, __u16 inbound, + struct sctp_chunk *chunk, gfp_t gfp); struct sctp_ulpevent *sctp_ulpevent_make_peer_addr_change( diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h index 80b7afe..1b3153c 100644 --- a/include/net/sctp/user.h +++ b/include/net/sctp/user.h @@ -217,6 +217,7 @@ struct sctp_assoc_change { __u16 sac_outbound_streams; __u16 sac_inbound_streams; sctp_assoc_t sac_assoc_id; + __u8 sac_info[0]; }; /* diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c index 1355674..0a1a197 100644 --- a/net/sctp/sm_sideeffect.c +++ b/net/sctp/sm_sideeffect.c @@ -464,7 +464,7 @@ static void sctp_cmd_init_failed(sctp_cmd_seq_t *commands, struct sctp_ulpevent *event; event = sctp_ulpevent_make_assoc_change(asoc,0, SCTP_CANT_STR_ASSOC, - (__u16)error, 0, 0, + (__u16)error, 0, 0, NULL, GFP_ATOMIC); if (event) @@ -492,8 +492,13 @@ static void sctp_cmd_assoc_failed(sctp_cmd_seq_t *commands, /* Cancel any partial delivery in progress. 
 */
 	sctp_ulpq_abort_pd(&asoc->ulpq, GFP_ATOMIC);

-	event = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_LOST,
-						(__u16)error, 0, 0,
+	if (event_type == SCTP_EVENT_T_CHUNK && subtype.chunk == SCTP_CID_ABORT)
+		event = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_LOST,
+							(__u16)error, 0, 0, chunk,
+							GFP_ATOMIC);
+	else
+		event = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_LOST,
+							(__u16)error, 0, 0, NULL,
 						GFP_ATOMIC);
 	if (event)
 		sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index c85b517..cceaf90 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -186,7 +186,7 @@ sctp_disposition_t sctp_sf_do_4_C(const struct sctp_endpoint *ep,
 	 * notification is passed to the upper layer.
 	 */
 	ev = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_SHUTDOWN_COMP,
-					     0, 0, 0, GFP_ATOMIC);
+					     0, 0, 0, NULL, GFP_ATOMIC);
 	if (ev)
 		sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
 				SCTP_ULPEVENT(ev));
@@ -661,7 +661,7 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(const struct sctp_endpoint *ep,
 	ev = sctp_ulpevent_make_assoc_change(new_asoc, 0, SCTP_COMM_UP, 0,
 					     new_asoc->c.sinit_num_ostreams,
 					     new_asoc->c.sinit_max_instreams,
-					     GFP_ATOMIC);
+					     NULL, GFP_ATOMIC);
 	if (!ev)
 		goto nomem_ev;
@@ -790,7 +790,7 @@ sctp_disposition_t sctp_sf_do_5_1E_ca(const struct sctp_endpoint *ep,
 	ev = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_COMM_UP, 0,
 					     asoc->c.sinit_num_ostreams,
 					     asoc->c.sinit_max_instreams,
-					     GFP_ATOMIC);
+					     NULL, GFP_ATOMIC);
 	if (!ev)
 		goto nomem;
@@ -1625,7 +1625,7 @@ static sctp_disposition_t sctp_sf_do_dupcook_a(const struct sctp_endpoint *ep,
 	ev = sctp_ulpevent_make_assoc_change(asoc, 0, SCTP_RESTART, 0,
 					     new_asoc->c.sinit_num_ostreams,
 					     new_asoc->c.sinit_max_instreams,
-					     GFP_ATOMIC);
[PATCH 4/6] [SCTP] Honor flags when setting peer address parameters
Parameters only take effect when a corresponding flag bit is set and a value is specified. This means we need to check the flags in addition to checking for a non-zero value.

Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

---
 include/net/sctp/user.h |   15 ++++++++-------
 net/sctp/socket.c       |   54 ++++++++++++++++++++++++++++----------
 2 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 4ed7521..80b7afe 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -513,16 +513,17 @@ struct sctp_setadaptation {
  * address's parameters:
  */
 enum sctp_spp_flags {
-	SPP_HB_ENABLE = 1,		/*Enable heartbeats*/
-	SPP_HB_DISABLE = 2,		/*Disable heartbeats*/
+	SPP_HB_ENABLE = 1<<0,		/*Enable heartbeats*/
+	SPP_HB_DISABLE = 1<<1,		/*Disable heartbeats*/
 	SPP_HB = SPP_HB_ENABLE | SPP_HB_DISABLE,
-	SPP_HB_DEMAND = 4,		/*Send heartbeat immediately*/
-	SPP_PMTUD_ENABLE = 8,		/*Enable PMTU discovery*/
-	SPP_PMTUD_DISABLE = 16,		/*Disable PMTU discovery*/
+	SPP_HB_DEMAND = 1<<2,		/*Send heartbeat immediately*/
+	SPP_PMTUD_ENABLE = 1<<3,	/*Enable PMTU discovery*/
+	SPP_PMTUD_DISABLE = 1<<4,	/*Disable PMTU discovery*/
 	SPP_PMTUD = SPP_PMTUD_ENABLE | SPP_PMTUD_DISABLE,
-	SPP_SACKDELAY_ENABLE = 32,	/*Enable SACK*/
-	SPP_SACKDELAY_DISABLE = 64,	/*Disable SACK*/
+	SPP_SACKDELAY_ENABLE = 1<<5,	/*Enable SACK*/
+	SPP_SACKDELAY_DISABLE = 1<<6,	/*Disable SACK*/
 	SPP_SACKDELAY = SPP_SACKDELAY_ENABLE | SPP_SACKDELAY_DISABLE,
+	SPP_HB_TIME_IS_ZERO = 1<<7,	/* Set HB delay to 0 */
 };

 struct sctp_paddrparams {
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 2d0c2ee..8939536 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2033,6 +2033,10 @@ static int sctp_setsockopt_autoclose(struct sock *sk, char __user *optval,
  *     SPP_HB_DEMAND - Request a user initiated heartbeat
  *                     to be made immediately.
  *
+ *     SPP_HB_TIME_IS_ZERO - Specifies that the time for
+ *                     heartbeat delay is to be set to the value of 0
+ *                     milliseconds.
+ *
  *     SPP_PMTUD_ENABLE - This field will enable PMTU
  *                     discovery upon the specified address. Note that
  *                     if the address field is empty then all addresses
@@ -2075,13 +2079,30 @@ static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
 			return error;
 	}

-	if (params->spp_hbinterval) {
-		if (trans) {
-			trans->hbinterval = msecs_to_jiffies(params->spp_hbinterval);
-		} else if (asoc) {
-			asoc->hbinterval = msecs_to_jiffies(params->spp_hbinterval);
-		} else {
-			sp->hbinterval = params->spp_hbinterval;
+	/* Note that unless the spp_flag is set to SPP_HB_ENABLE the value of
+	 * this field is ignored.  Note also that a value of zero indicates
+	 * the current setting should be left unchanged.
+	 */
+	if (params->spp_flags & SPP_HB_ENABLE) {
+
+		/* Re-zero the interval if the SPP_HB_TIME_IS_ZERO is
+		 * set.  This lets us use 0 value when this flag
+		 * is set.
+		 */
+		if (params->spp_flags & SPP_HB_TIME_IS_ZERO)
+			params->spp_hbinterval = 0;
+
+		if (params->spp_hbinterval ||
+		    (params->spp_flags & SPP_HB_TIME_IS_ZERO)) {
+			if (trans) {
+				trans->hbinterval =
+					msecs_to_jiffies(params->spp_hbinterval);
+			} else if (asoc) {
+				asoc->hbinterval =
+					msecs_to_jiffies(params->spp_hbinterval);
+			} else {
+				sp->hbinterval = params->spp_hbinterval;
+			}
 		}
 	}

@@ -2098,7 +2119,12 @@ static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
 		}
 	}

-	if (params->spp_pathmtu) {
+	/* When Path MTU discovery is disabled the value specified here will
+	 * be the fixed path mtu (i.e. the value of the spp_flags field must
+	 * include the flag SPP_PMTUD_DISABLE for this field to have any
+	 * effect).
+	 */
+	if ((params->spp_flags & SPP_PMTUD_DISABLE) && params->spp_pathmtu) {
 		if (trans) {
 			trans->pathmtu = params->spp_pathmtu;
 			sctp_assoc_sync_pmtu(asoc);
@@ -2129,7 +2155,11 @@ static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
 		}
 	}

-
[PATCH 3/6] [SCTP]: Implement SCTP_ADDR_CONFIRMED state for ADDR_CHANGE event
Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

---
 include/net/sctp/user.h |    1 +
 net/sctp/associola.c    |   10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 9a83527..4ed7521 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -265,6 +265,7 @@ enum sctp_spc_state {
 	SCTP_ADDR_REMOVED,
 	SCTP_ADDR_ADDED,
 	SCTP_ADDR_MADE_PRIM,
+	SCTP_ADDR_CONFIRMED,
 };

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index fa82b73..294be94 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -714,8 +714,16 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 	/* Record the transition on the transport.  */
 	switch (command) {
 	case SCTP_TRANSPORT_UP:
+		/* If we are moving from UNCONFIRMED state due
+		 * to heartbeat success, report the SCTP_ADDR_CONFIRMED
+		 * state to the user, otherwise report SCTP_ADDR_AVAILABLE.
+		 */
+		if (SCTP_UNCONFIRMED == transport->state &&
+		    SCTP_HEARTBEAT_SUCCESS == error)
+			spc_state = SCTP_ADDR_CONFIRMED;
+		else
+			spc_state = SCTP_ADDR_AVAILABLE;
 		transport->state = SCTP_ACTIVE;
-		spc_state = SCTP_ADDR_AVAILABLE;
 		break;

 	case SCTP_TRANSPORT_DOWN:
--
1.5.0.3.438.gc49b2
- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] [SCTP] Implement SCTP_MAX_BURST socket option.
Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] --- include/net/sctp/constants.h |2 +- include/net/sctp/structs.h |1 + include/net/sctp/user.h |2 + net/sctp/associola.c |2 +- net/sctp/protocol.c |2 +- net/sctp/socket.c| 61 ++ 6 files changed, 67 insertions(+), 3 deletions(-) diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h index 5ddb855..bb37724 100644 --- a/include/net/sctp/constants.h +++ b/include/net/sctp/constants.h @@ -283,7 +283,7 @@ enum { SCTP_MAX_GABS = 16 }; #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */ /* Maximum number of new data packets that can be sent in a burst. */ -#define SCTP_MAX_BURST 4 +#define SCTP_DEFAULT_MAX_BURST 4 #define SCTP_CLOCK_GRANULARITY 1 /* 1 jiffy */ diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index f4bb396..8135815 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -276,6 +276,7 @@ struct sctp_sock { __u32 default_context; __u32 default_timetolive; __u32 default_rcv_context; + int max_burst; /* Heartbeat interval: The endpoint sends out a Heartbeat chunk to * the destination address every heartbeat interval. This value diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h index 1b3153c..6d2b577 100644 --- a/include/net/sctp/user.h +++ b/include/net/sctp/user.h @@ -101,6 +101,8 @@ enum sctp_optname { #define SCTP_FRAGMENT_INTERLEAVE SCTP_FRAGMENT_INTERLEAVE SCTP_PARTIAL_DELIVERY_POINT,/* Set/Get partial delivery point */ #define SCTP_PARTIAL_DELIVERY_POINT SCTP_PARTIAL_DELIVERY_POINT + SCTP_MAX_BURST, /* Set/Get max burst */ +#define SCTP_MAX_BURST SCTP_MAX_BURST /* Internal Socket Options. Some of the sctp library functions are * implemented using these socket options. 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 294be94..2f61d58 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -143,7 +143,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 	/* Initialize the maximum number of new data packets that can be sent
 	 * in a burst.
 	 */
-	asoc->max_burst = sctp_max_burst;
+	asoc->max_burst = sp->max_burst;

 	/* initialize association timers */
 	asoc->timeouts[SCTP_EVENT_TIMEOUT_NONE] = 0;
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 7c28c9b..c361deb 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1042,7 +1042,7 @@ SCTP_STATIC __init int sctp_init(void)
 	sctp_cookie_preserve_enable = 1;

 	/* Max.Burst - 4 */
-	sctp_max_burst = SCTP_MAX_BURST;
+	sctp_max_burst = SCTP_DEFAULT_MAX_BURST;

 	/* Association.Max.Retrans - 10 attempts
 	 * Path.Max.Retrans - 5 attempts (per destination address)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 8939536..e45cff4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2886,6 +2886,36 @@ static int sctp_setsockopt_partial_delivery_point(struct sock *sk,
 	return 0; /* is this the right error code? */
 }

+/*
+ * 7.1.28. Set or Get the maximum burst (SCTP_MAX_BURST)
+ *
+ * This option will allow a user to change the maximum burst of packets
+ * that can be emitted by this association.  Note that the default value
+ * is 4, and some implementations may restrict this setting so that it
+ * can only be lowered.
+ *
+ * NOTE: This text doesn't seem right.  Do this on a socket basis with
+ * future associations inheriting the socket value.
+ */
+static int sctp_setsockopt_maxburst(struct sock *sk,
+				    char __user *optval,
+				    int optlen)
+{
+	int val;
+
+	if (optlen != sizeof(int))
+		return -EINVAL;
+	if (get_user(val, (int __user *)optval))
+		return -EFAULT;
+
+	if (val < 0)
+		return -EINVAL;
+
+	sctp_sk(sk)->max_burst = val;
+
+	return 0;
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
@@ -3006,6 +3036,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
 	case SCTP_FRAGMENT_INTERLEAVE:
 		retval = sctp_setsockopt_fragment_interleave(sk, optval, optlen);
 		break;
+	case SCTP_MAX_BURST:
+		retval = sctp_setsockopt_maxburst(sk, optval, optlen);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
@@ -3165,6 +3198,7 @@ SCTP_STATIC int sctp_init_sock(struct sock *sk)
 	sp->default_timetolive = 0;

 	sp->default_rcv_context = 0;
+	sp->max_burst = sctp_max_burst;

 	/* Initialize default
[PATCH 1/6] [SCTP] Implement SCTP_FRAGMENT_INTERLEAVE socket option
This option was introduced in draft-ietf-tsvwg-sctpsocket-13. It prevents head-of-line blocking in the case of one-to-many endpoint. Applications enabling this option really must enable SCTP_SNDRCV event so that they would know where the data belongs. Based on an earlier patch by Ivan Skytte Jørgensen. Additionally, this functionality now permits multiple associations on the same endpoint to enter Partial Delivery. Applications should be extra careful, when using this functionality, to track EOR indicators. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] --- include/net/sctp/structs.h |3 +- include/net/sctp/ulpqueue.h |2 +- include/net/sctp/user.h |4 +- net/sctp/socket.c | 84 +--- net/sctp/ulpqueue.c | 88 -- 5 files changed, 150 insertions(+), 31 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 31a8e88..6883c7d 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -304,10 +304,11 @@ struct sctp_sock { __u32 autoclose; __u8 nodelay; __u8 disable_fragments; - __u8 pd_mode; __u8 v4mapped; + __u8 frag_interleave; __u32 adaptation_ind; + atomic_t pd_mode; /* Receive to here while partial delivery is in effect. */ struct sk_buff_head pd_lobby; }; diff --git a/include/net/sctp/ulpqueue.h b/include/net/sctp/ulpqueue.h index a43c878..3421b19 100644 --- a/include/net/sctp/ulpqueue.h +++ b/include/net/sctp/ulpqueue.h @@ -77,7 +77,7 @@ void sctp_ulpq_partial_delivery(struct sctp_ulpq *, struct sctp_chunk *, gfp_t); void sctp_ulpq_abort_pd(struct sctp_ulpq *, gfp_t); /* Clear the partial data delivery condition on this socket. */ -int sctp_clear_pd(struct sock *sk); +int sctp_clear_pd(struct sock *sk, struct sctp_association *asoc); /* Skip over an SSN. 
*/ void sctp_ulpq_skip(struct sctp_ulpq *ulpq, __u16 sid, __u16 ssn); diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h index 67a30eb..e773160 100644 --- a/include/net/sctp/user.h +++ b/include/net/sctp/user.h @@ -97,6 +97,8 @@ enum sctp_optname { #define SCTP_DELAYED_ACK_TIME SCTP_DELAYED_ACK_TIME SCTP_CONTEXT, /* Receive Context */ #define SCTP_CONTEXT SCTP_CONTEXT + SCTP_FRAGMENT_INTERLEAVE, +#define SCTP_FRAGMENT_INTERLEAVE SCTP_FRAGMENT_INTERLEAVE /* Internal Socket Options. Some of the sctp library functions are * implemented using these socket options. @@ -530,7 +532,7 @@ struct sctp_paddrparams { __u32 spp_flags; } __attribute__((packed, aligned(4))); -/* 7.1.24. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME) +/* 7.1.23. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME) * * This options will get or set the delayed ack timer. The time is set * in milliseconds. If the assoc_id is 0, then this sets or gets the diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 536298c..912073d 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -2249,7 +2249,7 @@ static int sctp_setsockopt_peer_addr_params(struct sock *sk, return 0; } -/* 7.1.24. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME) +/* 7.1.23. Delayed Ack Timer (SCTP_DELAYED_ACK_TIME) * * This options will get or set the delayed ack timer. The time is set * in milliseconds. If the assoc_id is 0, then this sets or gets the @@ -2786,6 +2786,46 @@ static int sctp_setsockopt_context(struct sock *sk, char __user *optval, return 0; } +/* + * 7.1.24. Get or set fragmented interleave (SCTP_FRAGMENT_INTERLEAVE) + * + * This options will at a minimum specify if the implementation is doing + * fragmented interleave. Fragmented interleave, for a one to many + * socket, is when subsequent calls to receive a message may return + * parts of messages from different associations. Some implementations + * may allow you to turn this value on or off. 
If so, when turned off, + * no fragment interleave will occur (which will cause a head of line + * blocking amongst multiple associations sharing the same one to many + * socket). When this option is turned on, then each receive call may + * come from a different association (thus the user must receive data + * with the extended calls (e.g. sctp_recvmsg) to keep track of which + * association each receive belongs to. + * + * This option takes a boolean value. A non-zero value indicates that + * fragmented interleave is on. A value of zero indicates that + * fragmented interleave is off. + * + * Note that it is important that an implementation that allows this + * option to be turned on, have it off by default. Otherwise an unaware + * application using the one to many model may become confused and act + * incorrectly. + */ +static int sctp_setsockopt_fragment_interleave(struct sock *sk, + char __user *optval, +
Re: routing question under invisible bridge
On Thu, Mar 22, 2007 at 03:52:55PM -0500, Bin He wrote:
> Dear sir,

Hi,

> I found your email address from kernel bridge source codes. I would
> appreciate if you could look into my question a little bit.

The netdev@ mailing list is a better forum to ask such questions, I've CC'ed this email there.

> I have an invisible bridge (br0) which contains eth0 and eth1. None of
> them have an IP address because I want it to be transparent to the
> existing network. So there are no entries in the kernel routing table.

If you have an IP address assigned to br0, your kernel will likely have (at least) one entry in its routing table even if you didn't put any routes in there yourself.

> The problem is how does it handle the routing, i.e., which eth
> interface will a packet be sent to?

(The decision which bridge sub-device to send a packet to isn't called 'routing', as it doesn't involve an IP routing decision -- that decision has already been made at that point.)

> For example, I can create a packet and bind it to a device by
> SO_BINDTODEVICE socket option. I did some tests and found: 1) if the
> socket is bound to eth0 or eth1, the packet cannot be sent out. 2) if
> the socket is bound to br0, it seems that the packet is only sent out
> to eth0.

Check out your system's ARP table (run /sbin/arp) and your br0 bridge's MAC address table (run 'brctl showmacs br0' or something like that.)

When your machine wants to communicate with a remote IP address, it first sends an ARP packet to figure out what the ethernet address is that corresponds to that remote IP address. When your machine then sends an IP packet on the br0 interface to that ethernet address, the bridge code checks the MAC address table to find out whether to send it to eth0 or eth1 (if the MAC address is a known MAC address) or to both (if we have never seen the MAC address before or if it has timed out.)

> So is there a way to send out a packet on a particular device?

I'm not sure exactly what you are trying to do?
- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html