[RFC] net/ipv{4,6} UDP-Lite: code sharing between udp.c and udplite.c
I would greatly value comments on a suggestion made earlier by Yoshifuji
regarding sharing code between udp.c and udplite.c, which I am pursuing.

I reduced the diffs between ipv4/udp{,lite}.c to the minimum possible and
performed a line-by-line comparison between udp.c and udplite.c.

Result: out of 45 functions which re-appear from udp.c in udplite.c,
 * 26 can be derived without human thinking at all (sed/perl)
 * 10 require trivial interaction (sockopt/header names)
 *  8 require genuine modifications (in control flow and algorithm)
 *  1 function is missing in udplite.c (no equivalent of udp_check())

Problem: The UDP code (and in particular the replicated functions) operates
on the following globally visible symbols:

	EXPORT_SYMBOL(udp_hash);        /* would be udplite_hash       */
	EXPORT_SYMBOL(udp_hash_lock);   /* would be udplite_hash_lock  */
	EXPORT_SYMBOL(udp_port_rover);  /* would be udplite_port_rover */

This would lead to clashes if udp.c/udplite.c use the same names.

Suggestion: #include code from udp.c in a much-reduced udplite.c, after
re-defining symbols, so that the top of udplite.c looks like e.g.

	#include <linux/udplite.h>

	#define udp_hash        udplite_hash
	#define udp_port_rover  udplite_port_rover

	#include "udp.c"        /* include the source code */

Inputs: The benefits are a much deflated patch, code reuse, and increased
clarity (only the diffs are visible). This comes at the cost of introducing
a few #ifdefs in udp.c (otherwise no changes).

However, I am not sure whether such an approach would find acceptance, and
therefore I am asking for input. As the porting to ipv6/udplite.c is
currently under way, I would like to take on board any suggestions which can
reduce dependencies and inflated code.
Many thanks in advance,
--Gerrit

NB: Details of the code analysis can be found on
    http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/udplite-comparison.html
    and the diff-minimized variant of ipv4/udplite.c is in the latest tarball,
    http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz
    (any future patches will have the line lengths cut to 80 chars).
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
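A minimal single-file sketch of the renaming idea proposed in the RFC above.
It is only an illustration under stated assumptions: a real udplite.c would
#include udp.c itself, whereas here two invented helpers (shared_port_next()
and shared_hash_add()) stand in for the shared code, and the table size is
made up.

```c
#include <assert.h>

/* Hypothetical UDP-Lite private state; in the RFC these would be the
 * exported udplite_hash / udplite_port_rover symbols. */
static int udplite_hash[8];
static int udplite_port_rover = 1024;

/* Step 1: redirect the names that the shared code refers to. */
#define udp_hash       udplite_hash
#define udp_port_rover udplite_port_rover

/* Step 2: the shared code.  In the proposal this would be
 *     #include "udp.c"
 * Two invented helpers stand in for it here.  They are written purely
 * in terms of the udp_* names. */
static int shared_port_next(void)
{
	return ++udp_port_rover;
}

static void shared_hash_add(unsigned int port)
{
	udp_hash[port % 8]++;
}

/* Step 3: stop the renaming so later code sees the real names again. */
#undef udp_hash
#undef udp_port_rover
```

After preprocessing, the helpers operate on udplite_port_rover and
udplite_hash although their source text only mentions the udp_* names. The
preprocessor replaces whole tokens, so an identifier such as
udp_hash_something would not be touched by the udp_hash macro.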
[NET]: Prevent transmission after dev_deactivate
Hi Dave:

I found a bug in my GSO patches with the shutdown handling in
dev_deactivate. It provided enough impetus for me to finally clean up this
function :)

This patch is against Linus's tree.

[NET]: Prevent transmission after dev_deactivate

The dev_deactivate function has bit-rotted since the introduction of
lockless drivers. In particular, the spin_unlock_wait call at the end has
no effect on the xmit routine of lockless drivers.

With a little bit of work, we can make it much more useful by providing the
guarantee that when it returns, no more calls to the xmit routine of the
underlying driver will be made.

The idea is simple. There are two entry points into the xmit routine. The
first comes from dev_queue_xmit. That one is easily stopped by using
synchronize_rcu. This works because we set the qdisc to noop_qdisc before
the synchronize_rcu call. That in turn causes all subsequent packets sent
to dev_queue_xmit to be dropped. The synchronize_rcu call also ensures all
outstanding calls leave their critical section.

The other entry point is from qdisc_run. Since we now have a bit that
indicates whether it's running, all we have to do is to wait until the bit
is off.

I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is
useless because netif_wake_queue can cause it to be set again. It is also
harmless because we've disarmed qdisc_run.

I've also removed the spin_unlock_wait on xmit_lock because its only
purpose, making sure that all outstanding xmit_lock holders have exited, is
also served by dev_watchdog_down.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
1201ce3ea54baa35bcecf9925bf9d788e084d895
diff --git a/net/core/dev.c b/net/core/dev.c
index ab39fe1..29e3888 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1295,7 +1295,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
 	 */
-	local_bh_disable();
+	rcu_read_lock_bh();
 
 	/* Updates of qdisc are serialized by queue_lock.
 	 * The struct Qdisc which is pointed to by qdisc is now a
@@ -1369,13 +1369,13 @@ #endif
 	}
 
 	rc = -ENETDOWN;
-	local_bh_enable();
+	rcu_read_unlock_bh();
 
 out_kfree_skb:
 	kfree_skb(skb);
 	return rc;
 out:
-	local_bh_enable();
+	rcu_read_unlock_bh();
 	return rc;
 }
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index d7aca8e..7aad012 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -181,9 +181,13 @@ requeue:
 
 void __qdisc_run(struct net_device *dev)
 {
+	if (unlikely(dev->qdisc == &noop_qdisc))
+		goto out;
+
 	while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev))
 		/* NOTHING */;
 
+out:
 	clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
 }
 
@@ -583,10 +587,12 @@ void dev_deactivate(struct net_device *d
 
 	dev_watchdog_down(dev);
 
-	while (test_bit(__LINK_STATE_SCHED, &dev->state))
-		yield();
+	/* Wait for outstanding dev_queue_xmit calls. */
+	synchronize_rcu();
 
-	spin_unlock_wait(&dev->_xmit_lock);
+	/* Wait for outstanding qdisc_run calls. */
+	while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
+		yield();
 }
 
 void dev_init_scheduler(struct net_device *dev)
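The "running bit" half of the shutdown ordering described in this patch can
be sketched in user space. This is only an analogue under stated
assumptions: struct fake_dev and all fake_* functions are invented for
illustration, C11 atomics stand in for the kernel bit operations, and the
synchronize_rcu step for the dev_queue_xmit entry point is not modelled.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Invented stand-in for struct net_device. */
struct fake_dev {
	atomic_bool active;	/* false plays the role of qdisc == &noop_qdisc */
	atomic_bool running;	/* plays the role of __LINK_STATE_QDISC_RUNNING */
	int xmit_calls;		/* how often the "driver" was entered */
};

static void fake_dev_init(struct fake_dev *dev)
{
	atomic_init(&dev->active, true);
	atomic_init(&dev->running, false);
	dev->xmit_calls = 0;
}

/* Analogue of qdisc_run(): at most one runner at a time, and the runner
 * does nothing once the device has been deactivated. */
static void fake_qdisc_run(struct fake_dev *dev)
{
	if (atomic_exchange(&dev->running, true))
		return;			/* somebody else is already running */

	if (atomic_load(&dev->active))
		dev->xmit_calls++;	/* would call the driver's xmit routine */

	atomic_store(&dev->running, false);
}

/* Analogue of the tail of dev_deactivate(): disarm first, then wait for
 * a runner that may have started before the disarm. */
static void fake_dev_deactivate(struct fake_dev *dev)
{
	atomic_store(&dev->active, false);

	while (atomic_load(&dev->running))
		sched_yield();
}
```

The order matters: because the device is disarmed before the wait loop, any
runner that starts after the store sees the inactive state and never reaches
the xmit step, so once the loop exits no further xmit calls can happen.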
[0/5] GSO: Generic Segmentation Offload
Hi:

This is a repost of the GSO patches. The main change is the fix to a bug in
the way dev->gso_skb is freed. This series requires the dev_deactivate
patch that I just posted.

Here is the original description:

This series adds Generic Segmentation Offload (GSO) support to the Linux
networking stack.

Many people have observed that a lot of the savings in TSO come from
traversing the networking stack once rather than many times for each
super-packet. These savings can be obtained without hardware support. In
fact, the concept can be applied to other protocols such as TCPv6, UDP, or
even DCCP.

The key to minimising the cost in implementing this is to postpone the
segmentation as late as possible. In the ideal world, the segmentation
would occur inside each NIC driver where they would rip the super-packet
apart and either produce SG lists which are directly fed to the hardware, or
linearise each segment into pre-allocated memory to be fed to the NIC. This
would eliminate segmented skb's altogether.

Unfortunately this requires modifying each and every NIC driver so it would
take quite some time. A much easier solution is to perform the segmentation
just before the entry into the driver's xmit routine. This series of
patches does this.

I've attached some numbers to demonstrate the savings brought on by doing
this. The best scenario is obviously the case where the underlying NIC
supports SG. This means that we simply have to manipulate the SG entries
and place them into individual skb's before passing them to the driver. The
attached file lo-res shows this.

The test was performed through the loopback device which is a fairly good
approximation of an SG-capable NIC.

GSO, like TSO, is only effective if the MTU is significantly less than the
maximum value of 64K. So only the case where the MTU was set to 1500 is of
interest. There we can see that the throughput improved by 17.5%
(3061.05Mb/s => 3598.17Mb/s). The actual saving in transmission cost is in
fact a lot more than that as the majority of the time here is spent on the
RX side which still has to deal with 1500-byte packets.

The worst-case scenario is where the NIC does not support SG and the user
uses write(2) which means that we have to copy the data twice. The files
gso-off/gso-on provide data for this case (the test was carried out on
e100). As you can see, the cost of the extra copy is mostly offset by the
reduction in the cost of going through the networking stack.

For now GSO is off by default but can be enabled through ethtool. It is
conceivable that with enough optimisation GSO could be a win in most cases
and we could enable it by default.

However, even without enabling GSO explicitly it can still function on
bridged and forwarded packets. As it is, passing TSO packets through a
bridge only works if all constituents support TSO. With GSO, it provides a
fallback so that we may enable TSO for a bridge even if some of its
constituents do not support TSO.

This provides massive savings for Xen as it uses a bridge-based architecture
and TSO/GSO produces a much larger effective MTU for internal traffic
between domains.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

$ sudo ./ethtool -K lo gso on
$ sudo ifconfig lo mtu 1500
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3598.17
$ sudo ./ethtool -K lo gso off
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    3061.05
$ sudo ifconfig lo mtu 6
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    8245.05
$ sudo ./ethtool -K lo gso on
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    8563.36
$ sudo ifconfig lo mtu 16436
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7359.95
$ sudo ./ethtool -K lo gso off
$ netperf -t TCP_STREAM
TCP STREAM TEST to localhost
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7535.04
$
CPU: PIII,
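The relative gains behind the loopback figures above can be re-derived with
a few lines of arithmetic. A minimal sketch; pct_change() is a hypothetical
helper, not part of netperf or of the patches.

```c
#include <assert.h>

/* Relative change from "before" to "after", in percent.
 * "before" must be non-zero. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the throughput numbers from the runs above (GSO off as "before", GSO on
as "after"): at MTU 1500, pct_change(3061.05, 3598.17) is about +17.5; at
MTU 16436, pct_change(7535.04, 7359.95) is about -2.3; and for the remaining
pair, pct_change(8245.05, 8563.36) is about +3.9. This matches the remark
that only the small-MTU case is of interest.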
[1/5] [NET]: Merge TSO/UFO fields in sk_buff
Hi:

[NET]: Merge TSO/UFO fields in sk_buff

Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.

They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to AND the gso_type field with the netdev's features
field.

I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, such hardware
would declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set
would have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other
TSO packets would be SKB_GSO_TCPV4. This means that only the CWR packets
need to be emulated in software.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -792,7 +792,7 @@ static int cp_start_xmit (struct sk_buff
 	entry = cp->tx_head;
 	eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0;
 	if (dev->features & NETIF_F_TSO)
-		mss = skb_shinfo(skb)->tso_size;
+		mss = skb_shinfo(skb)->gso_size;
 
 	if (skb_shinfo(skb)->nr_frags == 0) {
 		struct cp_desc *txd = &cp->tx_ring[entry];
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -1640,7 +1640,7 @@ bnx2_tx_int(struct bnx2 *bp)
 		skb = tx_buf->skb;
 #ifdef BCM_TSO
 		/* partial BD completions possible with TSO packets */
-		if (skb_shinfo(skb)->tso_size) {
+		if (skb_shinfo(skb)->gso_size) {
 			u16 last_idx, last_ring_idx;
 
 			last_idx = sw_cons +
@@ -4428,7 +4428,7 @@ bnx2_start_xmit(struct sk_buff *skb, str
 			(TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16));
 	}
 #ifdef BCM_TSO
-	if ((mss = skb_shinfo(skb)->tso_size) &&
+	if ((mss = skb_shinfo(skb)->gso_size) &&
 		(skb->len > (bp->dev->mtu + ETH_HLEN))) {
 		u32 tcp_opt_len, ip_tcp_len;
 
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1418,7 +1418,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 	struct cpl_tx_pkt *cpl;
 
 #ifdef NETIF_F_TSO
-	if (skb_shinfo(skb)->tso_size) {
+	if (skb_shinfo(skb)->gso_size) {
 		int eth_type;
 		struct cpl_tx_pkt_lso *hdr;
 
@@ -1433,7 +1433,7 @@ int t1_start_xmit(struct sk_buff *skb, s
 		hdr->ip_hdr_words = skb->nh.iph->ihl;
 		hdr->tcp_hdr_words = skb->h.th->doff;
 		hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type,
-						skb_shinfo(skb)->tso_size));
+						skb_shinfo(skb)->gso_size));
 		hdr->len = htonl(skb->len - sizeof(*hdr));
 		cpl = (struct cpl_tx_pkt *)hdr;
 		sge->stats.tx_lso_pkts++;
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2394,7 +2394,7 @@ e1000_tso(struct e1000_adapter *adapter,
 	uint8_t ipcss, ipcso, tucss, tucso, hdr_len;
 	int err;
 
-	if (skb_shinfo(skb)->tso_size) {
+	if (skb_shinfo(skb)->gso_size) {
 		if (skb_header_cloned(skb)) {
 			err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
 			if (err)
@@ -2402,7 +2402,7 @@ e1000_tso(struct e1000_adapter *adapter,
 		}
 
 		hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
-		mss = skb_shinfo(skb)->tso_size;
+		mss = skb_shinfo(skb)->gso_size;
 		if (skb->protocol == htons(ETH_P_IP)) {
 			skb->nh.iph->tot_len = 0;
 			skb->nh.iph->check = 0;
@@ -2519,7 +2519,7 @@ e1000_tx_map(struct e1000_adapter *adapt
 		 * tso gets written back prematurely before the data is fully
 		 * DMA'd to the controller */
 		if (!skb->data_len && tx_ring->last_tx_tso &&
-		    !skb_shinfo(skb)->tso_size) {
+		    !skb_shinfo(skb)->gso_size) {
 			tx_ring->last_tx_tso = 0;
 			size -= 4;
 		}
@@ -2757,7 +2757,7 @@ e1000_xmit_frame(struct sk_buff *skb, st
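The device check described in this patch (AND the packet's gso_type with the
device's feature bits) can be sketched as follows. This is a hedged
illustration: the macro GSO_SHIFT, the helper dev_can_segment() and the
concrete bit values are assumptions made for the example, chosen only to
respect the "feature bits shifted by 16" relationship from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed relationship: feature bit == gso_type bit << 16. */
#define GSO_SHIFT 16

/* Illustrative bit values, not taken from the patch. */
#define SKB_GSO_TCPV4		(1u << 0)
#define SKB_GSO_TCPV4_ECN	(1u << 1)

#define NETIF_F_TSO		(SKB_GSO_TCPV4 << GSO_SHIFT)
#define NETIF_F_TSO_ECN		(SKB_GSO_TCPV4_ECN << GSO_SHIFT)

/* Hypothetical helper: true if the device advertises every feature the
 * packet's gso_type requires, i.e. the hardware can segment it itself. */
static bool dev_can_segment(unsigned int features, unsigned int gso_type)
{
	unsigned int needed = gso_type << GSO_SHIFT;

	return (features & needed) == needed;
}
```

A device that declares only NETIF_F_TSO accepts plain SKB_GSO_TCPV4 packets
but not SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN ones, so only the latter would
need software segmentation, which is the point of making gso_type a
conjunction.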
[2/5] [NET]: Add generic segmentation offload
Hi:

[NET]: Add generic segmentation offload

This patch adds the infrastructure for generic segmentation offload. The
idea is to tap into the potential savings of TSO without hardware support by
postponing the allocation of segmented skb's until just before the entry
point into the NIC driver.

The same structure can be used to support software IPv6 TSO, as well as UFO
and segmentation offload for other relevant protocols, e.g., DCCP.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -406,6 +406,9 @@ struct net_device
 	struct list_head	qdisc_list;
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
 
+	/* Partially transmitted GSO packet. */
+	struct sk_buff		*gso_skb;
+
 	/* ingress path synchronizer */
 	spinlock_t		ingress_lock;
 	struct Qdisc		*qdisc_ingress;
@@ -540,6 +543,7 @@ struct packet_type {
 					 struct net_device *,
 					 struct packet_type *,
 					 struct net_device *);
+	struct sk_buff		*(*gso_segment)(struct sk_buff *skb, int sg);
 	void			*af_packet_priv;
 	struct list_head	list;
 };
@@ -690,7 +694,8 @@ extern int		dev_change_name(struct net_d
 extern int		dev_set_mtu(struct net_device *, int);
 extern int		dev_set_mac_address(struct net_device *,
 					    struct sockaddr *);
-extern void		dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
+extern int		dev_hard_start_xmit(struct sk_buff *skb,
+					    struct net_device *dev);
 
 extern void		dev_init(void);
 
@@ -964,6 +969,7 @@ extern int netdev_max_backlog;
 extern int weight_p;
 extern int		netdev_set_master(struct net_device *dev, struct net_device *master);
 extern int skb_checksum_help(struct sk_buff *skb, int inward);
+extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg);
 #ifdef CONFIG_BUG
 extern void netdev_rx_csum_fault(struct net_device *dev);
 #else
diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -116,6 +116,7 @@
 #include <asm/current.h>
 #include <linux/audit.h>
 #include <linux/dmaengine.h>
+#include <linux/err.h>
 
 /*
  *	The list of packet types we will receive (as opposed to discard)
@@ -1048,7 +1049,7 @@ static inline void net_timestamp(struct
  *	taps currently in use.
  */
 
-void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
+static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct packet_type *ptype;
 
@@ -1186,6 +1187,40 @@ out:
 	return ret;
 }
 
+/**
+ *	skb_gso_segment - Perform segmentation on skb.
+ *	@skb: buffer to segment
+ *	@sg: whether scatter-gather is supported on the target.
+ *
+ *	This function segments the given skb and returns a list of segments.
+ */
+struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg)
+{
+	struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
+	struct packet_type *ptype;
+	int type = skb->protocol;
+
+	BUG_ON(skb_shinfo(skb)->frag_list);
+	BUG_ON(skb->ip_summed != CHECKSUM_HW);
+
+	skb->mac.raw = skb->data;
+	skb->mac_len = skb->nh.raw - skb->data;
+	__skb_pull(skb, skb->mac_len);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) {
+		if (ptype->type == type && !ptype->dev && ptype->gso_segment) {
+			segs = ptype->gso_segment(skb, sg);
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return segs;
+}
+
+EXPORT_SYMBOL(skb_gso_segment);
+
 /* Take action when hardware reception checksum errors are detected. */
 #ifdef CONFIG_BUG
 void netdev_rx_csum_fault(struct net_device *dev)
@@ -1222,6 +1257,86 @@ static inline int illegal_highdma(struct
 #define illegal_highdma(dev, skb)	(0)
 #endif
 
+struct dev_gso_cb {
+	void (*destructor)(struct sk_buff *skb);
+};
+
+#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
+
+static void dev_gso_skb_destructor(struct sk_buff *skb)
+{
+	struct dev_gso_cb *cb;
+
+	do {
+		struct sk_buff *nskb = skb->next;
+
+		skb->next = nskb->next;
+		nskb->next = NULL;
+		kfree_skb(nskb);
+	} while (skb->next);
+
+	cb = DEV_GSO_CB(skb);
+	if (cb->destructor)
+		cb->destructor(skb);
+}
+
+/**
+
[4/5] [NET]: Added GSO toggle
Hi:

[NET]: Added GSO toggle

This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device. For now it only supports TCPv4.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -408,6 +408,8 @@ struct ethtool_ops {
 #define ETHTOOL_GPERMADDR	0x0020 /* Get permanent hardware address */
 #define ETHTOOL_GUFO		0x0021 /* Get UFO enable (ethtool_value) */
 #define ETHTOOL_SUFO		0x0022 /* Set UFO enable (ethtool_value) */
+#define ETHTOOL_GGSO		0x0023 /* Get GSO enable (ethtool_value) */
+#define ETHTOOL_SGSO		0x0024 /* Set GSO enable (ethtool_value) */
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -309,6 +309,7 @@ struct net_device
 #define NETIF_F_HW_VLAN_RX	256	/* Receive VLAN hw acceleration */
 #define NETIF_F_HW_VLAN_FILTER	512	/* Receive filtering on VLAN */
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
+#define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
 
 	/* Segmentation offload features */
diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1031,9 +1031,13 @@ static inline void sk_setup_caps(struct
 {
 	__sk_dst_set(sk, dst);
 	sk->sk_route_caps = dst->dev->features;
+	if (sk->sk_route_caps & NETIF_F_GSO)
+		sk->sk_route_caps |= NETIF_F_TSO;
 	if (sk->sk_route_caps & NETIF_F_TSO) {
 		if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
 			sk->sk_route_caps &= ~NETIF_F_TSO;
+		else
+			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
 	}
 }
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -376,15 +376,20 @@ void br_features_recompute(struct net_br
 	features = br->feature_mask & ~NETIF_F_ALL_CSUM;
 
 	list_for_each_entry(p, &br->port_list, list) {
-		if (checksum & NETIF_F_NO_CSUM &&
-		    !(p->dev->features & NETIF_F_NO_CSUM))
+		unsigned long feature = p->dev->features;
+
+		if (checksum & NETIF_F_NO_CSUM && !(feature & NETIF_F_NO_CSUM))
 			checksum ^= NETIF_F_NO_CSUM | NETIF_F_HW_CSUM;
-		if (checksum & NETIF_F_HW_CSUM &&
-		    !(p->dev->features & NETIF_F_HW_CSUM))
+		if (checksum & NETIF_F_HW_CSUM && !(feature & NETIF_F_HW_CSUM))
 			checksum ^= NETIF_F_HW_CSUM | NETIF_F_IP_CSUM;
-		if (!(p->dev->features & NETIF_F_IP_CSUM))
+		if (!(feature & NETIF_F_IP_CSUM))
 			checksum = 0;
-		features &= p->dev->features;
+
+		if (feature & NETIF_F_GSO)
+			feature |= NETIF_F_TSO;
+		feature |= NETIF_F_GSO;
+
+		features &= feature;
 	}
 
 	br->dev->features = features | checksum | NETIF_F_LLTX;
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -614,6 +614,29 @@ static int ethtool_set_ufo(struct net_de
 	return dev->ethtool_ops->set_ufo(dev, edata.data);
 }
 
+static int ethtool_get_gso(struct net_device *dev, char __user *useraddr)
+{
+	struct ethtool_value edata = { ETHTOOL_GGSO };
+
+	edata.data = dev->features & NETIF_F_GSO;
+	if (copy_to_user(useraddr, &edata, sizeof(edata)))
+		 return -EFAULT;
+	return 0;
+}
+
+static int ethtool_set_gso(struct net_device *dev, char __user *useraddr)
+{
+	struct ethtool_value edata;
+
+	if (copy_from_user(&edata, useraddr, sizeof(edata)))
+		return -EFAULT;
+	if (edata.data)
+		dev->features |= NETIF_F_GSO;
+	else
+		dev->features &= ~NETIF_F_GSO;
+	return 0;
+}
+
 static int ethtool_self_test(struct net_device *dev, char __user *useraddr)
 {
 	struct ethtool_test test;
@@ -905,6 +928,12 @@ int dev_ethtool(struct ifreq *ifr)
 	case ETHTOOL_SUFO:
 		rc = ethtool_set_ufo(dev, useraddr);
 		break;
+	case ETHTOOL_GGSO:
+		rc = ethtool_get_gso(dev, useraddr);
+		break;
+	case ETHTOOL_SGSO:
+		rc = ethtool_set_gso(dev, useraddr);
+		break;
 	default:
 		rc = -EOPNOTSUPP;
 	}
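The effect of the sk_setup_caps change in this patch can be shown as a pure
function over feature bits. A minimal sketch, assuming invented names:
route_caps() and its parameters are hypothetical, and apart from
NETIF_F_GSO (2048 in the patch) the bit values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* NETIF_F_GSO is 2048 in the patch; the other values are illustrative. */
#define NETIF_F_SG	1u
#define NETIF_F_HW_CSUM	8u
#define NETIF_F_GSO	2048u
#define NETIF_F_TSO	(1u << 16)

/* Hypothetical stand-in for the capability derivation: no_largesend
 * models the SOCK_NO_LARGESEND socket flag, header_len models
 * dst->header_len. */
static unsigned int route_caps(unsigned int dev_features,
			       bool no_largesend, int header_len)
{
	unsigned int caps = dev_features;

	/* Software GSO lets the socket behave as if the device had TSO. */
	if (caps & NETIF_F_GSO)
		caps |= NETIF_F_TSO;

	if (caps & NETIF_F_TSO) {
		if (no_largesend || header_len)
			caps &= ~NETIF_F_TSO;
		else
			/* Large sends are built as SG skbs with deferred
			 * checksums, so advertise those to the socket too. */
			caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
	}
	return caps;
}
```

So a device with only the GSO toggle set ends up with TSO, SG and HW_CSUM in
the socket's view, unless large sends are disabled for that socket or route.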
[5/5] [IPSEC]: Handle GSO packets
Hi:

[IPSEC]: Handle GSO packets

This patch segments GSO packets received by the IPsec stack. This can
happen when a NIC driver injects GSO packets into the stack which are then
forwarded to another host.

The primary application of this is going to be Xen where its backend driver
may inject GSO packets into dom0.

Of course this also can be used by other virtualisation schemes such as
VMWare or UML since the tap device could be modified to inject GSO packets
received through splice.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -9,6 +9,8 @@
  */
 
 #include <linux/compiler.h>
+#include <linux/if_ether.h>
+#include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
 #include <linux/netfilter_ipv4.h>
@@ -97,16 +99,10 @@ error_nolock:
 	goto out_exit;
 }
 
-static int xfrm4_output_finish(struct sk_buff *skb)
+static int xfrm4_output_finish2(struct sk_buff *skb)
 {
 	int err;
 
-#ifdef CONFIG_NETFILTER
-	if (!skb->dst->xfrm) {
-		IPCB(skb)->flags |= IPSKB_REROUTED;
-		return dst_output(skb);
-	}
-#endif
 	while (likely((err = xfrm4_output_one(skb)) == 0)) {
 		nf_reset(skb);
 
@@ -119,7 +115,7 @@ static int xfrm4_output_finish(struct sk
 			return dst_output(skb);
 
 		err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL,
-			      skb->dst->dev, xfrm4_output_finish);
+			      skb->dst->dev, xfrm4_output_finish2);
 		if (unlikely(err != 1))
 			break;
 	}
@@ -127,6 +123,48 @@ static int xfrm4_output_finish(struct sk
 	return err;
 }
 
+static int xfrm4_output_finish(struct sk_buff *skb)
+{
+	struct sk_buff *segs;
+
+#ifdef CONFIG_NETFILTER
+	if (!skb->dst->xfrm) {
+		IPCB(skb)->flags |= IPSKB_REROUTED;
+		return dst_output(skb);
+	}
+#endif
+
+	if (!skb_shinfo(skb)->gso_size)
+		return xfrm4_output_finish2(skb);
+
+	skb->protocol = htons(ETH_P_IP);
+	segs = skb_gso_segment(skb, 0);
+	kfree_skb(skb);
+	if (unlikely(IS_ERR(segs)))
+		return PTR_ERR(segs);
+
+	do {
+		struct sk_buff *nskb = segs->next;
+		int err;
+
+		segs->next = NULL;
+		err = xfrm4_output_finish2(segs);
+
+		if (unlikely(err)) {
+			while ((segs = nskb)) {
+				nskb = segs->next;
+				segs->next = NULL;
+				kfree_skb(segs);
+			}
+			return err;
+		}
+
+		segs = nskb;
+	} while (segs);
+
+	return 0;
+}
+
 int xfrm4_output(struct sk_buff *skb)
 {
 	return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -94,7 +94,7 @@ error_nolock:
 	goto out_exit;
 }
 
-static int xfrm6_output_finish(struct sk_buff *skb)
+static int xfrm6_output_finish2(struct sk_buff *skb)
 {
 	int err;
 
@@ -110,7 +110,7 @@ static int xfrm6_output_finish(struct sk
 			return dst_output(skb);
 
 		err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL,
-			      skb->dst->dev, xfrm6_output_finish);
+			      skb->dst->dev, xfrm6_output_finish2);
 		if (unlikely(err != 1))
 			break;
 	}
@@ -118,6 +118,41 @@ static int xfrm6_output_finish(struct sk
 	return err;
 }
 
+static int xfrm6_output_finish(struct sk_buff *skb)
+{
+	struct sk_buff *segs;
+
+	if (!skb_shinfo(skb)->gso_size)
+		return xfrm6_output_finish2(skb);
+
+	skb->protocol = htons(ETH_P_IP);
+	segs = skb_gso_segment(skb, 0);
+	kfree_skb(skb);
+	if (unlikely(IS_ERR(segs)))
+		return PTR_ERR(segs);
+
+	do {
+		struct sk_buff *nskb = segs->next;
+		int err;
+
+		segs->next = NULL;
+		err = xfrm6_output_finish2(segs);
+
+		if (unlikely(err)) {
+			while ((segs = nskb)) {
+				nskb = segs->next;
+				segs->next = NULL;
+				kfree_skb(segs);
+			}
+			return err;
+		}
+
+		segs = nskb;
+	} while (segs);
+
+	return 0;
+}
+
 int xfrm6_output(struct sk_buff *skb)
 {
 	return NF_HOOK(PF_INET6,
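The loop that both xfrm output functions use (detach each segment, hand it
to the per-packet output path, and on the first error free whatever is still
queued) can be sketched with a plain linked list. All names here (struct
seg, send_all(), make_list(), output_fail_on_1(), the freed counter) are
invented for the illustration; the output callback is assumed to consume its
segment whether or not it succeeds.

```c
#include <assert.h>
#include <stdlib.h>

/* Invented stand-in for a segment (an sk_buff in the patch). */
struct seg {
	struct seg *next;
	int id;
};

static int freed;	/* counts released segments, for demonstration */

static void seg_free(struct seg *s)
{
	free(s);
	freed++;
}

/* Send every segment of a non-empty list through output().  On the first
 * error the segments that were not yet sent are freed and the error is
 * returned. */
static int send_all(struct seg *segs, int (*output)(struct seg *))
{
	do {
		struct seg *next = segs->next;
		int err;

		segs->next = NULL;	/* detach before handing it over */
		err = output(segs);	/* output() consumes the segment */

		if (err) {
			while ((segs = next)) {
				next = segs->next;
				segs->next = NULL;
				seg_free(segs);
			}
			return err;
		}

		segs = next;
	} while (segs);

	return 0;
}

/* Build a list with ids 0..n-1 (n >= 1). */
static struct seg *make_list(int n)
{
	struct seg *head = NULL;

	for (int i = n - 1; i >= 0; i--) {
		struct seg *s = malloc(sizeof(*s));

		if (!s)
			abort();
		s->id = i;
		s->next = head;
		head = s;
	}
	return head;
}

/* Example output path: consumes the segment, fails on id 1. */
static int output_fail_on_1(struct seg *s)
{
	int err = (s->id == 1) ? -5 : 0;

	seg_free(s);
	return err;
}
```

Saving the next pointer before calling output() is what makes the cleanup
possible: after the call the current segment may already be gone.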
Re: [0/5] GSO: Generic Segmentation Offload
Hi:

If anyone is interested here is the incremental patch against the previous
series.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/dev.c b/net/core/dev.c
index 9c68ab8..d293e0f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1415,7 +1415,7 @@ gso:
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
 	 */
-	local_bh_disable();
+	rcu_read_lock_bh();
 
 	/* Updates of qdisc are serialized by queue_lock.
 	 * The struct Qdisc which is pointed to by qdisc is now a
@@ -1486,13 +1486,13 @@ #endif
 	}
 
 	rc = -ENETDOWN;
-	local_bh_enable();
+	rcu_read_unlock_bh();
 
 out_kfree_skb:
 	kfree_skb(skb);
 	return rc;
 out:
-	local_bh_enable();
+	rcu_read_unlock_bh();
 	return rc;
 }
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 472cb5a..4cdd6ca 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -185,9 +185,13 @@ requeue:
 
 void __qdisc_run(struct net_device *dev)
 {
+	if (unlikely(dev->qdisc == &noop_qdisc))
+		goto out;
+
 	while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev))
 		/* NOTHING */;
 
+out:
 	clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
 }
 
@@ -581,20 +585,24 @@ void dev_deactivate(struct net_device *d
 	spin_lock_bh(&dev->queue_lock);
 	qdisc = dev->qdisc;
 	dev->qdisc = &noop_qdisc;
-	skb = dev->gso_skb;
-	dev->gso_skb = NULL;
 
 	qdisc_reset(qdisc);
 
 	spin_unlock_bh(&dev->queue_lock);
 
-	kfree_skb(skb);
 	dev_watchdog_down(dev);
 
-	while (test_bit(__LINK_STATE_SCHED, &dev->state))
+	/* Wait for outstanding dev_queue_xmit calls. */
+	synchronize_rcu();
+
+	/* Wait for outstanding qdisc_run calls. */
+	while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
 		yield();
 
-	spin_unlock_wait(&dev->_xmit_lock);
+	if (dev->gso_skb) {
+		kfree_skb(dev->gso_skb);
+		dev->gso_skb = NULL;
+	}
 }
 
 void dev_init_scheduler(struct net_device *dev)
[3/5] [NET]: Add software TSOv4
Hi:

[NET]: Add software TSOv4

This patch adds the GSO implementation for IPv4 TCP.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1299,6 +1299,7 @@ extern void	       skb_split(struct sk_b
 				 struct sk_buff *skb1, const u32 len);
 
 extern void skb_release_data(struct sk_buff *skb);
+extern struct sk_buff *skb_segment(struct sk_buff *skb, int sg);
 
 static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
 				       int len, void *buffer)
diff --git a/include/net/protocol.h b/include/net/protocol.h
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -37,6 +37,7 @@
 struct net_protocol {
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
+	struct sk_buff	       *(*gso_segment)(struct sk_buff *skb, int sg);
 	int			no_policy;
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1087,6 +1087,8 @@ extern struct request_sock_ops tcp_reque
 
 extern int tcp_v4_destroy_sock(struct sock *sk);
 
+extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg);
+
 #ifdef CONFIG_PROC_FS
 extern int  tcp4_proc_init(void);
 extern void tcp4_proc_exit(void);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1826,6 +1826,132 @@ unsigned char *skb_pull_rcsum(struct sk_
 
 EXPORT_SYMBOL_GPL(skb_pull_rcsum);
 
+/**
+ *	skb_segment - Perform protocol segmentation on skb.
+ *	@skb: buffer to segment
+ *	@sg: whether scatter-gather can be used for generated segments
+ *
+ *	This function performs segmentation on the given skb.  It returns
+ *	the segment at the given position.  It returns NULL if there are
+ *	no more segments to generate, or when an error is encountered.
+ */
+struct sk_buff *skb_segment(struct sk_buff *skb, int sg)
+{
+	struct sk_buff *segs = NULL;
+	struct sk_buff *tail = NULL;
+	unsigned int mss = skb_shinfo(skb)->gso_size;
+	unsigned int doffset = skb->data - skb->mac.raw;
+	unsigned int offset = doffset;
+	unsigned int headroom;
+	unsigned int len;
+	int nfrags = skb_shinfo(skb)->nr_frags;
+	int err = -ENOMEM;
+	int i = 0;
+	int pos;
+
+	__skb_push(skb, doffset);
+	headroom = skb_headroom(skb);
+	pos = skb_headlen(skb);
+
+	do {
+		struct sk_buff *nskb;
+		skb_frag_t *frag;
+		int hsize, nsize;
+		int k;
+		int size;
+
+		len = skb->len - offset;
+		if (len > mss)
+			len = mss;
+
+		hsize = skb_headlen(skb) - offset;
+		if (hsize < 0)
+			hsize = 0;
+		nsize = hsize + doffset;
+		if (nsize > len + doffset || !sg)
+			nsize = len + doffset;
+
+		nskb = alloc_skb(nsize + headroom, GFP_ATOMIC);
+		if (unlikely(!nskb))
+			goto err;
+
+		if (segs)
+			tail->next = nskb;
+		else
+			segs = nskb;
+		tail = nskb;
+
+		nskb->dev = skb->dev;
+		nskb->priority = skb->priority;
+		nskb->protocol = skb->protocol;
+		nskb->dst = dst_clone(skb->dst);
+		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
+		nskb->pkt_type = skb->pkt_type;
+		nskb->mac_len = skb->mac_len;
+
+		skb_reserve(nskb, headroom);
+		nskb->mac.raw = nskb->data;
+		nskb->nh.raw = nskb->data + skb->mac_len;
+		nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw);
+		memcpy(skb_put(nskb, doffset), skb->data, doffset);
+
+		if (!sg) {
+			nskb->csum = skb_copy_and_csum_bits(skb, offset,
+							    skb_put(nskb, len),
+							    len, 0);
+			continue;
+		}
+
+		frag = skb_shinfo(nskb)->frags;
+		k = 0;
+
+		nskb->ip_summed = CHECKSUM_HW;
+		nskb->csum = skb->csum;
+		memcpy(skb_put(nskb, hsize), skb->data + offset, hsize);
+
+		while (pos < offset + len) {
+			BUG_ON(i >= nfrags);
+
+			*frag = skb_shinfo(skb)->frags[i];
+			get_page(frag->page);
+
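The core arithmetic of the non-SG path in skb_segment (clamp each segment's
payload to the MSS and put a copy of the headers in front of it) can be
sketched on flat byte buffers. This is a simplified illustration under
stated assumptions: make_segment() and its parameters are invented, no skb
structures, page fragments or checksums are modelled, and the per-segment
header fix-ups done by the protocol layer are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write segment number idx of a super-packet into
 * out.  pkt holds hdr_len bytes of headers followed by the payload; every
 * segment gets a copy of the headers plus at most mss payload bytes.
 * Returns the segment length, or 0 if there is no such segment or out is
 * too small. */
static size_t make_segment(const unsigned char *pkt, size_t pkt_len,
			   size_t hdr_len, size_t mss, size_t idx,
			   unsigned char *out, size_t out_cap)
{
	size_t offset, len;

	if (mss == 0 || hdr_len > pkt_len)
		return 0;

	offset = hdr_len + idx * mss;	/* where this segment's payload starts */
	if (offset >= pkt_len)
		return 0;		/* no more segments */

	len = pkt_len - offset;
	if (len > mss)
		len = mss;		/* same clamp as in skb_segment */

	if (hdr_len + len > out_cap)
		return 0;		/* caller's buffer is too small */

	memcpy(out, pkt, hdr_len);			/* replicate the headers */
	memcpy(out + hdr_len, pkt + offset, len);	/* this segment's payload */
	return hdr_len + len;
}
```

For a 9-byte packet with a 2-byte header and an MSS of 3, this yields three
segments of 5, 5 and 3 bytes: only the last one is shorter than
header + MSS.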
Re: [3/5] [NET]: Add software TSOv4
On Thu, Jun 22, 2006 at 06:14:00PM +1000, herbert wrote: [NET]: Add software TSOv4 Doh, forgot to remove an unused declaration. Here is an updated version. [NET]: Add software TSOv4 This patch adds the GSO implementation for IPv4 TCP. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -406,6 +406,9 @@ struct net_device struct list_headqdisc_list; unsigned long tx_queue_len; /* Max frames per queue allowed */ + /* Partially transmitted GSO packet. */ + struct sk_buff *gso_skb; + /* ingress path synchronizer */ spinlock_t ingress_lock; struct Qdisc*qdisc_ingress; @@ -540,6 +543,7 @@ struct packet_type { struct net_device *, struct packet_type *, struct net_device *); + struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); void*af_packet_priv; struct list_headlist; }; @@ -690,7 +694,8 @@ extern int dev_change_name(struct net_d extern int dev_set_mtu(struct net_device *, int); extern int dev_set_mac_address(struct net_device *, struct sockaddr *); -extern voiddev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); +extern int dev_hard_start_xmit(struct sk_buff *skb, + struct net_device *dev); extern voiddev_init(void); @@ -964,6 +969,7 @@ extern int netdev_max_backlog; extern int weight_p; extern int netdev_set_master(struct net_device *dev, struct net_device *master); extern int skb_checksum_help(struct sk_buff *skb, int inward); +extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg); #ifdef CONFIG_BUG extern void netdev_rx_csum_fault(struct net_device *dev); #else diff --git a/net/core/dev.c b/net/core/dev.c --- a/net/core/dev.c +++ b/net/core/dev.c @@ -116,6 +116,7 @@ #include asm/current.h #include 
linux/audit.h #include linux/dmaengine.h +#include linux/err.h /* * The list of packet types we will receive (as opposed to discard) @@ -1048,7 +1049,7 @@ static inline void net_timestamp(struct * taps currently in use. */ -void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) +static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) { struct packet_type *ptype; @@ -1186,6 +1187,40 @@ out: return ret; } +/** + * skb_gso_segment - Perform segmentation on skb. + * @skb: buffer to segment + * @sg: whether scatter-gather is supported on the target. + * + * This function segments the given skb and returns a list of segments. + */ +struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg) +{ + struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); + struct packet_type *ptype; + int type = skb-protocol; + + BUG_ON(skb_shinfo(skb)-frag_list); + BUG_ON(skb-ip_summed != CHECKSUM_HW); + + skb-mac.raw = skb-data; + skb-mac_len = skb-nh.raw - skb-data; + __skb_pull(skb, skb-mac_len); + + rcu_read_lock(); + list_for_each_entry_rcu(ptype, ptype_base[ntohs(type) 15], list) { + if (ptype-type == type !ptype-dev ptype-gso_segment) { + segs = ptype-gso_segment(skb, sg); + break; + } + } + rcu_read_unlock(); + + return segs; +} + +EXPORT_SYMBOL(skb_gso_segment); + /* Take action when hardware reception checksum errors are detected. 
*/ #ifdef CONFIG_BUG void netdev_rx_csum_fault(struct net_device *dev) @@ -1222,6 +1257,86 @@ static inline int illegal_highdma(struct #define illegal_highdma(dev, skb) (0) #endif +struct dev_gso_cb { + void (*destructor)(struct sk_buff *skb); +}; + +#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)-cb) + +static void dev_gso_skb_destructor(struct sk_buff *skb) +{ + struct dev_gso_cb *cb; + + do { + struct sk_buff *nskb = skb-next; + + skb-next = nskb-next; + nskb-next = NULL; + kfree_skb(nskb); + } while (skb-next); + + cb = DEV_GSO_CB(skb); + if (cb-destructor) + cb-destructor(skb); +} + +/** + * dev_gso_segment - Perform emulated hardware segmentation on skb. + * @skb: buffer to segment + * + * This function segments the given skb and stores the list of segments + * in
Re: [patch] ipv6 source address selection in addrconf.c (2.6.17)
From: YOSHIFUJI Hideaki [EMAIL PROTECTED]
Date: Thu, 22 Jun 2006 01:12:57 +0900 (JST)

> I think it is trivial enough to push this to -stable as well.
>
> Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

Ok, done. Thanks a lot!
Re: [IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY
From: YOSHIFUJI Hideaki [EMAIL PROTECTED]
Date: Thu, 22 Jun 2006 00:23:41 +0900 (JST)

We need to update hiscore.rule even if we don't enable CONFIG_IPV6_PRIVACY, because we have more less significant rule; longest match.

Applied, thank you.

I think it is suitable for -stable as well.

Agreed, I have pushed it to -stable.
Re: [-mm patch] drivers/net/ni5010.c: fix compile error
Hi,

On Wed, Jun 21, 2006 at 05:10:57PM +0200, Adrian Bunk wrote:
> On Wed, Jun 21, 2006 at 03:48:57AM -0700, Andrew Morton wrote:
> > ... Changes since 2.6.17-rc6-mm2: ... +ni5010-netcard-cleanup.patch netdev cleanup ...
>
> This patch fixes the following compile error with CONFIG_NI5010=y:

Doh, thanks! (that should teach me to do non-module runs, too)

Andreas Mohr
Re: Memory corruption in 8390.c ? (was Re: Possible leaks in network drivers)
Herbert Xu wrote:
> This patch uses pskb_expand_head to expand the existing skb and linearize

Seems sane to me.

> it if needed. Actually, someone should sift through every instance of skb_pad on a non-linear skb as they do not fit the reasons why this was originally created.

Non-linear skbs smaller than ETH_ZLEN seem unlikely.

Overall, the skb_pad() changes were made over a short span of time, often to older and under-used drivers, so I would not be surprised to find rough edges or the occasional bug.

Jeff
Re: Memory corruption in 8390.c ?
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 22 Jun 2006 12:30:29 +1000

> On Thu, Jun 22, 2006 at 10:55:44AM +1000, Herbert Xu wrote:
> > I think skb_padto simply shouldn't allocate a new skb. It only needs to extend the data area.
>
> OK, here is a patch to make it do that.
>
> [NET]: Avoid allocating skb in skb_pad

Want me to let this cook in 2.6.18 for a while before sending it off to -stable?
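The idea of padding by extending the data area, rather than allocating a replacement buffer, can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct frame` and `frame_pad_in_place()` are hypothetical names invented here, not the kernel's `struct sk_buff` or the patch under discussion, and the sketch assumes the buffer already has spare tail room.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60 /* minimum Ethernet frame length, excluding the FCS */

/* Hypothetical stand-in for a packet buffer; not the kernel's struct sk_buff. */
struct frame {
	unsigned char data[128]; /* storage, including spare tail room */
	size_t len;              /* bytes currently holding packet data */
};

/*
 * Zero-fill the tail of the frame up to min_len bytes without allocating
 * a second buffer. Returns 0 on success, -1 if the tail room is too small.
 */
static int frame_pad_in_place(struct frame *f, size_t min_len)
{
	assert(f != NULL);
	if (f->len >= min_len)
		return 0;                 /* already long enough, nothing to do */
	if (min_len > sizeof(f->data))
		return -1;                /* not enough room to extend in place */
	memset(f->data + f->len, 0, min_len - f->len);
	f->len = min_len;
	return 0;
}
```

A caller would invoke `frame_pad_in_place(&f, ETH_ZLEN)` just before handing the frame to the hardware and treat a non-zero return as a transmit error.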
Re: Memory corruption in 8390.c ? (was Re: Possible leaks in network drivers)
On Thu, Jun 22, 2006 at 04:22:22AM -0400, Jeff Garzik wrote:
> > it if needed. Actually, someone should sift through every instance of skb_pad on a non-linear skb as they do not fit the reasons why this was originally created.
>
> Non-linear skbs smaller than ETH_ZLEN seem unlikely.

When I was grepping it seems that a few drivers were using it with a length other than ETH_ZLEN. I've just done another grep and here are the potential suspects:

cassini.c
starfire.c
yellowfin.c

Also, the skb_pad in drivers/s390/net/claw.c didn't check for errors at all.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Memory corruption in 8390.c ?
On Thu, Jun 22, 2006 at 01:26:09AM -0700, David Miller wrote:
> Want me to let this cook in 2.6.18 for a while before sending it off to -stable?

You know I'm never one to push anything quickly so absolutely yes :)

--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Memory corruption in 8390.c ?
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 22 Jun 2006 18:30:37 +1000

> On Thu, Jun 22, 2006 at 01:26:09AM -0700, David Miller wrote:
> > Want me to let this cook in 2.6.18 for a while before sending it off to -stable?
>
> You know I'm never one to push anything quickly so absolutely yes :)

Ok, applied to net-2.6.18 for now :)
Re: [-mm patch] drivers/net/ni5010.c: fix compile error
On Thu, Jun 22, 2006 at 10:13:16AM +0200, Andreas Mohr wrote:
> Hi,
>
> On Wed, Jun 21, 2006 at 05:10:57PM +0200, Adrian Bunk wrote:
> > On Wed, Jun 21, 2006 at 03:48:57AM -0700, Andrew Morton wrote:
> > > ... Changes since 2.6.17-rc6-mm2: ... +ni5010-netcard-cleanup.patch netdev cleanup ...
> >
> > This patch fixes the following compile error with CONFIG_NI5010=y:
>
> Doh, thanks! (that should teach me to do non-module runs, too)

And change the driver to no longer use Space.c? ;-)

> Andreas Mohr

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: No interfaces under /proc/sys/net/ipv4/conf/
Hasso Tepper wrote:
> After upgrade to 2.6.16.20 from 2.6.11 I discovered that no dynamic interfaces (vlans, tunnels) appear under /proc/sys/net/ipv4/conf/. /proc/sys/net/ipv6/conf/ is OK.

OK, realised that it's a feature. Entries in /proc/sys/net/*/conf/ are not created if the interface doesn't have at least one ipv4/ipv6 address.

I can think of workarounds for most of the problems (although it breaks a hell of a lot of software here), but how am I supposed to configure ipv6 settings for interfaces which have to obtain a global ipv6 address via autoconf so that it will work even if the cable is not plugged in? I did it via /etc/sysctl.conf, but now ... machine boots with no link => no link-local address => no /proc/sys/net/ipv6/conf/interface => configuration fails.

regards,
--
Hasso Tepper
Re: [NET]: Prevent transmission after dev_deactivate
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 22 Jun 2006 18:08:56 +1000

> I found a bug in my GSO patches with the shutdown handling in dev_deactivate. It provided enough impetus for me to finally clean up this function :) This patch is against Linus's tree.
>
> [NET]: Prevent transmission after dev_deactivate

Looks good, applied, thanks a lot!
Re: Memory corruption in 8390.c ? (was Re: Possible leaks in network drivers)
Herbert Xu wrote:
> On Thu, Jun 22, 2006 at 04:22:22AM -0400, Jeff Garzik wrote:
> > > it if needed. Actually, someone should sift through every instance of skb_pad on a non-linear skb as they do not fit the reasons why this was originally created.
> >
> > Non-linear skbs smaller than ETH_ZLEN seem unlikely.
>
> When I was grepping it seems that a few drivers were using it with a length other than ETH_ZLEN. I've just done another grep and here are the potential suspects:
>
> cassini.c
> starfire.c
> yellowfin.c

That doesn't really invalidate the point :) These drivers are still only padding very small packets.

Jeff
Re: [RFC 2/7] NetLabel: core network changes
From: [EMAIL PROTECTED]
Date: Wed, 21 Jun 2006 15:42:37 -0400

> Index: linux-2.6.17.i686-quilt/include/linux/netlink.h
> ===
> --- linux-2.6.17.i686-quilt.orig/include/linux/netlink.h
> +++ linux-2.6.17.i686-quilt/include/linux/netlink.h
> @@ -21,6 +21,7 @@
>  #define NETLINK_DNRTMSG 14 /* DECnet routing messages */
>  #define NETLINK_KOBJECT_UEVENT 15 /* Kernel messages to userspace */
>  #define NETLINK_GENERIC 16
> +#define NETLINK_NETLABEL 17 /* Network packet labeling */
>
>  #define MAX_LINKS 32

Please use generic netlink. Jamal posted a very nice document recently on how to use it properly. You can read that thread here:

http://marc.theaimsgroup.com/?l=linux-netdev&m=115072450928755&w=2

Thanks.
Re: Memory corruption in 8390.c ? (was Re: Possible leaks in network drivers)
On Thu, Jun 22, 2006 at 04:57:39AM -0400, Jeff Garzik wrote:
> > > Non-linear skbs smaller than ETH_ZLEN seem unlikely.
> >
> > When I was grepping it seems that a few drivers were using it with a length other than ETH_ZLEN. I've just done another grep and here are the potential suspects:
> >
> > cassini.c
> > starfire.c
> > yellowfin.c
>
> That doesn't really invalidate the point :) These drivers are still only padding very small packets.

Hmm, at least cassini pads it to 255 for gigabit...

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC 2/7] NetLabel: core network changes
From: [EMAIL PROTECTED]
Date: Wed, 21 Jun 2006 15:42:37 -0400

> Index: linux-2.6.17.i686-quilt/net/ipv4/Makefile
> ===
> --- linux-2.6.17.i686-quilt.orig/net/ipv4/Makefile
> +++ linux-2.6.17.i686-quilt/net/ipv4/Makefile
> @@ -42,6 +42,9 @@ obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybl
>  obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
>  obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
>  obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
> +ifeq ($(CONFIG_NETLABEL_CIPSOV4),y)
> +obj-y += cipso_ipv4.o
> +endif

Why not obj-$(CONFIG_NETLABEL_CIPSOV4) += cipso_ipv4.o? The whole idea behind the obj-$(CONFIG_OPTION) technique is to avoid conditionals all over the makefile.

> Index: linux-2.6.17.i686-quilt/net/ipv4/af_inet.c
> ===
> --- linux-2.6.17.i686-quilt.orig/net/ipv4/af_inet.c
> +++ linux-2.6.17.i686-quilt/net/ipv4/af_inet.c
> @@ -114,6 +114,7 @@
>  #ifdef CONFIG_IP_MROUTE
>  #include <linux/mroute.h>
>  #endif
> +#include <net/netlabel.h>
>
>  DEFINE_SNMP_STAT(struct linux_mib, net_statistics) __read_mostly;
>
> @@ -616,6 +617,8 @@ int inet_accept(struct socket *sock, str
>  sock_graft(sk2, newsock);
>
> + netlbl_socket_inet_accept(sock, newsock);
> +
>  newsock->state = SS_CONNECTED;
>  err = 0;
>  release_sock(sk2);

Neither the netlabel.h header nor the implementation of the netlbl_socket_inet_accept() function exists at this point in your patch set.

At each patch point, the tree must build and function properly. This means you have to split up and order your changes correctly, gradually building up the infrastructure and then finally plugging it in and making use of it. Otherwise nobody can test your work in an incremental fashion, and thus it's not possible to determine if a bug or behavior gets introduced at patch 2, 3 or 4, for example.

> + if (cipso_v4_validate(optptr)) {
> + pp_ptr = optptr;
> + goto error;
> + }
> + break;

Same thing here, cipso_v4_validate() doesn't exist in the tree at this point in the patch set, so the tree doesn't build after applying this patch.

Please split up your submission properly. I really can't sanely review the rest of this until you dice up your changes properly.
Re: Memory corruption in 8390.c ? (was Re: Possible leaks in network drivers)
On Thu, Jun 22, 2006 at 07:02:27PM +1000, herbert wrote:
> > > cassini.c starfire.c yellowfin.c
> >
> > That doesn't really invalidate the point :) These drivers are still only padding very small packets.
>
> Hmm, at least cassini pads it to 255 for gigabit...

The one in starfire looks especially dodgy. It supports SG and also requires the whole length to be a multiple of 4 if the firmware is broken. The question is do they really intend this or do they want each fragment to terminate on a 4-byte boundary.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
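The two readings of the starfire requirement are not equivalent, which a small userspace sketch may make concrete. All function names below are hypothetical and nothing here is taken from the starfire driver; the sketch also assumes that "boundary" refers to offsets within the packet, which the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of 4 (len itself if already aligned). */
static size_t round_up_4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Reading 1: only the total packet length must be a multiple of 4. */
static int total_is_aligned(const size_t *frag_len, size_t nfrags)
{
	size_t total = 0;

	assert(frag_len != NULL || nfrags == 0);
	for (size_t i = 0; i < nfrags; i++)
		total += frag_len[i];
	return total % 4 == 0;
}

/* Reading 2: every fragment must end at an offset that is a multiple of 4. */
static int each_fragment_ends_aligned(const size_t *frag_len, size_t nfrags)
{
	size_t end = 0;

	assert(frag_len != NULL || nfrags == 0);
	for (size_t i = 0; i < nfrags; i++) {
		end += frag_len[i];
		if (end % 4 != 0)
			return 0; /* this fragment stops in the middle of a 4-byte word */
	}
	return 1;
}
```

A packet made of a 10-byte and a 6-byte fragment satisfies the first reading (16 bytes in total) but not the second, since the first fragment ends at offset 10. Padding only the tail of the packet cannot repair that case.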
Re: [RFC 3/7] NetLabel: CIPSOv4 engine
From: [EMAIL PROTECTED]
Date: Wed, 21 Jun 2006 15:42:38 -0400

> Add support for the Commercial IP Security Option (CIPSO) to the IPv4 network stack. CIPSO has become a de-facto standard for trusted/labeled networking amongst existing Trusted Operating Systems such as Trusted Solaris, HP-UX CMW, etc. This implementation is designed to be used with the NetLabel subsystem to provide explicit packet labeling to LSM developers.

The thing that concerns me most about CIPSO is that even once users migrate to a more SELINUX native approach from this CIPSO stuff, the CIPSO code, its bloat, and its maintenance burden will remain.

It's easy to put stuff in, it's impossible to take stuff out even once it's largely unused by even its original target audience. And that's what I see happening here.

This is why, to be perfectly honest with you, I'd much rather something like this stay out-of-tree and people are strongly encouraged to use the more native stuff under Linux.
Re: No interfaces under /proc/sys/net/ipv4/conf/
Hasso Tepper:
> I can think of workarounds for most of the problems (although it breaks a hell of a lot of software here), but how am I supposed to configure ipv6 settings for interfaces which have to obtain a global ipv6 address via autoconf so that it will work even if the cable is not plugged in? I did it via /etc/sysctl.conf, but now ... machine boots with no link => no link-local address => no /proc/sys/net/ipv6/conf/interface => configuration fails.

Just realized (via practical experience) that the same question applies to interfaces configured via dhcp.

regards,
--
Hasso Tepper
Re: [0/5] GSO: Generic Segmentation Offload
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 22 Jun 2006 18:12:11 +1000

> This is a repost of the GSO patches. The main change is the fix to a bug in the way dev->gso_skb is freed. This series requires the dev_deactivate patch that I just posted.

Applied, thanks a lot Herbert.
Re: [Bugme-new] [Bug 6730] New: pptp connection hang on heavy network load.
On Thu, 22 Jun 2006 03:06:00 -0700 [EMAIL PROTECTED] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=6730
>
> Summary: pptp connection hang on heavy network load.
> Kernel Version: 2.6.17
> Status: NEW
> Severity: blocking
> Owner: [EMAIL PROTECTED]
> Submitter: [EMAIL PROTECTED]
>
> Most recent kernel where this bug did not occur: 2.6.17
> Distribution: debian
> Hardware Environment: NVidia nForce2 GF6600 forcedeth driver.
> Software Environment: latest debian sid dist-upgrade on 22 june 2006
>
> Problem Description: when I make a many files (about 500 files near 1 Gb of total size, *.deb from last and previous upgrade for example ) upload to a server via ssh/scp, my vpn pptp connection hangs till i restart ppp connection. This problem persist not only a latest kernel. I don't remember when it begin to hangs. It very-very annoying and makes me difficult to make my daily job. If I do not make such heavy network vpn traffic, I can work for a weeks without reboots and any problems
>
> Steps to reproduce: Start to copy big amount of data over pptp connection from a client linux machine. Ex. via ssh/scp.

We thought we'd fixed this :(
Re: Suspending 802.11 drivers
On 6/21/06, Stefan Rompf [EMAIL PROTECTED] wrote:
> On Wednesday 21 June 2006 17:08, Luis R. Rodriguez wrote:
> > Since d80211 is already being patched for sysfs how about we use sysfs (and kobjects) to maintain the state at suspend() and resume(). This would allow userspace tools like supplicant running in the background to pick up from sysfs where it left off and for our drivers to save where we left off.
>
> Forgive me that I'm so insistent on this question, but this is important: What state that goes beyond the data settable with wireless ioctls/iwconfig (that is kept anyway) needs to be saved by the stack?
>
> Last association info is worthless, the association can be restored using the ESSID/BSSID/channel set with iwconfig or by wpa_supplicant. Important is that userspace is notified about the connection loss. Is there _any_ other information not recreatable from iwconfig settings that needs to be kept?
>
> Stefan

Stefan, this is an excellent and valid question. Let me elaborate -- It's exactly those settings you mention that I'm suggesting get saved onto sysfs by the driver and later get picked up by userspace. There are, however, other settings which could get saved when suspending too, for example settings which otherwise would get set by the current private wireless ioctls. There are too many to describe here really; each device has its own set of private attributes. More on this in another e-mail I'm about to send to netdev.

Luis
Re: ipv6 source address selection in addrconf.c (2.6.17)
YOSHIFUJI Hideaki / 吉藤英明 wrote:
In article [EMAIL PROTECTED] (at Thu, 22 Jun 2006 00:57:56 +0200), Lukasz Stelmach [EMAIL PROTECTED] says:
Lukasz Stelmach wrote:
Lukasz Stelmach wrote:

[] fd24:6f44:46bd:face:EUI64 fd24:6f44:46bd:face:RANDOM and 2002:531f:d667:face:EUI64 2002:531f:d667:face::RANDOM there seems to be no way to prefer 2002:: over fc00:: in rule 7 and it will be selected as long as it is before 2002:: on the list. I can see here that an implicit assumption has been made that an interface either is multihomed or private. The seventh rule should not IMHO break the whole process of selection but rather mark as selectable all private (random) addresses. And it should rather be done before rule 6.

Hmm? We do not have such intention. In above case, when you connect to 2001:200:0:8002:203:47ff:fea5:3085, either 2002:531f:d667:face:EUI64 or 2002:531f:d667:face::RANDOM should be selected (depending on if use_tempaddr = 2), by the longest matching rule (Rule 8).

I've chewed the code line by line and it tastes like it should work the way you say... OK I see the problem. I've used ifconfig, which doesn't show the deprecated flag and the valid/preferred times which, combined with privacy, *seem* to cause some problems. I don't know yet if it is a problem of proper intervals in radvd.conf or if there is still a bug in the kernel. I'll let you know when I learn it.

OK. That's enough for now. Let me get back to the real work ;-)

Best regards.
--
Było mi bardzo miło.Czwarta pospolita klęska, [...] Łukasz Już nie katolicka lecz złodziejska. (c)PP
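The longest-matching-prefix comparison mentioned above (Rule 8) can be sketched in userspace C. This is a simplified illustration, not the code in addrconf.c: `common_prefix_len()` and `pick_longest_match()` are hypothetical helpers, the real selection evaluates the earlier rules first, and the `::1` interface identifiers used in the note below are made-up placeholders for the EUI64/RANDOM suffixes in the thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Number of leading bits that two IPv6 addresses have in common (0..128). */
static int common_prefix_len(const struct in6_addr *a, const struct in6_addr *b)
{
	for (int i = 0; i < 16; i++) {
		unsigned char diff = a->s6_addr[i] ^ b->s6_addr[i];

		if (diff) {
			int bits = i * 8;

			/* Count matching bits from the top of the first differing byte. */
			while (!(diff & 0x80)) {
				diff <<= 1;
				bits++;
			}
			return bits;
		}
	}
	return 128;
}

/*
 * Return the index of the candidate source address that shares the longest
 * prefix with dst, or -1 if dst or every candidate fails to parse.
 */
static int pick_longest_match(const char *dst, const char *const *cand, size_t n)
{
	struct in6_addr d, c;
	int best = -1, best_len = -1;

	assert(dst != NULL && cand != NULL);
	if (inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	for (size_t i = 0; i < n; i++) {
		int len;

		if (inet_pton(AF_INET6, cand[i], &c) != 1)
			continue; /* skip candidates that are not valid IPv6 text */
		len = common_prefix_len(&d, &c);
		if (len > best_len) {
			best_len = len;
			best = (int)i;
		}
	}
	return best;
}
```

With the destination 2001:200:0:8002:203:47ff:fea5:3085 and the candidates fd24:6f44:46bd:face::1 and 2002:531f:d667:face::1, the 2002:: candidate shares 14 leading bits with the destination while the fd24:: candidate shares none, so this sketch picks the 2002:: address.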
[PATCH 0/32] TIPC updates
Here's a resend of the latest TIPC updates. I apologize for not properly submitting the patches for review the first time around.

This patch set includes several minor bugfixes. Most of them were ported over from an older TIPC 1.5.x branch maintained on sourceforge (that branch is being phased out).

Patches can be pulled from: git://tipc.cslab.ericsson.net/pub/git/tipc.git

Summary:

include/net/tipc/tipc_bearer.h | 12
net/tipc/bcast.c | 79
net/tipc/bcast.h | 2
net/tipc/bearer.c | 70
net/tipc/cluster.c | 22
net/tipc/config.c | 85
net/tipc/core.c | 7
net/tipc/core.h | 21
net/tipc/discover.c | 13
net/tipc/eth_media.c | 29
net/tipc/link.c | 217
net/tipc/name_distr.c | 30
net/tipc/name_table.c | 203
net/tipc/node.c | 78
net/tipc/node.h | 2
net/tipc/node_subscr.c | 15
net/tipc/port.c | 41
net/tipc/ref.c | 31
net/tipc/socket.c | 100
net/tipc/subscr.c | 18
net/tipc/zone.c | 19
21 files changed, 661 insertions(+), 433 deletions(-)

Allan Stephens:
[TIPC] Prevent name table corruption if no room for new publication
[TIPC] Use correct upper bound when validating network zone number.
[TIPC] Corrected potential misuse of tipc_media_addr structure.
[TIPC] Allow ports to receive multicast messages through native API.
[TIPC] Links now validate destination node specified by incoming messages.
[TIPC] Multicast link failure now resets all links to nacking node.
[TIPC] Allow compilation when CONFIG_TIPC_DEBUG is not set.
[TIPC] Fixed privilege checking typo in dest_name_check().
[TIPC] Fix misleading comment in buf_discard() routine.
[TIPC] Added support for MODULE_VERSION capability.
[TIPC] Validate entire interface name when locating bearer to enable.
[TIPC] Non-operation-affecting corrections to comments & function definitions.
[TIPC] Fixed connect() to detect a dest address that is missing or too short.
[TIPC] Implied connect now saves dest name for retrieval as ancillary data.
[TIPC] Can now return destination name of form {0,x,y} via ancillary data.
[TIPC] Connected send now checks socket state when retrying congested send.
[TIPC] Stream socket send indicates partial success if data partially sent.
[TIPC] Improved performance of error checking during socket creation.
[TIPC] recvmsg() now returns TIPC ancillary data using correct level (SOL_TIPC)
[TIPC] Simplify code for returning partial success of stream send request.
[TIPC] Optimized argument validation done by connect().
[TIPC] Withdrawing all names from nameless port now returns success, not error
[TIPC] Added missing warning for out-of-memory condition
[TIPC] Fixed memory leak in tipc_link_send() when destination is unreachable
[TIPC] Disallow config operations that aren't supported in certain modes.
[TIPC] First phase of assert() cleanup
[TIPC] Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.
[TIPC] Fixed link switchover bugs
[TIPC] Get rid of dynamically allocated arrays in broadcast code.
[TIPC] Fix incorrect correction to discovery timer frequency computation.

Eric Sesterhenn:
[TIPC] Fix for NULL pointer dereference

Jon Maloy:
[TIPC] Improved tolerance to promiscuous mode interface

/Per
Re: Memory corruption in 8390.c ?
On Thu, 2006-06-22 at 01:34 -0700, David Miller wrote:
> From: Herbert Xu [EMAIL PROTECTED]
> Date: Thu, 22 Jun 2006 18:30:37 +1000
>
> > On Thu, Jun 22, 2006 at 01:26:09AM -0700, David Miller wrote:
> > > Want me to let this cook in 2.6.18 for a while before sending it off to -stable?
> >
> > You know I'm never one to push anything quickly so absolutely yes :)
>
> Ok, applied to net-2.6.18 for now :)

The 8390 change (corrected version) also makes 8390.c faster so should be applied anyway, and the orinoco one fixes some code that isn't even needed and someone forgot to remove long ago.

Otherwise the skb_padto behaviour change with the newer skb style makes a lot more sense, I agree.
Re: Memory corruption in 8390.c ?
Alan Cox [EMAIL PROTECTED] wrote:
> The 8390 change (corrected version) also makes 8390.c faster so should be applied anyway, and the orinoco one fixes some code that isn't even needed and someone forgot to remove long ago. Otherwise the skb_padto

Yeah I agree totally. However, I haven't actually seen the fixed 8390 version being posted yet or at least not to netdev :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Memory corruption in 8390.c ?
On Thu, 2006-06-22 at 12:34 +0100, Alan Cox wrote:
> On Thu, 2006-06-22 at 01:34 -0700, David Miller wrote:
> > From: Herbert Xu [EMAIL PROTECTED]
> > Date: Thu, 22 Jun 2006 18:30:37 +1000
> >
> > > On Thu, Jun 22, 2006 at 01:26:09AM -0700, David Miller wrote:
> > > > Want me to let this cook in 2.6.18 for a while before sending it off to -stable?
> > >
> > > You know I'm never one to push anything quickly so absolutely yes :)
> >
> > Ok, applied to net-2.6.18 for now :)
>
> The 8390 change (corrected version) also makes 8390.c faster so should be applied anyway,

8390 is such a race monster that a few cycles matter a lot! :-)
Re: Memory corruption in 8390.c ?
On Thu, Jun 22, 2006 at 01:33:36PM +0200, Arjan van de Ven wrote:
> On Thu, 2006-06-22 at 12:34 +0100, Alan Cox wrote:
> > The 8390 change (corrected version) also makes 8390.c faster so should be applied anyway,
>
> 8390 is such a race monster that a few cycles matter a lot! :-)

It sure is. Back in the old days I could saturate a 10 Mbit ethernet segment using a Western Digital 8003 (the 8 bit ISA card) in a 386DX40 (running Linux 1.0, 1.2, and 1.3).

Erik

--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
[RFC] sysfs + configfs on 802.11 wireless drivers
Rebel fleet of wireless developers,

Here are some changes which I think our current wireless stacks could use to assist cleaning up WEs and private ioctls, and to provide userspace a cleaner framework to interact with our wireless drivers.

Kernel level:

(1) Use the new *configfs* for all user-specific attributes
(2) Use *sysfs* read-only kobjects for device-specific attributes like values which can be saved for suspend() and collected for resume(). IEEE 802.11 capabilities, features (for example radiotap), and what is currently settable/gettable from private ioctl realm along with its restrictions can also be exported via sysfs.
(3) On resume() talk to userspace via netlink to read our sysfs and configfs us

Userspace applications can then:

(1) Interact with configfs for configuring wireless devices, including what used to be set by private ioctls
(2) Retrieve attributes saved from sysfs and set them onto configfs after resume(). Sysfs will also tell us this card's capabilities, features and private data along with their respective restrictions we can work with, so userspace can modify the available options which are gettable/settable.
(3) Respond to netlink communication from driver after resume() to set data through configfs

---

I know we recently moved WE to netlink but I figured celebrating the happy marriage of Mr. sysfs and Mrs. configfs on 2.6.16 by giving them offspring would be nice and more appropriate.

Here's an example run-through of how this would work:

(1) A wireless device comes up and spits out device-specific default settings on sysfs
(2) If a user wants to change essid, channel, power-save-mode (this is not suspend()), rate, and so forth, userspace writes the settings into configfs; these would in turn get updated on sysfs by the driver.
(3) Should the device go into suspend(), the driver can then update its necessary attributes on sysfs required to recover from suspend() which may not have been updated yet (whatever they may be)
(4) At resume() we could just have our driver read our sysfs attributes and try to set all of them back exactly how they were before, but to reduce bloat in our drivers, and since our state is already exported, we could just have userspace do it for us. So we use netlink to ask userspace to go ahead and resume() us. The advantages would be that userspace always handles the assoc/disassoc and WPA in a consistent manner and, as mentioned above, less driver bloat.
(5) At resume() userspace reads sysfs and sets us back up through configfs

Comments are appreciated; if this is something that seems desirable I can start cranking up some code.

Luis
Re: Memory corruption in 8390.c ?
On Thu, 2006-06-22 at 13:33 +0200, Arjan van de Ven wrote:
> > The 8390 change (corrected version) also makes 8390.c faster so should
> > be applied anyway,
>
> 8390 is such a race monster that a few cycles matter a lot! :-)

There are generic 8390 clones for 100Mbit. I'm not suggesting it's a
good idea but people did it.

Alan
Re: Memory corruption in 8390.c ?
On Thu, 2006-06-22 at 21:29 +1000, Herbert Xu wrote:
> Alan Cox [EMAIL PROTECTED] wrote:
> > The 8390 change (corrected version) also makes 8390.c faster so should
> > be applied anyway, and the orinoco one fixes some code that isn't even
> > needed and someone forgot to remove long ago. Otherwise the skb_padto
>
> Yeah I agree totally. However, I haven't actually seen the fixed 8390
> version being posted yet or at least not to netdev :)

Ah the resounding clang of a subtle hint ;)

Signed-off-by: Alan Cox [EMAIL PROTECTED]

- Return 8390.c to the old way of handling short packets (which is also
  faster)

- Remove the skb_padto from orinoco. This got left in when the padding bad
  write patch was added and is actually not needed. This is fixing a merge
  error way back when.

- Wavelan can also use the stack based buffer trick if you want

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.17/drivers/net/8390.c linux-2.6.17/drivers/net/8390.c
--- linux.vanilla-2.6.17/drivers/net/8390.c	2006-06-19 17:17:32.0 +0100
+++ linux-2.6.17/drivers/net/8390.c	2006-06-21 21:23:12.0 +0100
@@ -275,12 +275,14 @@
 	struct ei_device *ei_local = (struct ei_device *) netdev_priv(dev);
 	int send_length = skb->len, output_page;
 	unsigned long flags;
+	char buf[ETH_ZLEN];
+	char *data = skb->data;

 	if (skb->len < ETH_ZLEN) {
-		skb = skb_padto(skb, ETH_ZLEN);
-		if (skb == NULL)
-			return 0;
+		memset(buf, 0, ETH_ZLEN);	/* more efficient than doing just the needed bits */
+		memcpy(buf, data, skb->len);
 		send_length = ETH_ZLEN;
+		data = buf;
 	}

 	/* Mask interrupts from the ethercard.
@@ -347,7 +349,7 @@
 	 * trigger the send later, upon receiving a Tx done interrupt.
 	 */

-	ei_block_output(dev, send_length, skb->data, output_page);
+	ei_block_output(dev, send_length, data, output_page);

 	if (! ei_local->txing)
 	{
diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.17/drivers/net/wireless/orinoco.c linux-2.6.17/drivers/net/wireless/orinoco.c
--- linux.vanilla-2.6.17/drivers/net/wireless/orinoco.c	2006-06-19 17:29:48.0 +0100
+++ linux-2.6.17/drivers/net/wireless/orinoco.c	2006-06-21 18:19:02.0 +0100
@@ -491,11 +491,8 @@
 	}

 	/* Length of the packet body */
-	/* FIXME: what if the skb is smaller than this? */
+	/* A shorter data_len will be padded by hermes_bap_pwrite_pad */
 	len = max_t(int, ALIGN(skb->len, 2), ETH_ZLEN);
-	skb = skb_padto(skb, len);
-	if (skb == NULL)
-		goto fail;
 	len -= ETH_HLEN;

 	eh = (struct ethhdr *)skb->data;
diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.17/drivers/net/wireless/wavelan.c linux-2.6.17/drivers/net/wireless/wavelan.c
--- linux.vanilla-2.6.17/drivers/net/wireless/wavelan.c	2006-06-19 17:29:48.0 +0100
+++ linux-2.6.17/drivers/net/wireless/wavelan.c	2006-06-21 18:32:47.0 +0100
@@ -2903,6 +2903,7 @@
 {
 	net_local *lp = (net_local *) dev->priv;
 	unsigned long flags;
+	char data[ETH_ZLEN];

 #ifdef DEBUG_TX_TRACE
 	printk(KERN_DEBUG "%s: ->wavelan_packet_xmit(0x%X)\n", dev->name,
@@ -2937,15 +2938,16 @@
 	 * able to detect collisions, therefore in theory we don't really
 	 * need to pad. Jean II */
 	if (skb->len < ETH_ZLEN) {
-		skb = skb_padto(skb, ETH_ZLEN);
-		if (skb == NULL)
-			return 0;
+		memset(data, 0, ETH_ZLEN);
+		memcpy(data, skb->data, skb->len);
+		/* Write packet on the card */
+		if(wv_packet_write(dev, data, ETH_ZLEN))
+			return 1;	/* We failed */
 	}
-
-	/* Write packet on the card */
-	if(wv_packet_write(dev, skb->data, skb->len))
+	else if(wv_packet_write(dev, skb->data, skb->len))
 		return 1;	/* We failed */

+
 	dev_kfree_skb(skb);

 #ifdef DEBUG_TX_TRACE
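The stack-buffer trick from the 8390 hunk can be looked at in isolation. What follows is a minimal user-space sketch, not driver code: it assumes ETH_ZLEN is 60 (the minimum Ethernet frame length without FCS, as in the kernel headers), and pad_frame() together with its parameters is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60	/* assumed: minimum frame length without FCS */

/*
 * Hypothetical helper: pick the buffer to hand to the hardware.
 * A short frame is copied into a zeroed scratch buffer supplied by the
 * caller (typically on its stack); a full-size frame is sent in place.
 * Returns the pointer to transmit from and stores the length in *out_len.
 */
static const unsigned char *pad_frame(const unsigned char *data, size_t len,
				      unsigned char scratch[ETH_ZLEN],
				      size_t *out_len)
{
	assert(data != NULL && scratch != NULL && out_len != NULL);

	if (len < ETH_ZLEN) {
		memset(scratch, 0, ETH_ZLEN);	/* zero it all, then overwrite */
		memcpy(scratch, data, len);
		*out_len = ETH_ZLEN;
		return scratch;
	}
	*out_len = len;
	return data;
}
```

Nothing is allocated on this path, so there is no failure case to handle; that is why the `skb == NULL` checks disappear together with skb_padto() in the hunks above.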
Re: [RFC 3/7] NetLabel: CIPSOv4 engine
On Thursday 22 June 2006 5:12 am, David Miller wrote:
> From: [EMAIL PROTECTED]
> Date: Wed, 21 Jun 2006 15:42:38 -0400
>
> > Add support for the Commercial IP Security Option (CIPSO) to the IPv4
> > network stack. CIPSO has become a de-facto standard for
> > trusted/labeled networking amongst existing Trusted Operating Systems
> > such as Trusted Solaris, HP-UX CMW, etc. This implementation is
> > designed to be used with the NetLabel subsystem to provide explicit
> > packet labeling to LSM developers.
>
> The thing that concerns me most about CIPSO is that even once users
> migrate to a more SELINUX native approach from this CIPSO stuff, the
> CIPSO code, its bloat, and its maintenance burden will remain.
>
> It's easy to put stuff in, it's impossible to take stuff out even once
> it's largely unused by even its original target audience. And that's
> what I see happening here.
>
> This is why, to be perfectly honest with you, I'd much rather something
> like this stay out-of-tree and people are strongly encouraged to use
> the more native stuff under Linux.

Well, not exactly the response I was hoping for, but let me plead my case
one more time :)

Traditional MLS CIPSO is a niche protocol; I won't try to argue that
point. Nor will I dispute that the NetLabel patch is late to the party:
the IPsec/XFRM labeling approach has already been accepted as the SELinux
packet labeling mechanism. However, the XFRM labeling mechanism is not
currently supported by any OS other than Linux/SELinux. I have spoken
with users that need CIPSO to interoperate with their other trusted
systems; the XFRM approach is simply not a viable solution for them.

I strongly believe that failure to support an interoperable packet
labeling mechanism on Linux will seriously restrict Linux's deployment in
trusted networks. It's all about compatibility and enabling Linux to be
used in places it can't be used now. True, other OS vendors might support
the SELinux/IPsec packet labeling approach, but I see very little in the
way of motivation for them to do the work.

If it makes you feel any better, I do intend to support the Selopt
approach (or at least something very similar) for CIPSO, as envisioned by
James Morris for the SELinux networking hooks of long ago. This will
allow CIPSO to carry the full SELinux context, making it a more SELINUX
native approach than traditional MLS CIPSO. I just wanted to keep this
initial patch set as small as possible (you can see how well that worked
out) ... :)

-- 
paul moore
linux security @ hp
[PATCH 32/32] [TIPC] Fix incorrect correction to discovery timer frequency computation.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/discover.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index ee9b448..2b84412 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -2,7 +2,7 @@
  * net/tipc/discover.c
  *
  * Copyright (c) 2003-2006, Ericsson AB
- * Copyright (c) 2005, Wind River Systems
+ * Copyright (c) 2005-2006, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -267,8 +267,8 @@ static void disc_timeout(struct link_req
 		/* leave timer interval as is if already at a normal rate */
 	} else {
 		req->timer_intv *= 2;
-		if (req->timer_intv > TIPC_LINK_REQ_SLOW)
-			req->timer_intv = TIPC_LINK_REQ_SLOW;
+		if (req->timer_intv > TIPC_LINK_REQ_FAST)
+			req->timer_intv = TIPC_LINK_REQ_FAST;
 		if ((req->timer_intv == TIPC_LINK_REQ_FAST) &&
 		    (req->bearer->nodes.count))
 			req->timer_intv = TIPC_LINK_REQ_SLOW;
-- 
1.4.0
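The interval logic touched by the second hunk can be exercised on its own. This is a minimal sketch of the post-patch behaviour only. The interval values are assumed for illustration (the patch does not show the TIPC_LINK_REQ_* definitions), the test guarding the "leave as is" branch is likewise an assumption, and next_interval() and nodes_known are hypothetical names.

```c
#include <assert.h>

/* Interval values in ms: assumed for illustration, not shown in the patch. */
#define LINK_REQ_FAST	2000u
#define LINK_REQ_SLOW	600000u

/* Hypothetical stand-alone version of the interval update. */
static unsigned int next_interval(unsigned int intv, int nodes_known)
{
	assert(intv > 0);

	/* Assumption: a "normal rate" is the fast or the slow rate. */
	if (intv == LINK_REQ_FAST || intv == LINK_REQ_SLOW)
		return intv;

	intv *= 2;
	if (intv > LINK_REQ_FAST)
		intv = LINK_REQ_FAST;	/* cap the doubling at the fast rate */
	if (intv == LINK_REQ_FAST && nodes_known)
		intv = LINK_REQ_SLOW;	/* peers are known: back off */
	return intv;
}
```

With the cap applied at the slow rate instead, as in the removed lines, the `== TIPC_LINK_REQ_FAST` comparison could only succeed when a doubling step happened to land on the fast value exactly.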
[PATCH 24/32] [TIPC] Withdrawing all names from nameless port now returns success, not error
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/port.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index 360920b..899e08e 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -1171,8 +1171,6 @@ int tipc_withdraw(u32 ref, unsigned int
 	p_ptr = tipc_port_lock(ref);
 	if (!p_ptr)
 		return -EINVAL;
-	if (!p_ptr->publ.published)
-		goto exit;
 	if (!seq) {
 		list_for_each_entry_safe(publ, tpubl,
 					 &p_ptr->publications, pport_list) {
@@ -1199,7 +1197,6 @@ int tipc_withdraw(u32 ref, unsigned int
 	}
 	if (list_empty(&p_ptr->publications))
 		p_ptr->publ.published = 0;
-exit:
 	tipc_port_unlock(p_ptr);
 	return res;
 }
-- 
1.4.0
[PATCH 6/32] [TIPC] Links now validate destination node specified by incoming messages.
From: Allan Stephens [EMAIL PROTECTED]

This fix prevents link flopping and name table inconsistency problems
arising when a node is assigned a different Z.C.N value than it used
previously. (Changing the Z.C.N value causes other nodes to have two
link endpoints sending to the same MAC address using two different
destination Z.C.N values, requiring the receiving node to filter out
the unwanted messages.)

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/link.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 784b24b..955b87d 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1720,6 +1720,11 @@ #endif
 				link_recv_non_seq(buf);
 			continue;
 		}
+
+		if (unlikely(!msg_short(msg) &&
+			     (msg_destnode(msg) != tipc_own_addr)))
+			goto cont;
+
 		n_ptr = tipc_node_find(msg_prevnode(msg));
 		if (unlikely(!n_ptr))
 			goto cont;
-- 
1.4.0
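The filter added by this patch reduces to a two-part predicate. A minimal sketch follows, assuming that a short-header message carries no destination node field (which would explain why msg_short() is tested first); struct msg_view and wrong_destination() are hypothetical names, not TIPC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two fields the check looks at. */
struct msg_view {
	int is_short;		/* assumed: short header has no destination node */
	uint32_t destnode;	/* destination Z.C.N address, valid if !is_short */
};

/* Returns nonzero if the receiving link should discard the message. */
static int wrong_destination(const struct msg_view *m, uint32_t own_addr)
{
	assert(m != NULL);
	return !m->is_short && m->destnode != own_addr;
}
```

A node that has been renumbered keeps receiving frames addressed to its old Z.C.N value on the same MAC address, and this predicate is what lets it drop them before they reach the per-node link lookup.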
[PATCH 7/32] [TIPC] Multicast link failure now resets all links to nacking node.
From: Allan Stephens [EMAIL PROTECTED] This fix prevents node from crashing. Signed-off-by: Allan Stephens [EMAIL PROTECTED] Signed-off-by: Per Liden [EMAIL PROTECTED] --- net/tipc/bcast.c | 32 +++--- net/tipc/link.c | 124 +- 2 files changed, 128 insertions(+), 28 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 2c4ecbe..00691b7 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -49,13 +49,19 @@ #include bearer.h #include name_table.h #include bcast.h - #define MAX_PKT_DEFAULT_MCAST 1500 /* bcast link max packet size (fixed) */ #define BCLINK_WIN_DEFAULT 20 /* bcast link window size (default) */ #define BCLINK_LOG_BUF_SIZE 0 +/* + * Loss rate for incoming broadcast frames; used to test retransmission code. + * Set to N to cause every N'th frame to be discarded; 0 = don't discard any. + */ + +#define TIPC_BCAST_LOSS_RATE 0 + /** * struct bcbearer_pair - a pair of bearers used by broadcast link * @primary: pointer to primary bearer @@ -165,21 +171,18 @@ static int bclink_ack_allowed(u32 n) * @after: sequence number of last packet to *not* retransmit * @to: sequence number of last packet to retransmit * - * Called with 'node' locked, bc_lock unlocked + * Called with bc_lock locked */ static void bclink_retransmit_pkt(u32 after, u32 to) { struct sk_buff *buf; - spin_lock_bh(bc_lock); buf = bcl-first_out; while (buf less_eq(buf_seqno(buf), after)) { buf = buf-next; } - if (buf != NULL) - tipc_link_retransmit(bcl, buf, mod(to - after)); - spin_unlock_bh(bc_lock); + tipc_link_retransmit(bcl, buf, mod(to - after)); } /** @@ -399,7 +402,10 @@ int tipc_bclink_send_msg(struct sk_buff */ void tipc_bclink_recv_pkt(struct sk_buff *buf) -{ +{ +#if (TIPC_BCAST_LOSS_RATE) + static int rx_count = 0; +#endif struct tipc_msg *msg = buf_msg(buf); struct node* node = tipc_node_find(msg_prevnode(msg)); u32 next_in; @@ -420,9 +426,13 @@ void tipc_bclink_recv_pkt(struct sk_buff tipc_node_lock(node); tipc_bclink_acknowledge(node, msg_bcast_ack(msg)); 
tipc_node_unlock(node); + spin_lock_bh(bc_lock); bcl-stats.recv_nacks++; + bcl-owner-next = node; /* remember requestor */ bclink_retransmit_pkt(msg_bcgap_after(msg), msg_bcgap_to(msg)); + bcl-owner-next = NULL; + spin_unlock_bh(bc_lock); } else { tipc_bclink_peek_nack(msg_destnode(msg), msg_bcast_tag(msg), @@ -433,6 +443,14 @@ void tipc_bclink_recv_pkt(struct sk_buff return; } +#if (TIPC_BCAST_LOSS_RATE) + if (++rx_count == TIPC_BCAST_LOSS_RATE) { + rx_count = 0; + buf_discard(buf); + return; + } +#endif + tipc_node_lock(node); receive: deferred = node-bclink.deferred_head; diff --git a/net/tipc/link.c b/net/tipc/link.c index 955b87d..ba7d3f1 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -1604,40 +1604,121 @@ void tipc_link_push_queue(struct link *l tipc_bearer_schedule(l_ptr-b_ptr, l_ptr); } +static void link_reset_all(unsigned long addr) +{ + struct node *n_ptr; + char addr_string[16]; + u32 i; + + read_lock_bh(tipc_net_lock); + n_ptr = tipc_node_find((u32)addr); + if (!n_ptr) { + read_unlock_bh(tipc_net_lock); + return; /* node no longer exists */ + } + + tipc_node_lock(n_ptr); + + warn(Resetting all links to %s\n, +addr_string_fill(addr_string, n_ptr-addr)); + + for (i = 0; i MAX_BEARERS; i++) { + if (n_ptr-links[i]) { + link_print(n_ptr-links[i], TIPC_OUTPUT, + Resetting link\n); + tipc_link_reset(n_ptr-links[i]); + } + } + + tipc_node_unlock(n_ptr); + read_unlock_bh(tipc_net_lock); +} + +static void link_retransmit_failure(struct link *l_ptr, struct sk_buff *buf) +{ + struct tipc_msg *msg = buf_msg(buf); + + warn(Retransmission failure on link %s\n, l_ptr-name); + tipc_msg_print(TIPC_OUTPUT, msg, RETR-FAIL); + + if (l_ptr-addr) { + + /* Handle failure on standard link */ + + link_print(l_ptr, TIPC_OUTPUT, Resetting link\n); + tipc_link_reset(l_ptr); + + } else { + + /* Handle failure on broadcast link */ + + struct node *n_ptr; + char addr_string[16]; + +
[PATCH 20/32] [TIPC] Improved performance of error checking during socket creation.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 8cefacb..a1f2210 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -169,12 +169,6 @@ static int tipc_create(struct socket *so
 	struct sock *sk;
 	u32 ref;

-	if ((sock->type != SOCK_STREAM) &&
-	    (sock->type != SOCK_SEQPACKET) &&
-	    (sock->type != SOCK_DGRAM) &&
-	    (sock->type != SOCK_RDM))
-		return -EPROTOTYPE;
-
 	if (unlikely(protocol != 0))
 		return -EPROTONOSUPPORT;

@@ -199,6 +193,9 @@ static int tipc_create(struct socket *so
 		sock->ops = &msg_ops;
 		sock->state = SS_READY;
 		break;
+	default:
+		tipc_deleteport(ref);
+		return -EPROTOTYPE;
 	}

 	sk = sk_alloc(AF_TIPC, GFP_KERNEL, &tipc_proto, 1);
-- 
1.4.0
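The shape of this change is: drop the up-front chain of comparisons and let the `default:` arm of the existing switch reject unknown types, undoing whatever was set up before the switch. A minimal user-space sketch of that pattern is below. The socket type values, the port counter, and every function name are hypothetical stand-ins; only the error codes come from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical socket types standing in for SOCK_STREAM and friends. */
enum sock_type { T_STREAM = 1, T_DGRAM, T_RDM, T_SEQPACKET };

static int ports_in_use;	/* stand-in for the port table */

static int create_port(void)
{
	return ++ports_in_use;
}

static void delete_port(int ref)
{
	(void)ref;
	--ports_in_use;
}

static int create_socket(int type, int protocol)
{
	int ref;

	if (protocol != 0)
		return -EPROTONOSUPPORT;

	ref = create_port();	/* resource acquired before the type is known good */

	switch (type) {
	case T_STREAM:
	case T_SEQPACKET:
		break;		/* connection-oriented setup would go here */
	case T_DGRAM:
	case T_RDM:
		break;		/* connectionless setup would go here */
	default:
		delete_port(ref);	/* undo the allocation made above */
		return -EPROTOTYPE;
	}
	return 0;
}
```

The valid case now pays for one switch dispatch instead of up to four comparisons plus the switch; the price is that the error path has to release the port it already created.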
[PATCH 30/32] [TIPC] Fixed link switchover bugs
From: Allan Stephens [EMAIL PROTECTED] Incorporates several related fixes: - switchover now occurs when switching from an active link to a standby link - failure of a standby link no longer initiates switchover - links now display correct # of received packtes following reactivation Signed-off-by: Allan Stephens [EMAIL PROTECTED] Signed-off-by: Per Liden [EMAIL PROTECTED] --- net/tipc/link.c | 30 -- net/tipc/node.c |7 +-- net/tipc/node.h |2 ++ 3 files changed, 31 insertions(+), 8 deletions(-) diff --git a/net/tipc/link.c b/net/tipc/link.c index d7668b8..d646580 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -690,6 +690,7 @@ void tipc_link_reset(struct link *l_ptr) struct sk_buff *buf; u32 prev_state = l_ptr-state; u32 checkpoint = l_ptr-next_in_no; + int was_active_link = tipc_link_is_active(l_ptr); msg_set_session(l_ptr-pmsg, msg_session(l_ptr-pmsg) + 1); @@ -711,7 +712,7 @@ #if 0 tipc_printf(TIPC_CONS, \nReset link %s\n, l_ptr-name); dbg_link_dump(); #endif - if (tipc_node_has_active_links(l_ptr-owner) + if (was_active_link tipc_node_has_active_links(l_ptr-owner) l_ptr-owner-permit_changeover) { l_ptr-reset_checkpoint = checkpoint; l_ptr-exp_msg_count = START_CHANGEOVER; @@ -754,7 +755,7 @@ #endif static void link_activate(struct link *l_ptr) { - l_ptr-next_in_no = 1; + l_ptr-next_in_no = l_ptr-stats.recv_info = 1; tipc_node_link_up(l_ptr-owner, l_ptr); tipc_bearer_add_dest(l_ptr-b_ptr, l_ptr-addr); link_send_event(tipc_cfg_link_event, l_ptr, 1); @@ -2303,12 +2304,18 @@ void tipc_link_tunnel(struct link *l_ptr u32 length = msg_size(msg); tunnel = l_ptr-owner-active_links[selector 1]; - if (!tipc_link_is_up(tunnel)) + if (!tipc_link_is_up(tunnel)) { + warn(Link changeover error, +tunnel link no longer available\n); return; + } msg_set_size(tunnel_hdr, length + INT_H_SIZE); buf = buf_acquire(length + INT_H_SIZE); - if (!buf) + if (!buf) { + warn(Link changeover error, +unable to send tunnel msg\n); return; + } memcpy(buf-data, (unchar *)tunnel_hdr, 
INT_H_SIZE); memcpy(buf-data + INT_H_SIZE, (unchar *)msg, length); dbg(%c-%c:, l_ptr-b_ptr-net_plane, tunnel-b_ptr-net_plane); @@ -2328,19 +2335,23 @@ void tipc_link_changeover(struct link *l u32 msgcount = l_ptr-out_queue_size; struct sk_buff *crs = l_ptr-first_out; struct link *tunnel = l_ptr-owner-active_links[0]; - int split_bundles = tipc_node_has_redundant_links(l_ptr-owner); struct tipc_msg tunnel_hdr; + int split_bundles; if (!tunnel) return; - if (!l_ptr-owner-permit_changeover) + if (!l_ptr-owner-permit_changeover) { + warn(Link changeover error, +peer did not permit changeover\n); return; + } msg_init(tunnel_hdr, CHANGEOVER_PROTOCOL, ORIGINAL_MSG, TIPC_OK, INT_H_SIZE, l_ptr-addr); msg_set_bearer_id(tunnel_hdr, l_ptr-peer_bearer_id); msg_set_msgcnt(tunnel_hdr, msgcount); + dbg(Link changeover requires %u tunnel messages\n, msgcount); if (!l_ptr-first_out) { struct sk_buff *buf; @@ -2360,6 +2371,9 @@ void tipc_link_changeover(struct link *l return; } + split_bundles = (l_ptr-owner-active_links[0] != +l_ptr-owner-active_links[1]); + while (crs) { struct tipc_msg *msg = buf_msg(crs); @@ -2497,11 +2511,13 @@ static int link_recv_changeover_msg(stru dest_link-name); tipc_link_reset(dest_link); dest_link-exp_msg_count = msg_count; + dbg(Expecting %u tunnelled messages\n, msg_count); if (!msg_count) goto exit; } else if (dest_link-exp_msg_count == START_CHANGEOVER) { msg_dbg(tunnel_msg, BLK/FIRST/REC); dest_link-exp_msg_count = msg_count; + dbg(Expecting %u tunnelled messages\n, msg_count); if (!msg_count) goto exit; } @@ -2509,6 +2525,8 @@ static int link_recv_changeover_msg(stru /* Receive original message */ if (dest_link-exp_msg_count == 0) { + warn(Link switchover error, +got too many tunnelled messages\n); msg_dbg(tunnel_msg, OVERDUE/DROP/REC); dbg_print_link(dest_link, LINK:); goto exit; diff --git a/net/tipc/node.c b/net/tipc/node.c index 5f09754..ce9678e 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -125,6 +125,8 @@ void
[PATCH 8/32] [TIPC] Allow compilation when CONFIG_TIPC_DEBUG is not set.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/core.h |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/tipc/core.h b/net/tipc/core.h
index 1f2e8b2..d1edb7a 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -2,7 +2,7 @@
  * net/tipc/core.h: Include file for TIPC global declarations
  *
  * Copyright (c) 2005-2006, Ericsson AB
- * Copyright (c) 2005, Wind River Systems
+ * Copyright (c) 2005-2006, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -111,10 +111,6 @@ #endif

 #else

-#ifndef DBG_OUTPUT
-#define DBG_OUTPUT NULL
-#endif
-
 /*
  * TIPC debug support not included:
  * - system messages are printed to system console
@@ -129,6 +125,19 @@
 #define dbg(fmt, arg...) do {} while (0)
 #define msg_dbg(msg,txt) do {} while (0)
 #define dump(fmt,arg...) do {} while (0)
+
+
+/*
+ * TIPC_OUTPUT is defined to be the system console, while DBG_OUTPUT is
+ * the null print buffer. Thes ensures that any system or debug messages
+ * that are generated without using the above macros are handled correctly.
+ */
+
+#undef TIPC_OUTPUT
+#define TIPC_OUTPUT TIPC_CONS
+
+#undef DBG_OUTPUT
+#define DBG_OUTPUT NULL
+
 #endif

-- 
1.4.0
[PATCH 22/32] [TIPC] Simplify code for returning partial success of stream send request.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index abecf2d..6d4d2b0 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -969,7 +969,7 @@ static int recv_stream(struct kiocb *ioc
 restart:
 	if (unlikely((skb_queue_len(&sock->sk->sk_receive_queue) == 0) &&
 		     (flags & MSG_DONTWAIT))) {
-		res = (sz_copied == 0) ? -EWOULDBLOCK : 0;
+		res = -EWOULDBLOCK;
 		goto exit;
 	}

@@ -1060,7 +1060,7 @@ restart:

 exit:
 	up(&tsock->sem);
-	return res ? res : sz_copied;
+	return sz_copied ? sz_copied : res;
 }

 /**
-- 
1.4.0
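The two hunks work as a pair: the decision about partial success moves from the place where the error is set to the single return statement. A minimal sketch comparing both versions for the empty-queue case; old_result() and new_result() are hypothetical names that fold each version's two statements into one function.

```c
#include <assert.h>
#include <errno.h>

/* Before: the error site had to know whether anything was copied. */
static int old_result(int sz_copied)
{
	int res = (sz_copied == 0) ? -EWOULDBLOCK : 0;

	return res ? res : sz_copied;
}

/* After: the error is set unconditionally; copied bytes win at return. */
static int new_result(int sz_copied)
{
	int res = -EWOULDBLOCK;

	return sz_copied ? sz_copied : res;
}
```

Both give the same answer here, but with the new return statement every other exit path that sets `res` after some data was copied also reports the byte count, without needing its own `sz_copied` test.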
[PATCH 3/32] [TIPC] Use correct upper bound when validating network zone number.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/tipc/core.c b/net/tipc/core.c
index 3d0a8ee..31c7dd5 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -198,7 +198,7 @@ static int __init tipc_init(void)
 	tipc_max_publications = 1;
 	tipc_max_subscriptions = 2000;
 	tipc_max_ports = delimit(CONFIG_TIPC_PORTS, 127, 65536);
-	tipc_max_zones = delimit(CONFIG_TIPC_ZONES, 1, 511);
+	tipc_max_zones = delimit(CONFIG_TIPC_ZONES, 1, 255);
 	tipc_max_clusters = delimit(CONFIG_TIPC_CLUSTERS, 1, 1);
 	tipc_max_nodes = delimit(CONFIG_TIPC_NODES, 8, 2047);
 	tipc_max_slaves = delimit(CONFIG_TIPC_SLAVE_NODES, 0, 2047);
-- 
1.4.0
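Why 255 rather than 511 can be shown with a small sketch. Two things are assumed here, since neither is part of the patch: that delimit() clamps its first argument into [min, max], and that a TIPC address is a 32-bit Z.C.N value with the zone in the top 8 bits. clamp_int(), make_addr() and addr_zone() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of delimit(): clamp val into [min, max]. */
static int clamp_int(int val, int min, int max)
{
	assert(min <= max);
	if (val > max)
		return max;
	if (val < min)
		return min;
	return val;
}

/* Assumed layout: zone in bits 31..24, cluster 23..12, node 11..0. */
static uint32_t make_addr(uint32_t z, uint32_t c, uint32_t n)
{
	return (z << 24) | (c << 12) | n;
}

static uint32_t addr_zone(uint32_t addr)
{
	return addr >> 24;
}
```

Under that layout a zone number above 255 does not survive a round trip through the address, so an upper bound of 511 would admit configurations the address format cannot represent.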
[PATCH 16/32] [TIPC] Implied connect now saves dest name for retrieval as ancillary data.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 758b2d2..98550b9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -470,6 +470,10 @@ static int send_msg(struct kiocb *iocb,
 			if ((tsock->p->published) ||
 			    ((sock->type == SOCK_STREAM) && (total_len != 0)))
 				return -EOPNOTSUPP;
+			if (dest->addrtype == TIPC_ADDR_NAME) {
+				tsock->p->conn_type = dest->addr.name.name.type;
+				tsock->p->conn_instance = dest->addr.name.name.instance;
+			}
 		}

 		if (down_interruptible(&tsock->sem))
@@ -1269,10 +1273,6 @@ static int connect(struct socket *sock,
 	msg = buf_msg(buf);
 	res = auto_connect(sock, tsock, msg);
 	if (!res) {
-		if (dst->addrtype == TIPC_ADDR_NAME) {
-			tsock->p->conn_type = dst->addr.name.name.type;
-			tsock->p->conn_instance = dst->addr.name.name.instance;
-		}
 		if (!msg_data_sz(msg))
 			advance_queue(tsock);
 	}
-- 
1.4.0
[PATCH 14/32] [TIPC] Non-operation-affecting corrections to comments & function definitions.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index eaf4d69..0923213 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -437,7 +437,7 @@ static int dest_name_check(struct sockad
  * @iocb: (unused)
  * @sock: socket structure
  * @m: message to send
- * @total_len: (unused)
+ * @total_len: length of message
  *
  * Message must have an destination specified explicitly.
  * Used for SOCK_RDM and SOCK_DGRAM messages,
@@ -538,7 +538,7 @@ exit:
  * @iocb: (unused)
  * @sock: socket structure
  * @m: message to send
- * @total_len: (unused)
+ * @total_len: length of message
  *
  * Used for SOCK_SEQPACKET messages and SOCK_STREAM data.
  *
@@ -1386,7 +1386,7 @@ exit:
 /**
  * shutdown - shutdown socket connection
  * @sock: socket structure
- * @how: direction to close (always treated as read + write)
+ * @how: direction to close (unused; always treated as read + write)
  *
  * Terminates connection (if necessary), then purges socket's receive queue.
  *
@@ -1469,7 +1469,8 @@ restart:
  * Returns 0 on success, errno otherwise
  */

-static int setsockopt(struct socket *sock, int lvl, int opt, char *ov, int ol)
+static int setsockopt(struct socket *sock,
+		      int lvl, int opt, char __user *ov, int ol)
 {
 	struct tipc_sock *tsock = tipc_sk(sock->sk);
 	u32 value;
@@ -1525,7 +1526,8 @@ static int setsockopt(struct socket *soc
  * Returns 0 on success, errno otherwise
  */

-static int getsockopt(struct socket *sock, int lvl, int opt, char *ov, int *ol)
+static int getsockopt(struct socket *sock,
+		      int lvl, int opt, char __user *ov, int *ol)
 {
 	struct tipc_sock *tsock = tipc_sk(sock->sk);
 	int len;
-- 
1.4.0
[PATCH 23/32] [TIPC] Optimized argument validation done by connect().
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6d4d2b0..32d7784 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -455,7 +455,8 @@ static int send_msg(struct kiocb *iocb,

 	if (unlikely(!dest))
 		return -EDESTADDRREQ;
-	if (unlikely(dest->family != AF_TIPC))
+	if (unlikely((m->msg_namelen < sizeof(*dest)) ||
+		     (dest->family != AF_TIPC)))
 		return -EINVAL;

 	needs_conn = (sock->state != SS_READY);
@@ -1245,7 +1246,8 @@ static int connect(struct socket *sock,
 	if (sock->state == SS_READY)
 		return -EOPNOTSUPP;

-	/* MOVE THE REST OF THIS ERROR CHECKING TO send_msg()? */
+	/* Issue Posix-compliant error code if socket is in the wrong state */
+
 	if (sock->state == SS_LISTENING)
 		return -EOPNOTSUPP;
 	if (sock->state == SS_CONNECTING)
@@ -1253,13 +1255,20 @@ static int connect(struct socket *sock,
 	if (sock->state != SS_UNCONNECTED)
 		return -EISCONN;

-	if ((destlen < sizeof(*dst)) || (dst->family != AF_TIPC) ||
-	    ((dst->addrtype != TIPC_ADDR_NAME) && (dst->addrtype != TIPC_ADDR_ID)))
+	/*
+	 * Reject connection attempt using multicast address
+	 *
+	 * Note: send_msg() validates the rest of the address fields,
+	 *       so there's no need to do it here
+	 */
+
+	if (dst->addrtype == TIPC_ADDR_MCAST)
 		return -EINVAL;

 	/* Send a 'SYN-' to destination */

 	m.msg_name = dest;
+	m.msg_namelen = destlen;
 	if ((res = send_msg(NULL, sock, &m, 0)) < 0) {
 		sock->state = SS_DISCONNECTING;
 		return res;
-- 
1.4.0
[PATCH 18/32] [TIPC] Connected send now checks socket state when retrying congested send.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 361dc34..9c834fc 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -565,15 +565,15 @@ static int send_packet(struct kiocb *ioc
 		return -ERESTARTSYS;
 	}

-	if (unlikely(sock->state != SS_CONNECTED)) {
-		if (sock->state == SS_DISCONNECTING)
-			res = -EPIPE;
-		else
-			res = -ENOTCONN;
-		goto exit;
-	}
-
 	do {
+		if (unlikely(sock->state != SS_CONNECTED)) {
+			if (sock->state == SS_DISCONNECTING)
+				res = -EPIPE;
+			else
+				res = -ENOTCONN;
+			goto exit;
+		}
+
 		res = tipc_send(tsock->p->ref, m->msg_iovlen, m->msg_iov);
 		if (likely(res != -ELINKCONG)) {
 exit:
-- 
1.4.0
[PATCH 4/32] [TIPC] Corrected potential misuse of tipc_media_addr structure.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 include/net/tipc/tipc_bearer.h |   12 ++++++++++--
 net/tipc/eth_media.c           |    4 +++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/net/tipc/tipc_bearer.h b/include/net/tipc/tipc_bearer.h
index 098607c..e07136d 100644
--- a/include/net/tipc/tipc_bearer.h
+++ b/include/net/tipc/tipc_bearer.h
@@ -49,10 +49,18 @@ #include <linux/spinlock.h>

 #define TIPC_MEDIA_TYPE_ETH	1

+/*
+ * Destination address structure used by TIPC bearers when sending messages
+ *
+ * IMPORTANT: The fields of this structure MUST be stored using the specified
+ * byte order indicated below, as the structure is exchanged between nodes
+ * as part of a link setup process.
+ */
+
 struct tipc_media_addr {
-	__u32  type;
+	__u32  type;			/* bearer type (network byte order) */
 	union {
-		__u8   eth_addr[6];	/* Ethernet bearer */
+		__u8   eth_addr[6];	/* 48 bit Ethernet addr (byte array) */
 #if 0
 		/* Prototypes for other possible bearer types */

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index b646619..3ecb100 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -254,7 +254,9 @@ int tipc_eth_media_start(void)
 	if (eth_started)
 		return -EINVAL;

-	memset(&bcast_addr, 0xff, sizeof(bcast_addr));
+	bcast_addr.type = htonl(TIPC_MEDIA_TYPE_ETH);
+	memset(&bcast_addr.dev_addr, 0xff, ETH_ALEN);
+
 	memset(eth_bearers, 0, sizeof(eth_bearers));

 	res = tipc_register_media(TIPC_MEDIA_TYPE_ETH, "eth",
-- 
1.4.0
[PATCH 17/32] [TIPC] Can now return destination name of form {0,x,y} via ancillary data.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 98550b9..361dc34 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -731,6 +731,7 @@ static int anc_data_recv(struct msghdr *
 	u32 anc_data[3];
 	u32 err;
 	u32 dest_type;
+	int has_name;
 	int res;
 
 	if (likely(m->msg_controllen == 0))
@@ -755,24 +756,27 @@ static int anc_data_recv(struct msghdr *
 	dest_type = msg ? msg_type(msg) : TIPC_DIRECT_MSG;
 	switch (dest_type) {
 	case TIPC_NAMED_MSG:
+		has_name = 1;
 		anc_data[0] = msg_nametype(msg);
 		anc_data[1] = msg_namelower(msg);
 		anc_data[2] = msg_namelower(msg);
 		break;
 	case TIPC_MCAST_MSG:
+		has_name = 1;
 		anc_data[0] = msg_nametype(msg);
 		anc_data[1] = msg_namelower(msg);
 		anc_data[2] = msg_nameupper(msg);
 		break;
 	case TIPC_CONN_MSG:
+		has_name = (tport->conn_type != 0);
 		anc_data[0] = tport->conn_type;
 		anc_data[1] = tport->conn_instance;
 		anc_data[2] = tport->conn_instance;
 		break;
 	default:
-		anc_data[0] = 0;
+		has_name = 0;
 	}
-	if (anc_data[0] &&
+	if (has_name &&
 	    (res = put_cmsg(m, SOL_SOCKET, TIPC_DESTNAME, 12, anc_data)))
 		return res;
 
-- 
1.4.0
[PATCH 29/32] [TIPC] Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.
From: Allan Stephens [EMAIL PROTECTED] Signed-off-by: Allan Stephens [EMAIL PROTECTED] Signed-off-by: Per Liden [EMAIL PROTECTED] --- net/tipc/bcast.c |2 + net/tipc/bcast.h |2 + net/tipc/bearer.c | 70 +++-- net/tipc/cluster.c| 22 +-- net/tipc/config.c |2 + net/tipc/discover.c |7 + net/tipc/link.c | 39 +++ net/tipc/name_distr.c | 10 --- net/tipc/name_table.c |6 ++-- net/tipc/node.c | 68 +--- net/tipc/port.c | 10 --- net/tipc/subscr.c | 18 ++--- net/tipc/zone.c | 19 - 13 files changed, 149 insertions(+), 126 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 44645f5..1633ef2 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -785,7 +785,7 @@ int tipc_bclink_init(void) bclink = kmalloc(sizeof(*bclink), GFP_ATOMIC); if (!bcbearer || !bclink) { nomem: - warn(Memory squeeze; Failed to create multicast link\n); + warn(Multicast link creation failed, no memory\n); kfree(bcbearer); bcbearer = NULL; kfree(bclink); diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h index 0e3be2a..b243d9d 100644 --- a/net/tipc/bcast.h +++ b/net/tipc/bcast.h @@ -180,7 +180,7 @@ static inline void tipc_port_list_add(st if (!item-next) { item-next = kmalloc(sizeof(*item), GFP_ATOMIC); if (!item-next) { - warn(Memory squeeze: multicast destination port list is incomplete\n); + warn(Incomplete multicast delivery, no memory\n); return; } item-next-next = NULL; diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index e213a8e..4fa24b5 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -112,39 +112,42 @@ int tipc_register_media(u32 media_type, goto exit; if (!media_name_valid(name)) { - warn(Media registration error: illegal name %s\n, name); + warn(Media %s rejected, illegal name\n, name); goto exit; } if (!bcast_addr) { - warn(Media registration error: no broadcast address supplied\n); + warn(Media %s rejected, no broadcast address\n, name); goto exit; } if ((bearer_priority TIPC_MIN_LINK_PRI) (bearer_priority TIPC_MAX_LINK_PRI)) { - warn(Media registration error: priority 
%u\n, bearer_priority); + warn(Media %s rejected, illegal priority (%u)\n, name, +bearer_priority); goto exit; } if ((link_tolerance TIPC_MIN_LINK_TOL) || (link_tolerance TIPC_MAX_LINK_TOL)) { - warn(Media registration error: tolerance %u\n, link_tolerance); + warn(Media %s rejected, illegal tolerance (%u)\n, name, +link_tolerance); goto exit; } media_id = media_count++; if (media_id = MAX_MEDIA) { - warn(Attempt to register more than %u media\n, MAX_MEDIA); + warn(Media %s rejected, media limit reached (%u)\n, name, +MAX_MEDIA); media_count--; goto exit; } for (i = 0; i media_id; i++) { if (media_list[i].type_id == media_type) { - warn(Attempt to register second media with type %u\n, + warn(Media %s rejected, duplicate type (%u)\n, name, media_type); media_count--; goto exit; } if (!strcmp(name, media_list[i].name)) { - warn(Attempt to re-register media name %s\n, name); + warn(Media %s rejected, duplicate name\n, name); media_count--; goto exit; } @@ -283,6 +286,9 @@ static struct bearer *bearer_find(const struct bearer *b_ptr; u32 i; + if (tipc_mode != TIPC_NET_MODE) + return NULL; + for (i = 0, b_ptr = tipc_bearers; i MAX_BEARERS; i++, b_ptr++) { if (b_ptr-active (!strcmp(b_ptr-publ.name, name))) return b_ptr; @@ -475,26 +481,33 @@ int tipc_enable_bearer(const char *name, u32 i; int res = -EINVAL; - if (tipc_mode != TIPC_NET_MODE) + if (tipc_mode != TIPC_NET_MODE) { + warn(Bearer %s rejected, not supported in standalone mode\n, +name); return -ENOPROTOOPT; - - if (!bearer_name_validate(name, b_name) || -
[PATCH 12/32] [TIPC] Added support for MODULE_VERSION capability.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/core.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/tipc/core.c b/net/tipc/core.c
index 31c7dd5..5003acb 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -2,7 +2,7 @@
  * net/tipc/core.c: TIPC module code
  *
  * Copyright (c) 2003-2006, Ericsson AB
- * Copyright (c) 2005, Wind River Systems
+ * Copyright (c) 2005-2006, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -57,7 +57,7 @@ void tipc_socket_stop(void);
 int  tipc_netlink_start(void);
 void tipc_netlink_stop(void);
 
-#define MOD_NAME "tipc_start: "
+#define TIPC_MOD_VER "1.6.1"
 
 #ifndef CONFIG_TIPC_ZONES
 #define CONFIG_TIPC_ZONES 3
@@ -224,6 +224,7 @@ module_exit(tipc_exit);
 
 MODULE_DESCRIPTION("TIPC: Transparent Inter Process Communication");
 MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(TIPC_MOD_VER);
 
 /* Native TIPC API for kernel-space applications (see tipc.h) */
 
-- 
1.4.0
[PATCH 25/32] [TIPC] Added missing warning for out-of-memory condition
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/port.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index 899e08e..99846a1 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -1061,6 +1061,7 @@ int tipc_createport(u32 user_ref,
 
 	up_ptr = (struct user_port *)kmalloc(sizeof(*up_ptr), GFP_ATOMIC);
 	if (up_ptr == NULL) {
+		warn("Port creation failed, no memory\n");
 		return -ENOMEM;
 	}
 	ref = tipc_createport_raw(NULL, port_dispatcher, port_wakeup, importance);
-- 
1.4.0
[PATCH 21/32] [TIPC] recvmsg() now returns TIPC ancillary data using correct level (SOL_TIPC)
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index a1f2210..abecf2d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -744,10 +744,10 @@ static int anc_data_recv(struct msghdr *
 	if (unlikely(err)) {
 		anc_data[0] = err;
 		anc_data[1] = msg_data_sz(msg);
-		if ((res = put_cmsg(m, SOL_SOCKET, TIPC_ERRINFO, 8, anc_data)))
+		if ((res = put_cmsg(m, SOL_TIPC, TIPC_ERRINFO, 8, anc_data)))
 			return res;
 		if (anc_data[1] &&
-		    (res = put_cmsg(m, SOL_SOCKET, TIPC_RETDATA, anc_data[1],
+		    (res = put_cmsg(m, SOL_TIPC, TIPC_RETDATA, anc_data[1],
 				    msg_data(msg))))
 			return res;
 	}
@@ -778,7 +778,7 @@ static int anc_data_recv(struct msghdr *
 		has_name = 0;
 	}
 	if (has_name &&
-	    (res = put_cmsg(m, SOL_SOCKET, TIPC_DESTNAME, 12, anc_data)))
+	    (res = put_cmsg(m, SOL_TIPC, TIPC_DESTNAME, 12, anc_data)))
 		return res;
 
 	return 0;
-- 
1.4.0
[PATCH 13/32] [TIPC] Validate entire interface name when locating bearer to enable.
From: Allan Stephens [EMAIL PROTECTED]

This fix prevents a bearer from being enabled using the wrong interface.
For example, specifying "eth:eth14" might enable "eth:eth1" by mistake.

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/eth_media.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 3ecb100..682da4a 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -2,7 +2,7 @@
  * net/tipc/eth_media.c: Ethernet bearer support for TIPC
  *
  * Copyright (c) 2001-2006, Ericsson AB
- * Copyright (c) 2005, Wind River Systems
+ * Copyright (c) 2005-2006, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -127,8 +127,7 @@ static int enable_bearer(struct tipc_bea
 
 	/* Find device with specified name */
 
-	while (dev && dev->name &&
-	       (memcmp(dev->name, driver_name, strlen(dev->name)))) {
+	while (dev && dev->name && strncmp(dev->name, driver_name, IFNAMSIZ)) {
 		dev = dev->next;
 	}
 	if (!dev)
-- 
1.4.0
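The "eth14 matches eth1" problem described above can be reproduced outside the kernel. The following is a minimal sketch with hypothetical helper names (demo_match_prefix, demo_match_full); it compares the old memcmp()-over-strlen(dev name) test with a bounded whole-name strncmp() test.

```c
#include <assert.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16   /* assumed to equal the kernel's IFNAMSIZ */

/* Old-style test: compares only strlen(dev_name) bytes, so device "eth1"
 * is treated as matching a request for "eth14".
 * Note: 'wanted' must be at least strlen(dev_name) bytes long here. */
static int demo_match_prefix(const char *dev_name, const char *wanted)
{
	assert(dev_name && wanted);
	return memcmp(dev_name, wanted, strlen(dev_name)) == 0;
}

/* New-style test: compares the whole names, bounded by the buffer size. */
static int demo_match_full(const char *dev_name, const char *wanted)
{
	assert(dev_name && wanted);
	return strncmp(dev_name, wanted, DEMO_IFNAMSIZ) == 0;
}
```

With a device list containing eth1 before eth14, the prefix test stops at eth1, which is the misbehaviour the patch removes.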
[PATCH 1/32] [TIPC] Improved tolerance to promiscuous mode interface
From: Jon Maloy [EMAIL PROTECTED]

Signed-off-by: Jon Maloy [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/eth_media.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 7a25278..b646619 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -98,17 +98,19 @@ static int recv_msg(struct sk_buff *buf,
 	u32 size;
 
 	if (likely(eb_ptr->bearer)) {
-		size = msg_size((struct tipc_msg *)buf->data);
-		skb_trim(buf, size);
-		if (likely(buf->len == size)) {
-			buf->next = NULL;
-			tipc_recv_msg(buf, eb_ptr->bearer);
-		} else {
-			kfree_skb(buf);
+		if (likely(!dev->promiscuity) ||
+		    !memcmp(buf->mac.raw,dev->dev_addr,ETH_ALEN) ||
+		    !memcmp(buf->mac.raw,dev->broadcast,ETH_ALEN)) {
+			size = msg_size((struct tipc_msg *)buf->data);
+			skb_trim(buf, size);
+			if (likely(buf->len == size)) {
+				buf->next = NULL;
+				tipc_recv_msg(buf, eb_ptr->bearer);
+				return TIPC_OK;
+			}
 		}
-	} else {
-		kfree_skb(buf);
 	}
+	kfree_skb(buf);
 	return TIPC_OK;
 }
 
-- 
1.4.0
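The acceptance test added by this patch can be sketched as a stand-alone predicate. This is only an illustration with assumed names (demo_accept_frame and its parameters); it takes the destination MAC as an explicit argument rather than reading it from an sk_buff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Returns 1 if a received frame should be processed, 0 if it should be
 * dropped. dest is the frame's destination MAC; own and bcast are the
 * interface's unicast and broadcast addresses. */
static int demo_accept_frame(int promiscuous, const uint8_t *dest,
			     const uint8_t *own, const uint8_t *bcast)
{
	assert(dest && own && bcast);
	if (!promiscuous)
		return 1;   /* the NIC has already filtered by destination */
	return memcmp(dest, own, DEMO_ETH_ALEN) == 0 ||
	       memcmp(dest, bcast, DEMO_ETH_ALEN) == 0;
}
```

In promiscuous mode the interface hands up frames addressed to other hosts, so the software check keeps only frames sent to this interface or to the broadcast address.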
[PATCH 2/32] [TIPC] Prevent name table corruption if no room for new publication
From: Allan Stephens [EMAIL PROTECTED]

Now exits cleanly if attempt to allocate larger array of subsequences fails,
without losing track of pointer to existing array.

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/name_table.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index d129422..0511436 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -284,18 +284,18 @@ static struct publication *tipc_nameseq_
 	/* Ensure there is space for new sub-sequence */
 
 	if (nseq->first_free == nseq->alloc) {
-		struct sub_seq *sseqs = nseq->sseqs;
-		nseq->sseqs = tipc_subseq_alloc(nseq->alloc * 2);
-		if (nseq->sseqs != NULL) {
-			memcpy(nseq->sseqs, sseqs,
-			       nseq->alloc * sizeof (struct sub_seq));
-			kfree(sseqs);
-			dbg("Allocated %u sseqs\n", nseq->alloc);
-			nseq->alloc *= 2;
-		} else {
+		struct sub_seq *sseqs = tipc_subseq_alloc(nseq->alloc * 2);
+
+		if (!sseqs) {
 			warn("Memory squeeze; failed to create sub-sequence\n");
 			return NULL;
 		}
+		dbg("Allocated %u more sseqs\n", nseq->alloc);
+		memcpy(sseqs, nseq->sseqs,
+		       nseq->alloc * sizeof(struct sub_seq));
+		kfree(nseq->sseqs);
+		nseq->sseqs = sseqs;
+		nseq->alloc *= 2;
 	}
 	dbg("Have %u sseqs for type %u\n", nseq->alloc, type);
 
-- 
1.4.0
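The pattern this patch adopts (allocate the larger array into a temporary, and only replace the old pointer once the allocation has succeeded) can be shown in user space. The sketch below uses hypothetical names (demo_table, demo_table_grow) and calloc()/free() in place of the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_table {
	int   *items;   /* current array */
	size_t alloc;   /* number of allocated slots */
};

/* Double the array. On allocation failure the table is left untouched and
 * -1 is returned; on success the old contents are copied and 0 is returned. */
static int demo_table_grow(struct demo_table *t)
{
	int *bigger;

	assert(t && t->items && t->alloc > 0);
	bigger = calloc(t->alloc * 2, sizeof(*bigger));
	if (!bigger)
		return -1;   /* t->items still points at the old, valid array */
	memcpy(bigger, t->items, t->alloc * sizeof(*bigger));
	free(t->items);
	t->items = bigger;
	t->alloc *= 2;
	return 0;
}
```

The pre-patch code overwrote the pointer before checking the result, so a failed allocation left the table pointing at NULL while claiming the old capacity.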
[PATCH 10/32] [TIPC] Fixed privilege checking typo in dest_name_check().
From: Allan Stephens [EMAIL PROTECTED]

This patch originated by Stephane Ouellette [EMAIL PROTECTED].

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 648a734..eaf4d69 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -426,7 +426,7 @@ static int dest_name_check(struct sockad
 
 	if (copy_from_user(&hdr, m->msg_iov[0].iov_base, sizeof(hdr)))
 		return -EFAULT;
-	if ((ntohs(hdr.tcm_type) & 0xC000) & (!capable(CAP_NET_ADMIN)))
+	if ((ntohs(hdr.tcm_type) & 0xC000) && (!capable(CAP_NET_ADMIN)))
 		return -EACCES;
 
 	return 0;
-- 
1.4.0
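Assuming the typo in question is a bitwise & used where a logical && was intended (the archived diff has lost its operator characters, so this is an inference), the effect can be demonstrated with a small sketch. The function names are hypothetical; is_admin stands in for the result of capable(CAP_NET_ADMIN).

```c
#include <assert.h>

/* Buggy form: the masked value can only be 0, 0x4000, 0x8000 or 0xC000,
 * and ANDing any of those bitwise with a 0/1 flag always gives 0,
 * so the "deny" branch can never be taken. */
static int demo_denied_bitwise(unsigned type, int is_admin)
{
	assert(is_admin == 0 || is_admin == 1);
	return ((type & 0xC000u) & (unsigned)(!is_admin)) != 0;
}

/* Fixed form: logical AND of "reserved bits set" and "caller not privileged". */
static int demo_denied_logical(unsigned type, int is_admin)
{
	assert(is_admin == 0 || is_admin == 1);
	return (type & 0xC000u) && !is_admin;
}
```

With the bitwise form an unprivileged caller is never rejected, which is why a one-character change is a privilege-checking fix.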
[PATCH 5/32] [TIPC] Allow ports to receive multicast messages through native API.
From: Allan Stephens [EMAIL PROTECTED] This fix prevents a kernel panic if an application mistakenly sends a multicast message to TIPC's topology service or configuration service. Signed-off-by: Allan Stephens [EMAIL PROTECTED] Signed-off-by: Per Liden [EMAIL PROTECTED] --- net/tipc/port.c | 26 -- 1 files changed, 16 insertions(+), 10 deletions(-) diff --git a/net/tipc/port.c b/net/tipc/port.c index 67e96cb..360920b 100644 --- a/net/tipc/port.c +++ b/net/tipc/port.c @@ -810,18 +810,20 @@ static void port_dispatcher_sigh(void *d void *usr_handle; int connected; int published; + u32 message_type; struct sk_buff *next = buf-next; struct tipc_msg *msg = buf_msg(buf); u32 dref = msg_destport(msg); + message_type = msg_type(msg); + if (message_type TIPC_DIRECT_MSG) + goto reject;/* Unsupported message type */ + p_ptr = tipc_port_lock(dref); - if (!p_ptr) { - /* Port deleted while msg in queue */ - tipc_reject_msg(buf, TIPC_ERR_NO_PORT); - buf = next; - continue; - } + if (!p_ptr) + goto reject;/* Port deleted while msg in queue */ + orig.ref = msg_origport(msg); orig.node = msg_orignode(msg); up_ptr = p_ptr-user_port; @@ -832,7 +834,7 @@ static void port_dispatcher_sigh(void *d if (unlikely(msg_errcode(msg))) goto err; - switch (msg_type(msg)) { + switch (message_type) { case TIPC_CONN_MSG:{ tipc_conn_msg_event cb = up_ptr-conn_msg_cb; @@ -874,6 +876,7 @@ static void port_dispatcher_sigh(void *d orig); break; } + case TIPC_MCAST_MSG: case TIPC_NAMED_MSG:{ tipc_named_msg_event cb = up_ptr-named_msg_cb; @@ -886,7 +889,8 @@ static void port_dispatcher_sigh(void *d goto reject; dseq.type = msg_nametype(msg); dseq.lower = msg_nameinst(msg); - dseq.upper = dseq.lower; + dseq.upper = (message_type == TIPC_NAMED_MSG) + ? 
dseq.lower : msg_nameupper(msg); skb_pull(buf, msg_hdr_sz(msg)); cb(usr_handle, dref, buf, msg_data(msg), msg_data_sz(msg), msg_importance(msg), @@ -899,7 +903,7 @@ static void port_dispatcher_sigh(void *d buf = next; continue; err: - switch (msg_type(msg)) { + switch (message_type) { case TIPC_CONN_MSG:{ tipc_conn_shutdown_event cb = @@ -931,6 +935,7 @@ err: msg_data_sz(msg), msg_errcode(msg), orig); break; } + case TIPC_MCAST_MSG: case TIPC_NAMED_MSG:{ tipc_named_msg_err_event cb = up_ptr-named_err_cb; @@ -940,7 +945,8 @@ err: break; dseq.type = msg_nametype(msg); dseq.lower = msg_nameinst(msg); - dseq.upper = dseq.lower; + dseq.upper = (message_type == TIPC_NAMED_MSG) + ? dseq.lower : msg_nameupper(msg); skb_pull(buf, msg_hdr_sz(msg)); cb(usr_handle, dref, buf, msg_data(msg), msg_data_sz(msg), msg_errcode(msg), dseq); -- 1.4.0 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/32] [TIPC] Fixed connect() to detect a dest address that is missing or too short.
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/socket.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 0923213..758b2d2 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1244,7 +1244,7 @@ static int connect(struct socket *sock,
 	if (sock->state != SS_UNCONNECTED)
 		return -EISCONN;
 
-	if ((dst->family != AF_TIPC) ||
+	if ((destlen < sizeof(*dst)) || (dst->family != AF_TIPC) ||
 	    ((dst->addrtype != TIPC_ADDR_NAME) && (dst->addrtype != TIPC_ADDR_ID)))
 		return -EINVAL;
 
-- 
1.4.0
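The ordering that matters in this fix (check the supplied length before reading any field of the address) can be sketched as follows. The structure, the DEMO_AF constant and demo_check_dest() are hypothetical and much simpler than the real TIPC socket address.

```c
#include <stddef.h>

#define DEMO_AF 30   /* arbitrary address family value for this sketch */

struct demo_sockaddr {
	unsigned short family;
	unsigned char  addrtype;
};

/* Returns 0 if the caller-supplied address is usable, -1 otherwise.
 * The length test comes first so that no field is read from a buffer
 * that is missing or too short to contain it. */
static int demo_check_dest(const struct demo_sockaddr *dst, size_t destlen)
{
	if (dst == NULL || destlen < sizeof(*dst))
		return -1;
	if (dst->family != DEMO_AF)
		return -1;
	return 0;
}
```

Because || evaluates left to right and stops at the first true operand, placing the length test first in the patched condition gives the same protection.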
[PATCH 27/32] [TIPC] Disallow config operations that aren't supported in certain modes.
From: Allan Stephens [EMAIL PROTECTED] This change provides user-friendly feedback when TIPC is unable to perform certain configuration operations that don't work properly in certain modes. (In particular, any reconfiguration request that would temporarily take TIPC from network mode to standalone mode, or from standalone mode to not running mode, is disallowed.) Signed-off-by: Allan Stephens [EMAIL PROTECTED] Signed-off-by: Per Liden [EMAIL PROTECTED] --- net/tipc/config.c | 83 - 1 files changed, 38 insertions(+), 45 deletions(-) diff --git a/net/tipc/config.c b/net/tipc/config.c index 48b5de2..41c8447 100644 --- a/net/tipc/config.c +++ b/net/tipc/config.c @@ -291,13 +291,22 @@ static struct sk_buff *cfg_set_own_addr( if (!tipc_addr_node_valid(addr)) return tipc_cfg_reply_error_string(TIPC_CFG_INVALID_VALUE (node address)); - if (tipc_own_addr) + if (tipc_mode == TIPC_NET_MODE) return tipc_cfg_reply_error_string(TIPC_CFG_NOT_SUPPORTED (cannot change node address once assigned)); + tipc_own_addr = addr; + + /* +* Must release all spinlocks before calling start_net() because +* Linux version of TIPC calls eth_media_start() which calls +* register_netdevice_notifier() which may block! +* +* Temporarily releasing the lock should be harmless for non-Linux TIPC, +* but Linux version of eth_media_start() should really be reworked +* so that it can be called with spinlocks held. 
+*/ spin_unlock_bh(config_lock); - tipc_core_stop_net(); - tipc_own_addr = addr; tipc_core_start_net(); spin_lock_bh(config_lock); return tipc_cfg_reply_none(); @@ -350,50 +359,21 @@ static struct sk_buff *cfg_set_max_subsc static struct sk_buff *cfg_set_max_ports(void) { - int orig_mode; u32 value; if (!TLV_CHECK(req_tlv_area, req_tlv_space, TIPC_TLV_UNSIGNED)) return tipc_cfg_reply_error_string(TIPC_CFG_TLV_ERROR); value = *(u32 *)TLV_DATA(req_tlv_area); value = ntohl(value); + if (value == tipc_max_ports) + return tipc_cfg_reply_none(); if (value != delimit(value, 127, 65535)) return tipc_cfg_reply_error_string(TIPC_CFG_INVALID_VALUE (max ports must be 127-65535)); - - if (value == tipc_max_ports) - return tipc_cfg_reply_none(); - - if (atomic_read(tipc_user_count) 2) + if (tipc_mode != TIPC_NOT_RUNNING) return tipc_cfg_reply_error_string(TIPC_CFG_NOT_SUPPORTED - (cannot change max ports while TIPC users exist)); - - spin_unlock_bh(config_lock); - orig_mode = tipc_get_mode(); - if (orig_mode == TIPC_NET_MODE) - tipc_core_stop_net(); - tipc_core_stop(); +(cannot change max ports while TIPC is active)); tipc_max_ports = value; - tipc_core_start(); - if (orig_mode == TIPC_NET_MODE) - tipc_core_start_net(); - spin_lock_bh(config_lock); - return tipc_cfg_reply_none(); -} - -static struct sk_buff *set_net_max(int value, int *parameter) -{ - int orig_mode; - - if (value != *parameter) { - orig_mode = tipc_get_mode(); - if (orig_mode == TIPC_NET_MODE) - tipc_core_stop_net(); - *parameter = value; - if (orig_mode == TIPC_NET_MODE) - tipc_core_start_net(); - } - return tipc_cfg_reply_none(); } @@ -405,10 +385,16 @@ static struct sk_buff *cfg_set_max_zones return tipc_cfg_reply_error_string(TIPC_CFG_TLV_ERROR); value = *(u32 *)TLV_DATA(req_tlv_area); value = ntohl(value); + if (value == tipc_max_zones) + return tipc_cfg_reply_none(); if (value != delimit(value, 1, 255)) return tipc_cfg_reply_error_string(TIPC_CFG_INVALID_VALUE (max zones must be 1-255)); - return 
set_net_max(value, tipc_max_zones); + if (tipc_mode == TIPC_NET_MODE) + return tipc_cfg_reply_error_string(TIPC_CFG_NOT_SUPPORTED +(cannot change max zones once TIPC has joined a network)); + tipc_max_zones = value; + return tipc_cfg_reply_none(); } static struct sk_buff *cfg_set_max_clusters(void) @@ -419,8 +405,8 @@ static struct sk_buff *cfg_set_max_clust return tipc_cfg_reply_error_string(TIPC_CFG_TLV_ERROR); value = *(u32 *)TLV_DATA(req_tlv_area); value = ntohl(value); - if (value != 1) - return
[PATCH 9/32] [TIPC] Fix for NULL pointer dereference
From: Eric Sesterhenn [EMAIL PROTECTED]

This fixes a bug spotted by the coverity checker, bug id #366. If
(mod(seqno - prev) != 1) we set buf to NULL, dereference it in the
for-loop increment, and set it to whatever value happens to be at
address 0+next; if that happens to be non-zero, we even stay in the loop.
It seems that the author intended to break there.

Signed-off-by: Eric Sesterhenn [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/bcast.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 00691b7..44645f5 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -349,8 +349,10 @@ static void tipc_bclink_peek_nack(u32 de
 	for (; buf; buf = buf->next) {
 		u32 seqno = buf_seqno(buf);
 
-		if (mod(seqno - prev) != 1)
+		if (mod(seqno - prev) != 1) {
 			buf = NULL;
+			break;
+		}
 		if (seqno == gap_after)
 			break;
 		prev = seqno;
-- 
1.4.0
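The control-flow problem is easy to see in a reduced form. The sketch below uses a hypothetical list type and function (demo_buf, demo_find) and plain unsigned arithmetic instead of TIPC's mod(); it shows the corrected loop, with a comment on what goes wrong without the break.

```c
#include <stddef.h>

struct demo_buf {
	struct demo_buf *next;
	unsigned seqno;
};

/* Walk a list that should hold consecutive sequence numbers following
 * 'prev'. Returns the buffer whose number is 'stop_at', or NULL if a gap
 * is found or the list ends first.
 * The explicit break matters: with only "buf = NULL", execution would
 * fall through to the loop's "buf = buf->next" step and dereference NULL. */
static struct demo_buf *demo_find(struct demo_buf *buf, unsigned prev,
				  unsigned stop_at)
{
	for (; buf; buf = buf->next) {
		if (buf->seqno - prev != 1) {
			buf = NULL;
			break;
		}
		if (buf->seqno == stop_at)
			break;
		prev = buf->seqno;
	}
	return buf;
}
```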
[PATCH 26/32] [TIPC] Fixed memory leak in tipc_link_send() when destination is unreachable
From: Allan Stephens [EMAIL PROTECTED]

Signed-off-by: Allan Stephens [EMAIL PROTECTED]
Signed-off-by: Per Liden [EMAIL PROTECTED]
---
 net/tipc/link.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index ba7d3f1..ff40c91 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1135,9 +1135,13 @@ int tipc_link_send(struct sk_buff *buf,
 	if (n_ptr) {
 		tipc_node_lock(n_ptr);
 		l_ptr = n_ptr->active_links[selector & 1];
-		dbg("tipc_link_send: found link %x for dest %x\n", l_ptr, dest);
 		if (l_ptr) {
+			dbg("tipc_link_send: found link %x for dest %x\n", l_ptr, dest);
 			res = tipc_link_send_buf(l_ptr, buf);
+		} else {
+			dbg("Attempt to send msg to unreachable node:\n");
+			msg_dbg(buf_msg(buf),">>>");
+			buf_discard(buf);
 		}
 		tipc_node_unlock(n_ptr);
 	} else {
-- 
1.4.0
Re: [0/5] GSO: Generic Segmentation Offload
Hello.

Yes, I generally like this idea.

In article [EMAIL PROTECTED] (at Thu, 22 Jun 2006 18:12:11 +1000), Herbert Xu [EMAIL PROTECTED] says:

> GSO like TSO is only effective if the MTU is significantly less than the
> maximum value of 64K. So only the case where the MTU was set to 1500 is
> of interest. There we can see that the throughput improved by 17.5%
> (3061.05Mb/s => 3598.17Mb/s). The actual saving in transmission cost is
> in fact a lot more than that as the majority of the time here is spent on
> the RX side which still has to deal with 1500-byte packets.

Can you measure some with other sizes, e.g. 4kByte, 8kByte, 9000Byte?

--yoshfuji
IPSec + large packets being corrupted
I've been using the 2.6 kernel IPsec system for some time and have always had to work around issues with large packets not traversing the VPN by setting the LAN interface MTU size to something like 1400. Because I always thought this was a hack and not a proper fix, I've spent a few days trying to work out exactly why large packets aren't traversing the VPN and have found something which may well be the cause. I really don't know the kernel networking code that well, so I was hoping that someone can either verify that what I've found is really an issue, or tell me whether I'm doing something wrong.

This has been seen in the field with P4/e100+e1000 systems running 2.6.12 and in testing on Geode/dp8381x systems running 2.6.17, all using IPv4. The VPN is Racoon based, using x509 certs and ESP/AH (3DES/SHA1).

This is my understanding of how large packets get corrupted:

1. Large packet (e.g. 1600-byte ping) received by VPN server A.
2. Packet encrypted and fragmented, then sent from Server A to Server B.
3. Packet received by network subsystem on B and frag_list created.
4. ah_input() strips the AH header -- frag sizes are not changed!
5. esp_input() decrypts data.
6. ip_fragment() uses the existing frag_list sizes from before the AH header was stripped, and sends too much data (16 bytes extra). This breaks the checksum and packets get dropped by the destination host.

Setting the MTU on the local interface breaks one of the checks for using the pre-existing frag list in ip_fragment() (the MTU is now smaller than the largest frag size), so the packet fragments are re-generated from scratch and the large packet gets through.

If I disable the valid frag_list check in ip_fragment(), large packets again traverse the VPN with no problems at all, since the fragments are re-generated from scratch.

If my analysis of the above is correct, then my feeling is that either ah_input() should re-calculate the fragment sizes, or some flag should be set to tell ip_fragment() to use the slow method and recreate the fragments.

Does this sound like a real problem, or have I missed something obvious?

Regards,

-- 
Chris Audley mailto:[EMAIL PROTECTED]
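The two effects described in this report can be put into a toy model. This is not the kernel's ip_fragment(); the function names and the example lengths are assumptions chosen only to mirror the description: an existing fragment list is reused when every fragment fits the MTU, and stale per-fragment lengths add up to more bytes than the packet really holds after a header has been stripped.

```c
#include <stddef.h>

/* Simplified model of the reuse decision: returns 1 if the existing
 * fragment list could be forwarded as-is (fast path), 0 if the packet
 * would have to be re-fragmented from scratch (slow path). */
static int demo_can_reuse_frags(const unsigned *frag_len, size_t n, unsigned mtu)
{
	for (size_t i = 0; i < n; i++)
		if (frag_len[i] > mtu)
			return 0;
	return 1;
}

/* Bytes that would be sent beyond the real payload if the recorded
 * fragment lengths are stale (larger in total than the actual data). */
static unsigned demo_excess_bytes(const unsigned *frag_len, size_t n,
				  unsigned real_total)
{
	unsigned sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += frag_len[i];
	return sum > real_total ? sum - real_total : 0;
}
```

In this model, lowering the MTU below the largest recorded fragment forces the slow path, which matches the reported workaround of setting the interface MTU to about 1400.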
Re: [3/5] [NET]: Add software TSOv4
In article [EMAIL PROTECTED] (at Thu, 22 Jun 2006 18:14:00 +1000), Herbert Xu [EMAIL PROTECTED] says:

> [NET]: Add software TSOv4
>
> This patch adds the GSO implementation for IPv4 TCP.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

I'd appreciate it if you could code up IPv6 TCP as well. :-)

Regards,

--yoshfuji
Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
On Thu, 2006-06-22 at 08:53 -0500, Steve Wise wrote:
> On Thu, 2006-06-22 at 01:57 -0700, David Miller wrote:
> > From: Steve Wise [EMAIL PROTECTED]
> > Date: Wed, 21 Jun 2006 13:45:19 -0500
> >
> > > This patch implements a mechanism that allows interested clients to
> > > register for notification of certain network events.
> >
> > We have a generic network event notification facility called
> > netlink, please use it and extend it for your needs if necessary.
>
> I'll investigate this. Thanks,

The in-kernel Infiniband subsystem needs to know when certain events happen. For example, if the MAC address of a neighbour changes, any RDMA devices that are using said neighbour need to be notified of the change. You are asking that I extend the netlink facility (if necessary) to provide this functionality.

Are you suggesting, then, that the Infiniband subsystem should create an in-kernel NETLINK socket and obtain these events (and the pertinent information) via the socket?

I'm still learning about netlink, but my understanding to date is that it's a way to pass events/commands between the kernel and user applications. It perhaps seems overkill to use this mechanism for kernel-to-kernel event notifications. That's why I started with notifier blocks and added a netevent_notifier mechanism.

Any help is greatly appreciated. Sorry if I'm being dense...

Steve.
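For readers unfamiliar with the notifier-block idea mentioned above, here is a minimal user-space model of a callback chain. It is only a sketch: every name in it (demo_notifier, demo_register, demo_call_chain, DEMO_EVENT_NEIGH_UPDATE, demo_rdma_client) is hypothetical, and the kernel's real notifier API differs in detail (locking, priorities, return codes).

```c
#include <stddef.h>

#define DEMO_EVENT_NEIGH_UPDATE 1UL   /* hypothetical event code */

/* One registered client: a callback plus a link to the next client. */
struct demo_notifier {
	int (*call)(struct demo_notifier *self, unsigned long event, void *data);
	struct demo_notifier *next;
};

static struct demo_notifier *demo_chain;   /* head of the registered list */
static unsigned long demo_last_event;      /* recorded by the sample client */

/* Add a client at the head of the chain. */
static void demo_register(struct demo_notifier *nb)
{
	nb->next = demo_chain;
	demo_chain = nb;
}

/* Deliver an event to every registered client; returns how many were called. */
static int demo_call_chain(unsigned long event, void *data)
{
	int called = 0;

	for (struct demo_notifier *nb = demo_chain; nb; nb = nb->next) {
		nb->call(nb, event, data);
		called++;
	}
	return called;
}

/* Sample client, standing in for an RDMA driver interested in neighbour changes. */
static int demo_rdma_client(struct demo_notifier *self, unsigned long event,
			    void *data)
{
	(void)self;
	(void)data;
	demo_last_event = event;
	return 0;
}
```

The appeal of this shape for kernel-to-kernel notification is that delivery is a direct function call, with no socket or message serialization in between.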
Binding a packet socket to interface down
Hi,

When a packet socket (PF_PACKET) is bound with the bind() syscall to a network interface which is down (no IFF_UP flag), packet_do_bind() sets an error on the socket, but bind() does not fail.

When a datagram, stream or raw socket fails to bind to some local ip-port/ip, bind() fails.

Is this behavior of bind() for packet sockets deliberate, or would it be better to correct it so that bind() fails and returns an errno, e.g. ENODEV?

Thanks.

-- 
Sincerely,
--
Robert Iakobashvili, coroberti at gmail dot com
Navigare necesse est, vivere non est necesse.
--
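From user space, the situation described above could be probed roughly as follows: bind the packet socket, then ask for any error recorded on it with SO_ERROR. This is a Linux-only sketch; demo_bind_and_check() is a hypothetical helper, and whether an error is actually pending after a successful bind() depends on the kernel version.

```c
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Bind an already-open AF_PACKET socket to an interface index, then read
 * any error the kernel recorded on the socket.
 * Returns -1 if bind() or getsockopt() itself fails, otherwise the
 * pending socket error (0 if none). */
static int demo_bind_and_check(int fd, int ifindex)
{
	struct sockaddr_ll sll;
	int err = 0;
	socklen_t len = sizeof(err);

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex = ifindex;

	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;
	return err;
}
```

Opening the socket itself (socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) requires CAP_NET_RAW, so the helper takes an existing descriptor.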
Re: [PATCH 11/21] e1000: disable CRC stripping workaround
Ben Greear wrote:
> Kok, Auke wrote:
> > CRC stripping is breaking SMBUS-connected BMC's. We disable this
> > feature to make it work. This fixes related bugs regarding SOL.
>
> Shouldn't you also have to subtract 4 bytes when setting the skb len
> in the receive logic? Perhaps when setting the rx-bytes counter as well?

The hardware corrects for the size properly when we disable CRC stripping. The end result is the same.

Cheers,

Auke
Re: [PATCH 11/21] e1000: disable CRC stripping workaround
On 6/21/06, Ben Greear [EMAIL PROTECTED] wrote:
> Kok, Auke wrote:
> > CRC stripping is breaking SMBUS-connected BMC's. We disable this
> > feature to make it work. This fixes related bugs regarding SOL.
>
> Shouldn't you also have to subtract 4 bytes when setting the skb len
> in the receive logic? Perhaps when setting the rx-bytes counter as well?

We thought about this, but most drivers don't strip the CRC, and we couldn't find any tests, including bridging, that cared if the CRC was there in the indicated packet. If you can find me a failing case I'll fix it.

It was much simpler to leave it out, especially when we add back in the multiple descriptor receive code in the future (think about the case when subtracting the CRC makes the last descriptor disappear).

Once again, let me know if you have info I don't :-)

Thanks for the review,

Jesse
Re: [RFC PATCH 1/2] Hardware button support for Wireless cards: radiobtn
On Sat, 17 Jun 2006 17:05:55 +0200, Ivo van Doorn wrote:
> With this approach more buttons can be registered, it includes the
> optional field to report an update of the key status to the driver that
> registered it, and it supports for non-polling keys.

I think this is not specific to networking anymore, so it should go to lkml. Please be sure to Cc: the input devices maintainer, Dmitry Torokhov.

Regarding the rfkill button, I talked about that with Vojtech Pavlik (Cc:ed) and he suggests this solution:

- the driver is responsible for turning the radio on/off when the input device is not opened;
- when something opens the input device, it receives input events and becomes responsible for turning the radio on/off (by ioctl or by putting the network interfaces up/down).

This is of course not possible for all hardware, but it gives the most flexibility while keeping the possibility to switch off the radio without userspace support.

Thanks,

Jiri

-- 
Jiri Benc
SUSE Labs
Re: Binding a packet socket to interface down
On Thu, Jun 22, 2006 at 06:32:29PM +0300, Robert Iakobashvili ([EMAIL PROTECTED]) wrote:
> Hi,
> When an attempt is made to bind () a packet socket (PF_PACKET) to a network interface which is down (no IFF_UP flag), packet_do_bind () sets an error on the socket, but bind () does not fail. When a datagram, stream or raw socket fails to bind to some local ip-port/ip, bind () fails. Is this behavior of bind () for packet sockets deliberate, or would it be better to correct it so that bind fails and returns an errno, e.g. ENODEV? Thanks.

The man page says that a packet socket does not handle any errors. And actually the packet socket does bind to the device, but you cannot read data. When the device is turned on, the packet socket should start to function (packet_notifier() -> NETDEV_UP).

--
	Evgeniy Polyakov
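The behaviour discussed above can be observed from user space. The following is a minimal sketch, not code from the thread: `bind_packet_socket()` is a hypothetical helper name, and it assumes that the error recorded by packet_do_bind () can be read back with `getsockopt(SO_ERROR)` after a bind () that reported success. Opening a packet socket normally needs CAP_NET_RAW, so the helper simply fails without that privilege.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a packet socket to `ifname` and report the
 * pending socket error (if any) through `pending_err`.
 * Returns the socket fd, or -1 if the interface does not exist or a
 * system call fails.
 */
int bind_packet_socket(const char *ifname, int *pending_err)
{
    struct sockaddr_ll sll;
    socklen_t len = sizeof(*pending_err);
    unsigned int ifindex;
    int fd;

    assert(ifname != NULL && pending_err != NULL);
    *pending_err = 0;

    ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;                      /* no such interface */

    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;                      /* typically EPERM without CAP_NET_RAW */

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex = (int)ifindex;

    /* bind() may succeed even when the interface is down ... */
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0 ||
        /* ... so also fetch (and clear) the error stored on the socket. */
        getsockopt(fd, SOL_SOCKET, SO_ERROR, pending_err, &len) < 0) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller that wants bind () failures to be visible could treat a non-zero `*pending_err` (for example ENETDOWN) as "bound, but the interface is not up yet" and either wait or close the socket.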
Re: [PATCH 11/21] e1000: disable CRC stripping workaround
Jesse Brandeburg wrote:
> On 6/21/06, Ben Greear [EMAIL PROTECTED] wrote:
>> Kok, Auke wrote:
>>> CRC stripping is breaking SMBUS-connected BMC's. We disable this feature to make it work. This fixes related bugs regarding SOL.
>>
>> Shouldn't you also have to subtract 4 bytes when setting the skb len in the receive logic? Perhaps when setting the rx-bytes counter as well?
>
> we thought about this, but most drivers don't strip the CRC, and we couldn't find any tests including bridging that cared if the CRC was there in the indicated packet. If you can find me a failing case I'll fix it. It was much simpler to leave it out, especially when we add back in the multiple descriptor receive code in the future (think about the case when subtracting the CRC makes the last descriptor disappear)
>
> Once again, let me know if you have info I don't :-)

It should only be a problem if skb->len includes the extra 4 bytes for the crc. Then, if I transmit that skb to another interface, I am afraid that the crc will be seen as data in the packet.

In the 2.6.13 days, the e1000 did not strip the CRC, but it subtracted 4 before it did the skb_put. So, the crc was correctly stripped/ignored. The e100 functioned similarly I believe.

If you skb_put the extra 4 bytes, I believe this will break my (proprietary) app because on transmit it will append the extra 4 crc bytes, but that isn't your problem..and I can work around it. If the receiving NIC can handle pkts 4 bytes bigger than normal, it will probably still receive the packet w/out problem, but in truth, the frame will not be exactly correct.

When you did your bridging tests, did you sniff the packets on the far side of the bridge to see if they were the right size?

Thanks,
Ben

> Thanks for the review,
> Jesse
Re: [PATCH 11/21] e1000: disable CRC stripping workaround
On Thu, Jun 22, 2006 at 08:39:10AM -0700, Jesse Brandeburg wrote:
>>> CRC stripping is breaking SMBUS-connected BMC's. We disable this feature to make it work. This fixes related bugs regarding SOL.
>>
>> Shouldn't you also have to subtract 4 bytes when setting the skb len in the receive logic? Perhaps when setting the rx-bytes counter as well?
>
> we thought about this, but most drivers don't strip the CRC,

Really?

> and we couldn't find any tests including bridging that cared if the CRC was there in the indicated packet.

Bridging definitely cares -- some years ago there was a case where 8139too NICs would pass packets up the stack with 4 bytes of FCS, and that causes frames received on 8139too interfaces not to be forwarded to other interfaces because on TX, the frame would be too long.

Maybe e1000 is okay with sending oversized frames, but other NIC drivers might not be.

(Did you test without bridge-netfilter enabled? bridge-nf might trim incoming IP packets even in the bridging case.)

cheers,
Lennert
Re: [PATCH 11/21] e1000: disable CRC stripping workaround
On 6/22/06, Ben Greear [EMAIL PROTECTED] wrote:
> Jesse Brandeburg wrote:
>> On 6/21/06, Ben Greear [EMAIL PROTECTED] wrote:
>>> Kok, Auke wrote:
>>>> CRC stripping is breaking SMBUS-connected BMC's. We disable this feature to make it work. This fixes related bugs regarding SOL.
>>>
>>> Shouldn't you also have to subtract 4 bytes when setting the skb len in the receive logic? Perhaps when setting the rx-bytes counter as well?
>>
>> we thought about this, but most drivers don't strip the CRC, and we couldn't find any tests including bridging that cared if the CRC was there in the indicated packet. If you can find me a failing case I'll fix it. It was much simpler to leave it out, especially when we add back in the multiple descriptor receive code in the future (think about the case when subtracting the CRC makes the last descriptor disappear)
>>
>> Once again, let me know if you have info I don't :-)
>
> It should only be a problem if skb->len includes the extra 4 bytes for the crc. Then, if I transmit that skb to another interface, I am afraid that the crc will be seen as data in the packet.
>
> In the 2.6.13 days, the e1000 did not strip the CRC, but it subtracted 4 before it did the skb_put. So, the crc was correctly stripped/ignored. The e100 functioned similarly I believe.

currently the e100 driver in 2.6.X strips the CRC in hardware.

> If you skb_put the extra 4 bytes, I believe this will break my (proprietary) app because on transmit it will append the extra 4 crc bytes, but that isn't your problem..and I can work around it. If the receiving NIC can handle pkts 4 bytes bigger than normal, it will probably still receive the packet w/out problem, but in truth, the frame will not be exactly correct.
>
> When you did your bridging tests, did you sniff the packets on the far side of the bridge to see if they were the right size?

hm, probably not, we touch tested bridging (probably with TCP), and have completed several internal testing passes, to make sure it worked but I don't think we went so far as to sniff the traffic at the other end of the bridge. I'll look into it.

Jesse
Re: [PATCH, RFT] bcm43xx: AccessPoint mode
On Mon, 19 Jun 2006 11:07:34 +0200, Michael Buesch wrote:
> Well, it does not work 100%, but at least it's very promising. We are able to create a bssid and correctly send beacon frames out.

Great work! I was even able to ping. (Tried only open system authentication for now, it seems it works quite well.)

> Please give it a testrun.
>
> Final note about hostapd: hostapd snapshot 0.5-2006-06-10 seems to work in the sense that it is able to bring up the device. hostapd snapshot 0.5-2006-06-11 seems to fail.

0.5-2006-06-19 works with the patch.

> Important notes from Alexander Tsvyashchenko's initial mail follow:
> [...]
>> Although my previous patch to hostapd to make it interoperable with bcm43xx dscape has been merged already in their CVS version, due to the subsequent changes in dscape stack current hostapd is again incompatible :-( So, to test this patch, the patch to hostapd should be applied.

Or, if you don't want to patch hostapd (untested, but should work):

  iwpriv wlan0 param 1046 1
  ip link set wmgmt0 name wmaster0ap
  hostapd /path/to/hostapd.conf
  iwpriv wlan0 param 1046 0

>> I used hostapd snapshot 0.5-2006-06-10, patch for it is attached. The patch is very hacky and requires a tricky way to bring everything up, but as dscape stack is changed quite constantly, I just do not want to waste time fixing it in a proper way only to find a week later that dscape handling of master interface was changed completely once more and everything is broken again ;-)

Hopefully we will convert the whole hostapd-stack communication to netlink in some near future ;-)

>> 2) Insert modules (80211, rate_control and bcm43xx-d80211)

modprobe bcm43xx-d80211 is enough, other modules will load automatically.

>> 4) ifconfig wlan0 up (this should be done by hostapd actually, but its operation with current dscape stack seems to be broken)

hostapd tries to open (put to 'up' state) wmgmt0 earlier than wlan0, which is not possible.
It should open wlan0 first; even more, opening of wmgmt0 is not necessary as it will be opened automatically when wlan0 is opened.

>> 6) iwconfig wlan0 essid your-SSID-name (this also should not be required, but current combination of hostapd + dscape doesn't seem to generate config_interface callback when setting beacon, so this is required just to force call of config_interface).

The stack currently has very limited support for cards with beacon templates. The ieee80211_beacon_get function is not designed for the way it is used in bcm43xx. Although this seems to be easy to fix now, we will run into other problems later (with TIM elements mainly). I need to look at how PS mode works in bcm chipsets to find a correct solution for this. Do you have any ideas?

Thanks,
Jiri

--
Jiri Benc
SUSE Labs
Re: [RFC 2/7] NetLabel: core network changes
On Thursday 22 June 2006 05:00, David Miller wrote:
>>  #define NETLINK_GENERIC 16
>> +#define NETLINK_NETLABEL 17 /* Network packet labeling */
>>  #define MAX_LINKS 32
>
> Please use generic netlink.

Since this is a security interface, shouldn't it be its own protocol so that SE Linux can control commands being sent? Paul's patches do include a netlink table in security/selinux/nlmsgtab.c. But I do not see any hooks to control generic netlink messages. (There seem to be several protocols that SE Linux is not controlling.)

I could see that someone in secadm role should be able to issue these commands, but someone at sysadm or auditadm would not. If moving this over to generic is a must, then I think SE Linux will have to clip into generic to control its packet flow.

-Steve
Re: [RFC 3/7] NetLabel: CIPSOv4 engine
Paul Moore wrote:
> On Thursday 22 June 2006 5:12 am, David Miller wrote:
>> From: [EMAIL PROTECTED]
>> Date: Wed, 21 Jun 2006 15:42:38 -0400
>>
>> The thing that concerns me most about CIPSO is that even once users migrate to a more SELINUX native approach from this CIPSO stuff, the CIPSO code, its bloat, and its maintenance burden will remain. It's easy to put stuff in, it's impossible to take stuff out even once it's largely unused by even its original target audience. And that's what I see happening here. This is why, to be perfectly honest with you, I'd much rather something like this stay out-of-tree and people are strongly encouraged to use the more native stuff under Linux.
>
> Well, not exactly the response I was hoping for, but let me plead my case one more time :)
>
> Traditional MLS CIPSO is a niche protocol, I won't try to argue that point, and I also won't try to argue that the NetLabel patch is late to the party, the IPsec/XFRM labeling approach has already been accepted as the SELinux packet labeling mechanism. However, the XFRM labeling mechanism is not currently supported by any OS other than Linux/SELinux. I have spoken with users that need CIPSO to interoperate with their other trusted systems, the XFRM approach is simply not a viable solution for them. I strongly believe that failure to support an interoperable packet labeling mechanism on Linux will seriously restrict Linux's deployment in trusted networks.

The PitBull product uses the CIPSO/RIPSO labeling protocol in order to do interop packet labeling with other trusted systems and for passing labels between our own systems. Because it is the standard, it is the protocol that government agencies use to do packet labeling across networks.
Not having CIPSO in the mainline would mean that government agencies would either a) only use SELinux from a distro that supports the CIPSO patch (by maintaining it in their kernel themselves), if such a distro exists, b) have to patch the kernels themselves (unlikely), or c) not use SELinux at all.

Also, the port of PitBull to Linux that I'm working on is currently using the netlabel patch to handle the CIPSO/RIPSO labeling. Since the actual protocol for reading and writing out the IPSec option is independent from the security enforcement module it makes a lot of sense to have a generic handler in the kernel that LSM modules can use. So, in short, it makes my life a lot easier to have all that work already done :)

--
Ryan Pratt
Chief Solaris Engineer
Innovative Security Systems, Inc. (dba Argus Systems Group)
1809 Woodfield Dr.
Savoy IL 61874
(217) 355-6308
www.argus-systems.com
Re: [RFC] sysfs + configfs on 802.11 wireless drivers
On Thursday 22 June 2006 05:12, Luis R. Rodriguez wrote:
> (3) On resume() talk to userspace via netlink to read our sysfs and configfs us

I think that's fairly overkill. I really do not like the idea of requiring any more userspace involvement in the suspend process than needed. At least for ADM8211, all the data needed to restore the card properly is already stored somewhere for userspace (iwconfig) to query, so I don't understand how this is supposed to reduce bloat. All the data that is needed should already be stored in the driver to support configuring the interface before taking the interface up.

-Michael Wu
[1/1] Kevent subsystem.
Hello.

Kevent subsystem incorporates several AIO/kqueue design notes and ideas. Kevent can be used both for edge and level notifications. It supports socket notifications, network AIO (aio_send(), aio_recv() and aio_sendfile()), inode notifications (create/remove), generic poll()/select() notifications and timer notifications.

It was tested against FreeBSD kqueue and Linux epoll and showed a noticeable performance win. Network asynchronous IO operations were tested against Linux synchronous socket code and showed a noticeable performance win.

Patch against linux-2.6.17-git tree attached (gzipped). I would like to hear some comments about the overall design, implementation and plans about its usefulness for the generic kernel.

Design notes, patches, userspace application and performance tests can be found at the project's homepages.

1. Kevent subsystem. http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent
2. Network AIO. http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio
3. LWN article about kevent. http://lwn.net/Articles/172844/

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

Thank you.

--
	Evgeniy Polyakov

Attachment: kevent-2.6.17-git.diff.gz (application/gunzip)
Re: [PATCH] Export accept queue len of a TCP listening socket via rx_queue
On Thu, 2006-06-22 at 10:50 +1000, Herbert Xu wrote:
> Sridhar Samudrala [EMAIL PROTECTED] wrote:
>>> What about using the same fields (rqueue/wqueue) as you did for /proc?
>>
>> I meant extending tcp_info structure to add new fields. I think the user space also uses this structure.
>
> What about putting it into inet_idiag_msg.idiag_[rw]queue instead?

OK. I was under the mistaken assumption that [rw]queue fields are exported via tcp_info. This makes it pretty simple to support netlink users also. Here is the updated patch.

Thanks
Sridhar

While debugging a TCP server hang issue, we noticed that currently there is no way for a user to get the acceptq backlog value for a TCP listen socket.

All the standard networking utilities that display socket info like netstat, ss and /proc/net/tcp have 2 fields called rx_queue and tx_queue. These fields do not mean much for listening sockets. This patch uses one of these unused fields (rx_queue) to export the accept queue len for listening sockets.

Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index c148c10..b56399c 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -26,7 +26,10 @@ static void tcp_diag_get_info(struct soc
 	const struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_info *info = _info;
 
-	r->idiag_rqueue = tp->rcv_nxt - tp->copied_seq;
+	if (sk->sk_state == TCP_LISTEN)
+		r->idiag_rqueue = sk->sk_ack_backlog;
+	else
+		r->idiag_rqueue = tp->rcv_nxt - tp->copied_seq;
 	r->idiag_wqueue = tp->write_seq - tp->snd_una;
 	if (info != NULL)
 		tcp_get_info(sk, info);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 25ecc6e..4c6ef47 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1726,7 +1726,8 @@ static void get_tcp4_sock(struct sock *s
 	sprintf(tmpbuf, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX "
 			"%08X %5d %8d %lu %d %p %u %u %u %u %d",
 		i, src, srcp, dest, destp, sp->sk_state,
-		tp->write_seq - tp->snd_una, tp->rcv_nxt - tp->copied_seq,
+		tp->write_seq - tp->snd_una,
+		(sp->sk_state == TCP_LISTEN) ? sp->sk_ack_backlog : (tp->rcv_nxt - tp->copied_seq),
 		timer_active,
 		jiffies_to_clock_t(timer_expires - jiffies),
 		icsk->icsk_retransmits,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index a50eb30..b36d5b2 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1469,7 +1469,8 @@ static void get_tcp6_sock(struct seq_fil
 		   dest->s6_addr32[0], dest->s6_addr32[1],
 		   dest->s6_addr32[2], dest->s6_addr32[3], destp,
 		   sp->sk_state,
-		   tp->write_seq-tp->snd_una, tp->rcv_nxt-tp->copied_seq,
+		   tp->write_seq-tp->snd_una,
+		   (sp->sk_state == TCP_LISTEN) ? sp->sk_ack_backlog : (tp->rcv_nxt - tp->copied_seq),
 		   timer_active,
 		   jiffies_to_clock_t(timer_expires - jiffies),
 		   icsk->icsk_retransmits,
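The selection rule the patch applies in all three places can be shown in isolation. This is a user-space sketch only: `struct sketch_sock`, its fields and the state constants are hypothetical stand-ins for the kernel's socket structures, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's socket state; illustration only. */
enum sketch_tcp_state { SKETCH_TCP_ESTABLISHED = 1, SKETCH_TCP_LISTEN = 10 };

struct sketch_sock {
    enum sketch_tcp_state state;
    uint32_t rcv_nxt;      /* next sequence number expected from the peer */
    uint32_t copied_seq;   /* first sequence number not yet read by the app */
    uint32_t ack_backlog;  /* connections waiting in the accept queue */
};

/*
 * Value reported as rx_queue: for a listening socket, the accept queue
 * length; otherwise the number of received but unread bytes.
 */
uint32_t sketch_rx_queue(const struct sketch_sock *sk)
{
    assert(sk != NULL);
    if (sk->state == SKETCH_TCP_LISTEN)
        return sk->ack_backlog;
    /* Unsigned subtraction stays correct across sequence wraparound. */
    return sk->rcv_nxt - sk->copied_seq;
}
```

With this rule, a tool that prints rx_queue shows the number of pending connections for listening sockets and unread bytes for every other state, without any new field in the output.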
Re: [PATCH 0/2] NET: Accurate packet scheduling for ATM/ADSL
On Tue, 2006-20-06 at 17:09 +0200, Patrick McHardy wrote: jamal wrote: Depend on bandwidth is not the right term. All of TBF, HTB and HFSC provide bandwidth per time, but with TBF and HTB the relation between the amount of bandwidth is linear to the amount of time, with HFSC it is only on a linear on larger scale since it uses service curves, which are represented as two linear pieces. So you have bandwidth b1 for time t1, bandwidth b2 after that until eternity. By scaling the clock rate you alter after how much time b2 kicks in, which affects the guaranteed delays. The end result should be that both bandwidth and delay scale up or down proportionally, but I'm not sure that this is what HFSC would do in all cases (on small scale). But it should be easy to answer with a bit more time for visualizing it. Ok, this makes things a little trickier though, no? The thing I'm not sure about is whether this wouldn't be handled better by userspace, If you do it in user space you will need a daemon of some form; this is my preference but it seems a lot of people hate daemons - the standard claim is it is counter-usability. Such people are forgiving if you built the daemon into the kernel as a thread. Perhaps the netcarrier that Stefan Rompf has added could be extended to handle this) Note, if you wanna do it right as well you will factor in other things like some wireless technologies which changes their throughput capability over a period of time ( A lot of these guys try to have their own hardware level schedulers to compensate for this). if the link layer speed changes you might not want proportional scaling but prefer to still give a fixed amount of that bandwidth to some class, for example VoIP traffic. Do we have netlink notifications for link speed changes? 
Not there at the moment - but we do emit event for other link layer stuff like carrier on/off - so adding this should be trivial and a reasonable thing to have; with a caveat: it will be link-layer specific; so whoever ends up adding will have to be careful to make sure it is not hard-coded to be specific to ethernet-like netdevices. It could probably be reported together with link state as a TLV like ETHER_SPEED_CHANGED which carries probably old speed and new speed and maybe even reason why it changed. cheers, jamal
Re: [PATCH 7/32] [TIPC] Multicast link failure now resets all links to nacking node.
On Thu, 22 Jun 2006, Per Liden wrote:
> +static void link_reset_all(unsigned long addr)
> +{
> +	struct node *n_ptr;
> +	char addr_string[16];
> +	u32 i;
> +
> +	read_lock_bh(&tipc_net_lock);
> +	n_ptr = tipc_node_find((u32)addr);
> +	if (!n_ptr) {
> +		read_unlock_bh(&tipc_net_lock);
> +		return;	/* node no longer exists */
> +	}
> +
> +	tipc_node_lock(n_ptr);

You already have bh's disabled here, and tipc_node_lock() also disables them. Not sure if it's really worth worrying about but if so, you could perhaps implement tipc_node_lock_bh() and tipc_node_lock().

- James
--
James Morris [EMAIL PROTECTED]
Re: [1/1] Kevent subsystem.
On Thu, 22 Jun 2006, Evgeniy Polyakov wrote:
> Patch against linux-2.6.17-git tree attached (gzipped). I would like to hear some comments about the overall design, implementation and plans about its usefulness for the generic kernel.

Please send patches as in-line ascii text, along with documentation. If they're too big, split them up logically into smaller pieces.

- James
--
James Morris [EMAIL PROTECTED]
Re: [PATCH 2/2] NET: Accurate packet scheduling for ATM/ADSL (userspace)
On Tue, 2006-20-06 at 18:51 +0200, Patrick McHardy wrote: jamal wrote: The issue really is whether Linux should be interested in the throughput it is told about or the goodput (also known as effective throughput) the service provider offers. Two different issues by definition. In the case of PPPoE non-work-conserving qdiscs are already used to manage a link that is non-local with knowledge of its bandwidth, I think that is a different issue though - you are managing a point-to-point link then you will be working under the assumption of throughput not goodput. If you had knowledge of the goodput you should use that for a working assumption; i think in practice that approach is valuable. My argument is against trying to make complex changes to compensate the scheduler for such changes. Therefore i am not feeling sorry for the poor guy who has to go and tell their PPP device bandwidth is only 1Mbps when their ISP is claiming it is 2Mbps i.e. the ADSL case i have seen thus far is you trying to manage something because a BRAS 3-4 hops down the path uses ATM. To use my earlier example the argument is no different than saying 3-4 hops downlink there is a wireless link which is 20% lossy. Armed with knowledge like that you can tell something to the scheduler to resolve things. The daemon in user space for example could be sending bandwidth measuring probes and telling the kernel of the new goodput. contrary to a local link that would be best managed in work-conserving mode. And I think for better accuracy it is necessary to manage effective throughput, especially if you're interested in guaranteed delays. Indeed - but fixing the scheduler to achieve such management is not the first choice (would be fine if it is generic and non-intrusive) Yes, Linux can't tell if your service provider is lying to you. I wouldn't call it lying as long as they don't say 1.5mbps IP layer throughput. It is a scam for sure.
By definition of what throughput is - you are telling the truth; just not the whole truth. Most users think in terms of goodput and not throughput. i.e. you are not telling the whole truth by not saying it is 1.5Mbps ATM throughput. Typically not an issue until somebody finds that by leaving out ATM you meant throughput and not goodput. I think that point can be used to argue in favour of that Linux should be able to manage effective throughput :) I think you have convinced me this is valuable; I even suggest probes above to discover goodput ;-). I hope i have convinced you how rude it would be to make extensive changes to compensate for goodput ;- I am saying that #2 is the choice to go with hence my assertion earlier, it should be fine to tell the scheduler all it has is 1Mbps and nobody gets hurt. #1 if i could do it with minimal intrusion and still get to use it when i have 802.11g. Not sure i made sense. HFSC is actually capable of handling this quite well. If you use it in work-conserving mode (and the card doesn't do (much) internal queueing) it will get clocked by successful transmissions. Using link-sharing classes you can define proportions for use of available bandwidth, possibly with upper limits. No hacks required :) HFSC sounds very interesting - I should go and study it a little more. My understanding is though that it is a bit of a CPU pig, true? Anyway, this again goes more in the direction of handling link speed changes. The more we discuss this, the more i think they are the same thing ;- ip dev add compensate_header 100 bytes Something like that, but it's a bit more complicated. For ATM we need some mapping: [0-48] - 53 [49-96] - 106 ... for Ethernet we need: [0-60] - 64 [60-n] - n + 4 an upper bound check against MTU would be reasonable. We could do something like this (feel free to imagine nicer names): The name should reflect that the table exists to compensate for goodput. ATM: table = { .step = 53, .map = { [0..48] = 53, [49..96] = 106, ...
} };

Requiring a table of size 32 for typical MTUs.

Ethernet:

table = {
	.step = 60,
	.map = {
		[0..60] = 60,
		[...] = 0,
	},
	.fixed_overhead = 4,
};

static inline unsigned int skb_wire_len(struct sk_buff *skb, struct net_device *dev)
{
	unsigned int idx, len;

	if (dev->lengthtable == NULL)
		return skb->len;
	idx = skb->len / dev->lengthtable->step;
	len = dev->lengthtable->map[idx];
	/* a zero map entry means "no rounding": use the real length */
	return dev->lengthtable->fixed_overhead + (len ? len : skb->len);
}

Unfortunately I can't think of a way to handle the ATM case without a division .. or iteration.

I am not thinking straight right now but it does not sound like a big change to me i.e within reason. Note, it may be valuable to think of this as related to the speed changing daemon as i stated earlier. Only in this
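The two mappings quoted in this message ([0-48] -> 53, [49-96] -> 106 for ATM; [0-60] -> 64, n -> n + 4 for Ethernet) can also be computed directly, at the cost of the division mentioned above. The sketch below is illustrative only: the function and macro names are hypothetical, and it models just the mapping given in the mail, not AAL5 trailer or encapsulation overhead.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_ATM_PAYLOAD 48u   /* payload bytes carried per ATM cell */
#define SKETCH_ATM_CELL    53u   /* bytes on the wire per ATM cell */

/* ATM: round the length up to whole cells, 53 wire bytes per 48 payload bytes. */
unsigned int sketch_atm_wire_len(unsigned int len)
{
    unsigned int cells;

    assert(len <= UINT_MAX - SKETCH_ATM_PAYLOAD);   /* avoid overflow below */
    cells = (len + SKETCH_ATM_PAYLOAD - 1) / SKETCH_ATM_PAYLOAD;
    if (cells == 0)
        cells = 1;   /* the [0-48] -> 53 row: an empty payload still costs a cell */
    return cells * SKETCH_ATM_CELL;
}

/* Ethernet: pad short frames to 60 bytes, then add the 4 fixed overhead bytes. */
unsigned int sketch_eth_wire_len(unsigned int len)
{
    unsigned int padded = len < 60u ? 60u : len;

    return padded + 4u;
}
```

For a 1500 byte packet this gives 32 cells, which matches the table size of 32 mentioned for typical MTUs. A lookup table trades the division for memory; which is cheaper depends on the platform.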
Re: [DOC]: generic netlink
On Tue, 2006-20-06 at 23:34 +0200, Thomas Graf wrote: Ask Mr. Mashimaro has become my replacement for 8ball. Renaming it would lead to a serious loss of coolness ;-) ;- Blame Dave for that ;- I think if you put it in some website, I will just add a url to point to it. Shailabh has sent me an extension to the example, but i think it is still not encompassing. b) Describe some details on how user space - kernel works probably using libnl?? I'll take care of that. Whats the plan? To add to this doc or separate doc? The status is that the code is there including userspace tools to query the controller. I have a patch for the controller for iproute2 that i would like to submit as well - but that is separate from this i think. Documentation is written as part of the API reference (coming up with -pre6), no architectural notes yet though. I think it's best to keep it separated and refer to it both ways. So you mean just refer to the one in the kernel headers? cheers, jamal
Re: [NET]: Prevent multiple qdisc runs
On Wed, 2006-21-06 at 09:52 +1000, Herbert Xu wrote:
> Well my gut feeling is that multiple qdisc_run's on the same dev can't be good for performance. The reason is that SMP is only good when the CPUs work on different tasks. If you get two or more CPUs to work on qdisc_run at the same time they can still only supply one skb to the device at any time. What's worse is that they will now have to fight over the two spin locks involved which means that their cache lines will bounce back and forth.

1) If the CPUs collide all the time it is without a doubt a bad thing (you can tell from tx_collission stats).

2) If on the other hand, the iota that a CPU enters that path in the softirq it gets the txlock then there is benefit to not serialize at the level you have done with that patch - you are enlarging the granularity of the serialization so much so that the CPU won't even get the opportunity to try and grab tx lock because it finds qdisc is already running.

Your gut feeling is for #1 and my worry is for #2 ;-

I actually think your change is obviously valuable for scenarios where the bus is slower and therefore transmits take longer - my feeling is it may not be beneficial for fast buses like PCI-E or high speed PCI/X where the possibility of getting a tx collision is lower.

The other reason I mentioned earlier as justification to leave the granularity at the level where it was is for good qos clocking. i.e. to allow incoming packets to be used to clock the tx path - otherwise you will be dependent on HZ for your egress rate accuracy. I am not sure if this latter point made sense - I could elaborate.

The experiment needed to prove things is not hard: one needs to get a 2 or 4way machine and create a funneling effect to one NIC. For forwarding, the best setup will be to have 3 NICs. Packets coming in on 2 NICs are forwarded to a third. The incoming-packet NICs are tied to different CPUs. In a 4way, the outgoing as well is tied to its own CPU.
You then pummel the two incoming CPUs with pktgen or otherwise at something like 1Mpps (which is higher than the wire rate the third nic can handle). cheers, jamal
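The serialization being debated ("it finds qdisc is already running") amounts to a test-and-set guard around the dequeue loop. Below is a user-space sketch of that idea using C11 atomics. It is an assumption-laden illustration, not the kernel patch: `struct sketch_dev` and `sketch_qdisc_run()` are hypothetical names, and the queue counters are plain integers that a real implementation would protect with the queue lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; illustration only. */
struct sketch_dev {
    atomic_flag qdisc_running;   /* set while some CPU is draining the queue */
    unsigned int queued;         /* packets waiting (lock-protected in reality) */
    unsigned int sent;           /* packets handed to the driver */
};

/*
 * Drain the queue unless another caller is already doing so.
 * Returns true if this caller did the draining, false if it backed off.
 */
bool sketch_qdisc_run(struct sketch_dev *dev)
{
    assert(dev != NULL);

    /* Only the caller that flips the flag from clear to set proceeds. */
    if (atomic_flag_test_and_set(&dev->qdisc_running))
        return false;

    while (dev->queued > 0) {
        dev->queued--;
        dev->sent++;
    }

    atomic_flag_clear(&dev->qdisc_running);
    return true;
}
```

The sketch makes the trade-off from point 2) visible: a caller that arrives while the flag is set returns immediately and never competes for the transmit lock. A real implementation also has to handle packets queued between the end of the loop and the clearing of the flag, which this sketch ignores.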
Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
On Thu, 2006-22-06 at 10:27 -0500, Steve Wise wrote:
> The in-kernel Infiniband subsystem needs to know when certain events happen. For example, if the mac address of a neighbour changes. Any rdma devices that are using said neighbour need to be notified of the change. You are asking that I extend the netlink facility (if necessary) to provide this functionality.

No - what these 2 gents are saying is that these events and infrastructure already exist. If there are some events that don't, then you need to extend what already exists. Your patch was a serious reinvention of the wheel (and in the case of the neighbor code looking very wrong). As an example, search for NETDEV_CHANGEADDR, NETDEV_CHANGEMTU etc.

Actually you are probably making this too complicated. Listen to events in user space and tell infiniband from user space.

cheers,
jamal
Re: [1/1] Kevent subsystem.
Evgeniy,

On 6/22/06, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> Kevent subsystem incorporates several AIO/kqueue design notes and ideas. Kevent can be used both for edge and level notifications. It supports socket notifications, network AIO (aio_send(), aio_recv() and aio_sendfile()), inode notifications (create/remove), generic poll()/select() notifications and timer notifications.

Great job!

Smooth integration with userland asynch POSIX frameworks (e.g. ACE POSIX_Proactor) may require syscalls (or their emulation) with POSIX interface:

* POSIX_API
* aio_read
* aio_write
* aio_suspend
* aio_error
* aio_return
* aio_cancel

where aio_suspend is very important.

--
Sincerely,
--
Robert Iakobashvili, coroberti at gmail dot com
Navigare necesse est, vivere non est necesse.
--
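For readers unfamiliar with the POSIX calls listed above, here is a minimal sketch of how aio_read, aio_suspend, aio_error and aio_return fit together. `aio_read_blocking()` is a hypothetical helper name; the sketch uses the standard POSIX AIO interface from `<aio.h>` on a regular file descriptor and says nothing about how kevent would implement or emulate these calls. On older glibc versions it may need linking with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: start an asynchronous read and wait for it with
 * aio_suspend(). Returns the number of bytes read, or -1 with errno set.
 */
ssize_t aio_read_blocking(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;
    const struct aiocb *const list[1] = { &cb };
    int err;

    assert(buf != NULL);

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (aio_read(&cb) < 0)           /* queue the request */
        return -1;

    /* Sleep until the request is no longer in progress. */
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);   /* may wake early (EINTR); re-check */

    if (err != 0) {
        (void)aio_return(&cb);       /* collect the failed request's status */
        errno = err;
        return -1;
    }
    return aio_return(&cb);          /* byte count of the completed read */
}
```

A proactor-style framework would typically pass many control blocks to one aio_suspend() call and dispatch whichever completes first, which is presumably why that call is singled out as important.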
Re: [RFC 2/7] NetLabel: core network changes
On Thu, 22 Jun 2006, Steve Grubb wrote:
> On Thursday 22 June 2006 05:00, David Miller wrote:
> > >  #define NETLINK_GENERIC 16
> > > +#define NETLINK_NETLABEL 17 /* Network packet labeling */
> > >  #define MAX_LINKS 32
> >
> > Please use generic netlink.
>
> Since this is a security interface, shouldn't it be its own protocol
> so that SE Linux can control commands being sent? Paul's patches do
> include a netlink table in security/selinux/nlmsgtab.c. But I do not
> see any hooks to control generic netlink messages. (There seems to be
> several protocols that SE Linux is not controlling.)
>
> I could see that someone in secadm role should be able to issue these
> commands, but someone at sysadm or auditadm would not. If moving this
> over to generic is a must, then I think SE Linux will have to clip
> into generic to control its packet flow.

SELinux will mediate them as 'generic' netlink. Fine-grained SELinux
support for generic netlink is todo.

--
James Morris [EMAIL PROTECTED]
Re: [PATCH 0/2][RFC] Network Event Notifier Mechanism
On Thu, 2006-06-22 at 15:43 -0400, jamal wrote:
> On Thu, 2006-22-06 at 10:27 -0500, Steve Wise wrote:
> > The in-kernel Infiniband subsystem needs to know when certain events
> > happen. For example, if the mac address of a neighbour changes. Any
> > rdma devices that are using said neighbour need to be notified of
> > the change. You are asking that I extend the netlink facility (if
> > necessary) to provide this functionality.
>
> No - what these 2 gents are saying was these events and infrastructure
> already exist. If there are some events that don't, you need to extend
> what already exists. Your patch was a serious reinvention of the wheel
> (and in the case of the neighbor code looking very wrong).

ok.

> As an example, search for NETDEV_CHANGEADDR, NETDEV_CHANGEMTU etc.
> Actually you are probably making this too complicated.

NETDEV_CHANGEADDR uses a notifier block, and the network subsystem calls
call_netdevice_notifiers() when it sets an addr. And any kernel module
can register for these events. That's the model I used to create the
netevent_notifier mechanism in the patch I posted. I could add the new
events to this netdevice notifier, but these aren't really net device
events. They're network events.

> Listen to events in user space and tell infiniband from user space.

I can indeed extend the rtnetlink stuff to add the events in question
(neighbour mac addr change, route redirect, etc). In fact, there is
similar functionality under the CONFIG_ARPD option to support a user
space arp daemon. It's not quite the same, and it doesn't cover redirect
and routing events, just neighbour events.

But in the case of the RDMA subsystem, the consumer of these events is
in the kernel. Why is it better to propagate events all the way up to
user space, then send the event back down into the Infiniband kernel
subsystem? That seems very inefficient.

Steve.
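[Archive note] For readers unfamiliar with the notifier-block model
discussed in this message, the following is a minimal userspace sketch of
the idea: consumers register a callback on a chain, and the producer walks
the chain when an event occurs. It is not the kernel's notifier API and it
is not the posted patch. Every name in it (event_notifier, event_register,
event_call_chain, EV_NEIGH_UPDATE, rdma_event_cb and so on) is
hypothetical and chosen only for illustration.

```c
/* Userspace sketch of a notifier chain. All names are hypothetical
 * stand-ins; this does not use or reproduce the kernel's notifier API. */
#include <assert.h>
#include <stddef.h>

enum net_event { EV_NEIGH_UPDATE = 1, EV_REDIRECT = 2 };

struct event_notifier {
	int (*call)(struct event_notifier *self, unsigned long event, void *data);
	struct event_notifier *next;
};

static struct event_notifier *event_chain;	/* head of the chain */

/* Add a consumer at the head of the chain. */
static void event_register(struct event_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = event_chain;
	event_chain = nb;
}

/* Remove a consumer; does nothing if it is not on the chain. */
static void event_unregister(struct event_notifier *nb)
{
	struct event_notifier **pp;

	for (pp = &event_chain; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Producer side: deliver one event to every registered consumer.
 * Returns the number of callbacks invoked. */
static int event_call_chain(unsigned long event, void *data)
{
	struct event_notifier *nb;
	int called = 0;

	for (nb = event_chain; nb; nb = nb->next) {
		nb->call(nb, event, data);
		called++;
	}
	return called;
}

/* Example in-process consumer: counts neighbour updates, ignores the rest. */
static int neigh_updates_seen;

static int rdma_event_cb(struct event_notifier *self, unsigned long event,
			 void *data)
{
	(void)self;
	(void)data;
	if (event == EV_NEIGH_UPDATE)
		neigh_updates_seen++;
	return 0;
}

static struct event_notifier rdma_nb = { .call = rdma_event_cb };
```

A consumer would call event_register(&rdma_nb) once, after which each
event_call_chain(EV_NEIGH_UPDATE, ...) from the producer reaches
rdma_event_cb() directly, with no round trip through another address
space. The sketch omits locking, which a real in-kernel chain would need.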
[PATCH] WAN: register_hdlc_device() doesn't need dev_alloc_name()
David Boggs noticed that register_hdlc_device() no longer needs to call
dev_alloc_name() as it's called by register_netdev().
register_hdlc_device() is currently equivalent to register_netdev().

hdlc_setup() is now EXPORTed as per David's request.

Signed-off-by: Krzysztof Halasa [EMAIL PROTECTED]

--- a/include/linux/hdlc.h
+++ b/include/linux/hdlc.h
@@ -188,7 +188,7 @@ int hdlc_x25_ioctl(struct net_device *de
 int hdlc_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
 
 /* Must be used by hardware driver on module startup/exit */
-int register_hdlc_device(struct net_device *dev);
+#define register_hdlc_device(dev)	register_netdev(dev)
 void unregister_hdlc_device(struct net_device *dev);
 
 struct net_device *alloc_hdlcdev(void *priv);
--- a/drivers/net/wan/hdlc_generic.c
+++ b/drivers/net/wan/hdlc_generic.c
@@ -259,7 +259,7 @@ int hdlc_ioctl(struct net_device *dev, s
 	}
 }
 
-static void hdlc_setup(struct net_device *dev)
+void hdlc_setup(struct net_device *dev)
 {
 	hdlc_device *hdlc = dev_to_hdlc(dev);
 
@@ -288,26 +288,6 @@ struct net_device *alloc_hdlcdev(void *p
 	return dev;
 }
 
-int register_hdlc_device(struct net_device *dev)
-{
-	int result = dev_alloc_name(dev, "hdlc%d");
-	if (result < 0)
-		return result;
-
-	result = register_netdev(dev);
-	if (result != 0)
-		return -EIO;
-
-#if 0
-	if (netif_carrier_ok(dev))
-		netif_carrier_off(dev); /* no carrier until DCD goes up */
-#endif
-
-	return 0;
-}
-
-
-
 void unregister_hdlc_device(struct net_device *dev)
 {
 	rtnl_lock();
@@ -326,8 +306,8 @@ EXPORT_SYMBOL(hdlc_open);
 EXPORT_SYMBOL(hdlc_close);
 EXPORT_SYMBOL(hdlc_set_carrier);
 EXPORT_SYMBOL(hdlc_ioctl);
+EXPORT_SYMBOL(hdlc_setup);
 EXPORT_SYMBOL(alloc_hdlcdev);
-EXPORT_SYMBOL(register_hdlc_device);
 EXPORT_SYMBOL(unregister_hdlc_device);
 
 static struct packet_type hdlc_packet_type = {
[PATCH] WAN: ioremap() failure checks in drivers
Eric Sesterhenn found that pci200syn initialization lacks return
statement in ioremap() error path (coverity bug id #195).

It looks like more WAN drivers have problems with ioremap().

Signed-off-by: Krzysztof Halasa [EMAIL PROTECTED]

--- a/drivers/net/wan/c101.c
+++ b/drivers/net/wan/c101.c
@@ -326,21 +326,21 @@ static int __init c101_run(unsigned long
 	if (request_irq(irq, sca_intr, 0, devname, card)) {
 		printk(KERN_ERR "c101: could not allocate IRQ\n");
 		c101_destroy_card(card);
-		return(-EBUSY);
+		return -EBUSY;
 	}
 	card->irq = irq;
 
 	if (!request_mem_region(winbase, C101_MAPPED_RAM_SIZE, devname)) {
 		printk(KERN_ERR "c101: could not request RAM window\n");
 		c101_destroy_card(card);
-		return(-EBUSY);
+		return -EBUSY;
 	}
 	card->phy_winbase = winbase;
 	card->win0base = ioremap(winbase, C101_MAPPED_RAM_SIZE);
 	if (!card->win0base) {
 		printk(KERN_ERR "c101: could not map I/O address\n");
 		c101_destroy_card(card);
-		return -EBUSY;
+		return -EFAULT;
 	}
 
 	card->tx_ring_buffers = TX_RING_BUFFERS;
--- a/drivers/net/wan/n2.c
+++ b/drivers/net/wan/n2.c
@@ -387,6 +387,11 @@ static int __init n2_run(unsigned long i
 	}
 	card->phy_winbase = winbase;
 	card->winbase = ioremap(winbase, USE_WINDOWSIZE);
+	if (!card->winbase) {
+		printk(KERN_ERR "n2: ioremap() failed\n");
+		n2_destroy_card(card);
+		return -EFAULT;
+	}
 
 	outb(0, io + N2_PCR);
 	outb(winbase >> 12, io + N2_BAR);
--- a/drivers/net/wan/pci200syn.c
+++ b/drivers/net/wan/pci200syn.c
@@ -358,6 +358,7 @@ #endif
 	    card->rambase == NULL) {
 		printk(KERN_ERR "pci200syn: ioremap() failed\n");
 		pci200_pci_remove_one(pdev);
+		return -EFAULT;
 	}
 
 	/* Reset PLX */
--- a/drivers/net/wan/wanxl.c
+++ b/drivers/net/wan/wanxl.c
@@ -634,7 +634,13 @@ #endif
 
 	/* set up PLX mapping */
 	plx_phy = pci_resource_start(pdev, 0);
+
 	card->plx = ioremap_nocache(plx_phy, 0x70);
+	if (!card->plx) {
+		printk(KERN_ERR "wanxl: ioremap() failed\n");
+		wanxl_pci_remove_one(pdev);
+		return -EFAULT;
+	}
 
 #if RESET_WHILE_LOADING
 	wanxl_reset(card);
@@ -700,6 +706,12 @@ #endif
 	}
 
 	mem = ioremap_nocache(mem_phy, PDM_OFFSET + sizeof(firmware));
+	if (!mem) {
+		printk(KERN_ERR "wanxl: ioremap() failed\n");
+		wanxl_pci_remove_one(pdev);
+		return -EFAULT;
+	}
+
 	for (i = 0; i < sizeof(firmware); i += 4)
 		writel(htonl(*(u32*)(firmware + i)), mem + PDM_OFFSET + i);
 
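[Archive note] The pci200syn fix in this patch is a one-line return in an
error path. A small userspace sketch may make it clearer why that line
matters: without it, initialization would carry on using a card that has
just been torn down. The sketch below is only an illustration under
stated assumptions. fake_ioremap(), card_destroy(), card_init() and
struct card are hypothetical stand-ins, not the driver's code or the
kernel's ioremap().

```c
/* Userspace illustration of a "map, check, clean up, return" error path.
 * All names are hypothetical; nothing here is kernel or driver code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct card {
	void *rambase;		/* mapped window, NULL when unmapped */
	int destroyed;		/* set once cleanup has run */
};

/* Stand-in for a mapping call that can fail: returns NULL when 'fail'
 * is non-zero, otherwise a pointer to a small static buffer. */
static void *fake_ioremap(int fail)
{
	static char window[64];

	return fail ? NULL : window;
}

/* Stand-in for the driver's teardown routine. */
static void card_destroy(struct card *c)
{
	c->rambase = NULL;
	c->destroyed = 1;
}

static int card_init(struct card *c, int fail_map)
{
	assert(c != NULL);
	c->destroyed = 0;
	c->rambase = fake_ioremap(fail_map);
	if (c->rambase == NULL) {
		card_destroy(c);
		/* Without this return, the write below would dereference
		 * a NULL mapping on an already destroyed card. */
		return -EFAULT;
	}

	((volatile char *)c->rambase)[0] = 0;	/* first access to the window */
	return 0;
}
```

Calling card_init(&c, 1) in this sketch returns -EFAULT with the card
marked destroyed, while card_init(&c, 0) returns 0 and leaves the mapping
in place.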
RE: [PATCH 7/32] [TIPC] Multicast link failure now resets all links to nacking node.
Nice observation, James. As a relative newcomer to the official Linux
kernel development world, I'm impressed that non-TIPC folks are looking
at TIPC changes closely enough to see things like this!

Per, I'll leave it to you to decide if you want to address James's
concern. But be aware that the link_reset_all() routine is only called
to handle emergency situations when TIPC's multicast link has run into
serious problems and is trying to recover. Most systems will never
follow this path, so the cost of the unnecessary
local_bh_disable()/local_bh_enable() pairing shouldn't have any real
impact on the overall performance of TIPC.

Regards,
Al Stephens

-----Original Message-----
From: James Morris [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 2:51 PM
To: Per Liden
Cc: David Miller; netdev@vger.kernel.org; Stephens, Allan
Subject: Re: [PATCH 7/32] [TIPC] Multicast link failure now resets all links to nacking node.

On Thu, 22 Jun 2006, Per Liden wrote:

> +static void link_reset_all(unsigned long addr)
> +{
> +	struct node *n_ptr;
> +	char addr_string[16];
> +	u32 i;
> +
> +	read_lock_bh(&tipc_net_lock);
> +	n_ptr = tipc_node_find((u32)addr);
> +	if (!n_ptr) {
> +		read_unlock_bh(&tipc_net_lock);
> +		return;	/* node no longer exists */
> +	}
> +
> +	tipc_node_lock(n_ptr);

You already have bh's disabled here, and tipc_node_lock() also disables
them.

Not sure if it's really worth worrying about but if so, you could
perhaps implement tipc_node_lock_bh() and tipc_node_lock().

- James
--
James Morris [EMAIL PROTECTED]