Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
On Thu, Nov 29, 2007 at 03:55:38PM +0800, Wang Chen wrote:
>
> I tested nfsv3 and nfsv4. It seems that nfs calls recvmsg() as follows:
> nfsd() -> svc_recv() -> svc_udp_recvfrom() -> udp_recvmsg().
> So, I think putting the udpInDatagrams increment in udp_recvmsg() is enough.
> FYI: http://www.mail-archive.com/netdev@vger.kernel.org/msg13817.html

Excellent. They now do a recvmsg first with no buffer to get
meta-information, which just happens to increment the counters.

Could you please resubmit the patch then?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Thursday 22 November 2007, Andi Kleen wrote:
>  #define EXPORT_SYMBOL(sym)					\
> -	__EXPORT_SYMBOL(sym, "")
> +	__EXPORT_SYMBOL(sym, "",,, NULL)
>
>  #define EXPORT_SYMBOL_GPL(sym)					\
> -	__EXPORT_SYMBOL(sym, "_gpl")
> +	__EXPORT_SYMBOL(sym, "_gpl",,, NULL)
>
>  #define EXPORT_SYMBOL_GPL_FUTURE(sym)				\
> -	__EXPORT_SYMBOL(sym, "_gpl_future")
> +	__EXPORT_SYMBOL(sym, "_gpl_future",,, NULL)
>
> +/* Export symbol into namespace ns
> + * No _GPL variants because namespaces imply GPL only
> + */
> +#define EXPORT_SYMBOL_NS(ns, sym)				\
> +	__EXPORT_SYMBOL(sym, "_gpl",__##ns, NS_SEPARATOR #ns, #ns)

I think it would be good if you could specify a default namespace
per module; that could reduce the amount of necessary changes significantly.

For example, you can do

#define EXPORT_SYMBOL_GLOBAL(sym) __EXPORT_SYMBOL(sym, "_gpl",,, NULL)
#ifndef MODULE_NAMESPACE
#define EXPORT_SYMBOL_GPL(sym) EXPORT_SYMBOL_GLOBAL(sym)
#else
#define EXPORT_SYMBOL_GPL(sym) EXPORT_SYMBOL_NS(MODULE_NAMESPACE, sym)
#endif

If we go that way, it may be useful to extend the namespace mechanism to
non-GPL symbols as well, like

#define EXPORT_SYMBOL(sym) __EXPORT_SYMBOL(sym, "",__## MODULE_NAMESPACE, \
		NS_SEPARATOR #MODULE_NAMESPACE, #MODULE_NAMESPACE)

Unfortunately, doing this automatic namespace selection requires setting
the namespace before #include <linux/module.h>. One way to work around this
could be to use Makefile magic so you can write a Makefile as

obj-$(CONFIG_COMBINED) += combined.o
combined-$(CONFIG_SUBOPTION) += combined_main.o combined_other.o
obj-$(CONFIG_SINGLE) += single.o
obj-$(CONFIG_OTHER) += other.o
obj-$(CONFIG_API) += api.o

NAMESPACE = subsys			# default, used for other.o
NAMESPACE_single.o = single		# used only for single.o
NAMESPACE_combined.o = combined		# all parts of combined.o
NAMESPACE_combined_other.o = special	# except this one
NAMESPACE_api.o =			# api.o is put into the global ns

The Makefile logic here would basically just follow the rules we have for
CFLAGS etc, and then pass -DMODULE_NAMESPACE=$(NAMESPACE_$(obj)) to gcc.

	Arnd
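A side note on the default-namespace idea: if the namespace arrives as a macro such as MODULE_NAMESPACE, the `#` and `##` operators act on the argument as spelled, so a macro that stringifies or pastes its `ns` parameter directly would see the text MODULE_NAMESPACE rather than its value. The usual remedy is a second macro level. The following is a minimal user-space sketch of that preprocessor behaviour only; NS_STR, NS_PASTE, ns_subsys and the three functions are hypothetical names for illustration and are not part of the patch or of the kernel.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for what -DMODULE_NAMESPACE=subsys on the gcc command line would give. */
#define MODULE_NAMESPACE subsys

/* One-level helpers: '#' and '##' use the argument exactly as it was spelled. */
#define NS_STR_RAW(ns)   #ns
#define NS_PASTE_RAW(ns) ns_##ns

/* Two-level helpers: the argument is macro-expanded before '#' or '##' sees it. */
#define NS_STR(ns)       NS_STR_RAW(ns)
#define NS_PASTE(ns)     NS_PASTE_RAW(ns)

/* Hypothetical per-namespace object whose name is built by token pasting. */
static const int ns_subsys = 1;

/* Yields the macro's own name, because the argument is never expanded. */
const char *ns_name_unexpanded(void)
{
	return NS_STR_RAW(MODULE_NAMESPACE);
}

/* Yields the namespace the macro stands for. */
const char *ns_name(void)
{
	return NS_STR(MODULE_NAMESPACE);
}

/* Resolves to ns_subsys through the two-level paste. */
int ns_marker(void)
{
	return NS_PASTE(MODULE_NAMESPACE);
}
```

Under these assumptions, a default-namespace variant of EXPORT_SYMBOL_NS would need the same indirection before the `__##ns` and `#ns` uses.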
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
> I think it would be good if you could specify a default namespace
> per module, that could reduce the amount of necessary changes significantly.

But also give less documentation. It's also not that difficult to mark the
exports once. I've forward ported such patches over a few kernels and didn't
run into significant me

> obj-$(CONFIG_COMBINED) += combined.o
> combined-$(CONFIG_SUBOPTION) += combined_main.o combined_other.o
> obj-$(CONFIG_SINGLE) += single.o
> obj-$(CONFIG_OTHER) += other.o
> obj-$(CONFIG_API) += api.o
>
> NAMESPACE = subsys			# default, used for other.o
> NAMESPACE_single.o = single		# used only for single.o
> NAMESPACE_combined.o = combined	# all parts of combined.o
> NAMESPACE_combined_other.o = special	# except this one
> NAMESPACE_api.o =			# api.o is put into the global ns

I would prefer to keep that inside the source files, again for documentation
purposes. One goal of namespaces was to make something that was previously
kind of implicit explicit, and the default namespaces would work against
that again, I think.

-Andi
[NET 00/02]: Remove NET_ACT_NAT dependency on NETFILTER
These patches remove the dependency of NET_ACT_NAT on NETFILTER by moving
the netfilter checksum helpers to include/net/checksum.h and
net/core/utils.c. I didn't find more appropriate locations, but I'd
happily change it if someone suggests something better.

 include/linux/netfilter.h              |   22 ----------------------
 include/net/checksum.h                 |   25 +++++++++++++++++++++++++
 net/core/utils.c                       |   16 ++++++++++++++++
 net/ipv4/netfilter/ipt_ECN.c           |    6 +++---
 net/ipv4/netfilter/ipt_TOS.c           |    2 +-
 net/ipv4/netfilter/ipt_TTL.c           |    4 ++--
 net/ipv4/netfilter/nf_nat_core.c       |    4 ++--
 net/ipv4/netfilter/nf_nat_helper.c     |   20 ++++++++++----------
 net/ipv4/netfilter/nf_nat_proto_icmp.c |    4 ++--
 net/ipv4/netfilter/nf_nat_proto_tcp.c  |    4 ++--
 net/ipv4/netfilter/nf_nat_proto_udp.c  |    6 +++---
 net/netfilter/core.c                   |   16 ----------------
 net/netfilter/xt_TCPMSS.c              |   17 +++++++++--------
 net/sched/Kconfig                      |    1 -
 net/sched/act_nat.c                    |   12 ++++++------
 15 files changed, 81 insertions(+), 78 deletions(-)

Patrick McHardy (2):
      [NET]: Move netfilter checksum helpers to net/core/utils.c
      [NETFILTER]: Convert old checksum helper names
[NET 01/02]: Move netfilter checksum helpers to net/core/utils.c
[NET]: Move netfilter checksum helpers to net/core/utils.c

This allows to get rid of the CONFIG_NETFILTER dependency of NET_ACT_NAT.
This patch redefines the old names to keep the noise low, the next patch
converts all users.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 01879e49c8b53ea8cfb28f275db18ee7cbe54304
tree 3a62d0d28d01b8d3a32b1927ccf854c4af7dc1dc
parent c748e53090d0511fdecb9a91cd3619bd2d7a39f6
author Patrick McHardy [EMAIL PROTECTED] Thu, 29 Nov 2007 10:43:11 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 29 Nov 2007 10:43:11 +0100

 include/linux/netfilter.h |   25 ++++---------------------
 include/net/checksum.h    |   25 +++++++++++++++++++++++++
 net/core/utils.c          |   16 ++++++++++++++++
 net/netfilter/core.c      |   16 ----------------
 4 files changed, 45 insertions(+), 37 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 25fc122..e2bf6d2 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -298,27 +298,10 @@ extern void nf_invalidate_cache(int pf);
    Returns true or false. */
 extern int skb_make_writable(struct sk_buff *skb, unsigned int writable_len);
 
-static inline void nf_csum_replace4(__sum16 *sum, __be32 from, __be32 to)
-{
-	__be32 diff[] = { ~from, to };
-
-	*sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
-}
-
-static inline void nf_csum_replace2(__sum16 *sum, __be16 from, __be16 to)
-{
-	nf_csum_replace4(sum, (__force __be32)from, (__force __be32)to);
-}
-
-extern void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
-				      __be32 from, __be32 to, int pseudohdr);
-
-static inline void nf_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
-				      __be16 from, __be16 to, int pseudohdr)
-{
-	nf_proto_csum_replace4(sum, skb, (__force __be32)from,
-				(__force __be32)to, pseudohdr);
-}
+#define nf_csum_replace4	csum_replace4
+#define nf_csum_replace2	csum_replace2
+#define nf_proto_csum_replace4	inet_proto_csum_replace4
+#define nf_proto_csum_replace2	inet_proto_csum_replace2
 
 struct nf_afinfo {
 	unsigned short	family;
diff --git a/include/net/checksum.h b/include/net/checksum.h
index 1242461..07602b7 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -93,4 +93,29 @@ static inline __wsum csum_unfold(__sum16 n)
 }
 
 #define CSUM_MANGLED_0 ((__force __sum16)0xffff)
+
+static inline void csum_replace4(__sum16 *sum, __be32 from, __be32 to)
+{
+	__be32 diff[] = { ~from, to };
+
+	*sum = csum_fold(csum_partial((char *)diff, sizeof(diff), ~csum_unfold(*sum)));
+}
+
+static inline void csum_replace2(__sum16 *sum, __be16 from, __be16 to)
+{
+	csum_replace4(sum, (__force __be32)from, (__force __be32)to);
+}
+
+struct sk_buff;
+extern void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
+				     __be32 from, __be32 to, int pseudohdr);
+
+static inline void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
+					    __be16 from, __be16 to,
+					    int pseudohdr)
+{
+	inet_proto_csum_replace4(sum, skb, (__force __be32)from,
+				 (__force __be32)to, pseudohdr);
+}
+
 #endif
diff --git a/net/core/utils.c b/net/core/utils.c
index 0bf17da..34459c4 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -293,3 +293,19 @@ out:
 }
 
 EXPORT_SYMBOL(in6_pton);
+
+void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
+			      __be32 from, __be32 to, int pseudohdr)
+{
+	__be32 diff[] = { ~from, to };
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		*sum = csum_fold(csum_partial(diff, sizeof(diff),
+				~csum_unfold(*sum)));
+		if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+			skb->csum = ~csum_partial(diff, sizeof(diff),
+						~skb->csum);
+	} else if (pseudohdr)
+		*sum = ~csum_fold(csum_partial(diff, sizeof(diff),
+				csum_unfold(*sum)));
+}
+EXPORT_SYMBOL(inet_proto_csum_replace4);
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index bed9ba0..631d269 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -217,22 +217,6 @@ int skb_make_writable(struct sk_buff *skb, unsigned int writable_len)
 }
 EXPORT_SYMBOL(skb_make_writable);
 
-void nf_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
-			    __be32 from, __be32 to, int pseudohdr)
-{
-	__be32 diff[] = { ~from, to };
-	if (skb->ip_summed != CHECKSUM_PARTIAL) {
-		*sum = csum_fold(csum_partial(diff, sizeof(diff),
-				~csum_unfold(*sum)));
-		if (skb->ip_summed == CHECKSUM_COMPLETE
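For readers unfamiliar with what csum_replace4() computes: it feeds { ~from, to } into the existing sum, which is the incremental Internet-checksum update described in RFC 1624 (new = ~(~old + ~from + to)). The following is a user-space sketch of that arithmetic under simplifying assumptions, not the kernel code: it works on host-order 16-bit words, and fold16, csum_full and csum_update32 are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t fold16(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Reference: full checksum over nwords 16-bit words. */
uint16_t csum_full(const uint16_t *words, size_t nwords)
{
	uint32_t acc = 0;

	for (size_t i = 0; i < nwords; i++)
		acc += words[i];
	return (uint16_t)~fold16(acc);
}

/*
 * Incremental update when one 32-bit field changes from 'from' to 'to':
 * add the complement of both old halves and both new halves to ~old,
 * then fold and complement again.
 */
uint16_t csum_update32(uint16_t old, uint32_t from, uint32_t to)
{
	uint32_t acc = (uint16_t)~old;

	acc += (uint16_t)~(from >> 16);
	acc += (uint16_t)~(from & 0xffff);
	acc += to >> 16;
	acc += to & 0xffff;
	return (uint16_t)~fold16(acc);
}
```

The point of the incremental form is that a NAT-style address rewrite touches only the changed field and the checksum, instead of re-reading the whole header.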
[NETFILTER 02/02]: Convert old checksum helper names
[NETFILTER]: Convert old checksum helper names

Kill the defines again, convert to the new checksum helper names and
remove the dependency of NET_ACT_NAT on NETFILTER.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit da8ffc485b6e8e429c51f7f1c03a0ae824eca848
tree 2198c531b224c82e74b66b3029dc8fec1c555105
parent 01879e49c8b53ea8cfb28f275db18ee7cbe54304
author Patrick McHardy [EMAIL PROTECTED] Thu, 29 Nov 2007 10:48:52 +0100
committer Patrick McHardy [EMAIL PROTECTED] Thu, 29 Nov 2007 10:48:52 +0100

 include/linux/netfilter.h              |    5 -----
 net/ipv4/netfilter/ipt_ECN.c           |    6 +++---
 net/ipv4/netfilter/ipt_TOS.c           |    2 +-
 net/ipv4/netfilter/ipt_TTL.c           |    4 ++--
 net/ipv4/netfilter/nf_nat_core.c       |    4 ++--
 net/ipv4/netfilter/nf_nat_helper.c     |   20 ++++++++++----------
 net/ipv4/netfilter/nf_nat_proto_icmp.c |    4 ++--
 net/ipv4/netfilter/nf_nat_proto_tcp.c  |    4 ++--
 net/ipv4/netfilter/nf_nat_proto_udp.c  |    6 +++---
 net/netfilter/xt_TCPMSS.c              |   17 +++++++++--------
 net/sched/Kconfig                      |    1 -
 net/sched/act_nat.c                    |   12 ++++++------
 12 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index e2bf6d2..f42e436 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -298,11 +298,6 @@ extern void nf_invalidate_cache(int pf);
    Returns true or false. */
 extern int skb_make_writable(struct sk_buff *skb, unsigned int writable_len);
 
-#define nf_csum_replace4	csum_replace4
-#define nf_csum_replace2	csum_replace2
-#define nf_proto_csum_replace4	inet_proto_csum_replace4
-#define nf_proto_csum_replace2	inet_proto_csum_replace2
-
 struct nf_afinfo {
 	unsigned short	family;
 	__sum16		(*checksum)(struct sk_buff *skb, unsigned int hook,
diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
index add1100..e8d5f68 100644
--- a/net/ipv4/netfilter/ipt_ECN.c
+++ b/net/ipv4/netfilter/ipt_ECN.c
@@ -38,7 +38,7 @@ set_ect_ip(struct sk_buff *skb, const struct ipt_ECN_info *einfo)
 		oldtos = iph->tos;
 		iph->tos &= ~IPT_ECN_IP_MASK;
 		iph->tos |= (einfo->ip_ect & IPT_ECN_IP_MASK);
-		nf_csum_replace2(&iph->check, htons(oldtos), htons(iph->tos));
+		csum_replace2(&iph->check, htons(oldtos), htons(iph->tos));
 	}
 	return true;
 }
@@ -71,8 +71,8 @@ set_ect_tcp(struct sk_buff *skb, const struct ipt_ECN_info *einfo)
 	if (einfo->operation & IPT_ECN_OP_SET_CWR)
 		tcph->cwr = einfo->proto.tcp.cwr;
 
-	nf_proto_csum_replace2(&tcph->check, skb,
-			       oldval, ((__be16 *)tcph)[6], 0);
+	inet_proto_csum_replace2(&tcph->check, skb,
+				 oldval, ((__be16 *)tcph)[6], 0);
 	return true;
 }
 
diff --git a/net/ipv4/netfilter/ipt_TOS.c b/net/ipv4/netfilter/ipt_TOS.c
index d4573ba..7b4a6ca 100644
--- a/net/ipv4/netfilter/ipt_TOS.c
+++ b/net/ipv4/netfilter/ipt_TOS.c
@@ -38,7 +38,7 @@ target(struct sk_buff *skb,
 		iph = ip_hdr(skb);
 		oldtos = iph->tos;
 		iph->tos = (iph->tos & IPTOS_PREC_MASK) | tosinfo->tos;
-		nf_csum_replace2(&iph->check, htons(oldtos), htons(iph->tos));
+		csum_replace2(&iph->check, htons(oldtos), htons(iph->tos));
 	}
 	return XT_CONTINUE;
 }
diff --git a/net/ipv4/netfilter/ipt_TTL.c b/net/ipv4/netfilter/ipt_TTL.c
index c620a05..00ddfbe 100644
--- a/net/ipv4/netfilter/ipt_TTL.c
+++ b/net/ipv4/netfilter/ipt_TTL.c
@@ -54,8 +54,8 @@ ipt_ttl_target(struct sk_buff *skb,
 	}
 
 	if (new_ttl != iph->ttl) {
-		nf_csum_replace2(&iph->check, htons(iph->ttl << 8),
-					      htons(new_ttl << 8));
+		csum_replace2(&iph->check, htons(iph->ttl << 8),
+					   htons(new_ttl << 8));
 		iph->ttl = new_ttl;
 	}
 
diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
index d237511..746c2ef 100644
--- a/net/ipv4/netfilter/nf_nat_core.c
+++ b/net/ipv4/netfilter/nf_nat_core.c
@@ -372,10 +372,10 @@ manip_pkt(u_int16_t proto,
 	iph = (void *)skb->data + iphdroff;
 
 	if (maniptype == IP_NAT_MANIP_SRC) {
-		nf_csum_replace4(&iph->check, iph->saddr, target->src.u3.ip);
+		csum_replace4(&iph->check, iph->saddr, target->src.u3.ip);
 		iph->saddr = target->src.u3.ip;
 	} else {
-		nf_csum_replace4(&iph->check, iph->daddr, target->dst.u3.ip);
+		csum_replace4(&iph->check, iph->daddr, target->dst.u3.ip);
 		iph->daddr = target->dst.u3.ip;
 	}
 	return 1;
diff --git a/net/ipv4/netfilter/nf_nat_helper.c b/net/ipv4/netfilter/nf_nat_helper.c
index d00b8b2..53f79a3 100644
--- a/net/ipv4/netfilter/nf_nat_helper.c
+++ b/net/ipv4/netfilter/nf_nat_helper.c
@@
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
On Thu, Nov 29, 2007 at 06:08:30PM +0800, Wang Chen wrote:
>
> Add a new UdpInEarlyDatagrams counter to count datagrams received early,
> but which might be dropped later.

Could you please split this into two patches? Have one do the
UdpInDatagrams change and the other to introduce the EarlyDatagrams
counter.

I'm a bit hesitant to introduce new counters in the MIB because
it'd be difficult if not impossible to ever remove them. Do you
really need the early counter?

One more thing, please put the is_udplite clean-up in its own patch
too so it's absolutely clear what we're changing in the patches
that aren't clean-ups.

> Signed-off-by: Andi Kleen [EMAIL PROTECTED]
> Signed-off-by: Wang Chen [EMAIL PROTECTED]

Who's the author? Andi or you? Please make this obvious with a
From header when you resubmit.

Thanks,
-- 
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Thursday 29 November 2007, Andi Kleen wrote:
> > I think it would be good if you could specify a default namespace
> > per module, that could reduce the amount of necessary changes significantly.
>
> But also give less documentation. It's also not that difficult to mark the
> exports once. I've forward ported such patches over a few kernels and didn't
> run into significant me

Part of your sentence seems to be missing, but I guess I understand your
point.

How many files did you annotate this way? I can see it as being useful
to have the namespace explicit in each symbol, but doing it once per module
sounds like the 80% solution for 20% of the work, and the two don't
even conflict.

In the current kernel, I count 12644 exported symbols in 1646 files, in
540 directories. One problem I can see with annotating every symbol is
that it conflicts with other patches that add more exported functions to
a file without adding the namespace, or that simply break because of
context changes.

	Arnd
Re: [PATCH 2/2] Eliminate unused argument from sk_stream_alloc_pskb
On Mon, Nov 26, 2007 at 08:17:27PM +0300, Pavel Emelyanov wrote:
> The 3rd argument is always zero (according to grep :)
> Eliminate it and merge the function with sk_stream_alloc_skb.
>
> This saves 44 more bytes, and together with the previous patch
> we have:
>
> add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568)
> function                         old     new   delta
> sk_stream_alloc_skb                -     183    +183
> ip_rt_init                       529     525      -4
> arp_ignore                       112     107      -5
> __inet_lookup_listener           284     274     -10
> tcp_sendmsg                     2583    2481    -102
> tcp_sendpage                    1449    1300    -149
> tso_fragment                     417     258    -159
> tcp_fragment                    1149     988    -161
> __tcp_push_pending_frames       1998    1837    -161

Also applied to net-2.6.25. Thanks.

> Question: is this 2.6.24 material (good space saving) or
> should I rework this against 2.6.25 (it applies with fuzzes,
> but seems to compile)?

I guess I've answered this question :)

Cheers,
-- 
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.5)
On Mon, Nov 26, 2007 at 05:16:16PM +0000, Templin, Fred L wrote:
> From: Fred L. Templin [EMAIL PROTECTED]
>
> This patch includes support for the Intra-Site Automatic Tunnel
> Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
> module, and is configured using extensions to the iproute2
> utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
> distribution. This version includes the diff for ./include/linux/if.h
> which was missing in the v2.4 submission and is needed to make the
> patch compile. The patch has been installed, compiled and
> tested in a clean 2.6.24-rc2 kernel build area.
>
> Signed-off-by: Fred L. Templin [EMAIL PROTECTED]

Sorry, the patch doesn't apply to net-2.6.25.

$ git apply --check --whitespace=error-all ~/p
Space in indent is followed by a tab.
/home/gondolin/herbert/p:101: "%s: Disabled Multicast RS\n",
Space in indent is followed by a tab.
/home/gondolin/herbert/p:216: }
Space in indent is followed by a tab.
/home/gondolin/herbert/p:252: printk(KERN_DEBUG "sit: nexthop == NULL\n");
Space in indent is followed by a tab.
/home/gondolin/herbert/p:254: }
fatal: corrupt patch at line 269
$

There seems to be a line missing at the end.

Please fix the white space errors and resend.

Thanks,
-- 
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Re: [PATCH] XFRM: SPD auditing fix to include the netmask/prefix-length
On Mon, Nov 26, 2007 at 07:55:12PM +0000, Paul Moore wrote:
> Currently the netmask/prefix-length of an IPsec SPD entry is not included in
> any of the SPD related audit messages. This can cause a problem when the
> audit log is examined as the netmask/prefix-length is vital in determining
> what network traffic is affected by a particular SPD entry. This patch fixes
> this problem by adding two additional fields, src_prefixlen and
> dst_prefixlen, to the SPD audit messages to indicate the source and
> destination netmasks. These new fields are only included in the audit message
> when the netmask/prefix-length is less than the address length, i.e. the SPD
> entry applies to a network address and not a host address.

Any reason why we don't just always include them?

Thanks,
-- 
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
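To make the rule under discussion concrete, here is a small user-space sketch of "emit the prefix-length field only when it is shorter than the address length". It is an illustration only: format_prefixlen_field and its parameters are hypothetical, and the real patch writes into the kernel audit buffer rather than a char array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write " <field>=<prefixlen>" into buf when the selector covers a
 * network (prefixlen < addr_bits), and an empty string when it names a
 * single host. Returns the number of characters written.
 */
int format_prefixlen_field(char *buf, size_t size, const char *field,
			   unsigned int prefixlen, unsigned int addr_bits)
{
	assert(size > 0);

	if (prefixlen >= addr_bits) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, size, " %s=%u", field, prefixlen);
}
```

Always including the fields, as the reply suggests, would amount to dropping the `prefixlen >= addr_bits` branch, which keeps every record the same shape for log parsers.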
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
Herbert Xu said the following on 2007-11-29 18:21:
> On Thu, Nov 29, 2007 at 06:08:30PM +0800, Wang Chen wrote:
>> Add a new UdpInEarlyDatagrams counter to count datagrams received early,
>> but which might be dropped later.
>
> Could you please split this into two patches? Have one do the
> UdpInDatagrams change and the other to introduce the EarlyDatagrams
> counter.
>
> I'm a bit hesitant to introduce new counters in the MIB because
> it'd be difficult if not impossible to ever remove them. Do you
> really need the early counter?

I cooked the patch based on Andi's and left the new counter in.
Frankly, I don't like the EarlyDatagrams counter either. So, I will
remove it and resubmit.

> One more thing, please put the is_udplite clean-up in its own patch
> too so it's absolutely clear what we're changing in the patches
> that aren't clean-ups.

OK.

>> Signed-off-by: Andi Kleen [EMAIL PROTECTED]
>> Signed-off-by: Wang Chen [EMAIL PROTECTED]
>
> Who's the author? Andi or you? Please make this obvious with a
> From header when you resubmit.

Since I will remove Andi's new counter idea, there will be only one
author. :)
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
Herbert Xu said the following on 2007-11-29 17:21:
> On Thu, Nov 29, 2007 at 03:55:38PM +0800, Wang Chen wrote:
> Excellent. They now do a recvmsg first with no buffer to get
> meta-information, which just happens to increment the counters.
>
> Could you please resubmit the patch then?

[SNMP]: Defer InDataGrams increment until recvmsg() does checksum

Split UDP receive count into UdpInDatagrams and UdpInEarlyDatagrams

UdpInDatagrams can be confusing because it counts packets that
might be dropped later.
Move UdpInDatagrams into recvmsg() as allowed by the RFC.
Add a new UdpInEarlyDatagrams counter to count datagrams received early,
but which might be dropped later.

Signed-off-by: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Wang Chen [EMAIL PROTECTED]
---
 Documentation/networking/udplite.txt |    2 +-
 include/linux/snmp.h                 |    1 +
 net/ipv4/proc.c                      |    1 +
 net/ipv4/udp.c                       |   12 ++++++++----
 net/ipv6/proc.c                      |    1 +
 net/ipv6/udp.c                       |   13 ++++++++-----
 6 files changed, 20 insertions(+), 10 deletions(-)

diff -Nurp linux-2.6.24.rc3.org/Documentation/networking/udplite.txt linux-2.6.24.rc3/Documentation/networking/udplite.txt
--- linux-2.6.24.rc3.org/Documentation/networking/udplite.txt	2007-11-19 12:37:40.000000000 +0800
+++ linux-2.6.24.rc3/Documentation/networking/udplite.txt	2007-11-28 18:35:29.000000000 +0800
@@ -236,7 +236,7 @@
 
   This displays UDP-Lite statistics variables, whose meaning is as follows.
 
-   InDatagrams:     Total number of received datagrams.
+   InDatagrams:     The total number of UDP datagrams delivered to UDP users.
 
    NoPorts:         Number of packets received to an unknown port.
                     These cases are counted separately (not as InErrors).
diff -Nurp linux-2.6.24.rc3.org/include/linux/snmp.h linux-2.6.24.rc3/include/linux/snmp.h
--- linux-2.6.24.rc3.org/include/linux/snmp.h	2007-11-19 12:38:13.000000000 +0800
+++ linux-2.6.24.rc3/include/linux/snmp.h	2007-11-28 18:06:15.000000000 +0800
@@ -138,6 +138,7 @@ enum
 	UDP_MIB_OUTDATAGRAMS,			/* OutDatagrams */
 	UDP_MIB_RCVBUFERRORS,			/* RcvbufErrors */
 	UDP_MIB_SNDBUFERRORS,			/* SndbufErrors */
+	UDP_MIB_INEARLYDATAGRAMS,		/* Early Datagrams Received */
 	__UDP_MIB_MAX
 };
 
diff -Nurp linux-2.6.24.rc3.org/net/ipv4/proc.c linux-2.6.24.rc3/net/ipv4/proc.c
--- linux-2.6.24.rc3.org/net/ipv4/proc.c	2007-11-19 12:38:14.000000000 +0800
+++ linux-2.6.24.rc3/net/ipv4/proc.c	2007-11-28 18:06:15.000000000 +0800
@@ -149,6 +149,7 @@ static const struct snmp_mib snmp4_tcp_l
 
 static const struct snmp_mib snmp4_udp_list[] = {
 	SNMP_MIB_ITEM("InDatagrams", UDP_MIB_INDATAGRAMS),
+	SNMP_MIB_ITEM("InEarlyDatagrams", UDP_MIB_INEARLYDATAGRAMS),
 	SNMP_MIB_ITEM("NoPorts", UDP_MIB_NOPORTS),
 	SNMP_MIB_ITEM("InErrors", UDP_MIB_INERRORS),
 	SNMP_MIB_ITEM("OutDatagrams", UDP_MIB_OUTDATAGRAMS),
diff -Nurp linux-2.6.24.rc3.org/net/ipv4/udp.c linux-2.6.24.rc3/net/ipv4/udp.c
--- linux-2.6.24.rc3.org/net/ipv4/udp.c	2007-11-19 12:38:14.000000000 +0800
+++ linux-2.6.24.rc3/net/ipv4/udp.c	2007-11-29 17:24:25.000000000 +0800
@@ -873,6 +873,8 @@ try_again:
 	if (err)
 		goto out_free;
 
+	UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite);
+
 	sock_recv_timestamp(msg, sk, skb);
 
 	/* Copy the address. */
@@ -940,6 +942,7 @@ int udp_queue_rcv_skb(struct sock * sk,
 {
 	struct udp_sock *up = udp_sk(sk);
 	int rc;
+	int is_udplite = IS_UDPLITE(sk);
 
 	/*
 	 *	Charge it to the socket, dropping if the queue is full.
@@ -967,7 +970,8 @@ int udp_queue_rcv_skb(struct sock * sk,
 
 			ret = (*up->encap_rcv)(sk, skb);
 			if (ret <= 0) {
-				UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
+				UDP_INC_STATS_BH(UDP_MIB_INEARLYDATAGRAMS,
+						 is_udplite);
 				return -ret;
 			}
 		}
@@ -1019,15 +1023,15 @@ int udp_queue_rcv_skb(struct sock * sk,
 	if ((rc = sock_queue_rcv_skb(sk,skb)) < 0) {
 		/* Note that an ENOMEM error is charged twice */
 		if (rc == -ENOMEM)
-			UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, up->pcflag);
+			UDP_INC_STATS_BH(UDP_MIB_RCVBUFERRORS, is_udplite);
 		goto drop;
 	}
 
-	UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, up->pcflag);
+	UDP_INC_STATS_BH(UDP_MIB_INEARLYDATAGRAMS, is_udplite);
 	return 0;
 
 drop:
-	UDP_INC_STATS_BH(UDP_MIB_INERRORS, up->pcflag);
+	UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite);
 	kfree_skb(skb);
 	return -1;
 }
diff -Nurp linux-2.6.24.rc3.org/net/ipv6/proc.c
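The accounting change in this patch can be modelled in a few lines: queueing a datagram no longer bumps InDatagrams, and the counter moves to the point where recvmsg() has verified the checksum. The following is a toy user-space model under that reading, not kernel code; the toy_* types and functions are hypothetical, and the checksum result is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counters standing in for UdpInDatagrams and UdpInErrors. */
struct toy_udp_mib {
	unsigned long in_datagrams;
	unsigned long in_errors;
};

/* Toy datagram: the deferred checksum result is modelled as a flag. */
struct toy_datagram {
	bool csum_valid;
};

/* Receive path: append to the queue and count nothing yet. */
size_t toy_queue_rcv(struct toy_datagram *queue, size_t len,
		     struct toy_datagram d)
{
	queue[len] = d;
	return len + 1;
}

/* recvmsg path: count the datagram only once its checksum has passed. */
int toy_recvmsg(struct toy_udp_mib *mib, const struct toy_datagram *d)
{
	if (!d->csum_valid) {
		mib->in_errors++;
		return -1;
	}
	mib->in_datagrams++;
	return 0;
}
```

With this split, a datagram that is queued but later fails its checksum shows up only in the error counter, which is the confusion the commit message describes.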
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
On Thu, Nov 29, 2007 at 06:33:01PM +0800, Wang Chen wrote:
>
> I cooked the patch based on Andi's and left the new counter in.
> Frankly, I don't like the EarlyDatagrams counter either. So, I will
> remove it and resubmit.

Sounds good. Thanks for all your efforts on this problem!
-- 
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.5)
In article [EMAIL PROTECTED] (at Thu, 29 Nov 2007 21:29:40 +1100), Herbert Xu [EMAIL PROTECTED] says:

> On Mon, Nov 26, 2007 at 05:16:16PM +0000, Templin, Fred L wrote:
> > From: Fred L. Templin [EMAIL PROTECTED]
> >
> > This patch includes support for the Intra-Site Automatic Tunnel
> > Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
> > module, and is configured using extensions to the iproute2
> > utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
> > distribution. This version includes the diff for ./include/linux/if.h
> > which was missing in the v2.4 submission and is needed to make the
> > patch compile. The patch has been installed, compiled and
> > tested in a clean 2.6.24-rc2 kernel build area.
> >
> > Signed-off-by: Fred L. Templin [EMAIL PROTECTED]
>
> Sorry, the patch doesn't apply to net-2.6.25.
>
> $ git apply --check --whitespace=error-all ~/p
> Space in indent is followed by a tab.
> /home/gondolin/herbert/p:101: "%s: Disabled Multicast RS\n",
> Space in indent is followed by a tab.
> /home/gondolin/herbert/p:216: }
> Space in indent is followed by a tab.
> /home/gondolin/herbert/p:252: printk(KERN_DEBUG "sit: nexthop == NULL\n");
> Space in indent is followed by a tab.
> /home/gondolin/herbert/p:254: }
> fatal: corrupt patch at line 269
> $
>
> There seems to be a line missing at the end.
>
> Please fix the white space errors and resend.

I've fixed up those errors.

-- 
YOSHIFUJI Hideaki @ USAGI Project [EMAIL PROTECTED]
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA

Subject: [PATCH] IPv6: RFC4214 Support (v2.5)
Date: Mon, 26 Nov 2007 09:16:16 -0800
From: Fred L. Templin [EMAIL PROTECTED]

This patch includes support for the Intra-Site Automatic Tunnel
Addressing Protocol (ISATAP) per RFC4214. It uses the SIT
module, and is configured using extensions to the iproute2
utility. The diffs are specific to the Linux 2.6.24-rc2 kernel
distribution. This version includes the diff for ./include/linux/if.h
which was missing in the v2.4 submission and is needed to make the
patch compile. The patch has been installed, compiled and
tested in a clean 2.6.24-rc2 kernel build area.

Signed-off-by: Fred L. Templin [EMAIL PROTECTED]
Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

---
diff --git a/include/linux/if.h b/include/linux/if.h
index 186070d..5c9d1fa 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -63,6 +63,7 @@
 #define IFF_MASTER_ALB	0x10		/* bonding master, balance-alb.	*/
 #define IFF_BONDING	0x20		/* bonding master or slave	*/
 #define IFF_SLAVE_NEEDARP 0x40		/* need ARPs for validation	*/
+#define IFF_ISATAP	0x80		/* ISATAP interface (RFC4214)	*/
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
diff --git a/include/linux/if_tunnel.h b/include/linux/if_tunnel.h
index 660b501..228eb4e 100644
--- a/include/linux/if_tunnel.h
+++ b/include/linux/if_tunnel.h
@@ -17,6 +17,9 @@
 #define GRE_FLAGS	__constant_htons(0x00F8)
 #define GRE_VERSION	__constant_htons(0x0007)
 
+/* i_flags values for SIT mode */
+#define	SIT_ISATAP	0x0001
+
 struct ip_tunnel_parm
 {
 	char			name[IFNAMSIZ];
diff --git a/include/linux/in.h b/include/linux/in.h
index 3975cbf..a8f00ca 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -253,6 +253,14 @@ struct sockaddr_in {
 #define ZERONET(x)	(((x) & htonl(0xff000000)) == htonl(0x00000000))
 #define LOCAL_MCAST(x)	(((x) & htonl(0xFFFFFF00)) == htonl(0xE0000000))
 
+/* Special-Use IPv4 Addresses (RFC3330) */
+#define PRIVATE_10(x)	(((x) & htonl(0xff000000)) == htonl(0x0A000000))
+#define LINKLOCAL_169(x) (((x) & htonl(0xffff0000)) == htonl(0xA9FE0000))
+#define PRIVATE_172(x)	(((x) & htonl(0xfff00000)) == htonl(0xAC100000))
+#define TEST_192(x)	(((x) & htonl(0xffffff00)) == htonl(0xC0000200))
+#define ANYCAST_6TO4(x)	(((x) & htonl(0xffffff00)) == htonl(0xC0586300))
+#define PRIVATE_192(x)	(((x) & htonl(0xffff0000)) == htonl(0xC0A80000))
+#define TEST_198(x)	(((x) & htonl(0xfffe0000)) == htonl(0xC6120000))
 #endif
 
 #endif	/* _LINUX_IN_H */
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index bccc2fe..c56827d 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -17,6 +17,7 @@
 
 #define IPV6_MAX_ADDRESSES		16
 
+#include <linux/in.h>
 #include <linux/in6.h>
 
 struct prefix_info {
@@ -249,6 +250,24 @@ static inline int ipv6_addr_is_ll_all_routers(const struct in6_addr *addr)
 		addr->s6_addr32[3] == htonl(0x00000002));
 }
 
+static inline int ipv6_isatap_eui64(u8 *eui, __be32 addr)
+{
+	eui[0] = (ZERONET(addr) || PRIVATE_10(addr) || LOOPBACK(addr) ||
+		  LINKLOCAL_169(addr) || PRIVATE_172(addr) || TEST_192(addr) ||
+		  ANYCAST_6TO4(addr) || PRIVATE_192(addr) || TEST_198(addr) ||
+		  MULTICAST(addr) || BADCLASS(addr)) ? 0x00 :
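The patch text is cut off mid-expression in this archive, so the rest of ipv6_isatap_eui64() is not visible here. For orientation only, RFC 4214 builds the ISATAP interface identifier from the 24-bit IANA OUI 00-00-5E, the octet 0xFE, and the 32-bit IPv4 address in network byte order, with the "u" bit set (first octet 0x02) when the IPv4 address is known to be globally unique. The following is a user-space sketch written from that RFC description, not from the patch; isatap_iid and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build an 8-byte ISATAP interface identifier as described in RFC 4214:
 *   [u-bit octet] 00 5E FE  followed by the IPv4 address.
 * v4 holds the IPv4 address in network byte order; globally_unique is
 * the caller's judgement (non-zero for a public address, zero for
 * private, link-local and other special-use ranges).
 */
void isatap_iid(uint8_t iid[8], const uint8_t v4[4], int globally_unique)
{
	iid[0] = globally_unique ? 0x02 : 0x00;
	iid[1] = 0x00;
	iid[2] = 0x5e;
	iid[3] = 0xfe;
	memcpy(iid + 4, v4, 4);
}
```

The special-use macros added to in.h in the patch appear to serve as the "not globally unique" test that selects 0x00 for the first octet.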
Re: [PATCH] sungem: fix napi regression with reset work
On Mon, Nov 26, 2007 at 09:02:08PM +0100, Johannes Berg wrote:
> sungem's gem_reset_task() will unconditionally try to disable NAPI even
> when it's called while the interface is not operating and hence the NAPI
> struct isn't enabled. Make napi_disable() depend on gp->running.
>
> Also removes a superfluous test of gp->running in the same function.
>
> Signed-off-by: Johannes Berg [EMAIL PROTECTED]

Patch applied to net-2.6. Thanks Johannes!
-- 
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
Thank you for doing this work, there is a small comment below.

| --- linux-2.6.24.rc3.org/Documentation/networking/udplite.txt 2007-11-19 12:37:40.0 +0800
| +++ linux-2.6.24.rc3/Documentation/networking/udplite.txt 2007-11-28 18:35:29.0 +0800
| @@ -236,7 +236,7 @@
|
|    This displays UDP-Lite statistics variables, whose meaning is as follows.
|
| -   InDatagrams:     Total number of received datagrams.
| +   InDatagrams:     The total number of UDP datagrams delivered to UDP users.

You are in the UDP-Lite documentation -- it should read UDP-Lite, not UDP.
Re: [PATCH 01/01] ipv6: RFC4214 Support (v2.5)
On Thu, Nov 29, 2007 at 07:54:59PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
>
> I've fixed up those errors.

OK, patch applied to net-2.6.25. Thanks everyone!
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] [NET]: Fix TX bug VLAN in VLAN
On Tue, Nov 27, 2007 at 04:02:19PM +0900, Joonwoo Park wrote:
> [NET]: Fix TX bug VLAN in VLAN
>
> Fix misbehavior of vlan_dev_hard_start_xmit() for recursive encapsulations.
>
> Signed-off-by: Joonwoo Park [EMAIL PROTECTED]

Applied to net-2.6. Thanks!
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 1/2] [IPV4] UDP: Always checksum even if without socket filter
On Thu, Nov 29, 2007 at 10:56:48AM +0000, Gerrit Renker wrote:
>
> | -   InDatagrams:     Total number of received datagrams.
> | +   InDatagrams:     The total number of UDP datagrams delivered to UDP users.
>
> You are in the UDP-Lite documentation -- it should read UDP-Lite, not UDP.

We could just drop the mention of UDP completely:

	The total number of datagrams delivered to applications.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH][UNIX] EOF on non-blocking SOCK_SEQPACKET
On Tue, Nov 27, 2007 at 09:33:23AM +0000, Florian Zumbiehl wrote:
>
> I am not absolutely sure whether this actually is a bug (as in: I've got
> no clue what the standards say or what other implementations do), but at
> least I was pretty surprised when I noticed that a recv() on a
> non-blocking unix domain socket of type SOCK_SEQPACKET (which is
> connection oriented, after all) where the remote end has closed the
> connection returned -1 (EAGAIN) rather than 0 to indicate end of file.

I agree with your expectation. In fact, that's what POSIX says too.
Since the risk of this breaking an existing application seems to
be minimal, I've applied your patch to net-2.6.

However, I had to reformat it so that it fits with the Linux
coding style. Please take this into account for future patches.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e835da8..060bba4 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1637,8 +1637,15 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 	mutex_lock(&u->readlock);
 
 	skb = skb_recv_datagram(sk, flags, noblock, &err);
-	if (!skb)
+	if (!skb) {
+		unix_state_lock(sk);
+		/* Signal EOF on disconnected non-blocking SEQPACKET socket. */
+		if (sk->sk_type == SOCK_SEQPACKET && err == -EAGAIN &&
+		    (sk->sk_shutdown & RCV_SHUTDOWN))
+			err = 0;
+		unix_state_unlock(sk);
 		goto out_unlock;
+	}
 
 	wake_up_interruptible_sync(&u->peer_wait);
 
Re: [PATCH] Fix inet_diag.ko register vs rcv race
On Tue, Nov 27, 2007 at 04:09:43PM +0300, Pavel Emelyanov wrote:
> The following race is possible when one cpu unregisters the handler
> while other one is trying to receive a message and call this one:

Good catch! But I think we need a bit more to close this fully.

Dumps can resume asynchronously which means that they won't be
holding inet_diag_mutex. We can fix that pretty easily by giving
that as our cb_mutex.

So could you add that to your patch and resubmit?

Arnaldo, synchronize_rcu() doesn't work on its own. Whoever accesses
the object that it's supposed to protect has to use the correct RCU
primitives for this to work.

Synchronisation is like tango, it always takes two to make it work :)

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH (resubmit)] Fix inet_diag.ko register vs rcv race
The following race is possible when one cpu unregisters the handler
while other one is trying to receive a message and call this one:

CPU1:                                        CPU2:
inet_diag_rcv()                              inet_diag_unregister()
  mutex_lock(&inet_diag_mutex);
  netlink_rcv_skb(skb, &inet_diag_rcv_msg);
    if (inet_diag_table[nlh->nlmsg_type] ==
                    NULL) /* false handler is still registered */
    ...
    netlink_dump_start(idiagnl, skb, nlh,
                       inet_diag_dump, NULL);
        cb = kzalloc(sizeof(*cb), GFP_KERNEL);
        /* sleep here freeing memory
         * or preempt
         * or sleep later on nlk->cb_mutex
         */
                                               spin_lock(&inet_diag_register_lock);
                                               inet_diag_table[type] = NULL;
    ...                                        spin_unlock(&inet_diag_register_lock);
                                               synchronize_rcu();
                                               /* CPU1 is sleeping - RCU quiescent
                                                * state is passed
                                                */
                                               return;
    /* inet_diag_dump is finally called: */
    inet_diag_dump()
      handler = inet_diag_table[cb->nlh->nlmsg_type];
      BUG_ON(handler == NULL);
      /* OOPS! While we slept the unregister has set
       * handler to NULL :(
       */

Grep showed, that the register/unregister functions are called
from init/fini module callbacks for tcp_/dccp_diag, so it's OK
to use the inet_diag_mutex to synchronize manipulations with the
inet_diag_table and the access to it.

Besides, as Herbert pointed out, asynchronous dumps should hold
this mutex as well, and thus, we provide the mutex as cb_mutex one.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index b017073..6b3fffb 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -853,8 +853,6 @@ static void inet_diag_rcv(struct sk_buff *skb)
 	mutex_unlock(&inet_diag_mutex);
 }
 
-static DEFINE_SPINLOCK(inet_diag_register_lock);
-
 int inet_diag_register(const struct inet_diag_handler *h)
 {
 	const __u16 type = h->idiag_type;
@@ -863,13 +861,13 @@ int inet_diag_register(const struct inet_diag_handler *h)
 	if (type >= INET_DIAG_GETSOCK_MAX)
 		goto out;
 
-	spin_lock(&inet_diag_register_lock);
+	mutex_lock(&inet_diag_mutex);
 	err = -EEXIST;
 	if (inet_diag_table[type] == NULL) {
 		inet_diag_table[type] = h;
 		err = 0;
 	}
-	spin_unlock(&inet_diag_register_lock);
+	mutex_unlock(&inet_diag_mutex);
 out:
 	return err;
 }
@@ -882,11 +880,9 @@ void inet_diag_unregister(const struct inet_diag_handler *h)
 	if (type >= INET_DIAG_GETSOCK_MAX)
 		return;
 
-	spin_lock(&inet_diag_register_lock);
+	mutex_lock(&inet_diag_mutex);
 	inet_diag_table[type] = NULL;
-	spin_unlock(&inet_diag_register_lock);
-
-	synchronize_rcu();
+	mutex_unlock(&inet_diag_mutex);
 }
 EXPORT_SYMBOL_GPL(inet_diag_unregister);
 
@@ -901,7 +897,7 @@ static int __init inet_diag_init(void)
 		goto out;
 
 	idiagnl = netlink_kernel_create(&init_net, NETLINK_INET_DIAG, 0,
-					inet_diag_rcv, NULL, THIS_MODULE);
+					inet_diag_rcv, &inet_diag_mutex, THIS_MODULE);
 	if (idiagnl == NULL)
 		goto out_free_table;
 	err = 0;
[3/4] dst: Network state machine.
Network state machine.

Includes network async processing state machine and related tasks.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/drivers/block/dst/kst.c b/drivers/block/dst/kst.c
new file mode 100644
index 0000000..ba5e5ef
--- /dev/null
+++ b/drivers/block/dst/kst.c
@@ -0,0 +1,1475 @@
+/*
+ * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/socket.h>
+#include <linux/kthread.h>
+#include <linux/net.h>
+#include <linux/in.h>
+#include <linux/poll.h>
+#include <linux/bio.h>
+#include <linux/dst.h>
+
+#include <net/sock.h>
+
+struct kst_poll_helper
+{
+	poll_table		pt;
+	struct kst_state	*st;
+};
+
+static LIST_HEAD(kst_worker_list);
+static DEFINE_MUTEX(kst_worker_mutex);
+
+/*
+ * This function creates bound socket for local export node.
+ */
+static int kst_sock_create(struct kst_state *st, struct saddr *addr,
+		int type, int proto, int backlog)
+{
+	int err;
+
+	err = sock_create(addr->sa_family, type, proto, &st->socket);
+	if (err)
+		goto err_out_exit;
+
+	err = st->socket->ops->bind(st->socket, (struct sockaddr *)addr,
+			addr->sa_data_len);
+
+	err = st->socket->ops->listen(st->socket, backlog);
+	if (err)
+		goto err_out_release;
+
+	st->socket->sk->sk_allocation = GFP_NOIO;
+
+	return 0;
+
+err_out_release:
+	sock_release(st->socket);
+err_out_exit:
+	return err;
+}
+
+static void kst_sock_release(struct kst_state *st)
+{
+	if (st->socket) {
+		sock_release(st->socket);
+		st->socket = NULL;
+	}
+}
+
+void kst_wake(struct kst_state *st)
+{
+	if (st) {
+		struct kst_worker *w = st->node->w;
+		unsigned long flags;
+
+		spin_lock_irqsave(&w->ready_lock, flags);
+		if (list_empty(&st->ready_entry))
+			list_add_tail(&st->ready_entry, &w->ready_list);
+		spin_unlock_irqrestore(&w->ready_lock, flags);
+
+		wake_up(&w->wait);
+	}
+}
+EXPORT_SYMBOL_GPL(kst_wake);
+
+/*
+ * Polling machinery.
+ */
+static int kst_state_wake_callback(wait_queue_t *wait, unsigned mode,
+		int sync, void *key)
+{
+	struct kst_state *st = container_of(wait, struct kst_state, wait);
+	kst_wake(st);
+	return 1;
+}
+
+static void kst_queue_func(struct file *file, wait_queue_head_t *whead,
+				 poll_table *pt)
+{
+	struct kst_state *st = container_of(pt, struct kst_poll_helper, pt)->st;
+
+	st->whead = whead;
+	init_waitqueue_func_entry(&st->wait, kst_state_wake_callback);
+	add_wait_queue(whead, &st->wait);
+}
+
+static void kst_poll_exit(struct kst_state *st)
+{
+	if (st->whead) {
+		remove_wait_queue(st->whead, &st->wait);
+		st->whead = NULL;
+	}
+}
+
+/*
+ * This function removes request from state tree and ordering list.
+ */
+void kst_del_req(struct dst_request *req)
+{
+	list_del_init(&req->request_list_entry);
+}
+EXPORT_SYMBOL_GPL(kst_del_req);
+
+static struct dst_request *kst_req_first(struct kst_state *st)
+{
+	struct dst_request *req = NULL;
+
+	if (!list_empty(&st->request_list))
+		req = list_entry(st->request_list.next, struct dst_request,
+				request_list_entry);
+	return req;
+}
+
+/*
+ * This function dequeues first request from the queue and tree.
+ */
+static struct dst_request *kst_dequeue_req(struct kst_state *st)
+{
+	struct dst_request *req;
+
+	mutex_lock(&st->request_lock);
+	req = kst_req_first(st);
+	if (req)
+		kst_del_req(req);
+	mutex_unlock(&st->request_lock);
+	return req;
+}
+
+/*
+ * This function enqueues request into tree, indexed by start of the request,
+ * and also puts request into ordered queue.
+ */
+int kst_enqueue_req(struct kst_state *st, struct dst_request *req)
+{
+	if (unlikely(req->flags & DST_REQ_CHECK_QUEUE)) {
+		struct dst_request *r;
+
+		list_for_each_entry(r, &st->request_list, request_list_entry) {
+			if (bio_rw(r->bio) != bio_rw(req->bio))
+				continue;
+
+			if (r->start >= req->start + req->size)
+				continue;
+
[4/4] dst: Algorithms used in distributed storage.
Algorithms used in distributed storage.

Mirror and linear mapping code.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/drivers/block/dst/alg_linear.c b/drivers/block/dst/alg_linear.c
new file mode 100644
index 0000000..cb77b57
--- /dev/null
+++ b/drivers/block/dst/alg_linear.c
@@ -0,0 +1,104 @@
+/*
+ * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/dst.h>
+
+static struct dst_alg *alg_linear;
+
+/*
+ * This callback is invoked when node is removed from storage.
+ */
+static void dst_linear_del_node(struct dst_node *n)
+{
+}
+
+/*
+ * This callback is invoked when node is added to storage.
+ */
+static int dst_linear_add_node(struct dst_node *n)
+{
+	struct dst_storage *st = n->st;
+
+	dprintk("%s: disk_size: %llu, node_size: %llu.\n",
+			__func__, st->disk_size, n->size);
+
+	mutex_lock(&st->tree_lock);
+	n->start = st->disk_size;
+	st->disk_size += n->size;
+	mutex_unlock(&st->tree_lock);
+
+	return 0;
+}
+
+static int dst_linear_remap(struct dst_request *req)
+{
+	int err;
+
+	if (req->node->bdev) {
+		generic_make_request(req->bio);
+		return 0;
+	}
+
+	err = kst_check_permissions(req->state, req->bio);
+	if (err)
+		return err;
+
+	return req->state->ops->push(req);
+}
+
+/*
+ * Failover callback - it is invoked each time error happens during
+ * request processing.
+ */
+static int dst_linear_error(struct kst_state *st, int err)
+{
+	if (err)
+		set_bit(DST_NODE_FROZEN, &st->node->flags);
+	else
+		clear_bit(DST_NODE_FROZEN, &st->node->flags);
+	return 0;
+}
+
+static struct dst_alg_ops alg_linear_ops = {
+	.remap		= dst_linear_remap,
+	.add_node	= dst_linear_add_node,
+	.del_node	= dst_linear_del_node,
+	.error		= dst_linear_error,
+	.owner		= THIS_MODULE,
+};
+
+static int __devinit alg_linear_init(void)
+{
+	alg_linear = dst_alloc_alg("alg_linear", &alg_linear_ops);
+	if (!alg_linear)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __devexit alg_linear_exit(void)
+{
+	dst_remove_alg(alg_linear);
+}
+
+module_init(alg_linear_init);
+module_exit(alg_linear_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Evgeniy Polyakov [EMAIL PROTECTED]");
+MODULE_DESCRIPTION("Linear distributed algorithm.");
diff --git a/drivers/block/dst/alg_mirror.c b/drivers/block/dst/alg_mirror.c
new file mode 100644
index 0000000..55cf59c
--- /dev/null
+++ b/drivers/block/dst/alg_mirror.c
@@ -0,0 +1,1122 @@
+/*
+ * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/poll.h>
+#include <linux/dst.h>
+
+struct dst_mirror_node_data
+{
+	u64			age;
+};
+
+struct dst_mirror_priv
+{
+	unsigned int		chunk_num;
+
+	u64			last_start;
+
+	spinlock_t		backlog_lock;
+	struct list_head	backlog_list;
+
+	struct dst_mirror_node_data	old_data, new_data;
+
+	unsigned long		*chunk;
+};
+
+static struct dst_alg *alg_mirror;
+static struct bio_set *dst_mirror_bio_set;
+
+static int dst_mirror_resync(struct dst_node *n, int ndp);
+
+static void dst_mirror_mark_sync(struct dst_node *n)
+{
+	if (test_bit(DST_NODE_NOTSYNC, &n->flags)) {
+		struct dst_mirror_priv *priv = n->priv;
+
+		clear_bit(DST_NODE_NOTSYNC, &n->flags);
+		dprintk("%s: node: %p, %llu:%llu synchronization "
+				"has been completed.\n",
+			__func__, n, n->start, n->size);
+		priv->old_data.age = 0;
+	}
+}
+
+static void dst_mirror_mark_notsync(struct
[1/4] dst: Distributed storage documentation.
Distributed storage documentation.

Algorithms used in the system, userspace interfaces
(sysfs dirs and files), design and implementation details
are described here.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/Documentation/dst/algorithms.txt b/Documentation/dst/algorithms.txt
new file mode 100644
index 0000000..1437a6a
--- /dev/null
+++ b/Documentation/dst/algorithms.txt
@@ -0,0 +1,115 @@
+Each storage by itself is just a set of contiguous logical blocks, with
+allowed number of operations. Nodes, each of which has own start and size,
+are placed into storage by appropriate algorithm, which remaps
+logical sector number into real node's sector. One can create
+own algorithms, since DST has pluggable interface for that.
+Currently mirrored and linear algorithms are supported.
+
+Let's briefly describe how they work.
+
+Linear algorithm.
+Simple approach of concatenating storages into single device with
+increased size is used in this algorithm. Essentially new device
+has size equal to sum of sizes of underlying nodes and nodes are
+placed one after another.
+
+   /----- Node 1 ---\                 /------ Node 3 ----\
+start              end             start                end
+ |==================|================|===================|
+ |                start            end                   |
+ |                  \--- Node 2 ----/                    |
+ |                                                       |
+start                                                   end
+ \------------------------ DST storage ------------------/
+
+                            /\
+                            ||
+                            ||
+
+                       IO operations
+
+			Figure 1.
+	3 nodes combined into single storage using linear algorithm.
+
+Mirror algorithm.
+In this algorithm nodes are placed under each other, so when
+operation comes to the first one, it can be mirrored to all
+underlying nodes. In case of reading, actual data is obtained from
+the nearest node - algorithm keeps track of previous operation
+and knows where it was stopped, so that subsequent seek to the
+start of the new request will take the shortest time.
+Writing is always mirrored to all underlying nodes.
+
+                       IO operations
+                            ||
+                            ||
+                            \/
+
+ |------------------- DST storage ---------------|
+ |         prev position                         |
+ |---------|------------ Node 1 -----------------|
+ |                   prev pos                    |
+ |-------------------|-- Node 2 -----------------|
+ |  prev pos                                     |
+ |--|------------------- Node 3 -----------------|
+
+			Figure 2.
+	3 nodes combined into single storage using mirror algorithm.
+
+Each algorithm must implement number of callbacks,
+which must be registered during initialization time.
+
+struct dst_alg_ops
+{
+	int		(*add_node)(struct dst_node *n);
+	void		(*del_node)(struct dst_node *n);
+	int		(*remap)(struct dst_request *req);
+	int		(*error)(struct kst_state *state, int err);
+	struct module	*owner;
+};
+
+[EMAIL PROTECTED]
+This callback is invoked when new node is being added into the storage,
+but before node is actually added into the storage, so that it could
+be accessed from it. When it is called, all appropriate initialization
+of the underlying device is already completed (system has been connected
+to remote node or got a reference to the local block device). At this
+stage algorithm can add node into private map.
+It must return zero on success or negative value otherwise.
+
+[EMAIL PROTECTED]
+This callback is invoked when node is being deleted from the storage,
+i.e. when its reference counter hits zero. It is called before
+any cleaning is performed.
+It must return zero on success or negative value otherwise.
+
+[EMAIL PROTECTED]
+This callback is invoked each time new bio hits the storage.
+Request structure contains BIO itself, pointer to the node, which originally
+stores the whole region under given IO request, and various parameters
+used by storage core to process this block request.
+It must return zero on success or negative value otherwise. It is up to
+this method to call all cleaning if remapping failed, for example it must
+call kst_bio_endio() for given callback in case of error, which in turn
+will call bio_endio(). Note, that dst_request structure provided in this
+callback is allocated on stack, so if there is a need to use it outside
+of the given function, it must be cloned (it will happen automatically
+in state's push callback, but that copy will not be shared by any other
+user).
+
+[EMAIL PROTECTED]
+This callback is invoked for each error, which happened when processed
Re: [PATCH][BRIDGE] Lost call to br_fdb_fini() in br_init() error path
On Tue, Nov 27, 2007 at 05:39:42PM +0300, Pavel Emelyanov wrote:
> In case the br_netfilter_init() (or any subsequent call)
> fails, the br_fdb_fini() must be called to free the allocated
> in br_fdb_init() br_fdb_cache kmem cache.
>
> Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Patch applied to net-2.6. Thanks Pavel!
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] Fix inet_diag.ko register vs rcv race
On Thu, Nov 29, 2007 at 11:37:34PM +1100, Herbert Xu wrote:
> On Tue, Nov 27, 2007 at 04:09:43PM +0300, Pavel Emelyanov wrote:
> > The following race is possible when one cpu unregisters the handler
> > while other one is trying to receive a message and call this one:
>
> Good catch! But I think we need a bit more to close this fully.
>
> Dumps can resume asynchronously which means that they won't be
> holding inet_diag_mutex. We can fix that pretty easily by giving
> that as our cb_mutex.
>
> So could you add that to your patch and resubmit?
>
> Arnaldo, synchronize_rcu() doesn't work on its own. Whoever accesses
> the object that it's supposed to protect has to use the correct RCU
> primitives for this to work.
>
> Synchronisation is like tango, it always takes two to make it work :)

Agreed, I didn't check that when refactoring inet_diag, leaving this
as it was before I put my hands on it :-)

- Arnaldo
Re: [PATCHv6 iptables]Interface group match
Lutz Jaenicke wrote:
> On Tue, Nov 20, 2007 at 02:14:28PM +0100, Laszlo Attila Toth wrote:
>> Interface group values can be checked on both input and output
>> interfaces with optional mask.
>>
>> Index: extensions/libxt_ifgroup.c
>> ===================================================================
>> --- extensions/libxt_ifgroup.c (revision 0)
>> +++ extensions/libxt_ifgroup.c (revision 0)
>> + info->in_group = strtoul(optarg, &end, 0);
>
> This is somewhat inconsistent with the iproute patch which targets
> specific groups (with names). Should iptables be allowed to read
> /etc/iproute2/rt_ifgroup?

It would be good, but it cannot be used if a mask is set, and only
values less than 256 can be used with names.

> There is no standard API like getservbyname()...

The code of iproute2 should be copied. If Patrick says it is ok, I'll
write this part.

> I do have a draft patch for physdev which is however against
> iptables-1.3.8 and linux-2.6.19 so it will need some more work
> but I will attach it for discussion.

Thanks. I will send patches against net-2.6.25 and the iptables svn
version soon.
[0/4] dst: Distributed storage.
Distributed storage.

I'm pleased to announce the 9th release of the distributed
storage subsystem (DST). This is a maintenance release and includes
bug fixes only.

DST makes it possible to form a storage on top of local and remote
nodes and to combine them into a linear or mirroring setup, which in
turn can be exported to remote nodes.

Short changelog:
* use node's size in sectors instead of bytes
* fixed old/new ages for the first node.
  Error spotted by Matthew Hodgson [EMAIL PROTECTED]
* fixed debug printk declaration
* it is now called 'astonishingly screwed tapeworm'

The overall list of DST features can be found on the project's homepage:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst

Thank you.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]
[2/4] dst: Core distributed storage files.
Core distributed storage files.

Include userspace interfaces, initialization,
block layer bindings and other core functionality.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index b4c8319..ca6592d 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -451,6 +451,8 @@ config ATA_OVER_ETH
 	This driver provides Support for ATA over Ethernet block
 	devices like the Coraid EtherDrive (R) Storage Blade.
 
+source "drivers/block/dst/Kconfig"
+
 source "drivers/s390/block/Kconfig"
 
 endmenu
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index dd88e33..fcf042d 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_VIODASD)		+= viodasd.o
 obj-$(CONFIG_BLK_DEV_SX8)	+= sx8.o
 obj-$(CONFIG_BLK_DEV_UB)	+= ub.o
 
+obj-$(CONFIG_DST)		+= dst/
diff --git a/drivers/block/dst/Kconfig b/drivers/block/dst/Kconfig
new file mode 100644
index 0000000..d35e0cc
--- /dev/null
+++ b/drivers/block/dst/Kconfig
@@ -0,0 +1,21 @@
+config DST
+	tristate "Distributed storage"
+	depends on NET
+	select CONNECTOR
+	select LIBCRC32C
+	---help---
+	This driver allows to create a distributed storage.
+
+config DST_ALG_LINEAR
+	tristate "Linear distribution algorithm"
+	depends on DST
+	---help---
+	This module allows to create linear mapping of the nodes
+	in the distributed storage.
+
+config DST_ALG_MIRROR
+	tristate "Mirror distribution algorithm"
+	depends on DST
+	---help---
+	This module allows to create a mirror of the noes in the
+	distributed storage.
diff --git a/drivers/block/dst/Makefile b/drivers/block/dst/Makefile
new file mode 100644
index 0000000..1400e94
--- /dev/null
+++ b/drivers/block/dst/Makefile
@@ -0,0 +1,6 @@
+obj-$(CONFIG_DST) += dst.o
+
+dst-y := dcore.o kst.o
+
+obj-$(CONFIG_DST_ALG_LINEAR) += alg_linear.o
+obj-$(CONFIG_DST_ALG_MIRROR) += alg_mirror.o
diff --git a/drivers/block/dst/dcore.c b/drivers/block/dst/dcore.c
new file mode 100644
index 0000000..06d0810
--- /dev/null
+++ b/drivers/block/dst/dcore.c
@@ -0,0 +1,1608 @@
+/*
+ * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/blkdev.h>
+#include <linux/bio.h>
+#include <linux/slab.h>
+#include <linux/connector.h>
+#include <linux/socket.h>
+#include <linux/dst.h>
+#include <linux/device.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/buffer_head.h>
+
+#include <net/sock.h>
+
+static LIST_HEAD(dst_storage_list);
+static LIST_HEAD(dst_alg_list);
+static DEFINE_MUTEX(dst_storage_lock);
+static DEFINE_MUTEX(dst_alg_lock);
+static int dst_major;
+static struct kst_worker *kst_main_worker;
+static struct cb_id cn_dst_id = { CN_DST_IDX, CN_DST_VAL };
+
+struct kmem_cache *dst_request_cache;
+
+static char dst_name[] = "Astonishingly screwed tapeworm";
+
+/*
+ * DST sysfs tree. For device called 'storage' which is formed
+ * on top of two nodes this looks like this:
+ *
+ * /sys/devices/storage/
+ * /sys/devices/storage/alg : alg_linear
+ * /sys/devices/storage/n-800/type : R: 192.168.4.80:1025
+ * /sys/devices/storage/n-800/size : 800
+ * /sys/devices/storage/n-800/start : 800
+ * /sys/devices/storage/n-800/clean
+ * /sys/devices/storage/n-800/dirty
+ * /sys/devices/storage/n-0/type : R: 192.168.4.81:1025
+ * /sys/devices/storage/n-0/size : 800
+ * /sys/devices/storage/n-0/start : 0
+ * /sys/devices/storage/n-0/clean
+ * /sys/devices/storage/n-0/dirty
+ * /sys/devices/storage/remove_all_nodes
+ * /sys/devices/storage/nodes : sectors (start [size]): 0 [800] | 800 [800]
+ * /sys/devices/storage/name : storage
+ */
+
+static int dst_dev_match(struct device *dev, struct device_driver *drv)
+{
+	return 1;
+}
+
+static void dst_dev_release(struct device *dev)
+{
+}
+
+static struct bus_type dst_dev_bus_type = {
+	.name		= "dst",
+	.match		= &dst_dev_match,
+};
+
+static struct device dst_dev = {
+	.bus		= &dst_dev_bus_type,
+	.release	= &dst_dev_release
+};
+
+static void dst_node_release(struct device *dev)
+{
+}
+
+static struct device dst_node_dev = {
+	.release	= &dst_node_release
+};
+
+static void dst_free_alg(struct dst_alg *alg)
+{
+	kfree(alg);
+}
+
+/*
+ * Algorithm is never freed directly,
+ * since its
Re: [PATCH (resubmit)][BRIDGE] Properly dereference the br_should_route_hook
On Tue, Nov 27, 2007 at 07:21:08PM +0300, Pavel Emelyanov wrote:
> This hook is protected with the RCU, so simple
>
> 	if (br_should_route_hook)
> 		br_should_route_hook(...)
>
> is not enough on some architectures.
>
> Use the rcu_dereference/rcu_assign_pointer in this case.
>
> Fixed Stephen's comment concerning using the typeof().
>
> Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Applied to net-2.6. Thanks Pavel!

>  static void __exit ebtable_broute_fini(void)
>  {
> -	br_should_route_hook = NULL;
> +	rcu_assign_pointer(br_should_route_hook, NULL);

Just for the record, rcu_assign_pointer is never necessary when
you're assigning NULL. The reason is that rcu_assign_pointer
serves as a barrier between the initialisation of the content
of what you're assigning and the actual assignment.

Since NULL does not need to be initialised you don't need the
barrier :)

Hmm, perhaps we could even build this logic into rcu_assign_pointer.

Then again, who still uses an Alpha? Mine died years ago :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: wireless vs. alignment requirements
On Tue, Nov 27, 2007 at 09:16:07AM -0800, H. Peter Anvin wrote:
>
> I wrote a patch for the IP stack to realign packets if necessary at one
> point. I should dredge it up again and submit it for collective flamage.

As long as it doesn't penalise Ethernet (e.g., the 10Gb crowd :)
it would be good to have.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH (resubmit)] Fix inet_diag.ko register vs rcv race
On Thu, Nov 29, 2007 at 04:01:25PM +0300, Pavel Emelyanov wrote: Besides, as Herbert pointed out, asynchronous dumps should hold this mutex as well, and thus, we provide the mutex as cb_mutex one. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Thanks for the quick response! Patch applied to net-2.6. -- Herbert Xu
Re: [PATCH] Nicer WARN_ON in netstat_show
On Wed, Nov 28, 2007 at 01:11:24PM +0300, Pavel Emelyanov wrote: The if (statement) WARN_ON(1); looks much better as WARN_ON(statement); Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Applied to net-2.6.25. Thanks. -- Herbert Xu
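For readers who want to see the two styles side by side, here is a minimal user-space sketch. MY_WARN_ON and warn_on_impl are hypothetical stand-ins written for this illustration and are not the kernel's WARN_ON(); the point is only that folding the condition into the macro gives one statement and a warning that names the failing expression.

```c
#include <stdio.h>

/* Hypothetical user-space stand-in for the kernel's WARN_ON(); not the real macro. */
static int warn_count;

static int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define MY_WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Old style: the condition is tested separately and the warning text is just "1". */
static void check_old(int offset, int limit)
{
	if (offset > limit)
		MY_WARN_ON(1);
}

/* New style: one statement, and the warning names the failing condition. */
static void check_new(int offset, int limit)
{
	MY_WARN_ON(offset > limit);
}
```

Both functions warn in the same situations; only the shape of the call site and the text of the message differ.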
Re: + xfrm_policy-warning-fix.patch added to -mm tree
On Wed, Nov 28, 2007 at 02:56:51AM -0800, [EMAIL PROTECTED] wrote: The patch titled xfrm_policy warning fix has been added to the -mm tree. Its filename is xfrm_policy-warning-fix.patch *** Remember to use Documentation/SubmitChecklist when testing your code *** See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this -- Subject: xfrm_policy warning fix From: Andrew Morton [EMAIL PROTECTED] Fix this: net/xfrm/xfrm_policy.c: In function '__xfrm_lookup': net/xfrm/xfrm_policy.c:1449: warning: 'dst' may be used uninitialized in this function by checking for impossible values in the switch(). Thanks Andrew. I've added the following patch to net-2.6. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- commit 5e5234ff17ef98932688116025b30958bd28a940 Author: Herbert Xu [EMAIL PROTECTED] Date: Fri Nov 30 00:50:31 2007 +1100 [IPSEC]: Fix uninitialised dst warning in __xfrm_lookup Andrew Morton reported that __xfrm_lookup generates this warning: net/xfrm/xfrm_policy.c: In function '__xfrm_lookup': net/xfrm/xfrm_policy.c:1449: warning: 'dst' may be used uninitialized in this function This is because if policy-action is of an unexpected value then dst will not be initialised. Of course, in practice this should never happen since the input layer xfrm_user/af_key will filter out all illegal values. But the compiler doesn't know that of course. So this patch fixes this by taking the conservative approach and treat all unknown actions the same as a blocking action. Thanks to Andrew for finding this and providing an initial fix. 
Signed-off-by: Herbert Xu [EMAIL PROTECTED]

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b702bd8..9a4cf2e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1344,6 +1344,7 @@ restart:
 	xfrm_nr += pols[0]->xfrm_nr;

 	switch (policy->action) {
+	default:
 	case XFRM_POLICY_BLOCK:
 		/* Prohibit the flow */
 		err = -EPERM;
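The effect of placing default: directly above the blocking case can be sketched in isolation. The enum values and the function name below are made up for illustration and are not the kernel's definitions; the sketch only shows that every path through the switch now assigns the result, so an unexpected action is refused rather than left undefined.

```c
#include <errno.h>

/* Hypothetical action codes for illustration; the kernel's values live in its own headers. */
enum policy_action { POLICY_ALLOW = 0, POLICY_BLOCK = 1 };

/* Returns 0 if the flow is allowed, -EPERM otherwise. */
static int check_policy(int action)
{
	int err;

	switch (action) {
	default:		/* unknown value: share the blocking case */
	case POLICY_BLOCK:
		err = -EPERM;
		break;
	case POLICY_ALLOW:
		err = 0;
		break;
	}
	return err;
}
```

Because default: shares the body of the blocking case, the compiler can see that err is always set, which is the same reasoning that silences the 'dst' warning in the patch.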
Re: PATCH 2.6.24-rc3-mm2 - build breakage - bnx2x depends on ZLIB_INFLATE
On Wed, 2007-11-28 at 12:25 -0800, Lee Schermerhorn wrote: Couldn't find one of these on the lists... PATCH 2.6.24-rc3-mm1: bnx2x depends on ZLIB_INFLATE The bnx2x module depends on the zlib_inflate functions. The build will fail if ZLIB_INFLATE has not been selected manually or by building another module that automatically selects it. Modify BNX2X config option to 'select ZLIB_INFLATE' like BNX2 and others. This seems to fix it. Signed-off-by: Lee Schermerhorn [EMAIL PROTECTED] You are right. My mistake, Thanks. Acked-by: Eliezer Tamir [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] XFRM: SPD auditing fix to include the netmask/prefix-length
On Thursday 29 November 2007 5:34:59 am Herbert Xu wrote: On Mon, Nov 26, 2007 at 07:55:12PM +, Paul Moore wrote: Currently the netmask/prefix-length of an IPsec SPD entry is not included in any of the SPD related audit messages. This can cause a problem when the audit log is examined as the netmask/prefix-length is vital in determining what network traffic is affected by a particular SPD entry. This patch fixes this problem by adding two additional fields, src_prefixlen and dst_prefixlen, to the SPD audit messages to indicate the source and destination netmasks. These new fields are only included in the audit message when the netmask/prefix-length is less than the address length, i.e. the SPD entry applies to a network address and not a host address. Any reason why we don't just always include them? The audit folks seem to be very sensitive to the size/length of the audit messages, they prefer they be as small as possible. I thought that one way to save space would be to only print the prefix length information when the address referred to a network and not a single host. Would you prefer it if the prefix length information was always included in the audit message? Joy? Audit folks? -- paul moore linux security @ hp - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.25 (v2)] [TCP]: Two fixes to new sacktag code
On Wed, Nov 28, 2007 at 06:52:51PM +0200, Ilpo Järvinen wrote: On Wed, 28 Nov 2007, Ilpo Järvinen wrote: @@ -1575,7 +1575,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ continue; } - if (!before(start_seq, tcp_highest_sack_seq(tp))) { + if (tp-sacked_out after(start_seq, tcp_highest_sack_seq(tp))) { In this v2, of this patch I'll put !before back here instead, change to after is unnecessary here, checking that a sacked skb exists is enough to avoid skipping in the problematic head case. Thanks Ilpo, patch applied to net-2.6.25. BTW, how's the RB-tree stuff coming along? David wanted it to be in as early as possible so that it gets the maximum amount of testing before the next merge window. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tcp-illinois: incorrect beta usage
On Wed, Nov 28, 2007 at 03:47:25PM -0800, Stephen Hemminger wrote: Lachlan Andrew observed that my TCP-Illinois implementation uses the beta value incorrectly: The parameter beta in the paper specifies the amount to decrease *by*: that is, on loss,
    W <- W - beta*W
but tcp_illinois_ssthresh() uses beta as the amount to decrease *to*:
    W <- beta*W
This bug makes the Linux TCP-Illinois less aggressive on an uncongested network, hurting performance. Note: since the base beta value is .5, it has no impact on a congested network. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] Applied to net-2.6. Thanks Stephen! -- Herbert Xu
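A small numeric sketch makes the difference between "decrease by" and "decrease to" concrete. The fixed-point scale (beta expressed in 64ths), the floor of 2 and both function names are assumptions made for this illustration; they are not copied from tcp_illinois.c.

```c
#include <stdint.h>

#define BETA_SHIFT 6	/* assumed fixed-point scale: beta = value / 64 */

/* Intended behaviour: decrease the window BY beta * W (assumed floor of 2). */
static uint32_t ssthresh_decrease_by(uint32_t cwnd, uint32_t beta)
{
	uint32_t cut = (cwnd * beta) >> BETA_SHIFT;
	uint32_t w = cwnd - cut;

	return w > 2 ? w : 2;
}

/* Reported bug: decrease the window TO beta * W. */
static uint32_t ssthresh_decrease_to(uint32_t cwnd, uint32_t beta)
{
	uint32_t w = (cwnd * beta) >> BETA_SHIFT;

	return w > 2 ? w : 2;
}
```

With beta = 32 (that is, 0.5) and a window of 100, both functions return 50, which matches the note that the base value hides the bug. With beta = 8 (0.125) the intended form keeps 88 segments while the buggy form drops to 12.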
net-2.6.25 rebased
Hi: Just to let you all know that I've just rebased net-2.6.25 so that it now contains all of net-2.6 as it currently stands. Cheers, -- Herbert Xu
Re: [NET 00/02]: Remove NET_ACT_NAT dependency on NETFILTER
On Thu, Nov 29, 2007 at 10:57:34AM +0100, Patrick McHardy wrote: These patches remove the dependency of NET_ACT_NAT on NETFILTER by moving the netfilter checksum helpers to include/net/checksum and net/core/utils.c. I didn't find more appropriate locations, but I'd happily change it if someone suggests something better. Looks good to me. I've applied both to net-2.6.25. Thanks Patrick! -- Herbert Xu
[PATCH] NET: parentheses around definitions
There are multiplications in which these defines are misused:
drivers/net/netxen/netxen_nic_ethtool.c:705
drivers/net/s2io.c:350
--
Add parentheses to prevent operator precedence errors

Signed-off-by: Roel Kluin [EMAIL PROTECTED]
---
diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c
index cfb847b..b3c0a00 100644
--- a/drivers/net/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/netxen/netxen_nic_ethtool.c
@@ -86,7 +86,7 @@ static const char netxen_nic_gstrings_test[][ETH_GSTRING_LEN] = {
 	"Link_Test_on_offline"
 };

-#define NETXEN_NIC_TEST_LEN sizeof(netxen_nic_gstrings_test) / ETH_GSTRING_LEN
+#define NETXEN_NIC_TEST_LEN (sizeof(netxen_nic_gstrings_test) / ETH_GSTRING_LEN)

 #define NETXEN_NIC_REGS_COUNT 42
 #define NETXEN_NIC_REGS_LEN (NETXEN_NIC_REGS_COUNT * sizeof(__le32))
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 6326667..379d70b 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -346,7 +346,7 @@ static char ethtool_driver_stats_keys[][ETH_GSTRING_LEN] = {
 #define XFRAME_I_STAT_STRINGS_LEN ( XFRAME_I_STAT_LEN * ETH_GSTRING_LEN )
 #define XFRAME_II_STAT_STRINGS_LEN ( XFRAME_II_STAT_LEN * ETH_GSTRING_LEN )

-#define S2IO_TEST_LEN sizeof(s2io_gstrings) / ETH_GSTRING_LEN
+#define S2IO_TEST_LEN (sizeof(s2io_gstrings) / ETH_GSTRING_LEN)
 #define S2IO_STRINGS_LEN S2IO_TEST_LEN * ETH_GSTRING_LEN

 #define S2IO_TIMER_CONF(timer, handle, arg, exp) \
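The hazard the patch guards against can be reproduced in a few lines. The array, the macro names and the constant 960 below are hypothetical and chosen only so the arithmetic is easy to follow; the sketch uses a division because that is where the unparenthesized expansion visibly changes the result.

```c
#include <stddef.h>

#define STR_LEN 32

static const char names[][STR_LEN] = { "link", "loopback", "register" };

/* Unparenthesized: the expansion is spliced into the surrounding expression. */
#define COUNT_BAD  sizeof(names) / STR_LEN
/* Parenthesized: always evaluates as one unit. */
#define COUNT_GOOD (sizeof(names) / STR_LEN)

/* 960 / COUNT_BAD expands to 960 / sizeof(names) / STR_LEN = 960 / 96 / 32 = 0. */
static size_t per_entry_bad(void)
{
	return 960 / COUNT_BAD;
}

/* 960 / COUNT_GOOD expands to 960 / (96 / 32) = 960 / 3 = 320. */
static size_t per_entry_good(void)
{
	return 960 / COUNT_GOOD;
}
```

The macro itself is unchanged in meaning; only the parentheses decide whether a neighbouring operator can bind to part of it.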
[GIT PULL] SCTP bug fixes for net-2.6
Hi Herbert The following changes since commit a357dde9df33f28611e6a3d4f88265e39bcc8880: Stephen Hemminger (1): [TCP] illinois: Incorrect beta usage are available in the git repository at: aster.kernel.org:/pub/scm/linux/kernel/git/vxy/lksctp-dev.git pending Vlad Yasevich (5): SCTP: Fix the number of HB transmissions. SCTP: Fix SCTP-AUTH to correctly add HMACS paramter. SCTP: Fix the supported extensions paramter SCTP: Fix chunk acceptance when no authenticated chunks were listed. SCTP: Fix build issues with SCTP AUTH. include/net/sctp/constants.h |9 ++--- net/sctp/Kconfig |6 +++--- net/sctp/auth.c |4 +++- net/sctp/sm_make_chunk.c | 25 + net/sctp/sm_statefuns.c |2 +- 5 files changed, 22 insertions(+), 24 deletions(-) Thanks -vlad - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH (resubmit)][BRIDGE] Properly dereference the br_should_route_hook
On Fri, Nov 30, 2007 at 12:04:20AM +1100, Herbert Xu wrote: On Tue, Nov 27, 2007 at 07:21:08PM +0300, Pavel Emelyanov wrote: This hook is protected with the RCU, so simple if (br_should_route_hook) br_should_route_hook(...) is not enough on some architectures. Use the rcu_dereference/rcu_assign_pointer in this case. Fixed Stephen's comment concerning using the typeof(). Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Applied to net-2.6. Thanks Pavel! static void __exit ebtable_broute_fini(void) { - br_should_route_hook = NULL; + rcu_assign_pointer(br_should_route_hook, NULL); Just for the record, rcu_assign_pointer is never necessary when you're assigning NULL. The reason is that rcu_assign_pointer serves as a barrier between the initialisation of the content of what you're assigning and the actual assignment. Since NULL does not need to be initialised you don't need the barrier :) Of course, if the rcu_assign_pointer() of NULL is not on a hot code path, the extra memory barrier might not be hurting enough to care. Hmm, perhaps we could even build this logic into rcu_assign_pointer. That certainly is an interesting tradeoff... Save a memory barrier when assigning NULL, but pay an extra test and branch in all cases. Though it does make for a simpler rule -- just use rcu_assign_pointer() in all cases. Of course, if almost all rcu_assign_pointer() executions assign non-NULL pointers, the optimal strategy would be to leave the implementation of rcu_assign_pointer() alone, and simply enforce use of rcu_assign_pointer(), even if the pointer being assigned is NULL. For a rough guess, if fewer than a few percent of rcu_assign_pointer() executions assign NULL, then it is best to simply change the rule. If more than about ten percent of rcu_assign_pointer() executions assign NULL, then it would make sense to put the check into the rcu_assign_pointer() primitive. The percentages would be of dynamic executions, rather than static counts of lines of code. 
So, any intuitions on what fraction of the time rcu_assign_pointer() is assigning NULL? Failing that, what workload should be used to take the measurements? ;-)

> Then again, who still uses an Alpha? Mine died years ago :)

Although rcu_dereference() does a memory barrier only on Alpha, that of rcu_assign_pointer() is needed on any machine that does not preserve store order (Itanium, POWER, ARM, some MIPS boxes according to rumor, ...).

Thanx, Paul
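The tradeoff discussed in this thread can be sketched with C11 atomics in user space. This is an analogy under stated assumptions, not the kernel's implementation: publish(), publish_checked() and read_shared() are hypothetical names, a release store stands in for the barrier in rcu_assign_pointer(), and an acquire load stands in for rcu_dereference().

```c
#include <stdatomic.h>
#include <stddef.h>

struct item { int value; };

static _Atomic(struct item *) shared;

/* Always-ordered publish: earlier writes to *p become visible before the pointer. */
static void publish(struct item *p)
{
	atomic_store_explicit(&shared, p, memory_order_release);
}

/* Variant from the thread: skip the ordering for NULL, at the cost of a test and branch. */
static void publish_checked(struct item *p)
{
	if (p == NULL)
		atomic_store_explicit(&shared, p, memory_order_relaxed);
	else
		atomic_store_explicit(&shared, p, memory_order_release);
}

/* Reader side: acquire load used here as a stand-in for rcu_dereference(). */
static struct item *read_shared(void)
{
	return atomic_load_explicit(&shared, memory_order_acquire);
}
```

publish() corresponds to the simple rule of always using the ordered store, while publish_checked() saves the ordering for NULL but adds a branch to every call, which is the cost Paul weighs against the fraction of NULL assignments.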
[PATCHv7 0/5 + 3] Interface group patches
Hello,

This is the 7th version of our interface group patches.

The interface group value can be used to manage several interfaces at the same time, for example in netfilter/iptables. As discussed earlier, it can also be used for advanced routing, the tc command and so on [1].

A u_int32_t member was added to net devices holding the interface group number of the device, which can be read and set via netlink. The xt_ifgroup netfilter match checks this value with an optional mask.

Changes:
- The first patch of the previous version was split into 2 separate patches.
- The ip command now lets values larger than 0xff be set. Octal, decimal and hexadecimal values are valid, and in the range 0x00-0xff a name can be used instead (from /etc/iproute2/rt_ifgroup).
- Added sysfs support to read/write the ifgroup value.

The other patches are for userspace programs:
* iptables
* iproute2. Because kernel 2.6.24-rc1 introduced a new enum value, IFLA_NET_NS_PID, which wasn't in the iproute2 code, the first patch simply adds this value. The second patch adds support for interface groups.

Usage:
ip link set eth0 group 684 # set
ip link set eth0 group 0 # unset
iptables -A INPUT -m ifgroup --ifgroup-in 4/0xf -j ACCEPT
iptables -A FORWARD -m ifgroup --ifgroup-in 4 ! --ifgroup-out 5 -j DROP

Patches:
[1/5] Remove unnecessary locks from rtnetlink (in do_setlink)
[2/5] rtnetlink: send a single notification on device state changes
[3/5] Interface group: core (netlink) part
[4/5] Ifgroup read/write support in sysfs
[5/5] Netfilter Interface group match
[iptables] Interface group match
[iproute2 1/2] Added IFLA_NET_NS_PID as in kernel v2.6.24-rc1
[iproute2 2/2] Interface group as new ip link option

References:
[1] http://marc.info/?l=linux-netdev&m=119556459514598&w=2

-- Laszlo Attila Toth
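To illustrate what a rule such as --ifgroup-in 4/0xf tests, here is a minimal sketch of a masked group comparison. The function name and parameter names are hypothetical; the sketch assumes the match compares the masked device group with the requested group and that "!" inverts the outcome.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper; names are illustrative, not taken from the patches. */
static bool group_matches(uint32_t dev_group, uint32_t group, uint32_t mask, bool invert)
{
	bool hit = (dev_group & mask) == group;

	return hit != invert;	/* invert flips the result, as "!" does in a rule */
}
```

Under these assumptions, group 4 with mask 0xf accepts a device in group 0x14 (the low four bits are 4) and rejects one in group 0x15; an all-ones mask reduces the test to plain equality.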
[PATCHv7 3/5] Interface group: core (netlink) part
Interface groups allow several interfaces to be handled together. This patch modifies the net device structure and the netlink interface.

Signed-off-by: Laszlo Attila Toth [EMAIL PROTECTED]
---
 include/linux/if_link.h   |    2 ++
 include/linux/netdevice.h |    2 ++
 net/core/rtnetlink.c      |   11 +++++++++++
 3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 84c3492..722b25c 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -79,6 +79,8 @@ enum
 	IFLA_LINKINFO,
 #define IFLA_LINKINFO IFLA_LINKINFO
 	IFLA_NET_NS_PID,
+	IFLA_IFGROUP,
+#define IFLA_IFGROUP IFLA_IFGROUP
 	__IFLA_MAX
 };

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1e6af4f..b1bdcb2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -519,6 +519,8 @@ struct net_device
 	/* Interface index. Unique device identifier */
 	int ifindex;
 	int iflink;
+	/* interface group this interface belongs to */
+	u_int32_t ifgroup;

 	struct net_device_stats* (*get_stats)(struct net_device *dev);

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6be8608..61c7367 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -614,6 +614,7 @@ static inline size_t if_nlmsg_size(const struct net_device *dev)
 	       + nla_total_size(4) /* IFLA_MTU */
 	       + nla_total_size(4) /* IFLA_LINK */
 	       + nla_total_size(4) /* IFLA_MASTER */
+	       + nla_total_size(4) /* IFLA_IFGROUP */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + rtnl_link_get_size(dev); /* IFLA_LINKINFO */
@@ -651,6 +652,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (dev->master)
 		NLA_PUT_U32(skb, IFLA_MASTER, dev->master->ifindex);

+	if (dev->ifgroup)
+		NLA_PUT_U32(skb, IFLA_IFGROUP, dev->ifgroup);
+
 	if (dev->qdisc_sleeping)
 		NLA_PUT_STRING(skb, IFLA_QDISC, dev->qdisc_sleeping->ops->id);

@@ -889,6 +893,13 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 		}
 	}

+	if (tb[IFLA_IFGROUP]) {
+		if (dev->ifgroup != nla_get_u32(tb[IFLA_IFGROUP])) {
+			dev->ifgroup = nla_get_u32(tb[IFLA_IFGROUP]);
+			modified = 1;
+		}
+	}
+
 	err = 0;

 errout:
[PATCHv7 5/5] Netfilter Interface group match
Interface group values can be checked on both input and output interfaces. Signed-off-by: Laszlo Attila Toth [EMAIL PROTECTED] --- include/linux/netfilter/xt_ifgroup.h | 17 + net/netfilter/Kconfig| 10 +++ net/netfilter/Makefile |1 net/netfilter/xt_ifgroup.c | 120 ++ 4 files changed, 148 insertions(+), 0 deletions(-) diff --git a/include/linux/netfilter/xt_ifgroup.h b/include/linux/netfilter/xt_ifgroup.h new file mode 100644 index 000..3aa4d61 --- /dev/null +++ b/include/linux/netfilter/xt_ifgroup.h @@ -0,0 +1,17 @@ +#ifndef _XT_IFGROUP_H +#define _XT_IFGROUP_H + +#define XT_IFGROUP_INVERT_IN 0x01 +#define XT_IFGROUP_INVERT_OUT 0x02 +#define XT_IFGROUP_MATCH_IN0x04 +#define XT_IFGROUP_MATCH_OUT 0x08 + +struct xt_ifgroup_info { + u_int32_t in_group; + u_int32_t in_mask; + u_int32_t out_group; + u_int32_t out_mask; + u_int8_t flags; +}; + +#endif /*_XT_IFGROUP_H*/ diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 21a9fcc..07ee4a7 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -508,6 +508,16 @@ config NETFILTER_XT_MATCH_HELPER To compile it as a module, choose M here. If unsure, say Y. +config NETFILTER_XT_MATCH_IFGROUP + tristate 'ifgroup interface group match support' + depends on NETFILTER_XTABLES + help + Interface group matching allows you to match a packet by + its incoming interface group, settable using ip link set + group + + To compile it as a module, choose M here. If unsure, say N. 
+ config NETFILTER_XT_MATCH_LENGTH tristate 'length match support' depends on NETFILTER_XTABLES diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index ad0e36e..5107c86 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -61,6 +61,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_DSCP) += xt_dscp.o obj-$(CONFIG_NETFILTER_XT_MATCH_ESP) += xt_esp.o obj-$(CONFIG_NETFILTER_XT_MATCH_HASHLIMIT) += xt_hashlimit.o obj-$(CONFIG_NETFILTER_XT_MATCH_HELPER) += xt_helper.o +obj-$(CONFIG_NETFILTER_XT_MATCH_IFGROUP) += xt_ifgroup.o obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o diff --git a/net/netfilter/xt_ifgroup.c b/net/netfilter/xt_ifgroup.c new file mode 100644 index 000..712ee54 --- /dev/null +++ b/net/netfilter/xt_ifgroup.c @@ -0,0 +1,120 @@ +/* + * An x_tables match module to match interface groups + * + * (C) 2006,2007 Balazs Scheidler [EMAIL PROTECTED], + * Laszlo Attila Toth [EMAIL PROTECTED] + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#include linux/module.h +#include linux/skbuff.h + +#include linux/netfilter/xt_ifgroup.h +#include linux/netfilter/x_tables.h + +MODULE_LICENSE(GPL); +MODULE_AUTHOR(Laszlo Attila Toth [EMAIL PROTECTED]); +MODULE_DESCRIPTION(Xtables interface group matching module); +MODULE_ALIAS(ipt_ifgroup); +MODULE_ALIAS(ip6t_ifgroup); + + +static inline bool +ifgroup_match_in(const struct net_device *in, +const struct xt_ifgroup_info *info) +{ + return ((in-ifgroup info-in_mask) == info-in_group) ^ + ((info-flags XT_IFGROUP_INVERT_IN) == XT_IFGROUP_INVERT_IN); +} + +static inline bool +ifgroup_match_out(const struct net_device *out, +const struct xt_ifgroup_info *info) +{ + return ((out-ifgroup info-out_mask) == info-out_group) ^ + ((info-flags XT_IFGROUP_INVERT_OUT) == XT_IFGROUP_INVERT_OUT); +} + +static bool +ifgroup_match(const struct sk_buff *skb, +const struct net_device *in, +const struct net_device *out, +const struct xt_match *match, +const void *matchinfo, +int offset, +unsigned int protoff, +bool *hotdrop) +{ + const struct xt_ifgroup_info *info = matchinfo; + + if (info-flags XT_IFGROUP_MATCH_IN !ifgroup_match_in(in, info)) + return false; + if (info-flags XT_IFGROUP_MATCH_OUT !ifgroup_match_out(out, info)) + return false; + + return true; +} + +static bool ifgroup_checkentry(const char *tablename, const void *ip_void, + const struct xt_match *match, + void *matchinfo, unsigned int hook_mask) +{ + struct xt_ifgroup_info *info = matchinfo; + + if (!(info-flags (XT_IFGROUP_MATCH_IN|XT_IFGROUP_MATCH_OUT))) { + printk(KERN_ERR xt_ifgroup: neither incoming nor + outgoing device selected\n); + return false; + } + if (hook_mask (1 NF_INET_PRE_ROUTING | 1 NF_INET_LOCAL_IN) +
[PATCHv7 2/5] rtnetlink: send a single notification on device state changes
In do_setlink() a single notification is sent at the end of the function if any modification occurred. If the address has been changed, another notification is sent.

Signed-off-by: Laszlo Attila Toth [EMAIL PROTECTED]
---
 net/core/rtnetlink.c |   27 ++++++++++++++++++++-------
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f95c6c5..6be8608 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -542,7 +542,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id,

 EXPORT_SYMBOL_GPL(rtnl_put_cacheinfo);

-static void set_operstate(struct net_device *dev, unsigned char transition)
+static int set_operstate(struct net_device *dev, unsigned char transition)
 {
 	unsigned char operstate = dev->operstate;

@@ -563,8 +563,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)

 	if (dev->operstate != operstate) {
 		dev->operstate = operstate;
-		netdev_state_change(dev);
-	}
+		return 1;
+	} else
+		return 0;
 }

 static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
@@ -858,6 +859,7 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	if (tb[IFLA_BROADCAST]) {
 		nla_memcpy(dev->broadcast, tb[IFLA_BROADCAST], dev->addr_len);
 		send_addr_notify = 1;
+		modified = 1;
 	}

 	if (ifm->ifi_flags || ifm->ifi_change) {
@@ -870,14 +872,21 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 		dev_change_flags(dev, flags);
 	}

-	if (tb[IFLA_TXQLEN])
-		dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
+	if (tb[IFLA_TXQLEN]) {
+		if (dev->tx_queue_len != nla_get_u32(tb[IFLA_TXQLEN])) {
+			dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
+			modified = 1;
+		}
+	}

 	if (tb[IFLA_OPERSTATE])
-		set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
+		modified |= set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));

 	if (tb[IFLA_LINKMODE]) {
-		dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
+		if (dev->link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
+			dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
+			modified = 1;
+		}
 	}

 	err = 0;
@@ -891,6 +900,10 @@ errout:
 	if (send_addr_notify)
 		call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
+
+	if (modified)
+		netdev_state_change(dev);
+
 	return err;
 }
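The pattern in this patch (compare before storing, accumulate a flag, notify once at the end) can be sketched outside the kernel. Everything below is a hypothetical stand-in: struct dev_state, set_u32(), apply_settings() and the counter exist only for this illustration and do not mirror the rtnetlink code.

```c
#include <stdbool.h>

struct dev_state {
	unsigned int tx_queue_len;
	unsigned char link_mode;
};

static int notifications;	/* counts calls, for demonstration only */

static void notify_state_change(void)
{
	notifications++;
}

/* Store new_val if it differs; report whether anything changed. */
static bool set_u32(unsigned int *field, unsigned int new_val)
{
	if (*field == new_val)
		return false;
	*field = new_val;
	return true;
}

static void apply_settings(struct dev_state *d, unsigned int qlen, unsigned char mode)
{
	bool modified = false;

	modified |= set_u32(&d->tx_queue_len, qlen);
	if (d->link_mode != mode) {
		d->link_mode = mode;
		modified = true;
	}
	if (modified)
		notify_state_change();	/* at most one notification per call */
}
```

Calling apply_settings() with unchanged values sends nothing, and changing both fields in one call still produces a single notification.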
[PATCHv7 iproute2 2/2] Interface group as new ip link option
Interfaces can be grouped and each group has an unique positive integer ID. It can be set via ip link. Symbolic names can be specified in /etc/iproute2/rt_ifgroup. Any value of unsigned int32 is valid. Signed-off-by: Laszlo Attila Toth [EMAIL PROTECTED] diff --git a/include/linux/if_link.h b/include/linux/if_link.h index c948395..5a2d071 100644 --- a/include/linux/if_link.h +++ b/include/linux/if_link.h @@ -79,6 +79,8 @@ enum IFLA_LINKINFO, #define IFLA_LINKINFO IFLA_LINKINFO IFLA_NET_NS_PID, + IFLA_IFGROUP, +#defineIFLA_IFGROUP IFLA_IFGROUP __IFLA_MAX }; diff --git a/include/rt_names.h b/include/rt_names.h index 07a10e0..ea2d46a 100644 --- a/include/rt_names.h +++ b/include/rt_names.h @@ -8,11 +8,13 @@ char* rtnl_rtscope_n2a(int id, char *buf, int len); char* rtnl_rttable_n2a(__u32 id, char *buf, int len); char* rtnl_rtrealm_n2a(int id, char *buf, int len); char* rtnl_dsfield_n2a(int id, char *buf, int len); +char* rtnl_ifgroup_n2a(__u32 id, char *buf, int len); int rtnl_rtprot_a2n(__u32 *id, char *arg); int rtnl_rtscope_a2n(__u32 *id, char *arg); int rtnl_rttable_a2n(__u32 *id, char *arg); int rtnl_rtrealm_a2n(__u32 *id, char *arg); int rtnl_dsfield_a2n(__u32 *id, char *arg); +int rtnl_ifgroup_a2n(__u32 *id, char *arg); const char *inet_proto_n2a(int proto, char *buf, int len); int inet_proto_a2n(char *buf); diff --git a/ip/ipaddress.c b/ip/ipaddress.c index d1c6620..1ecbe03 100644 --- a/ip/ipaddress.c +++ b/ip/ipaddress.c @@ -227,6 +227,10 @@ int print_linkinfo(const struct sockaddr_nl *who, fprintf(fp, mtu %u , *(int*)RTA_DATA(tb[IFLA_MTU])); if (tb[IFLA_QDISC]) fprintf(fp, qdisc %s , (char*)RTA_DATA(tb[IFLA_QDISC])); + if (tb[IFLA_IFGROUP]) { + SPRINT_BUF(b1); + fprintf(fp, group %s , rtnl_ifgroup_n2a(*(int*)RTA_DATA(tb[IFLA_IFGROUP]), b1, sizeof(b1))); + } #ifdef IFLA_MASTER if (tb[IFLA_MASTER]) { SPRINT_BUF(b1); diff --git a/ip/iplink.c b/ip/iplink.c index f28f91c..cdef533 100644 --- a/ip/iplink.c +++ b/ip/iplink.c @@ -27,6 +27,7 @@ #include string.h 
#include sys/ioctl.h #include linux/sockios.h +#include linux/rtnetlink.h #include rt_names.h #include utils.h @@ -46,6 +47,7 @@ void iplink_usage(void) fprintf(stderr, promisc { on | off } |\n); fprintf(stderr, trailers { on | off } |\n); fprintf(stderr, txqueuelen PACKETS |\n); + fprintf(stderr, group GROUP |\n); fprintf(stderr, name NEWNAME |\n); fprintf(stderr, address LLADDR | broadcast LLADDR |\n); fprintf(stderr, mtu MTU }\n); @@ -146,6 +148,7 @@ static int iplink_have_newlink(void) static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv) { int qlen = -1; + __u32 group = 0; int mtu = -1; int len; char abuf[32]; @@ -198,6 +201,14 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv) if (get_integer(qlen, *argv, 0)) invarg(Invalid \txqueuelen\ value\n, *argv); addattr_l(req.n, sizeof(req), IFLA_TXQLEN, qlen, 4); + } else if (matches(*argv, group) == 0) { + NEXT_ARG(); + if (group != 0) + duparg(group, *argv); + + if (rtnl_ifgroup_a2n(group, *argv)) + invarg(\group\ value is invalid\n, *argv); + addattr_l(req.n, sizeof(req), IFLA_IFGROUP, group, sizeof(group)); } else if (strcmp(*argv, mtu) == 0) { NEXT_ARG(); if (mtu != -1) diff --git a/lib/rt_names.c b/lib/rt_names.c index 8d019a0..ec6638c 100644 --- a/lib/rt_names.c +++ b/lib/rt_names.c @@ -439,10 +439,72 @@ int rtnl_dsfield_a2n(__u32 *id, char *arg) } } - res = strtoul(arg, end, 16); + res = strtoul(arg, end, 0); if (!end || end == arg || *end || res 255) return -1; *id = res; return 0; } +static char * rtnl_rtifgroup_tab[256] = { + 0, +}; + +static int rtnl_rtifgroup_init; + +static void rtnl_rtifgroup_initialize(void) +{ + rtnl_rtifgroup_init = 1; + rtnl_tab_initialize(/etc/iproute2/rt_ifgroup, + rtnl_rtifgroup_tab, 256); +} + +char * rtnl_ifgroup_n2a(__u32 id, char *buf, int len) +{ + if (id=256) { + snprintf(buf, len, 0x%x, id); + return buf; + } + if (!rtnl_rtifgroup_tab[id]) { + if (!rtnl_rtifgroup_init) + rtnl_rtifgroup_initialize(); + } + if 
(rtnl_rtifgroup_tab[id]) +
[PATCHv7 1/5] Remove unnecessary locks from rtnetlink (in do_setlink)
The do_setlink function is protected by rtnl, additional locks are unnecessary, and the set_operstate() function is called from protected parts. Locks removed from both functions.

Signed-off-by: Laszlo Attila Toth [EMAIL PROTECTED]
---
 net/core/rtnetlink.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4a07e83..f95c6c5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -562,9 +562,7 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
 	}

 	if (dev->operstate != operstate) {
-		write_lock_bh(&dev_base_lock);
 		dev->operstate = operstate;
-		write_unlock_bh(&dev_base_lock);
 		netdev_state_change(dev);
 	}
 }
@@ -879,9 +877,7 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 		set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));

 	if (tb[IFLA_LINKMODE]) {
-		write_lock_bh(&dev_base_lock);
 		dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
-		write_unlock_bh(&dev_base_lock);
 	}

 	err = 0;
[PATCHv7 4/5] Ifgroup read/write support in sysfs
The ifgroup member of each net device can be read and changed in sysfs.

Author: Lutz Jaenicke [EMAIL PROTECTED]
---
 net/core/net-sysfs.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 61ead1d..5bd6d35 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -219,6 +219,20 @@ static ssize_t store_tx_queue_len(struct device *dev,
 	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
 }

+NETDEVICE_SHOW(ifgroup, fmt_hex);
+
+static int change_ifgroup(struct net_device *net, unsigned long new_ifgroup)
+{
+	net->ifgroup = new_ifgroup;
+	return 0;
+}
+
+static ssize_t store_ifgroup(struct device *dev, struct device_attribute *attr,
+			     const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, change_ifgroup);
+}
+
 static struct device_attribute net_class_attributes[] = {
 	__ATTR(addr_len, S_IRUGO, show_addr_len, NULL),
 	__ATTR(iflink, S_IRUGO, show_iflink, NULL),
@@ -235,6 +249,7 @@ static struct device_attribute net_class_attributes[] = {
 	__ATTR(flags, S_IRUGO | S_IWUSR, show_flags, store_flags),
 	__ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len,
 	       store_tx_queue_len),
+	__ATTR(ifgroup, S_IRUGO | S_IWUSR, show_ifgroup, store_ifgroup),
 	{}
 };
[PATCHv7 iptables] Interface group match
Interface group values can be checked on both input and output interfaces with an optional mask.

Signed-off-by: Laszlo Attila Toth [EMAIL PROTECTED]
---
 extensions/Makefile                  |    2 +-
 extensions/libxt_ifgroup.c           |  201 +++
 extensions/libxt_ifgroup.man         |   36 ++
 include/linux/netfilter/xt_ifgroup.h |   17 ++
 4 files changed, 255 insertions(+), 1 deletion(-)

Index: include/linux/netfilter/xt_ifgroup.h
===
--- include/linux/netfilter/xt_ifgroup.h	(revision 0)
+++ include/linux/netfilter/xt_ifgroup.h	(revision 0)
@@ -0,0 +1,17 @@
+#ifndef _XT_IFGROUP_H
+#define _XT_IFGROUP_H
+
+#define XT_IFGROUP_INVERT_IN	0x01
+#define XT_IFGROUP_INVERT_OUT	0x02
+#define XT_IFGROUP_MATCH_IN	0x04
+#define XT_IFGROUP_MATCH_OUT	0x08
+
+struct xt_ifgroup_info {
+	u_int32_t in_group;
+	u_int32_t in_mask;
+	u_int32_t out_group;
+	u_int32_t out_mask;
+	u_int8_t flags;
+};
+
+#endif /*_XT_IFGROUP_H*/
Index: extensions/libxt_ifgroup.c
===
--- extensions/libxt_ifgroup.c	(revision 0)
+++ extensions/libxt_ifgroup.c	(revision 0)
@@ -0,0 +1,201 @@
+/*
+ * Shared library add-on to iptables to match
+ * packets by the incoming interface group.
+ *
+ * (c) 2006, 2007 Balazs Scheidler [EMAIL PROTECTED],
+ * Laszlo Attila Toth [EMAIL PROTECTED]
+ */
+#include <stdio.h>
+#include <netdb.h>
+#include <string.h>
+#include <stdlib.h>
+#include <getopt.h>
+#include <xtables.h>
+#include <linux/netfilter/xt_ifgroup.h>
+
+static void
+ifgroup_help(void)
+{
+	printf(
+"ifgroup v%s options:\n"
+"  --ifgroup-in  [!] group[/mask]  incoming interface group and its mask\n"
+"  --ifgroup-out [!] group[/mask]  outgoing interface group and its mask\n"
+"\n", IPTABLES_VERSION);
+}
+
+static struct option opts[] = {
+	{"ifgroup-in", 1, NULL, '1'},
+	{"ifgroup-out", 1, NULL, '2'},
+	{ }
+};
+
+#define PARAM_MATCH_IN	0x01
+#define PARAM_MATCH_OUT	0x02
+
+
+#define IFGROUP_DEFAULT_MASK 0xU
+
+static int
+ifgroup_parse(int c, char **argv, int invert, unsigned int *flags,
+	      const void *entry, struct xt_entry_match **match)
+{
+	struct xt_ifgroup_info *info =
+		(struct xt_ifgroup_info *) (*match)->data;
+	char *end;
+
+	switch (c) {
+	case '1':
+		if (*flags & PARAM_MATCH_IN)
+			exit_error(PARAMETER_PROBLEM,
+				   "ifgroup match: Can't specify --ifgroup-in twice");
+
+		check_inverse(optarg, &invert, &optind, 0);
+
+		info->in_group = strtoul(optarg, &end, 0);
+		info->in_mask = IFGROUP_DEFAULT_MASK;
+
+		if (*end == '/')
+			info->in_mask = strtoul(end+1, &end, 0);
+
+		if (*end != '\0' || end == optarg)
+			exit_error(PARAMETER_PROBLEM,
+				   "ifgroup match: Bad ifgroup value `%s'", optarg);
+
+		if (invert)
+			info->flags |= XT_IFGROUP_INVERT_IN;
+
+		*flags |= PARAM_MATCH_IN;
+		info->flags |= XT_IFGROUP_MATCH_IN;
+		break;
+
+	case '2':
+		if (*flags & PARAM_MATCH_OUT)
+			exit_error(PARAMETER_PROBLEM,
+				   "ifgroup match: Can't specify --ifgroup-out twice");
+
+		check_inverse(optarg, &invert, &optind, 0);
+
+		info->out_group = strtoul(optarg, &end, 0);
+		info->out_mask = IFGROUP_DEFAULT_MASK;
+
+		if (*end == '/')
+			info->out_mask = strtoul(end+1, &end, 0);
+
+		if (*end != '\0' || end == optarg)
+			exit_error(PARAMETER_PROBLEM,
+				   "ifgroup match: Bad ifgroup value `%s'", optarg);
+
+		if (invert)
+			info->flags |= XT_IFGROUP_INVERT_OUT;
+
+		*flags |= PARAM_MATCH_OUT;
+		info->flags |= XT_IFGROUP_MATCH_OUT;
+		break;
+
+	default:
+		return 0;
+	}
+
+	return 1;
+}
+
+static void
+ifgroup_final_check(unsigned int flags)
+{
+	if (!flags)
+		exit_error(PARAMETER_PROBLEM,
+			   "You must specify either "
+			   "`--ifgroup-in' or `--ifgroup-out'");
+}
+
+static void
+ifgroup_print_value_in(struct xt_ifgroup_info *info)
+{
+	printf("0x%x", info->in_group);
+	if (info->in_mask != IFGROUP_DEFAULT_MASK)
+		printf("/0x%x", info->in_mask);
+	printf(" ");
+}
+
+static void
+ifgroup_print_value_out(struct xt_ifgroup_info *info)
+{
+	printf("0x%x", info->out_group);
+	if (info->out_mask != IFGROUP_DEFAULT_MASK)
+		printf("/0x%x", info->out_mask);
+	printf(" ");
+}
+
+static void
+ifgroup_print(const void *ip,
+	      const struct
[PATCHv7 iproute2 1/2] Added IFLA_NET_NS_PID as in kernel v2.6.24-rc1
Signed-off-by: Laszlo Attila Toth [EMAIL PROTECTED]

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 23b3a8e..c948395 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -78,6 +78,7 @@ enum
 	IFLA_LINKMODE,
 	IFLA_LINKINFO,
 #define IFLA_LINKINFO IFLA_LINKINFO
+	IFLA_NET_NS_PID,
 	__IFLA_MAX
 };
 
-- 
1.5.2.5
Re: [PATCHv6 iptables]Interface group match
Laszlo Attila Toth wrote:
> Lutz Jaenicke wrote:
>> On Tue, Nov 20, 2007 at 02:14:28PM +0100, Laszlo Attila Toth wrote:
>>> Interface group values can be checked on both input and output interfaces with optional mask.
>>>
>>> Index: extensions/libxt_ifgroup.c
>>> ===
>>> --- extensions/libxt_ifgroup.c	(revision 0)
>>> +++ extensions/libxt_ifgroup.c	(revision 0)
>>> +	info->in_group = strtoul(optarg, &end, 0);
>>
>> This is somewhat inconsistent with the iproute patch which targets specific groups (with names). Should iptables be allowed to read /etc/iproute2/rt_ifgroup?
>
> It would be good but cannot be used if a mask is set and only values less than 256 can be used with names.

Why 256? I can see no such limitation. For masks you could simply allow to define masks in rt_ifgroup too and use name/name or simply name/0xmask.

>> There is no standard API like getservbyname()... The code of iproute2 should be copied.
>
> If Patrick says it is ok, I'll write this part.

Of course. Please put the tab part somewhere common, I always wanted to have named firewall marks shared with ip and tc and I believe Balazs wanted that too :)
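The numeric group[/mask] syntax under discussion can be sketched as a small standalone parser built on strtoul(). This is an illustration under stated assumptions, not the code from the patch: sketch_parse_group and SKETCH_DEFAULT_MASK are hypothetical names, the all-ones default mask is assumed, and name lookup in rt_ifgroup is not attempted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed default: compare all 32 bits when no mask is given. */
#define SKETCH_DEFAULT_MASK 0xFFFFFFFFU

/*
 * Parse "value[/mask]" into *group and *mask. Both numbers may be
 * decimal, octal or hex (base auto-detected by strtoul()).
 * Returns 0 on success, -1 on malformed input; the outputs are only
 * written on success.
 */
static int sketch_parse_group(const char *arg, uint32_t *group, uint32_t *mask)
{
	char *end;
	unsigned long g, m = SKETCH_DEFAULT_MASK;

	errno = 0;
	g = strtoul(arg, &end, 0);
	if (end == arg || errno == ERANGE || g > UINT32_MAX)
		return -1;

	if (*end == '/') {
		const char *mstr = end + 1;

		errno = 0;
		m = strtoul(mstr, &end, 0);
		/* An empty mask after the slash is treated as an error. */
		if (end == mstr || errno == ERANGE || m > UINT32_MAX)
			return -1;
	}

	if (*end != '\0')	/* trailing garbage, e.g. "5x" */
		return -1;

	*group = (uint32_t)g;
	*mask = (uint32_t)m;
	return 0;
}
```

A symbolic name such as "lan" is rejected here; supporting names would mean consulting a shared table first and only falling back to the numeric parser, which is the part the thread proposes to share with iproute2.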
[PATCH][DECNET] dn_nl_deladdr() almost always returns no error
As far as I can see from the initialization of the err variable, the dn_nl_deladdr() routine was designed to report errors like EADDRNOTAVAIL and probably ENODEV. But the code sets err to 0 after the first nlmsg_parse() and carries on, returning this 0 in every case.

Was this done deliberately, or is the patch below correct?

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
---
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 66e266f..3bc82dc 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -651,16 +651,18 @@ static int dn_nl_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct dn_dev *dn_db;
 	struct ifaddrmsg *ifm;
 	struct dn_ifaddr *ifa, **ifap;
-	int err = -EADDRNOTAVAIL;
+	int err;
 
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, dn_ifa_policy);
 	if (err < 0)
 		goto errout;
 
+	err = -ENODEV;
 	ifm = nlmsg_data(nlh);
 	if ((dn_db = dn_dev_by_index(ifm->ifa_index)) == NULL)
 		goto errout;
 
+	err = -EADDRNOTAVAIL;
 	for (ifap = &dn_db->ifa_list; (ifa = *ifap); ifap = &ifa->ifa_next) {
 		if (tb[IFA_LOCAL] &&
 		    nla_memcmp(tb[IFA_LOCAL], &ifa->ifa_local, 2))
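The fix relies on a simple idiom: set err to the failure code of the next step just before attempting it, so that every goto errout carries the right value. A minimal userspace sketch of that idiom follows. The device table and all sketch_ names are hypothetical, and parse_result merely stands in for the return value of nlmsg_parse().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a short address list. */
struct sketch_dev {
	int index;
	int addrs[4];
	size_t naddrs;
};

static struct sketch_dev sketch_devs[] = {
	{ .index = 1, .addrs = { 10, 20 }, .naddrs = 2 },
};

static struct sketch_dev *sketch_dev_by_index(int index)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_devs) / sizeof(sketch_devs[0]); i++)
		if (sketch_devs[i].index == index)
			return &sketch_devs[i];
	return NULL;
}

static int sketch_deladdr(int parse_result, int ifindex, int addr)
{
	struct sketch_dev *dev;
	size_t i;
	int err;

	err = parse_result;		/* a successful parse leaves err == 0 */
	if (err < 0)
		goto errout;

	err = -ENODEV;			/* re-arm before the device lookup */
	dev = sketch_dev_by_index(ifindex);
	if (dev == NULL)
		goto errout;

	err = -EADDRNOTAVAIL;		/* re-arm before the address search */
	for (i = 0; i < dev->naddrs; i++) {
		if (dev->addrs[i] != addr)
			continue;
		/* Delete by moving the last entry into the freed slot. */
		dev->addrs[i] = dev->addrs[--dev->naddrs];
		return 0;
	}
errout:
	return err;
}
```

Without the two re-arming assignments, err would still hold the 0 left by the successful parse, and both failure paths would report success, which is the behaviour the patch description points out.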
Re: [PATCHv7 0/5 + 3] Interface group patches
Laszlo Attila Toth wrote:
> Hello,
>
> This is the 7th version of our interface group patches.
>
> Patches:
> [1/5] Remove unnecessary locks from rtnetlink (in do_setlink)
> [2/5] rtnetlink: send a single notification on device state changes
> [3/5] Interface group: core (netlink) part
> [4/5] Ifgroup read/write support in sysfs

I vote for these to go in, they're ready and there's no use in reposting them again and again.

> [5/5] Netfilter Interface group match

Then I'd queue this one and fix it up on top of my current tree.

> [iptables]Interface group match

This one I would queue until we have released the 1.4.0 version of iptables. I don't want to release things that are not in at least a -rc kernel yet.

> [iproute2 1/2] Added IFLA_NET_NS_PID as in kernel v2.6.24-rc1
> [iproute2 2/2] Interface group as new ip link option

And for these Stephen has to decide, but both look fine to me.
Re: [PATCHv6 iptables]Interface group match
Patrick McHardy wrote:
> Laszlo Attila Toth wrote:
>> Lutz Jaenicke wrote:
>>> On Tue, Nov 20, 2007 at 02:14:28PM +0100, Laszlo Attila Toth wrote:
>>>> Interface group values can be checked on both input and output interfaces with optional mask.
>>>>
>>>> Index: extensions/libxt_ifgroup.c
>>>> ===
>>>> --- extensions/libxt_ifgroup.c	(revision 0)
>>>> +++ extensions/libxt_ifgroup.c	(revision 0)
>>>> +	info->in_group = strtoul(optarg, &end, 0);
>>>
>>> This is somewhat inconsistent with the iproute patch which targets specific groups (with names). Should iptables be allowed to read /etc/iproute2/rt_ifgroup?
>>
>> It would be good but cannot be used if a mask is set and only values less than 256 can be used with names.
>
> Why 256? I can see no such limitation. For masks you could simply allow to define masks in rt_ifgroup too and use name/name or simply name/0xmask.

256 because it is the size of a static array (and I don't want to allocate too much memory when other arrays such as the routing table names also have this size). In the current version I posted some minutes ago 0..2^32-1 can be used.

The syntax name/0xmask is simply too strange for me.

>>> There is no standard API like getservbyname()... The code of iproute2 should be copied.
>>
>> If Patrick says it is ok, I'll write this part.
>
> Of course. Please put the tab part somewhere common, I always wanted to have named firewall marks shared with ip and tc and I believe Balazs wanted that too :)

Ok. Yes, he wants :)

-- 
Attila
Re: [PATCHv6 iptables]Interface group match
Laszlo Attila Toth wrote:
> Patrick McHardy wrote:
>> Laszlo Attila Toth wrote:
>>> Lutz Jaenicke wrote:
>>>> Should iptables be allowed to read /etc/iproute2/rt_ifgroup?
>>>
>>> It would be good but cannot be used if a mask is set and only values less than 256 can be used with names.
>>
>> Why 256? I can see no such limitation. For masks you could simply allow to define masks in rt_ifgroup too and use name/name or simply name/0xmask.
>
> 256 because it is the size of a static array (and I don't want allocate too much memory when other arrays such as the routing table names also have this size). In the current version I posted some minutes ago 0..2^32-1 can be used.

It's a hash. You can put as much in there as you like :)

> The syntax name/0xmask is simply too strange for me.

Then how about name/name with masks also defined in rt_ifgroup? The same question applies for marks of course.

>>>> There is no standard API like getservbyname()... The code of iproute2 should be copied.
>>>
>>> If Patrick says it is ok, I'll write this part.
>>
>> Of course. Please put the tab part somewhere common, I always wanted to have named firewall marks shared with ip and tc and I believe Balazs wanted that too :)
>
> Ok. Yes, he wants :)
Re: [PATCH] [RFC] New driver sfc for Solarstorm SFC4000 controller
Stephen Hemminger wrote:
> On Fri, 23 Nov 2007 17:08:15 + Ben Hutchings [EMAIL PROTECTED] wrote:
>> 1. When we enable NAPI polling, we need to set __LINK_STATE_START in the net device used for NAPI. This bit is commented as private in netdevice.h, but e1000 also does this. Is this incorrect?
>
> Why are you using it directly?

It seems this line is historic and we can remove it.

> The driver is pretty big (28K loc), and non trivial to get a good review.

We are aware that it appears to be a large amount of code. The driver does support many types of PHY (10Gbase-T, XFP, CX4) on five different 10G reference designs and one 1G NIC ref design. There is also support for two generations of controller silicon, full ethtool support, start of day self-tests and an mtd driver for putting PXE images into flash.

Perhaps we could help making it more reviewable by suggesting some sets of files that can be reviewed together that represent different parts of the functionality. The main functionality is contained in efx.c, rx.c, tx.c and falcon.c if that helps.

> Minor note:
> * use u8 not uint8_t (etc.)
> * gone overboard with docbook style comments, they are only needed on external API's.
> * Please use dev_err() rather than reinventing own message macros.

OK - we will look at addressing these as well as checkpatch violations and resubmit a patch shortly.

> * why are you exporting symbol's?

We need a small API so that other drivers can use parts of the hardware after the main driver has performed initialisation. One example is the mtd driver to access the flash. Another example is an accelerated driver for Xen that Solarflare is in the process of submitting.

Regards
-- 
Rob Stonehouse
Re: [PATCHv7 0/5 + 3] Interface group patches
Patrick McHardy wrote:
> Laszlo Attila Toth wrote:
>> Hello,
>>
>> This is the 7th version of our interface group patches.
>>
>> Patches:
>> [1/5] Remove unnecessary locks from rtnetlink (in do_setlink)
>> [2/5] rtnetlink: send a single notification on device state changes
>> [3/5] Interface group: core (netlink) part
>> [4/5] Ifgroup read/write support in sysfs
>
> I vote for these to go in, they're ready and there's no use in reposting them again and again.

I see, sorry. In fact, I didn't miss it. But you said that removing the locks in rtnl needs a separate patch. This is why I resent _all_.

>> [iptables]Interface group match
>
> This one I would queue until we have released the 1.4.0 version of iptables. I don't want to release things that are not in at least a -rc kernel yet.

Later I'll resend it in two patches, one for extending iptables with hash tables and one for the ifgroup match.

>> [iproute2 1/2] Added IFLA_NET_NS_PID as in kernel v2.6.24-rc1
>> [iproute2 2/2] Interface group as new ip link option
>
> And for these Stephen has to decide, but both look fine to me.

-- 
Attila
[patch 0/1] s390: ctc patch for 2.6.24
The following patch is intended for 2.6.24 and repairs the ctc driver by introducing alloc_netdev().
Re: [PATCH][DECNET] dn_nl_deladdr() almost always returns no error
Hi,

On Thu, Nov 29, 2007 at 07:29:20PM +0300, Pavel Emelyanov wrote:
> As far as I see from the err variable initialization the dn_nl_deladdr() routine was designed to report errors like EADDRNOTAVAIL and probably ENODEV. But the code sets this err to 0 after the first nlmsg_parse and goes on, returning this 0 in any case.
>
> Is this made deliberately, or the patch below is correct?

The patch looks good to me.

> Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

Acked-by: Steven Whitehouse [EMAIL PROTECTED]

Steve.

> ---
> diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
> index 66e266f..3bc82dc 100644
> --- a/net/decnet/dn_dev.c
> +++ b/net/decnet/dn_dev.c
> @@ -651,16 +651,18 @@ static int dn_nl_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
>  	struct dn_dev *dn_db;
>  	struct ifaddrmsg *ifm;
>  	struct dn_ifaddr *ifa, **ifap;
> -	int err = -EADDRNOTAVAIL;
> +	int err;
>  
>  	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, dn_ifa_policy);
>  	if (err < 0)
>  		goto errout;
>  
> +	err = -ENODEV;
>  	ifm = nlmsg_data(nlh);
>  	if ((dn_db = dn_dev_by_index(ifm->ifa_index)) == NULL)
>  		goto errout;
>  
> +	err = -EADDRNOTAVAIL;
>  	for (ifap = &dn_db->ifa_list; (ifa = *ifap); ifap = &ifa->ifa_next) {
>  		if (tb[IFA_LOCAL] &&
>  		    nla_memcmp(tb[IFA_LOCAL], &ifa->ifa_local, 2))
Re: [PATCH RFC] [1/9] Core module symbol namespaces code and intro.
On Mon, 26 Nov 2007 10:25:33 -0800
> Agreed. On first glance, I was intrigued but:
> 1) Why is everyone so concerned that export symbol space is large?
> - does it cost cpu or running memory?

yes. about 120 bytes per symbol

> - does it cause bugs?

yes, bad apis are causing bugs... sys_open is just the starter of that.

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, visit http://www.lesswatts.org
Re: [patch 1/1] ctc: make use of alloc_netdev()
On Thu, 29 Nov 2007 17:36:27 +0100 Ursula Braun [EMAIL PROTECTED] wrote: From: Peter Tiedemann [EMAIL PROTECTED] Currently ctc-device initialization is broken (kernel bug in ctc_new_device). The new network namespace code reveals a deficiency of the ctc driver. It should make use of alloc_netdev() as described in Documentation/networking/netdevices.txt. Signed-off-by: Peter Tiedemann [EMAIL PROTECTED] Signed-off-by: Ursula Braun [EMAIL PROTECTED] --- drivers/s390/net/ctcmain.c | 45 - 1 file changed, 16 insertions(+), 29 deletions(-) Index: linux-2.6-uschi/drivers/s390/net/ctcmain.c === --- linux-2.6-uschi.orig/drivers/s390/net/ctcmain.c +++ linux-2.6-uschi/drivers/s390/net/ctcmain.c @@ -2782,35 +2782,14 @@ ctc_probe_device(struct ccwgroup_device } /** - * Initialize everything of the net device except the name and the - * channel structs. + * Device setup function called by alloc_netdev(). + * + * @param dev Device to be setup. */ -static struct net_device * -ctc_init_netdevice(struct net_device * dev, int alloc_device, -struct ctc_priv *privptr) +void ctc_init_netdevice(struct net_device * dev) { - if (!privptr) - return NULL; - DBF_TEXT(setup, 3, __FUNCTION__); - if (alloc_device) { - dev = kzalloc(sizeof(struct net_device), GFP_KERNEL); - if (!dev) - return NULL; - } - - dev-priv = privptr; - privptr-fsm = init_fsm(ctcdev, dev_state_names, - dev_event_names, CTC_NR_DEV_STATES, CTC_NR_DEV_EVENTS, - dev_fsm, DEV_FSM_LEN, GFP_KERNEL); - if (privptr-fsm == NULL) { - if (alloc_device) - kfree(dev); - return NULL; - } - fsm_newstate(privptr-fsm, DEV_STATE_STOPPED); - fsm_settimer(privptr-fsm, privptr-restart_timer); if (dev-mtu == 0) dev-mtu = CTC_BUFSIZE_DEFAULT - LL_HEADER_LENGTH - 2; dev-hard_start_xmit = ctc_tx; @@ -2823,7 +2802,7 @@ ctc_init_netdevice(struct net_device * d dev-type = ARPHRD_SLIP; dev-tx_queue_len = 100; dev-flags = IFF_POINTOPOINT | IFF_NOARP; - return dev; + SET_MODULE_OWNER(dev); } @@ -2879,14 +2858,22 @@ ctc_new_device(struct ccwgroup_device 
*c ccw_device_set_online (cdev[1]) failed with ret = %d\n, ret); } - dev = ctc_init_netdevice(NULL, 1, privptr); - + dev = alloc_netdev(0, ctc%d, ctc_init_netdevice); if (!dev) { ctc_pr_warn(ctc_init_netdevice failed\n); goto out; } + dev-priv = privptr; Why not use standard private data area, rather than allocating it separately? -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv6 iptables]Interface group match
On Nov 29 2007 17:27, Patrick McHardy wrote:
>> The syntax name/0xmask is simply too strange for me.
>
> Then how about name/name with masks also defined in rt_ifgroup? The same question applies for marks of course.

I would find that confusing, which is why the new xt_TOS only allows names when no /mask or a mask of /allbits is used.

>>>>> There is no standard API like getservbyname()... The code of iproute2 should be copied.
>>>>
>>>> If Patrick says it is ok, I'll write this part.
>>>
>>> Of course. Please put the tab part somewhere common, I always wanted to have named firewall marks shared with ip and tc and I believe Balazs wanted that too :)
>>
>> Ok. Yes, he wants :)

So, we are going to see a librtnl?
Re: [PATCHv6 iptables]Interface group match
Jan Engelhardt wrote:
> On Nov 29 2007 17:27, Patrick McHardy wrote:
>>> The syntax name/0xmask is simply too strange for me.
>>
>> Then how about name/name with masks also defined in rt_ifgroup? The same question applies for marks of course.
>
> I would find that confusing, which is why the new xt_TOS only allows names when no /mask or a mask of /allbits is used.

It's still useful, you don't have to use it :) Another alternative would be to allow defining names to val/mask.
Re: + xfrm_policy-warning-fix.patch added to -mm tree
On Fri, 30 Nov 2007 00:51:33 +1100 Herbert Xu [EMAIL PROTECTED] wrote:
> On Wed, Nov 28, 2007 at 02:56:51AM -0800, [EMAIL PROTECTED] wrote:
>> The patch titled
>>      xfrm_policy warning fix
>> has been added to the -mm tree. Its filename is
>>      xfrm_policy-warning-fix.patch
>>
>> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>>
>> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find out what to do about this
>>
>> --
>> Subject: xfrm_policy warning fix
>> From: Andrew Morton [EMAIL PROTECTED]
>>
>> Fix this:
>>
>> net/xfrm/xfrm_policy.c: In function '__xfrm_lookup':
>> net/xfrm/xfrm_policy.c:1449: warning: 'dst' may be used uninitialized in this function
>>
>> by checking for impossible values in the switch().
>
> Thanks Andrew. I've added the following patch to net-2.6.
>
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> commit 5e5234ff17ef98932688116025b30958bd28a940
> Author: Herbert Xu [EMAIL PROTECTED]
> Date: Fri Nov 30 00:50:31 2007 +1100
>
>     [IPSEC]: Fix uninitialised dst warning in __xfrm_lookup
>
>     Andrew Morton reported that __xfrm_lookup generates this warning:
>
>     net/xfrm/xfrm_policy.c: In function '__xfrm_lookup':
>     net/xfrm/xfrm_policy.c:1449: warning: 'dst' may be used uninitialized in this function
>
>     This is because if policy->action is of an unexpected value then dst will not be initialised. Of course, in practice this should never happen since the input layer xfrm_user/af_key will filter out all illegal values. But the compiler doesn't know that of course.
>
>     So this patch fixes this by taking the conservative approach and treat all unknown actions the same as a blocking action.
>
>     Thanks to Andrew for finding this and providing an initial fix.
>
>     Signed-off-by: Herbert Xu [EMAIL PROTECTED]
>
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index b702bd8..9a4cf2e 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1344,6 +1344,7 @@ restart:
>  	xfrm_nr += pols[0]->xfrm_nr;
>  
>  	switch (policy->action) {
> +	default:
>  	case XFRM_POLICY_BLOCK:
>  		/* Prohibit the flow */
>  		err = -EPERM;

hm. If someone feeds a bad value into here we want to know about it rather than silently fixing it up, don't we?
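The two positions in this exchange can be combined: treat an unknown action as a block, but also record that it happened. The sketch below shows that shape in userspace C. The enum, the counter and sketch_check() are hypothetical, and the fprintf() stands in for whatever kernel-side reporting one would choose, which the thread does not settle.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical policy actions; the names do not come from the kernel. */
enum sketch_action {
	SKETCH_ALLOW = 0,
	SKETCH_BLOCK = 1,
};

/* Number of unexpected action values seen so far. */
static unsigned int sketch_bad_actions;

/*
 * Returns 0 and sets *dst for an allowed flow, -EPERM otherwise.
 * Unknown actions are blocked (the conservative choice) but are also
 * counted and reported rather than being fixed up silently.
 */
static int sketch_check(int action, int *dst)
{
	switch (action) {
	default:
		sketch_bad_actions++;
		fprintf(stderr, "unexpected policy action %d, blocking\n", action);
		/* fall through */
	case SKETCH_BLOCK:
		return -EPERM;
	case SKETCH_ALLOW:
		*dst = 1;
		return 0;
	}
}
```

Because every path through the switch either assigns *dst or returns an error, the caller never reads an uninitialised value, which is what the compiler warning was about.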
[PATCH 0/4] Sysctl namespace support
Currently the network namespace work has gotten about as far as it can without the ability to make sysctls that are per network namespace.

The techniques we have been using for other namespaces, examining current and replacing the ctl_table.data field depending on the namespace instance that current->nsproxy refers to, are both ugly and do not work for the network sysctls.

The case in handling the networking sysctls that does not work with the existing ugly pointer munging techniques is directories like /proc/sys/net/ipv4/conf/ and /proc/sys/net/ipv4/neigh/, whose contents vary depending on the networking devices present in the network namespace.

Adding support to the sysctl infrastructure for registering a sysctl table for a particular instance of a particular namespace removes the need for magic sysctl methods, and allows the use of the techniques for managing dynamic sysctl tables that have been used for years in the network stack.

Herbert, we need this infrastructure most in net-2.6.25 (as not having it is a current bottleneck to further development of the network namespace), so these patches are against net-2.6.25.

Andrew, we also need this infrastructure in -mm so that we can take advantage of it when implementing other namespaces.

So I expect the sane way to deal with this patchset is to merge it into both net-2.6.25 and -mm, and then Andrew can drop or disable the patches once he bases -mm on a version of net-2.6.25 with the changes.

Eric
[PATCH 1/4] sysctl: Add register_sysctl_paths function
There are a number of modules that register a sysctl table somewhere deeply nested in the sysctl hierarchy, such as fs/nfs, fs/xfs, dev/cdrom, etc. They all specify several dummy ctl_tables for the path name.

This patch implements register_sysctl_paths, which takes an additional path name and makes up dummy sysctl nodes for each component.

This patch was originally written by Olaf Kirch and brought to my attention and reworked some by Olaf Hering. I have changed a few additional things so the bugs are mine.

After converting all of the easy callers Olaf Hering observed allyesconfig ARCH=i386, the patch reduces the final binary size by 9369 bytes.

.text +897
.data -7008

    text    data     bss      dec     hex filename
26959310 4045899 4718592 35723801 2211a19 ../vmlinux-vanilla
26960207 4038891 4718592 35717690 221023a ../O-allyesconfig/vmlinux

So this change is both a space savings and a code simplification.

CC: Olaf Kirch [EMAIL PROTECTED]
CC: Olaf Hering [EMAIL PROTECTED]
Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/sysctl.h |    9 +++++++++
 kernel/sysctl.c        |   90
 2 files changed, 84 insertions(+), 15 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index e99171f..eb522bf 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1065,7 +1065,16 @@ struct ctl_table_header
 	struct completion *unregistering;
 };
 
+/* struct ctl_path describes where in the hierarchy a table is added */
+struct ctl_path
+{
+	const char *procname;
+	int ctl_name;
+};
+
 struct ctl_table_header *register_sysctl_table(struct ctl_table * table);
+struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
+						struct ctl_table *table);
 
 void unregister_sysctl_table(struct ctl_table_header * table);
 int sysctl_check_table(struct ctl_table *table);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0deed82..fa92e70 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1490,11 +1490,12 @@ static __init int sysctl_init(void)
 core_initcall(sysctl_init);
 
 /**
- * register_sysctl_table - register a sysctl hierarchy
+ * register_sysctl_paths - register a sysctl hierarchy
+ * @path: The path to the directory the sysctl table is in.
  * @table: the top-level table structure
  *
  * Register a sysctl table hierarchy. @table should be a filled in ctl_table
- * array. An entry with a ctl_name of 0 terminates the table.
+ * array. A completely 0 filled entry terminates the table.
  *
  * The members of the struct ctl_table structure are used as follows:
  *
@@ -1557,28 +1558,80 @@ core_initcall(sysctl_init);
  * This routine returns %NULL on a failure to register, and a pointer
  * to the table header on success.
  */
-struct ctl_table_header *register_sysctl_table(struct ctl_table * table)
+struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
+						struct ctl_table *table)
 {
-	struct ctl_table_header *tmp;
-	tmp = kmalloc(sizeof(struct ctl_table_header), GFP_KERNEL);
-	if (!tmp)
+	struct ctl_table_header *header;
+	struct ctl_table *new, **prevp;
+	unsigned int n, npath;
+
+	/* Count the path components */
+	for (npath = 0; path[npath].ctl_name || path[npath].procname; ++npath)
+		;
+
+	/*
+	 * For each path component, allocate a 2-element ctl_table array.
+	 * The first array element will be filled with the sysctl entry
+	 * for this, the second will be the sentinel (ctl_name == 0).
+	 *
+	 * We allocate everything in one go so that we don't have to
+	 * worry about freeing additional memory in unregister_sysctl_table.
+	 */
+	header = kzalloc(sizeof(struct ctl_table_header) +
+			 (2 * npath * sizeof(struct ctl_table)), GFP_KERNEL);
+	if (!header)
 		return NULL;
-	tmp->ctl_table = table;
-	INIT_LIST_HEAD(&tmp->ctl_entry);
-	tmp->used = 0;
-	tmp->unregistering = NULL;
-	sysctl_set_parent(NULL, table);
-	if (sysctl_check_table(tmp->ctl_table)) {
-		kfree(tmp);
+
+	new = (struct ctl_table *) (header + 1);
+
+	/* Now connect the dots */
+	prevp = &header->ctl_table;
+	for (n = 0; n < npath; ++n, ++path) {
+		/* Copy the procname */
+		new->procname = path->procname;
+		new->ctl_name = path->ctl_name;
+		new->mode = 0555;
+
+		*prevp = new;
+		prevp = &new->child;
+
+		new += 2;
+	}
+	*prevp = table;
+
+	INIT_LIST_HEAD(&header->ctl_entry);
+	header->used = 0;
+	header->unregistering = NULL;
+	sysctl_set_parent(NULL, header->ctl_table);
+	if (sysctl_check_table(header->ctl_table)) {
+
Re: wireless vs. alignment requirements
Herbert Xu wrote:
> On Tue, Nov 27, 2007 at 09:16:07AM -0800, H. Peter Anvin wrote:
>> I wrote a patch for the IP stack to realign packets if necessary at one point. I should dredge it up again and submit it for collective flamage.
>
> As long as it doesn't penalise Ethernet (e.g., the 10Gb crowd :) it would be good to have. Thanks,

Uhm, most cards affected *ARE* Ethernet cards, due to the bloody 14-byte header. But it doesn't affect architectures which have alignment, and the cost of scanning a properly-aligned packet is minimal. I'll try to find it some time today.

-hpa
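The remark about the 14-byte header comes down to simple arithmetic: 14 is not a multiple of 4, so a frame that starts on a 4-byte boundary puts the IP header 2 bytes off. A tiny sketch of that calculation follows. The names are hypothetical, and the 2-byte pad is the same idea as the kernel's NET_IP_ALIGN reservation, assuming the driver and hardware are able to apply it.

```c
#include <assert.h>
#include <stddef.h>

/* Ethernet header: 6-byte destination, 6-byte source, 2-byte EtherType. */
#define SKETCH_ETH_HLEN 14

/*
 * Distance (0..3) of the IP header from the previous 4-byte boundary
 * when the frame starts `pad` bytes into a 4-byte-aligned buffer.
 */
static size_t sketch_ip_misalign(size_t pad)
{
	return (pad + SKETCH_ETH_HLEN) % 4;
}
```

With no padding the IP header lands 2 bytes past a boundary. Reserving 2 bytes in front of the frame brings it back to a multiple of 4, but only if the device can receive into a buffer that is itself offset by 2 bytes, which is where the trouble with some hardware starts.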
[PATCH 2/4] sysctl: Remember the ctl_table we passed to register_sysctl_paths
By doing this we allow users of register_sysctl_paths that build and dynamically allocate their ctl_table to be simpler. This allows them to just remember the ctl_table_header returned from register_sysctl_paths, from which they can now find the ctl_table array they need to free.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/sysctl.h |    1 +
 kernel/sysctl.c        |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index eb522bf..8b2e9e0 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1063,6 +1063,7 @@ struct ctl_table_header
 	struct list_head ctl_entry;
 	int used;
 	struct completion *unregistering;
+	struct ctl_table *ctl_table_arg;
 };
 
 /* struct ctl_path describes where in the hierarchy a table is added */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index fa92e70..effae87 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1598,6 +1598,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 		new += 2;
 	}
 	*prevp = table;
+	header->ctl_table_arg = table;
 
 	INIT_LIST_HEAD(&header->ctl_entry);
 	header->used = 0;
-- 
1.5.3.rc6.17.g1911
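The path handling from patch 1/4 together with the field added here can be sketched in userspace. This is a simplified model under stated assumptions, not the kernel code: the sketch_ structures keep only a name and a child pointer, the path is a NULL-terminated array of strings rather than struct ctl_path, and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified table entry; a NULL procname terminates an array. */
struct sketch_table {
	const char *procname;
	struct sketch_table *child;	/* directory contents, or NULL */
};

struct sketch_header {
	struct sketch_table *table;	/* root of the generated chain */
	struct sketch_table *table_arg;	/* the table the caller passed in */
};

/*
 * Build one directory entry per path component and hang `table` under
 * the last one. Header and directory entries share one allocation, so
 * a single free() of the header releases all generated nodes.
 */
static struct sketch_header *
sketch_register_paths(const char *const *path, struct sketch_table *table)
{
	struct sketch_header *header;
	struct sketch_table *new, **prevp;
	size_t n, npath;

	/* Count the path components. */
	for (npath = 0; path[npath]; ++npath)
		;

	/* Two entries per component: the directory and a zeroed sentinel. */
	header = calloc(1, sizeof(*header) + 2 * npath * sizeof(*new));
	if (!header)
		return NULL;

	new = (struct sketch_table *)(header + 1);

	prevp = &header->table;
	for (n = 0; n < npath; ++n) {
		new->procname = path[n];
		*prevp = new;		/* link from parent (or header) */
		prevp = &new->child;	/* next link goes under this dir */
		new += 2;		/* skip over the sentinel */
	}
	*prevp = table;

	/* Remember the caller's table so it can be found and freed later. */
	header->table_arg = table;
	return header;
}
```

A caller that allocated its table dynamically only has to keep the returned header: header->table_arg gives back the original array, while header->table points at the generated directory chain when a path was supplied.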
[PATCH 4/4] net: Implement the per network namespace sysctl infrastructure
The user interface is: register_net_sysctl_table and unregister_net_sysctl_table. Very much like the current interface except there is a network namespace parameter.

With this any sysctl registered with register_net_sysctl_table will only show up to tasks in the same network namespace. All other sysctls continue to be globally visible.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/net/net_namespace.h |    9 +++
 net/sysctl_net.c            |   57 +++
 2 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4d0d634..235214c 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -25,6 +25,8 @@ struct net {
 	struct proc_dir_entry	*proc_net_stat;
 	struct proc_dir_entry	*proc_net_root;
 
+	struct list_head	sysctl_table_headers;
+
 	struct net_device	*loopback_dev;	/* The loopback */
 
 	struct list_head	dev_base_head;
@@ -144,4 +146,11 @@ extern void unregister_pernet_subsys(struct pernet_operations *);
 extern int register_pernet_device(struct pernet_operations *);
 extern void unregister_pernet_device(struct pernet_operations *);
 
+struct ctl_path;
+struct ctl_table;
+struct ctl_table_header;
+extern struct ctl_table_header *register_net_sysctl_table(struct net *net,
+	const struct ctl_path *path, struct ctl_table *table);
+extern void unregister_net_sysctl_table(struct ctl_table_header *header);
+
 #endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index cd4eafb..c50c793 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -14,6 +14,7 @@
 
 #include <linux/mm.h>
 #include <linux/sysctl.h>
+#include <linux/nsproxy.h>
 
 #include <net/sock.h>
 
@@ -54,3 +55,59 @@ struct ctl_table net_table[] = {
 #endif
 	{ 0 },
 };
+
+static struct list_head *
+net_ctl_header_lookup(struct ctl_table_root *root, struct nsproxy *namespaces)
+{
+	return &namespaces->net_ns->sysctl_table_headers;
+}
+
+static struct ctl_table_root net_sysctl_root = {
+	.lookup = net_ctl_header_lookup,
+};
+
+static int sysctl_net_init(struct net *net)
+{
+	INIT_LIST_HEAD(&net->sysctl_table_headers);
+	return 0;
+}
+
+static void sysctl_net_exit(struct net *net)
+{
+	WARN_ON(!list_empty(&net->sysctl_table_headers));
+	return;
+}
+
+static struct pernet_operations sysctl_pernet_ops = {
+	.init = sysctl_net_init,
+	.exit = sysctl_net_exit,
+};
+
+static __init int sysctl_init(void)
+{
+	int ret;
+	ret = register_pernet_subsys(&sysctl_pernet_ops);
+	if (ret)
+		goto out;
+	register_sysctl_root(&net_sysctl_root);
+out:
+	return ret;
+}
+subsys_initcall(sysctl_init);
+
+struct ctl_table_header *register_net_sysctl_table(struct net *net,
+	const struct ctl_path *path, struct ctl_table *table)
+{
+	struct nsproxy namespaces;
+	namespaces = *current->nsproxy;
+	namespaces.net_ns = net;
+	return __register_sysctl_paths(&net_sysctl_root,
+					&namespaces, path, table);
+}
+EXPORT_SYMBOL_GPL(register_net_sysctl_table);
+
+void unregister_net_sysctl_table(struct ctl_table_header *header)
+{
+	return unregister_sysctl_table(header);
+}
+EXPORT_SYMBOL_GPL(unregister_net_sysctl_table);
--
1.5.3.rc6.17.g1911
[PATCH 3/4] sysctl: Infrastructure for per namespace sysctls
This patch implements the basic infrastructure for per namespace sysctls.

A list of lists of sysctl headers is added, allowing each namespace to have its own list of sysctl headers.

Each list of sysctl headers has a lookup function to find the first sysctl header in the list, allowing the lists to have a per namespace instance.

register_sysctl_root is added to tell sysctl.c about additional lists of sysctl_headers. As all of the users are expected to be in kernel no unregister function is provided.

sysctl_head_next is updated to walk through the list of lists.

__register_sysctl_paths is added to add a new sysctl table on a non-default sysctl list.

The only intrusive part of this patch is propagating the information to decide which list of sysctls to use for sysctl_check_table.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
---
 include/linux/sysctl.h |   16 -
 kernel/sysctl.c        |   93 ++--
 kernel/sysctl_check.c  |   25 +++--
 3 files changed, 111 insertions(+), 23 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 8b2e9e0..cd1da5c 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -951,7 +951,9 @@ enum
 
 /* For the /proc/sys support */
 struct ctl_table;
+struct nsproxy;
 extern struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev);
+extern struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces, struct ctl_table_header *prev);
 extern void sysctl_head_finish(struct ctl_table_header *prev);
 extern int sysctl_perm(struct ctl_table *table, int op);
 
@@ -1055,6 +1057,13 @@ struct ctl_table
 	void *extra2;
 };
 
+struct ctl_table_root {
+	struct list_head root_list;
+	struct list_head header_list;
+	struct list_head *(*lookup)(struct ctl_table_root *root,
+					   struct nsproxy *namespaces);
+};
+
 /* struct ctl_table_header is used to maintain dynamic lists of
    struct ctl_table trees. */
 struct ctl_table_header
@@ -1064,6 +1073,7 @@ struct ctl_table_header
 	int used;
 	struct completion *unregistering;
 	struct ctl_table *ctl_table_arg;
+	struct ctl_table_root *root;
 };
 
 /* struct ctl_path describes where in the hierarchy a table is added */
@@ -1073,12 +1083,16 @@ struct ctl_path
 	int ctl_name;
 };
 
+void register_sysctl_root(struct ctl_table_root *root);
+struct ctl_table_header *__register_sysctl_paths(
+	struct ctl_table_root *root, struct nsproxy *namespaces,
+	const struct ctl_path *path, struct ctl_table *table);
 struct ctl_table_header *register_sysctl_table(struct ctl_table * table);
 struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						struct ctl_table *table);
 
 void unregister_sysctl_table(struct ctl_table_header * table);
-int sysctl_check_table(struct ctl_table *table);
+int sysctl_check_table(struct nsproxy *namespaces, struct ctl_table *table);
 
 #else /* __KERNEL__ */
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index effae87..ad4b709 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -156,8 +156,16 @@ static int proc_dointvec_taint(struct ctl_table *table, int write, struct file *
 #endif
 
 static struct ctl_table root_table[];
-static struct ctl_table_header root_table_header =
-	{ root_table, LIST_HEAD_INIT(root_table_header.ctl_entry) };
+static struct ctl_table_root sysctl_table_root;
+static struct ctl_table_header root_table_header = {
+	.ctl_table = root_table,
+	.ctl_entry = LIST_HEAD_INIT(sysctl_table_root.header_list),
+	.root = &sysctl_table_root,
+};
+static struct ctl_table_root sysctl_table_root = {
+	.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
+	.header_list = LIST_HEAD_INIT(root_table_header.ctl_entry),
+};
 
 static struct ctl_table kern_table[];
 static struct ctl_table vm_table[];
@@ -1300,12 +1308,27 @@ void sysctl_head_finish(struct ctl_table_header *head)
 	spin_unlock(&sysctl_lock);
 }
 
-struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
+static struct list_head *
+lookup_header_list(struct ctl_table_root *root, struct nsproxy *namespaces)
 {
+	struct list_head *header_list;
+	header_list = &root->header_list;
+	if (root->lookup)
+		header_list = root->lookup(root, namespaces);
+	return header_list;
+}
+
+struct ctl_table_header *__sysctl_head_next(struct nsproxy *namespaces,
+					    struct ctl_table_header *prev)
+{
+	struct ctl_table_root *root;
+	struct list_head *header_list;
 	struct ctl_table_header *head;
 	struct list_head *tmp;
+
 	spin_lock(&sysctl_lock);
 	if (prev) {
+		head = prev;
 		tmp = &prev->ctl_entry;
 		unuse_table(prev);
 		goto next;
@@ -1319,14 +1342,38 @@ struct
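The lookup step in this patch (a root either uses its own embedded list or asks a callback for a per-namespace list) can be tried out in userspace. The sketch below mirrors lookup_header_list() from the diff, but the surrounding types are reduced stand-ins, and the net_headers field and per_ns_lookup() are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel types. */
struct list_head {
	struct list_head *next, *prev;
};

struct nsproxy {
	struct list_head *net_headers;	/* hypothetical: this namespace's header list */
};

struct ctl_table_root {
	struct list_head header_list;	/* default, globally visible list */
	struct list_head *(*lookup)(struct ctl_table_root *root,
				    struct nsproxy *namespaces);
};

/* Same logic as in the patch: fall back to the root's own list without a callback. */
static struct list_head *
lookup_header_list(struct ctl_table_root *root, struct nsproxy *namespaces)
{
	struct list_head *header_list = &root->header_list;

	if (root->lookup)
		header_list = root->lookup(root, namespaces);
	return header_list;
}

/* Hypothetical callback: selects the list owned by the caller's namespace. */
static struct list_head *
per_ns_lookup(struct ctl_table_root *root, struct nsproxy *namespaces)
{
	(void)root;
	return namespaces->net_headers;
}
```

A root without a callback behaves like the old global list, while a root with a callback yields a different list for each namespace, which is how two tasks in different namespaces would end up seeing different sysctl headers.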
Re: [PATCH 0/3] cxgb - driver fixes.
Divy Le Ray wrote:
> Jeff,
>
> I'm submitting a patch series for inclusion in 2.6.24 for the cxgb driver. The patches are built against Linus' git tree.
>
> Here is a brief description:
> - Ensure that GSO skbs have enough headroom before encapsulating them,
> - Fix a crash in NAPI mode,
> - Fix statistics accounting and report.

We ran pktgen overnight on 2.6.23 with patch 1 and 3 applied (patch 2 not needed on .23 it seems) and it was stable at about 1.5Gbps bi-directional using 1500 MTU sized frames. We'll run some more tests with user-space TCP & UDP today, but it looks good so far.

Perhaps these patches should be considered for .23 stable as well?

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc http://www.candelatech.com
Netchannels. The 21'th release.
Hi.

This is the 21st release of netchannels, a peer-to-peer, protocol-agnostic communication channel between hardware and users. It uses a unified cache to store channels, and allows buffers for data to be allocated from a userspace-mapped area or from another preallocated set of pages (like the VFS cache). All protocol processing happens in process context. A user of the system can be, for example, userspace: it can receive and send traffic on the wire without any kernel interference, implement its own protocols, and offload their processing to the hardware.

This idea was originally proposed and implemented by Van Jacobson. This patchset (with the userspace network stack) is a logical continuation of the idea, moving to full peer-to-peer processing. One of its users is the userspace network stack [2].

Short changelog:
 * fixed queue length usage
 * fixed dst release path. Both problems were reported by Salvatore Del Popolo [EMAIL PROTECTED]
 * removed nat user

1. Netchannels homepage.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=netchannel

2. Userspace network stack.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=unetstack

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index 2697e92..3231b22 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,4 @@ ENTRY(sys_call_table)
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_netchannel_control
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index b4aa875..d35d4d8 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -718,4 +718,5 @@ ia32_sys_call_table:
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
+	.quad sys_netchannel_control
 ia32_syscall_end:
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index beeeaf6..33242f8 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -325,10 +325,11 @@
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_netchannel_control	320
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 321
 #include <linux/err.h>
 
 /*
diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 777288e..16f1aac 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -619,8 +619,10 @@ __SYSCALL(__NR_sync_file_range, sys_sync_file_range)
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_netchannel_control	280
+__SYSCALL(__NR_netchannel_control, sys_netchannel_control)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_netchannel_control
 
 #ifdef __KERNEL__
 #include <linux/err.h>
diff --git a/include/linux/connector.h b/include/linux/connector.h
index 4c02119..bdf6432 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -36,9 +36,11 @@
 #define CN_VAL_CIFS			0x1
 #define CN_W1_IDX			0x3	/* w1 communication */
 #define CN_W1_VAL			0x1
+#define CN_NETCHANNELS_IDX		0x04	/* Netchannels connection control */
+#define CN_NETCHANNELS_VAL		0x01
 
 
-#define CN_NETLINK_USERS		4
+#define CN_NETLINK_USERS		5
 
 /*
  * Maximum connector's message size.
diff --git a/include/linux/netchannel.h b/include/linux/netchannel.h
new file mode 100644
index 000..c56afc5
--- /dev/null
+++ b/include/linux/netchannel.h
@@ -0,0 +1,175 @@
+/*
+ * 	netchannel.h
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __NETCHANNEL_H
+#define __NETCHANNEL_H
+
+#include <linux/types.h>
+
+enum netchannel_commands {
+	NETCHANNEL_CREATE = 0,
+};
+
+enum netchannel_type {
+	NETCHANNEL_EMPTY = 0,
+	NETCHANNEL_COPY_USER,
+	NETCHANNEL_NAT,
+	NETCHANNEL_MAX
+};
+
+/*
+ * Destination and source addresses/ports are from receiving point ov view,
+ *
Re: [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
Argh - mind the line breaks...
sln

On Nov 29, 2007 12:08 PM, Nelson, Shannon [EMAIL PROTECTED] wrote:
> [RFC] I/OAT: Handle incoming udp through ioatdma
>
> From: Shannon Nelson [EMAIL PROTECTED]
>
> If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try pushing it through the ioatdma asynchronous memcpy. This is very much the same as the tcp copy offload. This is an RFC because we know there are stability problems under high traffic.
>
> This code was originally proposed by the Capstone students at Portland State University: Aaron Armstrong, Greg Nishikawa, Sean Gayner, Toai Nguyen, Stephen Bekefi, and Derek Chiles.
>
> Signed-off-by: Shannon Nelson [EMAIL PROTECTED]
> ---
> [...]
[PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
[RFC] I/OAT: Handle incoming udp through ioatdma

From: Shannon Nelson [EMAIL PROTECTED]

If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try pushing it through the ioatdma asynchronous memcpy. This is very much the same as the tcp copy offload. This is an RFC because we know there are stability problems under high traffic.

This code was originally proposed by the Capstone students at Portland State University: Aaron Armstrong, Greg Nishikawa, Sean Gayner, Toai Nguyen, Stephen Bekefi, and Derek Chiles.

Signed-off-by: Shannon Nelson [EMAIL PROTECTED]
---

 include/net/udp.h   |    5 +++
 net/core/user_dma.c |    1 +
 net/ipv4/udp.c      |   79 ---
 3 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 98755eb..d5e05d8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -173,4 +173,9 @@ extern void udp_proc_unregister(struct udp_seq_afinfo *afinfo);
 extern int udp4_proc_init(void);
 extern void udp4_proc_exit(void);
 #endif
+
+#ifdef CONFIG_NET_DMA
+extern int sysctl_udp_dma_copybreak;
+#endif
+
 #endif /* _UDP_H */
diff --git a/net/core/user_dma.c b/net/core/user_dma.c
index 0ad1cd5..e876ca4 100644
--- a/net/core/user_dma.c
+++ b/net/core/user_dma.c
@@ -34,6 +34,7 @@
 #define NET_DMA_DEFAULT_COPYBREAK 4096
 
 int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
+int sysctl_udp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
 
 /**
  *	dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 69d4bd1..3b6d91c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -102,6 +102,8 @@
 #include <net/route.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
+#include <net/netdma.h>
+#include <linux/dmaengine.h>
 #include "udp_impl.h"
 
 /*
@@ -819,6 +821,11 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	unsigned int ulen, copied;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
+#ifdef CONFIG_NET_DMA
+	struct dma_chan *dma_chan = NULL;
+	struct dma_pinned_list *pinned_list = NULL;
+	dma_cookie_t dma_cookie = 0;
+#endif
 
 	/*
 	 *	Check any passed addresses
@@ -829,6 +836,18 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (flags & MSG_ERRQUEUE)
 		return ip_recv_error(sk, msg, len);
 
+#ifdef CONFIG_NET_DMA
+	preempt_disable();
+	if ((len > sysctl_udp_dma_copybreak) &&
+	    !(flags & MSG_PEEK) &&
+	    __get_cpu_var(softnet_data).net_dma) {
+
+		preempt_enable_no_resched();
+		pinned_list = dma_pin_iovec_pages(msg->msg_iov, len);
+	} else
+		preempt_enable_no_resched();
+#endif
+
 try_again:
 	skb = skb_recv_datagram(sk, flags, noblock, &err);
 	if (!skb)
@@ -852,10 +871,30 @@ try_again:
 			goto csum_copy_err;
 	}
 
-	if (skb_csum_unnecessary(skb))
-		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
-					      msg->msg_iov, copied );
-	else {
+	if (skb_csum_unnecessary(skb)) {
+#ifdef CONFIG_NET_DMA
+		if (pinned_list && !dma_chan)
+			dma_chan = get_softnet_dma();
+		if (dma_chan) {
+			dma_cookie = dma_skb_copy_datagram_iovec(
+				dma_chan, skb, sizeof(struct udphdr),
+				msg->msg_iov, copied, pinned_list);
+			if (dma_cookie < 0) {
+				printk(KERN_ALERT "dma_cookie < 0\n");
+
+				/* Exception. Bailout! */
+				if (!copied)
+					copied = -EFAULT;
+				goto out_free;
+			}
+			err = 0;
+		}
+		else
+#endif
+			err = skb_copy_datagram_iovec(skb,
+					sizeof(struct udphdr),
+					msg->msg_iov, copied);
+	} else {
 		err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);
 
 		if (err == -EINVAL)
@@ -882,6 +921,35 @@ try_again:
 	if (flags & MSG_TRUNC)
 		err = ulen;
 
+#ifdef CONFIG_NET_DMA
+	if (dma_chan) {
+		struct sk_buff *skb;
+		dma_cookie_t done, used;
+
+		dma_async_memcpy_issue_pending(dma_chan);
+
+		while (dma_async_memcpy_complete(dma_chan, dma_cookie, &done,
+						 &used) == DMA_IN_PROGRESS) {
+			/* do partial cleanup of sk_async_wait_queue */
+			while ((skb = skb_peek(&sk->sk_async_wait_queue)) &&
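The copybreak test at the top of udp_recvmsg() in this RFC decides whether a receive is worth handing to the DMA engine. As a rough userspace sketch of that decision only (the function name and the MSG_PEEK_FLAG constant are hypothetical stand-ins; nothing here touches real DMA hardware):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for MSG_PEEK, renamed so it cannot clash with system headers. */
#define MSG_PEEK_FLAG 0x2

/*
 * Mirrors the three conditions in the patch: the request must be larger
 * than the copybreak threshold, must not be a peek, and a DMA channel
 * must be available on this CPU.
 */
static bool should_offload_copy(size_t len, int flags, size_t copybreak,
				bool dma_available)
{
	return len > copybreak && !(flags & MSG_PEEK_FLAG) && dma_available;
}
```

With the default threshold of 4096 from NET_DMA_DEFAULT_COPYBREAK, a 4096-byte read stays on the CPU copy path and a 4097-byte read qualifies for offload, presumably because pinning pages and programming the engine only pays off for larger copies.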
[PATCH 2/2] [IPSEC]: Reinject v6 packet on input instead of calling netfilter
2 of 2

cheers,
jamal

bins8dZMGaaLp.bin
Description: application/mbox
Re: [PATCH 2/2] [IPSEC]: Reinject v6 packet on input instead of calling netfilter
jamal wrote:

[ can't quote because non-inline attachment ]

I think Yoshifuji had some objections to this because extension headers will be processed twice.
[PATCH 0/2] [IPSEC]: Reinject packet instead of calling netfilter directly on input
Herbert,

This is a simplified version of one of your earlier patches that never made it in. I liked it so much that I reduced it to this and in fact, given the cycles today, tested it (with transport and tunnel mode only ;-).

We re-inject a decrypted ipsec packet (other than tunnel mode) back and let it bubble up the network stack. This improves debuggability (since sniffers like tcpdump can see the packet) and usability, since ingress tc filters can act on it.

I've broken it down into two: IPv4 and IPv6.

If you want to go through the xfrm reinject() method, then I am gonna need more time to resubmit, or you be my guest and go for it and I will test it.

cheers,
jamal
[PATCH 1/2] [IPSEC]: Reinject v4 packet on input instead of calling netfilter
1 of 2.

cheers,
jamal

binjYANAm8J6C.bin
Description: application/mbox
Re: [PATCH 2/2] [IPSEC]: Reinject v6 packet on input instead of calling netfilter
jamal wrote:
> On Thu, 2007-29-11 at 21:55 +0100, Patrick McHardy wrote:
>> jamal wrote:
>>
>> [ can't quote because non-inline attachment ]
>
> Evolution seems to have whitespace issues everytime i inlined the attachment; and Dave has been able to tolerate me doing this so far. I have just read it in

It used to work fine for me as well, the Debian switch to icedove broke it. Never mind, I'm sure it's going to get fixed some day :)

>> I think Yoshifuji had some objections to this because extension headers will be processed twice.
>
> ah, i missed that part. Could you point to a specific portion?

http://lists.openwall.net/netdev/2007/10/16/88

> I wouldnt mind just ipv4 going in - but that would be lacking consistency. Is there anything that can be done to get the extension headers to be processed only once?

I would prefer to keep things consistent between IPv4 and IPv6. Not sure if anything could be done, perhaps we could keep the necessary parts of the IP6CB and skip parsing up to the ESP nexthdr.
Re: [PATCH 2/2] [IPSEC]: Reinject v6 packet on input instead of calling netfilter
On Thu, 2007-29-11 at 21:55 +0100, Patrick McHardy wrote:
> jamal wrote:
>
> [ can't quote because non-inline attachment ]

Evolution seems to have whitespace issues every time I inlined the attachment; and Dave has been able to tolerate me doing this so far. I have just read it in

> I think Yoshifuji had some objections to this because extension headers will be processed twice.

ah, I missed that part. Could you point to a specific portion?
I wouldn't mind just ipv4 going in - but that would be lacking consistency. Is there anything that can be done to get the extension headers to be processed only once?

cheers,
jamal

From 83d91d3c6f5df027a446b575af8dd4a3fdf90148 Mon Sep 17 00:00:00 2001
From: Jamal Hadi Salim [EMAIL PROTECTED]
Date: Thu, 29 Nov 2007 15:41:21 -0500
Subject: [PATCH 2/2] [IPSEC]: Reinject v6 packet on input instead of calling netfilter

This is the ipv6 version. Derived from an earlier down-trodden patch from Herbert. We re-inject a decrypted ipsec packet back and let it bubble up the network stack. This improves packet debuggability (since sniffers like tcpdump can see the packet) and ingress tc filters can act on it.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]
---
 net/ipv6/xfrm6_input.c |   23 ++-
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index e2c3efd..c741fba 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -33,19 +33,24 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	skb_network_header(skb)[IP6CB(skb)->nhoff] =
 		XFRM_MODE_SKB_CB(skb)->protocol;
 
-#ifdef CONFIG_NETFILTER
+	if (async)
+		return ip6_rcv_finish(skb);
+
 	ipv6_hdr(skb)->payload_len = htons(skb->len);
 	__skb_push(skb, skb->data - skb_network_header(skb));
 
-	NF_HOOK(PF_INET6, NF_INET_PRE_ROUTING, skb, skb->dev, NULL,
-		ip6_rcv_finish);
-	return -1;
-#else
-	if (async)
-		return ip6_rcv_finish(skb);
+	dst_release(skb->dst);
+	skb->dst = NULL;
+	{
+		/* make some packet-sock user (eg tcpdump) happy */
+		const unsigned char *old_mac;
+		old_mac = skb_mac_header(skb);
+		skb_set_mac_header(skb, -skb->mac_len);
+		memmove(skb_mac_header(skb), old_mac, skb->mac_len);
+	}
 
-	return 1;
-#endif
+	netif_rx(skb);
+	return 0;
 }
 
 int xfrm6_rcv(struct sk_buff *skb)
--
1.4.4.1.gaed4
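The block in the patch that keeps packet-socket users happy moves the link-layer header so that it sits directly in front of the network header again before the packet is re-injected. A minimal sketch of that byte shuffle on a flat buffer follows; the function name and offsets are hypothetical, and a real sk_buff tracks these positions itself rather than taking them as arguments.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical illustration: copy a MAC header of mac_len bytes from
 * old_mac_off so that it ends exactly where the network header begins
 * (net_off). Returns the new MAC header offset. memmove() is used
 * because the old and new locations may overlap.
 */
static size_t move_mac_header(unsigned char *buf, size_t old_mac_off,
			      size_t net_off, size_t mac_len)
{
	size_t new_mac_off;

	assert(net_off >= mac_len);
	new_mac_off = net_off - mac_len;
	memmove(buf + new_mac_off, buf + old_mac_off, mac_len);
	return new_mac_off;
}
```

For a 14-byte Ethernet header originally at offset 0 and a network header that now starts at offset 30 (an assumed figure standing in for the space left once the IPsec header is removed), the header is copied to offsets 16..29, so a sniffer reading backwards from the network header finds a valid link-layer header.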
Re: [PATCH 0/2] netem: trace enhancement
Stephen Hemminger wrote:
> Still interested in this. I got part way through integrating it but had concerns about the API from the application to netem for getting the data. It seemed like there ought to be a better way to do it that could handle large data sets better, but never really got a good solution worked out (that is why I never said anything).

Would spreading them over multiple netlink messages work? A different, slightly ugly possibility would be to simply use copy_from_user, netlink is synchronous now (still better than using configfs IMO).
Re: [PATCH 2/2] [IPSEC]: Reinject v6 packet on input instead of calling netfilter
On Thu, 2007-29-11 at 22:21 +0100, Patrick McHardy wrote:
> http://lists.openwall.net/netdev/2007/10/16/88
>
>> I wouldnt mind just ipv4 going in - but that would be lacking consistency. Is there anything that can be done to get the extension headers to be processed only once?
>
> I would prefer to keep things consistent between IPv4 and IPv6.

Makes sense.

> Not sure if anything could be done, perhaps we could keep the necessary parts of the IP6CB and skip parsing up to the ESP nexthdr.

I will compute in the background and talk to Yoshfuji (hopefully will bump into him next week ;-).
Herbert, if you have any clever ideas please shoot.

cheers,
jamal
Re: [PATCH 0/2] netem: trace enhancement
On Tue, 27 Nov 2007 14:57:26 +0100 Ariane Keller [EMAIL PROTECTED] wrote:
> I just wanted to ask whether there is a general interest in this patch. If yes: great, how to proceed? otherwise: please let me know why.
>
> Thanks!
>
> Ariane Keller wrote:
>> Hi Stephen
>>
>> Approximately a year ago we discussed an enhancement to netem, which we called trace control for netem. We obtain the value for the packet delay, drop, duplication and corruption from a so called trace file. The trace file may be obtained by monitoring network traffic and thus enables us to emulate real world network behavior.
>>
>> Traces can ether be generated individually (we supply a set of tools to do this) or can be downloaded from our homepage: http://tcn.hypert.net .
>>
>> Since our last submission on 2006-12-15 we did some code clean up and have created two new patches one against kernel 2.6.23.8 and one against iproute2-2.6.23.
>>
>> To refer to our discussion from last year please have a look at messages with subject LARTC: trace control for netem.
>>
>> We are looking forward for any comments, suggestions and instructions to bring the trace enhancement to the kernel and to iproute2.
>>
>> Thanks,
>> Ariane

Still interested in this. I got part way through integrating it but had concerns about the API from the application to netem for getting the data. It seemed like there ought to be a better way to do it that could handle large data sets better, but never really got a good solution worked out (that is why I never said anything).

The 2.6.23.8 patch seems to be unavailable right now.

--
Stephen Hemminger [EMAIL PROTECTED]
Re: ZD1211RW unaligned accesses...
So, did the patch below fix the problem? Should I apply it? John On Sat, Nov 24, 2007 at 11:02:16PM +0800, Herbert Xu wrote: On Wed, Nov 21, 2007 at 01:00:44PM +, Shaddy Baddah wrote: It hasn't seemed to. I patched the source (confirming the patched lines are in), compiled, installed and rebooted to effect the changes. My zd1211rw modules timestamp indicates that I have an updated module: Thanks for your quick response and sorry for my late answer :) I think Dave's patch is definietly on the right track but there are subsequent unaligned accesses of a similar kind which is why it still appears to be broken if you look at the kernel messages. But there is definitely progress because those addresses are now bigger (0x394/0x39c/0x3a8 vs. 0x2** earlier). So please try the following patch (instead of the original one) which should fix all the unailgned accesses in do_rx. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c index a903645..d06b05b 100644 --- a/drivers/net/wireless/zd1211rw/zd_mac.c +++ b/drivers/net/wireless/zd1211rw/zd_mac.c @@ -1166,15 +1166,16 @@ static void do_rx(unsigned long mac_ptr) int zd_mac_rx_irq(struct zd_mac *mac, const u8 *buffer, unsigned int length) { struct sk_buff *skb; + unsigned int hlen = ALIGN(sizeof(struct zd_rt_hdr), 16); - skb = dev_alloc_skb(sizeof(struct zd_rt_hdr) + length); + skb = dev_alloc_skb(hlen + length); if (!skb) { struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac); dev_warn(zd_mac_dev(mac), Could not allocate skb.\n); ieee-stats.rx_dropped++; return -ENOMEM; } - skb_reserve(skb, sizeof(struct zd_rt_hdr)); + skb_reserve(skb, hlen - ZD_PLCP_HEADER_SIZE); memcpy(__skb_put(skb, length), buffer, length); skb_queue_tail(mac-rx_queue, skb); tasklet_schedule(mac-rx_tasklet); - 
To unsubscribe from this list: send the line unsubscribe linux-wireless in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- John W. Linville [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
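The arithmetic behind the patch quoted above can be sketched in user space. This is only an illustration: ALIGN_UP is a stand-in that mirrors what the kernel's ALIGN() does for power-of-two alignments, and both sizes below (5 bytes for the PLCP header, 26 bytes for the rt header) are assumptions picked for the example, not values taken from the driver.

```c
#include <stddef.h>

/* Round x up to the next multiple of a (a must be a power of two),
 * mirroring the behaviour of the kernel's ALIGN() macro. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Assumed values for illustration only; the real ones come from the driver. */
#define PLCP_HEADER_SIZE 5   /* stands in for ZD_PLCP_HEADER_SIZE */
#define RT_HDR_SIZE      26  /* stands in for sizeof(struct zd_rt_hdr) */

/* Offset from the start of the buffer at which the received bytes are
 * copied, i.e. the amount handed to skb_reserve() in the patch. */
static size_t rx_copy_offset(void)
{
    size_t hlen = ALIGN_UP((size_t)RT_HDR_SIZE, 16);

    return hlen - PLCP_HEADER_SIZE;
}

/* Offset of the frame that follows the PLCP header. */
static size_t frame_offset(void)
{
    return rx_copy_offset() + PLCP_HEADER_SIZE;
}
```

With these assumed sizes the received bytes start at offset 27, so the frame behind the PLCP header starts at offset 32, a multiple of 16. Whether that is enough for every later access in do_rx is exactly what the thread is still trying to find out.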
Re: [PATCH (resubmit)][BRIDGE] Properly dereference the br_should_route_hook
On Thu, Nov 29, 2007 at 06:36:50AM -0800, Paul E. McKenney wrote:
> That certainly is an interesting tradeoff... Save a memory barrier when assigning NULL, but pay an extra test and branch in all cases. Though it does make for a simpler rule -- just use rcu_assign_pointer() in all cases. Of course, if almost all rcu_assign_pointer() executions assign non-NULL pointers, the optimal strategy would be to leave the implementation of rcu_assign_pointer() alone, and simply enforce use of rcu_assign_pointer(), even if the pointer being assigned is NULL.
I was thinking of something much simpler. If the second argument is constant and NULL, then skip the barrier. No run-time slow-down at all.
> Although rcu_dereference() does a memory barrier only on Alpha, that of rcu_assign_pointer() is needed on any machine that does not preserve store order (Itanium, POWER, ARM, some MIPS boxes according to rumor, ...).
Good point!
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: + xfrm_policy-warning-fix.patch added to -mm tree
On Thu, Nov 29, 2007 at 09:32:02AM -0800, Andrew Morton wrote:
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index b702bd8..9a4cf2e 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1344,6 +1344,7 @@ restart:
>  		xfrm_nr += pols[0]->xfrm_nr;
>  		switch (policy->action) {
> +		default:
>  		case XFRM_POLICY_BLOCK:
>  			/* Prohibit the flow */
>  			err = -EPERM;
> hm. If someone feeds a bad value into here we want to know about it rather than silently fixing it up, don't we?
As I said, we already check all policy actions when they enter the kernel from user-space. For example, in xfrm_user we make sure that action is one of these values. So this is mainly to shut gcc up. Even if we did somehow get an illegal value through, dropping the packet sounds like a sane action to follow and I'm sure people will notice pretty quickly when packets don't flow anymore :)
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
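The fall-through idea discussed above can be shown in isolation. In the sketch below the enum and the function name are hypothetical stand-ins rather than the xfrm code; the point is only that a `default:` label placed directly in front of the block case makes any unexpected value take the same path.

```c
#include <errno.h>

/* Hypothetical stand-ins for the XFRM_POLICY_* action values. */
enum policy_action { POLICY_ALLOW = 0, POLICY_BLOCK = 1 };

/* Returns 0 if the flow may pass, -EPERM if it must be dropped. */
static int check_policy(int action)
{
    int err;

    switch (action) {
    default:            /* unexpected value: handled exactly like BLOCK */
    case POLICY_BLOCK:
        err = -EPERM;   /* prohibit the flow */
        break;
    case POLICY_ALLOW:
        err = 0;
        break;
    }
    return err;
}
```

Because every path through the switch now assigns `err`, the compiler has no reason left to warn about a possibly uninitialized variable, and a stray value fails closed instead of open.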
Re: ZD1211RW unaligned accesses...
On Thu, Nov 29, 2007 at 04:45:33PM -0500, John W. Linville wrote:
> So, did the patch below fix the problem? Should I apply it?
I'm keen to find out the result too :) Chances are it does make progress; however, we may still have the general wireless/IP stack alignment issue that we are still discussing.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
On Thu, 2007-29-11 at 12:08 -0800, Nelson, Shannon wrote:
> [RFC] I/OAT: Handle incoming udp through ioatdma
> From: Shannon Nelson [EMAIL PROTECTED]
> If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try pushing it through the ioatdma asynchronous memcpy. This is very much the same as the tcp copy offload. This is an RFC because we know there are stability problems under high traffic.
What stability problems? Is there some magic sysctl_udp_dma_copybreak threshold value where you start seeing the benefit of IOAT-ing? Since you mentioned students <evil grin> here, it would be interesting to see data where udp starts benefitting.
cheers,
jamal
Re: wireless vs. alignment requirements
On Thu, Nov 29, 2007 at 09:50:35AM -0800, H. Peter Anvin wrote:
> Uhm, most cards affected *ARE* Ethernet cards, due to the bloody 14-byte header.
Well, most Ethernet drivers are using NET_IP_ALIGN, which means that the IP stack gets aligned packets only. sky2 is the exception here, not the rule.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
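For anyone following along, the 14-byte header problem comes down to simple offset arithmetic. The helper names below are made up for this sketch; the only facts it relies on are the 14-byte Ethernet header and the usual driver trick of reserving a couple of bytes in front of the frame, which is what NET_IP_ALIGN typically amounts to.

```c
#include <stddef.h>

#define ETH_HEADER_LEN 14  /* Ethernet header: 6 + 6 + 2 bytes */

/* Offset of the IP header when `reserve` bytes of padding precede the frame. */
static size_t ip_header_offset(size_t reserve)
{
    return reserve + ETH_HEADER_LEN;
}

/* Non-zero if the IP header lands on a 4-byte boundary. */
static int ip_header_aligned(size_t reserve)
{
    return ip_header_offset(reserve) % 4 == 0;
}
```

With no reserve the IP header sits at offset 14 and is misaligned; with a 2-byte reserve it sits at offset 16. Hardware that cannot DMA to a 2-byte-offset address cannot use this trick, which is the case being argued about in this thread.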
RE: [PATCH -mm] [RFC] I/OAT: Handle incoming udp through ioatdma
From: J Hadi Salim [mailto:[EMAIL PROTECTED] On Behalf Of jamal On Thu, 2007-29-11 at 12:08 -0800, Nelson, Shannon wrote: [RFC] I/OAT: Handle incoming udp through ioatdma From: Shannon Nelson [EMAIL PROTECTED] If the incoming udp packet is larger than sysctl_udp_dma_copybreak, try pushing it through the ioatdma asynchronous memcpy. This is very much the same as the tcp copy offload. This is an RFC because we know there are stability problems under high traffic. What stability problems? Under a heavy stress test combining TCP and UDP traffic we would get a kernel panic from a NULL dereference in dma_unpin_iovec_pages(). Remove this patch and the panic goes away. Unfortunately, this problem is below our priority line so it has received little attention since then. We know of interest in this patch, however, so decided to release it into the wild and see if it garners any other attention. Part of the panic message: Unable to handle kernel NULL pointer dereference at RIP: [8025b406] set_page_dirty_lock+0xe/0x3a PGD 2b91f067 PUD 2a04b067 PMD 0 Oops: 0002 [1] SMP CPU 5 Modules linked in: ioatdma dca igb i2c_dev i2c_core e1000 Pid: 10998, comm: netserver Not tainted 2.6.22.9_CB-2.05_patched #1 RIP: 0010:[8025b406] [8025b406] set_page_dirty_lock+0xe/0x3a RSP: 0018:810028fedb68 EFLAGS: 00010246 RAX: RBX: 81003afea648 RCX: 81002a382b88 RDX: 810028fedfd8 RSI: 0282 RDI: RBP: R08: 0001 R09: R10: 806c13e0 R11: 0246 R12: R13: 0001 R14: R15: 81003afea660 FS: 2b23ba4177c0() GS:810001164e40() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 29831000 CR4: 06e0 Process netserver (pid: 10998, threadinfo 810028fec000, task 81003a881590) Stack: 810039887c80 81003afea648 81003afea640 8048545a fff4 810028fedf38 81003afea658 81003afea670 81003afea640 80485657 81003afea660 Call Trace: [8048545a] dma_unpin_iovec_pages+0x31/0x6e [80485657] dma_pin_iovec_pages+0x1c0/0x1d9 [804ce479] udp_recvmsg+0x94/0x43e [8049268e] sock_common_recvmsg+0x30/0x45 [80491013] sock_recvmsg+0xd5/0xed [80518d48] 
mutex_lock+0xd/0x1e
[802425ff] autoremove_wake_function+0x0/0x2e
[802564ca] find_get_page+0x21/0x50
[80258572] filemap_nopage+0x180/0x2b0
[80262b59] __handle_mm_fault+0x404/0x9fc
[80245b35] getnstimeofday+0x32/0x8d
[80245b35] getnstimeofday+0x32/0x8d
[80491dc8] sys_recvfrom+0xe2/0x130
[802445ca] enqueue_hrtimer+0x64/0x6b
[80244b18] hrtimer_start+0xf2/0x104
[80234d27] do_setitimer+0x15e/0x329
[80234fb9] alarm_setitimer+0x35/0x65
[8020935e] system_call+0x7e/0x83
Code: f0 0f ba 6d 00 00 19 c0 85 c0 74 08 48 89 ef e8 89 ce ff ff
RIP [8025b406] set_page_dirty_lock+0xe/0x3a RSP 810028fedb68
CR2:
> Is there some magic sysctl_udp_dma_copybreak threshold value where you start seeing the benefit of IOAT-ing? Since you mentioned students <evil grin> here, it would be interesting to see data where udp starts benefitting.
As I said, this is low on our priority list, so this data has not been gathered.
> cheers,
> jamal
Thanks for your interest.
sln
--
==
Mr. Shannon Nelson LAN Access Division, Intel Corp.
[EMAIL PROTECTED] I don't speak for Intel
(503) 712-7659 Parents can't afford to be squeamish.
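The copybreak decision described in the RFC text can be pictured roughly as follows. This is a guess at the shape of the check, not the patch itself: the variable, the enum, the function name and the 4096-byte default are all assumptions made for the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the sysctl_udp_dma_copybreak tunable;
 * 4096 is an arbitrary value chosen for the example. */
static size_t udp_dma_copybreak = 4096;

enum copy_path { COPY_CPU, COPY_DMA };

/* Use the DMA engine only for datagrams larger than the threshold,
 * and only when a DMA channel is available at all. */
static enum copy_path choose_copy_path(size_t len, int dma_available)
{
    if (dma_available && len > udp_dma_copybreak)
        return COPY_DMA;
    return COPY_CPU;
}
```

Below the threshold the cost of pinning pages and setting up the transfer is presumably larger than the copy it saves, which is why jamal's question about where the crossover lies matters. No measurements are given in this thread.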
Re: wireless vs. alignment requirements
On Thu, Nov 29, 2007 at 04:28:34PM -0800, H. Peter Anvin wrote:
> > sky2 is the exception here, not the rule.
> It is, but it's not unique. Several USB adapters have the same problem, for example.
Notice the common theme here that slow (or slower, i.e., certainly nowhere near 10Gb) NICs are the norm for violating alignment :) So I'd prefer something that only penalised them rather than everybody else.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: wireless vs. alignment requirements
Herbert Xu wrote:
> On Thu, Nov 29, 2007 at 04:28:34PM -0800, H. Peter Anvin wrote:
> > > sky2 is the exception here, not the rule.
> > It is, but it's not unique. Several USB adapters have the same problem, for example.
> Notice the common theme here that slow (or slower, i.e., certainly nowhere near 10Gb) NICs are the norm for violating alignment :) So I'd prefer something that only penalised them rather than everybody else.
Obviously, and this should be a configure option anyway.
-hpa
Re: wireless vs. alignment requirements
Herbert Xu wrote:
> On Thu, Nov 29, 2007 at 09:50:35AM -0800, H. Peter Anvin wrote:
> > Uhm, most cards affected *ARE* Ethernet cards, due to the bloody 14-byte header.
> Well, most Ethernet drivers are using NET_IP_ALIGN, which means that the IP stack gets aligned packets only. sky2 is the exception here, not the rule.
It is, but it's not unique. Several USB adapters have the same problem, for example.
-hpa
Re: [PATCH (resubmit)][BRIDGE] Properly dereference the br_should_route_hook
On Fri, Nov 30, 2007 at 10:49:00AM +1100, Herbert Xu wrote:
> On Thu, Nov 29, 2007 at 06:36:50AM -0800, Paul E. McKenney wrote:
> > That certainly is an interesting tradeoff... Save a memory barrier when assigning NULL, but pay an extra test and branch in all cases. Though it does make for a simpler rule -- just use rcu_assign_pointer() in all cases. Of course, if almost all rcu_assign_pointer() executions assign non-NULL pointers, the optimal strategy would be to leave the implementation of rcu_assign_pointer() alone, and simply enforce use of rcu_assign_pointer(), even if the pointer being assigned is NULL.
> I was thinking of something much simpler. If the second argument is constant and NULL, then skip the barrier. No run-time slow-down at all.
That certainly makes a lot of sense!!! You have in mind something like the following?
	#define rcu_assign_pointer(p, v) \
		({ \
			if (!__builtin_constant_p(v) || \
			    ((v) != NULL)) \
				smp_wmb(); \
			(p) = (v); \
		})
If so, I will do some testing and submit a patch. Probably to Gautham's preemptible-RCU patchset to avoid gratuitously complicating his life, especially given that he very graciously agreed to take it over from me. We should be able to live with the overhead in the meantime. ;-)
Thanx, Paul
> > Although rcu_dereference() does a memory barrier only on Alpha, that of rcu_assign_pointer() is needed on any machine that does not preserve store order (Itanium, POWER, ARM, some MIPS boxes according to rumor, ...).
> Good point! Thanks,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
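The macro proposed above can be tried out in user space with GCC or Clang. The sketch below is not the kernel macro: `smp_wmb_sketch()` is a made-up stand-in that issues a full barrier (stronger than a write barrier) and counts its executions so the effect becomes visible, and the macro is wrapped in do/while rather than a statement expression. All names here are hypothetical.

```c
#include <stddef.h>

/* Counts barrier executions so the effect is observable in user space. */
static int barrier_count;

/* Stand-in for the kernel's smp_wmb(); a full barrier is stronger than needed. */
#define smp_wmb_sketch() do { __sync_synchronize(); barrier_count++; } while (0)

/* Skip the barrier only when v is a compile-time constant NULL. */
#define assign_pointer_sketch(p, v)                     \
    do {                                                \
        if (!__builtin_constant_p(v) || ((v) != NULL))  \
            smp_wmb_sketch();                           \
        (p) = (v);                                      \
    } while (0)

static int value = 42;
static int *shared;

/* Publishes a non-NULL pointer, then clears it; returns barriers executed. */
static int publish_then_clear(void)
{
    int *volatile fresh = &value;   /* volatile: not a compile-time constant */

    barrier_count = 0;
    assign_pointer_sketch(shared, fresh);  /* barrier executed */
    assign_pointer_sketch(shared, NULL);   /* barrier skipped */
    return barrier_count;
}
```

Since the condition is resolved at compile time for a literal NULL, the compiler can drop the test together with the barrier, which matches the "no run-time slow-down" argument. A NULL that arrives through a variable the compiler cannot see through still pays for the barrier, which is harmless.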
Re: [patch 1/1] ctc: make use of alloc_netdev()
To maintain the area used for sysfs attribute data, which may already have been stored earlier.
Best regards,
Peter Tiedemann
phone: +49-7031-16-4172 fax: ++3159 e-mail: [EMAIL PROTECTED]
IBM Deutschland Entwicklung GmbH, Linux for eServer Development, Dept. 3303, Schoenaicher Str. 220, 71032 Boeblingen, Germany
IBM Deutschland Entwicklung GmbH, Chairman of the Supervisory Board: Martin Jetter, Management: Herbert Kircher, Registered office: Böblingen, Court of registration: Amtsgericht Stuttgart, HRB 243294
From: Stephen Hemminger [EMAIL PROTECTED]
To: Ursula Braun1/Germany/[EMAIL PROTECTED]
Date: 29.11.2007 18:12
Cc: [EMAIL PROTECTED], netdev@vger.kernel.org, [EMAIL PROTECTED], Peter Tiedemann/Germany/[EMAIL PROTECTED]
Subject: Re: [patch 1/1] ctc: make use of alloc_netdev()
On Thu, 29 Nov 2007 17:36:27 +0100 Ursula Braun [EMAIL PROTECTED] wrote:
> From: Peter Tiedemann [EMAIL PROTECTED]
> Currently ctc-device initialization is broken (kernel bug in ctc_new_device). The new network namespace code reveals a deficiency of the ctc driver. It should make use of alloc_netdev() as described in Documentation/networking/netdevices.txt.
> Signed-off-by: Peter Tiedemann [EMAIL PROTECTED]
> Signed-off-by: Ursula Braun [EMAIL PROTECTED]
> --- drivers/s390/net/ctcmain.c | 45 - 1 file changed, 16 insertions(+), 29 deletions(-) Index: linux-2.6-uschi/drivers/s390/net/ctcmain.c === --- linux-2.6-uschi.orig/drivers/s390/net/ctcmain.c +++ linux-2.6-uschi/drivers/s390/net/ctcmain.c @@ -2782,35 +2782,14 @@ ctc_probe_device(struct ccwgroup_device } /** - * Initialize everything of the net device except the name and the - * channel structs. + * Device setup function called by alloc_netdev(). + * + * @param dev Device to be setup. */ -static struct net_device * -ctc_init_netdevice(struct net_device * dev, int alloc_device, - struct ctc_priv *privptr) +void ctc_init_netdevice(struct net_device * dev) { - if (!privptr) - return NULL; - DBF_TEXT(setup, 3, __FUNCTION__); - if (alloc_device) { - dev = kzalloc(sizeof(struct net_device), GFP_KERNEL); - if (!dev) - return NULL; - } - - dev->priv = privptr; - privptr->fsm = init_fsm("ctcdev", dev_state_names, - dev_event_names, CTC_NR_DEV_STATES, CTC_NR_DEV_EVENTS, - dev_fsm, DEV_FSM_LEN, GFP_KERNEL); - if (privptr->fsm == NULL) { - if (alloc_device) - kfree(dev); - return NULL; - } - fsm_newstate(privptr->fsm, DEV_STATE_STOPPED); - fsm_settimer(privptr->fsm, &privptr->restart_timer); if (dev->mtu == 0) dev->mtu = CTC_BUFSIZE_DEFAULT - LL_HEADER_LENGTH - 2; dev->hard_start_xmit = ctc_tx; @@ -2823,7 +2802,7 @@ ctc_init_netdevice(struct net_device * d dev->type = ARPHRD_SLIP; dev->tx_queue_len = 100; dev->flags = IFF_POINTOPOINT | IFF_NOARP; - return dev; + SET_MODULE_OWNER(dev); } @@ -2879,14 +2858,22 @@ ctc_new_device(struct ccwgroup_device *c "ccw_device_set_online (cdev[1]) failed with ret = %d\n", ret); } - dev = ctc_init_netdevice(NULL, 1, privptr); - + dev = alloc_netdev(0, "ctc%d", ctc_init_netdevice); if (!dev) {
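As a rough picture of the pattern the patch moves to, here is a user-space miniature of "allocate the device, then let a setup callback fill in defaults". Everything in it (struct mini_netdev, mini_alloc_netdev(), mini_setup(), the 1500-byte MTU) is invented for the sketch and is not the kernel API; the real alloc_netdev() additionally takes a name template and handles details not modelled here.

```c
#include <stdlib.h>

/* Hypothetical miniature of a network device; not the kernel's struct net_device. */
struct mini_netdev {
    unsigned int mtu;
    unsigned int tx_queue_len;
    void *priv;   /* private area allocated together with the device */
};

/* Allocate device plus private area in one zeroed block, then run setup(). */
static struct mini_netdev *mini_alloc_netdev(size_t sizeof_priv,
                                             void (*setup)(struct mini_netdev *))
{
    struct mini_netdev *dev = calloc(1, sizeof(*dev) + sizeof_priv);

    if (!dev)
        return NULL;
    dev->priv = sizeof_priv ? (void *)(dev + 1) : NULL;
    setup(dev);
    return dev;
}

/* Setup callback: fills in defaults only, cannot fail, allocates nothing. */
static void mini_setup(struct mini_netdev *dev)
{
    if (dev->mtu == 0)
        dev->mtu = 1500;        /* arbitrary default for the sketch */
    dev->tx_queue_len = 100;
}
```

A caller would write `dev = mini_alloc_netdev(0, mini_setup)` and attach its own private data afterwards, which is close to what the quoted patch does by passing 0 as the private size. Anything that can fail has to move out of the callback and into the caller, since the callback returns void.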
Please pull 'fixes-jgarzik' branch of wireless-2.6
Jeff, A few fixes intended for 2.6.24... Let me know if there are any problems! Thanks, John --- Individual patches are available here: http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6.git fixes-jgarzik --- The following changes since commit d9f8bcbf67a0ee67c8cb0734f003dfe916bb5248: Linus Torvalds (1): Linux 2.6.24-rc3 are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-jgarzik David Woodhouse (1): libertas: Don't set NETIF_F_IPV6_CSUM in dev-features Holger Schurig (1): libertas: let more than one MAC event through Joe Perches (1): drivers/net/wireless: Add missing space Joonwoo Park (2): iwlwifi 3945 Fix race conditional panic. iwlwifi 4965 Fix race conditional panic. Saleem Abdulrasool (1): iwlwifi: fix possible NULL dereference in iwl_set_rate() Stefano Brivio (1): b43/b43legacy: fix left-over URLs and ifdefs Tomas Winkler (1): iwlwifi: fix iwl_mac_add_interface handler drivers/net/wireless/b43/main.c |2 +- drivers/net/wireless/b43/phy.c |2 +- drivers/net/wireless/b43legacy/dma.c|2 +- drivers/net/wireless/b43legacy/main.c |2 +- drivers/net/wireless/b43legacy/phy.c|2 +- drivers/net/wireless/bcm43xx/bcm43xx_phy.c |2 +- drivers/net/wireless/iwlwifi/iwl3945-base.c | 16 drivers/net/wireless/iwlwifi/iwl4965-base.c | 13 ++--- drivers/net/wireless/libertas/if_cs.c |3 ++- drivers/net/wireless/libertas/main.c|4 drivers/net/wireless/libertas/wext.c|2 +- drivers/net/wireless/netwave_cs.c |2 +- drivers/net/wireless/p54usb.c |2 +- 13 files changed, 33 insertions(+), 21 deletions(-) diff --git a/drivers/net/wireless/b43/main.c b/drivers/net/wireless/b43/main.c index 2b17c1d..b45eecc 100644 --- a/drivers/net/wireless/b43/main.c +++ b/drivers/net/wireless/b43/main.c @@ -1566,7 +1566,7 @@ static void b43_release_firmware(struct b43_wldev *dev) static void b43_print_fw_helptext(struct b43_wl *wl) { b43err(wl, You must go to - http://linuxwireless.org/en/users/Drivers/bcm43xx#devicefirmware + 
http://linuxwireless.org/en/users/Drivers/b43#devicefirmware and download the correct firmware (version 4).\n); } diff --git a/drivers/net/wireless/b43/phy.c b/drivers/net/wireless/b43/phy.c index 3d4ed64..7ff091e 100644 --- a/drivers/net/wireless/b43/phy.c +++ b/drivers/net/wireless/b43/phy.c @@ -2214,7 +2214,7 @@ int b43_phy_init_tssi2dbm_table(struct b43_wldev *dev) } dyn_tssi2dbm = kmalloc(64, GFP_KERNEL); if (dyn_tssi2dbm == NULL) { - b43err(dev-wl, Could not allocate memory + b43err(dev-wl, Could not allocate memory for tssi2dbm table\n); return -ENOMEM; } diff --git a/drivers/net/wireless/b43legacy/dma.c b/drivers/net/wireless/b43legacy/dma.c index 8cb3dc4..83161d9 100644 --- a/drivers/net/wireless/b43legacy/dma.c +++ b/drivers/net/wireless/b43legacy/dma.c @@ -996,7 +996,7 @@ int b43legacy_dma_init(struct b43legacy_wldev *dev) err = ssb_dma_set_mask(dev-dev, dmamask); if (err) { -#ifdef BCM43XX_PIO +#ifdef CONFIG_B43LEGACY_PIO b43legacywarn(dev-wl, DMA for this device not supported. 
Falling back to PIO\n); dev-__using_pio = 1; diff --git a/drivers/net/wireless/b43legacy/main.c b/drivers/net/wireless/b43legacy/main.c index 3bde1e9..32d5e17 100644 --- a/drivers/net/wireless/b43legacy/main.c +++ b/drivers/net/wireless/b43legacy/main.c @@ -1419,7 +1419,7 @@ static void b43legacy_release_firmware(struct b43legacy_wldev *dev) static void b43legacy_print_fw_helptext(struct b43legacy_wl *wl) { b43legacyerr(wl, You must go to http://linuxwireless.org/en/users/; -Drivers/bcm43xx#devicefirmware +Drivers/b43#devicefirmware and download the correct firmware (version 3).\n); } diff --git a/drivers/net/wireless/b43legacy/phy.c b/drivers/net/wireless/b43legacy/phy.c index 22a4b3d..491e518 100644 --- a/drivers/net/wireless/b43legacy/phy.c +++ b/drivers/net/wireless/b43legacy/phy.c @@ -2020,7 +2020,7 @@ int b43legacy_phy_init_tssi2dbm_table(struct b43legacy_wldev *dev) phy-idle_tssi = 62; dyn_tssi2dbm = kmalloc(64, GFP_KERNEL); if (dyn_tssi2dbm == NULL) { - b43legacyerr(dev-wl, Could not allocate memory + b43legacyerr(dev-wl, Could not allocate memory for tssi2dbm table\n); return -ENOMEM; } diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_phy.c