Re: [PATCH] [IPv4] Reply net unreachable ICMP message
Hello, Jarek,

I am sorry, but I am not sure I understand exactly what you mean when you say:
"It overrides err codes from fib_lookup, where such decisions should be made."

What is incorrect here? There are two lines added in this patch:

	IP_INC_STATS_BH(IPSTATS_MIB_INNOROUTES);
and
	err = -ENETUNREACH;

The first one is, needless to say, not relevant to err codes.
The second, err = -ENETUNREACH, is in ip_route_input_slow() (net/ipv4/route.c). Values are assigned to err more than once in this function; for example:

e_hostunreach:
	err = -EHOSTUNREACH;
e_inval:
	err = -EINVAL;
e_nobufs:
	err = -ENOBUFS;

So I don't think anything is incorrect here.

Regards,
Rami Rosen

On Dec 6, 2007 9:49 AM, Jarek Poplawski [EMAIL PROTECTED] wrote:
> On 06-12-2007 07:31, Mitsuru Chinen wrote:
> > The IPv4 stack doesn't reply with an ICMP destination unreachable
> > message with the net unreachable code when IP datagrams are discarded
> > because no route could be found in the forwarding path.
> > Incidentally, the IPv6 stack replies with such an ICMPv6 message in a
> > similar situation.
> >
> > Signed-off-by: Mitsuru Chinen [EMAIL PROTECTED]
> > ---
> >  net/ipv4/route.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 6714bbc..ba85ec9 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1375,6 +1375,7 @@ static int ip_error(struct sk_buff *skb)
> >  			break;
> >  		case ENETUNREACH:
> >  			code = ICMP_NET_UNREACH;
> > +			IP_INC_STATS_BH(IPSTATS_MIB_INNOROUTES);
> >  			break;
> >  		case EACCES:
> >  			code = ICMP_PKT_FILTERED;
> > @@ -2004,6 +2005,7 @@ no_route:
> >  	RT_CACHE_STAT_INC(in_no_route);
> >  	spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE);
> >  	res.type = RTN_UNREACHABLE;
> > +	err = -ENETUNREACH;
> >  	goto local_input;
> >
> >  	/*
>
> This patch seems to be wrong. It overrides err codes from fib_lookup,
> where such decisions should be made.
>
> Regards,
> Jarek P.
-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv4] Reply net unreachable ICMP message
On Thu, 6 Dec 2007 08:49:47 +0100 Jarek Poplawski [EMAIL PROTECTED] wrote:
> On 06-12-2007 07:31, Mitsuru Chinen wrote:
> > The IPv4 stack doesn't reply with an ICMP destination unreachable
> > message with the net unreachable code when IP datagrams are discarded
> > because no route could be found in the forwarding path.
> > Incidentally, the IPv6 stack replies with such an ICMPv6 message in a
> > similar situation.
> >
> > Signed-off-by: Mitsuru Chinen [EMAIL PROTECTED]
> > ---
> >  net/ipv4/route.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 6714bbc..ba85ec9 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1375,6 +1375,7 @@ static int ip_error(struct sk_buff *skb)
> >  			break;
> >  		case ENETUNREACH:
> >  			code = ICMP_NET_UNREACH;
> > +			IP_INC_STATS_BH(IPSTATS_MIB_INNOROUTES);
> >  			break;
> >  		case EACCES:
> >  			code = ICMP_PKT_FILTERED;
> > @@ -2004,6 +2005,7 @@ no_route:
> >  	RT_CACHE_STAT_INC(in_no_route);
> >  	spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE);
> >  	res.type = RTN_UNREACHABLE;
> > +	err = -ENETUNREACH;
> >  	goto local_input;
> >
> >  	/*
>
> This patch seems to be wrong. It overrides err codes from fib_lookup,
> where such decisions should be made.

fib_lookup() returns -ESRCH in this situation, so the variable has to be overridden with a suitable error number, as the code under the e_hostunreach label does.

Best Regards,
Mitsuru Chinen [EMAIL PROTECTED]
Re: [PATCH] [IPv4] Reply net unreachable ICMP message
On 06-12-2007 09:14, Mitsuru Chinen wrote:
> On Thu, 6 Dec 2007 08:49:47 +0100
> Jarek Poplawski [EMAIL PROTECTED] wrote:
>
> > On 06-12-2007 07:31, Mitsuru Chinen wrote:
> > > The IPv4 stack doesn't reply with an ICMP destination unreachable
> > > message with the net unreachable code when IP datagrams are
> > > discarded because no route could be found in the forwarding path.
> > > Incidentally, the IPv6 stack replies with such an ICMPv6 message in
> > > a similar situation.
> > ...
> > This patch seems to be wrong. It overrides err codes from fib_lookup,
> > where such decisions should be made.
>
> fib_lookup() replies -ESRCH in this situation.
> It is necessary to override the variable by the suitable error number
> like the code under e_hostunreach label.

I am probably missing something, but I can't see how you can be sure that only -ESRCH is possible here. Isn't ops->action() in fib_rules_lookup() supposed to return -ENETUNREACH when needed?

Jarek P.
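The disagreement above comes down to which errno reaches ip_error() and which ICMP code it is turned into. The following is a minimal userspace sketch of that mapping, not kernel code. The names prefixed with sketch_ are hypothetical; the ENETUNREACH and EACCES cases come from the quoted diff, while the EHOSTUNREACH case and the "send nothing" default are assumptions about the rest of the switch. The numeric ICMP codes are the standard destination-unreachable codes.

```c
#include <assert.h>
#include <errno.h>

/* ICMP destination-unreachable codes (net 0, host 1, admin filtered 13). */
enum {
	SKETCH_ICMP_NET_UNREACH  = 0,
	SKETCH_ICMP_HOST_UNREACH = 1,
	SKETCH_ICMP_PKT_FILTERED = 13,
};

/*
 * Hypothetical stand-in for the patched no_route path: whatever the
 * lookup reported (the thread mentions -ESRCH), the result is replaced
 * by -ENETUNREACH. This unconditional override is what Jarek questions.
 */
static int sketch_no_route_err(int lookup_err)
{
	(void)lookup_err;	/* deliberately ignored, as in the patch */
	return -ENETUNREACH;
}

/*
 * Maps a negative errno to an ICMP code, or -1 when no ICMP message
 * would be generated (assumed default behaviour).
 */
static int sketch_icmp_code(int err)
{
	switch (-err) {
	case EHOSTUNREACH:
		return SKETCH_ICMP_HOST_UNREACH;	/* assumed case */
	case ENETUNREACH:
		return SKETCH_ICMP_NET_UNREACH;		/* from the diff */
	case EACCES:
		return SKETCH_ICMP_PKT_FILTERED;	/* from the diff */
	default:
		return -1;
	}
}
```

In this model, sketch_icmp_code(-ESRCH) yields no ICMP message, while sketch_icmp_code(sketch_no_route_err(-ESRCH)) yields the net unreachable code. The sketch does not settle whether a more specific code from the rule lookup can reach this path.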
Re: sockets affected by IPsec always block (2.6.23)
On Thursday, 6 December 2007 03:25, David Miller wrote:

> POSIX says nothing about the semantics of route resolution.

Of course not. Applications must not care about what happens at the transport layer.

> Non-blocking doesn't mean "cannot sleep no matter what".

... and as O_CREAT on open() isn't specifically documented to apply to filenames starting with 'a', it is perfectly normal that "echo x > ash" always fails since 2.6.22. To revert to the old behaviour, please do "echo 1 > /proc/sys/fs/allow_a_file_creation".

OK, irony aside. Just have a look at http://www.opengroup.org/onlinepubs/009695399/functions/connect.html (I hope 009695399 is not a personalization cookie ;-)

"If the connection cannot be established immediately and O_NONBLOCK is set for the file descriptor for the socket, connect() shall fail and set errno to [EINPROGRESS], but the connection request shall not be aborted, and the connection shall be established asynchronously."

I think the words "shall fail" and "immediately" are quite clear. If this is changed for some IP sockets, event-driven applications will randomly and subtly break.

> If this was such a clear cut case we'd have changed things a long time
> ago, but it isn't so don't pretend this is the case.

Well, the only reason this doesn't break on a daily basis is that the code hasn't been in the kernel that long and not many people run applications on an IPSEC gateway. This will change if kernel-based IPSEC is used for roadwarrior connections or dnssec-based anonymous IPSEC someday. Trust me, you will revert this misbehaviour in -stable then.

For some real-life applications that break when a nonblocking connect() blocks, please look at, for example, squid or mozilla firefox.

Stefan
Re: sockets affected by IPsec always block (2.6.23)
From: Stefan Rompf [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 09:49:01 +0100

> "If the connection cannot be established immediately and O_NONBLOCK is set for
> the file descriptor for the socket, connect() shall fail and set errno to
> [EINPROGRESS], but the connection request shall not be aborted, and the
> connection shall be established asynchronously."
>
> I think the words "shall fail" and "immediately" are quite clear.

They are, but the context in which they apply is vague.

I can equally generate examples where the non-blocking behavior you are a proponent of would break non-blocking UDP apps during a sendmsg() call when we hit IPSEC resolution. Yet similar language on blocking semantics exists for sendmsg() in the standards.

The world is shades of gray, implying anything else is foolhardy and that's how I'm handling this.

> Well, the only reason this doesn't break on a daily basis is because the code
> isn't in the kernel that long and not many people run applications on an
> IPSEC gateway. This will change if kernel based IPSEC is used for roadwarrior
> connections or dnssec based anonymous IPSEC someday. Trust me, you will
> revert this misbehaviour in -stable then.

I use IPSEC every single day in this fashion, and I haven't.
Re: Reproducible data corruption with sendfile+vsftp - splice regression?
On Wed, 05 Dec 2007 23:54:29 +0100, Francois Romieu wrote:

> Holger Hoffstaette [EMAIL PROTECTED] :
> [...]
> > Should I file this in bugzilla?
>
> Yes.

Thanks for responding - will do. I verified with 2.6.24-rc4 (same bug) and have some new information about this.

Contrary to my previous posting, the corruption is NOT triggered by NAPI. It may be related, but even without NAPI and with tso on again I got corruption, now also on the gbit client (Thinkpad T60). When ftp'ing to ramdisk at full speed (a reasonable ~77 MB/sec) it often works, but intermediate writes that cause the ftp to temporarily slow down reliably cause corrupted files, so I guess tso gets confused when some kind of throttling sets in during transfer. That is probably why I first noticed it on the slow 100mbit client.

Maybe turning off sendfile or NAPI just led to random success - so far it really looks like tso on the r8169 is the common cause.

thank you
Holger
Re: Reproducible data corruption with sendfile+vsftp - splice regression?
(removing .kernel as it seems to concern netdev only)

On Thu, 06 Dec 2007 02:13:00 +0100, Francois Romieu wrote:

> Francois Romieu [EMAIL PROTECTED] :
> > Holger Hoffstaette [EMAIL PROTECTED] :
> > [...]
> > > Should I file this in bugzilla?
> >
> > Yes.
>
> 5326 5585327 5585328 5585329 5585330 5585331 5585332 5585333 5585334 5585335 558
> 5336 5585337 5585338 5585339 5585340 5585341 5585342 5585343 5589440 5589441 558
>                                                          ^^^     ^^^
> 9442 5589443 5589444 5589445 5589446 5589447 5589448 5589449 5589450 5589451 558
> 9452 5589453 5589454 5589455 5589456 5589457 5589458 5589459 5589460 5589461 558
>
> It misses 8*4096 bytes.
>
> 8443 9068442 9068441 9068440 9068439 9068438 9068437 9068436 9068435 9068434 906
> 8433 9068432 9068431 9068430 9068429 9068428 9068427 9064330 9064329 9064328 906
>                                                  ^^^     ^^^
> 4327 9064326 9064325 9064324 9064323 9064322 9064321 9064320 9064319 9064318 906
>
> Same thing later. But the amount of data transmitted is fine.
>
> Could you locate the offsets where the sequence is broken?

According to my hex editor the offsets are:

0x02aa43e4
0x02feb473
0x03142994
0x03765f33
0x03e42ff3
0x03e5079c
0x03e60d9c
0x0451db54
0x0452e7ec

I'll also put all this into bugzilla.

thanks!
Holger
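Locating the breaks by hand in a hex editor works, but a test file made of consecutive numbers can also be checked mechanically. The following is a rough sketch of such a check, assuming the numbers have already been parsed into an array; find_seq_breaks and struct seq_break are hypothetical names, not part of any tool mentioned in the thread. It also assumes each entry occupies 8 bytes in the file (7 digits plus a separator), which is what makes 4096 skipped values equal the "8*4096 bytes" quoted above.

```c
#include <assert.h>
#include <stddef.h>

struct seq_break {
	size_t index;	/* position of the first value after the break */
	long missing;	/* number of values skipped at this break */
};

/*
 * Scans values[0..n), which are expected to change by exactly `step`
 * from one entry to the next (+1 ascending, -1 descending; must not
 * be 0). Records up to max_out breaks in out[] and returns the total
 * number of breaks found.
 */
static size_t find_seq_breaks(const long *values, size_t n, long step,
			      struct seq_break *out, size_t max_out)
{
	size_t found = 0;

	for (size_t i = 1; i < n; i++) {
		long delta = values[i] - values[i - 1];

		if (delta == step)
			continue;
		if (found < max_out) {
			out[found].index = i;
			/* delta / step steps were taken where one was expected */
			out[found].missing = delta / step - 1;
		}
		found++;
	}
	return found;
}
```

Fed with the values around the first marked spot (5585342, 5585343, 5589440, 5589441) and a step of +1, the sketch reports one break with 4096 skipped values. The descending excerpt (9068428, 9068427, 9064330, 9064329) gives the same count with a step of -1.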
Re: bonding sysfs output
Jean Delvare [EMAIL PROTECTED] writes:

> On Mon, 26 Nov 2007 09:29:40 +0100, Wagner Ferenc wrote:
>
> > On the policy side: some files are not applicable to some types of
> > bonds, and return a single linefeed in that case. Except for one
> > single case, which returns 'NA\n'. The patch changes these cases
> > into empty files.
>
> IMHO a better approach would be to not create the files at all when
> they make no sense for a given type of bond.

That would require much more in-depth changes in the sysfs code, I'm afraid. But see also the 5th patch in the series, which responds to Jay's suggestion and, as such, goes in the opposite direction.
--
Thanks,
Feri.
[PATCH] remove prototype of ip_rt_advice
ip_rt_advice is gone, so there is no need to keep its prototype and debug message.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
--
diff --git a/include/net/route.h b/include/net/route.h
index f7ce625..59b0b19 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -109,7 +109,6 @@ struct in_device;
 extern int		ip_rt_init(void);
 extern void		ip_rt_redirect(__be32 old_gw, __be32 dst, __be32 new_gw,
 				       __be32 src, struct net_device *dev);
-extern void		ip_rt_advice(struct rtable **rp, int advice);
 extern void		rt_cache_flush(int how);
 extern int		__ip_route_output_key(struct rtable **, const struct flowi *flp);
 extern int		ip_route_output_key(struct rtable **, struct flowi *flp);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 134cab5..cefae61 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1198,7 +1198,7 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
 			unsigned hash = rt_hash(rt->fl.fl4_dst, rt->fl.fl4_src,
 						rt->fl.oif);
 #if RT_CACHE_DEBUG >= 1
-			printk(KERN_DEBUG "ip_rt_advice: redirect to "
+			printk(KERN_DEBUG "ipv4_negative_advice: redirect to "
 					  "%u.%u.%u.%u/%02x dropped\n",
 				NIPQUAD(rt->rt_dst), rt->fl.fl4_tos);
 #endif
Re: [PATCH] remove prototype of ip_rt_advice
From: Denis V. Lunev [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 13:17:43 +0300

> ip_rt_advice has been gone, so no need to keep prototype and debug message.
>
> Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

Applied to net-2.6, thanks.
Re: TCP event tracking via netlink...
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 5 Dec 2007 16:33:38 -0500

> On Wed, 05 Dec 2007 08:53:07 -0800
> Joe Perches [EMAIL PROTECTED] wrote:
>
> > it occurred to me that we might want to do something
> > like a state change event generator.
> >
> > This could be a basis for an interesting TCP
> > performance tester.
>
> That is what tcpprobe does but it isn't detailed enough to address SACK
> issues.

Indeed, this could be done via the jprobe there.

Silly me I didn't do this in the implementation I whipped up, which I'll likely correct.
Re: TCP event tracking via netlink...
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET)

> On Wed, 5 Dec 2007, David Miller wrote:
>
> > I assume you're using something like carefully crafted printk's,
> > kprobes, or even ad-hoc statistic counters. That's what I used to do
> > :-)
>
> No, that's not at all what I do :-). I usually look at time-seq graphs,
> except for the cases when I just find things out by reading code (or by
> just thinking of it).

Can you briefly detail what graph tools and command lines you are using?

The last time I did graphing to analyze things, the tools were hit-or-miss.

> Much of the info is available in tcpdump already, it's just hard to read
> without graphing it first because there are so many overlapping things
> to track in two-dimensional space.
>
> ...But yes, I have to admit that a couple of problems come to my mind
> where having some variable from tcp_sock would have made the problem
> more obvious.

The most important are the cwnd and ssthresh, which you could guess using graphs, but it is important to know on a packet to packet basis why we might have sent a packet or not because this has rippling effects down the rest of the RTT.

> Not sure what the benefit of having distributions with it is, because
> those people hardly report problems here anyway, they're just too happy
> with TCP performance unless we print something to their logs, which
> implies that we must set up a *_ON() condition :-(.

That may be true, but if we could integrate the information with tcpdumps, we could gather internal state using tools the user already has available.

Imagine if tcpdump printed out:

02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108
	ss_thresh: 129 cwnd: 133 packets_out: 132

or something like that.

> Some problems are simply such that things cannot be accurately verified
> without high processing overhead until it's far too late (eg skb bits vs
> *_out counters). Maybe we should start to build an expensive state
> validator as well which would automatically check invariants of the
> write queue and tcp_sock in a straightforward, unoptimized manner? That
> would definitely do a lot of work for us, just ask people to turn it on
> and it spits out everything that went wrong :-) (unless they really
> depend on very high-speed things and are therefore unhappy if we scan
> thousands of packets unnecessarily per ACK :-)). ...Early enough! ...That
> would work also for distros but there's always human judgement needed to
> decide whether the bug reporter will be happy when his TCP processing
> no longer scales ;-).

I think it's useful as a TCP_DEBUG config option or similar, sure.

But sometimes the algorithms are working as designed, it's just that they provide poor pipe utilization, and CWND analysis embedded inside of a tcpdump would be one way to see that as well as determine the flaw in the algorithm.

> ...Hopefully you found any of my comments useful.

Very much so, thanks.

I put together a sample implementation anyways just to show the idea, against net-2.6.25 below.

It is untested since I didn't write the userland app yet to see that proper things get logged. Basically you could run a daemon that writes per-connection traces into files based upon the incoming netlink events. Later, using the binary pcap file and these traces, you can piece together traces like the above using the timestamps etc. to match up pcap packets to ones from the TCP logger.

The userland tools could do analysis and print pre-cooked state diff logs, like "this ACK raised CWND by one" or whatever else you wanted to know.

It's nice that an expert like you can look at graphs and understand, but we'd like to create more experts, and besides reading code, one way to become an expert is to be able to extract live real data from the kernel's working state and try to understand how things got that way.

This information is permanently lost currently.
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 56342c3..c0e61d0 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -170,6 +170,47 @@ struct tcp_md5sig {
 	__u8	tcpm_key[TCP_MD5SIG_MAXKEYLEN];		/* key (binary) */
 };
 
+/* TCP netlink event logger. */
+struct tcp_log_key {
+	union {
+		__be32		a4;
+		__be32		a6[4];
+	} saddr, daddr;
+	__be16	sport;
+	__be16	dport;
+	unsigned short family;
+	unsigned short __pad;
+};
+
+struct tcp_log_stamp {
+	__u32	tv_sec;
+	__u32	tv_usec;
+};
+
+struct tcp_log_payload {
+	struct tcp_log_key	key;
+	struct tcp_log_stamp	stamp;
+	struct tcp_info		info;
+};
+
+enum {
+	TCP_LOG_A_UNSPEC = 0,
+	__TCP_LOG_A_MAX,
+};
+#define TCP_LOG_A_MAX		(__TCP_LOG_A_MAX - 1)
+
+#define TCP_LOG_GENL_NAME	"tcp_log"
+#define TCP_LOG_GENL_VERSION	1
+
+enum {
+	TCP_LOG_CMD_UNSPEC = 0,
+	TCP_LOG_CMD_HELLO,
+	TCP_LOG_CMD_GOODBYE,
+
Re: TCP event tracking via netlink...
On Wed, Dec 05, 2007 at 09:03:43PM -0800, David Miller ([EMAIL PROTECTED]) wrote:
> I think this work is very different.
>
> When I say "state" I mean something more significant than
> CLOSE, ESTABLISHED, etc. which is what Samir's patches are
> tracking.
>
> I'm talking about all of the sequence numbers, SACK information,
> congestion control knobs, etc. whose values are nearly impossible to
> track on a packet to packet basis in order to diagnose problems.

I pointed to that work as a possible basis for collecting more information if you need it, including sequence numbers, window sizes and so on. It just requires a useful structure layout, so that nobody has to recreate the same bits again and so that it can be called from any place inside the stack.

--
	Evgeniy Polyakov
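The proposed tcp_log_payload carries a struct tcp_info, the same structure userspace can already read for its own sockets through the TCP_INFO socket option on Linux. The following is a small sketch of that existing per-socket route, to show which of the discussed values (cwnd, ssthresh) are visible there; struct tcp_snapshot and tcp_snapshot_read are hypothetical names. It only samples the state when called, so it does not give the per-packet history the thread is asking for.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* The few tcp_info fields discussed in the thread. */
struct tcp_snapshot {
	unsigned int state;		/* TCP_ESTABLISHED, TCP_CLOSE, ... */
	unsigned int snd_cwnd;		/* congestion window, in packets */
	unsigned int snd_ssthresh;	/* slow start threshold */
};

/*
 * Reads the current TCP state of socket fd via TCP_INFO (Linux).
 * Returns 0 on success, -1 on error (errno set by getsockopt).
 */
static int tcp_snapshot_read(int fd, struct tcp_snapshot *snap)
{
	struct tcp_info info;
	socklen_t len = sizeof(info);

	memset(&info, 0, sizeof(info));
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
		return -1;

	snap->state = info.tcpi_state;
	snap->snd_cwnd = info.tcpi_snd_cwnd;
	snap->snd_ssthresh = info.tcpi_snd_ssthresh;
	return 0;
}
```

An application could call tcp_snapshot_read() around its own send calls, but only the kernel sees the state at the moment each packet or ACK is processed, which is the gap the netlink logger is meant to fill.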
Re: sockets affected by IPsec always block (2.6.23)
On Thursday, 6 December 2007 09:53, David Miller wrote:

> > I think the words "shall fail" and "immediately" are quite clear.
>
> They are, but the context in which they apply is vague.

The socket is connection-mode, i.e. SOCK_STREAM.

> I can equally generate examples where the non-blocking behavior you
> are a proponent of would break non-blocking UDP apps during a
> sendmsg() call when we hit IPSEC resolution. Yet similar language on
> blocking semantics exists for sendmsg() in the standards.

I am not a good enough kernel hacker to understand exactly the code flow in udp_sendmsg(). However, it seems that it first checks destination validity via ip_route_output_flow() and queues the message afterwards. The sendmsg() documentation only talks about buffer space. I can see your dilemma.

The reason why I'm pushing this issue another time is that I know quite a bit about system level application development. A very typical design pattern for non-naive single or multi threaded programs is that they set all communication sockets to be nonblocking and use a select()/epoll() based loop to dispatch IO. This often includes initiating a TCP connect() and asynchronously waiting for it to finish or fail from the main loop.

The dangerous situation here is that in 99% of all cases things will just work because the phase 2 SA exists. In 0.8%, the SA will be established in 1 sec. However, in the rest of the time the server application that you have considered to be stable will end up sleeping with all threads in a connect() call that is supposed to return immediately.

> The world is shades of gray, implying anything else is foolhardy and
> that's how I'm handling this.

Even though I consider programmers that ignore the result code on a nonblocking UDP sendmsg() fools, I agree. Maybe the best compromise is what Herbert Xu suggested in [EMAIL PROTECTED] in this thread: at least for connect(), O_NONBLOCK is ALWAYS respected, because this is where the chance for breakage is highest.

Stefan
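The design pattern described in this message can be sketched in a few lines of userspace C. This is an illustration of the pattern only, assuming a Linux or other POSIX system; the nb_connect_* names are hypothetical, and poll() stands in for the select()/epoll() loop of a real application. It relies on connect() returning at once with EINPROGRESS, which is exactly the behaviour under discussion.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Starts connect() on a new non-blocking TCP socket. Returns the fd
 * (connection established or still in progress) or -1 on error.
 */
static int nb_connect_start(const struct sockaddr_in *dst)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int flags;

	if (fd < 0)
		return -1;
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		goto fail;
	if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) == 0)
		return fd;	/* completed immediately */
	if (errno == EINPROGRESS)
		return fd;	/* the event loop finishes it later */
fail:
	close(fd);
	return -1;
}

/*
 * Event-loop side: waits until fd is writable, then fetches the result
 * of the connect. Returns 0 on success, an errno value if the connect
 * failed, or -1 on timeout or poll() error.
 */
static int nb_connect_finish(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int err = 0;
	socklen_t len = sizeof(err);

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return errno;
	return err;
}

/* Self-check: connects to a throwaway listener on the loopback address. */
static int nb_connect_demo(void)
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	socklen_t len = sizeof(addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);
	int rc = -1;

	if (lfd < 0)
		return -1;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	/* sin_port stays 0, so the kernel picks a free port */
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    listen(lfd, 1) == 0 &&
	    getsockname(lfd, (struct sockaddr *)&addr, &len) == 0) {
		int cfd = nb_connect_start(&addr);

		if (cfd >= 0) {
			rc = nb_connect_finish(cfd, 1000);
			close(cfd);
		}
	}
	close(lfd);
	return rc;
}
```

If connect() sleeps inside nb_connect_start() instead of returning EINPROGRESS, a single-threaded loop built this way cannot service any other descriptor in the meantime, which is the failure mode described above.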
[Patch] net/xfrm/xfrm_policy.c: Some small improvements
This patch contains the following changes.

  - Use 'bool' instead of 'int' for booleans.
  - Use 'size_t' instead of 'int' for 'sizeof' return value.
  - Some style fixes.

Cc: Herbert Xu [EMAIL PROTECTED]
Cc: David Miller [EMAIL PROTECTED]
Signed-off-by: WANG Cong [EMAIL PROTECTED]

---
 net/xfrm/xfrm_policy.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 5d6a81d..311b08f 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -476,17 +476,17 @@ static u32 xfrm_gen_index(u8 type, int dir)
 		struct hlist_head *list;
 		struct xfrm_policy *p;
 		u32 idx;
-		int found;
+		bool found;
 
 		idx = (idx_generator | dir);
 		idx_generator += 8;
 		if (idx == 0)
 			idx = 8;
 		list = xfrm_policy_byidx + idx_hash(idx);
-		found = 0;
+		found = false;
 		hlist_for_each_entry(p, entry, list, byidx) {
 			if (p->index == idx) {
-				found = 1;
+				found = true;
 				break;
 			}
 		}
@@ -499,8 +499,8 @@ static inline int selector_cmp(struct xfrm_selector *s1, struct xfrm_selector *s
 {
 	u32 *p1 = (u32 *) s1;
 	u32 *p2 = (u32 *) s2;
-	int len = sizeof(struct xfrm_selector) / sizeof(u32);
-	int i;
+	size_t len = sizeof(struct xfrm_selector) / sizeof(u32);
+	size_t i;
 
 	for (i = 0; i < len; i++) {
 		if (p1[i] != p2[i])
@@ -953,7 +953,7 @@ static int xfrm_policy_lookup(struct flowi *fl, u16 family, u8 dir,
 #ifdef CONFIG_XFRM_SUB_POLICY
 end:
 #endif
-	if ((*objp = (void *) pol) != NULL)
+	if ((*objp = pol) != NULL)
 		*obj_refp = &pol->refcnt;
 	return err;
 }
@@ -1137,7 +1137,7 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, struct flowi *fl,
 	xfrm_address_t *saddr = xfrm_flowi_saddr(fl, family);
 	xfrm_address_t tmp;
 
-	for (nx=0, i = 0; i < policy->xfrm_nr; i++) {
+	for (nx = 0, i = 0; i < policy->xfrm_nr; i++) {
 		struct xfrm_state *x;
 		xfrm_address_t *remote = daddr;
 		xfrm_address_t *local = saddr;
@@ -1395,7 +1395,7 @@ free_dst:
 }
 
 static int inline
-xfrm_dst_alloc_copy(void **target, void *src, int size)
+xfrm_dst_alloc_copy(void **target, void *src, size_t size)
 {
 	if (!*target) {
 		*target = kmalloc(size, GFP_ATOMIC);
@@ -1554,7 +1554,7 @@ restart:
 #endif
 	nx = xfrm_tmpl_resolve(pols, npols, fl, xfrm, family);
 
-	if (unlikely(nx<0)) {
+	if (unlikely(nx < 0)) {
 		err = nx;
 		if (err == -EAGAIN && sysctl_xfrm_larval_drop) {
 			/* EREMOTE tells the caller to generate
@@ -1688,7 +1688,8 @@ xfrm_state_ok(struct xfrm_tmpl *tmpl, struct xfrm_state *x,
 	      unsigned short family)
 {
 	if (xfrm_state_kern(x))
-		return tmpl->optional && !xfrm_state_addr_cmp(tmpl, x, tmpl->encap_family);
+		return tmpl->optional &&
+			!xfrm_state_addr_cmp(tmpl, x, tmpl->encap_family);
 	return	x->id.proto == tmpl->id.proto &&
 		(x->id.spi == tmpl->id.spi || !tmpl->id.spi) &&
 		(x->props.reqid == tmpl->reqid || !tmpl->reqid) &&
@@ -1777,7 +1778,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 	if (skb->sp) {
 		int i;
 
-		for (i=skb->sp->len-1; i>=0; i--) {
+		for (i = skb->sp->len-1; i >= 0; i--) {
 			struct xfrm_state *x = skb->sp->xvec[i];
 			if (!xfrm_selector_match(&x->sel, &fl, family))
 				return 0;
Re: [PATCH net-2.6.25 10/11][INET] Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
On Wed, Dec 05, 2007 at 09:39:33PM -0800, David Miller wrote:

> But we go back again to the question of how to get this current
> behavior setting instantiated early enough. So much stuff happens
> via initrd's etc. before the real userland has a chance to run
> things, read settings from the real filesystem config files, in order
> to change this.

Perhaps a boot time command line option?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: sockets affected by IPsec always block (2.6.23)
From: Stefan Rompf [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 11:56:48 +0100

> On Thursday, 6 December 2007 09:53, David Miller wrote:
>
> > > I think the words "shall fail" and "immediately" are quite clear.
> >
> > They are, but the context in which they apply is vague.
>
> socket is connection-mode => SOCK_STREAM

I meant whether "immediately" is meant in reference to socket state or includes auxiliary things like route lookups.

When you do a non-blocking write on a socket, things like memory allocations can block, potentially for a long time.

It is an example where there are definite boundaries to where the non-blocking'ness applies.

And therefore it is not as cut and dried as you present this issue.

> The reason why I'm pushing this issue another time is that I know quite a bit
> about system level application development. A very typical design pattern for
> non-naive single or multi threaded programs is that they set all
> communication sockets to be nonblocking and use a select()/epoll() based loop
> to dispatch IO. This often includes initiating a TCP connect() and
> asynchronously waiting for it to finish or fail from the main loop.
>
> The dangerous situation here is that in 99% of all cases things will just work
> because the phase 2 SA exists. In 0.8%, the SA will be established in 1 sec.
> However, in the rest of time the server application that you have considered
> to be stable will end up sleeping with all threads in a connect() call that
> is supposed to return immediately.

And that connect() call can hang for a long time due to any memory allocation done in the connect() path.

You are not avoiding blocking by setting O_NONBLOCK on the socket, it is quite foolhardy to think that it does so unilaterally.

And that's why this is a grey area. Why is waiting for memory allocation on a O_NONBLOCK socket OK but waiting for IPSEC route resolution is not?
Re: [Patch] net/xfrm/xfrm_policy.c: Some small improvements
From: WANG Cong [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 19:01:23 +0800

> This patch contains the following changes.
>
>   - Use 'bool' instead of 'int' for booleans.
>   - Use 'size_t' instead of 'int' for 'sizeof' return value.
>   - Some style fixes.
>
> Cc: Herbert Xu [EMAIL PROTECTED]
> Cc: David Miller [EMAIL PROTECTED]
> Signed-off-by: WANG Cong [EMAIL PROTECTED]

Normally I would let a patch like this sit in my mailbox for a week and then delete it.

But this time I'll just let you know up front that I don't see much value in this patch. It is not a clear improvement to replace int's with bool's in my mind and the other changes are just whitespace changes.

And thus I can delete the patch from my mailbox immediately :-)

Sorry.
Re: [PATCH net-2.6.25 10/11][INET] Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 22:06:01 +1100

> On Wed, Dec 05, 2007 at 09:39:33PM -0800, David Miller wrote:
> >
> > But we go back again to the question of how to get this current
> > behavior setting instantiated early enough. So much stuff happens
> > via initrd's etc. before the real userland has a chance to run
> > things, read settings from the real filesystem config files, in order
> > to change this.
>
> Perhaps a boot time command line option?

It's not pleasant but it would indeed work.
Re: sockets affected by IPsec always block (2.6.23)
On Thursday, 6 December 2007 12:13, David Miller wrote:

> And that's why this is a grey area. Why is waiting for memory
> allocation on a O_NONBLOCK socket OK but waiting for IPSEC route
> resolution is not?

Because you will simply put enough RAM modules into your server when setting up a scalable system. It is a local resource, manageable by the admin.

What you cannot control in many cases is the network connection to the remote node. Simon Arlott has been talking about an 8 hour network outage.

Stefan
Re: sockets affected by IPsec always block (2.6.23)
From: Stefan Rompf [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 12:35:05 +0100

> Because you just will put enough RAM modules into you server when setting up a
> scalable system.

This suggestion is avoiding the important semantic issue, and won't lead to a real discussion of the core problem.
[PATCH 2.6.25] multiple namespaces in the all dst_ifdown routines
move dst entries to a namespace loopback to catch refcounting leaks. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/core/dst.c |4 ++-- net/ipv4/route.c|5 +++-- net/ipv4/xfrm4_policy.c |3 ++- net/ipv6/route.c|7 +-- net/ipv6/xfrm6_policy.c |3 ++- net/xfrm/xfrm_policy.c |2 +- 6 files changed, 15 insertions(+), 9 deletions(-) diff --git a/net/core/dst.c b/net/core/dst.c index f538061..5c6cfc4 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -279,11 +279,11 @@ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev, if (!unregister) { dst-input = dst-output = dst_discard; } else { - dst-dev = init_net.loopback_dev; + dst-dev = dst-dev-nd_net-loopback_dev; dev_hold(dst-dev); dev_put(dev); if (dst-neighbour dst-neighbour-dev == dev) { - dst-neighbour-dev = init_net.loopback_dev; + dst-neighbour-dev = dst-dev; dev_put(dev); dev_hold(dst-neighbour-dev); } diff --git a/net/ipv4/route.c b/net/ipv4/route.c index dae1290..e4aa97e 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1425,8 +1425,9 @@ static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev, { struct rtable *rt = (struct rtable *) dst; struct in_device *idev = rt-idev; - if (dev != init_net.loopback_dev idev idev-dev == dev) { - struct in_device *loopback_idev = in_dev_get(init_net.loopback_dev); + if (dev != dev-nd_net-loopback_dev idev idev-dev == dev) { + struct in_device *loopback_idev = + in_dev_get(dev-nd_net-loopback_dev); if (loopback_idev) { rt-idev = loopback_idev; in_dev_put(idev); diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c index 19fdf8a..e086260 100644 --- a/net/ipv4/xfrm4_policy.c +++ b/net/ipv4/xfrm4_policy.c @@ -216,7 +216,8 @@ static void xfrm4_dst_ifdown(struct dst_entry *dst, struct net_device *dev, xdst = (struct xfrm_dst *)dst; if (xdst-u.rt.idev-dev == dev) { - struct in_device *loopback_idev = in_dev_get(init_net.loopback_dev); + struct in_device *loopback_idev = + in_dev_get(dev-nd_net-loopback_dev); 
BUG_ON(!loopback_idev); do { diff --git a/net/ipv6/route.c b/net/ipv6/route.c index e36cac9..e757a3c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -216,9 +216,12 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev, { struct rt6_info *rt = (struct rt6_info *)dst; struct inet6_dev *idev = rt-rt6i_idev; + struct net_device *loopback_dev = + dev-nd_net-loopback_dev; - if (dev != init_net.loopback_dev idev != NULL idev-dev == dev) { - struct inet6_dev *loopback_idev = in6_dev_get(init_net.loopback_dev); + if (dev != loopback_dev idev != NULL idev-dev == dev) { + struct inet6_dev *loopback_idev = + in6_dev_get(loopback_dev); if (loopback_idev != NULL) { rt-rt6i_idev = loopback_idev; in6_dev_put(idev); diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index cc0d151..7b360ea 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -233,7 +233,8 @@ static void xfrm6_dst_ifdown(struct dst_entry *dst, struct net_device *dev, xdst = (struct xfrm_dst *)dst; if (xdst-u.rt6.rt6i_idev-dev == dev) { - struct inet6_dev *loopback_idev = in6_dev_get(init_net.loopback_dev); + struct inet6_dev *loopback_idev = + in6_dev_get(dev-nd_net-loopback_dev); BUG_ON(!loopback_idev); do { diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index a9ac748..900f6b6 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1932,7 +1932,7 @@ static int stale_bundle(struct dst_entry *dst) void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev) { while ((dst = dst-child) dst-xfrm dst-dev == dev) { - dst-dev = init_net.loopback_dev; + dst-dev = dev-nd_net-loopback_dev; dev_hold(dst-dev); dev_put(dev); } -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
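The reference hand-off this patch performs can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct netns`, `struct netdev`, `struct dst` and `dst_ifdown_sketch()` are hypothetical stand-ins for `struct net`, `struct net_device`, `struct dst_entry` and `dst_ifdown()`, and the plain integer counter only imitates what `dev_hold()`/`dev_put()` account for in the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct netns;

/* Hypothetical stand-in for struct net_device. */
struct netdev {
	int refcnt;		/* imitates dev_hold()/dev_put() accounting */
	struct netns *ns;	/* owning namespace (nd_net in the patch) */
};

/* Hypothetical stand-in for struct net. */
struct netns {
	struct netdev *loopback;
};

/* Hypothetical stand-in for struct dst_entry. */
struct dst {
	struct netdev *dev;
};

static void dev_hold(struct netdev *d) { d->refcnt++; }
static void dev_put(struct netdev *d)  { d->refcnt--; }

/*
 * Re-point a cached entry from a dying device to the loopback device of
 * the SAME namespace, moving the reference along with the pointer.
 */
void dst_ifdown_sketch(struct dst *dst, struct netdev *dying)
{
	if (dst->dev != dying)
		return;
	dst->dev = dying->ns->loopback;
	dev_hold(dst->dev);	/* new reference on the namespace loopback */
	dev_put(dying);		/* drop the reference on the dying device */
}
```

With a global loopback, the reference would land on a device in another namespace, so a leak there could not be attributed to the namespace that caused it.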
Re: [BRIDGE] warning message when add an interface to bridge
Thanks. After applying this patch, the warning message is gone.

[PATCH] net: Fix running without sysfs

On Dec 6, 2007 2:00 PM, Eric W. Biederman [EMAIL PROTECTED] wrote:
> Stephen Hemminger [EMAIL PROTECTED] writes:
>
> > On Wed, 5 Dec 2007 10:44:17 +0800
> > Chung-Chi Lo [EMAIL PROTECTED] wrote:
> >
> > > My kernel is Linux 2.6.22.1. SYSFS is off. When adding an interface to
> > > bridge, console will show WARNING message. If turn SYSFS to on, then
> > > the WARNING message is gone. Any suggestion how to debug this problem?
> > > Thanks.
> > >
> > > # ifconfig eth0 0.0.0.0
> > > eth0: starting interface.
> > > # brctl addbr br0
> > > # brctl addif br0 eth0
> > > WARNING: at lib/kref.c:33 kref_get()
> > > Call Trace:
> > > [<80027844>] dump_stack+0x8/0x38
> > > [<8011f348>] kref_get+0xdc/0xe4
> > > [<8011ee20>] kobject_get+0x20/0x34
> > > [<8011e910>] kobject_shadow_add+0x5c/0x170
> > > [<8011ea34>] kobject_add+0x10/0x20
> > > [<8020aac0>] br_add_if+0xb4/0x1b4
> > > [<8020b354>] add_del_if+0x5c/0x118
> > > [<8020bcc4>] br_dev_ioctl+0x6c/0x88
> > > [<80182edc>] dev_ifsioc+0x334/0x3c0
> > > [<80183184>] dev_ioctl+0x21c/0x2ec
> > > [<8016f76c>] sock_ioctl+0x130/0x2e4
> > > [<800b3b2c>] do_ioctl+0x6c/0x84
> > > [<800b3d40>] vfs_ioctl+0x80/0x248
> > > [<800b3f58>] sys_ioctl+0x50/0x98
> > > [<8002a8a8>] stack_done+0x20/0x3c
> >
> > This is an artifact of the kobject_shadow code which was reverted in
> > later kernels. It is gone in 2.6.23
>
> I don't think it was the kobject_shadow, but rather we didn't
> initialize the kref or something like that in net/core/dev.c
>
> I believe commit 8b41d1887db718be9a2cd9e18c58ce25a4c7fd93 was the fix.
>
> Disabling sysfs can be a fun exercise in finding corner case bugs
> right now.
>
> Eric

--
Lino, Chung-Chi Lo
Re: sockets affected by IPsec always block (2.6.23)
On Thursday, 6 December 2007 12:39, David Miller wrote:

> > Because you just will put enough RAM modules into your server when
> > setting up a scalable system.
>
> This suggestion is avoiding the important semantic issue, and won't
> lead to a real discussion of the core problem.

When writing applications for Unix operating systems, it has been known for ages that memory can be swapped out and that even plain memory accesses can block. So it is no surprise when a system call has to wait for memory: just imagine that the kernel code for connect() could be, and had been, swapped out.

Even with moderate swap activity, this memory should be available in much less than one second. If, on the other hand, the system is already thrashing, it makes no difference whether it does so within connect() or while reaching the connect() system call in the application flow. Btw, this is where the admin's responsibility to size their systems kicks in.

So here is where I would draw the line: connect() is clearly a network related function. Therefore, if a nonblocking connect() has to sleep until a local, controllable resource like memory becomes available, this is ok. Maybe it shouldn't wait for a 128MB buffer if someone configured such an abomination; I haven't thought deeply about that. But when told not to wait for the connection to complete, it should never ever wait for another network related activity like IPSEC SA setup to complete, especially not for hours.

IMHO this is what developers expect, and it is also consistent with the fact that POSIX does not define O_NONBLOCK behaviour for local files.

Stefan
Re: [PATCH net-2.6.25 10/11][INET] Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
Herbert Xu wrote:
> David Miller [EMAIL PROTECTED] wrote:
> > The user is pretty much screwed in one way or the other. For example:
> >
> > 1) If 'default' propagates to all devices, any specific setting for
> >    a device is lost.
> >
> > 2) If 'default' does not propagate, there is no way to have 'default'
> >    influence devices which have already been loaded.
>
> Well the way it works on IPv4 currently (for most options) is that
> we'll propagate default settings to a device until either:
>
> 1) the user modifies the setting for that device;
> 2) or that an IPv4 address has been added to the device.

BTW, this is not 100% true. Look, in rtm_to_ifaddr() I see the following code flow:

	ipv4_devconf_setall(in_dev);

	ifa = inet_alloc_ifa();
	if (ifa == NULL) {
		/*
		 * A potential indev allocation can be left alive, it stays
		 * assigned to its device and is destroy with it.
		 */
		err = -ENOBUFS;
		goto errout;
	}

If we fail to allocate the ifa (unlikely, but possible), the device stops accepting the default propagation even though no address was added to it. If this note is relevant, I can prepare a patch.

> 2) was done to preserve backwards compatibility as the controls were
> previously only available after address addition and we did not
> propagate default settings in that case..
>
> We could easily extend this so that the default propagation worked
> until the user modified the setting, with an ioctl to revert to the
> current behaviour for compatibility.
>
> Cheers,
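The propagation rule discussed in this message can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel's implementation: `struct devconf`, the `pinned` array and all three functions are hypothetical names, and `devconf_setall()` only imitates the effect attributed to ipv4_devconf_setall() above, namely marking every setting on the device as no longer following 'default'.

```c
#include <assert.h>
#include <stdbool.h>

enum { CONF_FORWARDING, CONF_RP_FILTER, CONF_MAX };

/* Hypothetical per-device configuration block. */
struct devconf {
	int  data[CONF_MAX];
	bool pinned[CONF_MAX];	/* true once the entry stops following 'default' */
};

/* The user wrote a per-device value: store it and pin the entry. */
void devconf_set(struct devconf *c, int idx, int val)
{
	c->data[idx] = val;
	c->pinned[idx] = true;
}

/* Pin every entry, as happens when an IPv4 address is being added. */
void devconf_setall(struct devconf *c)
{
	for (int i = 0; i < CONF_MAX; i++)
		c->pinned[i] = true;
}

/* A write to 'default' reaches only devices whose entry is not pinned. */
void devconf_write_default(struct devconf *devs, int ndevs, int idx, int val)
{
	for (int i = 0; i < ndevs; i++)
		if (!devs[i].pinned[idx])
			devs[i].data[idx] = val;
}
```

In this model the corner case raised above corresponds to calling `devconf_setall()` and then failing before any address is actually attached: the device is pinned all the same.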
[RFC2][PATCH 7/7] [TFRC]: New rx history code
Gerrit,

I think I got this right this time; please see if there is anything left so that we can move on. I plan to go through the following patches restricting myself to namespacing and consistency issues, leaving the ideas I have for later, when we get more of your backlog merged.

The first six patches in this series are unmodified, so if you are OK with them please send me your Signed-off-by.

Thanks a lot,

- Arnaldo

From 2a3b4067dd514ce0e307d165783bc561cc7f17c4 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 10:56:58 -0200
Subject: [PATCH 7/7] [TFRC]: New rx history code

Credit here goes to Gerrit Renker, who provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, renaming some functions, adding namespacing and fixing one bug where tfrc_rx_hist_alloc was not freeing the allocated ring entries on the error path.

Original changeset comment from Gerrit:
---
This provides a new, self-contained and generic RX history service for TFRC based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per RFC 4340, sec. 7.7.
Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED] --- net/dccp/ccids/ccid3.c | 292 +-- net/dccp/ccids/ccid3.h | 14 +- net/dccp/ccids/lib/loss_interval.c | 13 ++- net/dccp/ccids/lib/packet_history.c | 290 +-- net/dccp/ccids/lib/packet_history.h | 83 +-- 5 files changed, 334 insertions(+), 358 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5ff5aab..28a5e4d 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len, /* * Receiver Half-Connection Routines */ + +/* CCID3 feedback types */ +enum ccid3_fback_type { + CCID3_FBACK_NONE = 0, + CCID3_FBACK_INITIAL, + CCID3_FBACK_PERIODIC, + CCID3_FBACK_PARAM_CHANGE +}; + #ifdef CONFIG_IP_DCCP_CCID3_DEBUG static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state) { @@ -667,59 +676,60 @@ static void ccid3_hc_rx_set_state(struct sock *sk, hcrx-ccid3hcrx_state = state; } -static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) -{ - if (likely(len 0))/* don't update on empty packets (e.g. 
ACKs) */ - hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); -} - -static void ccid3_hc_rx_send_feedback(struct sock *sk) +static void ccid3_hc_rx_send_feedback(struct sock *sk, + const struct sk_buff *skb, + enum ccid3_fback_type fbtype) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct tfrc_rx_hist_entry *packet; ktime_t now; - suseconds_t delta; + s64 delta = 0; ccid3_pr_debug(%s(%p) - entry \n, dccp_role(sk), sk); + if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_TERM)) + return; + now = ktime_get_real(); - switch (hcrx-ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: + switch (fbtype) { + case CCID3_FBACK_INITIAL: hcrx-ccid3hcrx_x_recv = 0; + hcrx-ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ break; - case TFRC_RSTATE_DATA: - delta = ktime_us_delta(now, - hcrx-ccid3hcrx_tstamp_last_feedback); - DCCP_BUG_ON(delta 0); - hcrx-ccid3hcrx_x_recv = - scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); + case CCID3_FBACK_PARAM_CHANGE: + /* +* When parameters change (new loss or p p_prev), we do not +* have a reliable estimate for R_m of [RFC 3448, 6.2] and so +* need to reuse the previous value of X_recv. However, when +* X_recv was 0 (due to early loss), this would kill X down to +* s/t_mbi (i.e. one packet in 64 seconds). +* To avoid such drastic reduction, we approximate X_recv as +* the number of bytes since last feedback. +* This is a safe fallback, since X is bounded above by X_calc. +*/ + if (hcrx-ccid3hcrx_x_recv 0) + break; + /* fall through */ + case CCID3_FBACK_PERIODIC: + delta = ktime_us_delta(now,
Re: TCP event tracking via netlink...
On Thu, Dec 06, 2007 at 02:20:58AM -0800, David Miller wrote:
> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Wed, 5 Dec 2007 16:33:38 -0500
>
> > On Wed, 05 Dec 2007 08:53:07 -0800
> > Joe Perches [EMAIL PROTECTED] wrote:
> >
> > > it occurred to me that we might want to do something
> > > like a state change event generator.
> > > This could be a basis for an interesting TCP
> > > performance tester.
> >
> > That is what tcpprobe does but it isn't detailed enough to address
> > SACK issues.
>
> Indeed, this could be done via the jprobe there.
>
> Silly me I didn't do this in the implementation I whipped up, which
> I'll likely correct.

I have some experiments from the past in this area.

This is what is produced by ctracer + the ostra callgrapher when tracking many sk_buff objects, tracing sk_buff routines as well as all other structs that have a pointer to a sk_buff, i.e. where the sk_buff can be obtained from the struct that points to it. tcp_sock is an alias to struct inet_sock, which is an alias to struct sock, etc., so when tracing tcp_sock you also trace inet_connection_sock, inet_sock and sock methods:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sk_buff/many_objects/

With just one object (that is reused, so it appears many times):

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sk_buff/0x8101013130e8/

Following struct sock methods:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/many_objects/
http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/0xf61bf500/

struct socket:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/socket/many_objects/

It works by using the DWARF information to generate a systemtap module that in turn creates a relayfs channel where we store the traces and an automatically reorganized struct with just the base types (int, char, long, etc) and typedefs that end up being base types.

Here is an example of the struct minisock recreated from the debugging information and reorganized using the algorithms in pahole to save space, generated by this tool. Go to the bottom, where you'll find struct ctracer__mini_sock and the collector, which creates the mini struct from a full sized object:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/ctracer_collector.struct.sock.c

And the systemtap module (the tcpprobe on steroids), automatically generated:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/ctracer_methods.struct.sock.stp

This requires more work to:

 . reduce the overhead
 . filter out undesired functions, creating a project with the desired functions using some gui editor
 . specify lists of fields to put in the internal state to be collected, again using a gui or plain ctracer-edit using vi, instead of getting just base types
 . be able to say: collect just the fields on the second and fourth cacheline
 . add collectors for complex objects such as spinlocks, socket lock, mutexes

But since people want to work on tools to watch state transitions, fields changing, etc, I thought I should dust off the ostra experiments and the more recent dwarves ctracer work I'm doing in my copious spare time 8)

In the callgrapher there is some more interesting stuff.

Interface to see where fields changed:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/0xf61bf500/changes.html

In this page, clicking on a field name, such as:

http://oops.ghostprotocols.net:81/acme/dwarves/callgraphs/sock/0xf61bf500/sk_forward_alloc.png

will give you graphs over time.

Code is in the dwarves repo at:

http://master.kernel.org/git/?p=linux/kernel/git/acme/pahole.git;a=summary

Thanks,

- Arnaldo
[PATCH][VLAN] Lost rtnl_unlock() in vlan_ioctl()
The SET_VLAN_NAME_TYPE_CMD command w/o CAP_NET_ADMIN capability doesn't release the rtnl lock.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 6567213..5b18315 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -776,7 +776,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
 		if (!capable(CAP_NET_ADMIN))
-			return -EPERM;
+			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
 			vlan_name_type = args.u.name_type;
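The class of bug fixed here (an early return that skips the unlock at the end of the function) can be reproduced in a small userspace sketch. Everything below is hypothetical scaffolding: `lock_depth` stands in for the rtnl lock, `CMD_SET_NAME_TYPE` for SET_VLAN_NAME_TYPE_CMD, and the `admin` flag for the capable(CAP_NET_ADMIN) check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { CMD_SET_NAME_TYPE = 1 };	/* hypothetical command number */

static int lock_depth;		/* stand-in for the rtnl lock state */

static void fake_rtnl_lock(void)   { lock_depth++; }
static void fake_rtnl_unlock(void) { lock_depth--; }

/* Before the fix: the permission failure returns with the lock held. */
int ioctl_buggy(int cmd, bool admin)
{
	int err = -EINVAL;

	fake_rtnl_lock();
	switch (cmd) {
	case CMD_SET_NAME_TYPE:
		err = -EPERM;
		if (!admin)
			return -EPERM;	/* skips fake_rtnl_unlock() below */
		err = 0;
		break;
	}
	fake_rtnl_unlock();
	return err;
}

/* After the fix: 'break' falls through to the common unlock path. */
int ioctl_fixed(int cmd, bool admin)
{
	int err = -EINVAL;

	fake_rtnl_lock();
	switch (cmd) {
	case CMD_SET_NAME_TYPE:
		err = -EPERM;
		if (!admin)
			break;
		err = 0;
		break;
	}
	fake_rtnl_unlock();
	return err;
}
```

Both variants return -EPERM to an unprivileged caller; only the lock state afterwards differs, which is why the bug is easy to miss in testing.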
Re: [PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency
| Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Re: [PATCH][VLAN] Lost rtnl_unlock() in vlan_ioctl()
Pavel Emelyanov wrote:
> The SET_VLAN_NAME_TYPE_CMD command w/o CAP_NET_ADMIN capability
> doesn't release the rtnl lock.

Thanks Pavel. I somehow recall that we already fixed this one, but can't find the patch :) Dave, please apply.
Re: [PATCH 4/7] [TFRC]: Make the rx history slab be global
| Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Re: [PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_
| Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Re: [PATCH][VLAN] Lost rtnl_unlock() in vlan_ioctl()
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:59:24 +0100

> Pavel Emelyanov wrote:
> > The SET_VLAN_NAME_TYPE_CMD command w/o CAP_NET_ADMIN capability
> > doesn't release the rtnl lock.
>
> Thanks Pavel. I somehow recall that we already fixed this one,
> but can't find the patch :) Dave, please apply.

I think we even added this bug to -stable, or something like that, didn't we? Yikes...
Re: [RFC2][PATCH 7/7] [TFRC]: New rx history code
| The first six patches in this series are unmodified, so if you
| are OK with them please send me your Signed-off-by.
Patches [1/7], [2/7], and [6/7] already have a signed-off and there are no changes.
Just acknowledged [3..5/7], will look at [7/7] now.

Cheers
Gerrit
[patch 2/6] ipv6 - make xfrm6_init to return an error code
The xfrm initialization function does not return any error code, so if there is an error, the caller cannot be notified of it. This patch checks the return codes of the functions it calls in order to report a successful or failed initialization.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 include/net/xfrm.h      |    4 ++--
 net/ipv6/xfrm6_policy.c |   22 +++++++++++++++++-----
 net/ipv6/xfrm6_state.c  |    4 ++--
 3 files changed, 21 insertions(+), 9 deletions(-)

Index: net-2.6.25/include/net/xfrm.h
===================================================================
--- net-2.6.25.orig/include/net/xfrm.h
+++ net-2.6.25/include/net/xfrm.h
@@ -1066,11 +1066,11 @@ struct xfrm6_tunnel {
 
 extern void xfrm_init(void);
 extern void xfrm4_init(void);
-extern void xfrm6_init(void);
+extern int xfrm6_init(void);
 extern void xfrm6_fini(void);
 extern void xfrm_state_init(void);
 extern void xfrm4_state_init(void);
-extern void xfrm6_state_init(void);
+extern int xfrm6_state_init(void);
 extern void xfrm6_state_fini(void);
 
 extern int xfrm_state_walk(u8 proto, int (*func)(struct xfrm_state *, int, void*), void *);
Index: net-2.6.25/net/ipv6/xfrm6_policy.c
===================================================================
--- net-2.6.25.orig/net/ipv6/xfrm6_policy.c
+++ net-2.6.25/net/ipv6/xfrm6_policy.c
@@ -269,9 +269,9 @@ static struct xfrm_policy_afinfo xfrm6_p
 	.fill_dst =		xfrm6_fill_dst,
 };
 
-static void __init xfrm6_policy_init(void)
+static int __init xfrm6_policy_init(void)
 {
-	xfrm_policy_register_afinfo(&xfrm6_policy_afinfo);
+	return xfrm_policy_register_afinfo(&xfrm6_policy_afinfo);
 }
 
 static void xfrm6_policy_fini(void)
@@ -279,10 +279,22 @@ static void xfrm6_policy_fini(void)
 	xfrm_policy_unregister_afinfo(&xfrm6_policy_afinfo);
 }
 
-void __init xfrm6_init(void)
+int __init xfrm6_init(void)
 {
-	xfrm6_policy_init();
-	xfrm6_state_init();
+	int ret;
+
+	ret = xfrm6_policy_init();
+	if (ret)
+		goto out;
+
+	ret = xfrm6_state_init();
+	if (ret)
+		goto out_policy;
+out:
+	return ret;
+out_policy:
+	xfrm6_policy_fini();
+	goto out;
 }
 
 void xfrm6_fini(void)
Index: net-2.6.25/net/ipv6/xfrm6_state.c
===================================================================
--- net-2.6.25.orig/net/ipv6/xfrm6_state.c
+++ net-2.6.25/net/ipv6/xfrm6_state.c
@@ -198,9 +198,9 @@ static struct xfrm_state_afinfo xfrm6_st
 	.transport_finish	= xfrm6_transport_finish,
 };
 
-void __init xfrm6_state_init(void)
+int __init xfrm6_state_init(void)
 {
-	xfrm_state_register_afinfo(&xfrm6_state_afinfo);
+	return xfrm_state_register_afinfo(&xfrm6_state_afinfo);
 }
 
 void xfrm6_state_fini(void)
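The control flow introduced in xfrm6_init() is the usual goto-based unwinding: each step that succeeded is undone, in reverse order, when a later step fails. A userspace sketch of the same shape follows. All names are hypothetical, the `fail_*` knobs exist only to inject errors, and the errno values are arbitrary rather than what the xfrm registration functions actually return.

```c
#include <assert.h>
#include <errno.h>

static int policy_registered, state_registered;
static int fail_policy, fail_state;	/* test knobs, not part of the pattern */

static int policy_init(void)
{
	if (fail_policy)
		return -ENOMEM;		/* arbitrary error for the sketch */
	policy_registered = 1;
	return 0;
}

static void policy_fini(void)
{
	policy_registered = 0;
}

static int state_init(void)
{
	if (fail_state)
		return -ENOBUFS;	/* arbitrary error for the sketch */
	state_registered = 1;
	return 0;
}

/* Two-step init: a failure in step two undoes step one before returning. */
int subsystem_init(void)
{
	int ret;

	ret = policy_init();
	if (ret)
		goto out;

	ret = state_init();
	if (ret)
		goto out_policy;
out:
	return ret;
out_policy:
	policy_fini();
	goto out;
}
```

The caller sees either 0 with both parts registered, or a negative errno with nothing left registered.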
[patch 0/6] ipv6 - ipv6 routing initialization
This patchset modifies the route initialization code for ipv6. Currently the init functions do not return an error code, so the protocol cannot be notified that an error occurred while initializing the routing subsystems.

The patchset makes the init functions return an error code, so that ipv6 can safely handle the error and fail gracefully.

The error code also makes it possible to catch a kmem_cache_create failure without resorting to a panic. The ipv6 module can then simply fail to load instead of bringing down the machine.
[patch 3/6] ipv6 - make fib6_rules_init to return an error code
When the fib_rules initialization finished, no return code is provided so there is no way to know, for the caller, if the initialization has been successful or has failed. This patch fix that. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] Acked-by: Benjamin Thery [EMAIL PROTECTED] --- include/net/fib_rules.h |1 + include/net/ip6_fib.h |2 +- net/core/fib_rules.c|5 +++-- net/ipv6/fib6_rules.c | 19 --- 4 files changed, 21 insertions(+), 6 deletions(-) Index: net-2.6.25/include/net/ip6_fib.h === --- net-2.6.25.orig/include/net/ip6_fib.h +++ net-2.6.25/include/net/ip6_fib.h @@ -226,7 +226,7 @@ extern void fib6_gc_cleanup(void); extern int fib6_init(void); -extern voidfib6_rules_init(void); +extern int fib6_rules_init(void); extern voidfib6_rules_cleanup(void); #endif Index: net-2.6.25/net/ipv6/fib6_rules.c === --- net-2.6.25.orig/net/ipv6/fib6_rules.c +++ net-2.6.25/net/ipv6/fib6_rules.c @@ -265,10 +265,23 @@ static int __init fib6_default_rules_ini return 0; } -void __init fib6_rules_init(void) +int __init fib6_rules_init(void) { - BUG_ON(fib6_default_rules_init()); - fib_rules_register(fib6_rules_ops); + int ret; + + ret = fib6_default_rules_init(); + if (ret) + goto out; + + ret = fib_rules_register(fib6_rules_ops); + if (ret) + goto out_default_rules_init; +out: + return ret; + +out_default_rules_init: + fib_rules_cleanup_ops(fib6_rules_ops); + goto out; } void fib6_rules_cleanup(void) Index: net-2.6.25/include/net/fib_rules.h === --- net-2.6.25.orig/include/net/fib_rules.h +++ net-2.6.25/include/net/fib_rules.h @@ -103,6 +103,7 @@ static inline u32 frh_get_table(struct f extern int fib_rules_register(struct fib_rules_ops *); extern int fib_rules_unregister(struct fib_rules_ops *); +extern void fib_rules_cleanup_ops(struct fib_rules_ops *); extern int fib_rules_lookup(struct fib_rules_ops *, struct flowi *, int flags, Index: net-2.6.25/net/core/fib_rules.c === --- net-2.6.25.orig/net/core/fib_rules.c +++ net-2.6.25/net/core/fib_rules.c @@ -102,7 +102,7 @@ errout: 
EXPORT_SYMBOL_GPL(fib_rules_register); -static void cleanup_ops(struct fib_rules_ops *ops) +void fib_rules_cleanup_ops(struct fib_rules_ops *ops) { struct fib_rule *rule, *tmp; @@ -111,6 +111,7 @@ static void cleanup_ops(struct fib_rules fib_rule_put(rule); } } +EXPORT_SYMBOL_GPL(fib_rules_cleanup_ops); int fib_rules_unregister(struct fib_rules_ops *ops) { @@ -121,7 +122,7 @@ int fib_rules_unregister(struct fib_rule list_for_each_entry(o, rules_ops, list) { if (o == ops) { list_del_rcu(o-list); - cleanup_ops(ops); + fib_rules_cleanup_ops(ops); goto out; } } -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC2][PATCH 7/7] [TFRC]: New rx history code
On Thu, Dec 06, 2007 at 02:02:25PM +, Gerrit Renker wrote:
> | The first six patches in this series are unmodified, so if you
> | are OK with them please send me your Signed-off-by.
> Patches [1/7], [2/7], and [6/7] already have a signed-off and there are no changes.
> Just acknowledged [3..5/7], will look at [7/7] now.

OK, please let me know if there are still any problems. The removal of timestamp insertion in ccid3_hc_rx_insert_options will be put in another cset.

- Arnaldo
Re: sockets affected by IPsec always block (2.6.23)
From: Stefan Rompf [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 13:30:20 +0100

> IMHO this is what developers expect, and is also consistent with the
> fact that POSIX does not define O_NONBLOCK behaviour for local files.

You keep ignoring the fact that, as Herbert and I discussed, not blocking for IPSEC resolution will make some connect() cases fail that would otherwise not fail.

There are two sides to this issue, and we need to consider them both.

Long term a resolution-packet-queue provides a solution that handles both angles correctly, but we don't have that code yet.
[patch 5/6] ipv6 - make af_inet6 to check ip6_route_init return value
The af_inet6 initialization function does not check the return code of the route initilization, so if something goes wrong, the protocol initialization will continue anyway. This patch takes into account the modification made in the different route's initialization subroutines to check the return value and to make the protocol initialization to fail. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] Acked-by: Benjamin Thery [EMAIL PROTECTED] --- net/ipv6/af_inet6.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Index: net-2.6.25/net/ipv6/af_inet6.c === --- net-2.6.25.orig/net/ipv6/af_inet6.c +++ net-2.6.25/net/ipv6/af_inet6.c @@ -849,7 +849,9 @@ static int __init inet6_init(void) if (if6_proc_init()) goto proc_if6_fail; #endif - ip6_route_init(); + err = ip6_route_init(); + if (err) + goto ip6_route_fail; ip6_flowlabel_init(); err = addrconf_init(); if (err) @@ -874,6 +876,7 @@ out: addrconf_fail: ip6_flowlabel_cleanup(); ip6_route_cleanup(); +ip6_route_fail: #ifdef CONFIG_PROC_FS if6_proc_exit(); proc_if6_fail: @@ -904,6 +907,7 @@ icmp_fail: cleanup_ipv6_mibs(); out_unregister_sock: sock_unregister(PF_INET6); + rtnl_unregister_all(PF_INET6); out_unregister_raw_proto: proto_unregister(rawv6_prot); out_unregister_udplite_proto: -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 4/6] ipv6 - make ip6_route_init to return an error code
The route initialization function does not return any value to notify if the initialization is successful or not. This patch checks all calls made for the initilization in order to return a value for the caller. Unfortunatly, proc_net_fops_create will return a NULL pointer if CONFIG_PROC_FS is off, so we can not check the return code without an ifdef CONFIG_PROC_FS block in the ip6_route_init function. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] Acked-by: Benjamin Thery [EMAIL PROTECTED] --- include/net/ip6_route.h |2 - net/ipv6/route.c| 66 +++- 2 files changed, 55 insertions(+), 13 deletions(-) Index: net-2.6.25/include/net/ip6_route.h === --- net-2.6.25.orig/include/net/ip6_route.h +++ net-2.6.25/include/net/ip6_route.h @@ -50,7 +50,7 @@ extern void ip6_route_input(struct sk_ extern struct dst_entry * ip6_route_output(struct sock *sk, struct flowi *fl); -extern voidip6_route_init(void); +extern int ip6_route_init(void); extern voidip6_route_cleanup(void); extern int ipv6_route_ioctl(unsigned int cmd, void __user *arg); Index: net-2.6.25/net/ipv6/route.c === --- net-2.6.25.orig/net/ipv6/route.c +++ net-2.6.25/net/ipv6/route.c @@ -2460,26 +2460,70 @@ ctl_table ipv6_route_table[] = { #endif -void __init ip6_route_init(void) +int __init ip6_route_init(void) { + int ret; + ip6_dst_ops.kmem_cachep = kmem_cache_create(ip6_dst_cache, sizeof(struct rt6_info), 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); ip6_dst_blackhole_ops.kmem_cachep = ip6_dst_ops.kmem_cachep; - fib6_init(); - proc_net_fops_create(init_net, ipv6_route, 0, ipv6_route_proc_fops); - proc_net_fops_create(init_net, rt6_stats, S_IRUGO, rt6_stats_seq_fops); + ret = fib6_init(); + if (ret) + goto out_kmem_cache; + +#ifdef CONFIG_PROC_FS + ret = -ENOMEM; + if (!proc_net_fops_create(init_net, ipv6_route, + 0, ipv6_route_proc_fops)) + goto out_fib6_init; + + if (!proc_net_fops_create(init_net, rt6_stats, + S_IRUGO, rt6_stats_seq_fops)) + goto out_proc_ipv6_route; +#endif + #ifdef CONFIG_XFRM - xfrm6_init(); + ret 
= xfrm6_init(); + if (ret) + goto out_proc_rt6_stats; #endif #ifdef CONFIG_IPV6_MULTIPLE_TABLES - fib6_rules_init(); -#endif + ret = fib6_rules_init(); + if (ret) + goto xfrm6_init; +#endif + ret = -ENOBUFS; + if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL) || + __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL) || + __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL)) + goto fib6_rules_init; + + ret = 0; +out: + return ret; - __rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL); - __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL); - __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL); +fib6_rules_init: +#ifdef CONFIG_IPV6_MULTIPLE_TABLES + fib6_rules_cleanup(); +xfrm6_init: +#endif +#ifdef CONFIG_XFRM + xfrm6_fini(); +out_proc_rt6_stats: +#endif +#ifdef CONFIG_PROC_FS + proc_net_remove(init_net, rt6_stats); +out_proc_ipv6_route: + proc_net_remove(init_net, ipv6_route); +out_fib6_init: +#endif + rt6_ifdown(NULL); + fib6_gc_cleanup(); +out_kmem_cache: + kmem_cache_destroy(ip6_dst_ops.kmem_cachep); + goto out; } void ip6_route_cleanup(void) @@ -2487,10 +2531,8 @@ void ip6_route_cleanup(void) #ifdef CONFIG_IPV6_MULTIPLE_TABLES fib6_rules_cleanup(); #endif -#ifdef CONFIG_PROC_FS proc_net_remove(init_net, ipv6_route); proc_net_remove(init_net, rt6_stats); -#endif #ifdef CONFIG_XFRM xfrm6_fini(); #endif -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 6/6] ipv6 - route6/fib6 : dont panic a kmem_cache_create
If the kmem_cache_creation fails, the kernel will panic. It is acceptable if the system is booting, but if the ipv6 protocol is compiled as a module and it is loaded after the system has booted, do we want to panic instead of just failing to initialize the protocol ? The init function is now returning an error and this one is checked for protocol initialization. So the ipv6 protocol will safely fails. Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] Acked-by: Benjamin Thery [EMAIL PROTECTED] --- net/ipv6/ip6_fib.c |5 - net/ipv6/route.c |5 - 2 files changed, 8 insertions(+), 2 deletions(-) Index: net-2.6.25/net/ipv6/ip6_fib.c === --- net-2.6.25.orig/net/ipv6/ip6_fib.c +++ net-2.6.25/net/ipv6/ip6_fib.c @@ -1478,8 +1478,11 @@ int __init fib6_init(void) int ret; fib6_node_kmem = kmem_cache_create(fib6_nodes, sizeof(struct fib6_node), - 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, + 0, SLAB_HWCACHE_ALIGN, NULL); + if (!fib6_node_kmem) + return -ENOMEM; + fib6_tables_init(); ret = __rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib); Index: net-2.6.25/net/ipv6/route.c === --- net-2.6.25.orig/net/ipv6/route.c +++ net-2.6.25/net/ipv6/route.c @@ -2466,7 +2466,10 @@ int __init ip6_route_init(void) ip6_dst_ops.kmem_cachep = kmem_cache_create(ip6_dst_cache, sizeof(struct rt6_info), 0, - SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); + SLAB_HWCACHE_ALIGN, NULL); + if (!ip6_dst_ops.kmem_cachep) + return -ENOMEM; + ip6_dst_blackhole_ops.kmem_cachep = ip6_dst_ops.kmem_cachep; ret = fib6_init(); -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
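The change in behaviour can be sketched in userspace: an allocator that aborts on failure when a panic flag is passed, and an init routine that drops the flag and reports -ENOMEM instead. This is a hedged illustration only. `cache_create()`, `CACHE_PANIC`, `simulate_oom` and `fib_init_sketch()` are hypothetical stand-ins for kmem_cache_create(), SLAB_PANIC, a real out-of-memory condition and fib6_init().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define CACHE_PANIC 0x1u	/* hypothetical analogue of SLAB_PANIC */

static int simulate_oom;	/* test knob: force the allocation to fail */
static void *node_cache;

/* Returns NULL on failure unless CACHE_PANIC asks for an abort instead. */
void *cache_create(size_t size, unsigned int flags)
{
	void *p = simulate_oom ? NULL : malloc(size);

	if (!p && (flags & CACHE_PANIC))
		abort();	/* stands in for the kernel panic */
	return p;
}

/* Without the panic flag the caller has to check for NULL itself. */
int fib_init_sketch(void)
{
	node_cache = cache_create(64, 0);
	if (!node_cache)
		return -ENOMEM;
	return 0;
}
```

Passing `CACHE_PANIC` would make the NULL check unreachable, which is why the patch removes the flag and adds the check in the same change.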
[patch 1/6] ipv6 - make fib6_init to return an error code
If the initialization function hits an error, nothing is reported back to the caller, so this patch adds a return value to the init function.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
Acked-by: Benjamin Thery [EMAIL PROTECTED]
---
 include/net/ip6_fib.h |    2 +-
 net/ipv6/ip6_fib.c    |   14 +++++++++++---
 2 files changed, 12 insertions(+), 4 deletions(-)

Index: net-2.6.25/include/net/ip6_fib.h
===
--- net-2.6.25.orig/include/net/ip6_fib.h
+++ net-2.6.25/include/net/ip6_fib.h
@@ -224,7 +224,7 @@ extern void fib6_run_gc(unsigned long
 extern void			fib6_gc_cleanup(void);

-extern void			fib6_init(void);
+extern int			fib6_init(void);

 extern void			fib6_rules_init(void);
 extern void			fib6_rules_cleanup(void);
Index: net-2.6.25/net/ipv6/ip6_fib.c
===
--- net-2.6.25.orig/net/ipv6/ip6_fib.c
+++ net-2.6.25/net/ipv6/ip6_fib.c
@@ -1473,16 +1473,24 @@ void fib6_run_gc(unsigned long dummy)
 	spin_unlock_bh(&fib6_gc_lock);
 }

-void __init fib6_init(void)
+int __init fib6_init(void)
 {
+	int ret;
 	fib6_node_kmem = kmem_cache_create("fib6_nodes",
 					   sizeof(struct fib6_node),
 					   0, SLAB_HWCACHE_ALIGN|SLAB_PANIC,
 					   NULL);
-
 	fib6_tables_init();

-	__rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib);
+	ret = __rtnl_register(PF_INET6, RTM_GETROUTE, NULL, inet6_dump_fib);
+	if (ret)
+		goto out_kmem_cache_create;
+out:
+	return ret;
+
+out_kmem_cache_create:
+	kmem_cache_destroy(fib6_node_kmem);
+	goto out;
 }

 void fib6_gc_cleanup(void)
--
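For readers less familiar with the idiom, the goto-based unwinding used in this series can be sketched in plain user-space C. This is only an illustration under stated assumptions: node_cache, fail_register, register_handler() and example_init() are hypothetical names, and malloc()/free() stand in for kmem_cache_create()/kmem_cache_destroy().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins: node_cache plays the role of fib6_node_kmem,
 * register_handler() plays the role of __rtnl_register(). */
static void *node_cache;
static int fail_register;	/* knob to force the second step to fail */

static int register_handler(void)
{
	return fail_register ? -ENOBUFS : 0;
}

/* Acquire resources in order; on failure, release only what was
 * already acquired and hand the error code back to the caller. */
int example_init(void)
{
	int ret;

	node_cache = malloc(64);
	if (!node_cache)
		return -ENOMEM;

	ret = register_handler();
	if (ret)
		goto out_free_cache;

	return 0;

out_free_cache:
	free(node_cache);
	node_cache = NULL;
	return ret;
}
```

The caller can then decide what to do with a non-zero return instead of the failure being silently dropped.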
Re: [PATCH][VLAN] Lost rtnl_unlock() in vlan_ioctl()
David Miller wrote:
> From: Patrick McHardy [EMAIL PROTECTED]
> Date: Thu, 06 Dec 2007 14:59:24 +0100
>
>> Pavel Emelyanov wrote:
>>> The SET_VLAN_NAME_TYPE_CMD command w/o CAP_NET_ADMIN capability
>>> doesn't release the rtnl lock.
>>
>> Thanks Pavel. I somehow recall that we already fixed this one,
>> but can't find the patch :) Dave, please apply.
>
> I think we even added this bug to -stable, or something like
> that, didn't we? Yikes...

No, I mixed those two patches up as well. The bug was introduced with the vlan_netlink stuff, the -stable patch fixed an invalid return value, but still properly dropped the lock. This patch should of course go in -stable anyway.
Re: sockets affected by IPsec always block (2.6.23)
On Thursday, 6 December 2007 14:55, David Miller wrote:
> You keep ignoring the fact that, as Herbert and I discussed, not
> blocking for IPSEC resolution will make some connect() cases fail
> that would otherwise not fail. There are two sides to this issue,
> and we need to consider them both.

As far as I've understood Herbert's patch, at least TCP connect can be fixed so that a non-blocking connect() will neither fail nor block, but just use the first or second retransmission of the SYN packet to complete the handshake after IPSEC is up. As this will fix the common breakage case, just do so and keep UDP sendmsg() etc. for later.

You are looking at this issue too much from the kernel side. Admittedly, this is a corner case, but for that reason nobody cares if connection completion takes two SYNs and three seconds instead of one SYN and maybe two seconds. But application developers and users will validly complain if their applications block unexpectedly for hours just because some random provider has a network outage and IPSEC cannot come up.

Stefan
Re: [Patch] net/xfrm/xfrm_policy.c: Some small improvements
David Miller wrote:
> From: WANG Cong [EMAIL PROTECTED]
> Date: Thu, 6 Dec 2007 19:01:23 +0800
>
>> This patch contains the following changes.
>> - Use 'bool' instead of 'int' for booleans.
>> - Use 'size_t' instead of 'int' for 'sizeof' return value.
>> - Some style fixes.
>>
>> Cc: Herbert Xu [EMAIL PROTECTED]
>> Cc: David Miller [EMAIL PROTECTED]
>> Signed-off-by: WANG Cong [EMAIL PROTECTED]
>
> Normally I would let a patch like this sit in my mailbox for a week
> and then delete it.

That is evil! ;)

> But this time I'll just let you know up front that I don't see much
> value in this patch. It is not a clear improvement to replace int's
> with bool's in my mind and the other changes are just whitespace
> changes.

Is it not an improvement to distinguish booleans from actual values? Do you use integers for ASCII characters too? It can also avoid some potential bugs like the 'if (i == TRUE)'...

What is wrong with 'size_t' (since it is unsigned, compared to (some) 'int')?

/Richard Knutsson
[PATCH 2.6.25] Remove ip_fib_local_table and ip_fib_main_table defines
From: Eric W. Biederman [EMAIL PROTECTED] There are only 2 users and it doesn't hurt to call fib_get_table instead, and it makes it easier to make the fib network namespace aware. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- include/net/ip_fib.h |3 --- net/ipv4/fib_hash.c |5 +++-- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index ed514bf..690fb4d 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -191,9 +191,6 @@ extern void __init fib4_rules_init(void); extern u32 fib_rules_tclass(struct fib_result *res); #endif -#define ip_fib_local_table fib_get_table(RT_TABLE_LOCAL) -#define ip_fib_main_table fib_get_table(RT_TABLE_MAIN) - extern int fib_lookup(struct flowi *flp, struct fib_result *res); extern struct fib_table *fib_new_table(u32 id); diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c index 9d0cee2..30ff657 100644 --- a/net/ipv4/fib_hash.c +++ b/net/ipv4/fib_hash.c @@ -810,7 +810,8 @@ struct fib_iter_state { static struct fib_alias *fib_get_first(struct seq_file *seq) { struct fib_iter_state *iter = seq-private; - struct fn_hash *table = (struct fn_hash *) ip_fib_main_table-tb_data; + struct fib_table *main_table = fib_get_table(RT_TABLE_MAIN); + struct fn_hash *table = (struct fn_hash *)main_table-tb_data; iter-bucket= 0; iter-hash_head = NULL; @@ -949,7 +950,7 @@ static void *fib_seq_start(struct seq_file *seq, loff_t *pos) void *v = NULL; read_lock(fib_hash_lock); - if (ip_fib_main_table) + if (fib_get_table(RT_TABLE_MAIN)) v = *pos ? fib_get_idx(seq, *pos - 1) : SEQ_START_TOKEN; return v; } -- 1.5.3.rc5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2.6.25] net: move trie_local and trie_main into the proc iterator
From: Eric W. Biederman [EMAIL PROTECTED] We only use these variables when displaying the trie in proc so place them into the iterator to make this explicit. We should probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES case but at least this makes it clear that the silliness is limited to the display in /proc. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- net/ipv4/fib_trie.c | 47 ++- 1 files changed, 34 insertions(+), 13 deletions(-) diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 8d8c291..6385cca 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -164,7 +164,6 @@ static struct tnode *halve(struct trie *t, struct tnode *tn); static void tnode_free(struct tnode *tn); static struct kmem_cache *fn_alias_kmem __read_mostly; -static struct trie *trie_local = NULL, *trie_main = NULL; static inline struct tnode *node_parent(struct node *node) { @@ -2000,11 +1999,6 @@ struct fib_table * __init fib_hash_init(u32 id) trie_init(t); if (id == RT_TABLE_LOCAL) - trie_local = t; - else if (id == RT_TABLE_MAIN) - trie_main = t; - - if (id == RT_TABLE_LOCAL) printk(KERN_INFO IPv4 FIB: Using LC-trie version %s\n, VERSION); return tb; @@ -2013,6 +2007,7 @@ struct fib_table * __init fib_hash_init(u32 id) #ifdef CONFIG_PROC_FS /* Depth first Trie walk iterator */ struct fib_trie_iter { + struct trie *trie_local, *trie_main; struct tnode *tnode; struct trie *trie; unsigned index; @@ -2179,7 +2174,20 @@ static void trie_show_stats(struct seq_file *seq, struct trie_stat *stat) static int fib_triestat_seq_show(struct seq_file *seq, void *v) { + struct trie *trie_local, *trie_main; struct trie_stat *stat; + struct fib_table *tb; + + trie_local = NULL; + tb = fib_get_table(RT_TABLE_LOCAL); + if (tb) + trie_local = (struct trie *) tb-tb_data; + + trie_main = NULL; + tb = fib_get_table(RT_TABLE_MAIN); + if (tb) + trie_main = (struct trie *) tb-tb_data; + stat = kmalloc(sizeof(*stat), 
GFP_KERNEL); if (!stat) @@ -2223,13 +2231,13 @@ static struct node *fib_trie_get_idx(struct fib_trie_iter *iter, loff_t idx = 0; struct node *n; - for (n = fib_trie_get_first(iter, trie_local); + for (n = fib_trie_get_first(iter, iter-trie_local); n; ++idx, n = fib_trie_get_next(iter)) { if (pos == idx) return n; } - for (n = fib_trie_get_first(iter, trie_main); + for (n = fib_trie_get_first(iter, iter-trie_main); n; ++idx, n = fib_trie_get_next(iter)) { if (pos == idx) return n; @@ -2239,10 +2247,23 @@ static struct node *fib_trie_get_idx(struct fib_trie_iter *iter, static void *fib_trie_seq_start(struct seq_file *seq, loff_t *pos) { + struct fib_trie_iter *iter = seq-private; + struct fib_table *tb; + + if (!iter-trie_local) { + tb = fib_get_table(RT_TABLE_LOCAL); + if (tb) + iter-trie_local = (struct trie *) tb-tb_data; + } + if (!iter-trie_main) { + tb = fib_get_table(RT_TABLE_MAIN); + if (tb) + iter-trie_main = (struct trie *) tb-tb_data; + } rcu_read_lock(); if (*pos == 0) return SEQ_START_TOKEN; - return fib_trie_get_idx(seq-private, *pos - 1); + return fib_trie_get_idx(iter, *pos - 1); } static void *fib_trie_seq_next(struct seq_file *seq, void *v, loff_t *pos) @@ -2260,8 +2281,8 @@ static void *fib_trie_seq_next(struct seq_file *seq, void *v, loff_t *pos) return v; /* continue scan in next trie */ - if (iter-trie == trie_local) - return fib_trie_get_first(iter, trie_main); + if (iter-trie == iter-trie_local) + return fib_trie_get_first(iter, iter-trie_main); return NULL; } @@ -2327,7 +2348,7 @@ static int fib_trie_seq_show(struct seq_file *seq, void *v) return 0; if (!node_parent(n)) { - if (iter-trie == trie_local) + if (iter-trie == iter-trie_local) seq_puts(seq, local:\n); else seq_puts(seq, main:\n); @@ -2426,7 +2447,7 @@ static int fib_route_seq_show(struct seq_file *seq, void *v) return 0; } - if (iter-trie == trie_local) + if (iter-trie == iter-trie_local) return 0; if (IS_TNODE(l)) return 0; -- 1.5.3.rc5 -- To unsubscribe from this list: send the 
line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at
[PATCH] virtio_net: Fix stalled inbound traffic on early packets
The current virtio_net driver has a startup race which prevents any incoming traffic:

If try_fill_recv submits buffers to the host system, data might be filled in and an interrupt sent before napi_enable finishes. In that case the interrupt will kick skb_recv_done, which will then call netif_rx_schedule. netif_rx_schedule checks whether NAPI_STATE_SCHED is set, which it is not, because napi_enable has not run yet. No poll routine is scheduled. Furthermore, skb_recv_done returns false, which disables interrupts for this device.

One solution is to enable NAPI before inbound buffers are available.

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]
---
 drivers/net/virtio_net.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: kvm/drivers/net/virtio_net.c
===
--- kvm.orig/drivers/net/virtio_net.c
+++ kvm/drivers/net/virtio_net.c
@@ -285,13 +285,15 @@ static int virtnet_open(struct net_devic
 {
 	struct virtnet_info *vi = netdev_priv(dev);

+	napi_enable(&vi->napi);
 	try_fill_recv(vi);

 	/* If we didn't even get one input buffer, we're useless. */
-	if (vi->num == 0)
+	if (vi->num == 0) {
+		napi_disable(&vi->napi);
 		return -ENOMEM;
+	}

-	napi_enable(&vi->napi);
 	return 0;
 }
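The ordering problem described in this patch can be shown with a toy user-space model. This is a rough sketch only, not the NAPI API: struct fake_napi, fake_schedule() and the two open_*_order() helpers are made-up names, and the "interrupt" is simply a direct function call at the point where the host could deliver one.

```c
#include <stdbool.h>

/* Toy model: scheduling a poll only succeeds once the context is enabled. */
struct fake_napi {
	bool enabled;
	bool scheduled;
};

static bool fake_schedule(struct fake_napi *n)
{
	if (!n->enabled)
		return false;	/* the early interrupt is lost */
	n->scheduled = true;
	return true;
}

/* Old order: buffers are posted (an interrupt may fire), then enable. */
static bool open_old_order(struct fake_napi *n)
{
	bool polled = fake_schedule(n);	/* interrupt arrives too early */

	n->enabled = true;
	return polled;
}

/* New order: enable first, so an early interrupt can schedule the poll. */
static bool open_new_order(struct fake_napi *n)
{
	n->enabled = true;
	return fake_schedule(n);
}
```

In the old order the first interrupt finds the context disabled and nothing is ever polled; in the new order the same interrupt schedules the poll routine.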
[PATCH 13/20] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT
A single list_head variable initialized with LIST_HEAD_INIT can almost always be replaced with a LIST_HEAD declaration; this shrinks the code and looks better.

Signed-off-by: Denis Cheng [EMAIL PROTECTED]
---
 net/core/dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 86d6261..7626db4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3501,7 +3501,7 @@ static int dev_new_index(struct net *net)

 /* Delayed registration/unregisteration */
 static DEFINE_SPINLOCK(net_todo_list_lock);
-static struct list_head net_todo_list = LIST_HEAD_INIT(net_todo_list);
+static LIST_HEAD(net_todo_list);

 static void net_set_todo(struct net_device *dev)
 {
--
1.5.3.4
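To see why the two spellings are equivalent, here is a small user-space sketch that copies the shape of the list macros from include/linux/list.h. The variables todo_a and todo_b and the helper list_is_empty() are illustrative names only.

```c
/* Minimal copies of the kernel list macros, for illustration. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) \
	struct list_head name = LIST_HEAD_INIT(name)

/* Long form: declare the variable, then initialize it by hand. */
static struct list_head todo_a = LIST_HEAD_INIT(todo_a);

/* Short form: LIST_HEAD expands to exactly the declaration above. */
static LIST_HEAD(todo_b);

/* An empty list is one whose head points back at itself. */
static int list_is_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Both heads start out as empty, self-referencing lists; the short form just avoids writing the variable name twice.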
[PATCH 14/20] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT
A single list_head variable initialized with LIST_HEAD_INIT can almost always be replaced with a LIST_HEAD declaration; this shrinks the code and looks better.

Signed-off-by: Denis Cheng [EMAIL PROTECTED]
---
 net/ipv4/cipso_ipv4.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index f18e88b..d4dc4eb 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -63,7 +63,7 @@ struct cipso_v4_domhsh_entry {
  * probably be turned into a hash table or something similar so we
  * can do quick lookups. */
 static DEFINE_SPINLOCK(cipso_v4_doi_list_lock);
-static struct list_head cipso_v4_doi_list = LIST_HEAD_INIT(cipso_v4_doi_list);
+static LIST_HEAD(cipso_v4_doi_list);

 /* Label mapping cache */
 int cipso_v4_cache_enabled = 1;
--
1.5.3.4
Re: [PATCH] sky2: RX lockup fix
> I have ways to generate errors, so I'll check

Thanks Stephen. We didn't spend a lot of time characterizing the issue, but our test setup had two blades, each with an 88E8062. Our test software pumped UDP and TCP traffic of varying packet sizes between the blades in both directions (including jumbo frames - we increased the MTU of the interfaces to 9000). The issue could generally be brought out in about 15 minutes and almost always within an hour.

If you'd like any additional details on the test setup or would like me to try something on my end, let me know.
ucc_geth 10 Mbit/s locks up CPU even though NAPI is enabled
Injecting a 10 Mbit/s stream of 64-byte packets locks up my MPC832x CPU even though I have NAPI enabled. The kernel is 2.6.23.

Any ideas?

Jocke
Re: TCP event tracking via netlink...
On Thu, 06 Dec 2007 02:33:46 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET) On Wed, 5 Dec 2007, David Miller wrote: I assume you're using something like carefully crafted printk's, kprobes, or even ad-hoc statistic counters. That's what I used to do :-) No, that's not at all what I do :-). I usually look time-seq graphs expect for the cases when I just find things out by reading code (or by just thinking of it). Can you briefly detail what graph tools and command lines you are using? The last time I did graphing to analyze things, the tools were hit-or-miss. Much of the info is available in tcpdump already, it's just hard to read without graphing it first because there are some many overlapping things to track in two-dimensional space. ...But yes, I have to admit that couple of problems come to my mind where having some variable from tcp_sock would have made the problem more obvious. The most important are the cwnd and ssthresh, which you could guess using graphs but it is important to know on a packet to packet basis why we might have sent a packet or not because this has rippling effects down the rest of the RTT. Not sure what is the benefit of having distributions with it because those people hardly report problems anyway to here, they're just too happy with TCP performance unless we print something to their logs, which implies that we must setup a *_ON() condition :-(. That may be true, but if we could integrate the information with tcpdumps, we could gather internal state using tools the user already has available. Imagine if tcpdump printed out: 02:26:14.865805 IP $SRC $DEST: . 11226:12686(1460) ack 0 win 108 ss_thresh: 129 cwnd: 133 packets_out: 132 or something like that. Some problems are simply such that things cannot be accurately verified without high processing overhead until it's far too late (eg skb bits vs *_out counters). 
Maybe we should start to build an expensive state validator as well which would automatically check invariants of the write queue and tcp_sock in a straight forward, unoptimized manner? That would definately do a lot of work for us, just ask people to turn it on and it spits out everything that went wrong :-) (unless they really depend on very high-speed things and are therefore unhappy if we scan thousands of packets unnecessarily per ACK :-)). ...Early enough! ...That would work also for distros but there's always human judgement needed to decide whether the bug reporter will be happy when his TCP processing does no longer scale ;-). I think it's useful as a TCP_DEBUG config option or similar, sure. But sometimes the algorithms are working as designed, it's just that they provide poor pipe utilization and CWND analysis embedded inside of a tcpdump would be one way to see that as well as determine the flaw in the algorithm. ...Hopefully you found any of my comments useful. Very much so, thanks. I put together a sample implementation anyways just to show the idea, against net-2.6.25 below. It is untested since I didn't write the userland app yet to see that proper things get logged. Basically you could run a daemon that writes per-connection traces into files based upon the incoming netlink events. Later, using the binary pcap file and these traces, you can piece together traces like the above using the timestamps etc. to match up pcap packets to ones from the TCP logger. The userland tools could do analysis and print pre-cooked state diff logs, like this ACK raised CWND by one or whatever else you wanted to know. It's nice that an expert like you can look at graphs and understand, but we'd like to create more experts and besides reading code one way to become an expert is to be able to extrace live real data from the kernel's working state and try to understand how things got that way. This information is permanently lost currently. 
Tools and scripts for testing that generate graphs are at: git://git.kernel.org/pub/scm/tcptest/tcptest
[SCTP] Bug fixes to the migrate/accept code path.
Hi Dave,

The following two patches fix some bugs in the SCTP accept code path. The first one fixes a slab corruption bug that we found during stress testing. The second one is just a clean-up and the right way to do things.

You can also pull both from:
master.kernel.org:/pub/scm/linux/kernel/git/lksctp-dev.git pending

Thanks
-vlad
[PATCH] SCTP: Fix the bind_addr info during migration.
During accept/migrate the code attempts to copy the addresses from the parent endpoint to the new endpoint. However, if the parent was bound to a wildcard address, then we end up pointlessly copying all of the current addresses on the system. Signed-off-by: Vlad Yasevich [EMAIL PROTECTED] --- include/net/sctp/structs.h |3 +++ net/sctp/bind_addr.c | 26 ++ net/sctp/socket.c | 12 ++-- 3 files changed, 31 insertions(+), 10 deletions(-) diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index eb3113c..002a00a 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1184,6 +1184,9 @@ int sctp_bind_addr_copy(struct sctp_bind_addr *dest, const struct sctp_bind_addr *src, sctp_scope_t scope, gfp_t gfp, int flags); +int sctp_bind_addr_dup(struct sctp_bind_addr *dest, + const struct sctp_bind_addr *src, + gfp_t gfp); int sctp_add_bind_addr(struct sctp_bind_addr *, union sctp_addr *, __u8 use_as_src, gfp_t gfp); int sctp_del_bind_addr(struct sctp_bind_addr *, union sctp_addr *); diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c index cae95af..6a7d010 100644 --- a/net/sctp/bind_addr.c +++ b/net/sctp/bind_addr.c @@ -105,6 +105,32 @@ out: return error; } +/* Exactly duplicate the address lists. This is necessary when doing + * peer-offs and accepts. We don't want to put all the current system + * addresses into the endpoint. That's useless. But we do want duplicat + * the list of bound addresses that the older endpoint used. + */ +int sctp_bind_addr_dup(struct sctp_bind_addr *dest, + const struct sctp_bind_addr *src, + gfp_t gfp) +{ + struct sctp_sockaddr_entry *addr; + struct list_head *pos; + int error = 0; + + /* All addresses share the same port. 
*/ + dest-port = src-port; + + list_for_each(pos, src-address_list) { + addr = list_entry(pos, struct sctp_sockaddr_entry, list); + error = sctp_add_bind_addr(dest, addr-a, 1, gfp); + if (error 0) + break; + } + + return error; +} + /* Initialize the SCTP_bind_addr structure for either an endpoint or * an association. */ diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 9f5d793..ea9649c 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6326,7 +6326,6 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk, struct sk_buff *skb, *tmp; struct sctp_ulpevent *event; struct sctp_bind_hashbucket *head; - int flags = 0; /* Migrate socket buffer sizes and all the socket level options to the * new socket. @@ -6356,15 +6355,8 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk, /* Copy the bind_addr list from the original endpoint to the new * endpoint so that we can handle restarts properly */ - if (PF_INET6 == assoc-base.sk-sk_family) - flags = SCTP_ADDR6_ALLOWED; - if (assoc-peer.ipv4_address) - flags |= SCTP_ADDR4_PEERSUPP; - if (assoc-peer.ipv6_address) - flags |= SCTP_ADDR6_PEERSUPP; - sctp_bind_addr_copy(newsp-ep-base.bind_addr, -oldsp-ep-base.bind_addr, -SCTP_SCOPE_GLOBAL, GFP_KERNEL, flags); + sctp_bind_addr_dup(newsp-ep-base.bind_addr, + oldsp-ep-base.bind_addr, GFP_KERNEL); /* Move any messages in the old socket's receive queue that are for the * peeled off association to the new socket's receive queue. -- 1.5.3.5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] SCTP: Add bind hash locking to the migrate code
The SCTP accept code tries to add a newly created socket to a bind bucket without holding a lock. On a really busy system, that can cause slab corruption. Add a lock around this code.

Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]
---
 net/sctp/socket.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ff8bc95..9f5d793 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6325,6 +6325,7 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
 	struct sctp_endpoint *newep = newsp->ep;
 	struct sk_buff *skb, *tmp;
 	struct sctp_ulpevent *event;
+	struct sctp_bind_hashbucket *head;
 	int flags = 0;

 	/* Migrate socket buffer sizes and all the socket level options to the
@@ -6342,10 +6343,15 @@ static void sctp_sock_migrate(struct sock *oldsk, struct sock *newsk,
 	newsp->hmac = NULL;

 	/* Hook this new socket in to the bind_hash list. */
+	head = &sctp_port_hashtable[sctp_phashfn(inet_sk(oldsk)->num)];
+	sctp_local_bh_disable();
+	sctp_spin_lock(&head->lock);
 	pp = sctp_sk(oldsk)->bind_hash;
 	sk_add_bind_node(newsk, &pp->owner);
 	sctp_sk(newsk)->bind_hash = pp;
 	inet_sk(newsk)->num = inet_sk(oldsk)->num;
+	sctp_spin_unlock(&head->lock);
+	sctp_local_bh_enable();

 	/* Copy the bind_addr list from the original endpoint to the new
 	 * endpoint so that we can handle restarts properly
--
1.5.3.5
Re: Reproducible data corruption with sendfile+vsftp - splice regression?
Holger Hoffstaette [EMAIL PROTECTED] :
[...]
> Maybe turning off sendfile or NAPI just led to random success - so far
> it really looks like tso on the r8169 is the common cause.

TSO on the r8169 is the magic switch but the regression makes imvho more sense from a VM pov:
- the corrupted file has the same size as the expected file
- the corrupted file exhibits holes which come as a multiple of 4096 bytes (8*4k, 2 places, there may be more)
- the r8169 driver does not know what a page is
- the 8169 hardware has a small 8192-byte Tx buffer

It would be nice if someone could do a sendfile + vsftp test with TSO on different hardware.

While I could not reproduce the corruption when simply downloading a file that I had copied to the server with scp, it triggered almost immediately after I copied it locally and tried to download the copy.

--
Ueimor
Re: [Patch] net/xfrm/xfrm_policy.c: Some small improvements
On Thu, Dec 06, 2007 at 03:37:46PM +0100, Richard Knutsson wrote:
> Is it not an improvement to distinct booleans from actual values? Do
> you use integers for ASCII characters too? It can also avoid some
> potential bugs like the 'if (i == TRUE)'... What is wrong with
> 'size_t' (since it is unsigned, compared to (some) 'int')?

I agree with Dave. There are so many useful things that we can do (and need to do) in IPsec that bool/size_t conversions just add churn without adding much value.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH net-2.6.25 10/11][INET] Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
On Thu, Dec 06, 2007 at 03:31:14PM +0300, Pavel Emelyanov wrote:
> BTW, this is not 100% true. Look, in rtm_to_ifaddr() I see the
> following code flow:
>
>	ipv4_devconf_setall(in_dev);
>
>	ifa = inet_alloc_ifa();
>	if (ifa == NULL) {
>		/*
>		 * A potential indev allocation can be left alive, it stays
>		 * assigned to its device and is destroy with it.
>		 */
>		err = -ENOBUFS;
>		goto errout;
>	}
>
> if we fail to allocate the ifa (hard to happen, but), we will make
> this device not to accept the default propagation.

Yes that's unintentional.

> If this is a relevant note, I can prepare the patch.

It certainly seems easy enough to fix by just swapping the order. Please do.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] s2io: fix inconsistent hardware VLAN tagging during driver init
The s2io driver keeps a local variable around (vlan_strip_flag) to keep track of the current state of the hardware and whether or not it will strip VLAN tags on incoming packets. It seems as though the hardware default is to strip them, but that variable is not set correctly during initialization if the default setup is used. This check ensures vlan_strip_flag and the hardware setting are in sync.

These variables were introduced by this patch:

commit 926930b202d56c3dfb6aea0a0c6bfba2b87a8c03
Author: Sivakumar Subramani [EMAIL PROTECTED]
Date: Sat Feb 24 01:59:39 2007 -0500

so this problem hasn't been around forever.

Recent patches from Ramkrishna Vepa [EMAIL PROTECTED] removed this variable and would have worked around the problem, but they were not accepted.

Signed-off-by: Andy Gospodarek [EMAIL PROTECTED]
---
 s2io.c |    5 +++++
 1 files changed, 5 insertions(+)

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 8b9f0ea..08c08de 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -2151,6 +2151,11 @@ static int start_nic(struct s2io_nic *nic)
 		val64 &= ~RX_PA_CFG_STRIP_VLAN_TAG;
 		writeq(val64, &bar0->rx_pa_cfg);
 		vlan_strip_flag = 0;
+	} else {
+		val64 = readq(&bar0->rx_pa_cfg);
+		val64 |= RX_PA_CFG_STRIP_VLAN_TAG;
+		writeq(val64, &bar0->rx_pa_cfg);
+		vlan_strip_flag = 1;
 	}

 	/*
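The idea of updating the register bit and the software flag together, on both branches, can be sketched in user space as follows. Everything here is hypothetical: fake_rx_pa_cfg is an ordinary variable standing in for the device register, STRIP_VLAN_TAG_BIT is an arbitrary bit position rather than the real RX_PA_CFG_STRIP_VLAN_TAG value, and set_vlan_strip() is not a driver function.

```c
#include <stdint.h>

#define STRIP_VLAN_TAG_BIT (1ULL << 15)	/* hypothetical bit position */

static uint64_t fake_rx_pa_cfg;	/* stands in for the hardware register */
static int vlan_strip_flag;	/* software copy of the hardware state */

/* Read-modify-write the register and update the flag in the same place,
 * so the two can never disagree whichever branch is taken. */
static void set_vlan_strip(int enable)
{
	uint64_t val64 = fake_rx_pa_cfg;

	if (enable)
		val64 |= STRIP_VLAN_TAG_BIT;
	else
		val64 &= ~STRIP_VLAN_TAG_BIT;

	fake_rx_pa_cfg = val64;
	vlan_strip_flag = enable ? 1 : 0;
}
```

The other bits of the register are preserved in both directions, and the flag always reflects what was last written.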
Re: [PATCH] iproute2: support dotted-quad netmask notation.
On Tue, 4 Dec 2007 14:58:18 +0100 Andreas Henriksson [EMAIL PROTECTED] wrote:
> Suggested patch for allowing netmask to be specified in dotted quad
> format. See http://bugs.debian.org/357172
>
> (Known problem: this will not prevent some invalid syntaxes,
> ie. 255.0.255.0 will be treated as 255.255.255.0)
>
> Comments? Suggestions? Improvements?

Fix the bug you mentioned?

/* a valid netmask must be 2^n - 1 (n = 1..31) */
static int is_valid_netmask(const inet_prefix *addr)
{
	uint32_t host;

	if (addr->family != AF_INET)
		return 0;

	host = ~ntohl(addr->data[0]);
	if (host == 0 || ~host == 0)
		return 0;

	return (host & (host + 1)) == 0;
}
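A standalone sketch of the same bit trick, for anyone who wants to try it outside iproute2. The helper names netmask_is_contiguous() and netmask_prefix_len() are made up for this example, and the mask is assumed to be already in host byte order, so no ntohl() is needed.

```c
#include <stdint.h>

/* The inverted mask (the host part) must look like 0...01...1, i.e. be
 * of the form 2^n - 1. For such a value, host + 1 is a power of two and
 * shares no bits with host. All-ones and all-zeros masks are rejected,
 * as in the function suggested above. */
static int netmask_is_contiguous(uint32_t mask)
{
	uint32_t host = ~mask;

	if (host == 0 || host == UINT32_MAX)
		return 0;

	return (host & (host + 1)) == 0;
}

/* Count the leading one bits to get the prefix length. */
static int netmask_prefix_len(uint32_t mask)
{
	int len = 0;

	while (mask & 0x80000000u) {
		len++;
		mask <<= 1;
	}
	return len;
}
```

With this check, 255.255.255.0 (0xFFFFFF00) is accepted with a prefix length of 24, while 255.0.255.0 (0xFF00FF00) is rejected instead of being silently treated as a /24.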
[PATCH 2/7] [DCCP]: Introduce generic function to test for `data packets'
From: Gerrit Renker [EMAIL PROTECTED]

as per RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
 net/dccp/dccp.h |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index ee97950..f4a5ea1 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -334,6 +334,7 @@ struct dccp_skb_cb {

 #define DCCP_SKB_CB(__skb) ((struct dccp_skb_cb *)&((__skb)->cb[0]))

+/* RFC 4340, sec. 7.7 */
 static inline int dccp_non_data_packet(const struct sk_buff *skb)
 {
 	const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
@@ -346,6 +347,17 @@ static inline int dccp_non_data_packet(const struct sk_buff *skb)
 	       type == DCCP_PKT_SYNCACK;
 }

+/* RFC 4340, sec. 7.7 */
+static inline int dccp_data_packet(const struct sk_buff *skb)
+{
+	const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
+
+	return type == DCCP_PKT_DATA	 ||
+	       type == DCCP_PKT_DATAACK  ||
+	       type == DCCP_PKT_REQUEST  ||
+	       type == DCCP_PKT_RESPONSE;
+}
+
 static inline int dccp_packet_without_ack(const struct sk_buff *skb)
 {
 	const __u8 type = DCCP_SKB_CB(skb)->dccpd_type;
--
1.5.3.4
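As a quick illustration of the classification in RFC 4340, sec. 7.7, the sketch below splits the ten packet types into data and non-data packets. The enum and function names are invented for the example and are not the kernel's; the numeric values follow the packet type table in RFC 4340, sec. 5.1.

```c
#include <stdbool.h>

/* Illustrative names; numeric values as in RFC 4340, sec. 5.1. */
enum pkt_type {
	PKT_REQUEST  = 0,
	PKT_RESPONSE = 1,
	PKT_DATA     = 2,
	PKT_ACK      = 3,
	PKT_DATAACK  = 4,
	PKT_CLOSEREQ = 5,
	PKT_CLOSE    = 6,
	PKT_RESET    = 7,
	PKT_SYNC     = 8,
	PKT_SYNCACK  = 9,
};

/* Non-data packets: Ack, Close, CloseReq, Reset, Sync, SyncAck. */
static bool is_non_data_packet(enum pkt_type type)
{
	return type == PKT_ACK   || type == PKT_CLOSE ||
	       type == PKT_CLOSEREQ || type == PKT_RESET ||
	       type == PKT_SYNC  || type == PKT_SYNCACK;
}

/* Data packets: the remaining four types, as in the patch. */
static bool is_data_packet(enum pkt_type type)
{
	return type == PKT_DATA    || type == PKT_DATAACK ||
	       type == PKT_REQUEST || type == PKT_RESPONSE;
}
```

Every one of the ten types falls into exactly one of the two groups, which is why a dedicated data-packet test can be used in place of negating the non-data test.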
[PATCH 6/7] [CCID3]: The receiver of a half-connection does not set window counter values
From: Gerrit Renker [EMAIL PROTECTED]

Only the sender sets window counters [RFC 4342, sections 5 and 8.1].

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
 net/dccp/ccids/ccid3.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index c95dca8..5ff5aab 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -733,7 +733,6 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb)
 		return 0;
 
 	hcrx = ccid3_hc_rx_sk(sk);
-	DCCP_SKB_CB(skb)->dccpd_ccval = hcrx->ccid3hcrx_ccval_last_counter;
 
 	if (dccp_packet_without_ack(skb))
 		return 0;
-- 
1.5.3.4
[PATCH 4/7] [TFRC]: Make the rx history slab be global
This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED] --- net/dccp/ccids/ccid3.c | 35 ++--- net/dccp/ccids/lib/packet_history.c | 95 ++- net/dccp/ccids/lib/packet_history.h | 43 ++-- 3 files changed, 60 insertions(+), 113 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5dea690..07920bb 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -49,8 +49,6 @@ static int ccid3_debug; #define ccid3_pr_debug(format, a...) #endif -static struct dccp_rx_hist *ccid3_rx_hist; - /* * Transmitter Half-Connection Routines */ @@ -807,9 +805,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, } detect_out: - dccp_rx_hist_add_packet(ccid3_rx_hist, hcrx-ccid3hcrx_hist, - hcrx-ccid3hcrx_li_hist, packet, - hcrx-ccid3hcrx_seqno_nonloss); + dccp_rx_hist_add_packet(hcrx-ccid3hcrx_hist, + hcrx-ccid3hcrx_li_hist, packet, + hcrx-ccid3hcrx_seqno_nonloss); return loss; } @@ -852,8 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) return; } - packet = dccp_rx_hist_entry_new(ccid3_rx_hist, opt_recv-dccpor_ndp, - skb, GFP_ATOMIC); + packet = dccp_rx_hist_entry_new(opt_recv-dccpor_ndp, skb, GFP_ATOMIC); if (unlikely(packet == NULL)) { DCCP_WARN(%s(%p), Not enough mem to add rx packet to history, consider it lost!\n, dccp_role(sk), sk); @@ -936,7 +933,7 @@ static void ccid3_hc_rx_exit(struct sock *sk) ccid3_hc_rx_set_state(sk, TFRC_RSTATE_TERM); /* Empty packet history */ - dccp_rx_hist_purge(ccid3_rx_hist, hcrx-ccid3hcrx_hist); + dccp_rx_hist_purge(hcrx-ccid3hcrx_hist); /* Empty loss interval history */ dccp_li_hist_purge(hcrx-ccid3hcrx_li_hist); @@ -1013,33 +1010,13 @@ MODULE_PARM_DESC(ccid3_debug, Enable debug messages); static __init int ccid3_module_init(void) { - int rc = -ENOBUFS; - - ccid3_rx_hist = dccp_rx_hist_new(ccid3); - if (ccid3_rx_hist == NULL) - goto out; - - rc = 
ccid_register(ccid3); - if (rc != 0) - goto out_free_rx; -out: - return rc; - -out_free_rx: - dccp_rx_hist_delete(ccid3_rx_hist); - ccid3_rx_hist = NULL; - goto out; + return ccid_register(ccid3); } module_init(ccid3_module_init); static __exit void ccid3_module_exit(void) { ccid_unregister(ccid3); - - if (ccid3_rx_hist != NULL) { - dccp_rx_hist_delete(ccid3_rx_hist); - ccid3_rx_hist = NULL; - } } module_exit(ccid3_module_exit); diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index b628714..e1ab853 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -114,48 +114,33 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); /* * Receiver History Routines */ -struct dccp_rx_hist *dccp_rx_hist_new(const char *name) +static struct kmem_cache *tfrc_rx_hist_slab; + +struct dccp_rx_hist_entry *dccp_rx_hist_entry_new(const u32 ndp, + const struct sk_buff *skb, + const gfp_t prio) { - struct dccp_rx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC); - static const char dccp_rx_hist_mask[] = rx_hist_%s; - char *slab_name; - - if (hist == NULL) - goto out; - - slab_name = kmalloc(strlen(name) + sizeof(dccp_rx_hist_mask) - 1, - GFP_ATOMIC); - if (slab_name == NULL) - goto out_free_hist; - - sprintf(slab_name, dccp_rx_hist_mask, name); - hist-dccprxh_slab = kmem_cache_create(slab_name, -sizeof(struct dccp_rx_hist_entry), -0, SLAB_HWCACHE_ALIGN, NULL); - if (hist-dccprxh_slab == NULL) - goto out_free_slab_name; -out: - return hist; -out_free_slab_name: - kfree(slab_name); -out_free_hist: - kfree(hist); - hist = NULL; - goto out; -} + struct dccp_rx_hist_entry *entry = kmem_cache_alloc(tfrc_rx_hist_slab, + prio); -EXPORT_SYMBOL_GPL(dccp_rx_hist_new); + if (entry != NULL) { + const struct dccp_hdr *dh = dccp_hdr(skb); -void dccp_rx_hist_delete(struct dccp_rx_hist *hist) -{ - const char* name = kmem_cache_name(hist-dccprxh_slab); + entry-dccphrx_seqno = DCCP_SKB_CB(skb)-dccpd_seq; + entry-dccphrx_ccval =
[PATCH 5/7] [TFRC]: Rename dccp_rx_ to tfrc_rx_
This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED] --- net/dccp/ccids/ccid3.c | 32 ++-- net/dccp/ccids/lib/loss_interval.c | 14 +++--- net/dccp/ccids/lib/packet_history.c | 90 +- net/dccp/ccids/lib/packet_history.h | 48 +- 4 files changed, 92 insertions(+), 92 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 07920bb..c95dca8 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -677,7 +677,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct dccp_rx_hist_entry *packet; + struct tfrc_rx_hist_entry *packet; ktime_t now; suseconds_t delta; @@ -701,7 +701,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) return; } - packet = dccp_rx_hist_find_data_packet(hcrx-ccid3hcrx_hist); + packet = tfrc_rx_hist_find_data_packet(hcrx-ccid3hcrx_hist); if (unlikely(packet == NULL)) { DCCP_WARN(%s(%p), no data packet in history!\n, dccp_role(sk), sk); @@ -709,7 +709,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk) } hcrx-ccid3hcrx_tstamp_last_feedback = now; - hcrx-ccid3hcrx_ccval_last_counter = packet-dccphrx_ccval; + hcrx-ccid3hcrx_ccval_last_counter = packet-tfrchrx_ccval; hcrx-ccid3hcrx_bytes_recv = 0; if (hcrx-ccid3hcrx_p == 0) @@ -752,12 +752,12 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) } static int ccid3_hc_rx_detect_loss(struct sock *sk, - struct dccp_rx_hist_entry *packet) + struct tfrc_rx_hist_entry *packet) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); - struct dccp_rx_hist_entry *rx_hist = - dccp_rx_hist_head(hcrx-ccid3hcrx_hist); - u64 seqno = packet-dccphrx_seqno; + struct tfrc_rx_hist_entry *rx_hist = + tfrc_rx_hist_head(hcrx-ccid3hcrx_hist); + u64 seqno = packet-tfrchrx_seqno; u64 tmp_seqno; int loss = 0; u8 ccval; @@ 
-766,9 +766,9 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, tmp_seqno = hcrx-ccid3hcrx_seqno_nonloss; if (!rx_hist || - follows48(packet-dccphrx_seqno, hcrx-ccid3hcrx_seqno_nonloss)) { + follows48(packet-tfrchrx_seqno, hcrx-ccid3hcrx_seqno_nonloss)) { hcrx-ccid3hcrx_seqno_nonloss = seqno; - hcrx-ccid3hcrx_ccval_nonloss = packet-dccphrx_ccval; + hcrx-ccid3hcrx_ccval_nonloss = packet-tfrchrx_ccval; goto detect_out; } @@ -789,7 +789,7 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, dccp_inc_seqno(tmp_seqno); hcrx-ccid3hcrx_seqno_nonloss = tmp_seqno; dccp_inc_seqno(tmp_seqno); - while (dccp_rx_hist_find_entry(hcrx-ccid3hcrx_hist, + while (tfrc_rx_hist_find_entry(hcrx-ccid3hcrx_hist, tmp_seqno, ccval)) { hcrx-ccid3hcrx_seqno_nonloss = tmp_seqno; hcrx-ccid3hcrx_ccval_nonloss = ccval; @@ -799,13 +799,13 @@ static int ccid3_hc_rx_detect_loss(struct sock *sk, /* FIXME - this code could be simplified with above while */ /* but works at moment */ - if (follows48(packet-dccphrx_seqno, hcrx-ccid3hcrx_seqno_nonloss)) { + if (follows48(packet-tfrchrx_seqno, hcrx-ccid3hcrx_seqno_nonloss)) { hcrx-ccid3hcrx_seqno_nonloss = seqno; - hcrx-ccid3hcrx_ccval_nonloss = packet-dccphrx_ccval; + hcrx-ccid3hcrx_ccval_nonloss = packet-tfrchrx_ccval; } detect_out: - dccp_rx_hist_add_packet(hcrx-ccid3hcrx_hist, + tfrc_rx_hist_add_packet(hcrx-ccid3hcrx_hist, hcrx-ccid3hcrx_li_hist, packet, hcrx-ccid3hcrx_seqno_nonloss); return loss; @@ -815,7 +815,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); const struct dccp_options_received *opt_recv; - struct dccp_rx_hist_entry *packet; + struct tfrc_rx_hist_entry *packet; u32 p_prev, r_sample, rtt_prev; int loss, payload_size; ktime_t now; @@ -850,7 +850,7 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) return; } - packet = dccp_rx_hist_entry_new(opt_recv-dccpor_ndp, skb, GFP_ATOMIC); + packet = 
tfrc_rx_hist_entry_new(opt_recv-dccpor_ndp, skb,
[PATCH 7/7] [TFRC]: New rx history code
Credit here goes to Gerrit Renker, that provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, renaming some functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not freeing the allocated ring entries on the error path. Original changeset comment from Gerrit: --- This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED] --- net/dccp/ccids/ccid3.c | 288 -- net/dccp/ccids/ccid3.h | 14 +- net/dccp/ccids/lib/loss_interval.c | 13 ++- net/dccp/ccids/lib/packet_history.c | 290 +-- net/dccp/ccids/lib/packet_history.h | 83 +-- 5 files changed, 330 insertions(+), 358 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5ff5aab..faacffa 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -641,6 +641,15 @@ static int ccid3_hc_tx_getsockopt(struct sock *sk, const int optname, int len, /* * Receiver Half-Connection Routines */ + +/* CCID3 feedback types */ +enum ccid3_fback_type { + CCID3_FBACK_NONE = 0, + CCID3_FBACK_INITIAL, + CCID3_FBACK_PERIODIC, + CCID3_FBACK_PARAM_CHANGE +}; + #ifdef CONFIG_IP_DCCP_CCID3_DEBUG static const char *ccid3_rx_state_name(enum ccid3_hc_rx_states state) { @@ -667,59 +676,60 @@ static void ccid3_hc_rx_set_state(struct sock *sk, hcrx-ccid3hcrx_state = state; } -static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) -{ - if (likely(len 0))/* don't update on empty packets 
(e.g. ACKs) */ - hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); -} - -static void ccid3_hc_rx_send_feedback(struct sock *sk) +static void ccid3_hc_rx_send_feedback(struct sock *sk, + const struct sk_buff *skb, + enum ccid3_fback_type fbtype) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct tfrc_rx_hist_entry *packet; ktime_t now; - suseconds_t delta; + s64 delta = 0; ccid3_pr_debug(%s(%p) - entry \n, dccp_role(sk), sk); + if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_TERM)) + return; + now = ktime_get_real(); - switch (hcrx-ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: + switch (fbtype) { + case CCID3_FBACK_INITIAL: hcrx-ccid3hcrx_x_recv = 0; + hcrx-ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ break; - case TFRC_RSTATE_DATA: - delta = ktime_us_delta(now, - hcrx-ccid3hcrx_tstamp_last_feedback); - DCCP_BUG_ON(delta 0); - hcrx-ccid3hcrx_x_recv = - scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); + case CCID3_FBACK_PARAM_CHANGE: + /* +* When parameters change (new loss or p p_prev), we do not +* have a reliable estimate for R_m of [RFC 3448, 6.2] and so +* need to reuse the previous value of X_recv. However, when +* X_recv was 0 (due to early loss), this would kill X down to +* s/t_mbi (i.e. one packet in 64 seconds). +* To avoid such drastic reduction, we approximate X_recv as +* the number of bytes since last feedback. +* This is a safe fallback, since X is bounded above by X_calc. 
+*/ + if (hcrx-ccid3hcrx_x_recv 0) + break; + /* fall through */ + case CCID3_FBACK_PERIODIC: + delta = ktime_us_delta(now, hcrx-ccid3hcrx_tstamp_last_feedback); + if (delta = 0) + DCCP_BUG(delta (%ld) = 0, (long)delta); + else + hcrx-ccid3hcrx_x_recv = + scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); break; - case TFRC_RSTATE_TERM: - DCCP_BUG(%s(%p) - Illegal state TERM, dccp_role(sk), sk); + default: return; } - packet = tfrc_rx_hist_find_data_packet(hcrx-ccid3hcrx_hist); - if (unlikely(packet == NULL)) { - DCCP_WARN(%s(%p), no data
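The periodic-feedback branch above derives X_recv from the bytes received since the last feedback and the elapsed time in microseconds. Below is a rough user-space sketch of that arithmetic. It assumes that scaled_div32() scales by 10^6 (so the result is in bytes per second) and saturates at the 32-bit maximum; x_recv_estimate() and its parameters are hypothetical names, not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: receive rate in bytes per second from a byte
 * count and a time delta in microseconds.  When the delta is not
 * positive the previous estimate is kept, as the patch leaves x_recv
 * untouched in that case.
 */
static uint32_t x_recv_estimate(uint32_t bytes_recv, int64_t delta_us,
				uint32_t prev)
{
	uint64_t rate;

	if (delta_us <= 0)
		return prev;

	/* assumed scaling: bytes * 10^6 / microseconds = bytes per second */
	rate = (uint64_t)bytes_recv * 1000000u / (uint64_t)delta_us;

	/* saturate instead of wrapping around */
	return rate > UINT32_MAX ? UINT32_MAX : (uint32_t)rate;
}
```

For example, 1500 bytes received over 1000 microseconds gives 1500000 bytes per second.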
[PATCHES 0/7]: DCCP patches for 2.6.25
Hi David,

	Please consider pulling from:

master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25

Best Regards,

- Arnaldo

 b/net/dccp/ccids/Kconfig              |   13
 b/net/dccp/ccids/ccid3.c              |   35 --
 b/net/dccp/ccids/ccid3.h              |   14
 b/net/dccp/ccids/lib/Makefile         |    2
 b/net/dccp/ccids/lib/loss_interval.c  |   14
 b/net/dccp/ccids/lib/packet_history.c |   27 -
 b/net/dccp/ccids/lib/packet_history.h |    3
 b/net/dccp/ccids/lib/tfrc.c           |   48 +++
 b/net/dccp/ccids/lib/tfrc.h           |   18 -
 b/net/dccp/dccp.h                     |   13
 net/dccp/ccids/ccid3.c                |  322 --
 net/dccp/ccids/lib/loss_interval.c    |   13
 net/dccp/ccids/lib/packet_history.c   |  496 +++---
 net/dccp/ccids/lib/packet_history.h   |  177
 14 files changed, 579 insertions(+), 616 deletions(-)
[PATCH 3/7] [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency
Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
---
 net/dccp/ccids/lib/packet_history.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 1d4d6ee..b628714 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -53,7 +53,7 @@ struct tfrc_tx_hist_entry {
 /*
  * Transmitter History Routines
  */
-static struct kmem_cache *tfrc_tx_hist;
+static struct kmem_cache *tfrc_tx_hist_slab;
 
 static struct tfrc_tx_hist_entry *
 	tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
@@ -66,7 +66,7 @@ static struct tfrc_tx_hist_entry *
 
 int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno)
 {
-	struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist, gfp_any());
+	struct tfrc_tx_hist_entry *entry = kmem_cache_alloc(tfrc_tx_hist_slab, gfp_any());
 
 	if (entry == NULL)
 		return -ENOBUFS;
@@ -85,7 +85,7 @@ void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp)
 	while (head != NULL) {
 		struct tfrc_tx_hist_entry *next = head->next;
 
-		kmem_cache_free(tfrc_tx_hist, head);
+		kmem_cache_free(tfrc_tx_hist_slab, head);
 		head = next;
 	}
 
@@ -278,17 +278,17 @@ EXPORT_SYMBOL_GPL(dccp_rx_hist_purge);
 
 __init int packet_history_init(void)
 {
-	tfrc_tx_hist = kmem_cache_create("tfrc_tx_hist",
-					 sizeof(struct tfrc_tx_hist_entry), 0,
-					 SLAB_HWCACHE_ALIGN, NULL);
+	tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
+					      sizeof(struct tfrc_tx_hist_entry), 0,
+					      SLAB_HWCACHE_ALIGN, NULL);
 
-	return tfrc_tx_hist == NULL ? -ENOBUFS : 0;
+	return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0;
 }
 
 void packet_history_exit(void)
 {
-	if (tfrc_tx_hist != NULL) {
-		kmem_cache_destroy(tfrc_tx_hist);
-		tfrc_tx_hist = NULL;
+	if (tfrc_tx_hist_slab != NULL) {
+		kmem_cache_destroy(tfrc_tx_hist_slab);
+		tfrc_tx_hist_slab = NULL;
 	}
 }
-- 
1.5.3.4
[PATCH 1/7] [TFRC]: Provide central source file and debug facility
From: Gerrit Renker [EMAIL PROTECTED] This patch changes the tfrc_lib module in the following manner: (1) a dedicated tfrc source file to call the packet history loss interval init/exit functions. (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'. Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3 select IP_DCCP_TFRC_LIB. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED] --- net/dccp/ccids/Kconfig | 13 ++--- net/dccp/ccids/lib/Makefile |2 +- net/dccp/ccids/lib/packet_history.c | 27 ++- net/dccp/ccids/lib/packet_history.h |3 +- net/dccp/ccids/lib/tfrc.c | 48 +++ net/dccp/ccids/lib/tfrc.h | 17 +--- 6 files changed, 75 insertions(+), 35 deletions(-) create mode 100644 net/dccp/ccids/lib/tfrc.c diff --git a/net/dccp/ccids/Kconfig b/net/dccp/ccids/Kconfig index 3d7d867..1227594 100644 --- a/net/dccp/ccids/Kconfig +++ b/net/dccp/ccids/Kconfig @@ -38,6 +38,7 @@ config IP_DCCP_CCID2_DEBUG config IP_DCCP_CCID3 tristate CCID3 (TCP-Friendly) (EXPERIMENTAL) def_tristate IP_DCCP + select IP_DCCP_TFRC_LIB ---help--- CCID 3 denotes TCP-Friendly Rate Control (TFRC), an equation-based rate-controlled congestion control mechanism. TFRC is designed to @@ -63,10 +64,6 @@ config IP_DCCP_CCID3 If in doubt, say M. -config IP_DCCP_TFRC_LIB - depends on IP_DCCP_CCID3 - def_tristate IP_DCCP_CCID3 - config IP_DCCP_CCID3_DEBUG bool CCID3 debugging messages depends on IP_DCCP_CCID3 @@ -110,5 +107,13 @@ config IP_DCCP_CCID3_RTO is serious network congestion: experimenting with larger values should therefore not be performed on WANs. 
+config IP_DCCP_TFRC_LIB + tristate + default n + +config IP_DCCP_TFRC_DEBUG + bool + depends on IP_DCCP_TFRC_LIB + default y if IP_DCCP_CCID3_DEBUG endmenu diff --git a/net/dccp/ccids/lib/Makefile b/net/dccp/ccids/lib/Makefile index 5f940a6..68c93e3 100644 --- a/net/dccp/ccids/lib/Makefile +++ b/net/dccp/ccids/lib/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_IP_DCCP_TFRC_LIB) += dccp_tfrc_lib.o -dccp_tfrc_lib-y := loss_interval.o packet_history.o tfrc_equation.o +dccp_tfrc_lib-y := tfrc.o tfrc_equation.o packet_history.o loss_interval.o diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 4805de9..1d4d6ee 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -35,7 +35,6 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#include linux/module.h #include linux/string.h #include packet_history.h @@ -277,39 +276,19 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); -extern int __init dccp_li_init(void); -extern void dccp_li_exit(void); - -static __init int packet_history_init(void) +__init int packet_history_init(void) { - if (dccp_li_init() != 0) - goto out; - tfrc_tx_hist = kmem_cache_create(tfrc_tx_hist, sizeof(struct tfrc_tx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); - if (tfrc_tx_hist == NULL) - goto out_li_exit; - return 0; -out_li_exit: - dccp_li_exit(); -out: - return -ENOBUFS; + return tfrc_tx_hist == NULL ? 
-ENOBUFS : 0; } -module_init(packet_history_init); -static __exit void packet_history_exit(void) +void packet_history_exit(void) { if (tfrc_tx_hist != NULL) { kmem_cache_destroy(tfrc_tx_hist); tfrc_tx_hist = NULL; } - dccp_li_exit(); } -module_exit(packet_history_exit); - -MODULE_AUTHOR(Ian McDonald [EMAIL PROTECTED], - Arnaldo Carvalho de Melo [EMAIL PROTECTED]); -MODULE_DESCRIPTION(DCCP TFRC library); -MODULE_LICENSE(GPL); diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 0670f46..9a2642e 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -39,8 +39,7 @@ #include linux/ktime.h #include linux/list.h #include linux/slab.h - -#include ../../dccp.h +#include tfrc.h /* Number of later packets received before one is considered lost */ #define TFRC_RECV_NUM_LATE_LOSS 3 diff --git a/net/dccp/ccids/lib/tfrc.c b/net/dccp/ccids/lib/tfrc.c new file mode 100644 index 000..3a7a183 --- /dev/null +++ b/net/dccp/ccids/lib/tfrc.c @@ -0,0 +1,48 @@ +/* + * TFRC: main module holding the pieces of the TFRC library together + * + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK + * Copyright (c) 2007 Arnaldo Carvalho de Melo [EMAIL PROTECTED] + */ +#include
Re: [PATCH] Reduce stack used by lib/hexdump.c
On Wed, 2007-12-05 at 16:01 -0800, Andrew Morton wrote:
> No, I think print_hex_dump() is too low-level to be doing allocations.
> For example, one could easily choose to call print_hex_dump() at oops
> time, and then what happens if we oops in kmalloc() (as we often do...)?
>
> You could trim linebuf[] to 80 chars or so. Extra points for making
> it very clear when someone tries to exceed that -
> strcpy(linebuf, "stop being stupid").

No extra points, but here's a revised patch to hexdump against
Linus' current:

hex_dump_to_buffer:
  Removes casts to type for non-1 group sizes
    Used by: fs/ext(3|4)/super.c, fs/jfs
    If someone really dislikes this change, please say so.
    I think casting to type in a hex dump odd, especially for
    mixed type structures. If you want an array of type dumper,
    it probably shouldn't be called hex_dump_to_buffer.
  Groups by arbitrary size

print_hex_dump:
  Removes rowsize argument
  Reduces linebuf stack use to ~120 bytes
    (prefix:25 + address:20 + data:48 + ascii:20)
  Aligns multiline ascii output
  Changes return to size_t, number of bytes actually output

include/linux/kernel.h
  Removes hex_asc define
  Updates hex_dump prototypes

The rest are trivial conversions to new argument list.

size before: textdata bss dec hex filename 1142 0 01142 476 lib/hexdump.o size after: textdata bss dec hex filename 823 0 0 823 337 lib/hexdump.o Signed-off-by: Joe Perches [EMAIL PROTECTED] --- include/linux/kernel.h | 13 +- lib/hexdump.c | 164 --- drivers/mtd/ubi/debug.c |2 +- drivers/mtd/ubi/io.c|2 +- drivers/net/wireless/iwlwifi/iwl3945-base.c |4 +- drivers/net/wireless/iwlwifi/iwl4965-base.c |4 +- drivers/scsi/ide-scsi.c |8 +- drivers/usb/gadget/file_storage.c |4 +- fs/ext3/super.c |6 +- fs/ext4/super.c |6 +- fs/jffs2/wbuf.c |4 +- fs/jfs/jfs_imap.c |2 +- fs/jfs/jfs_logmgr.c |6 +- fs/jfs/jfs_metapage.c |2 +- fs/jfs/jfs_txnmgr.c |8 +- fs/jfs/xattr.c |4 +- 16 files changed, 110 insertions(+), 129 deletions(-) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 94bc996..ab45524 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -248,15 +248,14 @@ enum { DUMP_PREFIX_ADDRESS, DUMP_PREFIX_OFFSET }; -extern void hex_dump_to_buffer(const void *buf, size_t len, - int rowsize, int groupsize, - char *linebuf, size_t linebuflen, bool ascii); +extern size_t hex_dump_to_buffer(const void *buf, size_t len, +size_t rowsize, size_t groupsize, +char *linebuf, size_t linebuflen, bool ascii); extern void print_hex_dump(const char *level, const char *prefix_str, - int prefix_type, int rowsize, int groupsize, - const void *buf, size_t len, bool ascii); + int prefix_type, size_t groupsize, + const void *buf, size_t len, bool ascii); extern void print_hex_dump_bytes(const char *prefix_str, int prefix_type, - const void *buf, size_t len); -#define hex_asc(x) 0123456789abcdef[x] +const void *buf, size_t len); #define pr_emerg(fmt, arg...) 
\ printk(KERN_EMERG fmt, ##arg) diff --git a/lib/hexdump.c b/lib/hexdump.c index 3435465..df82012 100644 --- a/lib/hexdump.c +++ b/lib/hexdump.c @@ -12,18 +12,21 @@ #include linux/kernel.h #include linux/module.h +#define ROWSIZE ((size_t)16) +#define MAX_PREFIX_LEN ((size_t)20) + /** * hex_dump_to_buffer - convert a blob of data to hex ASCII in memory * @buf: data blob to dump * @len: number of bytes in the @buf - * @rowsize: number of bytes to print per line; must be 16 or 32 + * @rowsize: maximum number of bytes to output (aligns ascii) * @groupsize: number of bytes to print at a time (1, 2, 4, 8; default = 1) * @linebuf: where to put the converted data * @linebuflen: total size of @linebuf, including space for terminating NUL * @ascii: include ASCII after the hex output * * hex_dump_to_buffer() works on one line of output at a time, i.e., - * 16 or 32 bytes of input data converted to hex + ASCII output. + * input data converted to hex + ASCII output. * * Given a buffer of u8 data, hex_dump_to_buffer() converts the input data * to a
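The change summary says groups are no longer cast to a wider type, so bytes come out in memory order whatever the group size. A rough user-space sketch of that grouping rule follows. hex_row() and its parameters are hypothetical; this is not the kernel's hex_dump_to_buffer(), and it leaves out the ASCII column and the prefix handling.

```c
#include <stddef.h>

/*
 * Hypothetical helper: write 'len' bytes of 'buf' as lowercase hex into
 * 'out', inserting one space between groups of 'groupsize' bytes.
 * Output is always NUL terminated and never exceeds 'outlen' bytes;
 * the return value is the number of characters written, not counting
 * the terminating NUL.
 */
static size_t hex_row(const unsigned char *buf, size_t len, size_t groupsize,
		      char *out, size_t outlen)
{
	static const char digits[] = "0123456789abcdef";
	size_t i, pos = 0;

	if (outlen == 0)
		return 0;
	if (groupsize == 0)
		groupsize = 1;

	for (i = 0; i < len; i++) {
		/* a separator is needed at every group boundary but the first */
		size_t need = (i != 0 && i % groupsize == 0) ? 3 : 2;

		if (pos + need >= outlen)
			break;		/* keep room for the NUL */
		if (need == 3)
			out[pos++] = ' ';
		out[pos++] = digits[buf[i] >> 4];
		out[pos++] = digits[buf[i] & 0x0f];
	}
	out[pos] = '\0';
	return pos;
}
```

Because no cast to a wider type is involved, the bytes de ad be ef with a group size of 2 are printed in memory order on any machine. Truncation happens on a whole-byte boundary when the output buffer is too small.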
[PATCH 2/3] [POWERPC] fsl_soc: add support for gianfar for fixed-link property
fixed-link says: register new Fixed/emulated PHY, i.e. PHY that not connected to the real MDIO bus. Signed-off-by: Vitaly Bordug [EMAIL PROTECTED] Signed-off-by: Anton Vorontsov [EMAIL PROTECTED] --- Documentation/powerpc/booting-without-of.txt |4 + arch/powerpc/sysdev/fsl_soc.c| 79 -- 2 files changed, 66 insertions(+), 17 deletions(-) diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index e9a3cb1..9dfd308 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -1254,6 +1254,10 @@ platforms are moved over to use the flattened-device-tree model. services interrupts for this device. - phy-handle : The phandle for the PHY connected to this ethernet controller. +- fixed-link : a b c d e where a is emulated phy id - choose any, + but unique to the all specified fixed-links, b is duplex - 0 half, + 1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no + pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause. 
Recommended properties: diff --git a/arch/powerpc/sysdev/fsl_soc.c b/arch/powerpc/sysdev/fsl_soc.c index 3ace747..a008e32 100644 --- a/arch/powerpc/sysdev/fsl_soc.c +++ b/arch/powerpc/sysdev/fsl_soc.c @@ -24,6 +24,7 @@ #include linux/platform_device.h #include linux/of_platform.h #include linux/phy.h +#include linux/phy_fixed.h #include linux/spi/spi.h #include linux/fsl_devices.h #include linux/fs_enet_pd.h @@ -130,6 +131,37 @@ u32 get_baudrate(void) EXPORT_SYMBOL(get_baudrate); #endif /* CONFIG_CPM2 */ +#ifdef CONFIG_FIXED_PHY +static int __init of_add_fixed_phys(void) +{ + int ret; + struct device_node *np; + u32 *fixed_link; + struct fixed_phy_status status = {}; + + for_each_node_by_name(np, ethernet) { + fixed_link = (u32 *)of_get_property(np, fixed-link, NULL); + if (!fixed_link) + continue; + + status.link = 1; + status.duplex = fixed_link[1]; + status.speed = fixed_link[2]; + status.pause = fixed_link[3]; + status.asym_pause = fixed_link[4]; + + ret = fixed_phy_add(PHY_POLL, fixed_link[0], status); + if (ret) { + of_node_put(np); + return ret; + } + } + + return 0; +} +arch_initcall(of_add_fixed_phys); +#endif /* CONFIG_FIXED_PHY */ + static int __init gfar_mdio_of_init(void) { struct device_node *np; @@ -193,7 +225,6 @@ static const char *gfar_tx_intr = tx; static const char *gfar_rx_intr = rx; static const char *gfar_err_intr = error; - static int __init gfar_of_init(void) { struct device_node *np; @@ -277,29 +308,43 @@ static int __init gfar_of_init(void) gfar_data.interface = PHY_INTERFACE_MODE_MII; ph = of_get_property(np, phy-handle, NULL); - phy = of_find_node_by_phandle(*ph); + if (ph == NULL) { + u32 *fixed_link; - if (phy == NULL) { - ret = -ENODEV; - goto unreg; - } + fixed_link = (u32 *)of_get_property(np, fixed-link, + NULL); + if (!fixed_link) { + ret = -ENODEV; + goto unreg; + } - mdio = of_get_parent(phy); + gfar_data.bus_id = 0; + gfar_data.phy_id = fixed_link[0]; + } else { + phy = of_find_node_by_phandle(*ph); + + if (phy == NULL) { + 
ret = -ENODEV; + goto unreg; + } + + mdio = of_get_parent(phy); + + id = of_get_property(phy, reg, NULL); + ret = of_address_to_resource(mdio, 0, res); + if (ret) { + of_node_put(phy); + of_node_put(mdio); + goto unreg; + } + + gfar_data.phy_id = *id; + gfar_data.bus_id = res.start; - id = of_get_property(phy, reg, NULL); - ret = of_address_to_resource(mdio, 0, res); - if (ret) { of_node_put(phy); of_node_put(mdio); - goto unreg; } - gfar_data.phy_id = *id; - gfar_data.bus_id = res.start; - - of_node_put(phy); - of_node_put(mdio); - ret =
[PATCH 3/3] [POWERPC] MPC8349E-mITX: Vitesse 7385 PHY is not connected to the MDIO bus
...thus use fixed-link to register proper Fixed PHY

Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
Signed-off-by: Vitaly Bordug [EMAIL PROTECTED]
---
 arch/powerpc/boot/dts/mpc8349emitx.dts |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc8349emitx.dts b/arch/powerpc/boot/dts/mpc8349emitx.dts
index 5072f6d..877ee6d 100644
--- a/arch/powerpc/boot/dts/mpc8349emitx.dts
+++ b/arch/powerpc/boot/dts/mpc8349emitx.dts
@@ -115,14 +115,6 @@
 				reg = <1c>;
 				device_type = "ethernet-phy";
 			};
-
-			/* Vitesse 7385 */
-			phy1f: [EMAIL PROTECTED] {
-				interrupt-parent = < &ipic >;
-				interrupts = <12 8>;
-				reg = <1f>;
-				device_type = "ethernet-phy";
-			};
 		};
 
 		[EMAIL PROTECTED] {
@@ -159,7 +151,8 @@
 			local-mac-address = [ 00 00 00 00 00 00 ];
 			interrupts = <23 8 24 8 25 8>;
 			interrupt-parent = < &ipic >;
-			phy-handle = < &phy1f >;
+			/* Vitesse 7385 isn't on the MDIO bus */
+			fixed-link = <1 1 d#1000 0 0>;
 			linux,network-index = <1>;
 		};
 
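The five fixed-link cells used in this dts change follow the layout documented in patch 2/3: emulated phy id, duplex, speed, pause, asym_pause. A small sketch of decoding them is given below. struct link_status and decode_fixed_link() are hypothetical names for illustration, the cells are assumed to be already in host byte order, and the range checks are an addition here (the patch copies the cells without validating them).

```c
#include <stdint.h>

/* Hypothetical mirror of the fields the patch fills in; not the kernel struct. */
struct link_status {
	int link;
	int duplex;
	int speed;
	int pause;
	int asym_pause;
};

/*
 * Decode the cells <id duplex speed pause asym_pause>.
 * Returns 0 on success, -1 if a cell is outside the documented values.
 */
static int decode_fixed_link(const uint32_t cells[5], uint32_t *phy_id,
			     struct link_status *st)
{
	/* duplex, pause and asym_pause are documented as 0 or 1 */
	if (cells[1] > 1 || cells[3] > 1 || cells[4] > 1)
		return -1;
	/* speed is documented as 10, 100 or 1000 */
	if (cells[2] != 10 && cells[2] != 100 && cells[2] != 1000)
		return -1;

	*phy_id = cells[0];
	st->link = 1;		/* the patch always reports the link as up */
	st->duplex = (int)cells[1];
	st->speed = (int)cells[2];
	st->pause = (int)cells[3];
	st->asym_pause = (int)cells[4];
	return 0;
}
```

For the property above, <1 1 d#1000 0 0>, this yields phy id 1, full duplex, 1000 Mbit/s, no pause and no asym_pause.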
[PATCH 1/3] [NET] phy/fixed.c: rework to not duplicate PHY layer functionality
With this patch, fixed.c fully emulates an MDIO bus, so there is no need
to duplicate PHY layer functionality. That, in turn, drastically
simplifies the code and reduces the line count.

As an additional bonus, there is no longer any need to register an MDIO
bus for each PHY: all emulated PHYs are placed on the platform fixed MDIO
bus. There is also no more need to pre-allocate PHYs via a .config
option; this is now handled dynamically.

Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
Signed-off-by: Vitaly Bordug [EMAIL PROTECTED]
Acked-by: Jeff Garzik [EMAIL PROTECTED]
---
 drivers/net/phy/Kconfig   |   32 +--
 drivers/net/phy/fixed.c   |  445 +
 include/linux/phy_fixed.h |   51 ++---
 3 files changed, 195 insertions(+), 333 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 54b2ba9..7fe03ce 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -61,34 +61,12 @@ config ICPLUS_PHY
 	  Currently supports the IP175C PHY.
 
 config FIXED_PHY
-	tristate "Drivers for PHY emulation on fixed speed/link"
+	bool "Driver for MDIO Bus/PHY emulation with fixed speed/link PHYs"
 	---help---
-	  Adds the driver to PHY layer to cover the boards that do not have any PHY bound,
-	  but with the ability to manipulate the speed/link in software. The relevant MII
-	  speed/duplex parameters could be effectively handled in a user-specified function.
-	  Currently tested with mpc866ads.
-
-config FIXED_MII_10_FDX
-	bool "Emulation for 10M Fdx fixed PHY behavior"
-	depends on FIXED_PHY
-
-config FIXED_MII_100_FDX
-	bool "Emulation for 100M Fdx fixed PHY behavior"
-	depends on FIXED_PHY
-
-config FIXED_MII_1000_FDX
-	bool "Emulation for 1000M Fdx fixed PHY behavior"
-	depends on FIXED_PHY
-
-config FIXED_MII_AMNT
-	int "Number of emulated PHYs to allocate"
-	depends on FIXED_PHY
-	default 1
-	---help---
-	Sometimes it is required to have several independent emulated
-	PHYs on the bus (in case of multi-eth but phy-less HW for instance).
-	This control will have specified number allocated for each fixed
-	PHY type enabled.
+	  Adds the platform fixed MDIO Bus to cover the boards that use
+	  PHYs that are not connected to the real MDIO bus.
+
+	  Currently tested with mpc866ads and mpc8349e-mitx.
 
 config MDIO_BITBANG
 	tristate "Support for bitbanged MDIO buses"
diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index 5619182..73b6d39 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c
@@ -1,362 +1,253 @@
 /*
- * drivers/net/phy/fixed.c
+ * Fixed MDIO bus (MDIO bus emulation with fixed PHYs)
  *
- * Driver for fixed PHYs, when transceiver is able to operate in one fixed mode.
+ * Author: Vitaly Bordug [EMAIL PROTECTED]
+ *         Anton Vorontsov [EMAIL PROTECTED]
  *
- * Author: Vitaly Bordug
- *
- * Copyright (c) 2006 MontaVista Software, Inc.
+ * Copyright (c) 2006-2007 MontaVista Software, Inc.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
  * Free Software Foundation; either version 2 of the License, or (at your
  * option) any later version.
- *
  */
+
 #include <linux/kernel.h>
-#include <linux/string.h>
-#include <linux/errno.h>
-#include <linux/unistd.h>
-#include <linux/slab.h>
-#include <linux/interrupt.h>
-#include <linux/init.h>
-#include <linux/delay.h>
-#include <linux/netdevice.h>
-#include <linux/etherdevice.h>
-#include <linux/skbuff.h>
-#include <linux/spinlock.h>
-#include <linux/mm.h>
 #include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
 #include <linux/mii.h>
-#include <linux/ethtool.h>
 #include <linux/phy.h>
 #include <linux/phy_fixed.h>
 
-#include <asm/io.h>
-#include <asm/irq.h>
-#include <asm/uaccess.h>
+#define MII_REGS_NUM 29
 
-/* we need to track the allocated pointers in order to free them on exit */
-static struct fixed_info *fixed_phy_ptrs[CONFIG_FIXED_MII_AMNT*MAX_PHY_AMNT];
-
-/*-----------------------------------------------------------------------------
- *  If something weird is required to be done with link/speed,
- * network driver is able to assign a function to implement this.
- * May be useful for PHY's that need to be software-driven.
- *-----------------------------------------------------------------------------*/
-int fixed_mdio_set_link_update(struct phy_device *phydev,
-       int (*link_update) (struct net_device *,
-			   struct fixed_phy_status *))
-{
-	struct fixed_info *fixed;
-
-	if (link_update == NULL)
-		return -EINVAL;
-
-	if (phydev) {
-		if (phydev->bus) {
-			fixed = phydev->bus->priv;
-			fixed->link_update = link_update;
-			return 0;
-
Re: [PATCH 0/2] cxgb3 - driver update
Divy Le Ray wrote:
> Jeff,
>
> I'm submitting a patch series for inclusion in 2.6.25.
> The patches are built against netdev#upstream.
>
> Here is a brief description:
> - Update GPIO pinning and MAC support for T3C adapters
> - Enable parity error detection.

Jeff,

I posted a third patch to fix the EEH code and add a missing softirq
blocking call.

Cheers,
Divy
[PATCH 3/2] cxgb3 - Fix EEH, missing softirq blocking
From: Divy Le Ray [EMAIL PROTECTED]

pci_set_drvdata() stores a pointer to the adapter, not the net device.
Add missing softirq blocking in t3_mgmt_tx.

Signed-off-by: Divy Le Ray [EMAIL PROTECTED]
---
 drivers/net/cxgb3/cxgb3_main.c |   14 --
 drivers/net/cxgb3/sge.c        |    7 ++-
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index d1aa777..0e3dcbf 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -2408,9 +2408,7 @@ void t3_fatal_err(struct adapter *adapter)
 static pci_ers_result_t t3_io_error_detected(struct pci_dev *pdev,
 					     pci_channel_state_t state)
 {
-	struct net_device *dev = pci_get_drvdata(pdev);
-	struct port_info *pi = netdev_priv(dev);
-	struct adapter *adapter = pi->adapter;
+	struct adapter *adapter = pci_get_drvdata(pdev);
 	int i;
 
 	/* Stop all ports */
@@ -2444,9 +2442,7 @@ static pci_ers_result_t t3_io_error_detected(struct pci_dev *pdev,
  */
 static pci_ers_result_t t3_io_slot_reset(struct pci_dev *pdev)
 {
-	struct net_device *dev = pci_get_drvdata(pdev);
-	struct port_info *pi = netdev_priv(dev);
-	struct adapter *adapter = pi->adapter;
+	struct adapter *adapter = pci_get_drvdata(pdev);
 
 	if (pci_enable_device(pdev)) {
 		dev_err(&pdev->dev,
@@ -2469,9 +2465,7 @@ static pci_ers_result_t t3_io_slot_reset(struct pci_dev *pdev)
  */
 static void t3_io_resume(struct pci_dev *pdev)
 {
-	struct net_device *dev = pci_get_drvdata(pdev);
-	struct port_info *pi = netdev_priv(dev);
-	struct adapter *adapter = pi->adapter;
+	struct adapter *adapter = pci_get_drvdata(pdev);
 	int i;
 
 	/* Restart the ports */
@@ -2491,7 +2485,7 @@ static void t3_io_resume(struct pci_dev *pdev)
 
 	if (is_offload(adapter)) {
 		__set_bit(OFFLOAD_DEVMAP_BIT, &adapter->registered_device_map);
-		if (offload_open(dev))
+		if (offload_open(adapter->port[0]))
 			printk(KERN_WARNING
 			       "Could not bring back offload capabilities\n");
 	}
diff --git a/drivers/net/cxgb3/sge.c b/drivers/net/cxgb3/sge.c
index cef153d..6367cee 100644
--- a/drivers/net/cxgb3/sge.c
+++ b/drivers/net/cxgb3/sge.c
@@ -1364,7 +1364,12 @@ static void restart_ctrlq(unsigned long data)
  */
 int t3_mgmt_tx(struct adapter *adap, struct sk_buff *skb)
 {
-	return ctrl_xmit(adap, &adap->sge.qs[0].txq[TXQ_CTRL], skb);
+	int ret;
+	local_bh_disable();
+	ret = ctrl_xmit(adap, &adap->sge.qs[0].txq[TXQ_CTRL], skb);
+	local_bh_enable();
+
+	return ret;
 }
 
 /**
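The EEH half of this fix comes down to reading driver data back as the same type that was stored. Below is a minimal userspace sketch of that rule. All the types and helpers in it (`pci_dev_sketch`, `adapter_sketch`, `set_drvdata()`, `get_drvdata()`, `probe_sketch()`, `first_port()`) are hypothetical stand-ins for illustration, not the kernel API or the cxgb3 code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the illustration are present. */
struct net_device_sketch {
	int ifindex;
};

struct adapter_sketch {
	struct net_device_sketch *port[2];
};

struct pci_dev_sketch {
	void *driver_data;	/* untyped slot, so the compiler cannot help */
};

static void set_drvdata(struct pci_dev_sketch *pdev, void *data)
{
	pdev->driver_data = data;
}

static void *get_drvdata(struct pci_dev_sketch *pdev)
{
	return pdev->driver_data;
}

/* Probe stores the adapter, so every handler must read an adapter back. */
static void probe_sketch(struct pci_dev_sketch *pdev,
			 struct adapter_sketch *adap)
{
	set_drvdata(pdev, adap);
}

/* Error handler: take the adapter directly and reach the first port
 * through it, instead of treating the stored pointer as a net device. */
static struct net_device_sketch *first_port(struct pci_dev_sketch *pdev)
{
	struct adapter_sketch *adap = get_drvdata(pdev);

	return adap->port[0];
}
```

Because the slot is a `void *`, a handler that assumes the wrong type compiles cleanly and only misbehaves at run time, which is presumably why the mismatch went unnoticed until the error-recovery path was exercised.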
Re: [RFC] TCP illinois max rtt aging
Greetings Ilpo,

On 04/12/2007, Ilpo Järvinen [EMAIL PROTECTED] wrote:
> On Mon, 3 Dec 2007, Lachlan Andrew wrote:
> > When SACK is active, the per-packet processing becomes more involved,
> > tracking the list of lost/SACKed packets. This causes a CPU spike
> > just after a loss, which increases the RTTs, at least in my
> > experience.
>
> I suspect that as long as old code was able to use hint, it wasn't doing
> that bad. But it was seriously lacking ability to take advantage of sack
> processing hint when e.g., a new hole appeared, or cumulative ACK
> arrived.
>
> ...Code available in net-2.6.25 might cure those.

We had been using one of your earlier patches, and still had the
problem. I think you've cured the problem with SACK itself, but there
still seems to be something taking a lot of CPU while recovering from
the loss. It is possible that it had to do with web100, which we have
also been running, but I cut out most of the statistics from that and
still had problems.

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Ph: +1 (626) 395-8820    Fax: +1 (626) 568-3603
http://netlab.caltech.edu/~lachlan
Re: [Patch] net/xfrm/xfrm_policy.c: Some small improvements
From: Richard Knutsson [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 15:37:46 +0100

> David Miller wrote:
> > But this time I'll just let you know up front that I don't
> > see much value in this patch. It is not a clear improvement
> > to replace int's with bool's in my mind and the other
> > changes are just whitespace changes.
>
> Is it not an improvement to distinguish booleans from actual values? Do
> you use integers for ASCII characters too? It can also avoid some
> potential bugs like the 'if (i == TRUE)'...
> What is wrong with 'size_t' (since it is unsigned, compared to (some)
> 'int')?

When you say:

	int found;

is there any doubt in your mind that this integer is going to hold a 1
or a 0 depending upon whether we found something?

That's the problem I have with these kinds of patches, they do not
increase clarity, it's just pure mindless edits.

In new code, fine, use booleans if you want.

I would even accept that it helps to change to boolean for arguments
to functions that are global in scope. But not for function local
variables in cases like this.
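For readers unfamiliar with the 'if (i == TRUE)' hazard mentioned in the quoted text, here is a small sketch of it. The `TRUE` macro and both helper names are made up for the illustration; the point is only that an int flag can hold non-zero values other than 1, while conversion to a C99 `bool` normalizes them.

```c
#include <stdbool.h>

#define TRUE 1

/* With a plain int flag, any non-zero value means "found", so comparing
 * against TRUE silently fails for values other than 1. */
static int int_flag_matches_true(int found)
{
	return found == TRUE;
}

/* Converting to bool normalizes every non-zero value to true first. */
static int bool_flag_matches_true(int value)
{
	bool found = value;

	return found == true;
}
```

This says nothing about whether converting existing function-local variables is worth the churn, which is the actual point of disagreement in the thread.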
Re: sockets affected by IPsec always block (2.6.23)
From: Stefan Rompf [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 15:31:53 +0100

> as far as I've understood Herbert's patch, at least TCP connect can be fixed
> so that non blocking connect() will neither fail nor block, but just use the
> first or second retransmission of the SYN packet to complete the handshake
> after IPSEC is up.

If IPSEC takes a long time to resolve, and we don't block, the
connect() can hard fail (we will just keep dropping the outgoing SYN
packet send attempts, eventually hitting the retry limit) in cases
where if we did block it would not fail (because we wouldn't send the
first SYN until IPSEC resolved).
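To get a feel for how long a non-blocking connect() has before it hits the retry limit, here is a rough sketch that sums an exponential SYN retransmission backoff. The helper name `syn_give_up_seconds()` is hypothetical, and the model (timeout doubles after every transmission, no upper cap, one final wait after the last retry) is an assumption made for illustration; the real timer depends on the kernel version and sysctl settings.

```c
/* Seconds from the first SYN until the connect attempt is abandoned,
 * assuming the timeout doubles after every (re)transmission and the
 * stack waits one final timeout after the last retry. */
static unsigned int syn_give_up_seconds(unsigned int initial_rto,
					unsigned int retries)
{
	unsigned int total = 0;
	unsigned int rto = initial_rto;
	unsigned int i;

	for (i = 0; i <= retries; i++) {
		total += rto;
		rto *= 2;
	}
	return total;
}
```

With an assumed initial timeout of 3 seconds and 5 retries, `syn_give_up_seconds(3, 5)` comes to 189 seconds. Under those assumptions the IPsec negotiation would have to finish within roughly three minutes, whereas in the blocking case the clock would not start until the first SYN is actually sent.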
[PATCH] [IPv4] Add strict check for replying net unreachable message
The patch `Reply net unreachable ICMP message' had a bug: a route of
type blackhole or prohibit is treated as an unreachable route. err
should be set to -ENETUNREACH only when no route is found in the
routing table.

Signed-off-by: Mitsuru Chinen [EMAIL PROTECTED]
---
 net/ipv4/route.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8a79f74..d2bc614 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1882,7 +1882,8 @@ no_route:
 	RT_CACHE_STAT_INC(in_no_route);
 	spec_dst = inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE);
 	res.type = RTN_UNREACHABLE;
-	err = -ENETUNREACH;
+	if (err == -ESRCH)
+		err = -ENETUNREACH;
 	goto local_input;
 
 	/*
-- 
1.5.3.4
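The effect of the added check can be shown in isolation. The sketch below is a userspace illustration only; `no_route_err()` is a hypothetical helper, not a function in net/ipv4/route.c. It assumes, as the thread states, that the lookup reports -ESRCH when nothing matches.

```c
#include <errno.h>

/* Hypothetical helper mirroring the check at the no_route label:
 * only "nothing found in the table" (-ESRCH) becomes -ENETUNREACH;
 * any other error from the lookup is passed through untouched. */
static int no_route_err(int lookup_err)
{
	if (lookup_err == -ESRCH)
		return -ENETUNREACH;
	return lookup_err;
}
```

For example, `no_route_err(-ESRCH)` yields `-ENETUNREACH`, while a value such as `-EACCES` or `-EINVAL` (used here merely as sample codes for a prohibit or blackhole route) comes back unchanged, so those routes are no longer reported as net unreachable.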
Re: [PATCH] [IPv4] Reply net unreachable ICMP message
On Thu, 6 Dec 2007 09:47:33 +0100
Jarek Poplawski [EMAIL PROTECTED] wrote:

> On 06-12-2007 09:14, Mitsuru Chinen wrote:
> > On Thu, 6 Dec 2007 08:49:47 +0100
> > Jarek Poplawski [EMAIL PROTECTED] wrote:
> >
> > > On 06-12-2007 07:31, Mitsuru Chinen wrote:
> > > > IPv4 stack doesn't reply any ICMP destination unreachable message
> > > > with net unreachable code when IP datagrams are being discarded
> > > > because of no route could be found in the forwarding path.
> > > > Incidentally, IPv6 stack replies such ICMPv6 message in the similar
> > > > situation.
> > > ...
> > > This patch seems to be wrong. It overrides err codes from
> > > fib_lookup, where such decisions should be made.
> >
> > fib_lookup() replies -ESRCH in this situation.
> > It is necessary to override the variable by the suitable error
> > number like the code under e_hostunreach label.
>
> Probably I miss something, but I can't see how can you be sure it's
> only -ESRCH possible here? Isn't opt-action() in fib_rules_lookup()
> supposed to return this -ENETUNREACH when needed?

Oh, excuse me. I made a mistake.
fib_rules_lookup() returns -ESRCH when no route is found. It returns
-ENETUNREACH when the user has added an unreachable route.
However, if the err value is overridden without a check, a blackhole
or prohibit route is treated as an unreachable route.

As the patch is already applied, I will send another patch to add
a check for it.

Thank you very much for pointing out the issue!

Best Regards,

Mitsuru Chinen [EMAIL PROTECTED]
Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage
On Mon, 2007-11-26 at 15:17 -0700, Eric W. Biederman wrote:
> Well I clearly goofed when I added the initial network namespace support
> for /proc/net. Currently things work but there are odd details visible
> to user space, even when we have a single network namespace.
>
> Since we do not cache proc_dir_entry dentries at the moment we can
> just modify ->lookup to return a different directory inode depending
> on the network namespace of the process looking at /proc/net, replacing
> the current technique of using a magic and fragile follow_link method.
>
> To accomplish that this patch:
> - introduces a shadow_proc method to allow different dentries to
>   be returned from proc_lookup.
> - Removes the old /proc/net follow_link magic
> - Fixes a weakness in our not caching of proc generic dentries.
>
> As shadow_proc uses a task struct to decide which dentry to return we
> can go back later and fix the proc generic caching without modifying any code
> that uses the shadow_proc method.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
> ---
>  fs/proc/generic.c       |   12 ++-
>  fs/proc/proc_net.c      |   86 +++
>  include/linux/proc_fs.h |    3 ++
>  3 files changed, 19 insertions(+), 82 deletions(-)

(commit 2b1e300a9dfc3196ccddf6f1d74b91b7af55e416)

This seems to have broken the use of /proc/bus/usb as a mountpoint. It
always appears empty now, whatever's supposed to be mounted there.

-- 
dwmw2
Re: [PATCH][VLAN] Lost rtnl_unlock() in vlan_ioctl()
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:59:24 +0100

> Pavel Emelyanov wrote:
> > The SET_VLAN_NAME_TYPE_CMD command w/o CAP_NET_ADMIN capability
> > doesn't release the rtnl lock.
>
> Thanks Pavel. I somehow recall that we already fixed this
> one, but can't find the patch :) Dave, please apply.

Applied and I'll push to -stable once Linus pulls it in.
Re: TCP event tracking via netlink...
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 09:23:12 -0800

> Tools and scripts for testing that generate graphs are at:
> 	git://git.kernel.org/pub/scm/tcptest/tcptest

I know about this, I'm just curious what exactly Ilpo is using :-)
Re: [PATCHES 0/7]: DCCP patches for 2.6.25
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 19:02:47 -0200

> Please consider pulling from:
>
> master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25

Pulled and pushed out to net-2.6.25, thanks!
Re: [PATCH] [IPv4] Add strict check for replying net unreachable message
From: Mitsuru Chinen [EMAIL PROTECTED]
Date: Fri, 7 Dec 2007 13:24:18 +0900

> The patch `Reply net unreachable ICMP message' had a bug. A route
> whose type is blackhole or prohibit type is treated as unreachable
> type. The case where err is set to ENETUNREACH should be that
> no route is found in the routing table only.
>
> Signed-off-by: Mitsuru Chinen [EMAIL PROTECTED]

Applied, thanks.

I'll probably combine this with your original change before
I push these changes upstream.
Re: [SCTP] Bug fixes to the migrate/accept code path.
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Thu, 6 Dec 2007 12:48:22 -0500

> The following two patches fix some bugs in the SCTP accept code path.
> The first one fixes a slab corruption bug that we found during stress testing.
> The second one is just a clean-up and the right way to do things.

Both patches applied to net-2.6, thanks Vlad!
[PATCH 5/8] bonding: Allow setting and querying xmit policy regardless of mode
From: Wagner Ferenc [EMAIL PROTECTED]

For consistency with the behaviour of the arp_ip_target option, let
/sys/class/net/bond0/bonding/xmit_hash_policy accept and report the
current policy even if the bonding mode in effect does not use it.

Signed-off-by: Ferenc Wagner [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_sysfs.c |   21 +++--
 1 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 9de2c52..11b76b3 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -456,17 +456,11 @@ static ssize_t bonding_show_xmit_hash(struct device *d,
 				      struct device_attribute *attr,
 				      char *buf)
 {
-	int count = 0;
 	struct bonding *bond = to_bond(d);
 
-	if ((bond->params.mode == BOND_MODE_XOR) ||
-	    (bond->params.mode == BOND_MODE_8023AD)) {
-		count = sprintf(buf, "%s %d\n",
-			xmit_hashtype_tbl[bond->params.xmit_policy].modename,
-			bond->params.xmit_policy);
-	}
-
-	return count;
+	return sprintf(buf, "%s %d\n",
+		xmit_hashtype_tbl[bond->params.xmit_policy].modename,
+		bond->params.xmit_policy);
 }
 
 static ssize_t bonding_store_xmit_hash(struct device *d,
@@ -484,15 +478,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 		goto out;
 	}
 
-	if ((bond->params.mode != BOND_MODE_XOR) &&
-	    (bond->params.mode != BOND_MODE_8023AD)) {
-		printk(KERN_ERR DRV_NAME
-		       "%s: Transmit hash policy is irrelevant in this mode.\n",
-		       bond->dev->name);
-		ret = -EPERM;
-		goto out;
-	}
-
 	new_value = bond_parse_parm((char *)buf, xmit_hashtype_tbl);
 	if (new_value < 0)  {
 		printk(KERN_ERR DRV_NAME
-- 
1.5.3.4.206.g58ba4-dirty
[PATCH 2/8] bonding: Return nothing for not applicable values
From: Wagner Ferenc [EMAIL PROTECTED]

The previous code returned '\n' (that is, a single empty line) from
most files, with one exception (xmit_hash_policy), where it returned
'NA\n'. This patch makes every file consistently return nothing at all
if not applicable, not even a '\n'. I find this behaviour more usual,
more useful, more efficient and shorter to code from both sides.

Signed-off-by: Ferenc Wagner [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_sysfs.c |   25 -
 1 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a3f1b4a..6bb91e2 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -455,14 +455,11 @@ static ssize_t bonding_show_xmit_hash(struct device *d,
 				      struct device_attribute *attr,
 				      char *buf)
 {
-	int count;
+	int count = 0;
 	struct bonding *bond = to_bond(d);
 
-	if ((bond->params.mode != BOND_MODE_XOR) &&
-	    (bond->params.mode != BOND_MODE_8023AD)) {
-		// Not Applicable
-		count = sprintf(buf, "NA\n");
-	} else {
+	if ((bond->params.mode == BOND_MODE_XOR) ||
+	    (bond->params.mode == BOND_MODE_8023AD)) {
 		count = sprintf(buf, "%s %d\n",
 			xmit_hashtype_tbl[bond->params.xmit_policy].modename,
 			bond->params.xmit_policy);
@@ -1079,8 +1076,6 @@ static ssize_t bonding_show_primary(struct device *d,
 
 	if (bond->primary_slave)
 		count = sprintf(buf, "%s\n", bond->primary_slave->dev->name);
-	else
-		count = sprintf(buf, "\n");
 
 	return count;
 }
@@ -1186,7 +1181,7 @@ static ssize_t bonding_show_active_slave(struct device *d,
 {
 	struct slave *curr;
 	struct bonding *bond = to_bond(d);
-	int count;
+	int count = 0;
 
 	read_lock(&bond->curr_slave_lock);
 	curr = bond->curr_active_slave;
@@ -1194,8 +1189,6 @@ static ssize_t bonding_show_active_slave(struct device *d,
 
 	if (USES_PRIMARY(bond->params.mode) && curr)
 		count = sprintf(buf, "%s\n", curr->dev->name);
-	else
-		count = sprintf(buf, "\n");
 	return count;
 }
 
@@ -1309,8 +1302,6 @@ static ssize_t bonding_show_ad_aggregator(struct device *d,
 		struct ad_info ad_info;
 		count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0 : ad_info.aggregator_id);
 	}
-	else
-		count = sprintf(buf, "\n");
 
 	return count;
 }
@@ -1331,8 +1322,6 @@ static ssize_t bonding_show_ad_num_ports(struct device *d,
 		struct ad_info ad_info;
 		count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0: ad_info.ports);
 	}
-	else
-		count = sprintf(buf, "\n");
 
 	return count;
 }
@@ -1353,8 +1342,6 @@ static ssize_t bonding_show_ad_actor_key(struct device *d,
 		struct ad_info ad_info;
 		count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0 : ad_info.actor_key);
 	}
-	else
-		count = sprintf(buf, "\n");
 
 	return count;
 }
@@ -1375,8 +1362,6 @@ static ssize_t bonding_show_ad_partner_key(struct device *d,
 		struct ad_info ad_info;
 		count = sprintf(buf, "%d\n", (bond_3ad_get_active_agg_info(bond, &ad_info)) ?  0 : ad_info.partner_key);
 	}
-	else
-		count = sprintf(buf, "\n");
 
 	return count;
 }
@@ -1401,8 +1386,6 @@ static ssize_t bonding_show_ad_partner_mac(struct device *d,
 				       print_mac(mac, ad_info.partner_system));
 		}
 	}
-	else
-		count = sprintf(buf, "\n");
 
 	return count;
 }
-- 
1.5.3.4.206.g58ba4-dirty
[PATCH 1/8] bonding: Remove trailing NULs from sysfs interface.
From: Wagner Ferenc [EMAIL PROTECTED]

Also remove trailing spaces from multivalued files.

This fixes output such as:

$ od -c /sys/class/net/bond0/bonding/slaves
0000000   e   t   h   -   l   e   f   t       e   t   h   -   r   i   g
0000020   h   t      \n  \0
0000025

It mostly entails deleting '+1'-s after sprintf() calls: the return
value of sprintf() is the number of characters printed, without the
closing NUL, i.e. exactly what the sysfs interface requires.

The three multivalue cases are different, because they also have to
swallow back a trailing space.

Signed-off-by: Ferenc Wagner [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_sysfs.c |   66 +
 1 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index b29330d..a3f1b4a 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -86,14 +86,13 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
 			/* not enough space for another interface name */
 			if ((PAGE_SIZE - res) > 10)
 				res = PAGE_SIZE - 10;
-			res += sprintf(buffer + res, "++more++");
+			res += sprintf(buffer + res, "++more++ ");
 			break;
 		}
 		res += sprintf(buffer + res, "%s ",
 			       bond->dev->name);
 	}
-	res += sprintf(buffer + res, "\n");
-	res++;
+	if (res) buffer[res-1] = '\n'; /* eat the leftover space */
 	up_read(&(bonding_rwsem));
 	return res;
 }
@@ -235,14 +234,13 @@ static ssize_t bonding_show_slaves(struct device *d,
 			/* not enough space for another interface name */
 			if ((PAGE_SIZE - res) > 10)
 				res = PAGE_SIZE - 10;
-			res += sprintf(buf + res, "++more++");
+			res += sprintf(buf + res, "++more++ ");
 			break;
 		}
 		res += sprintf(buf + res, "%s ", slave->dev->name);
 	}
 	read_unlock(&bond->lock);
-	res += sprintf(buf + res, "\n");
-	res++;
+	if (res) buf[res-1] = '\n'; /* eat the leftover space */
 	return res;
 }
 
@@ -406,7 +404,7 @@ static ssize_t bonding_show_mode(struct device *d,
 
 	return sprintf(buf, "%s %d\n",
 			bond_mode_tbl[bond->params.mode].modename,
-			bond->params.mode) + 1;
+			bond->params.mode);
 }
 
 static ssize_t bonding_store_mode(struct device *d,
@@ -463,11 +461,11 @@ static ssize_t bonding_show_xmit_hash(struct device *d,
 	if ((bond->params.mode != BOND_MODE_XOR) &&
 	    (bond->params.mode != BOND_MODE_8023AD)) {
 		// Not Applicable
-		count = sprintf(buf, "NA\n") + 1;
+		count = sprintf(buf, "NA\n");
 	} else {
 		count = sprintf(buf, "%s %d\n",
 			xmit_hashtype_tbl[bond->params.xmit_policy].modename,
-			bond->params.xmit_policy) + 1;
+			bond->params.xmit_policy);
 	}
 
 	return count;
@@ -527,7 +525,7 @@ static ssize_t bonding_show_arp_validate(struct device *d,
 
 	return sprintf(buf, "%s %d\n",
 		       arp_validate_tbl[bond->params.arp_validate].modename,
-		       bond->params.arp_validate) + 1;
+		       bond->params.arp_validate);
 }
 
 static ssize_t bonding_store_arp_validate(struct device *d,
@@ -627,7 +625,7 @@ static ssize_t bonding_show_arp_interval(struct device *d,
 {
 	struct bonding *bond = to_bond(d);
 
-	return sprintf(buf, "%d\n", bond->params.arp_interval) + 1;
+	return sprintf(buf, "%d\n", bond->params.arp_interval);
 }
 
 static ssize_t bonding_store_arp_interval(struct device *d,
@@ -711,10 +709,7 @@ static ssize_t bonding_show_arp_targets(struct device *d,
 		res += sprintf(buf + res, "%u.%u.%u.%u ",
 			       NIPQUAD(bond->params.arp_targets[i]));
 	}
-	if (res)
-		res--;  /* eat the leftover space */
-	res += sprintf(buf + res, "\n");
-	res++;
+	if (res) buf[res-1] = '\n'; /* eat the leftover space */
 	return res;
 }
 
@@ -815,7 +810,7 @@ static ssize_t bonding_show_downdelay(struct device *d,
 {
 	struct bonding *bond = to_bond(d);
 
-	return sprintf(buf, "%d\n", bond->params.downdelay * bond->params.miimon) + 1;
+	return sprintf(buf, "%d\n", bond->params.downdelay * bond->params.miimon);
 }
 
 static ssize_t bonding_store_downdelay(struct device *d,
@@ -872,7 +867,7 @@ static ssize_t bonding_show_updelay(struct device *d,
 {
 	struct bonding *bond = to_bond(d);
 
-	return sprintf(buf, "%d\n", bond->params.updelay *
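The two ideas in this patch (sprintf()'s return value already excludes the closing NUL, and the trailing space of a multivalued list can be overwritten with the newline) can be tried outside the kernel. The following is a userspace sketch; `show_names()` is a hypothetical helper, not a function from bond_sysfs.c, and it assumes the caller's buffer is large enough.

```c
#include <stdio.h>

/* Writes the names separated by single spaces, then turns the trailing
 * space into the terminating newline. Returns the number of bytes that
 * belong to the file content, which excludes sprintf()'s closing NUL.
 * The caller must supply a buffer large enough for all names. */
static int show_names(char *buf, const char *const *names, int n)
{
	int res = 0;
	int i;

	for (i = 0; i < n; i++)
		res += sprintf(buf + res, "%s ", names[i]);
	if (res)
		buf[res - 1] = '\n';	/* eat the leftover space */
	return res;
}
```

For the two names from the od example above, the function returns 19 and the buffer holds "eth-left eth-right\n", with neither the stray space nor the NUL counted as content. An empty list yields a length of 0.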
[PATCH 0/8] bonding: Several fixes, new hash mode
	Patch series to fix some bugs, fix coding style, and add a new
hash mode for balance-xor/802.3ad modes.

	Jeff: please apply to upstream. Patch 8 should arguably go in to
2.6.24, as it's a bug in the locking fixes added there and can cause an
oops; is it too late for that?

[PATCH 1/8] bonding: Remove trailing NULs from sysfs interface.
[PATCH 2/8] bonding: Return nothing for not applicable values
[PATCH 3/8] bonding: Purely cosmetic: rename a local variable
[PATCH 4/8] bonding: Coding style: break line after the if condition
[PATCH 5/8] bonding: Allow setting and querying xmit policy regardless of mode
[PATCH 6/8] bonding: Fix time comparison
[PATCH 7/8] bonding: Add new layer2+3 hash for xor/802.3ad modes
[PATCH 8/8] bonding: Fix race at module unload

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
[PATCH 6/8] bonding: Fix time comparison
From: David Sterba [EMAIL PROTECTED]

Use the time comparison macros when comparing jiffies.
Jiffies wraparound caused missed events and hangs, and the module had
to be reinserted to make bonding work again.

Signed-off-by: David Sterba [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |   25 +
 1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 423298c..e4a4714 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -74,6 +74,7 @@
 #include <linux/ethtool.h>
 #include <linux/if_vlan.h>
 #include <linux/if_bonding.h>
+#include <linux/jiffies.h>
 #include <net/route.h>
 #include <net/net_namespace.h>
 #include "bonding.h"
@@ -2722,8 +2723,8 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 	 */
 	bond_for_each_slave(bond, slave, i) {
 		if (slave->link != BOND_LINK_UP) {
-			if (((jiffies - slave->dev->trans_start) <= delta_in_ticks) &&
-			    ((jiffies - slave->dev->last_rx) <= delta_in_ticks)) {
+			if (time_before_eq(jiffies, slave->dev->trans_start + delta_in_ticks) &&
+			    time_before_eq(jiffies, slave->dev->last_rx + delta_in_ticks)) {
 
 				slave->link  = BOND_LINK_UP;
 				slave->state = BOND_STATE_ACTIVE;
@@ -2754,8 +2755,8 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 			 * when the source ip is 0, so don't take the link down
 			 * if we don't know our ip yet
 			 */
-			if (((jiffies - slave->dev->trans_start) >= (2*delta_in_ticks)) ||
-			    (((jiffies - slave->dev->last_rx) >= (2*delta_in_ticks)) &&
+			if (time_after_eq(jiffies, slave->dev->trans_start + 2*delta_in_ticks) ||
+			    (time_after_eq(jiffies, slave->dev->last_rx + 2*delta_in_ticks) &&
 			     bond_has_ip(bond))) {
 
 				slave->link  = BOND_LINK_DOWN;
@@ -2848,8 +2849,8 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 	 */
 	bond_for_each_slave(bond, slave, i) {
 		if (slave->link != BOND_LINK_UP) {
-			if ((jiffies - slave_last_rx(bond, slave)) <=
-			     delta_in_ticks) {
+			if (time_before_eq(jiffies,
+			    slave_last_rx(bond, slave) + delta_in_ticks)) {
 
 				slave->link = BOND_LINK_UP;
 
@@ -2858,7 +2859,7 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 				write_lock_bh(&bond->curr_slave_lock);
 
 				if ((!bond->curr_active_slave) &&
-				    ((jiffies - slave->dev->trans_start) <= delta_in_ticks)) {
+				    time_before_eq(jiffies, slave->dev->trans_start + delta_in_ticks)) {
 					bond_change_active_slave(bond, slave);
 					bond->current_arp_slave = NULL;
 				} else if (bond->curr_active_slave != slave) {
@@ -2897,7 +2898,7 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 
 			if ((slave != bond->curr_active_slave) &&
 			    (!bond->current_arp_slave) &&
-			    (((jiffies - slave_last_rx(bond, slave)) >= 3*delta_in_ticks) &&
+			    (time_after_eq(jiffies, slave_last_rx(bond, slave) + 3*delta_in_ticks) &&
 			     bond_has_ip(bond))) {
 				/* a backup slave has gone down; three times
 				 * the delta allows the current slave to be
@@ -2943,10 +2944,10 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 		 * before being taken out. if a primary is being used, check
 		 * if it is up and needs to take over as the curr_active_slave
 		 */
-		if ((((jiffies - slave->dev->trans_start) >= (2*delta_in_ticks)) ||
-		     (((jiffies - slave_last_rx(bond, slave)) >= (2*delta_in_ticks)) &&
-		      bond_has_ip(bond))) &&
-		    ((jiffies - slave->jiffies) >= 2*delta_in_ticks)) {
+		if ((time_after_eq(jiffies, slave->dev->trans_start + 2*delta_in_ticks) ||
+		     (time_after_eq(jiffies, slave_last_rx(bond, slave) + 2*delta_in_ticks) &&
+		      bond_has_ip(bond))) &&
+		    time_after_eq(jiffies, slave->jiffies + 2*delta_in_ticks)) {
 
 			slave->link  = BOND_LINK_DOWN;
 
-- 
1.5.3.4.206.g58ba4-dirty
[PATCH 8/8] bonding: Fix race at module unload
	Fixes a race condition in module unload. Without this change,
workqueue events may fire while bonding data structures are partially
freed but before bond_close() is invoked by unregister_netdevice().

	Update version to 3.2.3.

Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |   43 ---
 drivers/net/bonding/bonding.h   |    2 +-
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 08879d5..b0b2603 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4492,6 +4492,27 @@ static void bond_deinit(struct net_device *bond_dev)
 #endif
 }
 
+static void bond_work_cancel_all(struct bonding *bond)
+{
+	write_lock_bh(&bond->lock);
+	bond->kill_timers = 1;
+	write_unlock_bh(&bond->lock);
+
+	if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
+		cancel_delayed_work(&bond->mii_work);
+
+	if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
+		cancel_delayed_work(&bond->arp_work);
+
+	if (bond->params.mode == BOND_MODE_ALB &&
+	    delayed_work_pending(&bond->alb_work))
+		cancel_delayed_work(&bond->alb_work);
+
+	if (bond->params.mode == BOND_MODE_8023AD &&
+	    delayed_work_pending(&bond->ad_work))
+		cancel_delayed_work(&bond->ad_work);
+}
+
 /* Unregister and free all bond devices.
  * Caller must hold rtnl_lock.
  */
@@ -4502,6 +4523,7 @@ static void bond_free_all(void)
 	list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) {
 		struct net_device *bond_dev = bond->dev;
 
+		bond_work_cancel_all(bond);
 		bond_mc_list_destroy(bond);
 		/* Release the bonded slaves */
 		bond_release_all(bond_dev);
@@ -4902,27 +4924,6 @@ out_rtnl:
 	return res;
 }
 
-static void bond_work_cancel_all(struct bonding *bond)
-{
-	write_lock_bh(&bond->lock);
-	bond->kill_timers = 1;
-	write_unlock_bh(&bond->lock);
-
-	if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
-		cancel_delayed_work(&bond->mii_work);
-
-	if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
-		cancel_delayed_work(&bond->arp_work);
-
-	if (bond->params.mode == BOND_MODE_ALB &&
-	    delayed_work_pending(&bond->alb_work))
-		cancel_delayed_work(&bond->alb_work);
-
-	if (bond->params.mode == BOND_MODE_8023AD &&
-	    delayed_work_pending(&bond->ad_work))
-		cancel_delayed_work(&bond->ad_work);
-}
-
 static int __init bonding_init(void)
 {
 	int i;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ccafc74..e1e4734 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -22,7 +22,7 @@
 #include "bond_3ad.h"
 #include "bond_alb.h"
 
-#define DRV_VERSION	"3.2.2"
+#define DRV_VERSION	"3.2.3"
 #define DRV_RELDATE	"December 6, 2007"
 #define DRV_NAME	"bonding"
 #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
-- 
1.5.3.4.206.g58ba4-dirty
[PATCH 7/8] bonding: Add new layer2+3 hash for xor/802.3ad modes
Add new hash for balance-xor and 802.3ad modes.  Originally
submitted by Glenn Griffin [EMAIL PROTECTED]; modified by Jay Vosburgh
to move setting of hash policy out of line, tweak the documentation
update and add version update to 3.2.2.

Glenn's original comment follows:

Included is a patch for a new xmit_hash_policy for the bonding driver
that selects slaves based on MAC and IP information.  This is a middle
ground between what currently exists in the layer2 only policy and the
layer3+4 policy.  This policy strives to be fully 802.3ad compliant by
transmitting every packet of any particular flow over the same link.
As documented the layer3+4 policy is not fully compliant for extreme
cases such as ip fragmentation, so this policy is a nice compromise for
environments that require full compliance but desire more than the
layer2 only policy.

Signed-off-by: Glenn Griffin [EMAIL PROTECTED]
Signed-off-by: Jay Vosburgh [EMAIL PROTECTED]
---
 Documentation/networking/bonding.txt |   29 +++-
 drivers/net/bonding/bond_main.c      |   48 ++---
 drivers/net/bonding/bonding.h        |    4 +-
 include/linux/if_bonding.h           |    3 +-
 4 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index eda0f06..a0cda06 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -559,6 +559,30 @@ xmit_hash_policy
 
 		This algorithm is 802.3ad compliant.
 
+	layer2+3
+
+		This policy uses a combination of layer2 and layer3
+		protocol information to generate the hash.
+
+		Uses XOR of hardware MAC addresses and IP addresses to
+		generate the hash.  The formula is
+
+		(((source IP XOR dest IP) AND 0xffff) XOR
+			( source MAC XOR destination MAC ))
+				modulo slave count
+
+		This algorithm will place all traffic to a particular
+		network peer on the same slave.  For non-IP traffic,
+		the formula is the same as for the layer2 transmit
+		hash policy.
+
+		This policy is intended to provide a more balanced
+		distribution of traffic than layer2 alone, especially
+		in environments where a layer3 gateway device is
+		required to reach most destinations.
+
+		This algorithm is 802.3ad complient.
+
 	layer3+4
 
 		This policy uses upper layer protocol information,
@@ -594,8 +618,9 @@ xmit_hash_policy
 		or may not tolerate this noncompliance.
 
 	The default value is layer2.  This option was added in bonding
-version 2.6.3.  In earlier versions of bonding, this parameter does
-not exist, and the layer2 policy is the only policy.
+	version 2.6.3.  In earlier versions of bonding, this parameter
+	does not exist, and the layer2 policy is the only policy.  The
+	layer2+3 value was added for bonding version 3.2.2.
 
 
 3. Configuring Bonding Devices
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e4a4714..08879d5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -175,6 +175,7 @@ struct bond_parm_tbl bond_mode_tbl[] = {
 struct bond_parm_tbl xmit_hashtype_tbl[] = {
 {	"layer2",		BOND_XMIT_POLICY_LAYER2},
 {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
+{	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
 {	NULL,			-1},
 };
 
@@ -3605,6 +3606,24 @@ void bond_unregister_arp(struct bonding *bond)
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 and layer 3 data. If
+ * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ */
+static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
+				     struct net_device *bond_dev, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+	struct iphdr *iph = ip_hdr(skb);
+
+	if (skb->protocol == __constant_htons(ETH_P_IP)) {
+		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
+			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;
+	}
+
+	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+}
+
+/*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
  * altogether not IP, mimic bond_xmit_hash_policy_l2()
@@ -4306,6 +4325,22 @@ out:
 
 /*------------------------- Device initialization ---------------------------*/
 
+static void bond_set_xmit_hash_policy(struct bonding *bond)
+{
+	switch (bond->params.xmit_policy) {
+	case BOND_XMIT_POLICY_LAYER23:
+		bond->xmit_hash_policy =
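As a quick way to experiment with the layer2+3 formula from the documentation hunk, the following userspace sketch may help. It is not the driver code: `l23_hash()` and `l2_hash()` are hypothetical helper names, IP addresses are passed in host byte order, the mask is assumed to be 0xffff (the low 16 bits of the XORed addresses), and only the last byte of each MAC address is used, which mirrors what the patch does with `h_dest[5]` and `dev_addr[5]`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * layer2 only: XOR of the last MAC bytes, modulo the slave count.
 * Hypothetical helper, for comparison with l23_hash() below.
 */
static unsigned int l2_hash(const uint8_t src_mac[6], const uint8_t dst_mac[6],
			    unsigned int slave_count)
{
	assert(slave_count > 0);
	return (unsigned int)(src_mac[5] ^ dst_mac[5]) % slave_count;
}

/*
 * layer2+3:
 *   (((source IP XOR dest IP) AND 0xffff) XOR (source MAC XOR dest MAC))
 *       modulo slave count
 * src_ip and dst_ip are assumed to be in host byte order here.
 */
static unsigned int l23_hash(uint32_t src_ip, uint32_t dst_ip,
			     const uint8_t src_mac[6], const uint8_t dst_mac[6],
			     unsigned int slave_count)
{
	uint32_t l3, l2;

	assert(slave_count > 0);

	l3 = (src_ip ^ dst_ip) & 0xffff;
	l2 = (uint32_t)(src_mac[5] ^ dst_mac[5]);

	return (l3 ^ l2) % slave_count;
}
```

For example, with two slaves, a local address of 192.168.0.10 and a gateway MAC whose last byte differs from the local one by 0x03, the layer2 hash sends everything to the same slave, while the layer2+3 hash puts peers 192.168.0.20 and 192.168.0.21 on different slaves. The same peer always lands on the same slave, which is the property the commit message relies on.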
[PATCH 3/8] bonding: Purely cosmetic: rename a local variable
From: Wagner Ferenc [EMAIL PROTECTED]

Code for rendering multivalue sysfs files occurs three times in this
module.  Rename 'buffer' to 'buf' in the first, for the sake of
consistency.

Signed-off-by: Ferenc Wagner [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_sysfs.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 6bb91e2..5c31f5c 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -74,7 +74,7 @@ struct rw_semaphore bonding_rwsem;
  * show function for the bond_masters attribute.
  * The class parameter is ignored.
  */
-static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
+static ssize_t bonding_show_bonds(struct class *cls, char *buf)
 {
 	int res = 0;
 	struct bonding *bond;
@@ -86,13 +86,12 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buffer)
 			/* not enough space for another interface name */
 			if ((PAGE_SIZE - res) > 10)
 				res = PAGE_SIZE - 10;
-			res += sprintf(buffer + res, "++more++ ");
+			res += sprintf(buf + res, "++more++ ");
 			break;
 		}
-		res += sprintf(buffer + res, "%s ",
-			       bond->dev->name);
+		res += sprintf(buf + res, "%s ", bond->dev->name);
 	}
-	if (res) buffer[res-1] = '\n'; /* eat the leftover space */
+	if (res) buf[res-1] = '\n'; /* eat the leftover space */
 	up_read(&(bonding_rwsem));
 	return res;
 }
-- 
1.5.3.4.206.g58ba4-dirty
[PATCH 4/8] bonding: Coding style: break line after the if condition
From: Wagner Ferenc [EMAIL PROTECTED]

Adhere to coding style: break line after the if condition

Signed-off-by: Ferenc Wagner [EMAIL PROTECTED]
Acked-by: Jay Vosburgh [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_sysfs.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 5c31f5c..9de2c52 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -91,7 +91,8 @@ static ssize_t bonding_show_bonds(struct class *cls, char *buf)
 		}
 		res += sprintf(buf + res, "%s ", bond->dev->name);
 	}
-	if (res) buf[res-1] = '\n'; /* eat the leftover space */
+	if (res)
+		buf[res-1] = '\n'; /* eat the leftover space */
 	up_read(&(bonding_rwsem));
 	return res;
 }
@@ -239,7 +240,8 @@ static ssize_t bonding_show_slaves(struct device *d,
 		res += sprintf(buf + res, "%s ", slave->dev->name);
 	}
 	read_unlock(&bond->lock);
-	if (res) buf[res-1] = '\n'; /* eat the leftover space */
+	if (res)
+		buf[res-1] = '\n'; /* eat the leftover space */
 	return res;
 }
 
@@ -705,7 +707,8 @@ static ssize_t bonding_show_arp_targets(struct device *d,
 			res += sprintf(buf + res, "%u.%u.%u.%u ",
 			       NIPQUAD(bond->params.arp_targets[i]));
 	}
-	if (res)  buf[res-1] = '\n'; /* eat the leftover space */
+	if (res)
+		buf[res-1] = '\n'; /* eat the leftover space */
 	return res;
 }
 
-- 
1.5.3.4.206.g58ba4-dirty
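The three show functions touched here share one idiom: append each value followed by a space, then overwrite the final space with a newline. A minimal userspace sketch of that idiom follows, in case it helps reviewers compare the three copies. `show_names()` is a hypothetical helper, not something in bond_sysfs.c, and it uses a bounded `snprintf()` with an explicit size argument rather than the PAGE_SIZE check the driver performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the names into buf separated by single spaces
 * and terminated by a newline.  Returns the number of bytes written, not
 * counting the trailing NUL.  Names that do not fit are dropped.
 */
static int show_names(char *buf, size_t size, const char *const *names,
		      size_t count)
{
	int res = 0;
	size_t i;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	for (i = 0; i < count; i++) {
		size_t room = size - (size_t)res;
		int w = snprintf(buf + res, room, "%s ", names[i]);

		if (w < 0 || (size_t)w >= room) {
			/* not enough space: discard the partial write */
			buf[res] = '\0';
			break;
		}
		res += w;
	}

	if (res)
		buf[res - 1] = '\n';	/* eat the leftover space */

	return res;
}
```

Called with the names "bond0" and "bond1" and a 64-byte buffer, this produces the single line "bond0 bond1" followed by a newline; with an empty list it returns 0 and leaves the buffer empty, which is why the `if (res)` guard is needed before indexing `buf[res-1]`.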