Re: TSO trimming question
On Thu, 20 Dec 2007, David Miller wrote: From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET) [PATCH] [TCP]: Fix TSO deferring I'd say that most of what tcp_tso_should_defer had in between there was dead code because of this. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Yikes! John, we've been living a lie for more than a year. :-/ On the bright side this explains a lot of small TSO frames I've been seeing in traces over the past year but never got a chance to investigate. diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 8dafda9..693b9f6 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb) goto send_now; /* Defer for less than two clock ticks. */ - if (!tp-tso_deferred ((jiffies1)1) - (tp-tso_deferred1) 1) + if (tp-tso_deferred + ((jiffies 1) 1) - (tp-tso_deferred 1) 1) goto send_now; in_flight = tcp_packets_in_flight(tp); I meant to ask about this a while back but then got distracted by other things. But now since the subject has come up, I had a couple of more questions about this code. What's with all the shifting back and forth? Here with: ((jiffies1)1) - (tp-tso_deferred1) and later with: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = 1 | (jiffies1); Is this just done to try and avoid the special case of jiffies==0 when the jiffies wrap? If so it seems like a lot of unnecessary work just to avoid a 1 in 4 billion event, since it's my understanding that the whole tcp_tso_should_defer function is just an optimization and not a criticality to the proper functioning of TCP, especially considering it hasn't even been executing at all up to now. My second question is more basic and if I'm not mistaken actually relates to a remaining bug in the (corrected) test: /* Defer for less than two clock ticks. */ if (tp-tso_deferred ((jiffies 1) 1) - (tp-tso_deferred 1) 1) Since jiffies is an unsigned long, which is 64-bits on a 64-bit system, whereas tp-tso_deferred is a u32, once jiffies exceeds 31-bits, which will happen in about 25 days if HZ=1000, won't the second part of the test always be true after that? Or am I missing something obvious? -Bill -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
YOSHIFUJI Hideaki / 吉藤英明 a écrit : In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 08:06:32 +0100), Eric Dumazet [EMAIL PROTECTED] says: YOSHIFUJI Hideaki / 吉藤英明 a écrit : In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 07:03:58 +0100), Eric Dumazet [EMAIL PROTECTED] says: Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler to emit an integer divide, while we can help it to use a right shift, less expensive. Are you really sure? At least, gcc-4.1.2-20061115 (debian) does not make any difference. And, IMHO, because shift for signed variable is fragile, so we should avoid using it. Yes I am sure, but maybe you are on x86_64 ? gcc-4.2.2 on x86 I'm on gcc-4.1.2 20061115 (prerelease) (Debian 4.1.1-21), on x86 (i686). Maybe compiler difference?! hum, more probably a .config difference. Try with CONFIG_CC_OPTIMIZE_FOR_SIZE=Y (default config) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[INET] Avoid an integer divide in rt_garbage_collect()
Since 'goal' is a signed int, compiler may emit an integer divide to compute goal/2. Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/net/ipv4/route.c b/net/ipv4/route.c index e35076e..10915bb 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -851,14 +851,14 @@ static int rt_garbage_collect(void) equilibrium = ipv4_dst_ops.gc_thresh; goal = atomic_read(ipv4_dst_ops.entries) - equilibrium; if (goal 0) { - equilibrium += min_t(unsigned int, goal / 2, rt_hash_mask + 1); + equilibrium += min_t(unsigned int, goal 1, rt_hash_mask + 1); goal = atomic_read(ipv4_dst_ops.entries) - equilibrium; } } else { /* We are in dangerous area. Try to reduce cache really * aggressively. */ - goal = max_t(unsigned int, goal / 2, rt_hash_mask + 1); + goal = max_t(unsigned int, goal 1, rt_hash_mask + 1); equilibrium = atomic_read(ipv4_dst_ops.entries) - goal; }
Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
On 21-12-2007 03:24, Satoru SATOH wrote: 2007/12/21, Jarek Poplawski [EMAIL PROTECTED]: Jarek Poplawski wrote, On 12/20/2007 09:24 PM: ... but since it's your patch, I hope you do some additional checking if it's always like this... ...or maybe only changing this all a little bit will make it look safer! Jarek P. OK, how about this? Signed-off-by: Satoru SATOH [EMAIL PROTECTED] ip/iproute.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..c771b34 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned hz1 = hz; + if (hz1 1000) Looks OK (safe) to me: it's compatible both with old and new way. I'd only suggest to maybe change this to '(hz1 1024)', because it's the biggest HZ currently in the kernel, so this compatibility should be 100%. I think, you could leave 1 empty line before this 'if', as well. (Btw., aren't these overflows connected with CONFIG_HIGH_RES_TIMERS?) On the other hand this 'hz' still looks 'strange' here - I don't understand, why, a bit earlier it's: if (!hz) hz = get_hz(); while 'else' would use: hz == get_user_hz(); So, probably I miss something, but even after your patch, there could be different outputs here... Thanks, Jarek P. PS: did you CC Stephen Hemminger on this? + hz1 /= 1000; + else + val *= 1000; - val *= 1000; if (i == RTAX_RTT) val /= 8; else if (i == RTAX_RTTVAR) val /= 4; - if (val = hz) - fprintf(fp, %ums, val/hz); + if (val = hz1) + fprintf(fp, %ums, val/hz1); else - fprintf(fp, %.2fms, (float)val/hz); + fprintf(fp, %.2fms, (float)val/hz1); } } } Thanks, Satoru SATOH -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 11:24:54 +0900), Satoru SATOH [EMAIL PROTECTED] says: 2007/12/21, Jarek Poplawski [EMAIL PROTECTED]: Jarek Poplawski wrote, On 12/20/2007 09:24 PM: ... but since it's your patch, I hope you do some additional checking if it's always like this... ...or maybe only changing this all a little bit will make it look safer! Jarek P. OK, how about this? Signed-off-by: Satoru SATOH [EMAIL PROTECTED] ip/iproute.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..c771b34 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned hz1 = hz; + if (hz1 1000) Why don't you simply use unsigned long long (or maybe uint64_t) here? Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] --- diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..db9a3b6 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -509,16 +509,21 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) i != RTAX_RTO_MIN) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { - unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned div = 1; - val *= 1000; if (i == RTAX_RTT) - val /= 8; + div = 8; else if (i == RTAX_RTTVAR) - val /= 4; - if (val = hz) - fprintf(fp, %ums, val/hz); + div = 4; else + div = 1; + + val = val * 1000ULL / div; + + if (val = hz) { + fprintf(fp, %llums, val/hz); + } else fprintf(fp, %.2fms, (float)val/hz); } } --yoshfuji -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket
nl_fib_input re-reuses incoming skb to send the reply. This means that this packet will be freed twice, namely in: - netlink_unicast_kernel - on receive path Use clone to send as a cure, the caller is responsible for kfree_skb on error. Thanks to Alexey Dobryan, who originally found the problem. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index b03c4c6..ac6238a 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -832,10 +832,13 @@ static void nl_fib_input(struct sk_buff *skb) nlh = nlmsg_hdr(skb); if (skb-len NLMSG_SPACE(0) || skb-len nlh-nlmsg_len || - nlh-nlmsg_len NLMSG_LENGTH(sizeof(*frn))) { - kfree_skb(skb); + nlh-nlmsg_len NLMSG_LENGTH(sizeof(*frn))) return; - } + + skb = skb_clone(skb, GFP_KERNEL); + if (skb == NULL) + return; + nlh = nlmsg_hdr(skb); frn = (struct fib_result_nl *) NLMSG_DATA(nlh); tb = fib_get_table(frn-tb_id_in); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, Bill Fink wrote: I meant to ask about this a while back but then got distracted by other things. But now since the subject has come up, I had a couple of more questions about this code. What's with all the shifting back and forth? Here with: ((jiffies1)1) - (tp-tso_deferred1) and later with: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = 1 | (jiffies1); Is this just done to try and avoid the special case of jiffies==0 when the jiffies wrap? I thought that it must be the reason. I couldn't figure out any other explination while thinking the same thing but since I saw no problem in that rather weird approach, I didn't want to touch that in a patch which had net-2.6 (or stable) potential. If so it seems like a lot of unnecessary work just to avoid a 1 in 4 billion event, since it's my understanding that the whole tcp_tso_should_defer function is just an optimization and not a criticality to the proper functioning of TCP, especially considering it hasn't even been executing at all up to now. It would still be good to not return 1 in that case we didn't flag the deferral, how about adding one additional tick (+comment) in the zero case making tso_deferred == 0 again unambiguously defined, e.g.: tp-tso_deferred = min_t(u32, jiffies, 1); ...I'm relatively sure that nobody would ever notice that tick :-) and we kept return value consistent with tso_deferred state invariant. My second question is more basic and if I'm not mistaken actually relates to a remaining bug in the (corrected) test: /* Defer for less than two clock ticks. */ if (tp-tso_deferred ((jiffies 1) 1) - (tp-tso_deferred 1) 1) Since jiffies is an unsigned long, which is 64-bits on a 64-bit system, whereas tp-tso_deferred is a u32, once jiffies exceeds 31-bits, which will happen in about 25 days if HZ=1000, won't the second part of the test always be true after that? Or am I missing something obvious? I didn't notice that earlier but I think you're right though my knowledge about jiffies and such is quite limited. ...Feel free to submit a patch to correct these. -- i. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] PS3: gelic: Add wireless support for PS3
Hi David, On Fri, 14 Dec 2007 01:48:52 -0500 David Woodhouse [EMAIL PROTECTED] wrote: --- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c~ 2007-12-14 01:31:50.0 -0500 +++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c2007-12-14 01:39:25.0 -0500 @@ -1139,7 +1139,7 @@ static irqreturn_t gelic_card_interrupt( * * see Documentation/networking/netconsole.txt */ -static void gelic_net_poll_controller(struct net_device *netdev) +void gelic_net_poll_controller(struct net_device *netdev) { struct gelic_card *card = netdev_card(netdev); --- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h~ 2007-12-14 01:31:50.0 -0500 +++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h2007-12-14 01:39:47.0 -0500 @@ -346,6 +346,7 @@ extern void gelic_net_tx_timeout(struct extern int gelic_net_change_mtu(struct net_device *netdev, int new_mtu); extern int gelic_net_setup_netdev(struct net_device *netdev, struct gelic_card *card); +extern void gelic_net_poll_controller(struct net_device *netdev); /* shared ethtool ops */ extern void gelic_net_get_drvinfo(struct net_device *netdev, Thank you for your reviewing. I'll fix it. -- Masakazu MOKUNO -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
From: Bill Fink [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 03:06:48 -0500 What's with all the shifting back and forth? Here with: ((jiffies1)1) - (tp-tso_deferred1) and later with: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = 1 | (jiffies1); Is this just done to try and avoid the special case of jiffies==0 when the jiffies wrap? If so it seems like a lot of unnecessary work just to avoid a 1 in 4 billion event, since it's my understanding that the whole tcp_tso_should_defer function is just an optimization and not a criticality to the proper functioning of TCP, especially considering it hasn't even been executing at all up to now. How else would you avoid the incorrect result when jiffies is indeed zero? It's two shifts, and this gets scheduled along with the other instructions on many cpus so it's effectively free. I don't see why this is even worth mentioning and discussing. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, Ilpo Järvinen wrote: On Fri, 21 Dec 2007, Bill Fink wrote: If so it seems like a lot of unnecessary work just to avoid a 1 in 4 billion event, since it's my understanding that the whole tcp_tso_should_defer function is just an optimization and not a criticality to the proper functioning of TCP, especially considering it hasn't even been executing at all up to now. It would still be good to not return 1 in that case we didn't flag the deferral, how about adding one additional tick (+comment) in the zero case making tso_deferred == 0 again unambiguously defined, e.g.: tp-tso_deferred = min_t(u32, jiffies, 1); Blah, max_t of course :-). -- i.
Re: TSO trimming question
On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote: It's two shifts, and this gets scheduled along with the other instructions on many cpus so it's effectively free. I don't see why this is even worth mentioning and discussing. I totally agree. Two shifts are way better than a branch. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket
From: Denis V. Lunev [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 12:00:43 +0300 nl_fib_input re-reuses incoming skb to send the reply. This means that this packet will be freed twice, namely in: - netlink_unicast_kernel - on receive path Use clone to send as a cure, the caller is responsible for kfree_skb on error. Thanks to Alexey Dobryan, who originally found the problem. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] What introduced this bug? This code didn't have this problem previously. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket
David Miller wrote: From: Denis V. Lunev [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 12:00:43 +0300 nl_fib_input re-reuses incoming skb to send the reply. This means that this packet will be freed twice, namely in: - netlink_unicast_kernel - on receive path Use clone to send as a cure, the caller is responsible for kfree_skb on error. Thanks to Alexey Dobryan, who originally found the problem. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] What introduced this bug? This code didn't have this problem previously. commit cd40b7d3983c708aabe3d3008ec64ffce56d33b0 Author: Denis V. Lunev [EMAIL PROTECTED] Date: Wed Oct 10 21:15:29 2007 -0700 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 08:39:24 +0100 I didnt chose this path, because David was against changing some fields from 'int' to 'unsigned'. If you look in other parts of networking, we have many 1 or 2 already there. I don't remember making this statement, where did I say this? I'm genuinely curious :-) But I am indeed against it in cases where the variable is compared against signed quantities. I think the shifts are more pretty and more closely match what the programmer wants to come out of the compiler. Getting all of these divides is an awful surprise for me. I've learned over the years to never explicitly code divides or multiplies by powers of two and to always use shifts. As a result I am never surprised. In fact I've been burnt every time I mistakedly didn't use a shift. Nevertheless, this tcplen arg is always assigned to and used with unsigned quantities so I'll apply Yoshifuji's version of the fix. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP]: Convert several length variable to unsigned.
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 16:48:11 +0900 (JST) Several length variables cannot be negative, so convert int to unsigned int. This also allows us to do sane shift operations on those variables. Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] Applied to net-2.6.25, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket
From: Denis V. Lunev [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 12:39:36 +0300 David Miller wrote: What introduced this bug? This code didn't have this problem previously. commit cd40b7d3983c708aabe3d3008ec64ffce56d33b0 Author: Denis V. Lunev [EMAIL PROTECTED] Date: Wed Oct 10 21:15:29 2007 -0700 Thank you, I'm going to add this reference to the changeset comments so that it will help others who look at this fix. Consider your patch applied, thanks! :-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 07:18:40 +0100 Because sk_wmem_queued, sk_sndbuf are signed, a divide per two forces compiler to use an integer divide. We can instead use a right shift. SK_STREAM_MEM_QUANTUM deserves to be declared as an unsigned quantity, so that sk_stream_pages() and __sk_stream_mem_reclaim() can use right shifts instead of integer divides. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Everything that works with SK_STREAM_MEM_QUANTUM is an int. We do a DIV_ROUND_UP() using SK_STREAM_MEM_QUANTUM and an int. We do sk-sk_forward_alloc modifications using multiplies on SK_STREAM_MEM_QUANTUM and an int, sk_forward_alloc is an int too. Changing the type of SK_STREAM_MEM_QUANTUM does nothing more than create an exception in the typing used in these operations for no real gain, in fact it makes this code's correctness harder to verify. I'm fine with the rest of your change, so please resubmit this patch with the type change removed. Thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Fwd: Increasing initial cwnd size
-- Forwarded message -- From: Sourav Chakraborty [EMAIL PROTECTED] Date: Dec 21, 2007 1:28 PM Subject: Increasing initial cwnd size To: [EMAIL PROTECTED] Hello, We are trying to test with kernel version 2.6.20,the RFC 3390 implementation of increased initial congestion window size(~4 when MSS=1460 bytes).We are not getting consistent results meaning,sometimes we see 4 data pkts sent without an ACK received and sometimes we see 3.Also we would like to test by setting the initial cwnd to a value more than 4.Can you guys advise us on how to do that(where in the code should we change and what)?Also mention if using linux 2.6.23 kernel instead of the one we are using,would help in this work. Thanks and Regards Sourav -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [INET] Avoid an integer divide in rt_garbage_collect()
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 09:20:50 +0100 Since 'goal' is a signed int, compiler may emit an integer divide to compute goal/2. Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
From: Herbert Xu [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 17:29:27 +0800 On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote: It's two shifts, and this gets scheduled along with the other instructions on many cpus so it's effectively free. I don't see why this is even worth mentioning and discussing. I totally agree. Two shifts are way better than a branch. We take probably a thousand+ 100+ cycle cache misses in the TCP stack on big window openning ACKs. Instead of discussing ways to solve that huge performance killer we're wanking about two friggin' integer shifts. It's hilarious isn't it? :-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing
From: Paul Moore [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 16:42:25 -0500 This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events: * Could not find a valid SA [sections 2.1, 3.4.2] . xfrm_audit_state_notfound() . xfrm_audit_state_notfound_simple() * Sequence number overflow [section 3.3.3] . xfrm_audit_state_replay_overflow() * Replayed packet [section 3.4.3] . xfrm_audit_state_replay() * Integrity check failure [sections 3.4.4.1, 3.4.4.2] . xfrm_audit_state_icvfail() While RFC4304 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency. Signed-off-by: Paul Moore [EMAIL PROTECTED] This doesn't apply at all to net-2.6.25, in particular xfrm6_input_addr() doesn't even have a local variable named xfrm_vec_one let alone the conditional where you're adding the state notfound audit hook. Please respin this and the third patch, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] tcp_write_timeout.c cleanup
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 06:56:17 +0100 Before submiting a patch to change a divide to a right shift, I felt necessary to create a helper function tcp_mtu_probing() to reduce length of lines exceeding 100 chars in tcp_write_timeout(). Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Thanks for not mixing code cleanups with functional changes, it makes my life so much easier and make me so much want to review and apply your patches. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[DCCP] [Patch 1/1] [Bug-Fix]: Need to ignore Ack Vectors on request sockets
[ACKVEC]: Need to ignore Ack Vectors on request sockets This fixes an oversight (mine) from an earlier patch and is in principle a bug-fix, although the bug will with the current code not become visible. The issue is that Ack Vectors must not be parsed on request sockets, since the Ack Vector feature depends on the selection of the (TX) CCID. During the initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies: Using CCID-specific options and feature options during a negotiation for the corresponding CCID feature is NOT RECOMMENDED [...] Worse, it is not even possible: when the server receives the Request from the client, the CCID and Ack vector features are undefined; when the Ack finalising the 3-way hanshake arrives, the request socket has not been cloned yet into a new (full) socket. This order is necessary, since otherwise the newly created socket would have to be destroyed whenever an option error occurred (a malicious hacker could simply send garbage options and exploit this). As a result, it is not feasible to parse Ack Vectors on request sockets, since their destination (the CCIDs interested in using Ack Vector information) is undefined. Hence disabled by this patch. Two more changes suggested itself: * replaced magic numbers for CCID-specific options with symbolic constants; * replaced local variables `idx' with computation in argument list. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] --- include/linux/dccp.h |6 -- net/dccp/options.c | 19 +++ 2 files changed, 11 insertions(+), 14 deletions(-) --- a/net/dccp/options.c +++ b/net/dccp/options.c @@ -104,9 +104,10 @@ int dccp_parse_options(struct sock *sk, * * CCID-specific options are ignored during connection setup, as * negotiation may still be in progress (see RFC 4340, 10.3). -* +* The same applies to Ack Vectors, as these depend on the CCID. */ - if (dreq != NULL opt = 128) + if (dreq != NULL (opt = DCCPO_MIN_RX_CCID_SPECIFIC || + opt == DCCPO_ACK_VECTOR_0 || opt == DCCPO_ACK_VECTOR_1)) goto ignore_option; switch (opt) { @@ -222,23 +223,17 @@ int dccp_parse_options(struct sock *sk, dccp_pr_debug(%s rx opt: ELAPSED_TIME=%d\n, dccp_role(sk), elapsed_time); break; - case 128 ... 191: { - const u16 idx = value - options; - + case DCCPO_MIN_RX_CCID_SPECIFIC ... DCCPO_MAX_RX_CCID_SPECIFIC: if (ccid_hc_rx_parse_options(dp-dccps_hc_rx_ccid, sk, -opt, len, idx, +opt, len, value - options, value) != 0) goto out_invalid_option; - } break; - case 192 ... 255: { - const u16 idx = value - options; - + case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC: if (ccid_hc_tx_parse_options(dp-dccps_hc_tx_ccid, sk, -opt, len, idx, +opt, len, value - options, value) != 0) goto out_invalid_option; - } break; default: DCCP_CRIT(DCCP(%p): option %d(len=%d) not --- a/include/linux/dccp.h +++ b/include/linux/dccp.h @@ -165,8 +165,10 @@ enum { DCCPO_TIMESTAMP_ECHO = 42, DCCPO_ELAPSED_TIME = 43, DCCPO_MAX = 45, - DCCPO_MIN_CCID_SPECIFIC = 128, - DCCPO_MAX_CCID_SPECIFIC = 255, + DCCPO_MIN_RX_CCID_SPECIFIC = 128, /* from sender to receiver */ + DCCPO_MAX_RX_CCID_SPECIFIC = 191, + DCCPO_MIN_TX_CCID_SPECIFIC = 192, /* from receiver to sender */ + DCCPO_MAX_TX_CCID_SPECIFIC = 255, }; /* DCCP CCIDS */ -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, David Miller wrote: From: Herbert Xu [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 17:29:27 +0800 On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote: It's two shifts, and this gets scheduled along with the other instructions on many cpus so it's effectively free. I don't see why this is even worth mentioning and discussing. I totally agree. Two shifts are way better than a branch. We take probably a thousand+ 100+ cycle cache misses in the TCP stack on big window openning ACKs. Instead of discussing ways to solve that huge performance killer we're wanking about two friggin' integer shifts. It's hilarious isn't it? :-) I don't think obfuscated code is hilarious. Instead of the convoluted and dense code: /* Defer for less than two clock ticks. */ if (tp-tso_deferred ((jiffies 1) 1) - (tp-tso_deferred 1) 1) You can have the much simpler and more easily understandable: /* Defer for less than two clock ticks. */ if (tp-tso_deferred (jiffies - tp-tso_deferred) 1) And instead of: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = 1 | (jiffies1); return 1; You could do as Ilpo suggested: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = max_t(u32, jiffies, 1); return 1; Or perhaps more efficiently: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; if (unlikely(jiffies == 0)) tp-tso_deferred = 1; return 1; Or perhaps even: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; /* need to return a non-zero value to defer, which means won't * defer if jiffies == 0 but it's only a 1 in 4 billion event * (and avoids a compare/branch by not checking jiffies) / return jiffies; Since it really only needs a non-zero return value to defer. See, no branches needed and much clearer code. That seems worthwhile to me from a code maintenance standpoint, even if it isn't any speed improvement. And what about the 64-bit jiffies versus 32-bit tp-tso_deferred issue? Should tso_deferred be made unsigned long to match jiffies? -Bill -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Evgeniy Polyakov
From: Evgeniy Polyakov [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 13:34:12 +0300 Yep, I saw him couple of times and will try to contact. :-) Do not unsubscribe me :) Ok! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Evgeniy Polyakov
On Thu, Dec 20, 2007 at 08:34:59PM -0800, David Miller ([EMAIL PROTECTED]) wrote: If someone has a way other than email to contact Evgeniy, could you please let him know that his email is bouncing in strange ways. Yep, I saw him couple of times and will try to contact. I'll have to unsubscribe him if this goes on much longer, which I don't want to do. Thanks. Here is some example bounce text: 451 4.0.0 readqf: cannot open ./dflBL48UH3032179: No such file or directory 552 5.3.4 Message is too large; 1500 bytes max 554 5.0.0 Service unavailable This looks really strange for me - I will forward it to system admin of the university server where I have a mail. Likely is is because of some troubles with the mail queue or FS... Do not unsubscribe me :) -- Evgeniy Polyakov -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h
On Fri, 21 Dec 2007 01:55:43 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 07:18:40 +0100 Because sk_wmem_queued, sk_sndbuf are signed, a divide per two forces compiler to use an integer divide. We can instead use a right shift. SK_STREAM_MEM_QUANTUM deserves to be declared as an unsigned quantity, so that sk_stream_pages() and __sk_stream_mem_reclaim() can use right shifts instead of integer divides. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Everything that works with SK_STREAM_MEM_QUANTUM is an int. We do a DIV_ROUND_UP() using SK_STREAM_MEM_QUANTUM and an int. We do sk-sk_forward_alloc modifications using multiplies on SK_STREAM_MEM_QUANTUM and an int, sk_forward_alloc is an int too. Changing the type of SK_STREAM_MEM_QUANTUM does nothing more than create an exception in the typing used in these operations for no real gain, in fact it makes this code's correctness harder to verify. I'm fine with the rest of your change, so please resubmit this patch with the type change removed. I cannot remove underlying divide without telling compiler *something* is unsigned, or adding a new _SHIFT macro We currently do int_value / int_constant I was suggesting int_value / uint_constant Since you prefer to keep sk_forward_alloc as signed, the only clean (without casts) way is to do : int_value SK_STREAM_MEM_QUANTUM_SHIFT Please tell me if you are OK with this solution, or if you prefer I change sk_forward_alloc to be unsigned :) Thank you Here is the patch handling the change on sk_wmem_queued, sk_sndbuf. Keeping small patches may help future bisection anyway... [SOCK] Avoid integer divides where not necessary in include/net/sock.h Because sk_wmem_queued, sk_sndbuf are signed, a divide per two may force compiler to use an integer divide. We can instead use a right shift. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/include/net/sock.h b/include/net/sock.h index 803d8f2..4456453 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -445,7 +445,7 @@ static inline int sk_acceptq_is_full(struct sock *sk) */ static inline int sk_stream_min_wspace(struct sock *sk) { - return sk-sk_wmem_queued / 2; + return sk-sk_wmem_queued 1; } static inline int sk_stream_wspace(struct sock *sk) @@ -1187,7 +1187,7 @@ static inline void sk_wake_async(struct sock *sk, int how, int band) static inline void sk_stream_moderate_sndbuf(struct sock *sk) { if (!(sk-sk_userlocks SOCK_SNDBUF_LOCK)) { - sk-sk_sndbuf = min(sk-sk_sndbuf, sk-sk_wmem_queued / 2); + sk-sk_sndbuf = min(sk-sk_sndbuf, sk-sk_wmem_queued 1); sk-sk_sndbuf = max(sk-sk_sndbuf, SOCK_MIN_SNDBUF); } } @@ -1211,7 +1211,7 @@ static inline struct page *sk_stream_alloc_page(struct sock *sk) */ static inline int sock_writeable(const struct sock *sk) { - return atomic_read(sk-sk_wmem_alloc) (sk-sk_sndbuf / 2); + return atomic_read(sk-sk_wmem_alloc) (sk-sk_sndbuf 1); } static inline gfp_t gfp_any(void) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
On Fri, 21 Dec 2007, YOSHIFUJI Hideaki wrote: In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 11:24:54 +0900), Satoru SATOH [EMAIL PROTECTED] says: 2007/12/21, Jarek Poplawski [EMAIL PROTECTED]: Jarek Poplawski wrote, On 12/20/2007 09:24 PM: ... but since it's your patch, I hope you do some additional checking if it's always like this... ...or maybe only changing this all a little bit will make it look safer! Jarek P. OK, how about this? Signed-off-by: Satoru SATOH [EMAIL PROTECTED] ip/iproute.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..c771b34 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned hz1 = hz; + if (hz1 1000) Why don't you simply use unsigned long long (or maybe uint64_t) here? I was wondering that too. And maybe change the (float) cast to (double) in the fprintf. -Bill Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] --- diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..db9a3b6 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -509,16 +509,21 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) i != RTAX_RTO_MIN) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { - unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned div = 1; - val *= 1000; if (i == RTAX_RTT) - val /= 8; + div = 8; else if (i == RTAX_RTTVAR) - val /= 4; - if (val = hz) - fprintf(fp, %ums, val/hz); + div = 4; else + div = 1; + + val = val * 1000ULL / div; + + if (val = hz) { + fprintf(fp, %llums, val/hz); + } else fprintf(fp, %.2fms, (float)val/hz); } } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Wed, 19 Dec 2007 15:30:03 +0200 (EET) On Wed, 19 Dec 2007, Gavin McCullagh wrote: Will do. I gather I should use the latest net- tree in future when submitting patches. Doh, I owe you apology as I was probably too hasty to point you towards net-2.6.25. I suppose this could by considered as fix as well and therefore could probably be accepted to net-2.6 as well, which is for bugfixes only after merge window is closed. But it's Dave how will make such decisions, not me :-), and it's he who gets to deal with all the resulting conflicts ;-) (I added Cc to him). When Gavin respins the patch I'll look at in the context of submitting it as a bug fix. So Gavin please generate the patch against Linus's vanilla GIT tree or net-2.6, your choise. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 11:40:51 +0100 On Fri, 21 Dec 2007 01:55:43 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: Please tell me if you are OK with this solution, or if you prefer I change sk_forward_alloc to be unsigned :) When I was playing with this crap a long time ago I think I remember that sk-sk_forward_alloc can become negative in some circumstances. Or maybe that was just a bug :-) Here is the patch handling the change on sk_wmem_queued, sk_sndbuf. Keeping small patches may help future bisection anyway... [SOCK] Avoid integer divides where not necessary in include/net/sock.h Because sk_wmem_queued, sk_sndbuf are signed, a divide per two may force compiler to use an integer divide. We can instead use a right shift. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] I'll apply this, thanks Eric. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
From: Jeff Garzik [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 06:26:48 -0500 YOSHIFUJI Hideaki / 吉藤英明 wrote: In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 07:03:58 +0100), Eric Dumazet [EMAIL PROTECTED] says: Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler to emit an integer divide, while we can help it to use a right shift, less expensive. Are you really sure? At least, gcc-4.1.2-20061115 (debian) does not make any difference. Quite true -- thus it is a matter of taste to the programmer. Not true, the code output does change, check your optimize-for-size kernel config setting. This was discussed and explained later in this thread, and I also explained it to you on IRC Jeff ;-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/10] sysfs network namespace support
Greg KH [EMAIL PROTECTED] writes: On Sat, Dec 01, 2007 at 02:06:58AM -0700, Eric W. Biederman wrote: Now that we have network namespace support merged it is time to revisit the sysfs support so we can remove the dependency on !SYSFS. snip Oops, I forgot to apply this to my tree. Eric, you still want this submitted, right? Yes. I'm am just about to head out of town to visit my parents over Christmas. So I'm not going to be very responsive until I after the New Year. Eric -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing
On Friday 21 December 2007 4:43:10 am David Miller wrote: From: Paul Moore [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 16:42:25 -0500 This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events: * Could not find a valid SA [sections 2.1, 3.4.2] . xfrm_audit_state_notfound() . xfrm_audit_state_notfound_simple() * Sequence number overflow [section 3.3.3] . xfrm_audit_state_replay_overflow() * Replayed packet [section 3.4.3] . xfrm_audit_state_replay() * Integrity check failure [sections 3.4.4.1, 3.4.4.2] . xfrm_audit_state_icvfail() While RFC4304 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency. Signed-off-by: Paul Moore [EMAIL PROTECTED] This doesn't apply at all to net-2.6.25, in particular xfrm6_input_addr() doesn't even have a local variable named xfrm_vec_one let alone the conditional where you're adding the state notfound audit hook. Please respin this and the third patch, thanks. Sorry about that, I must have missed something (or probably just updated the wrong tree on accident). I'll respin the patches and send them out today. -- paul moore linux security @ hp -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] tcp_write_timeout.c cleanup
Em Fri, Dec 21, 2007 at 04:30:26AM -0800, David Miller escreveu: From: Arnaldo Carvalho de Melo [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 10:23:16 -0200 Em Fri, Dec 21, 2007 at 06:56:17AM +0100, Eric Dumazet escreveu: +static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) +{ + int mss; - int mss; + + /* Black hole detection */ + if (sysctl_tcp_mtu_probing) { + if (!icsk-icsk_mtup.enabled) { + icsk-icsk_mtup.enabled = 1; + tcp_sync_mss(sk, icsk-icsk_pmtu_cookie); + } else { + struct tcp_sock *tp = tcp_sk(sk); + int mss; + mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2; I've checked in this, ummm... patch :-) Thanks! - Arnaldo -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h
David Miller [EMAIL PROTECTED] wrote: When I was playing with this crap a long time ago I think I remember that sk-sk_forward_alloc can become negative in some circumstances. Or maybe that was just a bug :-) Yeah we had a few bugs there in the early days of TSO but it's been quiet for the last 18 months. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] Avoid a divide in tcp_mtu_probing()
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 14:32:41 +0100 tcp_mtu_to_mss() being signed, compiler might emit an integer divide to compute tcp_mtu_to_mss()/2 . Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Applied, thanks Eric. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Top 10 kernel oopses/warnings for the week of December 21st 2007
Arjan van de Ven [EMAIL PROTECTED] writes: Rank 8: __change_page_attr BUG at arch/x86/mm/pageattr_64.c:176 Reported 2 times Reported this week for 2.6.24-rc5; history goes back to 2.6.15 There is no BUG on this line on 2.6.24-rc5. Since there are many BUG_ONs in this file it is unclear which you mean. Could you always include the version of the kernel where the actual oops in the line came from? Anyways there are a lot of third party modules who do strange things with c_p_a(), not always legal, so you might look up out for that pattern too. Perhaps report the out of tree modules loaded in the summary too? -Andi -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: After many hours all outbound connections get stuck in SYN_SENT
Huh? Do you mean a PIX blade in a Cisco switch-router chassis? It would be very useful if you could be less vague about the equipment in use. Right it's a PIX blade in Cisco chassis. The PIX is running ASA version 7.0(6) That depends more on your customers' networking attributes then you are sharing or perhaps even know. Perhaps your customer base is very Window-skewed and you simply aren't seeing any Sack Permitted negotiations for the first 37.999 hours. Or perhaps you've had a network glitch, and all of your connections have done a Selective Ack, which the firewall has trashed, leaving all the connections in a wacko state, not just a few which you haven't noticed. I definitely see SACKs over the course of the 38 hours. I don't have any Windows hosts, they are all running Linux except for a very small number that run a proprietary OS and webserver. If the firewall were to get trashed and hose the currently active connections, I would expect that newly initiated connections would work. The actual failure mode needs a packet trace to determine, but you should be able to do this yourself (or ask your local network engineering staff). If your firewall is trashing the Sack field, then it needs to be fixed. Time to raise a case with the Cisco TAC and ask them directly if your PIX version has bug CSCse14419. You can't expect Sack to work when it's being fed trash, so it is important to make sure that is not happening. I've pinged our dude that handles the PIX stuff to see about getting an upgrade to 7.0(7). I should be able to get a packet trace, but it may take some time. At this point I'm getting a lot of resistance and people here telling me to just turn SACK off and not worry about what is causing this issue, but I'd really like to get to the bottom of it. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing
On Friday 21 December 2007 8:27:23 am Paul Moore wrote: On Friday 21 December 2007 4:43:10 am David Miller wrote: From: Paul Moore [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 16:42:25 -0500 This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events: * Could not find a valid SA [sections 2.1, 3.4.2] . xfrm_audit_state_notfound() . xfrm_audit_state_notfound_simple() * Sequence number overflow [section 3.3.3] . xfrm_audit_state_replay_overflow() * Replayed packet [section 3.4.3] . xfrm_audit_state_replay() * Integrity check failure [sections 3.4.4.1, 3.4.4.2] . xfrm_audit_state_icvfail() While RFC4304 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency. Signed-off-by: Paul Moore [EMAIL PROTECTED] This doesn't apply at all to net-2.6.25, in particular xfrm6_input_addr() doesn't even have a local variable named xfrm_vec_one let alone the conditional where you're adding the state notfound audit hook. Please respin this and the third patch, thanks. Sorry about that, I must have missed something (or probably just updated the wrong tree on accident). I'll respin the patches and send them out today. Ah, looks like I may not be crazy after all! It looks like the XFRM patches from Masahide NAKAMURA were pulled into net-2.6.25 just before mine last night which caused my patches to conflict ... -- paul moore linux security @ hp -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
I agree. I mistakenly thought hz in that context must be larger than 1000.. As it's uncertain, your's looks much simpler and better. (btw, the lines else div = 1 is not needed, is it?) Thanks, Satoru SATOH 2007/12/21, YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]: (snip) Why don't you simply use unsigned long long (or maybe uint64_t) here? Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] --- diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..db9a3b6 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -509,16 +509,21 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) i != RTAX_RTO_MIN) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { - unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned div = 1; - val *= 1000; if (i == RTAX_RTT) - val /= 8; + div = 8; else if (i == RTAX_RTTVAR) - val /= 4; - if (val = hz) - fprintf(fp, %ums, val/hz); + div = 4; else + div = 1; + + val = val * 1000ULL / div; + + if (val = hz) { + fprintf(fp, %llums, val/hz); + } else fprintf(fp, %.2fms, (float)val/hz); } } --yoshfuji -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[TCP] Avoid a divide in tcp_mtu_probing()
tcp_mtu_to_mss() being signed, compiler might emit an integer divide to compute tcp_mtu_to_mss()/2 . Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index ea111e9..ea85bc0 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -125,7 +125,7 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); int mss; - mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2; + mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low) 1; mss = min(sysctl_tcp_base_mss, mss); mss = max(mss, 68 - tp-tcp_header_len); icsk-icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] tcp_write_timeout.c cleanup
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 10:23:16 -0200 Em Fri, Dec 21, 2007 at 06:56:17AM +0100, Eric Dumazet escreveu: +static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) +{ + int mss; - int mss; + + /* Black hole detection */ + if (sysctl_tcp_mtu_probing) { + if (!icsk-icsk_mtup.enabled) { + icsk-icsk_mtup.enabled = 1; + tcp_sync_mss(sk, icsk-icsk_pmtu_cookie); + } else { + struct tcp_sock *tp = tcp_sk(sk); + int mss; + mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2; I've checked in this, ummm... patch :-) commit 323f3f2f31840f94e540ec5a0ce33593d05dd8d9 Author: David S. Miller [EMAIL PROTECTED] Date: Fri Dec 21 04:29:16 2007 -0800 [TCP]: Move mss variable in tcp_mtu_probing() Down into the only scope where it is used. Signed-off-by: David S. Miller [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 8f14808..ea111e9 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -116,8 +116,6 @@ static int tcp_orphan_retries(struct sock *sk, int alive) static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) { - int mss; - /* Black hole detection */ if (sysctl_tcp_mtu_probing) { if (!icsk-icsk_mtup.enabled) { @@ -125,6 +123,8 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) tcp_sync_mss(sk, icsk-icsk_pmtu_cookie); } else { struct tcp_sock *tp = tcp_sk(sk); + int mss; + mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2; mss = min(sysctl_tcp_base_mss, mss); mss = max(mss, 68 - tp-tcp_header_len); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] tcp_write_timeout.c cleanup
Em Fri, Dec 21, 2007 at 06:56:17AM +0100, Eric Dumazet escreveu: Before submiting a patch to change a divide to a right shift, I felt necessary to create a helper function tcp_mtu_probing() to reduce length of lines exceeding 100 chars in tcp_write_timeout(). Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index d8970ec..8f14808 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -114,13 +114,31 @@ static int tcp_orphan_retries(struct sock *sk, int alive) return retries; } +static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk) +{ + int mss; - int mss; + + /* Black hole detection */ + if (sysctl_tcp_mtu_probing) { + if (!icsk-icsk_mtup.enabled) { + icsk-icsk_mtup.enabled = 1; + tcp_sync_mss(sk, icsk-icsk_pmtu_cookie); + } else { + struct tcp_sock *tp = tcp_sk(sk); + int mss; + mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2; + mss = min(sysctl_tcp_base_mss, mss); + mss = max(mss, 68 - tp-tcp_header_len); + icsk-icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss); + tcp_sync_mss(sk, icsk-icsk_pmtu_cookie); + } + } +} :-) - Arnaldo -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()
David Miller wrote: From: Jeff Garzik [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 06:26:48 -0500 YOSHIFUJI Hideaki / 吉藤英明 wrote: In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 07:03:58 +0100), Eric Dumazet [EMAIL PROTECTED] says: Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler to emit an integer divide, while we can help it to use a right shift, less expensive. Are you really sure? At least, gcc-4.1.2-20061115 (debian) does not make any difference. Quite true -- thus it is a matter of taste to the programmer. Not true, the code output does change, check your optimize-for-size kernel config setting. This was discussed and explained later in this thread, and I also explained it to you on IRC Jeff ;-) [looks] For signed integers, this is true. For unsigned integers, the code output is the same, regardless of optimization setting. Jeff -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [TCP]: Remove seq_rtt ptr from clean_rtx_queue args
While checking Gavin's patch I noticed that the returned seq_rtt is not used by the caller. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- I think it shouldn't introduce new conflicts between net-2.6.25 and net-2.6 (tested with git-am -3 to both). net/ipv4/tcp_input.c |7 ++- 1 files changed, 2 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 0f0c1c9..a020e83 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2638,8 +2638,7 @@ static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb) * is before the ack sequence we can discard it as it's confirmed to have * arrived at the other end. */ -static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, - int prior_fackets) +static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets) { struct tcp_sock *tp = tcp_sk(sk); const struct inet_connection_sock *icsk = inet_csk(sk); @@ -2803,7 +2802,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, } } #endif - *seq_rtt_p = seq_rtt; return flag; } @@ -3044,7 +3042,6 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag) u32 ack = TCP_SKB_CB(skb)-ack_seq; u32 prior_in_flight; u32 prior_fackets; - s32 seq_rtt; int prior_packets; int frto_cwnd = 0; @@ -3111,7 +3108,7 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag) prior_in_flight = tcp_packets_in_flight(tp); /* See if we can take anything off of the retransmit queue. */ - flag |= tcp_clean_rtx_queue(sk, seq_rtt, prior_fackets); + flag |= tcp_clean_rtx_queue(sk, prior_fackets); if (tp-frto_counter) frto_cwnd = tcp_process_frto(sk, flag); -- 1.5.0.6
Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing
On Friday 21 December 2007 9:02:41 am David Miller wrote: From: Paul Moore [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 08:51:22 -0500 Ah, looks like I may not be crazy after all! It looks like the XFRM patches from Masahide NAKAMURA were pulled into net-2.6.25 just before mine last night which caused my patches to conflict ... Sorry. I had double checked and was going to rework your patches by hand if that was the case, but for some reason I didn't see his patches as being the culprit causing the rejects when I checked things out. I must have looked over the history correctly. No harm, no foul. I've just about finished respining the patches, expect them shortly ... -- paul moore linux security @ hp -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch net-2.6.25 0/2][NETNS] prepare the neigh subsys to handle network namespaces v2
These two patches are coming from Eric Biederman patchset to make the neighbouring aware of the namespaces. The description in the patch headers is big enough to add more comments here :) Changelog: v2 : fixed bad coding style v1 : initial post -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch net-2.6.25 1/2][NETNS] net: Modify the neighbour table code so it handles multiple network namespaces
I'm actually surprised at how much was involved. At first glance it appears that the neighbour table data structures are already split by network device so all that should be needed is to modify the user interface commands to filter the set of neighbours by the network namespace of their devices. However a couple things turned up while I was reading through the code. The proxy neighbour table allows entries with no network device, and the neighbour parms are per network device (except for the defaults) so they now need a per network namespace default. So I updated the two structures (which surprised me) with their very own network namespace parameter. Updated the relevant lookup and destroy routines with a network namespace parameter and modified the code that interacts with users to filter out neighbour table entries for devices of other namespaces. I'm a little concerned that we can modify and display the global table configuration and from all network namespaces. But this appears good enough for now. I keep thinking modifying the neighbour table to have per network namespace instances of each table type would should be cleaner. The hash table is already dynamically sized so there are it is not a limiter. The default parameter would be straight forward to take care of. However when I look at the how the network table is built and used I still find some assumptions that there is only a single neighbour table for each type of table in the kernel. The netlink operations, neigh_seq_start, the non-core network users that call neigh_lookup. So while it might be doable it would require more refactoring than my current approach of just doing a little extra filtering in the code. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- include/net/neighbour.h | 10 ++-- net/atm/clip.c | 15 +- net/core/neighbour.c| 116 +--- net/decnet/dn_neigh.c |6 +- net/decnet/dn_route.c |2 net/ipv4/arp.c | 12 ++-- net/ipv6/ip6_output.c |2 net/ipv6/ndisc.c|4 - 8 files changed, 106 insertions(+), 61 deletions(-) Index: net-2.6.25/include/net/neighbour.h === --- net-2.6.25.orig/include/net/neighbour.h +++ net-2.6.25/include/net/neighbour.h @@ -34,6 +34,7 @@ struct neighbour; struct neigh_parms { + struct net *net; struct net_device *dev; struct neigh_parms *next; int (*neigh_setup)(struct neighbour *); @@ -126,7 +127,8 @@ struct neigh_ops struct pneigh_entry { struct pneigh_entry *next; - struct net_device *dev; + struct net *net; + struct net_device *dev; u8 flags; u8 key[0]; }; @@ -187,6 +189,7 @@ extern struct neighbour * neigh_lookup(s const void *pkey, struct net_device *dev); extern struct neighbour * neigh_lookup_nodev(struct neigh_table *tbl, + struct net *net, const void *pkey); extern struct neighbour * neigh_create(struct neigh_table *tbl, const void *pkey, @@ -211,8 +214,8 @@ extern unsigned longneigh_rand_reach_t extern voidpneigh_enqueue(struct neigh_table *tbl, struct neigh_parms *p, struct sk_buff *skb); -extern struct pneigh_entry *pneigh_lookup(struct neigh_table *tbl, const void *key, struct net_device *dev, int creat); -extern int pneigh_delete(struct neigh_table *tbl, const void *key, struct net_device *dev); +extern struct pneigh_entry *pneigh_lookup(struct neigh_table *tbl, struct net *net, const void *key, struct net_device *dev, int creat); +extern int pneigh_delete(struct neigh_table *tbl, struct net *net, const void *key, struct net_device *dev); extern void neigh_app_ns(struct neighbour *n); extern void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct neighbour *, void *), void *cookie); @@ -220,6 +223,7 @@ extern void __neigh_for_each_release(str extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct pneigh_entry *)); struct neigh_seq_state { + struct net *net; struct neigh_table *tbl; void *(*neigh_sub_iter)(struct neigh_seq_state *state, struct neighbour *n, loff_t *pos); Index: net-2.6.25/net/atm/clip.c === --- net-2.6.25.orig/net/atm/clip.c +++ net-2.6.25/net/atm/clip.c @@ -949,6 +949,11 @@ static int arp_seq_open(struct inode *in seq = file-private_data;
[SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim()
Hi David This is the last one of this boring patch suite. I prefered to not change sk_forward_alloc type, I prefer to wait 18 months more before taking this responsibility :) Merry Christmas everybody Eric [SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim() sk_forward_alloc being signed, we should take care of divides by SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and __sk_stream_mem_reclaim() This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right shifts instead of plain divides. This should help compiler to choose right shifts instead of expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86) Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/include/net/sock.h b/include/net/sock.h index 4456453..a134be1 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -716,10 +716,11 @@ extern void __sk_stream_mem_reclaim(struct sock *sk); extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind); #define SK_STREAM_MEM_QUANTUM ((int)PAGE_SIZE) +#define SK_STREAM_MEM_QUANTUM_SHIFT ilog2(SK_STREAM_MEM_QUANTUM) static inline int sk_stream_pages(int amt) { - return DIV_ROUND_UP(amt, SK_STREAM_MEM_QUANTUM); + return (amt + SK_STREAM_MEM_QUANTUM - 1) SK_STREAM_MEM_QUANTUM_SHIFT; } static inline void sk_stream_mem_reclaim(struct sock *sk) diff --git a/net/core/stream.c b/net/core/stream.c index 5586879..bf188ff 100644 --- a/net/core/stream.c +++ b/net/core/stream.c @@ -196,7 +196,7 @@ EXPORT_SYMBOL(sk_stream_error); void __sk_stream_mem_reclaim(struct sock *sk) { - atomic_sub(sk-sk_forward_alloc / SK_STREAM_MEM_QUANTUM, + atomic_sub(sk-sk_forward_alloc SK_STREAM_MEM_QUANTUM_SHIFT, sk-sk_prot-memory_allocated); sk-sk_forward_alloc = SK_STREAM_MEM_QUANTUM - 1; if (*sk-sk_prot-memory_pressure -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] IPv4: Fix a typo in tcp_v4_do_calc_md5_hash() which prevents compilation
It's spelled unsigned. Signed-off-by: Paul Moore [EMAIL PROTECTED] --- net/ipv4/tcp_ipv4.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 9cb92b8..9aea88b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1020,7 +1020,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval, static int tcp_v4_do_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key, __be32 saddr, __be32 daddr, struct tcphdr *th, int protocol, - unsinged int tcplen) + unsigned int tcplen) { struct scatterlist sg[4]; __u16 data_len; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 22:49:59 +0900), Satoru SATOH [EMAIL PROTECTED] says: I agree. I mistakenly thought hz in that context must be larger than 1000.. As it's uncertain, your's looks much simpler and better. (btw, the lines else div = 1 is not needed, is it?) Simplest fix is as follows: Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] -- diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..7a885b0 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -509,7 +509,7 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) i != RTAX_RTO_MIN) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { - unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]); val *= 1000; if (i == RTAX_RTT) @@ -517,7 +517,7 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) else if (i == RTAX_RTTVAR) val /= 4; if (val = hz) - fprintf(fp, %ums, val/hz); + fprintf(fp, %llums, val/hz); else fprintf(fp, %.2fms, (float)val/hz); } -- YOSHIFUJI Hideaki @ USAGI Project [EMAIL PROTECTED] GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
On Fri, 21 Dec 2007, David Miller wrote: When Gavin respins the patch I'll look at in the context of submitting it as a bug fix. So Gavin please generate the patch against Linus's vanilla GIT tree or net-2.6, your choise. The existing patch was against Linus' linux-2.6.git from a few days ago so I've updated my tree and regenerated the patch (below). Is that the right one? I'm just checking through the existing CA modules. I don't see the rtt used for RTO anywhere. This is what I gather they're each using rtt for. tcp_highspeed.c doesn't implement .pkts_acked tcp_hybla.c doesn't implement .pkts_acked tcp_scalable.c doesn't implement .pkts_acked tcp_bic.c ignores rtt value from .pkts_acked tcp_lp.cseems to ignore rtt value from .pkts_acked (despite setting TCP_CONG_RTT_STAMP for high res rtts -- why?) tcp_vegas.c uses high res rtt to measure congestion signal, increase, backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt tcp_veno.c uses high res rtt to measure congestion signal, increase, backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt tcp_yeah.c uses high res rtt to measure congestion signal, increase, backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt tcp_illinois.c uses rtt to scale increase, backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt tcp_htcp.c uses rtt to scale increase, backoff tcp_cubic.c uses rtt to scale increase, backoff tcp_westwood.c scales backoff using rtt So as far as I can tell, timeout stuff is not ever altered using pkts_acked() so I guess this fix only affects westwood, htcp and cubic just now. I need to re-read properly, but I think the same problem affects the microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno, yeah, illinois). I might follow up with another patch which changes the behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that. Thanks, Gavin Signed-off-by: Gavin McCullagh [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 889c893..6fb7989 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, u32 cnt = 0; u32 reord = tp-packets_out; s32 seq_rtt = -1; + s32 ca_seq_rtt = -1; ktime_t last_ackt = net_invalid_timestamp(); while ((skb = tcp_write_queue_head(sk)) skb != tcp_send_head(sk)) { @@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, if (sacked TCPCB_SACKED_RETRANS) tp-retrans_out -= packets_acked; flag |= FLAG_RETRANS_DATA_ACKED; + ca_seq_rtt = -1; seq_rtt = -1; if ((flag FLAG_DATA_ACKED) || (packets_acked 1)) flag |= FLAG_NONHEAD_RETRANS_ACKED; } else { + ca_seq_rtt = now - scb-when; if (seq_rtt 0) { - seq_rtt = now - scb-when; + seq_rtt = ca_seq_rtt; if (fully_acked) last_ackt = skb-tstamp; } @@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, !before(end_seq, tp-snd_up)) tp-urg_mode = 0; } else { + ca_seq_rtt = now - scb-when; if (seq_rtt 0) { - seq_rtt = now - scb-when; + seq_rtt = ca_seq_rtt; if (fully_acked) last_ackt = skb-tstamp; } @@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, net_invalid_timestamp())) rtt_us = ktime_us_delta(ktime_get_real(), last_ackt); - else if (seq_rtt 0) - rtt_us = jiffies_to_usecs(seq_rtt); + else if (ca_seq_rtt 0) + rtt_us = jiffies_to_usecs(ca_seq_rtt); } ca_ops-pkts_acked(sk, pkts_acked, rtt_us); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] XFRM: RFC4303 compliant auditing
This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events: * Could not find a valid SA [sections 2.1, 3.4.2] . xfrm_audit_state_notfound() . xfrm_audit_state_notfound_simple() * Sequence number overflow [section 3.3.3] . xfrm_audit_state_replay_overflow() * Replayed packet [section 3.4.3] . xfrm_audit_state_replay() * Integrity check failure [sections 3.4.4.1, 3.4.4.2] . xfrm_audit_state_icvfail() While RFC4304 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency. Signed-off-by: Paul Moore [EMAIL PROTECTED] --- include/net/xfrm.h | 33 -- net/ipv4/ah4.c |4 + net/ipv4/esp4.c|1 net/ipv6/ah6.c |2 - net/ipv6/esp6.c|1 net/ipv6/xfrm6_input.c |1 net/xfrm/xfrm_input.c |3 + net/xfrm/xfrm_output.c |2 + net/xfrm/xfrm_policy.c | 14 ++-- net/xfrm/xfrm_state.c | 153 +++- 10 files changed, 180 insertions(+), 34 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 2ed25e7..9141bbd 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -565,26 +565,33 @@ struct xfrm_audit }; #ifdef CONFIG_AUDITSYSCALL -static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid) +static inline struct audit_buffer *xfrm_audit_start(const char *op) { struct audit_buffer *audit_buf = NULL; - char *secctx; - u32 secctx_len; + if (audit_enabled == 0) + return NULL; audit_buf = audit_log_start(current-audit_context, GFP_ATOMIC, - AUDIT_MAC_IPSEC_EVENT); + AUDIT_MAC_IPSEC_EVENT); if (audit_buf == NULL) return NULL; + audit_log_format(audit_buf, op=%s, op); + return audit_buf; +} - audit_log_format(audit_buf, auid=%u, auid); +static inline void xfrm_audit_helper_usrinfo(u32 auid, u32 secid, +struct audit_buffer *audit_buf) +{ + char *secctx; + u32 secctx_len; + audit_log_format(audit_buf, auid=%u, auid); if (secid != 0 security_secid_to_secctx(secid, secctx, secctx_len) == 0) { audit_log_format(audit_buf, subj=%s, secctx); security_release_secctx(secctx, secctx_len); } else audit_log_task_context(audit_buf); - return audit_buf; } extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result, @@ -595,11 +602,22 @@ extern void xfrm_audit_state_add(struct xfrm_state *x, int result, u32 auid, u32 secid); extern void xfrm_audit_state_delete(struct xfrm_state *x, int result, u32 auid, u32 secid); +extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x, +struct sk_buff *skb); +extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family); +extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family, + __be32 net_spi, __be32 net_seq); +extern void xfrm_audit_state_icvfail(struct xfrm_state *x, +struct sk_buff *skb, u8 proto); #else #define xfrm_audit_policy_add(x, r, a, s) do { ; } while (0) #define xfrm_audit_policy_delete(x, r, a, s) do { ; } while (0) #define xfrm_audit_state_add(x, r, a, s) do { ; } while (0) #define xfrm_audit_state_delete(x, r, a, s)do { ; } while (0) +#define xfrm_audit_state_replay_overflow(x, s) do { ; } while (0) +#define xfrm_audit_state_notfound_simple(s, f) do { ; } while (0) +#define xfrm_audit_state_notfound(s, f, sp, sq)do { ; } while (0) +#define xfrm_audit_state_icvfail(x, s, p) do { ; } while (0) #endif /* CONFIG_AUDITSYSCALL */ static inline void xfrm_pol_hold(struct xfrm_policy *policy) @@ -1214,7 +1232,8 @@ extern int xfrm_state_delete(struct xfrm_state *x); extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info); extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si); extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si); -extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq); +extern int xfrm_replay_check(struct xfrm_state *x, +struct sk_buff *skb, __be32 seq); extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq); extern void xfrm_replay_notify(struct xfrm_state *x, int event); extern int xfrm_state_mtu(struct xfrm_state *x, int mtu); diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index d76803a..ec8de0a 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -179,8 +179,10 @@ static int ah_input(struct
[PATCH 0/2] XFRM auditing patch rebased
Based on the net-2.6.25 tree from about an hour ago. The first patch was dropped because it is already applied. -- paul moore linux security @ hp -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[XFRM]: Fix outbound statistics.
David, I failed to include this statistics codes since I didn't notice the conflict with the latest Herbert XFRM fix. Please apply this, too. Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED] --- net/xfrm/xfrm_output.c | 12 +--- 1 files changed, 9 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index 867484a..8dee031 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -43,17 +43,23 @@ static int xfrm_output_one(struct sk_buff *skb, int err) do { err = xfrm_state_check_space(x, skb); - if (err) + if (err) { goto error_nolock; + XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR); + } err = x-outer_mode-output(x, skb); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR); goto error_nolock; + } spin_lock_bh(x-lock); err = xfrm_state_check_expire(x); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEEXPIRED); goto error; + } if (x-type-flags XFRM_TYPE_REPLAY_PROT) { XFRM_SKB_CB(skb)-seq = ++x-replay.oseq; -- 1.4.4.2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IPv4: Fix a typo in tcp_v4_do_calc_md5_hash() which prevents compilation
From: Paul Moore [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 08:47:13 -0500 It's spelled unsigned. Signed-off-by: Paul Moore [EMAIL PROTECTED] Applied, thanks. I swore I did an allmodconfig test build with that change added, looks like I didn't :-/ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing
From: Paul Moore [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 08:51:22 -0500 Ah, looks like I may not be crazy after all! It looks like the XFRM patches from Masahide NAKAMURA were pulled into net-2.6.25 just before mine last night which caused my patches to conflict ... Sorry. I had double checked and was going to rework your patches by hand if that was the case, but for some reason I didn't see his patches as being the culprit causing the rejects when I checked things out. I must have looked over the history correctly. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP] Avoid two divides in __tcp_grow_window()
From: Eric Dumazet [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 15:05:51 +0100 tcp_win_from_space() being signed, compiler might emit an integer divide to compute tcp_win_from_space()/2 . Using right shifts is OK here and less expensive. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] Applied, thanks Eric. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
From: Gavin McCullagh [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 13:31:06 + On Fri, 21 Dec 2007, David Miller wrote: When Gavin respins the patch I'll look at in the context of submitting it as a bug fix. So Gavin please generate the patch against Linus's vanilla GIT tree or net-2.6, your choise. The existing patch was against Linus' linux-2.6.git from a few days ago so I've updated my tree and regenerated the patch (below). Is that the right one? Yep, it is. I need to re-read properly, but I think the same problem affects the microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno, yeah, illinois). I might follow up with another patch which changes the behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that. Ok, let us know what you find. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [XFRM]: Fix outbound statistics.
On Fri, Dec 21, 2007 at 11:25:00PM +0900, Masahide NAKAMURA wrote: do { err = xfrm_state_check_space(x, skb); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR); goto error_nolock; + } err = x-outer_mode-output(x, skb); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR); BTW, none of our existing mode output functions actually return an error. I noticed that the description for this field is actually Transformation mode specific error, e.g. Outer header space is not enough. This is slightly misleading as output header space is checked by xfrm_state_check_space so if there's an error that's where it'll show up. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[TCP] Avoid two divides in __tcp_grow_window()
tcp_win_from_space() being signed, compiler might emit an integer divide to compute tcp_win_from_space()/2 . Using right shifts is OK here and less expensive. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 6931946..145b51a 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -289,8 +289,8 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); /* Optimize this! */ - int truesize = tcp_win_from_space(skb-truesize)/2; - int window = tcp_win_from_space(sysctl_tcp_rmem[2])/2; + int truesize = tcp_win_from_space(skb-truesize) 1; + int window = tcp_win_from_space(sysctl_tcp_rmem[2]) 1; while (tp-rcv_ssthresh = window) { if (truesize = skb-len) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
On Fri, 21 Dec 2007, Gavin McCullagh wrote: On Fri, 21 Dec 2007, David Miller wrote: When Gavin respins the patch I'll look at in the context of submitting it as a bug fix. So Gavin please generate the patch against Linus's vanilla GIT tree or net-2.6, your choise. The existing patch was against Linus' linux-2.6.git from a few days ago so I've updated my tree and regenerated the patch (below). Is that the right one? I'm just checking through the existing CA modules. I don't see the rtt used for RTO anywhere. This is what I gather they're each using rtt for. I meant more timeout like fashion (e.g., to timeout some internal phase but I don't find that too likely)... Thanks for checking them. So as far as I can tell, timeout stuff is not ever altered using pkts_acked() so I guess this fix only affects westwood, htcp and cubic just now. I need to re-read properly, but I think the same problem affects the microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno, yeah, illinois). I might follow up with another patch which changes the behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that. Please do, you might have to remove fully_acked checks to do that right though so it won't be as straight-forward change as this one and requires some amount of thinking to result in a right thing. Signed-off-by: Gavin McCullagh [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 889c893..6fb7989 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, u32 cnt = 0; u32 reord = tp-packets_out; s32 seq_rtt = -1; + s32 ca_seq_rtt = -1; ktime_t last_ackt = net_invalid_timestamp(); while ((skb = tcp_write_queue_head(sk)) skb != tcp_send_head(sk)) { @@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, if (sacked TCPCB_SACKED_RETRANS) tp-retrans_out -= packets_acked; flag |= FLAG_RETRANS_DATA_ACKED; + ca_seq_rtt = -1; seq_rtt = -1; if ((flag FLAG_DATA_ACKED) || (packets_acked 1)) flag |= FLAG_NONHEAD_RETRANS_ACKED; } else { + ca_seq_rtt = now - scb-when; if (seq_rtt 0) { - seq_rtt = now - scb-when; + seq_rtt = ca_seq_rtt; if (fully_acked) last_ackt = skb-tstamp; } @@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, !before(end_seq, tp-snd_up)) tp-urg_mode = 0; } else { + ca_seq_rtt = now - scb-when; if (seq_rtt 0) { - seq_rtt = now - scb-when; + seq_rtt = ca_seq_rtt; if (fully_acked) last_ackt = skb-tstamp; } @@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p, net_invalid_timestamp())) rtt_us = ktime_us_delta(ktime_get_real(), last_ackt); - else if (seq_rtt 0) - rtt_us = jiffies_to_usecs(seq_rtt); + else if (ca_seq_rtt 0) + rtt_us = jiffies_to_usecs(ca_seq_rtt); } ca_ops-pkts_acked(sk, pkts_acked, rtt_us); Acked-by: Ilpo Järvinen [EMAIL PROTECTED] / Reviewed-by... whatever, I don't know if they really started to use it or not... :-) -- i.
Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT
On Fri, 21 Dec 2007, Ilpo Järvinen wrote: On Fri, 21 Dec 2007, Gavin McCullagh wrote: I'm just checking through the existing CA modules. I don't see the rtt used for RTO anywhere. This is what I gather they're each using rtt for. I meant more timeout like fashion (e.g., to timeout some internal phase but I don't find that too likely)... I meant didn't, no need to recheck them due to that... :-) -- i.
Re: Top 10 kernel oopses/warnings for the week of December 21st 2007
Rank 2: uart_flush_buffer Warning at drivers/serial/serial_core.c:544 in uart_flush_buffer() Reported 16 times No specific version information reported; bug present in 2.6.24-rc5 Caused by a bug in the Bluetooth line discipline/tty code Rank 7: uart_write Warning at drivers/serial/serial_core.c:490 in uart_write() Reported 2 times Seems to be more of the Bluetooth stuf Alan -- DASD is not really IBM - it has a vowel -Arjan van de Ven -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TCP]: Convert several length variable to unsigned.
On Fri, 2007-12-21 at 16:48 +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote: diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 93980c3..3b4169c 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -985,7 +985,7 @@ static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb) struct tcphdr *th = tcp_hdr(skb), *t1; struct sk_buff *buff; struct flowi fl; - int tot_len = sizeof(*th); + unsigned int tot_len = sizeof(*th); #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *key; #endif @@ -1085,7 +1085,7 @@ static void tcp_v6_send_ack(struct tcp_timewait_sock *tw, struct tcphdr *th = tcp_hdr(skb), *t1; struct sk_buff *buff; struct flowi fl; - int tot_len = sizeof(struct tcphdr); + unsigned int tot_len = sizeof(struct tcphdr); __be32 *topt; #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *key; Unrelated, but perhaps tot_len should be initialized in a consistent fashion? I think unsigned int tot_len = sizeof(struct tcphdr); is more readable. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Top 10 kernel oopses/warnings for the week of December 21st 2007
Andi Kleen wrote: Arjan van de Ven [EMAIL PROTECTED] writes: Rank 8: __change_page_attr BUG at arch/x86/mm/pageattr_64.c:176 Reported 2 times Reported this week for 2.6.24-rc5; history goes back to 2.6.15 There is no BUG on this line on 2.6.24-rc5. Since there are many BUG_ONs in this file it is unclear which you mean. Could you always include the version of the kernel where the actual oops in the line came from? in this case this is really all the version information available ;( it seems to be a patched kernel without patched EXTRAVERSION. But in the future if I have more specific information (eg it's only 1 kernel version) I'll mention it in more detail. It gets unwieldy if there's 500 reports for an oops of course ;) Anyways there are a lot of third party modules who do strange things with c_p_a(), not always legal, so you might look up out for that pattern too. Perhaps report the out of tree modules loaded in the summary too? I already always will mention if the oops is tainted or not (that I track specifically); I'll keep an eye out for other non-tainting out of tree modules as well. Thanks for the suggestions. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] phylib: add module owner to the mii_bus structure
Prevent unloading mii bus driver module when other modules have references to some phydevs on that bus. Added a new member (module owner) to struct mii_bus and added code to increment the mii bus owner module usage count on phy_connect and decrement it on phy_disconnect Set the module owner in the ucc_geth_mdio driver. Signed-off-by: Ionut Nicu [EMAIL PROTECTED] --- drivers/net/phy/phy_device.c |9 - drivers/net/ucc_geth_mii.c |3 +++ include/linux/phy.h |4 3 files changed, 15 insertions(+), 1 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 5b9e175..7dc5480 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -178,6 +178,10 @@ struct phy_device * phy_connect(struct net_device *dev, const char *phy_id, if (phydev-irq 0) phy_start_interrupts(phydev); + /* Increment the usage count of the mii bus owner */ + if (!try_module_get(phydev-bus-owner)) + return ERR_PTR(-EFAULT); + return phydev; } EXPORT_SYMBOL(phy_connect); @@ -192,9 +196,12 @@ void phy_disconnect(struct phy_device *phydev) phy_stop_interrupts(phydev); phy_stop_machine(phydev); - + phydev-adjust_link = NULL; + /* Decrement the reference count for the mii bus owner */ + module_put(phydev-bus-owner); + phy_detach(phydev); } EXPORT_SYMBOL(phy_disconnect); diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c index a3af4ea..84c7295 100644 --- a/drivers/net/ucc_geth_mii.c +++ b/drivers/net/ucc_geth_mii.c @@ -217,6 +217,9 @@ static int uec_mdio_probe(struct of_device *ofdev, const struct of_device_id *ma } } + /* register ourselves as the owner of this bus */ + new_bus-owner = THIS_MODULE; + err = mdiobus_register(new_bus); if (0 != err) { printk(KERN_ERR %s: Cannot register as MDIO bus\n, diff --git a/include/linux/phy.h b/include/linux/phy.h index 554836e..04ff6a5 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -82,6 +82,10 @@ struct mii_bus { const char *name; int id; void *priv; + + /* The module that owns this bus */ + struct module *owner; + int (*read)(struct mii_bus *bus, int phy_id, int regnum); int (*write)(struct mii_bus *bus, int phy_id, int regnum, u16 val); int (*reset)(struct mii_bus *bus); -- 1.5.4.rc0 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[XFRM]: Fix outbound statistics.
Hello David, I'm sorry, the previous mail contains wrong code: David, I failed to include this statistics codes since I didn't notice the conflict with the latest Herbert XFRM fix. Please apply this, too. Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED] --- [snip] do { err = xfrm_state_check_space(x, skb); - if (err) + if (err) { goto error_nolock; + XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR); + } Please apply the below patch instead of the previous. Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED] --- net/xfrm/xfrm_output.c | 12 +--- 1 files changed, 9 insertions(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index 867484a..f8e5961 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -43,17 +43,23 @@ static int xfrm_output_one(struct sk_buff *skb, int err) do { err = xfrm_state_check_space(x, skb); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR); goto error_nolock; + } err = x-outer_mode-output(x, skb); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR); goto error_nolock; + } spin_lock_bh(x-lock); err = xfrm_state_check_expire(x); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEEXPIRED); goto error; + } if (x-type-flags XFRM_TYPE_REPLAY_PROT) { XFRM_SKB_CB(skb)-seq = ++x-replay.oseq; -- 1.4.4.2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] XFRM: Drop packets when replay counter would overflow
According to RFC4303, section 3.3.3 we need to drop outgoing packets which cause the replay counter to overflow: 3.3.3. Sequence Number Generation The sender's counter is initialized to 0 when an SA is established. The sender increments the sequence number (or ESN) counter for this SA and inserts the low-order 32 bits of the value into the Sequence Number field. Thus, the first packet sent using a given SA will contain a sequence number of 1. If anti-replay is enabled (the default), the sender checks to ensure that the counter has not cycled before inserting the new value in the Sequence Number field. In other words, the sender MUST NOT send a packet on an SA if doing so would cause the sequence number to cycle. An attempt to transmit a packet that would result in sequence number overflow is an auditable event. The audit log entry for this event SHOULD include the SPI value, current date/time, Source Address, Destination Address, and (in IPv6) the cleartext Flow ID. Signed-off-by: Paul Moore [EMAIL PROTECTED] --- net/xfrm/xfrm_output.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index 0951444..d73003c 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -57,8 +57,11 @@ static int xfrm_output_one(struct sk_buff *skb, int err) if (x-type-flags XFRM_TYPE_REPLAY_PROT) { XFRM_SKB_CB(skb)-seq = ++x-replay.oseq; - if (unlikely(x-replay.oseq == 0)) + if (unlikely(x-replay.oseq == 0)) { + x-replay.oseq--; xfrm_audit_state_replay_overflow(x, skb); + goto error; + } if (xfrm_aevent_is_on()) xfrm_replay_notify(x, XFRM_REPLAY_UPDATE); } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] ucc_geth: split ucc_geth into two modules
Split ucc_geth_driver into 2 modules: - one module for the mii bus (phy devices register to this bus). - one module for the ethernet driver (uses phy_connect to get a phydev from the mii bus) Updated Makefile, Kconfig files and defconfigs (mpc836x, mpc832x_mds, mpc832x_rdb). Signed-off-by: Ionut Nicu [EMAIL PROTECTED] --- arch/powerpc/configs/mpc832x_mds_defconfig |1 + arch/powerpc/configs/mpc832x_rdb_defconfig |1 + arch/powerpc/configs/mpc836x_mds_defconfig |1 + drivers/net/Kconfig|8 drivers/net/Makefile |5 - drivers/net/ucc_geth.c | 18 +- drivers/net/ucc_geth_mii.c |9 + 7 files changed, 29 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/configs/mpc832x_mds_defconfig b/arch/powerpc/configs/mpc832x_mds_defconfig index 2d8951b..1c51739 100644 --- a/arch/powerpc/configs/mpc832x_mds_defconfig +++ b/arch/powerpc/configs/mpc832x_mds_defconfig @@ -500,6 +500,7 @@ CONFIG_NETDEV_1000=y # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # CONFIG_GIANFAR is not set +CONFIG_UCC_MDIO=y CONFIG_UCC_GETH=y # CONFIG_UGETH_NAPI is not set # CONFIG_UGETH_MAGIC_PACKET is not set diff --git a/arch/powerpc/configs/mpc832x_rdb_defconfig b/arch/powerpc/configs/mpc832x_rdb_defconfig index 761718a..cb4d076 100644 --- a/arch/powerpc/configs/mpc832x_rdb_defconfig +++ b/arch/powerpc/configs/mpc832x_rdb_defconfig @@ -503,6 +503,7 @@ CONFIG_E1000=y # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # CONFIG_GIANFAR is not set +CONFIG_UCC_MDIO=y CONFIG_UCC_GETH=y CONFIG_UGETH_NAPI=y # CONFIG_UGETH_MAGIC_PACKET is not set diff --git a/arch/powerpc/configs/mpc836x_mds_defconfig b/arch/powerpc/configs/mpc836x_mds_defconfig index c44fc56..92166e9 100644 --- a/arch/powerpc/configs/mpc836x_mds_defconfig +++ b/arch/powerpc/configs/mpc836x_mds_defconfig @@ -499,6 +499,7 @@ CONFIG_NETDEV_1000=y # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # CONFIG_GIANFAR is not set +CONFIG_UCC_MDIO=y CONFIG_UCC_GETH=y # CONFIG_UGETH_NAPI is not set # CONFIG_UGETH_MAGIC_PACKET is not set diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index d9107e5..7314802 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2315,9 +2315,17 @@ config GFAR_NAPI bool Use Rx Polling (NAPI) depends on GIANFAR +config UCC_MDIO + tristate Freescale QE UCC MDIO Bus + depends on QUICC_ENGINE + select PHYLIB + help + Provides Bus interface for MII Management regs in the UCC register space. + config UCC_GETH tristate Freescale QE Gigabit Ethernet depends on QUICC_ENGINE + select UCC_MDIO select PHYLIB help This driver supports the Gigabit Ethernet mode of the QUICC Engine, diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 0e5fde4..97843a3 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -22,8 +22,11 @@ gianfar_driver-objs := gianfar.o \ gianfar_mii.o \ gianfar_sysfs.o +obj-$(CONFIG_UCC_MDIO) += ucc_geth_mdio.o +ucc_geth_mdio-objs := ucc_geth_mii.o + obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o -ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o +ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o # # link order important here diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c index c6a1902..c33a4cb 100644 --- a/drivers/net/ucc_geth.c +++ b/drivers/net/ucc_geth.c @@ -1612,9 +1612,12 @@ static int init_phy(struct net_device *dev) priv-oldspeed = 0; priv-oldduplex = -1; + request_module(ucc_geth_mdio); + snprintf(phy_id, BUS_ID_SIZE, PHY_ID_FMT, priv-ug_info-mdio_bus, priv-ug_info-phy_address); + phydev = phy_connect(dev, phy_id, adjust_link, 0, priv-phy_interface); if (IS_ERR(phydev)) { @@ -4025,12 +4028,7 @@ static struct of_platform_driver ucc_geth_driver = { static int __init ucc_geth_init(void) { - int i, ret; - - ret = uec_mdio_init(); - - if (ret) - return ret; + int i; if (netif_msg_drv(debug)) printk(KERN_INFO ucc_geth: DRV_DESC \n); @@ -4038,18 +4036,12 @@ static int __init ucc_geth_init(void) memcpy((ugeth_info[i]), ugeth_primary_info, sizeof(ugeth_primary_info)); - ret = of_register_platform_driver(ucc_geth_driver); - - if (ret) - uec_mdio_exit(); - - return ret; + return of_register_platform_driver(ucc_geth_driver); } static void __exit ucc_geth_exit(void) { of_unregister_platform_driver(ucc_geth_driver); - uec_mdio_exit(); } module_init(ucc_geth_init); diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c index df884f0..a3af4ea 100644 --- a/drivers/net/ucc_geth_mii.c +++ b/drivers/net/ucc_geth_mii.c
[patch net-2.6.25 2/2][NETNS] net: Add a helper function neigh_param_default_alloc
In the presence of multiple network namespaces the logic needed to allocate the a default parameter table is just barely non-trivial. So add a function to automate it to make everyone's life easier. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] Signed-off-by: Daniel Lezcano [EMAIL PROTECTED] --- include/net/neighbour.h |1 + net/core/neighbour.c| 15 +++ 2 files changed, 16 insertions(+) Index: net-2.6.25/include/net/neighbour.h === --- net-2.6.25.orig/include/net/neighbour.h +++ net-2.6.25/include/net/neighbour.h @@ -208,6 +208,7 @@ extern struct neighbour *neigh_event_ns struct net_device *dev); extern struct neigh_parms *neigh_parms_alloc(struct net_device *dev, struct neigh_table *tbl); +extern struct neigh_parms *neigh_parms_alloc_default(struct neigh_table *tbl, struct net *net); extern voidneigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms); extern voidneigh_parms_destroy(struct neigh_parms *parms); extern unsigned long neigh_rand_reach_time(unsigned long base); Index: net-2.6.25/net/core/neighbour.c === --- net-2.6.25.orig/net/core/neighbour.c +++ net-2.6.25/net/core/neighbour.c @@ -1325,6 +1325,20 @@ struct neigh_parms *neigh_parms_alloc(st return p; } +struct neigh_parms *neigh_parms_alloc_default(struct neigh_table *tbl, + struct net *net) +{ + struct neigh_parms *parms; + if (net != init_net) { + parms = neigh_parms_alloc(NULL, tbl); + release_net(parms-net); + parms-net = hold_net(net); + } + else + parms = neigh_parms_clone(tbl-parms); + return parms; +} + static void neigh_rcu_free_parms(struct rcu_head *head) { struct neigh_parms *parms = @@ -2787,6 +2801,7 @@ EXPORT_SYMBOL(neigh_ifdown); EXPORT_SYMBOL(neigh_lookup); EXPORT_SYMBOL(neigh_lookup_nodev); EXPORT_SYMBOL(neigh_parms_alloc); +EXPORT_SYMBOL(neigh_parms_alloc_default); EXPORT_SYMBOL(neigh_parms_release); EXPORT_SYMBOL(neigh_rand_reach_time); EXPORT_SYMBOL(neigh_resolve_output); -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Top 10 kernel oopses/warnings for the week of December 21st 2007
in this case this is really all the version information available ;( it seems to be a patched kernel without patched EXTRAVERSION. But in the future if I have more specific information (eg it's only 1 kernel version) I'll mention it in more detail. It gets unwieldy if there's 500 reports for an oops of course ;) Hmm would there be an automatic way to check out the file of the kernel version and then check if the BUG_ON/WARN_ON is on that line? Maybe it could be done using git. Anyways there are a lot of third party modules who do strange things with c_p_a(), not always legal, so you might look up out for that pattern too. Perhaps report the out of tree modules loaded in the summary too? I already always will mention if the oops is tainted or not (that I track specifically); I don't necessarily mean tainted, just out of tree modules in general. There are some GPL modules who do strange things too. Not saying that these oopses should be all ignored -- they might be legitimate kernel bugs that they just trigger -- just it should be visible somehow in the summary in case there is a pattern. Especially for c_p_a() i'm quite suspicious because it depends a lot on what the caller did. One way perhaps would be also to check if there is an out of tree module inside the backtrace. I suppose you could keep a list of in tree modules and do this automatically. Of course there could be false positives too with the standard inexact backtrace. -Andi -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] New driver sfc for Solarstorm SFC4000 controller - 3nd try
This is a resubmission of a new driver for Solarflare network controllers. The driver supports several types of PHY (10Gbase-T, XFP, CX4) on six different 10G and 1G boards. The previous thread was: [PATCH] [RFC] New driver sfc for Solarstorm SFC4000 controller http://marc.info/?l=linux-netdevm=119757352830103w=2 Thanks to the people who looked at the previous patches. We have addressed the following from received comments after the 2nd submission: - Use kzalloc where appropriate - Remove deprecated fastcall attributes - Use kernel routines for hex dumps and MAC address printing - Changes to logging MACROs The last two patches were marked with RFC but I now think that this driver is ready (withstanding any further review comments) and I would like to ask that this driver is considered for merging. The patch (against net-2.6.25) is at: https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.patch The new files may also be downloaded as a tarball: https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.tgz And for verification there is: https://support.solarflare.com/netdev/3/MD5SUMS Happy Christmas -- Robert Stonehouse -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly
On Fri, 21 Dec 2007 22:58:04 +0900 (JST) YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] wrote: In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 22:49:59 +0900), Satoru SATOH [EMAIL PROTECTED] says: I agree. I mistakenly thought hz in that context must be larger than 1000.. As it's uncertain, your's looks much simpler and better. (btw, the lines else div = 1 is not needed, is it?) Simplest fix is as follows: Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] -- diff --git a/ip/iproute.c b/ip/iproute.c index f4200ae..7a885b0 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -509,7 +509,7 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) i != RTAX_RTO_MIN) fprintf(fp, %u, *(unsigned*)RTA_DATA(mxrta[i])); else { - unsigned val = *(unsigned*)RTA_DATA(mxrta[i]); + unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]); val *= 1000; if (i == RTAX_RTT) @@ -517,7 +517,7 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) else if (i == RTAX_RTTVAR) val /= 4; if (val = hz) - fprintf(fp, %ums, val/hz); + fprintf(fp, %llums, val/hz); else fprintf(fp, %.2fms, (float)val/hz); } applied thanks. -- Stephen Hemminger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix lost export-dynamic
On Mon, 17 Dec 2007 16:06:38 +0300 Vitaliy Gusev [EMAIL PROTECTED] wrote: get_link_kind() fails for statically linked modules (vlan, veth, etc.) if ip was linked without export-dynamic. Signed-off-by: Vitaliy Gusev [EMAIL PROTECTED] applied thanks -- Stephen Hemminger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Module for ip utility to support veth device
On Tue, 18 Dec 2007 15:15:38 +0300 Vitaliy Gusev [EMAIL PROTECTED] wrote: module link_veth Signed-off-by: Vitaliy Gusev [EMAIL PROTECTED] --- Applied both patches, and moved veth.h from ip/veth.h to include/net/veth.h so that the header file will be correctly updated if ABI changes. -- Stephen Hemminger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] New driver sfc for Solarstorm SFC4000 controller - 3nd try
On Fri, 21 Dec 2007 16:53:40 + Robert Stonehouse wrote: This is a resubmission of a new driver for Solarflare network controllers. The driver supports several types of PHY (10Gbase-T, XFP, CX4) on six different 10G and 1G boards. The previous thread was: [PATCH] [RFC] New driver sfc for Solarstorm SFC4000 controller http://marc.info/?l=linux-netdevm=119757352830103w=2 Thanks to the people who looked at the previous patches. We have addressed the following from received comments after the 2nd submission: - Use kzalloc where appropriate - Remove deprecated fastcall attributes - Use kernel routines for hex dumps and MAC address printing - Changes to logging MACROs The last two patches were marked with RFC but I now think that this driver is ready (withstanding any further review comments) and I would like to ask that this driver is considered for merging. The patch (against net-2.6.25) is at: https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.patch wow, 750+ KB How many drivers is this? The new files may also be downloaded as a tarball: https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.tgz And for verification there is: https://support.solarflare.com/netdev/3/MD5SUMS Hi, I have just a few comments, mostly related to infrastructure. a. Don't use kernel-doc comment flags (i.e., /** to begin a comment block) when the comment is not in kernel-doc format. Or use kernel-doc format. :) (as documented in Documentation/kernel-doc-nano-HOWTO.txt) E.g.: +/** @file + * + * Efx driverlink + * + * This header file defines the portions of the Efx driverlink + * interface that are used only by the kernel net driver, and are not + * part of the public interface specification. + */ b. Kconfig file: Use 1 tab (not 8 spaces) to indent below config items. Use 1 tab + 2 spaces to indent help text below help lines. c. Driver contains MTD, SPI, I2C (at least) code and needs to be reviewed by people in those areas as well (IMO). I see an MTD dependency in the Kconfig file. What about the SPI and I2C parts? Are they conditional or how is that handled? or does the driver not use the kernel infrastructure for these? --- ~Randy -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BNX2 warning
On Thu, 2007-12-20 at 20:39 -0800, David Miller wrote: Michael, please fix this, thanks :-) drivers/net/bnx2.c: In function 'bnx2_init_napi': drivers/net/bnx2.c:7329: warning: no return statement in function returning non-void [BNX2]: Fix compiler warning. Change bnx2_init_napi() to void. Warning was noted by DaveM. Signed-off-by: Michael Chan [EMAIL PROTECTED] diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c index aef2a9a..94d1857 100644 --- a/drivers/net/bnx2.c +++ b/drivers/net/bnx2.c @@ -7313,7 +7313,7 @@ bnx2_bus_string(struct bnx2 *bp, char *str) return str; } -static int __devinit +static void __devinit bnx2_init_napi(struct bnx2 *bp) { int i; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.25 3/3] Uninline the inet_twsk_put function
Hi David, David Miller schrieb: inet_timewait_sock begins with a struct sock_common which is where the atomic_t is, and: #define tw_refcnt __tw_common.skc_refcnt So you would have to change struct sock_common over to kref, and thus the entire networking, in order to make such a change. Ok, that sounds too much. Many thanks for following up and taking the time to explain it. But you would have seen this instantly if you had spent 5 seconds looking at how these datastructures are defined. Instead you choose to make me do it and explain it to you instead. Sorry, just matched the wrong pattern here :-) Best Regards Ingo Oeser -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] New driver sfc for Solarstorm SFC4000 controller - 3nd try
Randy Dunlap wrote: On Fri, 21 Dec 2007 16:53:40 + Robert Stonehouse wrote: This is a resubmission of a new driver for Solarflare network controllers. snip The last two patches were marked with RFC but I now think that this driver is ready (withstanding any further review comments) and I would like to ask that this driver is considered for merging. The patch (against net-2.6.25) is at: https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.patch wow, 750+ KB How many drivers is this? Just two: sfc (net) and sfc_mtd (MTD). As Robert said, the net driver supports a variety of PHYs. It also has a fair amount of self-test and (conditional) debug code, and comments. I've noted your comments on kernel-doc and Kconfig format and will address those in the next version. snip c. Driver contains MTD, SPI, I2C (at least) code and needs to be reviewed by people in those areas as well (IMO). I see an MTD dependency in the Kconfig file. What about the SPI and I2C parts? Are they conditional or how is that handled? or does the driver not use the kernel infrastructure for these? We are not currently using the kernel infrastructure for those. I'm not sure whether we could do. I think everyone working on the net driver here will be away until the new year, so please forgive our silence in the mean time. Ben. -- Ben Hutchings, Senior Software Engineer, Solarflare Communications Not speaking for my employer; that's the marketing department's job. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: santize headers for iproute2
iproute2 source uses headers that result from make headers_install. The header files net/tcp_states.h and net/veth.h are both being used but are not processed. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- This patch is not urgent but it would be good to get it in 2.6.24. --- a/include/Kbuild2007-12-21 09:33:36.0 -0800 +++ b/include/Kbuild2007-12-21 09:33:42.0 -0800 @@ -4,5 +4,6 @@ header-y += sound/ header-y += mtd/ header-y += rdma/ header-y += video/ +header-y += net/ header-y += asm-$(ARCH)/ --- /dev/null 1970-01-01 00:00:00.0 + +++ b/include/net/Kbuild2007-12-21 09:32:20.0 -0800 @@ -0,0 +1,2 @@ +header-y += tcp_states.h +header-y += veth.h -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TSO trimming question
On Fri, 21 Dec 2007, Bill Fink wrote: Or perhaps even: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; /* need to return a non-zero value to defer, which means won't * defer if jiffies == 0 but it's only a 1 in 4 billion event * (and avoids a compare/branch by not checking jiffies) / return jiffies; Ack. I introduced my own 64-bit to 32-bit issue (too late at night). How about: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; /* this won't defer if jiffies == 0 but it's only a 1 in * 4 billion event (and avoids a branch) */ return (jiffies != 0); -Bill -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [TCP]: Force TSO splits to MSS boundaries
On Thu, 20 Dec 2007, David Miller wrote: From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 22:00:12 +0800 On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote: In the most ideal sense, tcp_window_allows() should probably be changed to only return MSS multiples. Unfortunately this would add an expensive modulo operation, however I think it would elimiate this problem case. Well you only have to divide in the unlikely case of us being limited by the receiver window. In that case speed is probably not of the essence anyway. Agreed, to some extent. I say to some extent because it might be realistic, with lots (millions) of sockets to hit this case a lot. There are so many things that are a don't care performance wise until you have a lot of stinky connections over crappy links. How about this, I had to use another approach due to reasons outlined in the commit message: -- [PATCH] [TCP]: Force TSO splits to MSS boundaries If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on odd boundary by the callers of tcp_window_allows. We try really hard to avoid unnecessary modulos. Therefore the old caller side check if (skb-len limit) was too wide as well because limit is not bound in any way to skb-len and can cause spurious testing for trimming in the middle of the queue while we only wanted that to happen at the tail of the queue. A simple additional caller side check for tcp_write_queue_tail would likely have resulted 2 x modulos because the limit would have to be first calculated from window, however, doing that unnecessary modulo is not mandatory. After a minor change to the algorithm, simply determine first if the modulo is needed at all and at that point immediately decide also from which value it should be calculated from. This approach also kills some duplicated code. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_output.c | 51 - 1 files changed, 25 insertions(+), 26 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 1852698..ea92a1b 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1008,13 +1008,29 @@ static void tcp_cwnd_validate(struct sock *sk) } } -static unsigned int tcp_window_allows(struct tcp_sock *tp, struct sk_buff *skb, unsigned int mss_now, unsigned int cwnd) +/* Returns the portion of skb which can be sent right away without + * introducing MSS oddities to segment boundaries. In rare cases where + * mss_now != mss_cache, we will request caller to create a small skb + * per input skb which could be mostly avoided here (if desired). + */ +static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb, + unsigned int mss_now, + unsigned int cwnd) { - u32 window, cwnd_len; + struct tcp_sock *tp = tcp_sk(sk); + u32 needed, window, cwnd_len; window = (tp-snd_una + tp-snd_wnd - TCP_SKB_CB(skb)-seq); cwnd_len = mss_now * cwnd; - return min(window, cwnd_len); + + if (likely(cwnd_len = window skb != tcp_write_queue_tail(sk))) + return cwnd_len; + + if (skb == tcp_write_queue_tail(sk) cwnd_len = skb-len) + return cwnd_len; + + needed = min(skb-len, window); + return needed - needed % mss_now; } /* Can at least one segment of SKB be sent right now, according to the @@ -1445,17 +1461,9 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle) } limit = mss_now; - if (tso_segs 1) { - limit = tcp_window_allows(tp, skb, - mss_now, cwnd_quota); - - if (skb-len limit) { - unsigned int trim = skb-len % mss_now; - - if (trim) - limit = skb-len - trim; - } - } + if (tso_segs 1) + limit = tcp_mss_split_point(sk, skb, mss_now, + cwnd_quota); if (skb-len limit unlikely(tso_fragment(sk, skb, limit, mss_now))) @@ -1502,7 +1510,6 @@ void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss, */ void tcp_push_one(struct sock *sk, unsigned int mss_now) { - struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb = tcp_send_head(sk); unsigned int tso_segs, cwnd_quota; @@ -1517,17 +1524,9 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now) BUG_ON(!tso_segs); limit = mss_now; - if (tso_segs 1) { - limit = tcp_window_allows(tp, skb, - mss_now, cwnd_quota); - -
Re: TSO trimming question
On Fri, 21 Dec 2007, Bill Fink wrote: On Fri, 21 Dec 2007, Bill Fink wrote: Or perhaps even: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; /* need to return a non-zero value to defer, which means won't * defer if jiffies == 0 but it's only a 1 in 4 billion event * (and avoids a compare/branch by not checking jiffies) / return jiffies; Ack. I introduced my own 64-bit to 32-bit issue (too late at night). How about: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; /* this won't defer if jiffies == 0 but it's only a 1 in * 4 billion event (and avoids a branch) */ return (jiffies != 0); I'm not sure how the jiffies work but is this racy as well? Simple return tp-tso_deferred; should work, shouldn't it? :-) -- i. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [TCP]: Force TSO splits to MSS boundaries
On Fri, 21 Dec 2007, Ilpo Järvinen wrote: How about this, I had to use another approach due to reasons outlined in the commit message: -- [PATCH] [TCP]: Force TSO splits to MSS boundaries If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on odd boundary by the callers of tcp_window_allows. We try really hard to avoid unnecessary modulos. Therefore the old caller side check if (skb-len limit) was too wide as well because limit is not bound in any way to skb-len and can cause spurious testing for trimming in the middle of the queue while we only wanted that to happen at the tail of the queue. A simple additional caller side check for tcp_write_queue_tail would likely have resulted 2 x modulos because the limit would have to be first calculated from window, however, doing that unnecessary modulo is not mandatory. After a minor change to the algorithm, simply determine first if the modulo is needed at all and at that point immediately decide also from which value it should be calculated from. This approach also kills some duplicated code. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_output.c | 51 - 1 files changed, 25 insertions(+), 26 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 1852698..ea92a1b 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1008,13 +1008,29 @@ static void tcp_cwnd_validate(struct sock *sk) } } -static unsigned int tcp_window_allows(struct tcp_sock *tp, struct sk_buff *skb, unsigned int mss_now, unsigned int cwnd) +/* Returns the portion of skb which can be sent right away without + * introducing MSS oddities to segment boundaries. In rare cases where + * mss_now != mss_cache, we will request caller to create a small skb + * per input skb which could be mostly avoided here (if desired). + */ +static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb, + unsigned int mss_now, + unsigned int cwnd) { - u32 window, cwnd_len; + struct tcp_sock *tp = tcp_sk(sk); + u32 needed, window, cwnd_len; window = (tp-snd_una + tp-snd_wnd - TCP_SKB_CB(skb)-seq); cwnd_len = mss_now * cwnd; - return min(window, cwnd_len); + + if (likely(cwnd_len = window skb != tcp_write_queue_tail(sk))) + return cwnd_len; + + if (skb == tcp_write_queue_tail(sk) cwnd_len = skb-len) + return cwnd_len; ...if desired that this won't cause small skbs in the middle of the queue (except in case where just the sub-MSS portion is left out above window), something like this could be added here (again, avoiding more modulos): if (skb != tcp_write_queue_tail(sk) window skb-len) return skb-len; + needed = min(skb-len, window); + return needed - needed % mss_now; } /* Can at least one segment of SKB be sent right now, according to the -- i.
Re: TSO trimming question
On Fri, 21 Dec 2007, Ilpo Järvinen wrote: On Fri, 21 Dec 2007, Bill Fink wrote: On Fri, 21 Dec 2007, Bill Fink wrote: Or perhaps even: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; /* need to return a non-zero value to defer, which means won't * defer if jiffies == 0 but it's only a 1 in 4 billion event * (and avoids a compare/branch by not checking jiffies) / return jiffies; Ack. I introduced my own 64-bit to 32-bit issue (too late at night). How about: /* Ok, it looks like it is advisable to defer. */ tp-tso_deferred = jiffies; /* this won't defer if jiffies == 0 but it's only a 1 in * 4 billion event (and avoids a branch) */ return (jiffies != 0); I'm not sure how the jiffies work but is this racy as well? Simple return tp-tso_deferred; should work, shouldn't it? :-) As long as tp-tso_deferred remains u32, pending the other issue. -Bill -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/2] ucc_geth: split ucc_geth into two modules
Tested-by: Emil Medve [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nicu Ioan Petru Sent: Friday, December 21, 2007 7:57 AM To: netdev@vger.kernel.org Cc: Nicu Ioan Petru Subject: [PATCH 1/2] ucc_geth: split ucc_geth into two modules Split ucc_geth_driver into 2 modules: - one module for the mii bus (phy devices register to this bus). - one module for the ethernet driver (uses phy_connect to get a phydev from the mii bus) Updated Makefile, Kconfig files and defconfigs (mpc836x, mpc832x_mds, mpc832x_rdb). Signed-off-by: Ionut Nicu [EMAIL PROTECTED] --- arch/powerpc/configs/mpc832x_mds_defconfig |1 + arch/powerpc/configs/mpc832x_rdb_defconfig |1 + arch/powerpc/configs/mpc836x_mds_defconfig |1 + drivers/net/Kconfig|8 drivers/net/Makefile |5 - drivers/net/ucc_geth.c | 18 +- drivers/net/ucc_geth_mii.c |9 + 7 files changed, 29 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/configs/mpc832x_mds_defconfig b/arch/powerpc/configs/mpc832x_mds_defconfig index 2d8951b..1c51739 100644 --- a/arch/powerpc/configs/mpc832x_mds_defconfig +++ b/arch/powerpc/configs/mpc832x_mds_defconfig @@ -500,6 +500,7 @@ CONFIG_NETDEV_1000=y # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # CONFIG_GIANFAR is not set +CONFIG_UCC_MDIO=y CONFIG_UCC_GETH=y # CONFIG_UGETH_NAPI is not set # CONFIG_UGETH_MAGIC_PACKET is not set diff --git a/arch/powerpc/configs/mpc832x_rdb_defconfig b/arch/powerpc/configs/mpc832x_rdb_defconfig index 761718a..cb4d076 100644 --- a/arch/powerpc/configs/mpc832x_rdb_defconfig +++ b/arch/powerpc/configs/mpc832x_rdb_defconfig @@ -503,6 +503,7 @@ CONFIG_E1000=y # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # CONFIG_GIANFAR is not set +CONFIG_UCC_MDIO=y CONFIG_UCC_GETH=y CONFIG_UGETH_NAPI=y # CONFIG_UGETH_MAGIC_PACKET is not set diff --git a/arch/powerpc/configs/mpc836x_mds_defconfig b/arch/powerpc/configs/mpc836x_mds_defconfig index c44fc56..92166e9 100644 --- a/arch/powerpc/configs/mpc836x_mds_defconfig +++ b/arch/powerpc/configs/mpc836x_mds_defconfig @@ -499,6 +499,7 @@ CONFIG_NETDEV_1000=y # CONFIG_TIGON3 is not set # CONFIG_BNX2 is not set # CONFIG_GIANFAR is not set +CONFIG_UCC_MDIO=y CONFIG_UCC_GETH=y # CONFIG_UGETH_NAPI is not set # CONFIG_UGETH_MAGIC_PACKET is not set diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index d9107e5..7314802 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2315,9 +2315,17 @@ config GFAR_NAPI bool Use Rx Polling (NAPI) depends on GIANFAR +config UCC_MDIO + tristate Freescale QE UCC MDIO Bus + depends on QUICC_ENGINE + select PHYLIB + help + Provides Bus interface for MII Management regs in the UCC register space. + config UCC_GETH tristate Freescale QE Gigabit Ethernet depends on QUICC_ENGINE + select UCC_MDIO select PHYLIB help This driver supports the Gigabit Ethernet mode of the QUICC Engine, diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 0e5fde4..97843a3 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -22,8 +22,11 @@ gianfar_driver-objs := gianfar.o \ gianfar_mii.o \ gianfar_sysfs.o +obj-$(CONFIG_UCC_MDIO) += ucc_geth_mdio.o +ucc_geth_mdio-objs := ucc_geth_mii.o + obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o -ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o +ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o # # link order important here diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c index c6a1902..c33a4cb 100644 --- a/drivers/net/ucc_geth.c +++ b/drivers/net/ucc_geth.c @@ -1612,9 +1612,12 @@ static int init_phy(struct net_device *dev) priv-oldspeed = 0; priv-oldduplex = -1; + request_module(ucc_geth_mdio); + snprintf(phy_id, BUS_ID_SIZE, PHY_ID_FMT, priv-ug_info-mdio_bus, priv-ug_info-phy_address); + phydev = phy_connect(dev, phy_id, adjust_link, 0, priv-phy_interface); if (IS_ERR(phydev)) { @@ -4025,12 +4028,7 @@ static struct of_platform_driver ucc_geth_driver = { static int __init ucc_geth_init(void) { - int i, ret; - - ret = uec_mdio_init(); - - if (ret) - return ret; + int i; if (netif_msg_drv(debug)) printk(KERN_INFO ucc_geth: DRV_DESC \n); @@ -4038,18 +4036,12 @@ static int __init ucc_geth_init(void) memcpy((ugeth_info[i]), ugeth_primary_info, sizeof(ugeth_primary_info)); - ret = of_register_platform_driver(ucc_geth_driver); - - if (ret) - uec_mdio_exit(); - - return ret; +
RE: [PATCH 2/2] phylib: add module owner to the mii_bus structure
Tested-by: Emil Medve [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nicu Ioan Petru Sent: Friday, December 21, 2007 7:58 AM To: netdev@vger.kernel.org Cc: Nicu Ioan Petru Subject: [PATCH 2/2] phylib: add module owner to the mii_bus structure Prevent unloading mii bus driver module when other modules have references to some phydevs on that bus. Added a new member (module owner) to struct mii_bus and added code to increment the mii bus owner module usage count on phy_connect and decrement it on phy_disconnect Set the module owner in the ucc_geth_mdio driver. Signed-off-by: Ionut Nicu [EMAIL PROTECTED] --- drivers/net/phy/phy_device.c |9 - drivers/net/ucc_geth_mii.c |3 +++ include/linux/phy.h |4 3 files changed, 15 insertions(+), 1 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 5b9e175..7dc5480 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -178,6 +178,10 @@ struct phy_device * phy_connect(struct net_device *dev, const char *phy_id, if (phydev-irq 0) phy_start_interrupts(phydev); + /* Increment the usage count of the mii bus owner */ + if (!try_module_get(phydev-bus-owner)) + return ERR_PTR(-EFAULT); + return phydev; } EXPORT_SYMBOL(phy_connect); @@ -192,9 +196,12 @@ void phy_disconnect(struct phy_device *phydev) phy_stop_interrupts(phydev); phy_stop_machine(phydev); - + phydev-adjust_link = NULL; + /* Decrement the reference count for the mii bus owner */ + module_put(phydev-bus-owner); + phy_detach(phydev); } EXPORT_SYMBOL(phy_disconnect); diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c index a3af4ea..84c7295 100644 --- a/drivers/net/ucc_geth_mii.c +++ b/drivers/net/ucc_geth_mii.c @@ -217,6 +217,9 @@ static int uec_mdio_probe(struct of_device *ofdev, const struct of_device_id *ma } } + /* register ourselves as the owner of this bus */ + new_bus-owner = THIS_MODULE; + err = mdiobus_register(new_bus); if (0 != err) { printk(KERN_ERR %s: Cannot register as MDIO bus\n, diff --git a/include/linux/phy.h b/include/linux/phy.h index 554836e..04ff6a5 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -82,6 +82,10 @@ struct mii_bus { const char *name; int id; void *priv; + + /* The module that owns this bus */ + struct module *owner; + int (*read)(struct mii_bus *bus, int phy_id, int regnum); int (*write)(struct mii_bus *bus, int phy_id, int regnum, u16 val); int (*reset)(struct mii_bus *bus); -- 1.5.4.rc0 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[ETH]: Combine format_addr() with print_mac().
[ETH]: Combine format_addr() with print_mac(). print_mac() used by most net drivers and format_addr() used by net-sysfs.c are very similar and they can be integrated. format_addr() is also identically redefined in the qla4xxx iscsi driver. Export a new function format_mac_addr() to be used by net-sysfs, qla4xxx and others in the future. Both print_mac() and format_mac_addr() call _format_mac_addr() to do the formatting. Signed-off-by: Michael Chan [EMAIL PROTECTED] Cc: Joe Perches [EMAIL PROTECTED] Cc: Mike Christie [EMAIL PROTECTED] Cc: David Somayajulu [EMAIL PROTECTED] diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c index 89460d2..0f4562b 100644 --- a/drivers/scsi/qla4xxx/ql4_os.c +++ b/drivers/scsi/qla4xxx/ql4_os.c @@ -173,18 +173,6 @@ static void qla4xxx_conn_stop(struct iscsi_cls_conn *conn, int flag) printk(KERN_ERR iscsi: invalid stop flag %d\n, flag); } -static ssize_t format_addr(char *buf, const unsigned char *addr, int len) -{ - int i; - char *cp = buf; - - for (i = 0; i len; i++) - cp += sprintf(cp, %02x%c, addr[i], - i == (len - 1) ? '\n' : ':'); - return cp - buf; -} - - static int qla4xxx_host_get_param(struct Scsi_Host *shost, enum iscsi_host_param param, char *buf) { @@ -193,7 +181,7 @@ static int qla4xxx_host_get_param(struct Scsi_Host *shost, switch (param) { case ISCSI_HOST_PARAM_HWADDRESS: - len = format_addr(buf, ha-my_mac, MAC_ADDR_LEN); + len = format_mac_addr(buf, ha-my_mac, MAC_ADDR_LEN); break; case ISCSI_HOST_PARAM_IPADDRESS: len = sprintf(buf, %d.%d.%d.%d\n, ha-ip_address[0], diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h index cc002cb..d20512c 100644 --- a/include/linux/if_ether.h +++ b/include/linux/if_ether.h @@ -124,10 +124,11 @@ int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr); extern struct ctl_table ether_table[]; #endif +extern ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len); + /* * Display a 6 byte device address (MAC) in a readable format. */ -#define MAC_FMT %02x:%02x:%02x:%02x:%02x:%02x extern char *print_mac(char *buf, const u8 *addr); #define DECLARE_MAC_BUF(var) char var[18] __maybe_unused diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index e41f4b9..e72993b 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -95,17 +95,6 @@ NETDEVICE_SHOW(type, fmt_dec); NETDEVICE_SHOW(link_mode, fmt_dec); /* use same locking rules as GIFHWADDR ioctl's */ -static ssize_t format_addr(char *buf, const unsigned char *addr, int len) -{ - int i; - char *cp = buf; - - for (i = 0; i len; i++) - cp += sprintf(cp, %02x%c, addr[i], - i == (len - 1) ? '\n' : ':'); - return cp - buf; -} - static ssize_t show_address(struct device *dev, struct device_attribute *attr, char *buf) { @@ -114,7 +103,7 @@ static ssize_t show_address(struct device *dev, struct device_attribute *attr, read_lock(dev_base_lock); if (dev_isalive(net)) - ret = format_addr(buf, net-dev_addr, net-addr_len); + ret = format_mac_addr(buf, net-dev_addr, net-addr_len); read_unlock(dev_base_lock); return ret; } @@ -124,7 +113,7 @@ static ssize_t show_broadcast(struct device *dev, { struct net_device *net = to_net_dev(dev); if (dev_isalive(net)) - return format_addr(buf, net-broadcast, net-addr_len); + return format_mac_addr(buf, net-broadcast, net-addr_len); return -EINVAL; } diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 6b2e454..f760d41 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -359,10 +359,33 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count) } EXPORT_SYMBOL(alloc_etherdev_mq); +static ssize_t _format_mac_addr(char *buf, const unsigned char *addr, int len) +{ + int i; + char *cp = buf; + + for (i = 0; i len; i++) { + cp += sprintf(cp, %02x, addr[i]); + if (i == len - 1) + break; + *cp++ = ':'; + } + return cp - buf; +} + +ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len) +{ + ssize_t l; + + l = _format_mac_addr(buf, addr, len); + strcpy(buf + l, \n); + return l + 1; +} +EXPORT_SYMBOL(format_mac_addr); + char *print_mac(char *buf, const u8 *addr) { - sprintf(buf, MAC_FMT, - addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]); + _format_mac_addr(buf, addr, ETH_ALEN); return buf; } EXPORT_SYMBOL(print_mac); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More
Re: [PATCH 2/2] XFRM: Drop packets when replay counter would overflow
From: Paul Moore [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 09:15:00 -0500 According to RFC4303, section 3.3.3 we need to drop outgoing packets which cause the replay counter to overflow: 3.3.3. Sequence Number Generation The sender's counter is initialized to 0 when an SA is established. The sender increments the sequence number (or ESN) counter for this SA and inserts the low-order 32 bits of the value into the Sequence Number field. Thus, the first packet sent using a given SA will contain a sequence number of 1. If anti-replay is enabled (the default), the sender checks to ensure that the counter has not cycled before inserting the new value in the Sequence Number field. In other words, the sender MUST NOT send a packet on an SA if doing so would cause the sequence number to cycle. An attempt to transmit a packet that would result in sequence number overflow is an auditable event. The audit log entry for this event SHOULD include the SPI value, current date/time, Source Address, Destination Address, and (in IPv6) the cleartext Flow ID. Signed-off-by: Paul Moore [EMAIL PROTECTED] Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] XFRM: RFC4303 compliant auditing
From: Paul Moore [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 09:14:55 -0500 This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events: * Could not find a valid SA [sections 2.1, 3.4.2] . xfrm_audit_state_notfound() . xfrm_audit_state_notfound_simple() * Sequence number overflow [section 3.3.3] . xfrm_audit_state_replay_overflow() * Replayed packet [section 3.4.3] . xfrm_audit_state_replay() * Integrity check failure [sections 3.4.4.1, 3.4.4.2] . xfrm_audit_state_icvfail() While RFC4304 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency. Signed-off-by: Paul Moore [EMAIL PROTECTED] Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ETH]: Combine format_addr() with print_mac().
On Fri, 2007-12-21 at 14:05 -0800, Michael Chan wrote: [ETH]: Combine format_addr() with print_mac(). print_mac() used by most net drivers and format_addr() used by net-sysfs.c are very similar and they can be integrated. format_addr() is also identically redefined in the qla4xxx iscsi driver. diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 6b2e454..f760d41 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -359,10 +359,33 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count) } EXPORT_SYMBOL(alloc_etherdev_mq); +static ssize_t _format_mac_addr(char *buf, const unsigned char *addr, int len) +{ + int i; + char *cp = buf; + + for (i = 0; i len; i++) { + cp += sprintf(cp, %02x, addr[i]); + if (i == len - 1) + break; + *cp++ = ':'; + } + return cp - buf; +} + +ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len) +{ + ssize_t l; + + l = _format_mac_addr(buf, addr, len); + strcpy(buf + l, \n); + return l + 1; +} +EXPORT_SYMBOL(format_mac_addr); + char *print_mac(char *buf, const u8 *addr) { - sprintf(buf, MAC_FMT, - addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]); + _format_mac_addr(buf, addr, ETH_ALEN); return buf; } EXPORT_SYMBOL(print_mac); I think const unsigned char *addr should be const u8 *addr ssize_t? shouldn't it be size_t? Indexing buf by int len is unchecked. That could lead to unintended buffer overruns. Maybe add a buflen argument and use snprintf? I had a patch that added some type-safety to print_mac and prevented this unintended buffer overrun. It seems it wasn't applied. --- Subject: Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG Date: Mon, 24 Sep 2007 10:28:36 -0700 Here is a patch that adds some type safety to print_mac by using a struct print_mac_buf * instead of char *. It also reduces the defconfig vmlinux size by 8 bytes. Signed-off-by: Joe Perches [EMAIL PROTECTED] -- include/linux/if_ether.h | 12 ++-- net/ethernet/eth.c |6 +++--- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h index 57abca1..620d6b1 100644 --- a/include/linux/if_ether.h +++ b/include/linux/if_ether.h @@ -126,7 +126,15 @@ extern struct ctl_table ether_table[]; * Display a 6 byte device address (MAC) in a readable format. */ #define MAC_FMT %02x:%02x:%02x:%02x:%02x:%02x -extern char *print_mac(char *buf, const u8 *addr); -#define DECLARE_MAC_BUF(var) char var[18] __maybe_unused + +struct print_mac_buf { + char formatted_mac_addr[18]; +}; + +#define DECLARE_MAC_BUF(var) \ + struct print_mac_buf __maybe_unused _##var; \ + struct print_mac_buf __maybe_unused *var = _##var + +extern char *print_mac(struct print_mac_buf *buf, const u8 *addr); #endif /* _LINUX_IF_ETHER_H */ diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 2aaf6fa..ad82613 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -338,10 +338,10 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count) } EXPORT_SYMBOL(alloc_etherdev_mq); -char *print_mac(char *buf, const u8 *addr) +char *print_mac(struct print_mac_buf *buf, const u8 *addr) { - sprintf(buf, MAC_FMT, + sprintf(buf-formatted_mac_addr, MAC_FMT, addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]); - return buf; + return buf-formatted_mac_addr; } EXPORT_SYMBOL(print_mac); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BNX2 warning
From: Michael Chan [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 10:34:15 -0800 [BNX2]: Fix compiler warning. Change bnx2_init_napi() to void. Warning was noted by DaveM. Signed-off-by: Michael Chan [EMAIL PROTECTED] Applied, thanks! -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] phylib: add module owner to the mii_bus structure
On Fri, 21 Dec 2007 15:57:31 +0200 Ionut Nicu [EMAIL PROTECTED] wrote: Prevent unloading mii bus driver module when other modules have references to some phydevs on that bus. Added a new member (module owner) to struct mii_bus and added code to increment the mii bus owner module usage count on phy_connect and decrement it on phy_disconnect Set the module owner in the ucc_geth_mdio driver. Signed-off-by: Ionut Nicu [EMAIL PROTECTED] --- drivers/net/phy/phy_device.c |9 - drivers/net/ucc_geth_mii.c |3 +++ include/linux/phy.h |4 3 files changed, 15 insertions(+), 1 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 5b9e175..7dc5480 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -178,6 +178,10 @@ struct phy_device * phy_connect(struct net_device *dev, const char *phy_id, if (phydev-irq 0) phy_start_interrupts(phydev); + /* Increment the usage count of the mii bus owner */ + if (!try_module_get(phydev-bus-owner)) + return ERR_PTR(-EFAULT); Shouldn't you get a handle before the first usage (start_interrupts)? + return phydev; } EXPORT_SYMBOL(phy_connect); @@ -192,9 +196,12 @@ void phy_disconnect(struct phy_device *phydev) phy_stop_interrupts(phydev); phy_stop_machine(phydev); - + phydev-adjust_link = NULL; + /* Decrement the reference count for the mii bus owner */ + module_put(phydev-bus-owner); + phy_detach(phydev); } EXPORT_SYMBOL(phy_disconnect); diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c index a3af4ea..84c7295 100644 --- a/drivers/net/ucc_geth_mii.c +++ b/drivers/net/ucc_geth_mii.c @@ -217,6 +217,9 @@ static int uec_mdio_probe(struct of_device *ofdev, const struct of_device_id *ma } } + /* register ourselves as the owner of this bus */ + new_bus-owner = THIS_MODULE; + err = mdiobus_register(new_bus); if (0 != err) { printk(KERN_ERR %s: Cannot register as MDIO bus\n, diff --git a/include/linux/phy.h b/include/linux/phy.h index 554836e..04ff6a5 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -82,6 +82,10 @@ struct mii_bus { const char *name; int id; void *priv; + + /* The module that owns this bus */ + struct module *owner; + int (*read)(struct mii_bus *bus, int phy_id, int regnum); int (*write)(struct mii_bus *bus, int phy_id, int regnum, u16 val); int (*reset)(struct mii_bus *bus); -- Stephen Hemminger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] [CORE]: datagram: basic memory accounting functions
David Miller wrote: From: Hideo AOKI [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 23:18:54 -0500 Also, the memory accounting is done at different parts in the socket code paths for stream vs. datagram. This is why everything is inconsistent, and, a mess. Could you tell me more detailed information? I think the core thing is that TCP and INET protocols call into the memory accounting internally, either inside their own code paths or with inet_*() helpers. This is versus what we really want is everything happening via generic sk_foo() helpers. If that's what's happening already, great, just consolidate the datagram vs. stream stuff and it should be good. Thank you for the explanation. I'll do my best. Regards, Hideo -- Hitachi Computer Products (America) Inc. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v9 06/18] LSM: Add inet_sys_snd_skb() LSM hook (fwd)
This is part of a large patchset which finally fixes labeled networking, which we're hoping to get into 2.6.25. Thread @ http://thread.gmane.org/gmane.linux.kernel.lsm/4894 The patch below is the only one which is not self-contained impacts on core networking code. If anyone has any objections or comments on this patch, please let us know. -- Forwarded message -- Date: Fri, 21 Dec 2007 12:09:28 -0500 From: Paul Moore [EMAIL PROTECTED] To: [EMAIL PROTECTED], [EMAIL PROTECTED] Cc: [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: [RFC PATCH v9 06/18] LSM: Add inet_sys_snd_skb() LSM hook Add an inet_sys_snd_skb() LSM hook to allow the LSM to provide packet level access control for all outbound packets. Using the existing postroute_last netfilter hook turns out to be problematic as it is can be invoked multiple times for a single packet, e.g. individual IPsec transforms, adding unwanted overhead and complicating the security policy. Signed-off-by: Paul Moore [EMAIL PROTECTED] --- include/linux/security.h | 11 +++ net/ipv4/ip_output.c |7 +++ net/ipv6/ip6_output.c|5 + security/dummy.c |8 +++- security/security.c |6 ++ 5 files changed, 36 insertions(+), 1 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index db19c92..1b8d332 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -876,6 +876,10 @@ struct request_sock; * Sets the connection's peersid to the secmark on skb. * @req_classify_flow: * Sets the flow's sid to the openreq sid. + * @inet_sys_snd_skb: + * Check permissions on outgoing network packets. + * @skb is the packet to check + * @family is the packet's address family * * Security hooks for XFRM operations. * @@ -1416,6 +1420,7 @@ struct security_operations { void (*inet_csk_clone)(struct sock *newsk, const struct request_sock *req); void (*inet_conn_established)(struct sock *sk, struct sk_buff *skb); void (*req_classify_flow)(const struct request_sock *req, struct flowi *fl); + int (*inet_sys_snd_skb)(struct sk_buff *skb, int family); #endif /* CONFIG_SECURITY_NETWORK */ #ifdef CONFIG_SECURITY_NETWORK_XFRM @@ -2328,6 +2333,7 @@ void security_sk_free(struct sock *sk); void security_sk_clone(const struct sock *sk, struct sock *newsk); void security_sk_classify_flow(struct sock *sk, struct flowi *fl); void security_req_classify_flow(const struct request_sock *req, struct flowi *fl); +int security_inet_sys_snd_skb(struct sk_buff *skb, int family); void security_sock_graft(struct sock*sk, struct socket *parent); int security_inet_conn_request(struct sock *sk, struct sk_buff *skb, struct request_sock *req); @@ -2471,6 +2477,11 @@ static inline void security_req_classify_flow(const struct request_sock *req, st { } +static inline int security_inet_sys_snd_skb(struct sk_buff *skb, int family) +{ + return 0; +} + static inline void security_sock_graft(struct sock* sk, struct socket *parent) { } diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index fd99fbd..82a7297 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -204,6 +204,8 @@ static inline int ip_skb_dst_mtu(struct sk_buff *skb) static int ip_finish_output(struct sk_buff *skb) { + int err; + #if defined(CONFIG_NETFILTER) defined(CONFIG_XFRM) /* Policy lookup after SNAT yielded a new policy */ if (skb-dst-xfrm != NULL) { @@ -211,6 +213,11 @@ static int ip_finish_output(struct sk_buff *skb) return dst_output(skb); } #endif + + err = security_inet_sys_snd_skb(skb, AF_INET); + if (err) + return err; + if (skb-len ip_skb_dst_mtu(skb) !skb_is_gso(skb)) return ip_fragment(skb, ip_finish_output2); else diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 6338a9c..44ddf32 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -72,8 +72,13 @@ static __inline__ void ipv6_select_ident(struct sk_buff *skb, struct frag_hdr *f static int ip6_output_finish(struct sk_buff *skb) { + int err; struct dst_entry *dst = skb-dst; + err = security_inet_sys_snd_skb(skb, AF_INET6); + if (err) + return err; + if (dst-hh) return neigh_hh_output(dst-hh, skb); else if (dst-neighbour) diff --git a/security/dummy.c b/security/dummy.c index 0b62f95..384979a 100644 --- a/security/dummy.c +++ b/security/dummy.c @@ -848,6 +848,11 @@ static inline void dummy_req_classify_flow(const struct request_sock *req, struct flowi *fl) { } + +static inline int dummy_inet_sys_snd_skb(struct sk_buff *skb, int family) +{ + return 0; +} #endif /* CONFIG_SECURITY_NETWORK */ #ifdef CONFIG_SECURITY_NETWORK_XFRM @@ -1122,7 +1127,8 @@ void security_fixup_ops (struct security_operations *ops)
[PATCH] Reduce locking in TX path of forcedth driver
Reduce the amount of locking in the TX path. Instead of using both netif_tx_lock and dev-priv-lock on transmitting, a single private lock (dev-priv-tx_lock) is used. This method is similar to that of the e1000 driver, including the logic to stop the queue in the start xmit functions, and the logic to wake the queue in the TX done functions. We see some performance improvement with this patch. Signed-off-by: Tom Herbert [EMAIL PROTECTED] --- linux-2.6/drivers/net/forcedeth.c.orig 2007-12-21 16:26:15.743639000 -0800 +++ linux-2.6/drivers/net/forcedeth.c 2007-12-21 16:51:19.001325000 -0800 @@ -525,6 +525,12 @@ union ring_type { #define RING_MAX_DESC_VER_11024 #define RING_MAX_DESC_VER_2_3 16384 +/* + * Maxmimum number of fragments that a single packet could need in + * transmit. + */ +#defineMAX_TX_FRAGS(MAX_SKB_FRAGS + (65535 NV_TX2_TSO_MAX_SHIFT)) + /* rx/tx mac addr + type + vlan + align + slack*/ #define NV_RX_HEADERS (64) /* even more slack. */ @@ -738,9 +744,8 @@ struct nv_skb_map { * critical parts: * - rx is (pseudo-) lockless: it relies on the single-threading provided * by the arch code for interrupts. - * - tx setup is lockless: it relies on netif_tx_lock. Actual submission - * needs dev-priv-lock :-( - * - set_multicast_list: preparation lockless, relies on netif_tx_lock. + * - tx uses dev-priv-tx_lock and not netif_tx_lock. TX done processing + * only acquires dev-priv-tx_lock when the queue needs to be awoken. */ /* in dev: base, irq */ @@ -806,6 +811,7 @@ struct fe_priv { /* * tx specific fields. */ + spinlock_t tx_lock; union ring_type get_tx, put_tx, first_tx, last_tx; struct nv_skb_map *get_tx_ctx, *put_tx_ctx; struct nv_skb_map *first_tx_ctx, *last_tx_ctx; @@ -814,7 +820,6 @@ struct fe_priv { union ring_type tx_ring; u32 tx_flags; int tx_ring_size; - int tx_stop; /* vlan fields */ struct vlan_group *vlangrp; @@ -1775,9 +1780,16 @@ static inline u32 nv_get_empty_tx_slots( return (u32)(np-tx_ring_size - ((np-tx_ring_size + (np-put_tx_ctx - np-get_tx_ctx)) % np-tx_ring_size)); } +static inline u32 nv_get_used_tx_slots(struct fe_priv *np) +{ + return (u32)((np-tx_ring_size + (np-put_tx_ctx - np-get_tx_ctx)) % +np-tx_ring_size); +} + +#defineTX_WAKE_THRESHOLD(np) ((np)-tx_ring_size / 4) + /* * nv_start_xmit: dev-hard_start_xmit function - * Called with netif_tx_lock held. */ static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev) { @@ -1790,7 +1802,7 @@ static int nv_start_xmit(struct sk_buff u32 bcnt; u32 size = skb-len-skb-data_len; u32 entries = (size NV_TX2_TSO_MAX_SHIFT) + ((size (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0); - u32 empty_slots; + unsigned long irq_flags; struct ring_desc* put_tx; struct ring_desc* start_tx; struct ring_desc* prev_tx; @@ -1802,12 +1814,22 @@ static int nv_start_xmit(struct sk_buff ((skb_shinfo(skb)-frags[i].size (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0); } - empty_slots = nv_get_empty_tx_slots(np); - if (unlikely(empty_slots = entries)) { - spin_lock_irq(np-lock); - netif_stop_queue(dev); - np-tx_stop = 1; - spin_unlock_irq(np-lock); + local_irq_save(irq_flags); + if (!spin_trylock(np-tx_lock)) { + /* Collision - tell upper layer to requeue */ + local_irq_restore(irq_flags); + return NETDEV_TX_LOCKED; + } + + if (unlikely(nv_get_empty_tx_slots(np) = entries)) { + if (!netif_queue_stopped(dev)) { + netif_stop_queue(dev); + + /* This is a hard error, log it. */ + printk(KERN_ERR %s: BUG! Tx Ring full when + queue awake\n, dev-name); + } + spin_unlock_irqrestore(np-tx_lock, irq_flags); return NETDEV_TX_BUSY; } @@ -1858,6 +1880,12 @@ static int nv_start_xmit(struct sk_buff } while (size); } + /* +* Make put_tx_ctx visible to nx_tx_done as soon as possible, +* this might avoid an unnecessary queue wakeup. +*/ + smp_mb(); + /* set last fragment flag */ prev_tx-flaglen |= cpu_to_le32(tx_flags_extra); @@ -1870,14 +1898,10 @@ static int nv_start_xmit(struct sk_buff tx_flags_extra = skb-ip_summed == CHECKSUM_PARTIAL ? NV_TX2_CHECKSUM_L3 | NV_TX2_CHECKSUM_L4 : 0; - spin_lock_irq(np-lock); - /* set tx flags */ start_tx-flaglen |= cpu_to_le32(tx_flags | tx_flags_extra); np-put_tx.orig = put_tx; - spin_unlock_irq(np-lock); - dprintk(KERN_DEBUG %s: nv_start_xmit: entries %d queued for
[no subject]
Subject: [XFRM] Documentaion: Fix error example at XFRMOUTSTATEMODEERROR. (Re: [XFRM]: Fix outbound statistics.) Hello, On Fri, 21 Dec 2007 23:11:11 +0800 Herbert Xu [EMAIL PROTECTED] wrote: On Fri, Dec 21, 2007 at 11:25:00PM +0900, Masahide NAKAMURA wrote: do { err = xfrm_state_check_space(x, skb); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR); goto error_nolock; + } err = x-outer_mode-output(x, skb); - if (err) + if (err) { + XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR); BTW, none of our existing mode output functions actually return an error. I noticed that the description for this field is actually Transformation mode specific error, e.g. Outer header space is not enough. This is slightly misleading as output header space is checked by xfrm_state_check_space so if there's an error that's where it'll show up. Thanks for comment, Herbert. I fix the documentation to remove e.g. Outer header space is not enough from XFRMSTATEMODEERROR. About error code from xfrm_state_check_space(), I still map it XFRMOUTERROR (other errors) this time because I think the error here is not a length error by protocol (e.g MTU related things) but an internal buffer management. Any comments for the statistics are still welcomed. David, please apply the following patch, too. [XFRM] Documentaion: Fix error example at XFRMOUTSTATEMODEERROR. Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED] --- Documentation/networking/xfrm_proc.txt |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/Documentation/networking/xfrm_proc.txt b/Documentation/networking/xfrm_proc.txt index ec9045b..53c1a58 100644 --- a/Documentation/networking/xfrm_proc.txt +++ b/Documentation/networking/xfrm_proc.txt @@ -60,7 +60,6 @@ XfrmOutStateProtoError: Transformation protocol specific error XfrmOutStateModeError: Transformation mode specific error - e.g. Outer header space is not enough XfrmOutStateExpired: State is expired XfrmOutPolBlock: -- 1.4.4.2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Reduce locking in TX path of forcedth driver
On Fri, 21 Dec 2007 17:41:34 -0800 (PST) [EMAIL PROTECTED] (Tom Herbert) wrote: Reduce the amount of locking in the TX path. Instead of using both netif_tx_lock and dev-priv-lock on transmitting, a single private lock (dev-priv-tx_lock) is used. This method is similar to that of the e1000 driver, including the logic to stop the queue in the start xmit functions, and the logic to wake the queue in the TX done functions. We see some performance improvement with this patch. Signed-off-by: Tom Herbert [EMAIL PROTECTED] + spin_lock_init(np-tx_lock); + + dev-features |= NETIF_F_LLTX; + NAK - lockless transmit is not desirable for real devices. use netif_tx_lock() instead of your private lock -- Stephen Hemminger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ETH]: Combine format_addr() with print_mac().
On Fri, 2007-12-21 at 14:36 -0800, Joe Perches wrote: On Fri, 2007-12-21 at 14:05 -0800, Michael Chan wrote: diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 6b2e454..f760d41 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -359,10 +359,33 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count) } EXPORT_SYMBOL(alloc_etherdev_mq); +static ssize_t _format_mac_addr(char *buf, const unsigned char *addr, int len) +{ + int i; + char *cp = buf; + + for (i = 0; i len; i++) { + cp += sprintf(cp, %02x, addr[i]); + if (i == len - 1) + break; + *cp++ = ':'; + } + return cp - buf; +} + +ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len) +{ + ssize_t l; + + l = _format_mac_addr(buf, addr, len); + strcpy(buf + l, \n); + return l + 1; +} +EXPORT_SYMBOL(format_mac_addr); + char *print_mac(char *buf, const u8 *addr) { - sprintf(buf, MAC_FMT, - addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]); + _format_mac_addr(buf, addr, ETH_ALEN); return buf; } EXPORT_SYMBOL(print_mac); I think const unsigned char *addr should be const u8 *addr The dev_addr is declared as unsigned char* in struct net_device. To be consistent, can we change print_mac() and MAC_BUF to use unsigned char*? They are really the same. ssize_t? shouldn't it be size_t? I'm just keeping the prototype unchanged as originally defined in net- sysfs.c Indexing buf by int len is unchecked. That could lead to unintended buffer overruns. Maybe add a buflen argument and use snprintf? Again, I kept the semantics the same as the original, but will be happy to add a buflen for better checking. I had a patch that added some type-safety to print_mac and prevented this unintended buffer overrun. It seems it wasn't applied. --- Subject: Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG Date: Mon, 24 Sep 2007 10:28:36 -0700 Here is a patch that adds some type safety to print_mac by using a struct print_mac_buf * instead of char *. It also reduces the defconfig vmlinux size by 8 bytes. Signed-off-by: Joe Perches [EMAIL PROTECTED] -- include/linux/if_ether.h | 12 ++-- net/ethernet/eth.c |6 +++--- 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h index 57abca1..620d6b1 100644 --- a/include/linux/if_ether.h +++ b/include/linux/if_ether.h @@ -126,7 +126,15 @@ extern struct ctl_table ether_table[]; * Display a 6 byte device address (MAC) in a readable format. */ #define MAC_FMT %02x:%02x:%02x:%02x:%02x:%02x -extern char *print_mac(char *buf, const u8 *addr); -#define DECLARE_MAC_BUF(var) char var[18] __maybe_unused + +struct print_mac_buf { + char formatted_mac_addr[18]; +}; + +#define DECLARE_MAC_BUF(var) \ + struct print_mac_buf __maybe_unused _##var; \ + struct print_mac_buf __maybe_unused *var = _##var + +extern char *print_mac(struct print_mac_buf *buf, const u8 *addr); #endif /* _LINUX_IF_ETHER_H */ diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 2aaf6fa..ad82613 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -338,10 +338,10 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count) } EXPORT_SYMBOL(alloc_etherdev_mq); -char *print_mac(char *buf, const u8 *addr) +char *print_mac(struct print_mac_buf *buf, const u8 *addr) { - sprintf(buf, MAC_FMT, + sprintf(buf-formatted_mac_addr, MAC_FMT, addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]); - return buf; + return buf-formatted_mac_addr; } EXPORT_SYMBOL(print_mac); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html