Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
From: Herbert Xu [EMAIL PROTECTED] Date: Sat, 15 Dec 2007 21:58:52 +0800 [SNMP]: Fix SNMP counters with PREEMPT The SNMP macros use raw_smp_processor_id() in process context which is illegal because the process may be preempted and then migrated to another CPU. This patch makes it use get_cpu/put_cpu to disable preemption. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Applied to net-2.6.25, thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
| | I didn't hear any objections so here is the patch again. | | [SNMP]: Fix SNMP counters with PREEMPT | I don't feel competent to say whether this is correct, but it seems that this is going a long way towards resolving older problems with the SNMP counters. What I can say is that a year ago, when doing the UDP/-Lite work, I found that the counters were not accurate, which was due to the double-counting problem that Wang Chen brought up again. Resolving this issue was stalled at that time due to the fact that a counter may be incremented on one CPU and then later decremented on another. It looks as if this work is reaching the cause of the problem, which would be good since the problem propagates to all users of such counters (UDP, UDP-Lite, and similar MIB counters). So thank you for taking this issue up, I hope that this will lead to a patch set resolving the counter issues. Best regards, Gerrit -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Tue, Dec 18, 2007 at 11:43:44AM +, Gerrit Renker wrote: It looks as if this work is reaching the cause of the problem, which would be good since the problem propagates to all users of such counters (UDP, UDP-Lite, and similar MIB counters). So thank you for taking this issue up, I hope that this will lead to a patch set resolving the counter issues. Heh the counter issue is what led to this patch. I think the UDP counters should be fixed already. Please have a look at net-2.6.25 and let us know if it's still broken. As to the SNMP counter issue in general, Christoph Lameter's new percpu code will greatly simplify this and we won't need to disable preemption or do the fancy softirq/user splitting that we have right now. I eagerly await its acceptance into the Linux kernel. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
The cpu alloc patches also fix this issue one way (disabling preempt) or the other (atomic instruction that does not need disabling of preeemption). -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Sun, 16 Dec 2007, Herbert Xu wrote: If we can get the address of the per-cpu counter against some sort of a per-cpu base pointer, e.g., %gs on x86, then we can do incq%gs:(%rax) where %rax would be the offset with %gs as the base. This would obviate the need for the CPU ID and therefore avoid disabling preemption. Hmm, wasn't Christoph working on something like that? Yes that is what the cpu alloc patchset implements. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
* Herbert Xu [EMAIL PROTECTED] wrote: On Sat, Dec 15, 2007 at 07:43:28PM +0100, Ingo Molnar wrote: we could perhaps introduce stat_smp_processor_id(), which signals that the CPU id is used for statistical purposes and does not have to be exact? In any case, your patch looks good too. Unfortunately that doesn't work because we can then have two CPUs trying to update the same counter which may corrupt it. ah, indeed. I missed that correctness aspect of your patch. Good catch! Ingo -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
Ob Tue, Dec 04, 2007 at 12:17:23AM +1100, Herbert Xu wrote: Never mind, we already have that in local_t and as Alexey correctly points out, USER is still going to be the expensive variant with the preempt_disable (well until BH gets threaded). So how about this patch? I didn't hear any objections so here is the patch again. [SNMP]: Fix SNMP counters with PREEMPT The SNMP macros use raw_smp_processor_id() in process context which is illegal because the process may be preempted and then migrated to another CPU. This patch makes it use get_cpu/put_cpu to disable preemption. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/include/net/snmp.h b/include/net/snmp.h index ea206bf..9444b54 100644 --- a/include/net/snmp.h +++ b/include/net/snmp.h @@ -23,6 +23,7 @@ #include linux/cache.h #include linux/snmp.h +#include linux/smp.h /* * Mibs are stored in array of unsigned long. @@ -137,7 +138,10 @@ struct linux_mib { #define SNMP_INC_STATS_OFFSET_BH(mib, field, offset) \ (per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field + (offset)]++) #define SNMP_INC_STATS_USER(mib, field) \ - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field]++) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field]++; \ + put_cpu(); \ + } while (0) #define SNMP_INC_STATS(mib, field) \ (per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())-mibs[field]++) #define SNMP_DEC_STATS(mib, field) \ @@ -145,6 +149,9 @@ struct linux_mib { #define SNMP_ADD_STATS_BH(mib, field, addend) \ (per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field] += addend) #define SNMP_ADD_STATS_USER(mib, field, addend)\ - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field] += addend) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field] += addend; \ + put_cpu(); \ + } while (0) #endif -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
Herbert Xu a écrit : Ob Tue, Dec 04, 2007 at 12:17:23AM +1100, Herbert Xu wrote: Never mind, we already have that in local_t and as Alexey correctly points out, USER is still going to be the expensive variant with the preempt_disable (well until BH gets threaded). So how about this patch? I didn't hear any objections so here is the patch again. [SNMP]: Fix SNMP counters with PREEMPT The SNMP macros use raw_smp_processor_id() in process context which is illegal because the process may be preempted and then migrated to another CPU. This patch makes it use get_cpu/put_cpu to disable preemption. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Cheers, #define SNMP_INC_STATS_USER(mib, field) \ - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field]++) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field]++; \ + put_cpu(); \ + } while (0) #define SNMP_INC_STATS(mib, field) \ (per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())-mibs[field]++) How come you change SNMP_INC_STATS_USER() but not SNMP_INC_STATS() ? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
* Herbert Xu [EMAIL PROTECTED] wrote: Ob Tue, Dec 04, 2007 at 12:17:23AM +1100, Herbert Xu wrote: Never mind, we already have that in local_t and as Alexey correctly points out, USER is still going to be the expensive variant with the preempt_disable (well until BH gets threaded). So how about this patch? I didn't hear any objections so here is the patch again. [SNMP]: Fix SNMP counters with PREEMPT The SNMP macros use raw_smp_processor_id() in process context which is illegal because the process may be preempted and then migrated to another CPU. nit: please use 'invalid' instead of 'illegal'. This patch makes it use get_cpu/put_cpu to disable preemption. Signed-off-by: Herbert Xu [EMAIL PROTECTED] - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field]++) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field]++; \ + put_cpu(); \ + } while (0) - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field] += addend) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field] += addend; \ + put_cpu(); \ + } while (0) we could perhaps introduce stat_smp_processor_id(), which signals that the CPU id is used for statistical purposes and does not have to be exact? In any case, your patch looks good too. Ingo -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Sat, Dec 15, 2007 at 07:43:28PM +0100, Ingo Molnar wrote: we could perhaps introduce stat_smp_processor_id(), which signals that the CPU id is used for statistical purposes and does not have to be exact? In any case, your patch looks good too. Unfortunately that doesn't work because we can then have two CPUs trying to update the same counter which may corrupt it. However, an optimisation is indeed possible with some work. If we can get the address of the per-cpu counter against some sort of a per-cpu base pointer, e.g., %gs on x86, then we can do incq%gs:(%rax) where %rax would be the offset with %gs as the base. This would obviate the need for the CPU ID and therefore avoid disabling preemption. Hmm, wasn't Christoph working on something like that? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote: So we need to fix this, and whatever the fix is will probably render the BH/USER distinction obsolete. Hmm, I would think opposite. USER (or generic) is expensive variant, BH is lite. No? Alexey -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 02:49:12PM +0300, Alexey Kuznetsov wrote: On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote: So we need to fix this, and whatever the fix is will probably render the BH/USER distinction obsolete. Hmm, I would think opposite. USER (or generic) is expensive variant, BH is lite. No? Yes that would certainly be the obvious fix. In other words, just use smp_processor_id instead of the raw version. I suppose the only issue would be that disabling/enabling preemption isn't exactly cost-free and we're going to be doing that for every single increment. Hmm, wasn't someone else talking about a non-atomic version of atomic ops lately (i.e., atomic with respect to the local CPU only)? Perhaps this is the killer app for it :) Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 03:19:35PM +0800, Wang Chen wrote: System calls should be USER. So change the BH to USER for UDP*_INC_STATS_BH(). Signed-off-by: Wang Chen [EMAIL PROTECTED] I've rearragned it so that we the new INC_STATS call is USER in the first patch. Otherwise it's all in net-2.6.25 now. BTW, thanks to Gerrit's note I took a look at the underlying code for the _BH/_USER stuff. It's in fact totally broken when PREEMPT is on. It relies on the fact that process-context kernel code does not get preempted by other process-context kernel code or for that matter migrate to other CPUs, neither of which is true with PREEMPT on. So we need to fix this, and whatever the fix is will probably render the BH/USER distinction obsolete. Any takers? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 10:54:35PM +1100, Herbert Xu wrote: Hmm, wasn't someone else talking about a non-atomic version of atomic ops lately (i.e., atomic with respect to the local CPU only)? Perhaps this is the killer app for it :) Never mind, we already have that in local_t and as Alexey correctly points out, USER is still going to be the expensive variant with the preempt_disable (well until BH gets threaded). So how about this patch? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/include/net/snmp.h b/include/net/snmp.h index ea206bf..9444b54 100644 --- a/include/net/snmp.h +++ b/include/net/snmp.h @@ -23,6 +23,7 @@ #include linux/cache.h #include linux/snmp.h +#include linux/smp.h /* * Mibs are stored in array of unsigned long. @@ -137,7 +138,10 @@ struct linux_mib { #define SNMP_INC_STATS_OFFSET_BH(mib, field, offset) \ (per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field + (offset)]++) #define SNMP_INC_STATS_USER(mib, field) \ - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field]++) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field]++; \ + put_cpu(); \ + } while (0) #define SNMP_INC_STATS(mib, field) \ (per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())-mibs[field]++) #define SNMP_DEC_STATS(mib, field) \ @@ -145,6 +149,9 @@ struct linux_mib { #define SNMP_ADD_STATS_BH(mib, field, addend) \ (per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field] += addend) #define SNMP_ADD_STATS_USER(mib, field, addend)\ - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field] += addend) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field] += addend; \ + put_cpu(); \ + } while (0) #endif -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
Herbert Xu said the following on 2007-12-3 21:17: On Mon, Dec 03, 2007 at 10:54:35PM +1100, Herbert Xu wrote: diff --git a/include/net/snmp.h b/include/net/snmp.h index ea206bf..9444b54 100644 --- a/include/net/snmp.h +++ b/include/net/snmp.h @@ -23,6 +23,7 @@ #include linux/cache.h #include linux/snmp.h +#include linux/smp.h It's no need to include smp.h? -- WCN -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Tue, Dec 04, 2007 at 11:50:55AM +0800, Wang Chen wrote: #include linux/cache.h #include linux/snmp.h +#include linux/smp.h It's no need to include smp.h? We need it for smp_processor_id. Well we needed it before too but there's probably some implicit inclusion which has made it work. It's better to declare these inclusions explicitly as otherwise they may break on less common architectures later. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
Herbert Xu said the following on 2007-12-1 9:54: On Fri, Nov 30, 2007 at 11:19:49AM +, Gerrit Renker wrote: | csum_copy_err: | - UDP6_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite); | + UDP6_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite); |skb_kill_datagram(sk, skb, flags); | |if (flags MSG_DONTWAIT) | Is it not the other way round ?? :- I agree. Wang Chen, please change this and other appropriate BH calls to USER. Basically recvmsg should always be USER while rcv is BH. Resubmit the patch. System calls should be USER. So change the BH to USER for UDP*_INC_STATS_BH(). Signed-off-by: Wang Chen [EMAIL PROTECTED] --- ipv4/udp.c |4 ++-- ipv6/udp.c |2 +- 2 files changed, 3 insertions(+), 3 deletions(-) --- linux-2.6.24.rc3.org/net/ipv4/udp.c 2007-12-03 15:10:49.0 +0800 +++ linux-2.6.24.rc3/net/ipv4/udp.c 2007-12-03 14:52:54.0 +0800 @@ -874,7 +874,7 @@ try_again: if (err) goto out_free; - UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite); + UDP_INC_STATS_USER(UDP_MIB_INDATAGRAMS, is_udplite); sock_recv_timestamp(msg, sk, skb); @@ -899,7 +899,7 @@ out: return err; csum_copy_err: - UDP_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite); + UDP_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite); skb_kill_datagram(sk, skb, flags); --- linux-2.6.24.rc3.org/net/ipv6/udp.c 2007-12-03 15:10:49.0 +0800 +++ linux-2.6.24.rc3/net/ipv6/udp.c 2007-12-03 15:09:55.0 +0800 @@ -164,7 +164,7 @@ try_again: if (err) goto out_free; - UDP6_INC_STATS_BH(UDP_MIB_INDATAGRAMS, is_udplite); + UDP6_INC_STATS_USER(UDP_MIB_INDATAGRAMS, is_udplite); sock_recv_timestamp(msg, sk, skb); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
Quoting Wang Chen: | (This patch base on PATCH 2/3.) | | UDP_MIB_INERRORS increment is in different way for IPv4 | and IPv6. For the consistence, change UDP6_INC_STATS_USER | to UDP6_INC_STATS_BH. | | Signed-off-by: Wang Chen [EMAIL PROTECTED] | --- | udp.c |2 +- | 1 files changed, 1 insertion(+), 1 deletion(-) | | --- linux-2.6.24.rc3.org/net/ipv6/udp.c 2007-11-30 10:38:36.0 +0800 | +++ linux-2.6.24.rc3/net/ipv6/udp.c 2007-11-30 10:39:01.0 +0800 | @@ -207,7 +207,7 @@ out: | return err; | | csum_copy_err: | - UDP6_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite); | + UDP6_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite); | skb_kill_datagram(sk, skb, flags); | | if (flags MSG_DONTWAIT) | Is it not the other way round ?? :- struct proto udp{,v6}_prot = { // ... .recvmsg = udp{,v6}_recvmsg, }; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Fri, Nov 30, 2007 at 11:19:49AM +, Gerrit Renker wrote: | csum_copy_err: | - UDP6_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite); | + UDP6_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite); | skb_kill_datagram(sk, skb, flags); | | if (flags MSG_DONTWAIT) | Is it not the other way round ?? :- I agree. Wang Chen, please change this and other appropriate BH calls to USER. Basically recvmsg should always be USER while rcv is BH. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] [UDP6]: Counter increment on BH mode
(This patch base on PATCH 2/3.) UDP_MIB_INERRORS increment is in different way for IPv4 and IPv6. For the consistence, change UDP6_INC_STATS_USER to UDP6_INC_STATS_BH. Signed-off-by: Wang Chen [EMAIL PROTECTED] --- udp.c |2 +- 1 files changed, 1 insertion(+), 1 deletion(-) --- linux-2.6.24.rc3.org/net/ipv6/udp.c 2007-11-30 10:38:36.0 +0800 +++ linux-2.6.24.rc3/net/ipv6/udp.c 2007-11-30 10:39:01.0 +0800 @@ -207,7 +207,7 @@ out: return err; csum_copy_err: - UDP6_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite); + UDP6_INC_STATS_BH(UDP_MIB_INERRORS, is_udplite); skb_kill_datagram(sk, skb, flags); if (flags MSG_DONTWAIT) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html