Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On Mon, 2017-08-21 at 17:10 -0700, Florian Fainelli wrote: > Well if we return 1 from atomic_inc_return() and the previous value was > zero, of course we are going to be bugging here. The idea behind the > patch I suppose is to make sure that we always get an odd number upon > u64_stats_update_begin()/entry, and an even number upon > u64_stats_update_end()/exit, right? Yes, this is right.
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On 08/21/2017 04:23 PM, Florian Fainelli wrote: > On 08/04/2017 10:11 AM, Eric Dumazet wrote: >> On Fri, 2017-08-04 at 08:51 -0700, Florian Fainelli wrote: >>> On 08/03/2017 10:36 PM, Eric Dumazet wrote: On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote: > During testing with a background iperf pushing 1Gbit/sec worth of > traffic and having both ifconfig and ethtool collect statistics, we > could see quite frequent deadlocks. Convert the often accessed DSA slave > network devices statistics to per-cpu 64-bit statistics to remove these > deadlocks and provide fast efficient statistics updates. > This seems to be a bug fix, it would be nice to get a proper tag like : Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") >>> >>> Right, should have been added, thanks! >>> Problem here is that if multiple cpus can call dsa_switch_rcv() at the same time, then u64_stats_update_begin() contract is not respected. >>> >>> This is really where I struggled understanding what is wrong in the >>> non-per CPU version, my understanding is that we have: >>> >>> - writers for xmit executes in process context >>> - writers for receive executes from NAPI (from the DSA's master network >>> device through it's own NAPI doing netif_receive_skb -> netdev_uses_dsa >>> -> netif_receive_skb) >>> >>> readers should all execute in process context. The test scenario that >>> led to a deadlock involved running iperf in the background, having a >>> while loop with both ifconfig and ethtool reading stats, and somehow >>> when iperf exited, either reader would just be locked. So I guess this >>> leaves us with the two writers not being mutually excluded then, right? >> >> You could add a debug version of u64_stats_update_begin() >> >> doing >> >> int ret = atomic_inc((atomic_t *)syncp); >> >> BUG_ON(ret & 1);> >> >> And u64_stats_update_end() >> >> int ret = atomic_inc((atomic_t *)syncp); > > so with your revised suggested patch: > > static inline void u64_stats_update_begin(struct u64_stats_sync *syncp) > { > #if BITS_PER_LONG==32 && defined(CONFIG_SMP) > int ret = atomic_inc_return((atomic_t *)syncp); > BUG_ON(ret & 1); > #endif > #if 0 > #if BITS_PER_LONG==32 && defined(CONFIG_SMP) > write_seqcount_begin(&syncp->seq); > #endif > #endif > } > > static inline void u64_stats_update_end(struct u64_stats_sync *syncp) > { > #if BITS_PER_LONG==32 && defined(CONFIG_SMP) > int ret = atomic_inc_return((atomic_t *)syncp); > BUG_ON(!(ret & 1)); > #endif > #if 0 > #if BITS_PER_LONG==32 && defined(CONFIG_SMP) > write_seqcount_end(&syncp->seq); > #endif > #endif > } > > and this makes us choke pretty early in IRQ accounting, did I get your > suggestion right? Well if we return 1 from atomic_inc_return() and the previous value was zero, of course we are going to be bugging here. The idea behind the patch I suppose is to make sure that we always get an odd number upon u64_stats_update_begin()/entry, and an even number upon u64_stats_update_end()/exit, right? > > [0.015149] [ cut here ] > [0.020051] kernel BUG at ./include/linux/u64_stats_sync.h:82! > [0.026221] Internal error: Oops - BUG: 0 [#1] SMP ARM > [0.031661] Modules linked in: > [0.034970] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 4.13.0-rc5-01297-g7d3f0cd43fee-dirty #33 > [0.043990] Hardware name: Broadcom STB (Flattened Device Tree) > [0.050237] task: c180a500 task.stack: c180 > [0.055065] PC is at irqtime_account_delta+0xa4/0xa8 > [0.060322] LR is at 0x1 > [0.063057] pc : []lr : [<0001>]psr: 01d3 > [0.069652] sp : c1801eec ip : ee78b458 fp : c0e5ea48 > [0.075212] r10: c18b4b40 r9 : f0803000 r8 : ee00a800 > [0.080781] r7 : 0001 r6 : c180a500 r5 : c180 r4 : > [0.087680] r3 : r2 : ec8c r1 : ee78b3c0 r0 : ee78b440 > [0.094546] Flags: nzcv IRQs off FIQs off Mode SVC_32 ISA ARM > Segment user > [0.102314] Control: 30c5387d Table: 3000 DAC: fffd > [0.108414] Process swapper/0 (pid: 0, stack limit = 0xc1800210) > [0.114791] Stack: (0xc1801eec to 0xc1802000) > [0.119431] 1ee0:ee78b440 c180 > c180a500 0001 c02505c8 > [0.128079] 1f00: 0004 ee00a800 e000 > c0227890 c17e6f20 c0278910 > [0.136665] 1f20: c185724c c18079a0 f080200c c1801f58 f0802000 > c0201494 c0e00c18 2053 > [0.145303] 1f40: c1801f8c c180 c18b4b40 > c020d238 001f > [0.153915] 1f60: 00040d00 efffc940 c18b4b40 > c1807440 > [0.162571] 1f80: c18b4b40 c0e5ea48 0004 c1801fa8 c0322fb0 > c0e00c18 2053 > [0.171226] 1fa0: c18b4b40 > c0e006c0 > [0.179890] 1fc0: c1807448 c0e5ea48 >
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On 08/04/2017 10:11 AM, Eric Dumazet wrote: > On Fri, 2017-08-04 at 08:51 -0700, Florian Fainelli wrote: >> On 08/03/2017 10:36 PM, Eric Dumazet wrote: >>> On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote: During testing with a background iperf pushing 1Gbit/sec worth of traffic and having both ifconfig and ethtool collect statistics, we could see quite frequent deadlocks. Convert the often accessed DSA slave network devices statistics to per-cpu 64-bit statistics to remove these deadlocks and provide fast efficient statistics updates. >>> >>> This seems to be a bug fix, it would be nice to get a proper tag like : >>> >>> Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") >> >> Right, should have been added, thanks! >> >>> >>> Problem here is that if multiple cpus can call dsa_switch_rcv() at the >>> same time, then u64_stats_update_begin() contract is not respected. >> >> This is really where I struggled understanding what is wrong in the >> non-per CPU version, my understanding is that we have: >> >> - writers for xmit executes in process context >> - writers for receive executes from NAPI (from the DSA's master network >> device through it's own NAPI doing netif_receive_skb -> netdev_uses_dsa >> -> netif_receive_skb) >> >> readers should all execute in process context. The test scenario that >> led to a deadlock involved running iperf in the background, having a >> while loop with both ifconfig and ethtool reading stats, and somehow >> when iperf exited, either reader would just be locked. So I guess this >> leaves us with the two writers not being mutually excluded then, right? > > You could add a debug version of u64_stats_update_begin() > > doing > > int ret = atomic_inc((atomic_t *)syncp); > > BUG_ON(ret & 1);> > > And u64_stats_update_end() > > int ret = atomic_inc((atomic_t *)syncp); so with your revised suggested patch: static inline void u64_stats_update_begin(struct u64_stats_sync *syncp) { #if BITS_PER_LONG==32 && defined(CONFIG_SMP) int ret = atomic_inc_return((atomic_t *)syncp); BUG_ON(ret & 1); #endif #if 0 #if BITS_PER_LONG==32 && defined(CONFIG_SMP) write_seqcount_begin(&syncp->seq); #endif #endif } static inline void u64_stats_update_end(struct u64_stats_sync *syncp) { #if BITS_PER_LONG==32 && defined(CONFIG_SMP) int ret = atomic_inc_return((atomic_t *)syncp); BUG_ON(!(ret & 1)); #endif #if 0 #if BITS_PER_LONG==32 && defined(CONFIG_SMP) write_seqcount_end(&syncp->seq); #endif #endif } and this makes us choke pretty early in IRQ accounting, did I get your suggestion right? [0.015149] [ cut here ] [0.020051] kernel BUG at ./include/linux/u64_stats_sync.h:82! [0.026221] Internal error: Oops - BUG: 0 [#1] SMP ARM [0.031661] Modules linked in: [0.034970] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc5-01297-g7d3f0cd43fee-dirty #33 [0.043990] Hardware name: Broadcom STB (Flattened Device Tree) [0.050237] task: c180a500 task.stack: c180 [0.055065] PC is at irqtime_account_delta+0xa4/0xa8 [0.060322] LR is at 0x1 [0.063057] pc : []lr : [<0001>]psr: 01d3 [0.069652] sp : c1801eec ip : ee78b458 fp : c0e5ea48 [0.075212] r10: c18b4b40 r9 : f0803000 r8 : ee00a800 [0.080781] r7 : 0001 r6 : c180a500 r5 : c180 r4 : [0.087680] r3 : r2 : ec8c r1 : ee78b3c0 r0 : ee78b440 [0.094546] Flags: nzcv IRQs off FIQs off Mode SVC_32 ISA ARM Segment user [0.102314] Control: 30c5387d Table: 3000 DAC: fffd [0.108414] Process swapper/0 (pid: 0, stack limit = 0xc1800210) [0.114791] Stack: (0xc1801eec to 0xc1802000) [0.119431] 1ee0:ee78b440 c180 c180a500 0001 c02505c8 [0.128079] 1f00: 0004 ee00a800 e000 c0227890 c17e6f20 c0278910 [0.136665] 1f20: c185724c c18079a0 f080200c c1801f58 f0802000 c0201494 c0e00c18 2053 [0.145303] 1f40: c1801f8c c180 c18b4b40 c020d238 001f [0.153915] 1f60: 00040d00 efffc940 c18b4b40 c1807440 [0.162571] 1f80: c18b4b40 c0e5ea48 0004 c1801fa8 c0322fb0 c0e00c18 2053 [0.171226] 1fa0: c18b4b40 c0e006c0 [0.179890] 1fc0: c1807448 c0e5ea48 c18b4dd4 c180745c c0e5ea44 [0.188546] 1fe0: c180c0d0 7000 420f00f3 8090 [0.197165] [] (irqtime_account_delta) from [] (irqtime_account_irq+0xc0/0xc4) [0.206664] [] (irqtime_account_irq) from [] (irq_exit+0x28/0x154) [0.215012] [] (irq_exit) from [] (__handle_domain_irq+0x60/0xb4) [0.223245] [] (__handle_domain_irq) from [] (gic_handle_irq+0x48/0x8c) [0.232035] [] (gic_handle_irq) from [] (__irq_svc+0x58/0x74) [0.239941] Exception stack(0xc1801f58 to 0xc
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
From: Florian Fainelli Date: Thu, 3 Aug 2017 21:33:27 -0700 > During testing with a background iperf pushing 1Gbit/sec worth of > traffic and having both ifconfig and ethtool collect statistics, we > could see quite frequent deadlocks. Convert the often accessed DSA slave > network devices statistics to per-cpu 64-bit statistics to remove these > deadlocks and provide fast efficient statistics updates. > > Signed-off-by: Florian Fainelli Applied with appropriate Fixes: tag added. Thanks.
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On Fri, 2017-08-04 at 10:11 -0700, Eric Dumazet wrote: > You could add a debug version of u64_stats_update_begin() > > doing > > int ret = atomic_inc((atomic_t *)syncp); I meant atomic_inc_return() of course. > > BUG_ON(ret & 1); > > > And u64_stats_update_end() > > int ret = atomic_inc((atomic_t *)syncp); > > BUG_ON(!(ret & 1)); > > > We probably could have a CONFIG_DEBUG_U64_STATS that could be used on > 64bit kernels as well... > > >
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On Fri, 2017-08-04 at 08:51 -0700, Florian Fainelli wrote: > On 08/03/2017 10:36 PM, Eric Dumazet wrote: > > On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote: > >> During testing with a background iperf pushing 1Gbit/sec worth of > >> traffic and having both ifconfig and ethtool collect statistics, we > >> could see quite frequent deadlocks. Convert the often accessed DSA slave > >> network devices statistics to per-cpu 64-bit statistics to remove these > >> deadlocks and provide fast efficient statistics updates. > >> > > > > This seems to be a bug fix, it would be nice to get a proper tag like : > > > > Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") > > Right, should have been added, thanks! > > > > > Problem here is that if multiple cpus can call dsa_switch_rcv() at the > > same time, then u64_stats_update_begin() contract is not respected. > > This is really where I struggled understanding what is wrong in the > non-per CPU version, my understanding is that we have: > > - writers for xmit executes in process context > - writers for receive executes from NAPI (from the DSA's master network > device through it's own NAPI doing netif_receive_skb -> netdev_uses_dsa > -> netif_receive_skb) > > readers should all execute in process context. The test scenario that > led to a deadlock involved running iperf in the background, having a > while loop with both ifconfig and ethtool reading stats, and somehow > when iperf exited, either reader would just be locked. So I guess this > leaves us with the two writers not being mutually excluded then, right? You could add a debug version of u64_stats_update_begin() doing int ret = atomic_inc((atomic_t *)syncp); BUG_ON(ret & 1); And u64_stats_update_end() int ret = atomic_inc((atomic_t *)syncp); BUG_ON(!(ret & 1)); We probably could have a CONFIG_DEBUG_U64_STATS that could be used on 64bit kernels as well...
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On 08/03/2017 10:36 PM, Eric Dumazet wrote: > On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote: >> During testing with a background iperf pushing 1Gbit/sec worth of >> traffic and having both ifconfig and ethtool collect statistics, we >> could see quite frequent deadlocks. Convert the often accessed DSA slave >> network devices statistics to per-cpu 64-bit statistics to remove these >> deadlocks and provide fast efficient statistics updates. >> > > This seems to be a bug fix, it would be nice to get a proper tag like : > > Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Right, should have been added, thanks! > > Problem here is that if multiple cpus can call dsa_switch_rcv() at the > same time, then u64_stats_update_begin() contract is not respected. This is really where I struggled understanding what is wrong in the non-per CPU version, my understanding is that we have: - writers for xmit executes in process context - writers for receive executes from NAPI (from the DSA's master network device through it's own NAPI doing netif_receive_skb -> netdev_uses_dsa -> netif_receive_skb) readers should all execute in process context. The test scenario that led to a deadlock involved running iperf in the background, having a while loop with both ifconfig and ethtool reading stats, and somehow when iperf exited, either reader would just be locked. So I guess this leaves us with the two writers not being mutually excluded then, right? > > include/linux/u64_stats_sync.h states : > > * Usage : > * > * Stats producer (writer) should use following template granted it already > got > * an exclusive access to counters (a lock is already taken, or per cpu > * data is used [in a non preemptable context]) > * > * spin_lock_bh(...) or other synchronization to get exclusive access > * ... > * u64_stats_update_begin(&stats->syncp); > > > -- Florian
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote: > During testing with a background iperf pushing 1Gbit/sec worth of > traffic and having both ifconfig and ethtool collect statistics, we > could see quite frequent deadlocks. Convert the often accessed DSA slave > network devices statistics to per-cpu 64-bit statistics to remove these > deadlocks and provide fast efficient statistics updates. > This seems to be a bug fix, it would be nice to get a proper tag like : Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Problem here is that if multiple cpus can call dsa_switch_rcv() at the same time, then u64_stats_update_begin() contract is not respected. include/linux/u64_stats_sync.h states : * Usage : * * Stats producer (writer) should use following template granted it already got * an exclusive access to counters (a lock is already taken, or per cpu * data is used [in a non preemptable context]) * * spin_lock_bh(...) or other synchronization to get exclusive access * ... * u64_stats_update_begin(&stats->syncp);