Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock
From: David Ahern
Date: Fri, 4 Aug 2017 11:11:40 -0600

> On 8/4/17 11:07 AM, Eric Dumazet wrote:
>> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>>> From: Shaohua Li
>>>
>>> In a syn flooding test, the fib6_table rwlock is a significant
>>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>>> it is very challenging, if possible at all. A percpu spinlock (lglock
>>> has been removed from the kernel, so I added a simple implementation
>>> here) is quite trivial for this problem since updating the routing
>>> table is a rare event. In my test, the server receives around 1.5 Mpps
>>> in a syn flooding test without the patch on a dual-socket, 56-CPU
>>> system. With the patch, the server receives around 3.8 Mpps, and perf
>>> report doesn't show the locking issue.
>>>
>>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>>> to replace RCU, but it is much better than the current read-write lock.
>>> Until we have an RCU implementation, this is a good temporary solution.
>>> Plus, this is a trivial change; there is nothing to prevent pursuing an
>>> RCU implementation.
>>>
>>> Cc: Wei Wang
>>> Cc: Eric Dumazet
>>> Cc: Stephen Hemminger
>>> Signed-off-by: Shaohua Li
>>> ---
>>
>> Wei has almost done the RCU conversion.
>>
>> This patch is probably coming too late.
>
> +1
>
> I'd rather see the RCU conversion than a move to per-cpu locks.

Me too.
Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock
On Fri, Aug 4, 2017 at 10:11 AM, David Ahern wrote:
> On 8/4/17 11:07 AM, Eric Dumazet wrote:
>> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>>> From: Shaohua Li
>>>
>>> In a syn flooding test, the fib6_table rwlock is a significant
>>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>>> it is very challenging, if possible at all. A percpu spinlock (lglock
>>> has been removed from the kernel, so I added a simple implementation
>>> here) is quite trivial for this problem since updating the routing
>>> table is a rare event. In my test, the server receives around 1.5 Mpps
>>> in a syn flooding test without the patch on a dual-socket, 56-CPU
>>> system. With the patch, the server receives around 3.8 Mpps, and perf
>>> report doesn't show the locking issue.
>>>
>>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>>> to replace RCU, but it is much better than the current read-write lock.
>>> Until we have an RCU implementation, this is a good temporary solution.
>>> Plus, this is a trivial change; there is nothing to prevent pursuing an
>>> RCU implementation.
>>>
>>> Cc: Wei Wang
>>> Cc: Eric Dumazet
>>> Cc: Stephen Hemminger
>>> Signed-off-by: Shaohua Li
>>> ---
>>
>> Wei has almost done the RCU conversion.
>>
>> This patch is probably coming too late.
>
> +1
>
> I'd rather see the RCU conversion than a move to per-cpu locks.

I am actively working on the RCU conversion. The main coding part is
mostly done, and I am working on testing it. Some more time is needed to
catch the rest of the missing pieces and get the patches ready.

Thanks.
Wei
Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock
On 8/4/17 11:07 AM, Eric Dumazet wrote:
> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>> From: Shaohua Li
>>
>> In a syn flooding test, the fib6_table rwlock is a significant
>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>> it is very challenging, if possible at all. A percpu spinlock (lglock
>> has been removed from the kernel, so I added a simple implementation
>> here) is quite trivial for this problem since updating the routing
>> table is a rare event. In my test, the server receives around 1.5 Mpps
>> in a syn flooding test without the patch on a dual-socket, 56-CPU
>> system. With the patch, the server receives around 3.8 Mpps, and perf
>> report doesn't show the locking issue.
>>
>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>> to replace RCU, but it is much better than the current read-write lock.
>> Until we have an RCU implementation, this is a good temporary solution.
>> Plus, this is a trivial change; there is nothing to prevent pursuing an
>> RCU implementation.
>>
>> Cc: Wei Wang
>> Cc: Eric Dumazet
>> Cc: Stephen Hemminger
>> Signed-off-by: Shaohua Li
>> ---
>
> Wei has almost done the RCU conversion.
>
> This patch is probably coming too late.

+1

I'd rather see the RCU conversion than a move to per-cpu locks.
Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock
On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
> From: Shaohua Li
>
> In a syn flooding test, the fib6_table rwlock is a significant
> bottleneck. While converting the rwlock to RCU sounds straightforward,
> it is very challenging, if possible at all. A percpu spinlock (lglock
> has been removed from the kernel, so I added a simple implementation
> here) is quite trivial for this problem since updating the routing
> table is a rare event. In my test, the server receives around 1.5 Mpps
> in a syn flooding test without the patch on a dual-socket, 56-CPU
> system. With the patch, the server receives around 3.8 Mpps, and perf
> report doesn't show the locking issue.
>
> Of course the percpu lock isn't as good as RCU, so this isn't intended
> to replace RCU, but it is much better than the current read-write lock.
> Until we have an RCU implementation, this is a good temporary solution.
> Plus, this is a trivial change; there is nothing to prevent pursuing an
> RCU implementation.
>
> Cc: Wei Wang
> Cc: Eric Dumazet
> Cc: Stephen Hemminger
> Signed-off-by: Shaohua Li
> ---

Wei has almost done the RCU conversion.

This patch is probably coming too late.
[PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock
From: Shaohua Li

In a syn flooding test, the fib6_table rwlock is a significant
bottleneck. While converting the rwlock to RCU sounds straightforward,
it is very challenging, if possible at all. A percpu spinlock (lglock
has been removed from the kernel, so I added a simple implementation
here) is quite trivial for this problem since updating the routing
table is a rare event. In my test, the server receives around 1.5 Mpps
in a syn flooding test without the patch on a dual-socket, 56-CPU
system. With the patch, the server receives around 3.8 Mpps, and perf
report doesn't show the locking issue.

Of course the percpu lock isn't as good as RCU, so this isn't intended
to replace RCU, but it is much better than the current read-write lock.
Until we have an RCU implementation, this is a good temporary solution.
Plus, this is a trivial change; there is nothing to prevent pursuing an
RCU implementation.

Cc: Wei Wang
Cc: Eric Dumazet
Cc: Stephen Hemminger
Signed-off-by: Shaohua Li
---
 include/net/ip6_fib.h | 57 +-
 net/ipv6/addrconf.c   |  8 +++---
 net/ipv6/ip6_fib.c    | 76 ---
 net/ipv6/route.c      | 54 ++--
 4 files changed, 129 insertions(+), 66 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1d790ea..124eb04 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -247,7 +247,7 @@ struct rt6_statistics {
 struct fib6_table {
 	struct hlist_node	tb6_hlist;
 	u32			tb6_id;
-	rwlock_t		tb6_lock;
+	spinlock_t __percpu	*percpu_tb6_lock;
 	struct fib6_node	tb6_root;
 	struct inet_peer_base	tb6_peers;
 	unsigned int		flags;
@@ -255,6 +255,61 @@ struct fib6_table {
 #define RT6_TABLE_HAS_DFLT_ROUTER	BIT(0)
 };
 
+static inline void fib6_table_read_lock_bh(struct fib6_table *table)
+{
+	preempt_disable();
+	spin_lock_bh(this_cpu_ptr(table->percpu_tb6_lock));
+}
+
+static inline void fib6_table_read_unlock_bh(struct fib6_table *table)
+{
+	spin_unlock_bh(this_cpu_ptr(table->percpu_tb6_lock));
+	preempt_enable();
+}
+
+static inline void fib6_table_read_lock(struct fib6_table *table)
+{
+	preempt_disable();
+	spin_lock(this_cpu_ptr(table->percpu_tb6_lock));
+}
+
+static inline void fib6_table_read_unlock(struct fib6_table *table)
+{
+	spin_unlock(this_cpu_ptr(table->percpu_tb6_lock));
+	preempt_enable();
+}
+
+static inline void fib6_table_write_lock_bh(struct fib6_table *table)
+{
+	int first = nr_cpu_ids;
+	int i;
+
+	for_each_possible_cpu(i) {
+		if (first == nr_cpu_ids) {
+			first = i;
+			spin_lock_bh(per_cpu_ptr(table->percpu_tb6_lock, i));
+		} else
+			spin_lock_nest_lock(
+				per_cpu_ptr(table->percpu_tb6_lock, i),
+				per_cpu_ptr(table->percpu_tb6_lock, first));
+	}
+}
+
+static inline void fib6_table_write_unlock_bh(struct fib6_table *table)
+{
+	int first = nr_cpu_ids;
+	int i;
+
+	for_each_possible_cpu(i) {
+		if (first == nr_cpu_ids) {
+			first = i;
+			continue;
+		}
+		spin_unlock(per_cpu_ptr(table->percpu_tb6_lock, i));
+	}
+	spin_unlock_bh(per_cpu_ptr(table->percpu_tb6_lock, first));
+}
+
 #define RT6_TABLE_UNSPEC	RT_TABLE_UNSPEC
 #define RT6_TABLE_MAIN		RT_TABLE_MAIN
 #define RT6_TABLE_DFLT		RT6_TABLE_MAIN
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 30ee23e..22e2ad2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2313,7 +2313,7 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
 	if (!table)
 		return NULL;
 
-	read_lock_bh(&table->tb6_lock);
+	fib6_table_read_lock_bh(table);
 	fn = fib6_locate(&table->tb6_root, pfx, plen, NULL, 0);
 	if (!fn)
 		goto out;
@@ -2330,7 +2330,7 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
 			break;
 	}
 out:
-	read_unlock_bh(&table->tb6_lock);
+	fib6_table_read_unlock_bh(table);
 	return rt;
 }
 
@@ -5929,7 +5929,7 @@ void addrconf_disable_policy_idev(struct inet6_dev *idev, int val)
 		struct fib6_table *table = rt->rt6i_table;
 		int cpu;
 
-		read_lock(&table->tb6_lock);
+		fib6_table_read_lock(table);
 		addrconf_set_nopolicy(ifa->rt, val);
 		if (rt->rt6i_pcpu) {
 			for_each_possible_cpu(cpu) {
@@ -5939,7 +5939,7 @@ void addrconf_disable_policy_idev(struct