Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread David Miller
From: David Ahern 
Date: Fri, 4 Aug 2017 11:11:40 -0600

> On 8/4/17 11:07 AM, Eric Dumazet wrote:
>> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>>> From: Shaohua Li 
>>>
>>> In a syn flooding test, the fib6_table rwlock is a significant
>>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>>> it is very challenging, if it is possible at all. A percpu spinlock
>>> (lglock has been removed from the kernel, so I added a simple
>>> implementation here) is quite trivial for this problem, since updating
>>> the routing table is a rare event. In my test, without the patch the
>>> server receives around 1.5 Mpps in a syn flooding test on a
>>> dual-socket, 56-CPU system. With the patch, the server receives around
>>> 3.8 Mpps, and perf report no longer shows the locking issue.
>>>
>>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>>> to replace RCU, but it is much better than the current read-write
>>> lock. Until we have an RCU implementation, this is a good temporary
>>> solution. Plus, this is a trivial change; there is nothing to prevent
>>> pursuing an RCU implementation.
>>>
>>> Cc: Wei Wang 
>>> Cc: Eric Dumazet 
>>> Cc: Stephen Hemminger 
>>> Signed-off-by: Shaohua Li 
>>> ---
>> 
>> Wei has almost done the RCU conversion.
>> 
>> This patch is probably coming too late.
> 
> 
> +1
> 
> I'd rather see the RCU conversion than a move to per-cpu locks.

Me too.


Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread Wei Wang
On Fri, Aug 4, 2017 at 10:11 AM, David Ahern  wrote:
> On 8/4/17 11:07 AM, Eric Dumazet wrote:
>> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>>> From: Shaohua Li 
>>>
>>> In a syn flooding test, the fib6_table rwlock is a significant
>>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>>> it is very challenging, if it is possible at all. A percpu spinlock
>>> (lglock has been removed from the kernel, so I added a simple
>>> implementation here) is quite trivial for this problem, since updating
>>> the routing table is a rare event. In my test, without the patch the
>>> server receives around 1.5 Mpps in a syn flooding test on a
>>> dual-socket, 56-CPU system. With the patch, the server receives around
>>> 3.8 Mpps, and perf report no longer shows the locking issue.
>>>
>>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>>> to replace RCU, but it is much better than the current read-write
>>> lock. Until we have an RCU implementation, this is a good temporary
>>> solution. Plus, this is a trivial change; there is nothing to prevent
>>> pursuing an RCU implementation.
>>>
>>> Cc: Wei Wang 
>>> Cc: Eric Dumazet 
>>> Cc: Stephen Hemminger 
>>> Signed-off-by: Shaohua Li 
>>> ---
>>
>> Wei has almost done the RCU conversion.
>>
>> This patch is probably coming too late.
>
>
> +1
>
> I'd rather see the RCU conversion than a move to per-cpu locks.

I am actively working on the RCU conversion.
The main coding part is mostly done and I am working on testing it.
Some more time is needed to catch the rest of the missing pieces and
get the patches ready.

Thanks.
Wei


Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread David Ahern
On 8/4/17 11:07 AM, Eric Dumazet wrote:
> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>> From: Shaohua Li 
>>
>> In a syn flooding test, the fib6_table rwlock is a significant
>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>> it is very challenging, if it is possible at all. A percpu spinlock
>> (lglock has been removed from the kernel, so I added a simple
>> implementation here) is quite trivial for this problem, since updating
>> the routing table is a rare event. In my test, without the patch the
>> server receives around 1.5 Mpps in a syn flooding test on a
>> dual-socket, 56-CPU system. With the patch, the server receives around
>> 3.8 Mpps, and perf report no longer shows the locking issue.
>>
>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>> to replace RCU, but it is much better than the current read-write
>> lock. Until we have an RCU implementation, this is a good temporary
>> solution. Plus, this is a trivial change; there is nothing to prevent
>> pursuing an RCU implementation.
>>
>> Cc: Wei Wang 
>> Cc: Eric Dumazet 
>> Cc: Stephen Hemminger 
>> Signed-off-by: Shaohua Li 
>> ---
> 
> Wei has almost done the RCU conversion.
> 
> This patch is probably coming too late.


+1

I'd rather see the RCU conversion than a move to per-cpu locks.


Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread Eric Dumazet
On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
> From: Shaohua Li 
> 
> In a syn flooding test, the fib6_table rwlock is a significant
> bottleneck. While converting the rwlock to RCU sounds straightforward,
> it is very challenging, if it is possible at all. A percpu spinlock
> (lglock has been removed from the kernel, so I added a simple
> implementation here) is quite trivial for this problem, since updating
> the routing table is a rare event. In my test, without the patch the
> server receives around 1.5 Mpps in a syn flooding test on a
> dual-socket, 56-CPU system. With the patch, the server receives around
> 3.8 Mpps, and perf report no longer shows the locking issue.
> 
> Of course the percpu lock isn't as good as RCU, so this isn't intended
> to replace RCU, but it is much better than the current read-write
> lock. Until we have an RCU implementation, this is a good temporary
> solution. Plus, this is a trivial change; there is nothing to prevent
> pursuing an RCU implementation.
> 
> Cc: Wei Wang 
> Cc: Eric Dumazet 
> Cc: Stephen Hemminger 
> Signed-off-by: Shaohua Li 
> ---

Wei has almost done the RCU conversion.

This patch is probably coming too late.


[PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread Shaohua Li
From: Shaohua Li 

In a syn flooding test, the fib6_table rwlock is a significant
bottleneck. While converting the rwlock to RCU sounds straightforward,
it is very challenging, if it is possible at all. A percpu spinlock
(lglock has been removed from the kernel, so I added a simple
implementation here) is quite trivial for this problem, since updating
the routing table is a rare event. In my test, without the patch the
server receives around 1.5 Mpps in a syn flooding test on a
dual-socket, 56-CPU system. With the patch, the server receives around
3.8 Mpps, and perf report no longer shows the locking issue.

Of course the percpu lock isn't as good as RCU, so this isn't intended
to replace RCU, but it is much better than the current read-write
lock. Until we have an RCU implementation, this is a good temporary
solution. Plus, this is a trivial change; there is nothing to prevent
pursuing an RCU implementation.

Cc: Wei Wang 
Cc: Eric Dumazet 
Cc: Stephen Hemminger 
Signed-off-by: Shaohua Li 
---
 include/net/ip6_fib.h | 57 +-
 net/ipv6/addrconf.c   |  8 +++---
 net/ipv6/ip6_fib.c| 76 ---
 net/ipv6/route.c  | 54 ++--
 4 files changed, 129 insertions(+), 66 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1d790ea..124eb04 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -247,7 +247,7 @@ struct rt6_statistics {
 struct fib6_table {
struct hlist_node   tb6_hlist;
u32 tb6_id;
-   rwlock_ttb6_lock;
+   spinlock_t __percpu *percpu_tb6_lock;
struct fib6_nodetb6_root;
struct inet_peer_base   tb6_peers;
unsigned intflags;
@@ -255,6 +255,61 @@ struct fib6_table {
 #define RT6_TABLE_HAS_DFLT_ROUTER  BIT(0)
 };
 
+static inline void fib6_table_read_lock_bh(struct fib6_table *table)
+{
+   preempt_disable();
+   spin_lock_bh(this_cpu_ptr(table->percpu_tb6_lock));
+}
+
+static inline void fib6_table_read_unlock_bh(struct fib6_table *table)
+{
+   spin_unlock_bh(this_cpu_ptr(table->percpu_tb6_lock));
+   preempt_enable();
+}
+
+static inline void fib6_table_read_lock(struct fib6_table *table)
+{
+   preempt_disable();
+   spin_lock(this_cpu_ptr(table->percpu_tb6_lock));
+}
+
+static inline void fib6_table_read_unlock(struct fib6_table *table)
+{
+   spin_unlock(this_cpu_ptr(table->percpu_tb6_lock));
+   preempt_enable();
+}
+
+static inline void fib6_table_write_lock_bh(struct fib6_table *table)
+{
+   int first = nr_cpu_ids;
+   int i;
+
+   for_each_possible_cpu(i) {
+   if (first == nr_cpu_ids) {
+   first = i;
+   spin_lock_bh(per_cpu_ptr(table->percpu_tb6_lock, i));
+   } else
+   spin_lock_nest_lock(
+   per_cpu_ptr(table->percpu_tb6_lock, i),
+   per_cpu_ptr(table->percpu_tb6_lock, first));
+   }
+}
+
+static inline void fib6_table_write_unlock_bh(struct fib6_table *table)
+{
+   int first = nr_cpu_ids;
+   int i;
+
+   for_each_possible_cpu(i) {
+   if (first == nr_cpu_ids) {
+   first = i;
+   continue;
+   }
+   spin_unlock(per_cpu_ptr(table->percpu_tb6_lock, i));
+   }
+   spin_unlock_bh(per_cpu_ptr(table->percpu_tb6_lock, first));
+}
+
 #define RT6_TABLE_UNSPEC   RT_TABLE_UNSPEC
 #define RT6_TABLE_MAIN RT_TABLE_MAIN
 #define RT6_TABLE_DFLT RT6_TABLE_MAIN
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 30ee23e..22e2ad2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2313,7 +2313,7 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
if (!table)
return NULL;
 
-   read_lock_bh(&table->tb6_lock);
+   fib6_table_read_lock_bh(table);
fn = fib6_locate(>tb6_root, pfx, plen, NULL, 0);
if (!fn)
goto out;
@@ -2330,7 +2330,7 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
break;
}
 out:
-   read_unlock_bh(&table->tb6_lock);
+   fib6_table_read_unlock_bh(table);
return rt;
 }
 
@@ -5929,7 +5929,7 @@ void addrconf_disable_policy_idev(struct inet6_dev *idev, int val)
struct fib6_table *table = rt->rt6i_table;
int cpu;
 
-   read_lock(&table->tb6_lock);
+   fib6_table_read_lock(table);
addrconf_set_nopolicy(ifa->rt, val);
if (rt->rt6i_pcpu) {
for_each_possible_cpu(cpu) {
@@ -5939,7 +5939,7 @@ void addrconf_disable_policy_idev(struct