Re: [PATCH net-next v4] net: ipv4: add support for ECMP hash policy choice

2017-03-21 Thread David Miller
From: Nikolay Aleksandrov 
Date: Thu, 16 Mar 2017 15:28:00 +0200

> This patch adds support for ECMP hash policy choice via a new sysctl
> called fib_multipath_hash_policy and also adds support for L4 hashes.
> The current values for fib_multipath_hash_policy are:
>  0 - layer 3 (default)
>  1 - layer 4
> If there's an skb hash already set and it matches the chosen policy then it
> will be used instead of being calculated (currently only for L4).
> In L3 mode we always calculate the hash due to the ICMP error special
> case, the flow dissector's field consistentification should handle the
> address order thus we can remove the address reversals.
> If the skb is provided we always use it for the hash calculation,
> otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.
> 
> Signed-off-by: Nikolay Aleksandrov 

Applied, thanks Nikolay.


Re: [PATCH net-next v4] net: ipv4: add support for ECMP hash policy choice

2017-03-17 Thread Nikolay Aleksandrov
On 16/03/17 18:49, Nikolay Aleksandrov wrote:
> On 16/03/17 18:41, Stephen Hemminger wrote:
>> On Thu, 16 Mar 2017 15:28:00 +0200
>> Nikolay Aleksandrov  wrote:
>>
>>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
>>> index d6880a6149ee..62c4f94923e5 100644
>>> --- a/net/ipv4/sysctl_net_ipv4.c
>>> +++ b/net/ipv4/sysctl_net_ipv4.c
>>> @@ -1004,6 +1004,15 @@ static struct ctl_table ipv4_net_table[] = {
>>> .extra1 = ,
>>> .extra2 = ,
>>> },
>>> +   {
>>> +   .procname   = "fib_multipath_hash_policy",
>>> +   .data   = 
>>> _net.ipv4.sysctl_fib_multipath_hash_policy,
>>> +   .maxlen = sizeof(int),
>>> +   .mode   = 0644,
>>> +   .proc_handler   = proc_dointvec_minmax,
>>> +   .extra1 = ,
>>> +   .extra2 = ,
>>> +   
>>
>> Rather than having magic integer values, it would be better to use
>> strings (like TCP congestion control).  Especially if you want to support
>> more values in the future.
> 
> With strings we'll need two sysctls - one to export the available ones, and 
> one to set.
> I too am not happy with plain integers as I've said in v1 of this patch but 
> if people
> are okay with having two new sysctls exported then I don't mind reworking it 
> with strings.
> Or any other ideas are welcome.
> 
>>
>> Also what about IPv6?
>>
> 
> Patches are welcome, too. :-)
> We can update it at any time, IPv4 and 6 have been different for a very long 
> time
> and it's not the point of this patch to bring them closer, though IPv6 already
> does L4 by default.

Just to clarify, we're interested in getting IPv4 and 6 on the same level and 
will
work towards that but let's take it a step at a time.

 



Re: [PATCH net-next v4] net: ipv4: add support for ECMP hash policy choice

2017-03-16 Thread Nikolay Aleksandrov
On 16/03/17 18:41, Stephen Hemminger wrote:
> On Thu, 16 Mar 2017 15:28:00 +0200
> Nikolay Aleksandrov  wrote:
> 
>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
>> index d6880a6149ee..62c4f94923e5 100644
>> --- a/net/ipv4/sysctl_net_ipv4.c
>> +++ b/net/ipv4/sysctl_net_ipv4.c
>> @@ -1004,6 +1004,15 @@ static struct ctl_table ipv4_net_table[] = {
>>  .extra1 = ,
>>  .extra2 = ,
>>  },
>> +{
>> +.procname   = "fib_multipath_hash_policy",
>> +.data   = 
>> _net.ipv4.sysctl_fib_multipath_hash_policy,
>> +.maxlen = sizeof(int),
>> +.mode   = 0644,
>> +.proc_handler   = proc_dointvec_minmax,
>> +.extra1 = ,
>> +.extra2 = ,
>> +
> 
> Rather than having magic integer values, it would be better to use
> strings (like TCP congestion control).  Especially if you want to support
> more values in the future.

With strings we'll need two sysctls - one to export the available ones, and one 
to set.
I too am not happy with plain integers as I've said in v1 of this patch but if 
people
are okay with having two new sysctls exported then I don't mind reworking it 
with strings.
Or any other ideas are welcome.

> 
> Also what about IPv6?
> 

Patches are welcome, too. :-)
We can update it at any time, IPv4 and 6 have been different for a very long 
time
and it's not the point of this patch to bring them closer, though IPv6 already
does L4 by default.

 



Re: [PATCH net-next v4] net: ipv4: add support for ECMP hash policy choice

2017-03-16 Thread Stephen Hemminger
On Thu, 16 Mar 2017 15:28:00 +0200
Nikolay Aleksandrov  wrote:

> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index d6880a6149ee..62c4f94923e5 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -1004,6 +1004,15 @@ static struct ctl_table ipv4_net_table[] = {
>   .extra1 = ,
>   .extra2 = ,
>   },
> + {
> + .procname   = "fib_multipath_hash_policy",
> + .data   = 
> _net.ipv4.sysctl_fib_multipath_hash_policy,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec_minmax,
> + .extra1 = ,
> + .extra2 = ,
> + 

Rather than having magic integer values, it would be better to use
strings (like TCP congestion control).  Especially if you want to support
more values in the future.

Also what about IPv6?


[PATCH net-next v4] net: ipv4: add support for ECMP hash policy choice

2017-03-16 Thread Nikolay Aleksandrov
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
 0 - layer 3 (default)
 1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.

Signed-off-by: Nikolay Aleksandrov 
---
v4:
 - fix the ICMP send ip_forward case to use the skb_in addresses
 - use hash_keys directly when extracting ICMP headers
 - always use the skb when available

v3:
 - keep the ICMP error special handling and always calc L3 hash

v2:
 - removed the output_key_hash as it's not needed anymore
 - reverted to my original/internal patch with L3 as default hash

 Documentation/networking/ip-sysctl.txt |  8 +++
 include/net/ip_fib.h   | 14 ++
 include/net/netns/ipv4.h   |  1 +
 include/net/route.h|  6 +--
 net/ipv4/fib_semantics.c   | 11 ++--
 net/ipv4/icmp.c| 19 +--
 net/ipv4/route.c   | 92 ++
 net/ipv4/sysctl_net_ipv4.c |  9 
 8 files changed, 100 insertions(+), 60 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index fc73eeb7b3b8..558e19653106 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -73,6 +73,14 @@ fib_multipath_use_neigh - BOOLEAN
0 - disabled
1 - enabled
 
+fib_multipath_hash_policy - INTEGER
+   Controls which hash policy to use for multipath routes. Only valid
+   for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+   Default: 0 (Layer 3)
+   Possible values:
+   0 - Layer 3
+   1 - Layer 4
+
 route/max_size - INTEGER
Maximum number of routes allowed in the kernel.  Increase
this when using large numbers of interfaces and/or routes.
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index d9cee9659978..706705f7ef23 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -383,17 +383,13 @@ int fib_sync_down_dev(struct net_device *dev, unsigned 
long event, bool force);
 int fib_sync_down_addr(struct net_device *dev, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 
-extern u32 fib_multipath_secret __read_mostly;
-
-static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
-{
-   return jhash_2words((__force u32)saddr, (__force u32)daddr,
-   fib_multipath_secret) >> 1;
-}
-
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
+  const struct sk_buff *skb);
+#endif
 void fib_select_multipath(struct fib_result *res, int hash);
 void fib_select_path(struct net *net, struct fib_result *res,
-struct flowi4 *fl4, int mp_hash);
+struct flowi4 *fl4, const struct sk_buff *skb);
 
 /* Exported by fib_trie.c */
 void fib_trie_init(void);
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 622d2da27135..70a1d4251790 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -152,6 +152,7 @@ struct netns_ipv4 {
 #endif
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
int sysctl_fib_multipath_use_neigh;
+   int sysctl_fib_multipath_hash_policy;
 #endif
 
unsigned intfib_seq;/* protected by rtnl_mutex */
diff --git a/include/net/route.h b/include/net/route.h
index c0874c87c173..2cc0e14c6359 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -113,13 +113,13 @@ struct in_device;
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
- int mp_hash);
+struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *flp,
+ const struct sk_buff *skb);
 
 static inline struct rtable *__ip_route_output_key(struct net *net,
   struct flowi4 *flp)
 {
-   return __ip_route_output_key_hash(net, flp, -1);
+   return __ip_route_output_key_hash(net, flp, NULL);
 }
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 317026a39cfa..da449ddb8cc1 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -57,7 +57,6