Re: [net v1] fib_rules: interface group matching

2016-09-14 Thread Vincent Bernat
 ❦ 14 septembre 2016 17:25 CEST, David Ahern  :

>> I could just give more time to VRF. I also had some concerns over
>> performance with the way Netfilter integration is done, but I understand
>> that I could just stay away from POSTROUTING rules which is the only
>> hook executed twice?

> With the changes that were committed this past weekend, the VRF code
> is now setup where I can set a flag on a per VRF basis to disable the
> extra rx and tx processing - ie., no network taps, no netfilter, no
> qdisc, etc. Drops the overhead of VRF to ~3% maybe a bit less. I need
> to think about the user api a bit more and formalize the patch. Given
> my other commitments that probably won't happen until mid-October. But
> in terms of a building block, the overhead of VRF is continuing to
> drop.

Fine by me. We can drop my patch.

Thanks!
-- 
Program defensively.
- The Elements of Programming Style (Kernighan & Plauger)


Re: [net v1] fib_rules: interface group matching

2016-09-14 Thread David Ahern
On 9/14/16 9:14 AM, Vincent Bernat wrote:
> I could just give more time to VRF. I also had some concerns over
> performance with the way Netfilter integration is done, but I understand
> that I could just stay away from POSTROUTING rules which is the only
> hook executed twice?
> 

With the changes that were committed this past weekend, the VRF code is now 
setup where I can set a flag on a per VRF basis to disable the extra rx and tx 
processing - ie., no network taps, no netfilter, no qdisc, etc. Drops the 
overhead of VRF to ~3% maybe a bit less. I need to think about the user api a 
bit more and formalize the patch. Given my other commitments that probably 
won't happen until mid-October. But in terms of a building block, the overhead 
of VRF is continuing to drop.


Re: [net v1] fib_rules: interface group matching

2016-09-14 Thread Vincent Bernat
 ❦ 14 septembre 2016 16:39 CEST, David Ahern  :

 When a user wants to assign a routing table to a group of incoming
 interfaces, the current solutions are:

  - one IP rule for each interface (scalability problems)
  - use of fwmark and devgroup matcher (don't work with internal route
lookups, used for example by RPF)
  - use of VRF devices (more complex)
>>>
>>> Why do you believe that? A VRF is a formalized grouping of interfaces
>>> that includes an API for locally generated traffic to specify which
>>> VRF/group to use. And, with the l3mdev rule you only need 1 rule for
>>> all VRFs regardless of the number which is the best solution to the
>>> scalability problem of adding rules per device/group/VRF.
>>>
>>> What use case are trying to solve?
>> 
>> Local processes have to be made aware of the VRF by binding to the
>> pseudo-device. Some processes may be tricked by LD_PRELOAD but some
>> won't (like stuff written in Go). Maybe I should just find a better way
>> to bind a process to a VRF without its cooperation.
>
> What API are you using for interface groups? How does an app tell the
> kernel to use interface group 1 versus group 2?

In my testbed, I have only one local application which is dnsmasq as a
DHCP server. It sends back the answer to the physical interface (with
sendmsg() and auxillary data). So it makes my argument a bit moot as the
situation is in fact worse without VRF. :-/

My testbed is here (with use of VRF, more recent commits just use plain
ip rules):

 
https://github.com/vincentbernat/network-lab/blob/d86e9ed658863ef0f51d7b853d0dc9f8b7427b21/lab-l3-hyperv/setup

I could just give more time to VRF. I also had some concerns over
performance with the way Netfilter integration is done, but I understand
that I could just stay away from POSTROUTING rules which is the only
hook executed twice?
-- 
All things that are, are with more spirit chased than enjoyed.
-- Shakespeare, "Merchant of Venice"


Re: [net v1] fib_rules: interface group matching

2016-09-14 Thread David Ahern
On 9/14/16 8:25 AM, Vincent Bernat wrote:
>  ❦ 14 septembre 2016 16:15 CEST, David Ahern  :
> 
>>> When a user wants to assign a routing table to a group of incoming
>>> interfaces, the current solutions are:
>>>
>>>  - one IP rule for each interface (scalability problems)
>>>  - use of fwmark and devgroup matcher (don't work with internal route
>>>lookups, used for example by RPF)
>>>  - use of VRF devices (more complex)
>>
>> Why do you believe that? A VRF is a formalized grouping of interfaces
>> that includes an API for locally generated traffic to specify which
>> VRF/group to use. And, with the l3mdev rule you only need 1 rule for
>> all VRFs regardless of the number which is the best solution to the
>> scalability problem of adding rules per device/group/VRF.
>>
>> What use case are trying to solve?
> 
> Local processes have to be made aware of the VRF by binding to the
> pseudo-device. Some processes may be tricked by LD_PRELOAD but some
> won't (like stuff written in Go). Maybe I should just find a better way
> to bind a process to a VRF without its cooperation.
> 

What API are you using for interface groups? How does an app tell the kernel to 
use interface group 1 versus group 2?


LD_PRELOAD and overloading socket is an ad-hoc hack at best with many holes - 
as you have found.

We (Cumulus Linux) are using this cgroups patch:
   http://www.mail-archive.com/netdev@vger.kernel.org/msg93408.html

I want something formal like the cgroups patch or even the first idea of adding 
a default sk_bound_dev_if to the task struct:

https://github.com/dsahern/linux/commit/b3e5ccc291505c8a503edb20ea2c2b5e86bed96f

Parent-to-child inheritance of the setting is a requirement as is the setting 
getting applied to all IPv4/v6 sockets without action by the process itself.

Still have some work to do to get a solution into the kernel.



Re: [net v1] fib_rules: interface group matching

2016-09-14 Thread Vincent Bernat
 ❦ 14 septembre 2016 16:15 CEST, David Ahern  :

>> When a user wants to assign a routing table to a group of incoming
>> interfaces, the current solutions are:
>> 
>>  - one IP rule for each interface (scalability problems)
>>  - use of fwmark and devgroup matcher (don't work with internal route
>>lookups, used for example by RPF)
>>  - use of VRF devices (more complex)
>
> Why do you believe that? A VRF is a formalized grouping of interfaces
> that includes an API for locally generated traffic to specify which
> VRF/group to use. And, with the l3mdev rule you only need 1 rule for
> all VRFs regardless of the number which is the best solution to the
> scalability problem of adding rules per device/group/VRF.
>
> What use case are trying to solve?

Local processes have to be made aware of the VRF by binding to the
pseudo-device. Some processes may be tricked by LD_PRELOAD but some
won't (like stuff written in Go). Maybe I should just find a better way
to bind a process to a VRF without its cooperation.
-- 
Instrument your programs.  Measure before making "efficiency" changes.
- The Elements of Programming Style (Kernighan & Plauger)


Re: [net v1] fib_rules: interface group matching

2016-09-14 Thread David Ahern
On 9/14/16 6:40 AM, Vincent Bernat wrote:
> When a user wants to assign a routing table to a group of incoming
> interfaces, the current solutions are:
> 
>  - one IP rule for each interface (scalability problems)
>  - use of fwmark and devgroup matcher (don't work with internal route
>lookups, used for example by RPF)
>  - use of VRF devices (more complex)

Why do you believe that? A VRF is a formalized grouping of interfaces that 
includes an API for locally generated traffic to specify which VRF/group to 
use. And, with the l3mdev rule you only need 1 rule for all VRFs regardless of 
the number which is the best solution to the scalability problem of adding 
rules per device/group/VRF.

What use case are trying to solve?


Re: [net v1] fib_rules: interface group matching

2016-09-14 Thread Vincent Bernat
 ❦ 14 septembre 2016 14:40 CEST, Vincent Bernat  :

> Each interface can be assigned to a numeric group using IFLA_GROUP. This
> commit enables a user to reference such a group into an IP rule. Here is
> an example of output of iproute2:
>
> $ ip rule show
> 0:  from all lookup local
> 32764:  from all iifgroup 2 lookup 2
> 32765:  from all iifgroup 1 lookup 1
> 32766:  from all lookup main
> 32767:  from all lookup default

The patch for iproute2 is available here (didn't post it inline to avoid
confuse patchwork):
 http://paste.debian.net/821247/
-- 
Say what you mean, simply and directly.
- The Elements of Programming Style (Kernighan & Plauger)


[net v1] fib_rules: interface group matching

2016-09-14 Thread Vincent Bernat
When a user wants to assign a routing table to a group of incoming
interfaces, the current solutions are:

 - one IP rule for each interface (scalability problems)
 - use of fwmark and devgroup matcher (don't work with internal route
   lookups, used for example by RPF)
 - use of VRF devices (more complex)

Each interface can be assigned to a numeric group using IFLA_GROUP. This
commit enables a user to reference such a group into an IP rule. Here is
an example of output of iproute2:

$ ip rule show
0:  from all lookup local
32764:  from all iifgroup 2 lookup 2
32765:  from all iifgroup 1 lookup 1
32766:  from all lookup main
32767:  from all lookup default

Signed-off-by: Vincent Bernat 
---
 include/net/fib_rules.h|  6 -
 include/uapi/linux/fib_rules.h |  2 ++
 net/core/fib_rules.c   | 57 +-
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 456e4a6006ab..a96b186ccd02 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -28,6 +28,8 @@ struct fib_rule {
u32 pref;
int suppress_ifgroup;
int suppress_prefixlen;
+   int iifgroup;
+   int oifgroup;
chariifname[IFNAMSIZ];
charoifname[IFNAMSIZ];
struct rcu_head rcu;
@@ -92,7 +94,9 @@ struct fib_rules_ops {
[FRA_SUPPRESS_PREFIXLEN] = { .type = NLA_U32 }, \
[FRA_SUPPRESS_IFGROUP] = { .type = NLA_U32 }, \
[FRA_GOTO]  = { .type = NLA_U32 }, \
-   [FRA_L3MDEV]= { .type = NLA_U8 }
+   [FRA_L3MDEV]= { .type = NLA_U8 }, \
+   [FRA_IIFGROUP] = { .type = NLA_U32 }, \
+   [FRA_OIFGROUP] = { .type = NLA_U32 }
 
 static inline void fib_rule_get(struct fib_rule *rule)
 {
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 14404b3ebb89..0bf5a5e94d9a 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -51,6 +51,8 @@ enum {
FRA_OIFNAME,
FRA_PAD,
FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
+   FRA_IIFGROUP,   /* interface group */
+   FRA_OIFGROUP,
__FRA_MAX
 };
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index be4629c344a6..f8ed6ba85c72 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -37,6 +37,9 @@ int fib_default_rule_add(struct fib_rules_ops *ops,
r->suppress_prefixlen = -1;
r->suppress_ifgroup = -1;
 
+   r->iifgroup = -1;
+   r->oifgroup = -1;
+
/* The lock is not required here, the list in unreacheable
 * at the moment this function is called */
list_add_tail(>list, >rules_list);
@@ -193,6 +196,30 @@ static int fib_rule_match(struct fib_rule *rule, struct 
fib_rules_ops *ops,
if (rule->l3mdev && !l3mdev_fib_rule_match(rule->fr_net, fl, arg))
goto out;
 
+   if (rule->iifgroup != -1) {
+   struct net_device *dev;
+
+   rcu_read_lock();
+   dev = dev_get_by_index_rcu(rule->fr_net, fl->flowi_iif);
+   if (!dev || dev->group != rule->iifgroup) {
+   rcu_read_unlock();
+   goto out;
+   }
+   rcu_read_unlock();
+   }
+
+   if (rule->oifgroup != -1) {
+   struct net_device *dev;
+
+   rcu_read_lock();
+   dev = dev_get_by_index_rcu(rule->fr_net, fl->flowi_oif);
+   if (!dev || dev->group != rule->oifgroup) {
+   rcu_read_unlock();
+   goto out;
+   }
+   rcu_read_unlock();
+   }
+
ret = ops->match(rule, fl, flags);
 out:
return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -305,6 +332,12 @@ static int rule_exists(struct fib_rules_ops *ops, struct 
fib_rule_hdr *frh,
if (r->l3mdev != rule->l3mdev)
continue;
 
+   if (r->iifgroup != rule->iifgroup)
+   continue;
+
+   if (r->oifgroup != rule->oifgroup)
+   continue;
+
if (!ops->compare(r, frh, tb))
continue;
return 1;
@@ -391,6 +424,16 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr 
*nlh)
goto errout_free;
}
 
+   if (tb[FRA_IIFGROUP])
+   rule->iifgroup = nla_get_u32(tb[FRA_IIFGROUP]);
+   else
+   rule->iifgroup = -1;
+
+   if (tb[FRA_OIFGROUP])
+   rule->oifgroup = nla_get_u32(tb[FRA_OIFGROUP]);
+   else
+   rule->oifgroup = -1;
+
rule->action = frh->action;
rule->flags = frh->flags;
rule->table = frh_get_table(frh, tb);
@@ -552,6